[PERFORM] Partitioning performance issue
I tried to use partitioning and ran into a problem: I get very bad performance, and I cannot understand what I am doing wrong. I set up master and child tables via inheritance, with a range CHECK constraint on the date and a trigger on INSERT, as described in the documentation. I was happy with the insertion speed: about 30 megabytes per second, more than I expected, with server idle time near 95%. I used 100 parallel clients.

However, when it came to updates, things turned very bad. I set up a test with 30 running clients making 1 update each in a random fashion. Updates via the master table took 6 times longer, server idle time dropped to 15%, user CPU rose to 75%, and the load average hit 15. Test details:

30 updates (30 processes, 1 update each):
  via master table: 134 seconds
  via child table: 20 seconds
  via master table without the "date1 >= '2017-09-06' and date1 < '2017-09-07'" clause: 180 seconds

That means constraint_exclusion works; however, the process of exclusion takes A LOT OF time. I repeated the test with selects:

30 selects (30 processes, 1 select each):
  via master table: 50 seconds
  via child table: 8 seconds

This is very bad too. The documentation says that it is not good to have 1000 partitions and that 100 is probably OK, but I have only 40 partitions and see noticeable delays with only 5 partitions.

What I also cannot understand is why the time increase for SELECT is much higher (2.5 times) than the time increase for UPDATE, considering that the WHERE clause is identical and assuming the time is spent selecting the relevant child tables.

Best regards,
Konstantin

Environment description:
Postgres 9.5 on Linux:

db=> select version();
 PostgreSQL 9.5.8 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11), 64-bit
(1 row)

16 CPUs, Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz (vendor_id GenuineIntel, cpu family 6, model 45)
128GB RAM, 32GB shared_buffers

Table statistics:

db=> select count(*) from my_log_daily;
 count
--------
 408568
(1 row)

db=> select count(*) from my_log_daily_170906;
 count
--------
 408568
(1 row)

db=> explain (ANALYZE, BUFFERS) select stage+1 from my_log_daily_170906
     where date1 >= '2017-09-06' and date1 < '2017-09-07'
     and msgid1=3414253 and msgid2=20756 and msgid3=1504712117 and instance='WS6';

 Index Scan using my_log_daily_idx_170906 on my_log_daily_170906  (cost=0.42..8.46 rows=1 width=4) (actual time=0.013..0.014 rows=1 loops=1)
   Index Cond: ((msgid1 = 3414253) AND (msgid2 = 20756) AND (msgid3 = 1504712117) AND ((instance)::text = 'WS6'::text))
   Filter: ((date1 >= '2017-09-06 00:00:00'::timestamp without time zone) AND (date1 < '2017-09-07 00:00:00'::timestamp without time zone))
   Buffers: shared hit=4
 Planning time: 0.135 ms
 Execution time: 0.029 ms
(6 rows)

db=> explain (ANALYZE, BUFFERS) select stage+1 from my_log_daily
     where date1 >= '2017-09-06' and date1 < '2017-09-07'
     and msgid1=3414253 and msgid2=20756 and msgid3=1504712117 and instance='WS6';

 Result  (cost=0.00..8.46 rows=2 width=4) (actual time=0.016..0.017 rows=1 loops=1)
   Buffers: shared hit=4
   ->  Append  (cost=0.00..8.45 rows=2 width=4) (actual time=0.013..0.014 rows=1 loops=1)
         Buffers: shared hit=4
         ->  Seq Scan on my_log_daily  (cost=0.00..0.00 rows=1 width=4) (actual time=0.000..0.000 rows=0 loops=1)
               Filter: ((date1 >= '2017-09-06 00:00:00'::timestamp without time zone) AND (date1 < '2017-09-07 00:00:00'::timestamp without time zone) AND (msgid1 = 3414253) AND (msgid2 = 20756) AND (msgid3 = 1504712117) AND ((instance)::text = 'WS6'::text))
         ->  Index Scan using my_log_daily_idx_170906 on my_log_daily_170906  (cost=0.42..8.45 rows=1 width=4) (actual time=0.012..0.013 rows=1 loops=1)
               Index Cond: ((msgid1 = 3414253) AND (msgid2 = 20756) AND (msgid3 = 1504712117) AND ((instance)::text = 'WS6'::text))
               Filter: ((date1 >= '2017-09-06 00:00:00'::timestamp without time zone) AND (date1 < '2017-09-07 00:00:00'::timestamp without time zone))
               Buffers: shared hit=4
 Planning time: 2.501 ms
 Execution time: 0.042 ms
(12 rows)

db=> explain (ANALYZE, BUFFERS) update my_log_daily_170906 set stage=stage+1
     where date1 >= '2017-09-06' and date1 < '2017-09-07'
     and msgid1=3414253 and msgid2=20756 and msgid3=1504712117 and
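For reference, the inheritance setup described at the top of this message (range CHECK by date plus an INSERT trigger on the master) looks roughly like the following. This is a sketch reconstructed from the column names visible in the plans; the real schema, trigger body, and the other ~40 daily children are not shown in the message:

```sql
CREATE TABLE my_log_daily (
    date1    timestamp NOT NULL,
    msgid1   bigint,
    msgid2   bigint,
    msgid3   bigint,
    instance varchar(10),
    stage    integer
);

-- One child per day, constrained by a range CHECK on the date.
CREATE TABLE my_log_daily_170906 (
    CHECK (date1 >= '2017-09-06' AND date1 < '2017-09-07')
) INHERITS (my_log_daily);

CREATE INDEX my_log_daily_idx_170906
    ON my_log_daily_170906 (msgid1, msgid2, msgid3, instance);

-- Trigger on the master redirects inserts to the matching child.
CREATE OR REPLACE FUNCTION my_log_daily_insert() RETURNS trigger AS $$
BEGIN
    IF NEW.date1 >= '2017-09-06' AND NEW.date1 < '2017-09-07' THEN
        INSERT INTO my_log_daily_170906 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'no partition for date %', NEW.date1;
    END IF;
    RETURN NULL;  -- the row is already stored in the child
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER my_log_daily_route
    BEFORE INSERT ON my_log_daily
    FOR EACH ROW EXECUTE PROCEDURE my_log_daily_insert();
```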
Re: [PERFORM] partitioning materialized views
On Fri, Jul 7, 2017 at 10:12 AM, Shaun Thomas wrote:
>> I don't think the downstream dependencies will let that work without
>> rebuilding them as well. The drop fails (without a cascade), and the other
>> views and matviews that are built off of this all simply point to x_old.
>
> Wow, ouch. Yeah, I'd neglected to consider dependent objects. Your
> only "out" at this point is to either add or utilize a "modified_date"
> column of some kind, so you can maintain a different MV with some
> recent window of data, and regularly merge that into a physical local
> copy (not an MV) sort of like a running ETL. Though that won't help
> with deletes, unfortunately.

You have another out: rebuild the dependent views before the drop.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
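A sketch of that "rebuild the dependents first" approach; the dependent view v, its column list, and the source table are invented for illustration and are not from the thread:

```sql
-- Build the replacement matview alongside the old one.
CREATE MATERIALIZED VIEW y AS SELECT id, val FROM source_table;

BEGIN;
-- Repoint the dependent view at the new matview first, so x no
-- longer has dependents and can be dropped without CASCADE.
CREATE OR REPLACE VIEW v AS SELECT id, val FROM y;
ALTER MATERIALIZED VIEW x RENAME TO x_old;
ALTER MATERIALIZED VIEW y RENAME TO x;
DROP MATERIALIZED VIEW x_old;
COMMIT;
```

Note that dependencies in Postgres track object identity rather than names, so v keeps following the new matview through the rename.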
Re: [PERFORM] partitioning materialized views
> I don't think the downstream dependencies will let that work without
> rebuilding them as well. The drop fails (without a cascade), and the other
> views and matviews that are built off of this all simply point to x_old.

Wow, ouch. Yeah, I'd neglected to consider dependent objects. Your only "out" at this point is to either add or utilize a "modified_date" column of some kind, so you can maintain a different MV with some recent window of data, and regularly merge that into a physical local copy (not an MV), sort of like a running ETL. Though that won't help with deletes, unfortunately.

--
Shaun M Thomas - 2ndQuadrant
PostgreSQL Training, Services and Support
shaun.tho...@2ndquadrant.com | www.2ndQuadrant.com
Re: [PERFORM] partitioning materialized views
>> If you _can't_ do that due to cloud restrictions, you'd actually be
>> better off doing an atomic swap.
>>
>> CREATE MATERIALIZED VIEW y AS ...;
>>
>> BEGIN;
>> ALTER MATERIALIZED VIEW x RENAME TO x_old;
>> ALTER MATERIALIZED VIEW y RENAME TO x;
>> DROP MATERIALIZED VIEW x_old;
>> COMMIT;
>
> This is an interesting idea. Thanks! I'll ponder that one.

I don't think the downstream dependencies will let that work without rebuilding them as well. The drop fails (without a cascade), and the other views and matviews that are built off of this all simply point to x_old.
Re: [PERFORM] partitioning materialized views
On Thu, Jul 6, 2017 at 11:25 AM, Shaun Thomas wrote:
>> I'm curious if I'm overlooking other possible architectures or tools
>> that might make this simpler to manage.
>
> One of the issues with materialized views is that they are based on
> views... For a concurrent update, it essentially performs a looped
> merge, which can be pretty ugly. That's the price you pay to be
> non-blocking. For this particular setup, I'd actually recommend using
> something like pglogical to just maintain a live copy of the remote
> table, or wait for Postgres 10's logical replication.

Unfortunately the foreign database is Hadoop. (As A Service)

> If you _can't_ do that due to cloud restrictions, you'd actually be
> better off doing an atomic swap.
>
> CREATE MATERIALIZED VIEW y AS ...;
>
> BEGIN;
> ALTER MATERIALIZED VIEW x RENAME TO x_old;
> ALTER MATERIALIZED VIEW y RENAME TO x;
> DROP MATERIALIZED VIEW x_old;
> COMMIT;

This is an interesting idea. Thanks! I'll ponder that one.

> You could still follow your partitioned plan if you don't want to
> update all of the data at once. Let's face it, 3-4 hours is still a
> ton of data transfer and calculation.

Yup.
Re: [PERFORM] partitioning materialized views
> I'm curious if I'm overlooking other possible architectures or tools that
> might make this simpler to manage.

One of the issues with materialized views is that they are based on views... For a concurrent update, it essentially performs a looped merge, which can be pretty ugly. That's the price you pay to be non-blocking. For this particular setup, I'd actually recommend using something like pglogical to just maintain a live copy of the remote table, or wait for Postgres 10's logical replication. If you _can't_ do that due to cloud restrictions, you'd actually be better off doing an atomic swap:

CREATE MATERIALIZED VIEW y AS ...;

BEGIN;
ALTER MATERIALIZED VIEW x RENAME TO x_old;
ALTER MATERIALIZED VIEW y RENAME TO x;
DROP MATERIALIZED VIEW x_old;
COMMIT;

You could still follow your partitioned plan if you don't want to update all of the data at once. Let's face it, 3-4 hours is still a ton of data transfer and calculation.

--
Shaun M Thomas - 2ndQuadrant
PostgreSQL Training, Services and Support
shaun.tho...@2ndquadrant.com | www.2ndQuadrant.com
[PERFORM] partitioning materialized views
I'm pondering approaches to partitioning large materialized views and was hoping for some feedback and thoughts on it from the [perform] minds.

PostgreSQL 9.6.3 on Ubuntu 16.04 in the Google Cloud.

I have a foreign table with 250M or so rows and 50 or so columns, with a UUID as the primary key. Queries to the foreign table have high latency (from several minutes to more than an hour to run). If I create a materialized view of this FT, including indexes, it takes about 3-4 hours. If I refresh the materialized view concurrently, it takes 4-5 DAYS.

When I run "refresh materialized view concurrently", it takes about an hour to download the 250M rows and load them onto the SSD tempspace. At that point we flatline a single core, run I/O on the main tablespace up pretty high, and stay that way until the refresh is complete.

In order to speed up the concurrent refreshes, I have the data broken into 4 materialized views, manually partitioned (by date), with a "union all view" in front of them. Refreshing the data which is changing regularly (new data, in one of the partitions) doesn't require refreshing the entire data set. This works fairly well, and I can refresh the most recent partition in 1-2 hours (daily).

However, sometimes I have to reach back in time and refresh the deeper partitions. This is taking 3 or more days to complete, even with the data broken into 4 materialized views. This approach lets me refresh all of the partitions at the same time, which uses more cores at the same time (and more tempspace). [I'd like to use as much of my system resources as possible to get the refresh to finish faster.]

Unfortunately I am finding I need to refresh the deeper data more and more often (at least once per week), and my table growth is going to jump from adding 3-5M rows per day to adding 10-20M rows per day over the next month or two. Waiting 3 or 4 days for the deeper data to be ready for consumption in PostgreSQL is no longer acceptable to the business.
It doesn't look like partman supports partitioning materialized views. It also doesn't look like PG 10's new partitioning features will work with materialized views (although I haven't tried it yet). Citus DB also seems to be unable to help in this scenario.

I could create new materialized views every time I need new data, and then swap out the view that is in front of them. There are other objects in the database which have dependencies on that view. In my experiments so far, CREATE OR REPLACE VIEW seems to let me get away with this as long as the columns don't change.

Alternatively, I could come up with a new partitioning scheme that lets me more selectively run "refresh concurrently", and run more of those at the same time. I was leaning towards this latter solution.

Suppose I make a separate materialized view for each month of data. At the beginning of each month I would have to make a new materialized view and then add it into the "union all view" on the fly. I would then need a "refresh all" script which refreshes as many of them concurrently as I am willing to dedicate cores to. And I need some handy ways to selectively refresh specific months when I know data for a particular month or set of months has changed.

So, I actually have 2 of these 250M row tables in the foreign database that I want to do this with. And maybe more coming soon?

I'm curious if I'm overlooking other possible architectures or tools that might make this simpler to manage.

Similarly, could I construct the "union all view" in front of the partitions to be partition aware, so that the query planner doesn't try to look in every one of the materialized views behind it to find the rows I want? If I go with the monthly partitions, I'll start with about 36 materialized views behind the main view.
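A sketch of the monthly scheme described above. All table and column names here are invented for illustration; the thread does not show the real schema. Each matview needs a unique index so that REFRESH ... CONCURRENTLY is allowed:

```sql
-- One matview per month of the foreign table's data.
CREATE MATERIALIZED VIEW events_2017_06 AS
    SELECT id, event_time, payload
    FROM remote_events
    WHERE event_time >= '2017-06-01' AND event_time < '2017-07-01';
CREATE UNIQUE INDEX ON events_2017_06 (id);

CREATE MATERIALIZED VIEW events_2017_07 AS
    SELECT id, event_time, payload
    FROM remote_events
    WHERE event_time >= '2017-07-01' AND event_time < '2017-08-01';
CREATE UNIQUE INDEX ON events_2017_07 (id);

-- The "union all view" in front of the partitions; recreated each
-- month when a new partition is added.
CREATE OR REPLACE VIEW events AS
    SELECT * FROM events_2017_06
    UNION ALL
    SELECT * FROM events_2017_07;

-- Selective, non-blocking refresh of just a month whose data changed:
REFRESH MATERIALIZED VIEW CONCURRENTLY events_2017_07;
```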
Re: [PERFORM] Partitioning and performance
On 5/28/15 9:31 AM, Ravi Krishna wrote:
> explain select count(*) from tstesting.account where account_row_inst = 101;
>  Aggregate  (cost=8.16..8.17 rows=1 width=0)
>    ->  Index Only Scan using account_pkey on account  (cost=0.14..8.16 rows=1 width=0)
>          Index Cond: (account_row_inst = 101)

EXPLAIN only shows what the planner thinks a query will cost. For any real testing, you need EXPLAIN ANALYZE.

Also, understand that partitioning isn't a magic bullet. It can make some operations drastically faster, but it's not going to help every scenario, and will actually slow some other operations down.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com
[PERFORM] Partitioning and performance
I am testing partitioning of a large table. I am doing range partitioning based on a sequence column, which also acts as the primary key. For inserts I am using a trigger which redirects each insert to the right table based on the value of the primary key.

Based on my testing, I see that the insert speed is less than 10% different from a non-partitioned table. I am using SET constraint_exclusion = on, and I checked via ANALYZE that the planner does not consider non-qualifying child tables. Yet selects and updates based on the primary key show anywhere from 40 to 200% slowdown compared to the non-partitioned table.

One thing I notice is that, even with partition pruning, the planner scans the base table as well as the table matching the condition. Is that the additional overhead? I am attaching below the output of analyze.

===

On a non-partitioned table:

explain select count(*) from tstesting.account where account_row_inst = 101;
 Aggregate  (cost=8.16..8.17 rows=1 width=0)
   ->  Index Only Scan using account_pkey on account  (cost=0.14..8.16 rows=1 width=0)
         Index Cond: (account_row_inst = 101)
(3 rows)

With partition pruning:

 Aggregate  (cost=8.45..8.46 rows=1 width=0)
   ->  Append  (cost=0.00..8.44 rows=2 width=0)
         ->  Seq Scan on account  (cost=0.00..0.00 rows=1 width=0)
               Filter: (account_row_inst = 101)
         ->  Index Only Scan using account_part1_pkey on account_part1  (cost=0.42..8.44 rows=1 width=0)
               Index Cond: (account_row_inst = 101)
(6 rows)

On a partitioned table, with no partition pruning:
explain analyze select count(*) from tstesting.account where account_row_inst = 101;
 Aggregate  (cost=29.77..29.78 rows=1 width=0) (actual time=0.032..0.032 rows=1 loops=1)
   ->  Append  (cost=0.00..29.76 rows=5 width=0) (actual time=0.029..0.029 rows=0 loops=1)
         ->  Seq Scan on account  (cost=0.00..0.00 rows=1 width=0) (actual time=0.000..0.000 rows=0 loops=1)
               Filter: (account_row_inst = 101)
         ->  Index Only Scan using account_part1_pkey on account_part1  (cost=0.42..4.44 rows=1 width=0) (actual time=0.008..0.008 rows=0 loops=1)
               Index Cond: (account_row_inst = 101)
               Heap Fetches: 0
         ->  Index Only Scan using account_part2_pkey on account_part2  (cost=0.42..8.44 rows=1 width=0) (actual time=0.007..0.007 rows=0 loops=1)
               Index Cond: (account_row_inst = 101)
               Heap Fetches: 0
         ->  Index Only Scan using account_part3_pkey on account_part3  (cost=0.42..8.44 rows=1 width=0) (actual time=0.007..0.007 rows=0 loops=1)
               Index Cond: (account_row_inst = 101)
               Heap Fetches: 0
         ->  Index Only Scan using account_part4_pkey on account_part4  (cost=0.42..8.44 rows=1 width=0) (actual time=0.006..0.006 rows=0 loops=1)
               Index Cond: (account_row_inst = 101)
               Heap Fetches: 0
 Planning time: 0.635 ms
 Execution time: 0.137 ms
(18 rows)
Re: [PERFORM] Partitioning and performance
On Thu, May 28, 2015 at 10:31 AM, Ravi Krishna <sravikrish...@gmail.com> wrote:
> I am testing partitioning of a large table. I am doing a range

Sorry, I forgot to clarify. I am using INHERITS for partitioning, with check constraints built for range partitioning.
[PERFORM] partitioning versus clustering
Postgres Performance Wizards,

I am a few years into developing and operating a system underpinned by Postgres that sees the arrival of a significant number of events around the clock. It is not an epic amount of data, but it is enough to be challenging to me, in particular because system downtime is not very palatable and the data is retained quasi-indefinitely.

I have various tables that house different kinds of events, and in addition to wanting to look at a small number of rows of data, users often want to generate summary reports on large swaths of data that span days or weeks. At present, these reports can trigger index scans that take minutes to service, and the parameters of the reports are user specified, making their pre-generation infeasible.

Generally the rows in these tables are write-once, but they contain a pointer to the related BLOB from which they were constructed, and every now and again some new field in the originating BLOB becomes of interest, causing me to alter the table and then do a sweep of the full table with corresponding updates, violating the otherwise INSERT-only nature.

These event tables generally have an event time column that is indexed and which is an obvious candidate for either partitioning or clustering of the table. I'm trying to make sense of which is the better option for me. As best I can tell, the decision points are as follows...
PARTITIONING

Pros:
* no outage; data just starts flowing into new partitions seamlessly
* allows more control over where the data goes, creating retrieval parallelization opportunities
* clustering cannot be inadvertently undone in a way that requires scheduled downtime to repair
* probably more resilient in the case of the event time being different from the time that I processed the event

Cons:
* does not deal with legacy data without extra migration (over time this becomes less relevant)
* requires some kind of background process to manage partition creation
* partition size will affect performance, and choosing its size is not a science

CLUSTERING

Pros:
* no particularly custom development work on my part
* once done, it puts all existing data in a good state for efficient querying without extra work

Cons:
* will lock up the system for the duration of the CLUSTER command
* somehow need to make sure that ANALYZE commands run often enough
* does not give me much control of the underlying storage layout
* may have problems when the occasional mass-UPDATE is done
* unclear whether a VACUUM FULL is required to prevent subsequent un-clustered-ness stemming from the mass-UPDATE operations, despite having a fill factor of 100
* could generate a huge number of WAL segments to archive
* could possibly be sabotaged by the event time property not being well correlated with the time that the event is processed, when upstream systems have momentary issues

As far as questions to the group go:

* Is my understanding of the pros and cons of the options reasonably correct and comprehensive?
* What has governed your decisions in making such a choice on past projects of your own?
* If I go the clustering route, will the occasional mass update really mess with things, requiring a re-cluster and possibly even a full vacuum (to prevent re-un-clustering)?
* Might it make more sense to cluster when the event time property is the time that I processed the event, but partition when it is the time that the event occurred in some other system?
* Is running a CLUSTER command actually necessary to get the performance benefits if the table ought already to be in a good order, or is a well ordered table by itself enough to get query execution to yield nice sequential access to the disk?

Many thanks in advance for your insights...

--
AWG
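For concreteness, the clustering route discussed above comes down to something like the following (table and index names are illustrative, not from the post):

```sql
-- Physically reorder the table by event time. This takes an
-- ACCESS EXCLUSIVE lock for the duration -- the downtime cost
-- listed under Cons above. The chosen index is remembered, so
-- later runs can be plain "CLUSTER events;".
CLUSTER events USING events_event_time_idx;

-- CLUSTER does not refresh planner statistics (including the
-- ordering correlation), so re-analyze afterwards.
ANALYZE events;
```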
Re: [PERFORM] partitioning performance question
On Sat, Jun 9, 2012 at 7:58 PM, Kevin Kempter <cs_...@consistentstate.com> wrote:
> Hi All;
> We have a client that has a table where large blobs (bytea) are stored.
> The table has a key column that holds numbers (like 112362), but
> unfortunately it's a varchar column, so the blobs are accessed via
> queries like:
>
> select * from bigtable where keycol = '217765'
>
> The primary reason we want to partition the table is for maintenance; the
> table is well over 1.2 terabytes in size and they have never successfully
> vacuumed it. However I don't want to make performance even worse. The
> table does have a serial key. I'm thinking the best options will be to
> partition by range on the serial key, or maybe via the keycol character
> column by using an IN statement in the check constraints, thus allowing
> the planner to actually leverage the above SQL. I suspect doing a typecast
> to integer in the check constraints will prove to be a bad idea if the
> keycol column remains a varchar.
>
> Thoughts?
>
> Here's the table:
>
>              Table "problemchild"
>     Column    |           Type           | Modifiers
> --------------+--------------------------+-----------
>  keycol       | character varying        |
>  blob_data    | bytea                    |
>  removed_date | timestamp with time zone |
>  alt_key      | bigint                   | not null default nextval('problemchild_alt_key_seq'::regclass)
> Indexes:
>  "pc_pkey" PRIMARY KEY, btree (alt_key)
>  "key2" btree (keycol)

I find it odd that you have a column keycol which is not the PK, and your PK column is named alt_key. Is keycol always the character representation of alt_key? Are they unrelated?

It would also help to know how the data in this table changes. Do you only ever add data? Is some data removed from time to time (maybe based on the removed_date)?

If the table grows continually then range partitioning sounds good. However, I think you should really make keycol a number type, because otherwise range partitioning will be a pain (you would need to include the length of the string in the criterion if you want your varchar ranges to mimic number ranges).
However, if you are deleting from time to time, and hence the table does not grow in the long run, then hash partitioning might be a better idea, because then you do not need to create new partitions all the time. Example on alt_key:

create table problemchild (
  keycol varchar(100),
  blob_data bytea,
  removed_date timestamp with time zone,
  alt_key bigint primary key
);
create table problemchild_00 ( check ( alt_key % 16 = 0 ) ) inherits (problemchild);
create table problemchild_01 ( check ( alt_key % 16 = 1 ) ) inherits (problemchild);
create table problemchild_02 ( check ( alt_key % 16 = 2 ) ) inherits (problemchild);
...

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
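To make inserts through the parent land in the right child, this scheme would also need a routing trigger along these lines. This is a sketch, not part of the original mail; the CASE arms mirror the CHECK constraints above:

```sql
CREATE OR REPLACE FUNCTION problemchild_insert() RETURNS trigger AS $$
BEGIN
    -- Route by the same expression used in the CHECK constraints.
    CASE NEW.alt_key % 16
        WHEN 0 THEN INSERT INTO problemchild_00 VALUES (NEW.*);
        WHEN 1 THEN INSERT INTO problemchild_01 VALUES (NEW.*);
        WHEN 2 THEN INSERT INTO problemchild_02 VALUES (NEW.*);
        -- one arm per partition, 16 in total
        ELSE RAISE EXCEPTION 'no partition for alt_key %', NEW.alt_key;
    END CASE;
    RETURN NULL;  -- the row is already stored in the child table
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER problemchild_route
    BEFORE INSERT ON problemchild
    FOR EACH ROW EXECUTE PROCEDURE problemchild_insert();
```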
[PERFORM] partitioning performance question
Hi All;

We have a client that has a table where large blobs (bytea) are stored. The table has a key column that holds numbers (like 112362), but unfortunately it's a varchar column, so the blobs are accessed via queries like:

select * from bigtable where keycol = '217765'

The primary reason we want to partition the table is for maintenance; the table is well over 1.2 terabytes in size and they have never successfully vacuumed it. However I don't want to make performance even worse. The table does have a serial key. I'm thinking the best options will be to partition by range on the serial key, or maybe via the keycol character column by using an IN statement in the check constraints, thus allowing the planner to actually leverage the above SQL. I suspect doing a typecast to integer in the check constraints will prove to be a bad idea if the keycol column remains a varchar.

Thoughts?

Here's the table:

             Table "problemchild"
    Column    |           Type           | Modifiers
--------------+--------------------------+-----------
 keycol       | character varying        |
 blob_data    | bytea                    |
 removed_date | timestamp with time zone |
 alt_key      | bigint                   | not null default nextval('problemchild_alt_key_seq'::regclass)
Indexes:
 "pc_pkey" PRIMARY KEY, btree (alt_key)
 "key2" btree (keycol)

Thanks in advance
[PERFORM] Partitioning / Strange optimizer behaviour
We have an optimizer problem regarding partitioned tables on 8.4.11.

We started partitioning a large table containing approx. 1 billion records. So far, there is only the master table, called edifactmsgpart (which is empty), and 1 partition, called edifactmsgpart_pact. There is a bigint column called emg_id with a btree index on it.

\d edifactmsgpart_pact
...
 "emp_emg_ept_i_pact" btree (emg_id, ept_id)
...

gdw=> select relname, reltuples from pg_class where relname in( 'edifactmsgpart', 'edifactmsgpart_pact' );
       relname       |  reltuples
---------------------+-------------
 edifactmsgpart_pact | 1.03102e+09
 edifactmsgpart      |           0

A select on the big partition yields a decent plan and performs as expected, lasting only a fraction of a second.

gdw=> explain select min( emg_id ) from edifactmsgpart_pact;
 Result  (cost=2.05..2.06 rows=1 width=0)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.00..2.05 rows=1 width=8)
           ->  Index Scan using emp_emg_ept_i_pact on edifactmsgpart_pact  (cost=0.00..2109171123.79 rows=1031020672 width=8)
                 Filter: (emg_id IS NOT NULL)

gdw=> select min( emg_id ) from edifactmsgpart_pact;
  min
-------
 58178

=> very fast.

A select on the partitioned table, however, yields a... shall we call it sub-optimal plan:

gdw=> explain select min( emg_id ) from edifactmsgpart;
 Aggregate  (cost=23521692.03..23521692.04 rows=1 width=8)
   ->  Append  (cost=0.00..20944139.42 rows=1031021042 width=8)
         ->  Seq Scan on edifactmsgpart  (cost=0.00..13.70 rows=370 width=8)
         ->  Seq Scan on edifactmsgpart_pact edifactmsgpart  (cost=0.00..20944125.72 rows=1031020672 width=8)

I would expect this to run half an hour or so, completely overloading the server...

Any ideas?

Kind regards
Marc
Re: [PERFORM] Partitioning / Strange optimizer behaviour
On 5 Březen 2012, 16:11, Marc Schablewski wrote:
> We have an optimizer problem regarding partitioned tables on 8.4.11.
> ...
> gdw=> explain select min( emg_id ) from edifactmsgpart;
>  Aggregate  (cost=23521692.03..23521692.04 rows=1 width=8)
>    ->  Append  (cost=0.00..20944139.42 rows=1031021042 width=8)
>          ->  Seq Scan on edifactmsgpart  (cost=0.00..13.70 rows=370 width=8)
>          ->  Seq Scan on edifactmsgpart_pact edifactmsgpart  (cost=0.00..20944125.72 rows=1031020672 width=8)
>
> I would expect this to run half an hour or so, completely overloading
> the server...
>
> Any Ideas?

This is a well known feature of pre-9.1 releases - it simply does not handle min/max on partitioned tables well. There's even an example of a workaround on the wiki:

https://wiki.postgresql.org/wiki/Efficient_min/max_over_partitioned_table

Another option is to upgrade to 9.1, which handles this fine.

Tomas
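The workaround on that wiki page amounts to computing min() per partition, so each child can use its own index, and then combining the per-partition results. For the two tables in this thread it would look something like this (a sketch, not a tested query against that database):

```sql
-- Pre-9.1: take min() over each table separately so the child can
-- use the btree index on emg_id, then take the min of the results.
SELECT min(emg_id) FROM (
    SELECT min(emg_id) AS emg_id FROM ONLY edifactmsgpart
    UNION ALL
    SELECT min(emg_id) FROM edifactmsgpart_pact
) AS per_partition;
```

The empty master contributes a NULL, which min() ignores, so the result matches the naive query.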
Re: [PERFORM] Partitioning / Strange optimizer behaviour
Thanks for pointing me to that article. I totally forgot that the postgres wiki existed. Updating is not an option at the moment, but we'll probably do so in the future. Until then I can live with the workaround. Kind regards, Marc
Re: [PERFORM] Partitioning by status?
On 13-01-2012 17:05, Josh Berkus wrote:
> On 1/13/12 2:44 AM, alexandre - aldeia digital wrote:
>>> Also, (2) only really works if you're going to obsolesce (remove)
>>> archive records after a certain period of time. Otherwise the
>>> sub-partitioning hurts performance.
>>
>> Is there any move to include easy table partitioning in the 9.2
>> version?
>
> Nobody has been submitting patches.

I'm sorry to hear this. Table partitioning is a very good helper in a large number of performance issues. If there was a bounty to help anyone make this happen, I would be a happy contributor. :)
Re: [PERFORM] Partitioning by status?
> Also, (2) only really works if you're going to obsolesce (remove)
> archive records after a certain period of time. Otherwise the
> sub-partitioning hurts performance.

Is there any move to include easy table partitioning in the 9.2 version?
Re: [PERFORM] Partitioning by status?
On 1/13/12 2:44 AM, alexandre - aldeia digital wrote:
>> Also, (2) only really works if you're going to obsolesce (remove)
>> archive records after a certain period of time. Otherwise the
>> sub-partitioning hurts performance.
>
> Is there any move to include easy table partitioning in the 9.2
> version?

Nobody has been submitting patches.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [PERFORM] Partitioning by status?
Mike,

> Is it practical to partition on the status column and, eg, use triggers
> to move a row between the two partitions when status is updated? Any
> surprises to watch for, given the status column is actually NULL for
> active data and contains a value when archived?

When I've done this before, I've had a setup like the following:

1. One active partition
2. Multiple archive partitions, also partitioned by time (month or year)
3. A stored procedure for archiving a record or records

I'd recommend against triggers because they'll be extremely inefficient if you need to archive a large number of rows at once.

Also, (2) only really works if you're going to obsolesce (remove) archive records after a certain period of time. Otherwise the sub-partitioning hurts performance.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
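A sketch of the stored-procedure approach from point 3. The table names, the created_at column, and the batch criterion are all invented for illustration; the point is that a set-based move is far cheaper than per-row triggers:

```sql
-- Move a batch of rows from the active partition to the archive
-- partition in one statement, returning how many rows moved.
CREATE OR REPLACE FUNCTION archive_records(before timestamptz)
RETURNS bigint AS $$
DECLARE
    moved bigint;
BEGIN
    WITH moved_rows AS (
        DELETE FROM records_active
        WHERE created_at < before
        RETURNING *
    )
    INSERT INTO records_archive SELECT * FROM moved_rows;
    GET DIAGNOSTICS moved = ROW_COUNT;
    RETURN moved;
END;
$$ LANGUAGE plpgsql;

-- e.g. archive everything older than 90 days:
-- SELECT archive_records(now() - interval '90 days');
```

Note the writable-CTE form (DELETE ... RETURNING inside WITH) requires 9.1 or later.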
[PERFORM] Partitioning by status?
We have a set of large tables. One of the columns is a status indicator (active / archived). The queries against these tables almost always include the status, so partitioning on that seems to make sense from a logical standpoint, especially given most of the data is archived and most of the processes want active records.

Is it practical to partition on the status column and, eg, use triggers to move a row between the two partitions when status is updated? Any surprises to watch for, given the status column is actually NULL for active data and contains a value when archived?

Mike
Re: [PERFORM] Partitioning by status?
Mike Blackwell mike.blackw...@rrd.com wrote: We have a set of large tables. One of the columns is a status indicator (active / archived). The queries against these tables almost always include the status, so partitioning on that seems to make sense from a logical standpoint, especially given most of the data is archived and most of the processes want active records. Is it practical to partition on the status column and, eg, use triggers to move a row between the two partitions when status is updated? Any surprises to watch for, given the status column is actually NULL for active data and contains a value when archived? If I were you, I would try a partial index where status is null. But yes, partitioning is another option; it depends on your workload. Andreas -- Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect. (Linus Torvalds) If I was god, I would recompile penguin with --enable-fly. (unknown) Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°
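Andreas's partial-index suggestion is a one-liner; the table and indexed column here are placeholders assumed from Mike's description:

```sql
-- A partial index covering only the active (status IS NULL) rows,
-- so it stays small even when most of the table is archived.
CREATE INDEX big_table_active_idx ON big_table (id) WHERE status IS NULL;

-- Queries must include a predicate the planner can prove implies
-- the index predicate, e.g. repeat it verbatim:
SELECT * FROM big_table WHERE status IS NULL AND id = 42;
```

This gets much of the benefit Mike wants (fast access to the small active set) without moving rows between partitions at all.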
Re: [PERFORM] partitioning question 1
-Original Message- From: Ben [mailto:midfi...@gmail.com] Sent: Thursday, October 28, 2010 12:37 PM To: pgsql-performance@postgresql.org Subject: partitioning question 1 hello -- my last email was apparently too long to respond to so i'll split it up into shorter pieces. my first question : my understanding of how range partitioning and constraint exclusion works leads me to believe that it does not buy any query performance that a clustered index doesn't already give you -- the advantages are all in maintainability. an index is able to eliminate pages just as well as constraint exclusion is able to eliminate table partitions. the I/O advantages of having queries target small subtables are the same as the I/O advantages of clustering the index : result pages in a small range are very close to each other on disk. finally, since constraint exclusion isn't as flexible as indexing (i've seen old mailing list posts that say that constraint exclusion only works with static constants in where clauses, and only works with simple operators like <, <=, =, >=, >, which basically forces btree indexes when i want to use gist) it is indeed likely that partitioning can be slower than one big table with a clustered index. is my intuition completely off on this? best regards, ben

If your SELECT retrieves a substantial amount of records, a table scan could be more efficient than index access. Now, if while retrieving a large amount of records the WHERE clause of this SELECT still satisfies the constraints on some partition(s), then obviously one (or a few) partition scans will be more efficient than a full table scan of the non-partitioned table. So, yes, partitioning provides performance improvements, not only maintenance convenience. Regards, Igor Neyman
Re: [PERFORM] partitioning question 1
On Oct 29, 2010, at 7:38 AM, Igor Neyman wrote: is my intuition completely off on this? best regards, ben If your SELECT retrieves substantial amount of records, table scan could be more efficient than index access. Now, if while retrieving large amount of records WHERE clause of this SELECT still satisfies constraints on some partition(s), then obviously one (or few) partition scans will be more efficient than full table scan of non-partitioned table. So, yes partitioning provides performance improvements, not only maintenance convenience. my impression was that a *clustered* index would give a lot of the same I/O benefits, in a more flexible way. if you're clustered on the column in question, then an index scan for a range is much like a sequential scan over a partition (as far as i understand.) b -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] partitioning question 1
-Original Message- From: Ben [mailto:midfi...@gmail.com] Sent: Friday, October 29, 2010 12:16 PM To: Igor Neyman Cc: pgsql-performance@postgresql.org Subject: Re: partitioning question 1 On Oct 29, 2010, at 7:38 AM, Igor Neyman wrote: is my intuition completely off on this? best regards, ben If your SELECT retrieves substantial amount of records, table scan could be more efficient than index access. Now, if while retrieving large amount of records WHERE clause of this SELECT still satisfies constraints on some partition(s), then obviously one (or few) partition scans will be more efficient than full table scan of non-partitioned table. So, yes partitioning provides performance improvements, not only maintenance convenience. my impression was that a *clustered* index would give a lot of the same I/O benefits, in a more flexible way. if you're clustered on the column in question, then an index scan for a range is much like a sequential scan over a partition (as far as i understand.) b Even with clustered index you still read index+table, which is more expensive than just table scan (in situation I described above). PG clustered index is not the same as SQL Server clustered index (which includes actual table pages on the leaf level). Igor Neyman -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] partitioning question 1
hello -- my last email was apparently too long to respond to so i'll split it up into shorter pieces. my first question : my understanding of how range partitioning and constraint exclusion works leads me to believe that it does not buy any query performance that a clustered index doesn't already give you -- the advantages are all in maintainability. an index is able to eliminate pages just as well as constraint exclusion is able to eliminate table partitions. the I/O advantages of having queries target small subtables are the same as the I/O advantages of clustering the index : result pages in a small range are very close to each other on disk. finally, since constraint exclusion isn't as flexible as indexing (i've seen old mailing list posts that say that constraint exclusion only works with static constants in where clauses, and only works with simple operators like <, <=, =, >=, >, which basically forces btree indexes when i want to use gist) it is indeed likely that partitioning can be slower than one big table with a clustered index. is my intuition completely off on this? best regards, ben
Re: [PERFORM] partitioning question 1
On Thu, 2010-10-28 at 09:36 -0700, Ben wrote: hello -- my last email was apparently too long to respond to so i'll split it up into shorter pieces. my first question : my understanding of how range partitioning and constraint exclusion works leads me to believe that it does not buy any query performance that a clustered index doesn't already give you -- the advantages are all in maintainability. an index is able to eliminate pages just as well as constraint exclusion is able to eliminate table partitions. the I/O advantages of having queries target small subtables are the same as the I/O advantages of clustering the index : result pages in a small range are very close to each other on disk. Not entirely true. For one, a clustered index will not stay clustered if you are still updating data that is in the partition. You shouldn't underestimate the benefit of smaller relations in terms of maintenance either. finally, since constraint exclusion isn't as flexible as indexing (i've seen old mailing list posts that say that constraint exclusion only works with static constants in where clauses, and only works with simple operators like <, <=, =, >=, >, which basically forces btree indexes when i want to use gist) it is indeed likely that partitioning can be slower than one big table with a clustered index. Yes, the constraints have to be static. Not sure about the operator question honestly. is my intuition completely off on this? You may actually want to look into expression indexes, not clustered ones. Sincerely, Joshua D. Drake -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
Re: [PERFORM] partitioning question 1
thanks for the prompt response. some comments / questions below : On Oct 28, 2010, at 10:31 AM, Joshua D. Drake wrote: ...constraint exclusion is able to eliminate table partitions. the I/O advantages of having queries target small subtables are the same as the I/O advantages of clustering the index : result pages in a small range are very close to each other on disk. Not entirely true. For one, a clustered index will not stay clustered if you are still updating data that is in the partition. You shouldn't underestimate the benefit of smaller relations in terms of maintenance either. in my situation, the updates come in order (it is timeseries data and the clustered index is on time.) so the table should remain relatively clustered. updates also happen relatively infrequently (once a day in one batch.) so it appears that we will continue to get the I/O benefits described above. are there any other benefits which partitioning provides for query performance (as opposed to update performance) besides the ones which i have mentioned? Yes, the constraints have to be static. Not sure about the operator question honestly. this seems to severely restrict their usefulness -- our queries are data warehouse analytical -type queries, so the constraints are usually data-driven (come from joining against other tables.) is my intuition completely off on this? You may actually want to look into expression indexes, not clustered ones. what would expression indexes give me? thanks and best regards, ben
Re: [PERFORM] partitioning question 1
On Thu, 2010-10-28 at 11:44 -0700, Ben wrote: Yes, the constraints have to be static. Not sure about the operator question honestly. this seems to severely restrict their usefulness -- our queries are data warehouse analytical -type queries, so the constraints are usually data-driven (come from joining against other tables.) Well it does and it doesn't. Keep in mind that the constraint can be: date >= '2010-10-01' and date <= '2010-10-31' What it can't be is something that contains date_part() or extract() (as an example) is my intuition completely off on this? You may actually want to look into expression indexes, not clustered ones. Take a look at the docs: http://www.postgresql.org/docs/8.4/interactive/indexes-expressional.html It could be considered partitioning without breaking up the table, just the indexes. Sincerely, Joshua D. Drake -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
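To illustrate Joshua's point, here is a sketch (hypothetical table and column names) of a WHERE clause that constraint exclusion can use versus one it cannot:

```sql
-- Partition constraint built from plain comparisons against constants.
CREATE TABLE logs (d date, msg text);
CREATE TABLE logs_2010_10
  ( CHECK (d >= DATE '2010-10-01' AND d <= DATE '2010-10-31') )
  INHERITS (logs);

-- Excludable (with constraint_exclusion on): a simple comparison
-- against a constant, provable from the CHECK constraint.
SELECT * FROM logs WHERE d = DATE '2010-10-15';

-- Not excludable: the planner will not reason through date_part()
-- or extract(), so every partition gets scanned.
SELECT * FROM logs WHERE date_part('month', d) = 10;
```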
Re: [PERFORM] partitioning question 1
On Oct 28, 2010, at 11:50 AM, Joshua D. Drake wrote: Yes, the constraints have to be static. Not sure about the operator question honestly. this seems to severely restrict their usefulness -- our queries are data warehouse analytical -type queries, so the constraints are usually data-driven (come from joining against other tables.) Well it does and it doesn't. Keep in mind that the constraint can be: date >= '2010-10-01' and date <= '2010-10-31' What it can't be is something that contains date_part() or extract() (as an example) i think we are talking about two different things here: the constraints on the table, and the where-clause constraints in a query which may or may not trigger constraint exclusion. i understand that table constraints have to be constants -- it doesn't make much sense otherwise. what i am wondering about is, will constraint exclusion be triggered for queries where the column that is being partitioned on is being constrained by things that are not static constants, for instance, in a join. (i'm pretty sure the answer is no, because i think constraint exclusion happens before real query planning.) a concrete example : create table foo (i integer not null, j float not null); create table foo_1 (check ( i >= 0 and i < 10 ) ) inherits (foo); create table foo_2 (check ( i >= 10 and i < 20 ) ) inherits (foo); create table foo_3 (check ( i >= 20 and i < 30 ) ) inherits (foo); etc.. create table bar (i integer not null, k float not null); my understanding is that a query like select * from foo join bar using (i); can't use constraint exclusion, even if the histogram of i-values on table bar says they only live in the range 0-9, and so the query will touch all of the tables. i think this is not favorable compared to a single foo table with a well-maintained btree index on i. is my intuition completely off on this? You may actually want to look into expression indexes, not clustered ones. 
Take a look at the docs: http://www.postgresql.org/docs/8.4/interactive/indexes-expressional.html It could be considered partitioning without breaking up the table, just the indexes. do you mean partial indexes? i have to confess to not understanding how this is relevant -- how could partial indexes give any advantage over a full clustered index? b
Re: [PERFORM] partitioning question 1
On Thu, 2010-10-28 at 12:25 -0700, Ben wrote: i think we are talking about two different things here: the constraints on the table, and the where-clause constraints in a query which may or may not trigger constraint exclusion. i understand that table constraints have to be constants -- it doesn't make much sense otherwise. what i am wondering about is, will constraint exclusion be triggered for queries where the column that is being partitioned on is being constrained by things that are not static constants, for instance, in a join. (i'm pretty sure the answer is no, because i think constraint exclusion happens before real query planning.) a concrete example : create table foo (i integer not null, j float not null); create table foo_1 (check ( i >= 0 and i < 10 ) ) inherits (foo); create table foo_2 (check ( i >= 10 and i < 20 ) ) inherits (foo); create table foo_3 (check ( i >= 20 and i < 30 ) ) inherits (foo); etc.. create table bar (i integer not null, k float not null); my understanding is that a query like select * from foo join bar using (i); can't use constraint exclusion, even if the histogram of i-values on table bar says they only live in the range 0-9, and so the query will touch all of the tables. i think this is not favorable compared to a single foo table with a well-maintained btree index on i. 
My tests show you are incorrect:

part_test=# explain analyze select * from foo join bar using (i) where i=9;
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Nested Loop  (cost=34.26..106.76 rows=200 width=20) (actual time=0.004..0.004 rows=0 loops=1)
   ->  Append  (cost=0.00..68.50 rows=20 width=12) (actual time=0.004..0.004 rows=0 loops=1)
         ->  Seq Scan on foo  (cost=0.00..34.25 rows=10 width=12) (actual time=0.001..0.001 rows=0 loops=1)
               Filter: (i = 9)
         ->  Seq Scan on foo_1 foo  (cost=0.00..34.25 rows=10 width=12) (actual time=0.000..0.000 rows=0 loops=1)
               Filter: (i = 9)
   ->  Materialize  (cost=34.26..34.36 rows=10 width=12) (never executed)
         ->  Seq Scan on bar  (cost=0.00..34.25 rows=10 width=12) (never executed)
               Filter: (i = 9)
 Total runtime: 0.032 ms
(10 rows)

-- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
Re: [PERFORM] partitioning question 1
On Oct 28, 2010, at 12:44 PM, Joshua D. Drake wrote: My tests show you are incorrect: part_test=# explain analyze select * from foo join bar using (i) where i=9; QUERY PLAN: Nested Loop (cost=34.26..106.76 rows=200 width=20) (actual time=0.004..0.004 rows=0 loops=1) -> Append (cost=0.00..68.50 rows=20 width=12) (actual time=0.004..0.004 rows=0 loops=1) -> Seq Scan on foo (cost=0.00..34.25 rows=10 width=12) (actual time=0.001..0.001 rows=0 loops=1) Filter: (i = 9) -> Seq Scan on foo_1 foo (cost=0.00..34.25 rows=10 width=12) (actual time=0.000..0.000 rows=0 loops=1) Filter: (i = 9) -> Materialize (cost=34.26..34.36 rows=10 width=12) (never executed) -> Seq Scan on bar (cost=0.00..34.25 rows=10 width=12) (never executed) Filter: (i = 9) Total runtime: 0.032 ms (10 rows)

strange. my tests don't agree with your tests :

create table foo (i integer not null, j float not null);
create table foo_1 ( check (i >= 0 and i < 10) ) inherits (foo);
create table foo_2 ( check (i >= 10 and i < 20) ) inherits (foo);
create table foo_3 ( check (i >= 20 and i < 30) ) inherits (foo);
create index foo_1_idx on foo_1 (i);
create index foo_2_idx on foo_2 (i);
create index foo_3_idx on foo_3 (i);
insert into foo_1 select generate_series, generate_series from generate_series(0,9);
insert into foo_2 select generate_series, generate_series from generate_series(10,19);
insert into foo_3 select generate_series, generate_series from generate_series(20,29);
create table bar (i integer not null, k float not null);
create index bar_idx on bar (i);
insert into bar select generate_series, -generate_series from generate_series(0,9);
vacuum analyze;

explain analyze select * from foo join bar using (i);
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Hash Join  (cost=1.23..42.29 rows=98 width=20) (actual time=0.056..0.118 rows=10 loops=1)
   Hash Cond: (public.foo.i = bar.i)
   ->  Append  (cost=0.00..32.70 rows=1970 width=12) (actual time=0.008..0.043 rows=30 loops=1)
         ->  Seq Scan on foo  (cost=0.00..29.40 rows=1940 width=12) (actual time=0.001..0.001 rows=0 loops=1)
         ->  Seq Scan on foo_1 foo  (cost=0.00..1.10 rows=10 width=12) (actual time=0.005..0.008 rows=10 loops=1)
         ->  Seq Scan on foo_2 foo  (cost=0.00..1.10 rows=10 width=12) (actual time=0.003..0.006 rows=10 loops=1)
         ->  Seq Scan on foo_3 foo  (cost=0.00..1.10 rows=10 width=12) (actual time=0.003..0.006 rows=10 loops=1)
   ->  Hash  (cost=1.10..1.10 rows=10 width=12) (actual time=0.025..0.025 rows=10 loops=1)
         ->  Seq Scan on bar  (cost=0.00..1.10 rows=10 width=12) (actual time=0.005..0.013 rows=10 loops=1)
 Total runtime: 0.205 ms
(10 rows)

i'm running pg 8.4.3 with constraint_exclusion=on (just to be safe.) best, b
Re: [PERFORM] partitioning question 1
On Thu, 2010-10-28 at 12:59 -0700, Ben wrote: explain analyze select * from foo join bar using (i); vs explain analyze select * from foo join bar using (i) where i=9;
Re: [PERFORM] partitioning question 1
On Thu, 2010-10-28 at 12:59 -0700, Ben wrote: On Oct 28, 2010, at 12:44 PM, Joshua D. Drake wrote: My tests show you are incorrect: part_test=# explain analyze select * from foo join bar using (i) where i=9; QUERY PLAN -- Nested Loop (cost=34.26..106.76 rows=200 width=20) (actual time=0.004..0.004 rows=0 loops=1) - Append (cost=0.00..68.50 rows=20 width=12) (actual time=0.004..0.004 rows=0 loops=1) - Seq Scan on foo (cost=0.00..34.25 rows=10 width=12) (actual time=0.001..0.001 rows=0 loops=1) Filter: (i = 9) - Seq Scan on foo_1 foo (cost=0.00..34.25 rows=10 width=12) (actual time=0.000..0.000 rows=0 loops=1) Filter: (i = 9) - Materialize (cost=34.26..34.36 rows=10 width=12) (never executed) - Seq Scan on bar (cost=0.00..34.25 rows=10 width=12) (never executed) Filter: (i = 9) Total runtime: 0.032 ms (10 rows) strange. my tests don't agree with your tests : Do you have constraint_exclusion turned on? You should verify with show constraint_exclusion (I saw what you wrote below). JD P.S. Blatant plug, you coming to http://www.postgresqlconference.org ? -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] partitioning question 1
whoops, didn't see the i=9 (linebreak! linebreak!) nonetheless that is a static constant constraint on the column i, and i was asking if constraint exclusions would work for dynamic constraints (like those derived from a table joined against.) so for example the bar table has only 0-9 in its histogram for i, but constraint exclusion can't use that to eliminate tables foo_2 and foo_3. this is precisely the kind of information an index can use via join selectivity. i am not going to the pg conference, sorry to say. b On Oct 28, 2010, at 1:48 PM, Joshua D. Drake wrote: On Thu, 2010-10-28 at 12:59 -0700, Ben wrote: On Oct 28, 2010, at 12:44 PM, Joshua D. Drake wrote: My tests show you are incorrect: part_test=# explain analyze select * from foo join bar using (i) where i=9; QUERY PLAN -- Nested Loop (cost=34.26..106.76 rows=200 width=20) (actual time=0.004..0.004 rows=0 loops=1) - Append (cost=0.00..68.50 rows=20 width=12) (actual time=0.004..0.004 rows=0 loops=1) - Seq Scan on foo (cost=0.00..34.25 rows=10 width=12) (actual time=0.001..0.001 rows=0 loops=1) Filter: (i = 9) - Seq Scan on foo_1 foo (cost=0.00..34.25 rows=10 width=12) (actual time=0.000..0.000 rows=0 loops=1) Filter: (i = 9) - Materialize (cost=34.26..34.36 rows=10 width=12) (never executed) - Seq Scan on bar (cost=0.00..34.25 rows=10 width=12) (never executed) Filter: (i = 9) Total runtime: 0.032 ms (10 rows) strange. my tests don't agree with your tests : Do you have constraint_exclusion turned on? You should verify with show constraint_exclusion (I saw what you wrote below). JD P.S. Blatant plug, you coming to http://www.postgresqlconference.org ? 
-- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
Re: [PERFORM] partitioning max() sql not using index
Kevin Kempter wrote: Hi all I have a large table (2 billion rows) that's partitioned by date based on an epoch int value. We're running a select max(id) where id is the PK. I have a PK index on each of the partitions, no indexes at all on the base table. If I hit a partition table directly I get an index scan as expected: The planner isn't smart enough to create the plan you're expecting. There was discussion and even a patch posted recently about that: http://archives.postgresql.org/pgsql-hackers/2009-07/msg01115.php It seems the thread petered out, but the concept seems sane. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [PERFORM] partitioning max() sql not using index
In case you aren't comfortable running unreleased planner patches from pgsql-hackers, a workaround was discussed on this list recently: http://archives.postgresql.org/pgsql-performance/2009-09/msg00036.php On Wed, 09 Sep 2009 06:05:22 -0400, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Kevin Kempter wrote: Hi all I have a large table (2 billion rows) that's partitioned by date based on an epoch int value. We're running a select max(id) where id is the PK. I have a PK index on each of the partitions, no indexes at all on the base table. If I hit a partition table directly I get an index scan as expected: The planner isn't smart enough to create the plan you're expecting. There was discussion and even a patch posted recently about that: http://archives.postgresql.org/pgsql-hackers/2009-07/msg01115.php It seems the thread petered out, but the concept seems sane. -- Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
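The workaround discussed in that thread boils down to taking the max per partition, where each partition's PK index can be scanned backward, and combining the handful of candidate values. A sketch, using partition names from Kevin's example (a real query would need one arm per partition):

```sql
-- Each inner max(id) can use the partition's own PK index;
-- the outer max combines one value per partition.
SELECT max(id) FROM (
  SELECT max(id) AS id FROM pwreport.bigtab_2009_09
  UNION ALL
  SELECT max(id) FROM pwreport.bigtab_2009_07
  -- ... one arm for every remaining partition ...
) AS per_partition;
```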
Re: [PERFORM] partitioning max() sql not using index
On Wednesday 09 September 2009 07:56:53 Kenneth Cox wrote: In case you aren't comfortable running unreleased planner patches from pgsql-hackers, a workaround was discussed on this list recently: http://archives.postgresql.org/pgsql-performance/2009-09/msg00036.php On Wed, 09 Sep 2009 06:05:22 -0400, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Kevin Kempter wrote: Hi all I have a large table (2 billion rows) that's partitioned by date based on an epoch int value. We're running a select max(id) where id is the PK. I have a PK index on each of the partitions, no indexes at all on the base table. If I hit a partition table directly I get an index scan as expected: The planner isn't smart enough to create the plan you're expecting. There was discussion and even a patch posted recently about that: http://archives.postgresql.org/pgsql-hackers/2009-07/msg01115.php It seems the thread petered out, but the concept seems sane. Excellent! thanks, this is quite helpful
[PERFORM] partitioning max() sql not using index
Hi all I have a large table (2 billion rows) that's partitioned by date based on an epoch int value. We're running a select max(id) where id is the PK. I have a PK index on each of the partitions, no indexes at all on the base table. If I hit a partition table directly I get an index scan as expected:

explain select max(id) from pwreport.bigtab_2009_09;
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Result  (cost=0.06..0.07 rows=1 width=0)
   InitPlan
     ->  Limit  (cost=0.00..0.06 rows=1 width=8)
           ->  Index Scan Backward using bigtab_2009_09_pk on bigtab_2009_09  (cost=0.00..12403809.95 rows=205659919 width=8)
                 Filter: (id IS NOT NULL)
(5 rows)

However if I hit the base table I get a sequential scan on every partition as opposed to index scans:

explain select max(id) from pwreport.bigtab;
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Aggregate  (cost=27214318.67..27214318.68 rows=1 width=8)
   ->  Append  (cost=0.00..24477298.53 rows=1094808053 width=8)
         ->  Seq Scan on bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2011_12 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2011_11 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2011_10 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2011_09 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2011_08 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2011_07 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2011_06 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2011_05 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2011_04 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2011_03 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2011_02 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2011_01 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2010_12 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2010_11 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2010_10 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2010_09 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2010_08 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2010_07 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2010_06 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2010_05 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2010_04 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2010_03 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2010_02 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2010_01 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2009_12 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2009_11 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2009_10 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2009_09 bigtab  (cost=0.00..4599227.19 rows=205659919 width=8)
         ->  Seq Scan on bigtab_2009_07 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2009_06 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2009_05 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2009_04 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2009_03 bigtab  (cost=0.00..11.70 rows=170
Re: [PERFORM] Partitioning: INSERT 0 0 but want INSERT 0 1
I have the same problem in PG 8.2. To resolve this issue I had to create a new table with the same structure as the partitioned table, with a trigger for insert and update. All the operations the application has to do are directed to this new table. When a new record is inserted into the new table, the trigger inserts a record with the same values into the partitioned table and then deletes all records from this new table. In update operations the trigger redirects the operation to the partitioned table too. With this _not elegant_ solution our Java application is able to do its job. If you find a better solution please let me know. Neil Peter Braggio [EMAIL PROTECTED] On Tue, May 13, 2008 at 11:48 AM, Nikolas Everett [EMAIL PROTECTED] wrote: I figure this subject belongs on the performance mailing list because it is about partitioning, which is a performance issue. I'm working on partitioning some of the tables used by an application that uses OpenJPA. It turns out that OpenJPA is sensitive to the numbers returned when you do an insert. So I put together a test and attached it. My postgres version is 8.3.1 compiled from source. My problem is that this: test=> INSERT INTO ttt (a, b) VALUES ('5-5-08', 'test11212'); INSERT 0 0 Time: 21.646 ms needs to show: INSERT 0 1 or OpenJPA will not accept it. The insert works, but OpenJPA does not believe it and aborts the current transaction. Is it possible to have partitioning and have insert show the right number of rows inserted? Thanks, --Nik
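For context, the INSERT 0 0 comes from the standard redirect trigger returning NULL: that tells PostgreSQL no row was stored in the parent table itself, so the command tag counts zero rows. A minimal sketch (partition name matching Nik's test table is assumed) of where the count is lost:

```sql
-- Standard pre-native-partitioning redirect trigger: the row goes to the
-- child, and RETURN NULL suppresses the insert into the parent --
-- which is exactly why the command tag reports INSERT 0 0.
CREATE OR REPLACE FUNCTION ttt_insert_redirect() RETURNS trigger AS $$
BEGIN
  INSERT INTO ttt_2008_05 VALUES (NEW.*);
  RETURN NULL;  -- RETURN NEW instead would also store the row in the parent,
                -- giving INSERT 0 1 but duplicating the data
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER ttt_insert BEFORE INSERT ON ttt
  FOR EACH ROW EXECUTE PROCEDURE ttt_insert_redirect();
```

Neil's staging-table workaround sidesteps this by letting the insert into the unpartitioned staging table count normally, then moving the row in the trigger.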
[PERFORM] Partitioning in postgres - basic question
Hi, I am new to postgres having worked with Oracle in the past. I am interested in understanding Postgres's table partition functionality better. Specifically, I have a third party application running against my postgres database, but the database is becoming rather large to maintain. I am thinking about partitioning the biggest table. Would I be able to set up partitioning on this table with it being seamless to the third party app (assuming that it performs pretty standard DML statements against the table in question)? Thanks Tore
Re: [PERFORM] Partitioning in postgres - basic question
Tore Lukashaugen wrote: Hi, I am new to postgres having worked with Oracle in the past. I am interested in understanding Postgres's table partition functionality better. Specifically, I have a third party application running against my postgres database, but the database is becoming rather large to maintain. I am thinking about partitioning the biggest table. Would I be able to set up partitioning on this table with it being seamless to the third party app (assuming that it performs pretty standard DML statements against the table in question)? http://www.postgresql.org/docs/8.2/static/ddl-partitioning.html#DDL-PARTITIONING-IMPLEMENTATION The examples use rules but some on the list have said triggers work better if you have a lot of partitions. -- Postgresql php tutorials http://www.designmagick.com/
Re: [PERFORM] Partitioning
Abu Mushayeed wrote: I have partitioned a table based on period (e.g., cdate >= '2007-01-01'::date and cdate <= '2007-03-31'::date). Now, when I issue a query like cdate >= CURRENT_DATE - 1 and cdate <= CURRENT_DATE, it scans all the partitions. But if I do cdate >= '2007-01-01'::date and cdate <= '2007-03-31'::date it picks the correct partition. Also if I join the cdate field with another table, it does not pick the correct partition. I would like to know if it is possible to pick the correct partition using the above example. From http://www.postgresql.org/docs/8.1/interactive/ddl-partitioning.html: "...For the same reason, stable functions such as CURRENT_DATE must be avoided. Joining the partition key to a column of another table will not be optimized, either." Rigmor Thanks Abu -- Rigmor Ukuhe Finestmedia Ltd | Software Development Team Manager gsm : (+372)56467729 | tel : (+372)6558043 | e-mail : [EMAIL PROTECTED]
[PERFORM] Partitioning
I have partitioned a table based on period (e.g., cdate >= '2007-01-01'::date and cdate <= '2007-03-31'::date). Now, when I issue a query like cdate >= CURRENT_DATE - 1 and cdate <= CURRENT_DATE, it scans all the partitions. But if I do cdate >= '2007-01-01'::date and cdate <= '2007-03-31'::date it picks the correct partition. Also if I join the cdate field with another table, it does not pick the correct partition. I would like to know if it is possible to pick the correct partition using the above example. Thanks Abu
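A common workaround at the time (a hedged sketch, not from the thread; table and function names are illustrative): interpolate the current date as a literal via dynamic SQL, so the planner sees a constant it can prove against the partitions' CHECK constraints at plan time:

```sql
-- Hypothetical plpgsql wrapper around the report query. Because the
-- EXECUTE string is planned at run time with literal dates, constraint
-- exclusion can apply, unlike with CURRENT_DATE in a static query.
CREATE OR REPLACE FUNCTION recent_rows() RETURNS SETOF parted AS $$
DECLARE
    r parted%ROWTYPE;
BEGIN
    FOR r IN EXECUTE
        'SELECT * FROM parted WHERE cdate >= '''
        || (CURRENT_DATE - 1) || ''' AND cdate <= ''' || CURRENT_DATE || ''''
    LOOP
        RETURN NEXT r;
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;
```

The same effect can be had by building the query string with literal dates in the application instead of in plpgsql.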
Re: [PERFORM] Partitioning
Can anybody help me out? I just wanted to know what the configuration settings would be for a partitioned table, so as to make inserts faster on the partitioned tables. Well, that depends. Many questions are important here. Will you insert into several partitions or only into a single one? Do you need to enforce some constraint between several partitioned tables? If you need to insert into several partitions, it can be faster as you can place them on different drives. If you need to insert only into the last one (that's usually the case with 'logging' tables) then this probably won't give a huge performance benefit. If you need to enforce some kind of constraint between multiple partitions (possibly from several tables), you'll have to do that manually using a plpgsql procedure (for example). This is the case with a UNIQUE constraint on a single table, a FOREIGN KEY between multiple partitioned tables, etc. This can mean a serious performance penalty, especially when you do mostly insert/update on that table. This is mostly about application architecture - if you use partitions incorrectly it's almost impossible to fix that by changing settings in postgresql.conf. Tomas
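A sketch of the manual cross-partition UNIQUE check Tomas mentions (table and column names are illustrative; this is an assumption about how one might do it, not code from the thread). A per-child unique index catches duplicates within one partition; a trigger that probes through the parent catches duplicates across all of them:

```sql
-- Hypothetical plpgsql uniqueness guard across all children of "logs".
CREATE OR REPLACE FUNCTION logs_check_unique() RETURNS trigger AS $$
BEGIN
    -- Querying the parent scans the parent and every child partition.
    PERFORM 1 FROM logs WHERE id = NEW.id;
    IF FOUND THEN
        RAISE EXCEPTION 'duplicate id % across partitions', NEW.id;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Create this trigger on each child partition:
CREATE TRIGGER logs_check_unique
    BEFORE INSERT ON logs_2007_01
    FOR EACH ROW EXECUTE PROCEDURE logs_check_unique();
```

Note this is not race-safe without additional locking between concurrent inserters, which is part of the performance penalty Tomas describes.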
[PERFORM] Partitioning
Can anybody help me out? I just wanted to know what the configuration settings would be for a partitioned table, so as to make inserts faster on the partitioned tables. -- Regards Gauri
Re: [PERFORM] Partitioning
On Wed, 10 Jan 2007 15:30:16 -0600, Scott Marlowe [EMAIL PROTECTED] wrote: [...] And I don't think the mysql partition supports tablespaces either. MySQL supports distributing partitions over multiple disks via the SUBPARTITION clause [1]. I leave it to you whether their syntax is cleaner, more powerful or easier ;) Bernd [1] http://dev.mysql.com/doc/refman/5.1/en/partitioning-subpartitions.html
Re: [PERFORM] Partitioning
On Fri, Jan 05, 2007 at 12:47:08PM +0100, Mikael Carneholm wrote: Take a look at the set of partitioning functions I wrote shortly after the 8.1 release: http://www.studenter.hb.se/~arch/files/part_functions.sql You could probably work something out using those functions (as-is, or as inspiration) together with pgAgent (http://www.pgadmin.org/docs/1.4/pgagent.html) /Mikael Those are some great functions. Well, they're less than optimal in one aspect: they add one rule per partition, making them unsuitable for OLTP type applications (actually: any application where insert performance is crucial). Someone with time and/or energy could probably fix that, I guess...patches are welcome :) /Mikael
Re: [PERFORM] Partitioning
Well - whether or not MySQL's implementation of partitioning has some deficiency, it sure is a lot easier to set up than PostgreSQL. And I don't think there is any technical reason that setting up partitioning on Postgres couldn't be very easy and still be robust. On Thu, 11 Jan 2007 13:59:20 +0100, Mikael Carneholm [EMAIL PROTECTED] said: On Fri, Jan 05, 2007 at 12:47:08PM +0100, Mikael Carneholm wrote: Take a look at the set of partitioning functions I wrote shortly after the 8.1 release: http://www.studenter.hb.se/~arch/files/part_functions.sql You could probably work something out using those functions (as-is, or as inspiration) together with pgAgent (http://www.pgadmin.org/docs/1.4/pgagent.html) /Mikael Those are some great functions. Well, they're less than optimal in one aspect: they add one rule per partition, making them unsuitable for OLTP type applications (actually: any application where insert performance is crucial). Someone with time and/or energy could probably fix that, I guess...patches are welcome :) /Mikael
Re: [PERFORM] Partitioning
Each partition can have its own disk, without using subpartitions.

CREATE TABLE th (id INT, name VARCHAR(30), adate DATE)
PARTITION BY LIST(YEAR(adate)) (
  PARTITION p1999 VALUES IN (1995, 1999, 2003)
    DATA DIRECTORY = '/var/appdata/95/data'
    INDEX DIRECTORY = '/var/appdata/95/idx',
  PARTITION p2000 VALUES IN (1996, 2000, 2004)
    DATA DIRECTORY = '/var/appdata/96/data'
    INDEX DIRECTORY = '/var/appdata/96/idx',
  PARTITION p2001 VALUES IN (1997, 2001, 2005)
    DATA DIRECTORY = '/var/appdata/97/data'
    INDEX DIRECTORY = '/var/appdata/97/idx',
  PARTITION p2002 VALUES IN (1998, 2002, 2006)
    DATA DIRECTORY = '/var/appdata/98/data'
    INDEX DIRECTORY = '/var/appdata/98/idx'
);

Subpartitions are just a way to break (parent) partitions up into smaller pieces. Those of course can be moved to other disks just like the main partitions. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Bernd Helmle Sent: Thursday, January 11, 2007 6:51 AM To: Scott Marlowe Cc: Jim C. Nasby; Jeremy Haile; pgsql-performance@postgresql.org Subject: Re: [PERFORM] Partitioning On Wed, 10 Jan 2007 15:30:16 -0600, Scott Marlowe [EMAIL PROTECTED] wrote: [...] And I don't think the mysql partition supports tablespaces either. MySQL supports distributing partitions over multiple disks via the SUBPARTITION clause [1]. I leave it to you whether their syntax is cleaner, more powerful or easier ;) Bernd [1] http://dev.mysql.com/doc/refman/5.1/en/partitioning-subpartitions.html
Re: [PERFORM] Partitioning
On Thu, 11 Jan 2007 08:18:39 -0600, Adam Rich [EMAIL PROTECTED] wrote: Subpartitions are just a way to break (parent) partitions up into smaller pieces. Those of course can be moved to other disks just like the main partitions. Ah, didn't know that (i just wondered why i need a subpartition to change the location of a partition). Thanks for your clarification... Bernd
Re: [PERFORM] Partitioning
BTW, someone coming up with a set of functions to handle partitioning for the general 'partition by time' case would make a GREAT project on pgFoundry. On Fri, Jan 05, 2007 at 12:47:08PM +0100, Mikael Carneholm wrote: Take a look at the set of partitioning functions I wrote shortly after the 8.1 release: http://www.studenter.hb.se/~arch/files/part_functions.sql You could probably work something out using those functions (as-is, or as inspiration) together with pgAgent (http://www.pgadmin.org/docs/1.4/pgagent.html) /Mikael -Original Message- From: [EMAIL PROTECTED] [mailto:pgsql-performance- [EMAIL PROTECTED] On Behalf Of Arnau Sent: den 5 januari 2007 12:02 To: pgsql-performance@postgresql.org Subject: [PERFORM] Partitioning Hi all, I'm not sure if this question fits in the topic of this list. I'm interested in partitioning and it's the first time I'd use it. There is an issue I don't know how you handle it. Lets say I'm interested in store monthly based statistical data like the example of http://www.postgresql.org/docs/8.2/static/ddl-partitioning.html. What I don't like of this approach is that the monthly tables, rules... must be created manually or at least I haven't found any other option. My question is how do you manage this? do you have a cron task that creates automatically these monthly elements (tables, rules, ... ) or there is another approach that doesn't require external things like cron only PostgreSQL. -- Arnau -- Jim Nasby [EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Re: [PERFORM] Partitioning
On Fri, Jan 05, 2007 at 12:47:08PM +0100, Mikael Carneholm wrote: Take a look at the set of partitioning functions I wrote shortly after the 8.1 release: http://www.studenter.hb.se/~arch/files/part_functions.sql You could probably work something out using those functions (as-is, or as inspiration) together with pgAgent (http://www.pgadmin.org/docs/1.4/pgagent.html) /Mikael Those are some great functions. -- erik jones [EMAIL PROTECTED] software development emma(r)
Re: [PERFORM] Partitioning
I really wish that PostgreSQL supported a nice partitioning syntax like MySQL has. Here is an example:

CREATE TABLE tr (id INT, name VARCHAR(50), purchased DATE)
PARTITION BY RANGE( YEAR(purchased) ) (
  PARTITION p0 VALUES LESS THAN (1990),
  PARTITION p1 VALUES LESS THAN (1995),
  PARTITION p2 VALUES LESS THAN (2000),
  PARTITION p3 VALUES LESS THAN (2005)
);

And to drop a partition:

ALTER TABLE tr DROP PARTITION p2;

This seems so much more intuitive and simpler than what is required to set it up in PostgreSQL. Does PostgreSQL's approach to table partitioning have any advantage over MySQL? Is a nicer syntax planned for Postgres? On Wed, 10 Jan 2007 14:20:06 -0600, Jim C. Nasby [EMAIL PROTECTED] said: BTW, someone coming up with a set of functions to handle partitioning for the general 'partition by time' case would make a GREAT project on pgFoundry. On Fri, Jan 05, 2007 at 12:47:08PM +0100, Mikael Carneholm wrote: Take a look at the set of partitioning functions I wrote shortly after the 8.1 release: http://www.studenter.hb.se/~arch/files/part_functions.sql You could probably work something out using those functions (as-is, or as inspiration) together with pgAgent (http://www.pgadmin.org/docs/1.4/pgagent.html) /Mikael -Original Message- From: [EMAIL PROTECTED] [mailto:pgsql-performance- [EMAIL PROTECTED] On Behalf Of Arnau Sent: den 5 januari 2007 12:02 To: pgsql-performance@postgresql.org Subject: [PERFORM] Partitioning Hi all, I'm not sure if this question fits in the topic of this list. I'm interested in partitioning and it's the first time I'd use it. There is an issue I don't know how you handle it. Lets say I'm interested in store monthly based statistical data like the example of http://www.postgresql.org/docs/8.2/static/ddl-partitioning.html. What I don't like of this approach is that the monthly tables, rules... must be created manually or at least I haven't found any other option. My question is how do you manage this?
do you have a cron task that creates automatically these monthly elements (tables, rules, ... ) or there is another approach that doesn't require external things like cron only PostgreSQL. -- Arnau -- Jim Nasby [EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
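For comparison, the closest hand-rolled equivalent of the MySQL example in PostgreSQL of that era is roughly the following (a sketch based on the inheritance pattern in the 8.2 documentation; partition boundaries mirror the MySQL example above):

```sql
CREATE TABLE tr (id INT, name VARCHAR(50), purchased DATE);

CREATE TABLE tr_p0 (CHECK (purchased < DATE '1990-01-01')) INHERITS (tr);
CREATE TABLE tr_p1 (CHECK (purchased >= DATE '1990-01-01'
                       AND purchased < DATE '1995-01-01')) INHERITS (tr);
CREATE TABLE tr_p2 (CHECK (purchased >= DATE '1995-01-01'
                       AND purchased < DATE '2000-01-01')) INHERITS (tr);
CREATE TABLE tr_p3 (CHECK (purchased >= DATE '2000-01-01'
                       AND purchased < DATE '2005-01-01')) INHERITS (tr);

-- Routing trigger: send each row to the matching child.
CREATE OR REPLACE FUNCTION tr_insert() RETURNS trigger AS $$
BEGIN
    IF NEW.purchased < DATE '1990-01-01' THEN
        INSERT INTO tr_p0 VALUES (NEW.*);
    ELSIF NEW.purchased < DATE '1995-01-01' THEN
        INSERT INTO tr_p1 VALUES (NEW.*);
    ELSIF NEW.purchased < DATE '2000-01-01' THEN
        INSERT INTO tr_p2 VALUES (NEW.*);
    ELSE
        INSERT INTO tr_p3 VALUES (NEW.*);
    END IF;
    RETURN NULL;  -- row handled; do not also store it in the parent
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER tr_insert BEFORE INSERT ON tr
    FOR EACH ROW EXECUTE PROCEDURE tr_insert();

-- The equivalent of ALTER TABLE tr DROP PARTITION p2 is simply:
DROP TABLE tr_p2;
```

The verbosity difference is exactly what this subthread is about: the CHECK constraints, trigger, and routing logic must all be kept in sync by hand.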
Re: [PERFORM] Partitioning
On Wed, Jan 10, 2007 at 03:28:00PM -0500, Jeremy Haile wrote: This seems so much more intuitive and simpler than what is required to set it up in PostgreSQL. Does PostgreSQL's approach to table partitioning have any advantage over MySQL? Is a nicer syntax planned for Postgres? The focus was to get the base functionality working, and working correctly. Another consideration is that there's multiple ways to accomplish the partitioning; exposing the basic functionality without enforcing a given interface provides more flexibility (ie: it appears that you can't do list partitioning with MySQL, while you can with PostgreSQL). -- Jim Nasby [EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Re: [PERFORM] Partitioning
You can do list partitioning in MySQL: http://dev.mysql.com/doc/refman/5.1/en/partitioning-list.html My comment was not meant as a criticism of PostgreSQL's current state - I'm glad that it has partitioning. I'm simply wondering if there are any plans of adopting a more user-friendly syntax in the future similar to MySQL partitioning support. Having first-class citizen support of partitions would also allow some nice administrative GUIs and views to be built for managing them. Jeremy Haile On Wed, 10 Jan 2007 15:09:31 -0600, Jim C. Nasby [EMAIL PROTECTED] said: On Wed, Jan 10, 2007 at 03:28:00PM -0500, Jeremy Haile wrote: This seems so much more intuitive and simpler than what is required to set it up in PostgreSQL. Does PostgreSQL's approach to table partitioning have any advantage over MySQL? Is a nicer syntax planned for Postgres? The focus was to get the base functionality working, and working correctly. Another consideration is that there's multiple ways to accomplish the partitioning; exposing the basic functionality without enforcing a given interface provides more flexibility (ie: it appears that you can't do list partitioning with MySQL, while you can with PostgreSQL). -- Jim Nasby [EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Re: [PERFORM] Partitioning
On Wed, 2007-01-10 at 15:15, Jeremy Haile wrote: You can do list partitioning in MySQL: http://dev.mysql.com/doc/refman/5.1/en/partitioning-list.html My comment was not meant as a criticism of PostgreSQL's current state - I'm glad that it has partitioning. I'm simply wondering if there are any plans of adopting a more user-friendly syntax in the future similar to MySQL partitioning support. Having first-class citizen support of partitions would also allow some nice administrative GUIs and views to be built for managing them. I don't think anyone took it as a negative criticism. Jim and I were both more pointing out that the development process of the two projects is somewhat different. In MySQL a small group that doesn't necessarily interact with a large user community sets out to implement a feature in a given time line with a given set of requirements, and they tend to ignore what they see as esoteric requirements. In PostgreSQL a large development community that communicates fairly well with its large user community puts somewhat of the onus of proving the need and doing the initial proof of concept on those who say they need a feature, often working in a method where the chief hackers lend a hand to someone who wants the feature so they can get a proof of concept up and running. An example would be the auditing / time travel in the contrib/spi project. After several iterations, and given the chance to learn from the mistakes of the previous incarnations, something often rises out of that to produce the feature needed. Generally speaking the PostgreSQL method takes longer, making life harder today, but produces cleaner, more easily maintained solutions, making life easier in the future. Meanwhile the MySQL method works faster, making life easier today, but makes compromises that might make life harder in the future. Something that embodies that difference is the table handler philosophy of both databases.
PostgreSQL has the abstraction to have more than one table handler, but in practice has exactly one table handler. MySQL has the ability to have many table handlers, and in fact uses many of them. With PostgreSQL this means that things like query parsing / execution and the table handler are tightly coupled. This results in things like transactional DDL. Sometimes this results in suggestions being dismissed out of hand because they would have unintended consequences. In MySQL, because of the multiple table handlers, many compromises in query parsing have to be made. The most common one being that you can define constraints / foreign keys in a column item, and they will simply be ignored with no error or notice. The fk constraints have to go at the end of the column list to be parsed and executed. So, partitioning, being something that will touch a lot of parts of the database, isn't gonna just show up one afternoon in pgsql. It will likely take a few people making proof of concept versions before a consensus is reached and someone who has the ability codes it up.
[PERFORM] Partitioning
Hi all, I'm not sure if this question fits in the topic of this list. I'm interested in partitioning and it's the first time I'd use it. There is an issue I don't know how you handle. Let's say I'm interested in storing monthly based statistical data like the example of http://www.postgresql.org/docs/8.2/static/ddl-partitioning.html. What I don't like about this approach is that the monthly tables, rules... must be created manually, or at least I haven't found any other option. My question is how do you manage this? Do you have a cron task that automatically creates these monthly elements (tables, rules, ...), or is there another approach that doesn't require external things like cron, only PostgreSQL? -- Arnau
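One common answer (a hedged sketch, not from the thread; table and column names are illustrative): a plpgsql function built on dynamic SQL that creates next month's child table, driven by cron or pgAgent:

```sql
-- Hypothetical helper: create one monthly partition of "parent",
-- with a range CHECK on a cdate column and a matching index.
CREATE OR REPLACE FUNCTION make_monthly_partition(parent text, month date)
RETURNS void AS $$
DECLARE
    start_d date := date_trunc('month', month);
    end_d   date := start_d + interval '1 month';
    child   text := parent || '_' || to_char(start_d, 'YYYYMM');
BEGIN
    EXECUTE 'CREATE TABLE ' || child
         || ' (CHECK (cdate >= DATE ''' || start_d
         || ''' AND cdate < DATE ''' || end_d
         || ''')) INHERITS (' || parent || ')';
    EXECUTE 'CREATE INDEX ' || child || '_cdate_idx ON '
         || child || ' (cdate)';
END;
$$ LANGUAGE plpgsql;

-- e.g. from a nightly cron / pgAgent job:
-- SELECT make_monthly_partition('stats', CURRENT_DATE + interval '1 month');
```

Any insert rules or routing trigger would still need to be refreshed alongside the new child, which is exactly the manual bookkeeping Arnau is asking about.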
Re: [PERFORM] Partitioning
Take a look at the set of partitioning functions I wrote shortly after the 8.1 release: http://www.studenter.hb.se/~arch/files/part_functions.sql You could probably work something out using those functions (as-is, or as inspiration) together with pgAgent (http://www.pgadmin.org/docs/1.4/pgagent.html) /Mikael -Original Message- From: [EMAIL PROTECTED] [mailto:pgsql-performance- [EMAIL PROTECTED] On Behalf Of Arnau Sent: den 5 januari 2007 12:02 To: pgsql-performance@postgresql.org Subject: [PERFORM] Partitioning Hi all, I'm not sure if this question fits in the topic of this list. I'm interested in partitioning and it's the first time I'd use it. There is an issue I don't know how you handle it. Lets say I'm interested in store monthly based statistical data like the example of http://www.postgresql.org/docs/8.2/static/ddl-partitioning.html. What I don't like of this approach is that the monthly tables, rules... must be created manually or at least I haven't found any other option. My question is how do you manage this? do you have a cron task that creates automatically these monthly elements (tables, rules, ... ) or there is another approach that doesn't require external things like cron only PostgreSQL. -- Arnau
Re: [PERFORM] Partitioning / constrain exlusion not working with %-operator
Tom Lane [EMAIL PROTECTED] writes: It's usually better to use partitioning rules that have something to do with the WHERE-clauses you'd be using anyway. For instance, try to partition on ranges. I agree and tried to create new partitioned tables. But now I ran into some other performance-related trouble when inserting (parts of) the old (unpartitioned) table into the new one:

CREATE TABLE t_unparted (id1 int, cont varchar);
-- Populate table with 1000 records with id1 from 1 to 1000 and ANALYZE
CREATE TABLE t_parted (id1 int, cont varchar);
CREATE TABLE t_parted_000 (check (id1 >= 0 AND id1 < 100)) INHERITS (t_parted);
CREATE RULE ins_000 AS ON INSERT TO t_parted WHERE id1 >= 0 AND id1 < 100 DO INSTEAD INSERT INTO t_parted_000 VALUES (new.*);
-- ... 8 more tables + 8 more rules
CREATE TABLE t_parted_900 (check (id1 >= 900 AND id1 < 1000)) INHERITS (t_parted);
CREATE RULE ins_900 AS ON INSERT TO t_parted WHERE id1 >= 900 AND id1 < 1000 DO INSTEAD INSERT INTO t_parted_900 VALUES (new.*);

And now:

EXPLAIN INSERT INTO t_parted SELECT * FROM t_parted WHERE id1 >= 0 AND id1 < 100;

Result (cost=0.00..170.80 rows=12 width=36)
  ->  Append (cost=0.00..170.80 rows=12 width=36)
        ->  Seq Scan on t_parted (cost=0.00..85.40 rows=6 width=36)
              Filter: ((id1 >= 0) AND (id1 < 100) AND (((id1 >= 0) AND (id1 < 100)) IS NOT TRUE) AND (((id1 >= 100) AND (id1 < 200)) IS NOT TRUE) AND (((id1 >= 200) AND (id1 < 300)) IS NOT TRUE) AND (((id1 >= 300) AND (id1 < 400)) IS NOT TRUE) AND (((id1 >= 400) AND (id1 < 500)) IS NOT TRUE) AND (((id1 >= 500) AND (id1 < 600)) IS NOT TRUE) AND (((id1 >= 600) AND (id1 < 700)) IS NOT TRUE) AND (((id1 >= 700) AND (id1 < 800)) IS NOT TRUE) AND (((id1 >= 800) AND (id1 < 900)) IS NOT TRUE) AND (((id1 >= 900) AND (id1 < 1000)) IS NOT TRUE))
        ->  Seq Scan on t_parted_000 t_parted (cost=0.00..85.40 rows=6 width=36)
              Filter: ((id1 >= 0) AND (id1 < 100) AND (((id1 >= 0) AND (id1 < 100)) IS NOT TRUE) AND (((id1 >= 100) AND (id1 < 200)) IS NOT TRUE) AND (((id1 >= 200) AND (id1 < 300)) IS NOT TRUE) AND (((id1 >= 300) AND (id1 < 400)) IS NOT TRUE) AND (((id1 >= 400) AND (id1 < 500)) IS NOT TRUE) AND (((id1 >= 500) AND (id1 < 600)) IS NOT TRUE) AND (((id1 >= 600) AND (id1 < 700)) IS NOT TRUE) AND (((id1 >= 700) AND (id1 < 800)) IS NOT TRUE) AND (((id1 >= 800) AND (id1 < 900)) IS NOT TRUE) AND (((id1 >= 900) AND (id1 < 1000)) IS NOT TRUE))
Result (cost=0.00..66.40 rows=12 width=36)
  ->  Append (cost=0.00..66.40 rows=12 width=36)
        ->  Seq Scan on t_parted (cost=0.00..33.20 rows=6 width=36)
              Filter: ((id1 >= 0) AND (id1 < 100) AND (id1 >= 0) AND (id1 < 100))
        ->  Seq Scan on t_parted_000 t_parted (cost=0.00..33.20 rows=6 width=36)
              Filter: ((id1 >= 0) AND (id1 < 100) AND (id1 >= 0) AND (id1 < 100))
...
Result (cost=0.00..33.20 rows=6 width=36)
  ->  Append (cost=0.00..33.20 rows=6 width=36)
        ->  Seq Scan on t_parted (cost=0.00..33.20 rows=6 width=36)
              Filter: ((id1 >= 0) AND (id1 < 100) AND (id1 >= 900) AND (id1 < 1000))
(58 rows)

The filters appended by the planner do not make any sense and cost too much time if the old table is huge. (constraint_exclusion was ON.) Is there a better way to partition an existing table with a large number of rows (100 million)? TIA, Martin
[PERFORM] Partitioning / constrain exlusion not working with %-operator
I try to partition a large table (~120 million rows) into 50 smaller tables, but using the IMO immutable %-function constraint exclusion does not work as expected:

CREATE TABLE tt_m (id1 int, cont varchar);
CREATE TABLE tt_0 (check (id1 % 50 = 0)) INHERITS (tt_m);
CREATE TABLE tt_1 (check (id1 % 50 = 1)) INHERITS (tt_m);
CREATE RULE ins_tt_0 AS ON INSERT TO tt_m WHERE id1 % 50 = 0 DO INSTEAD INSERT INTO tt_0 VALUES (new.*);
CREATE RULE ins_tt_1 AS ON INSERT TO tt_m WHERE id1 % 50 = 1 DO INSTEAD INSERT INTO tt_1 VALUES (new.*);
...
INSERT INTO tt_m (id1,cont) VALUES (0,'Test1');
INSERT INTO tt_m (id1,cont) VALUES (1,'Test2');

EXPLAIN SELECT * FROM tt_m WHERE id1=1;

QUERY PLAN
---
Result (cost=0.00..73.50 rows=18 width=36)
  ->  Append (cost=0.00..73.50 rows=18 width=36)
        ->  Seq Scan on tt_m (cost=0.00..24.50 rows=6 width=36)
              Filter: (id1 = 1)
        ->  Seq Scan on tt_0 tt_m (cost=0.00..24.50 rows=6 width=36)
              Filter: (id1 = 1)
        ->  Seq Scan on tt_1 tt_m (cost=0.00..24.50 rows=6 width=36)
              Filter: (id1 = 1)
...

Only adding an explicit %-call to the query results in the expected plan:

EXPLAIN SELECT * FROM tt_m WHERE id1=1 AND id1 % 50 = 1;

QUERY PLAN
---
Result (cost=0.00..60.60 rows=2 width=36)
  ->  Append (cost=0.00..60.60 rows=2 width=36)
        ->  Seq Scan on tt_m (cost=0.00..30.30 rows=1 width=36)
              Filter: ((id1 = 1) AND ((id1 % 50) = 1))
        ->  Seq Scan on tt_1 tt_m (cost=0.00..30.30 rows=1 width=36)
              Filter: ((id1 = 1) AND ((id1 % 50) = 1))

Did I miss something and/or how could I force the planner to use constraint exclusion without adding the explicit second condition above? TIA, Martin
Re: [PERFORM] Partitioning / constrain exlusion not working with %-operator
Martin Lesser [EMAIL PROTECTED] writes: I try to partition a large table (~120 million rows) into 50 smaller tables but using the IMO immutable %-function constraint exclusion does not work as expected: The constraint exclusion mechanism is not as bright as you think. There are some very limited cases where it can make a deduction that a WHERE clause implies a CHECK constraint that's not an exact textual equivalent ... but all those cases have to do with related b-tree operators, and % is not one. It's usually better to use partitioning rules that have something to do with the WHERE-clauses you'd be using anyway. For instance, try to partition on ranges of id1 instead of id1 % 50. That works because the CHECK clauses will be like id1 >= x AND id1 < y, and those operators are btree-related to the id1 = z clauses you'll have in the query. regards, tom lane
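Tom's suggestion in DDL form (a sketch; the range boundaries are illustrative). With range CHECKs, an equality predicate like id1 = 42 can be proven against the constraints, because =, >=, and < all belong to the same b-tree operator family:

```sql
SET constraint_exclusion = on;

CREATE TABLE tt_m (id1 int, cont varchar);
CREATE TABLE tt_0 (CHECK (id1 >= 0       AND id1 < 1000000)) INHERITS (tt_m);
CREATE TABLE tt_1 (CHECK (id1 >= 1000000 AND id1 < 2000000)) INHERITS (tt_m);

-- EXPLAIN SELECT * FROM tt_m WHERE id1 = 42;
-- should now scan only tt_m and tt_0; tt_1 is excluded at plan time,
-- with no explicit % condition needed in the query.
```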
Re: [PERFORM] partitioning
On Sun, Mar 19, 2006 at 01:31:42PM +0100, Antoine wrote: Hi, Is there any work on the cards for implementing other partitioning strategies? I see mysql 5.1 will have support for hashes and stuff but didn't see anything in the todos for postgres. You'd have to provide a pretty convincing argument for providing hash partitioning, I think. I can't really think of any real-world scenarios where it's better than other forms. In any case, the next logical step on the partitioning front is to add some 'syntactic sugar' to make it easier for people to work with partitions. I seem to remember some discussion about that, but I don't recall where it led. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
[PERFORM] partitioning
Hi, Is there any work on the cards for implementing other partitioning strategies? I see mysql 5.1 will have support for hashes and stuff but didn't see anything in the todos for postgres. Cheers Antoine -- This is where I should put some witty comment.
Re: [PERFORM] partitioning and locking problems
Hi, Marc, Marc Morin wrote: 1- long running report is running on view 2- continuous inserters into view into a table via a rule 3- truncate or rule change occurs, taking an exclusive lock. Must wait for #1 to finish. 4- new reports and inserters must now wait for #3. 5- now everyone is waiting for a single query in #1. Results in loss of insert data granularity (important for our application). Apart from having two separate views (one for report, one for insert) as Richard suggested: If you have fixed times for #3, don't start any #1 that won't finish before it's time for #3. You could also use the LOCK command on an empty lock table at the beginning of each #1 or #3 transaction to prevent #3 from getting the view lock before #1 is finished. HTH, Markus -- Markus Schaber | Logical Tracking & Tracing International AG Dipl. Inf. | Software Development GIS Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
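Markus's empty-lock-table idea as a sketch (the table name and lock modes here are assumptions about one reasonable setup, not from the thread):

```sql
-- An empty table used purely as a lock object; it never holds rows.
CREATE TABLE report_truncate_lock ();

-- Long-running report (#1): SHARE locks don't conflict with each other,
-- so any number of reports can run concurrently.
BEGIN;
LOCK TABLE report_truncate_lock IN SHARE MODE;
-- ... run the report against the view ...
COMMIT;

-- Truncate / rule change (#3): ACCESS EXCLUSIVE conflicts with SHARE,
-- so this waits here until running reports finish, *before* touching
-- the view and taking its exclusive lock.
BEGIN;
LOCK TABLE report_truncate_lock IN ACCESS EXCLUSIVE MODE;
-- TRUNCATE some_partition;  -- or CREATE OR REPLACE RULE ...
COMMIT;
```

Inserters simply never touch report_truncate_lock, which is how Marc's follow-up keeps them from being stalled.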
Re: [PERFORM] partitioning and locking problems
Using a separate lock table is what we've decided to do in this particular case to serialize #1 and #3. Inserters don't take this lock and as such will not be stalled.
Re: [PERFORM] partitioning and locking problems
Marc Morin wrote:
> Under both these circumstances (truncate and create / replace rule) the locking behaviour of these commands can cause locking problems for us. The scenario is best illustrated as a series of steps:
> 1- long running report is running on view
> 2- continuous inserters into view into a table via a rule
> 3- truncate or rule change occurs, taking an exclusive lock. Must wait for #1 to finish.
> 4- new reports and inserters must now wait for #3.
> 5- now everyone is waiting for a single query in #1. Results in loss of insert data granularity (important for our application).

How much would you get from splitting the view into two: reporting and inserting?
--
Richard Huxton Archonet Ltd
Re: [PERFORM] partitioning and locking problems
Tom,

Do you mean it would be impossible to change the code so that existing selects continue to use the pre-truncated table until they commit? Or would it just require a more extensive change? The update/insert rule change appears to be more doable, no?
Re: [PERFORM] partitioning and locking problems
Marc Morin [EMAIL PROTECTED] writes:
> Do you mean it would be impossible to change the code so that existing selects continue to use the pre-truncated table until they commit?

Yes, because that table won't exist any more (as in, the file has been unlinked) once the TRUNCATE commits.

> The update/insert rule change appears to be more doable, no?

You've still got race conditions there: do onlooker transactions see the old set of rules, or the new set, or some unholy mixture? Removing the lock as you suggest would make it possible for the rule rewriter to pick up non-self-consistent data from the system catalogs, leading to arbitrarily bad behavior ... if you're lucky, it'll just crash; if you're not lucky, the incorrect rule will do a fandango on your data.

regards, tom lane
[PERFORM] partitioning and locking problems
We have a large database system designed around partitioning. Our application is characterized by:
- terabytes of data
- billions of rows in dozens of base tables (and 100s of partitions)
- 24x7 insert load of new data that cannot be stopped; data is time sensitive
- periodic reports that can have long running queries, with query times measured in hours

We have 2 classes of maintenance activities that are causing us problems:
- periodically we need to change an insert rule on a view to point to a different partition.
- periodically we need to delete data that is no longer needed. Performed via truncate.

Under both these circumstances (truncate and create / replace rule) the locking behaviour of these commands can cause locking problems for us. The scenario is best illustrated as a series of steps:

1- long running report is running on view
2- continuous inserters into view into a table via a rule
3- truncate or rule change occurs, taking an exclusive lock. Must wait for #1 to finish.
4- new reports and inserters must now wait for #3.
5- now everyone is waiting for a single query in #1. Results in loss of insert data granularity (important for our application).

We would like to understand the implications of changing postgres' code/locking so that rule changes and truncate do not require locking out select statements. The following is a simplified schema to help illustrate the problem:

create table a_1 ( pkey int primary key );
create table a_2 ( pkey int primary key );
create view a as select * from a_1 union all select * from a_2;

create function change_rule(int) returns void as '
begin
    execute ''create or replace rule insert as on insert to a
              do instead insert into a_''||$1||''(pkey) values(NEW.pkey)'';
end;
' language plpgsql;

-- change rule, execute something like the following periodically
select change_rule(1);

We've looked at the code and the rule changes appear easy, but we are concerned about the required changes for truncate.
Thanks Marc
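A minimal sketch of the maintenance cycle against the simplified schema above (which partition gets truncated, and when, is hypothetical; this is exactly the step that takes the problematic exclusive locks):

```sql
-- Point the insert rule at the other partition, then reclaim the old one.
SELECT change_rule(2);   -- new inserts now land in a_2

-- Reclaim a_1. TRUNCATE takes an ACCESS EXCLUSIVE lock on a_1, and the
-- CREATE OR REPLACE RULE above takes an exclusive lock on the view "a",
-- so both queue behind any long-running report still reading the view.
TRUNCATE TABLE a_1;
```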
Re: [PERFORM] partitioning and locking problems
Marc Morin [EMAIL PROTECTED] writes:
> Would like to understand the implications of changing postgres' code/locking for rule changes and truncate to not require locking out select statements?

It won't work...

regards, tom lane
[PERFORM] partitioning
Hi, I've been working on trying to partition a big table (I've never partitioned a table in any other database till now). Everything went ok, except one query that didn't work afterwards. I've put the partition description, indexes, etc., and the explain plan attached. The query is extremely fast without partitioning (index scan backwards on the primary key). The query is:

select * from logs order by id desc limit 100;

id is the primary key. It is indexed on all partitions. But the explain plan does a full table scan on all partitions. While I think I understand why it is choosing this plan right now, is there something that could be done to optimize this case? Or put a warning in the docs about this kind of behaviour. I guess normally someone would partition to get faster queries :) Anyway, I thought I should mention this, as it has been quite a surprise.

CREATE TABLE logs_150 (CHECK ( id_machine = 150)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_163 (CHECK ( id_machine = 163)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_289 (CHECK ( id_machine = 289)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_319 (CHECK ( id_machine = 319)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_238 (CHECK ( id_machine = 238)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_148 (CHECK ( id_machine = 148)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_176 (CHECK ( id_machine = 176)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_164 (CHECK ( id_machine = 164)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_316 (CHECK ( id_machine = 316)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_313 (CHECK ( id_machine = 313)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_217 (CHECK ( id_machine = 217)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_167 (CHECK ( id_machine = 167)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_287 (CHECK ( id_machine = 287)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_279 (CHECK ( id_machine = 279)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_248 (CHECK ( id_machine = 248)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_280 (CHECK ( id_machine = 280)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_165 (CHECK ( id_machine = 165)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_330 (CHECK ( id_machine = 330)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_149 (CHECK ( id_machine = 149)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_157 (CHECK ( id_machine = 157)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_272 (CHECK ( id_machine = 272)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_311 (CHECK ( id_machine = 311)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_309 (CHECK ( id_machine = 309)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_318 (CHECK ( id_machine = 318)) INHERITS (logs) TABLESPACE data_logs;

CREATE UNIQUE INDEX logs_150_pkey ON logs_150 (id) TABLESPACE index_logs;
CREATE INDEX date_150 ON logs_150 (date) TABLESPACE index_logs;
CREATE INDEX event_150 ON logs_150 (evenement) TABLESPACE index_logs;
.
logs=> explain select * from logs order by id desc limit 100;
                                   QUERY PLAN
 Limit  (cost=16524647.29..16524647.54 rows=100 width=295)
   ->  Sort  (cost=16524647.29..16568367.11 rows=17487927 width=295)
         Sort Key: public.logs.id
         ->  Result  (cost=0.00..827622.27 rows=17487927 width=295)
               ->  Append  (cost=0.00..827622.27 rows=17487927 width=295)
                     ->  Seq Scan on logs  (cost=0.00..826232.78 rows=17451978 width=165)
                     ->  Seq Scan on logs_150 logs  (cost=0.00..199.04 rows=6104 width=144)
                     ->  Seq Scan on logs_163 logs  (cost=0.00..261.79 rows=7079 width=169)
                     ->  Seq Scan on logs_289 logs  (cost=0.00..428.93 rows=10693 width=200)
                     ->  Seq Scan on logs_319 logs  (cost=0.00..31.92 rows=992 width=129)
                     ->  Seq Scan on logs_238 logs  (cost=0.00..28.01 rows=701 width=199)
                     ->  Seq Scan on logs_148 logs  (cost=0.00..80.15 rows=2015 width=195)
                     ->  Seq Scan on logs_176 logs  (cost=0.00..12.40 rows=240 width=295)
                     ->  Seq Scan on logs_164 logs  (cost=0.00..12.40 rows=240 width=295)
                     ->  Seq Scan on logs_316 logs  (cost=0.00..12.40 rows=240 width=295)
                     ->  Seq Scan on logs_313 logs  (cost=0.00..12.40 rows=240 width=295)
                     ->  Seq Scan on logs_217 logs  (cost=0.00..12.40 rows=240 width=295)
                     ->  Seq Scan on logs_167 logs  (cost=0.00..57.36 rows=1536 width=170)
                     ->  Seq Scan on logs_287 logs  (cost=0.00..12.40
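For what it's worth, on PostgreSQL versions of this era (before 9.1 introduced Merge Append) the usual workaround is to push the ORDER BY ... LIMIT down into each child by hand, so every partition can use its own index; a sketch against the schema above, showing only two of the partitions:

```sql
-- Take the top 100 from each partition (index scan backwards on each
-- child's pkey index), then re-sort the small candidate set.
SELECT * FROM (
    (SELECT * FROM logs_150 ORDER BY id DESC LIMIT 100)
    UNION ALL
    (SELECT * FROM logs_163 ORDER BY id DESC LIMIT 100)
    -- ... one branch per partition, plus one for the parent "logs"
    --     table if it holds rows of its own, as it does here ...
) AS candidates
ORDER BY id DESC
LIMIT 100;
```

This trades a longer query text for turning 17M-row seq scans into a handful of 100-row index scans.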
Re: [PERFORM] partitioning
Did you set constraint_exclusion = true in the postgresql.conf file?
--
Regards Pandu
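For reference, constraint exclusion is off by default and only kicks in when the query's WHERE clause can be compared against the children's CHECK constraints; a minimal sketch against the schema in this thread:

```sql
-- In postgresql.conf: constraint_exclusion = on
-- or, per session:
SET constraint_exclusion = on;

-- With a predicate on the CHECK column, the planner can prove most
-- partitions irrelevant and skip them entirely:
EXPLAIN SELECT * FROM logs
WHERE id_machine = 150
ORDER BY id DESC LIMIT 100;
```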
Re: [PERFORM] partitioning
I just saw that there is no where clause in the query that you fed to explain. You need to include a where clause based on the id_machine column to see the effect.
--
Regards Pandu
Re: [PERFORM] partitioning
Yes, that's how I solved it... and I totally agree that it's hard for the planner to guess what to do on the partitions. But maybe there should be something in the docs explaining the limitations... I'm only asking for the biggest 100 ids from the table, so I thought maybe the planner would take the 100 biggest from all partitions or something like that and return me the 100 biggest from those results. It didn't, and that's quite logical. What I meant is that I understand why the planner chooses this plan, but maybe it should be written somewhere in the docs that some plans will be worse after partitioning.

Le Mardi 13 Décembre 2005 12:50, vous avez écrit :
> I just saw that there is no where clause in the query, that you had fed to explain plan. You need to include a where clause based on the id_machine column to see the effect.
Re: [PERFORM] Partitioning / Clustering
> If you make the assertion that you are transferring equal or less session data between your session server (lets say an RDBMS) and the app server than you are between the app server and the client, an out of band 100Mb network for session information is plenty of bandwidth.

So if you count on a mean page size of 6-8 kbytes gzipped, that will prevent you from caching the N first results of the Big Slow Search Query in a native object in the user session state (say, a list of integers indicating which rows match), so you will have to redo the Big Slow Search Query every time the user clicks on Next Page instead of grabbing a set of cached row ids and doing a fast SELECT WHERE id IN ... This is the worst case... I'd gzip() the row ids and stuff them in the session; that's always better than blowing up the database with the Big Slow Search Query every time someone does Next Page...

> This also represents OLTP style traffic, which postgresql is pretty good at. You should easily be able to get over 100Tps. 100 hits per second is an awful lot of traffic, more than any website I've managed will ever see.

On the latest anandtech benchmarks, 100 hits per second on blog/forum software is a big bi-opteron server running dotNET at 99% load... it's a lot if you count only dynamic page hits.
Re: [PERFORM] Partitioning / Clustering
On 11 May 2005, at 23:35, PFC wrote:
>> However, memcached (and for us, pg_memcached) is an excellent way to improve horizontal scalability by taking disposable data (like session information) out of the database and putting it in protected RAM.
> So, what is the advantage of such a system versus, say, a sticky sessions system where each session is assigned to ONE application server (not PHP then) which keeps it in RAM as native objects instead of serializing and deserializing it on each request? I'd say the sticky sessions should perform a lot better, and if one machine dies, only the sessions on this one are lost. But of course you can't do it with PHP as you need an app server which can manage sessions. Potentially the savings are huge, though.

There's no reason it couldn't be done with PHP, to be fair, as long as you could ensure that the client was always routed back to the same machines. Which has its own set of issues entirely. I am not entirely sure that memcached actually does serialize data when it's committed into memcached either, although I could be wrong; I have not looked at the source.

Certainly if you can ensure that a client always goes back to the same machine you can simplify the whole thing hugely. It's generally not that easy though; you need a proxy server of some description capable of understanding the HTTP traffic and maintaining a central session lookup table to redirect with. Which isn't really solving the problem so much as moving it somewhere else. Instead of needing huge memcached pools, you need hardcore load balancers. Load balancers tend to cost $ in comparison. Distributed sticky sessions are a rather nice idea; I would like to hear a way of implementing them cheaply (and on PHP) as well. I may have to give that some thought in fact. Oh yeah, and load balancer software often sucks in annoying (if not always important) ways.

> On Google, their distributed system spans a huge number of PCs and it has redundancy, ie. individual PC failure is a normal thing and is a part of the system, it is handled gracefully. I read a paper on this matter, it's pretty impressive. The google filesystem has nothing to do with databases though, it's more a massive data store / streaming storage.

Since when did massive data stores have nothing to do with DBs? Isn't Oracle Cluster entirely based on forming an enormous scalable disk array to store your DB on?
Re: [PERFORM] Partitioning / Clustering
> I am not entirely sure that memcached actually does serialize data when it's committed into memcached either, although I could be wrong, I have not looked at the source.

I think it does, ie. it's a simple mapping of [string key] => [string value].

> Certainly if you can ensure that a client always goes back to the same machine you can simplify the whole thing hugely. It's generally not that easy though, you need a proxy server of some description capable of understanding the HTTP traffic and maintaining a central session lookup table.

Yes... You could implement it by mapping servers to the hash of the user session id. Statistically, the servers would get the same number of sessions on each of them, but you have to trust statistics... It does eliminate the lookup table though.

> Distributed sticky sessions are a rather nice idea, I would like to hear a way of implementing them cheaply (and on PHP) as well. Oh yeah, and load balancer software often sucks in annoying (if not always important) ways.

You can use lighttpd as a load balancer; I believe it has a sticky sessions plugin (or you could code one in, it's open source after all). It definitely supports simple round-robin load balancing, acting as a proxy to any number of independent servers.

> Since when did Massive Data stores have nothing to do with DBs? Isn't Oracle Cluster entirely based on forming an enormous scalable disk array to store your DB on?

Um, well, the Google Filesystem is (like its name implies) a filesystem designed to store huge files in a distributed and redundant manner. Files are structured as a stream of records (which are themselves big in size) and it's designed to support appending records to these stream files efficiently and without worrying about locking. It has no querying features, however, which is why I said it was not a database. I wish I could find the whitepaper; I think the URL was on this list some day, maybe it's on Google's site?
Re: [PERFORM] Partitioning / Clustering
Having local sessions is unnecessary, and here is my logic:

Generally most people have less than 100Mb of bandwidth to the internet. If you make the assertion that you are transferring equal or less session data between your session server (let's say an RDBMS) and the app server than you are between the app server and the client, an out of band 100Mb network for session information is plenty of bandwidth. This also represents OLTP style traffic, which postgresql is pretty good at. You should easily be able to get over 100Tps. 100 hits per second is an awful lot of traffic, more than any website I've managed will ever see.

Why solve the complicated clustered sessions problem when you don't really need to?

Alex Turner
netEconomist

On 5/11/05, PFC [EMAIL PROTECTED] wrote:
> However, memcached (and for us, pg_memcached) is an excellent way to improve horizontal scalability by taking disposable data (like session information) out of the database and putting it in protected RAM. So, what is the advantage of such a system versus, say, a sticky sessions system where each session is assigned to ONE application server (not PHP then) which keeps it in RAM as native objects instead of serializing and deserializing it on each request?
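As a sketch of the out-of-band session store being proposed here (the table and column names are hypothetical; the point is that each page hit becomes one tiny OLTP transaction, well within the quoted 100Tps):

```sql
-- Minimal session store on the dedicated session server.
CREATE TABLE sessions (
    sid     text PRIMARY KEY,             -- session cookie value
    data    bytea,                        -- e.g. gzip'd list of matching row ids
    updated timestamp NOT NULL DEFAULT now()
);

-- One small round trip per page hit:
--   read:  SELECT data FROM sessions WHERE sid = '...';
--   write: UPDATE sessions SET data = '...', updated = now() WHERE sid = '...';
```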
Re: [PERFORM] Partitioning / Clustering
On 12 May 2005, at 15:08, Alex Turner wrote:
> Having local sessions is unnecessary, and here is my logic: Generally most people have less than 100Mb of bandwidth to the internet. ... Why solve the complicated clustered sessions problem, when you don't really need to?

100 hits a second = 8,640,000 hits a day. I work on a site which does 100 million dynamic pages a day. In comparison Yahoo probably does 100,000,000,000 (100 billion) views a day if I am interpreting Alexa's charts correctly. Which is about 1,150,000 a second. Now considering the site I work on is not even in the top 1000 on Alexa, there are a lot of sites out there which need to solve this problem, I would assume. There are also only so many hash table lookups a single machine can do, even if it's a Quad Opteron behemoth.

Alex Turner
netEconomist
Re: [PERFORM] Partitioning / Clustering
Ok - my common sense alarm is going off here... There are only 6.446 billion people worldwide. 100 billion page views would require every person in the world to view 18 pages of Yahoo every day. Not very likely. http://www.internetworldstats.com/stats.htm suggests that there are around 1 billion people actually on the internet. That means each and every person on the internet has to view 100 pages per day of Yahoo. Pretty unlikely IMHO. I for one don't even use Yahoo ;)

100 million page views per day suggests that 1 in 100 people on the internet each viewed 10 pages of a site. That's a pretty high percentage if you ask me. If I visit 20 web sites in a day, and see an average of 10 pages per site, that means only about 2000 or so sites generate 100 million page views in a day or better.

100 million pageviews averages to 1157/sec, which we'll double for peak load to 2314. I can easily see a system doing 2314 hash lookups per second. Hell, I wrote a system that could do a thousand times that four years ago on a single 1GHz Athlon. Heck - you can get 2314 lookups/sec on a 486 ;)

Given that session information doesn't _have_ to persist to storage and can be kept in RAM, a single server could readily manage session information for even very large sites (of course, over a million concurrent users could really start chewing into RAM, but if you are Yahoo, you can probably afford a box with 100GB of RAM ;). We get over 1000 tps on a dual opteron with a couple of mid size RAID arrays on 10k discs with fsync on for small transactions. I'm sure that could easily be bettered with a few more dollars.

Maybe my numbers are off, but somehow it doesn't seem like that many people need a highly complex session solution to me.

Alex Turner
netEconomist

On 5/12/05, Alex Stapleton [EMAIL PROTECTED] wrote:
> 100 hits a second = 8,640,000 hits a day. I work on a site which does 100 million dynamic pages a day. In comparison Yahoo probably does 100,000,000,000 (100 billion) views a day if I am interpreting Alexa's charts correctly. Which is about 1,150,000 a second. There are also only so many hash table lookups a single machine can do, even if its a Quad Opteron behemoth.
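The arithmetic in the exchange above checks out against 86,400 seconds per day (the 2x peak factor is the poster's own assumption):

```latex
\frac{100{,}000{,}000\ \text{views/day}}{86{,}400\ \text{s/day}} \approx 1157\ \text{views/s},
\qquad
\frac{100 \times 10^{9}\ \text{views/day}}{86{,}400\ \text{s/day}} \approx 1{,}157{,}000\ \text{views/s}.
```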
Re: [PERFORM] Partitioning / Clustering
Alex Turner wrote:
> Ok - my common sense alarm is going off here... There are only 6.446 billion people worldwide. 100 Billion page views would require every person in the world to view 18 pages of yahoo every day. Not very likely. ... 100 million page views per day suggests that 1 in 100 people on the internet each viewed 10 pages of a site. Thats a pretty high percentage if you ask me.

In general I think your point is valid. Just remember that it probably also matters how you count page views. Because technically images are a separate page (and this thread did discuss serving up images). So if there are 20 graphics on a specific page, that is 20 server hits just for that one page. I could easily see an image-heavy site getting 100 hits / page. Which starts meaning that if 1M users hit 10 pages, then you get 1M*10*100 = 1G. I still think 100G views on a single website is a lot, but 100M is certainly possible.

John =:->
Re: [PERFORM] Partitioning / Clustering
100 hits a second = 8,640,000 hits a day. I work on a site which does 100 million dynamic pages a day. In comparison Yahoo probably does 100,000,000,000 (100 billion) views a day if I am interpreting Alexa's charts correctly. Which is about 1,150,000 a second. Read the help on Alexa's site... ;)
Re: [PERFORM] Partitioning / Clustering
People, In general I think your point is valid. Just remember that it probably also matters how you count page views. Because technically images are a separate page (and this thread did discuss serving up images). So if there are 20 graphics on a specific page, that is 20 server hits just for that one page. Also, there's bots and screen-scrapers and RSS, web e-mails, and web services and many other things which create hits but are not people. I'm currently working on clickstream for a site which is nowhere in the top 100, and is getting 3 million real hits a day ... and we know for a fact that at least 1/4 of that is bots. Regardless, the strategy you should be employing for a high-traffic site is that if your users hit the database for anything other than direct interaction (like filling out a webform) then you're lost. Use memcached, squid, lighttpd caching, ASP.NET caching, pools, etc. Keep the load off the database except for the stuff that only the database can do. -- Josh Berkus Aglio Database Solutions San Francisco
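The "keep the load off the database" strategy above is the classic cache-aside pattern. A minimal sketch, with a dict standing in for memcached and a stub standing in for a real query:

```python
# Cache-aside sketch: hit the cache first, fall back to the database
# only on a miss.  The dict and the fetch function are stand-ins for
# memcached and an actual expensive query.
cache = {}
db_hits = 0

def query_db(key):            # stand-in for the expensive DB round trip
    global db_hits
    db_hits += 1
    return f"row-for-{key}"

def get(key):
    if key not in cache:      # miss: do the expensive work once
        cache[key] = query_db(key)
    return cache[key]         # every later request is served from RAM

get("user:42"); get("user:42"); get("user:42")
print(db_hits)   # -> 1: the database saw one query for three requests
```

Real memcached additionally needs expiry and explicit invalidation on writes, which is exactly why the thread restricts this to disposable data.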
Re: [PERFORM] Partitioning / Clustering
On 12 May 2005, at 18:33, Josh Berkus wrote: People, In general I think your point is valid. Just remember that it probably also matters how you count page views. Because technically images are a separate page (and this thread did discuss serving up images). So if there are 20 graphics on a specific page, that is 20 server hits just for that one page. Also, there's bots and screen-scrapers and RSS, web e-mails, and web services and many other things which create hits but are not people. I'm currently working on clickstream for a site which is nowhere in the top 100, and is getting 3 million real hits a day ... and we know for a fact that at least 1/4 of that is bots. I doubt bots are generally Alexa-toolbar enabled. Regardless, the strategy you should be employing for a high-traffic site is that if your users hit the database for anything other than direct interaction (like filling out a webform) then you're lost. Use memcached, squid, lighttpd caching, ASP.NET caching, pools, etc. Keep the load off the database except for the stuff that only the database can do. This is the approach I would take as well. There is no point storing stuff in a DB if you're only doing direct lookups on it and it isn't the sort of data that you care so much about the integrity of. -- Josh Berkus Aglio Database Solutions San Francisco
Re: [PERFORM] Partitioning / Clustering
Ross, Memcached is a PG memory store, I gather, Nope. It's a hyperfast resident-in-memory hash that allows you to stash stuff like user session information and even materialized query set results. Thanks to SeanC, we even have a plugin, pgmemcached. but...what is squid, lighttpd? anything directly PG-related? No. These are all related to making the web server do more. The idea is NOT to hit the database every time you have to serve up a web page, and possibly not to hit the web server either. For example, you can use squid 3 for reverse caching in front of your web server, and serve far more page views than you could with Apache alone. -- --Josh Josh Berkus Aglio Database Solutions San Francisco
Re: [PERFORM] Partitioning / Clustering
On Tue, 2005-05-10 at 11:03 +0100, Alex Stapleton wrote: So, when/is PG meant to be getting a decent partitioning system? ISTM that your question seems to confuse where code comes from. Without meaning to pick on you, or reply rudely, I'd like to explore that question. Perhaps it should be a FAQ entry. All code is written by someone, and those people need to eat. Some people are fully or partly funded to perform their tasks on this project (coding, patching, etc). Others contribute their time for a variety of reasons where involvement has a positive benefit. You should ask these questions: - Is anyone currently working on (Feature X)? - If not, can I do it myself? - If not, and I still want it, can I fund someone else to build it for me? Asking "when is Feature X going to happen?" is almost certainly going to get the answer "never" otherwise, if the initial development is large and complex. There are many TODO items that have lain untouched for years, even though adding the feature has been discussed and agreed. Best Regards, Simon Riggs
Re: [PERFORM] Partitioning / Clustering
For an interesting look at scalability, clustering, caching, etc for a large site, have a look at how LiveJournal did it. http://www.danga.com/words/2004_lisa/lisa04.pdf They have 2.6 million active users, posting 200 new blog entries per minute, plus many comments and countless page views. Although this system is of a different sort to the type I work on, it's interesting to see how they've made it scale. They use MySQL on Dell hardware! And found single-master replication did not scale. There's a section on multimaster replication; not sure if they use it. The main approach they use is to partition users into specific database clusters. Caching is done using memcached at the application level to avoid hitting the db for rendered pageviews. It's interesting that the solution LiveJournal have arrived at is quite similar in ways to the way Google is set up. David
Re: [PERFORM] Partitioning / Clustering
On 11 May 2005, at 08:57, David Roussel wrote: For an interesting look at scalability, clustering, caching, etc for a large site have a look at how LiveJournal did it. http://www.danga.com/words/2004_lisa/lisa04.pdf I have implemented similar systems in the past; it's a pretty good technique. Unfortunately it's not very plug-and-play, as you have to base most of your API on memcached (I imagine MySQL's NDB tables might work as well, actually) for it to work well. They have 2.6 million active users, posting 200 new blog entries per minute, plus many comments and countless page views. Although this system is of a different sort to the type I work on it's interesting to see how they've made it scale. They use MySQL on Dell hardware! And found single-master replication did not scale. There's a section on multimaster replication, not sure if they use it. The main approach they use is to partition users into specific database clusters. Caching is done using memcached at the application level to avoid hitting the db for rendered pageviews I don't think they are storing pre-rendered pages (or bits of) in memcached, but are principally storing the data for the pages in it. Gluing pages together is not a hugely intensive process usually :) The only problem with memcached is that the client's clustering/partitioning system will probably break if a node dies, and probably get confused if you add new nodes onto it as well. Easily extensible clustering (no complete redistribution of data required when you add/remove nodes) with the data distributed across nodes seems to be nothing but a pipe dream right now. It's interesting that the solution LiveJournal have arrived at is quite similar in ways to the way Google is set up. Don't Google use indexing servers which keep track of where data is?
So that you only need to update them when you add or move data; deletes don't even have to be propagated among indexes immediately, really, because you'll find out if data isn't there when you visit where it should be. Or am I talking crap? David
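The "easily extensible clustering" wished for above (adding or removing nodes without redistributing everything) is essentially consistent hashing, which was already known at the time even if the early memcached clients didn't use it. A minimal sketch, assuming a hash ring with virtual nodes; everything here is illustrative, not a real client:

```python
import hashlib
from bisect import bisect

def h(s):
    """Map a string to a point on the hash ring."""
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    """Consistent-hash ring: each node owns many virtual points, and a
    key belongs to the first node point clockwise from its hash."""
    def __init__(self, nodes, vnodes=64):
        self._ring = sorted(
            (h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
    def node_for(self, key):
        points = [p for p, _ in self._ring]
        i = bisect(points, h(key)) % len(self._ring)
        return self._ring[i][1]

before = Ring(["a", "b", "c"])
after = Ring(["a", "b", "c", "d"])       # one node added
keys = [f"key{i}" for i in range(1000)]
moved = sum(before.node_for(k) != after.node_for(k) for k in keys)
print(moved)   # only the keys the new node takes over move (~1/4 of them);
               # a naive hash(key) % N rehash would move ~3/4 of all keys
```

So the redistribution problem is not quite a pipe dream; it just wasn't solved in the clients being discussed.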
Re: [PERFORM] Partitioning / Clustering
On 11 May 2005, at 09:50, Alex Stapleton wrote: On 11 May 2005, at 08:57, David Roussel wrote: For an interesting look at scalability, clustering, caching, etc for a large site have a look at how LiveJournal did it. http://www.danga.com/words/2004_lisa/lisa04.pdf I have implemented similar systems in the past; it's a pretty good technique. Unfortunately it's not very plug-and-play, as you have to base most of your API on memcached (I imagine MySQL's NDB tables might work as well, actually) for it to work well. They have 2.6 million active users, posting 200 new blog entries per minute, plus many comments and countless page views. Although this system is of a different sort to the type I work on it's interesting to see how they've made it scale. They use MySQL on Dell hardware! And found single-master replication did not scale. There's a section on multimaster replication, not sure if they use it. The main approach they use is to partition users into specific database clusters. Caching is done using memcached at the application level to avoid hitting the db for rendered pageviews I don't think they are storing pre-rendered pages (or bits of) in memcached, but are principally storing the data for the pages in it. Gluing pages together is not a hugely intensive process usually :) The only problem with memcached is that the client's clustering/partitioning system will probably break if a node dies, and probably get confused if you add new nodes onto it as well. Easily extensible clustering (no complete redistribution of data required when you add/remove nodes) with the data distributed across nodes seems to be nothing but a pipe dream right now. It's interesting that the solution LiveJournal have arrived at is quite similar in ways to the way Google is set up. Don't Google use indexing servers which keep track of where data is?
So that you only need to update them when you add or move data; deletes don't even have to be propagated among indexes immediately, really, because you'll find out if data isn't there when you visit where it should be. Or am I talking crap? That will teach me to RTFA first ;) OK, so LJ maintain an index of which cluster each user is on, kind of like Google do :) David
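The "index of which cluster each user is on" that LJ maintain is directory-based partitioning: a small global map from user to cluster, so moving a user means copying their rows and flipping one directory entry. A toy sketch (dicts stand in for the global directory DB and the per-cluster databases; all names are made up):

```python
# Directory-based partitioning, LiveJournal style.
directory = {}              # user -> cluster id (a small global lookup table)
clusters = {0: {}, 1: {}}   # stand-ins for independent database clusters

def assign(user, cluster_id):
    directory[user] = cluster_id

def write(user, key, value):
    clusters[directory[user]][(user, key)] = value   # route by directory

def read(user, key):
    return clusters[directory[user]].get((user, key))

assign("frank", 0)
write("frank", "entry:1", "first post")
# Rebalancing: copy the user's rows to the new cluster, then flip the
# directory entry -- no other user's data moves at all.
clusters[1][("frank", "entry:1")] = clusters[0].pop(("frank", "entry:1"))
assign("frank", 1)
print(read("frank", "entry:1"))   # -> first post
```

The directory adds one extra lookup per request, which is why it is kept tiny and heavily cached.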
Re: [PERFORM] Partitioning / Clustering
Acceptable answers to 'So, when/is PG meant to be getting a decent partitioning system?': 1. Person X is working on it, I believe. 2. It's on the list, but nobody has done anything about it yet. 3. You're welcome to take a stab at it; I expect the community would support your efforts as well. 4. If you have a huge pile of money you could probably buy the Moon. Thinking along those lines, you can probably pay someone to write it for you. 5. It's a stupid idea, it's never going to work, and here's why... Unacceptable answers to the same question: 1. Yours. Be more helpful, and less arrogant please. Everyone else who has contributed to this thread has been very helpful in clarifying the state of affairs and pointing out what work is and isn't being done, and alternatives to just waiting for PG to do it for you. Please YOU be more helpful and less arrogant. I thought your initial email was arrogant, demanding and insulting. Your followup email has done nothing to dispel my impression. Simon (one of PostgreSQL's major contributors AND one of the very few people working on partitioning in PostgreSQL, as you requested) told you all the reasons clearly and politely. Chris
Re: [PERFORM] Partitioning / Clustering
Mischa Sandberg [EMAIL PROTECTED] writes: So, simplicity dictates something like: table pg_remote(schemaname text, connectby text, remoteschema text) Previous discussion of this sort of thing concluded that we wanted to follow the SQL-MED standard. regards, tom lane
Re: [PERFORM] Partitioning / Clustering
David, It's interesting that the solution LiveJournal have arrived at is quite similar in ways to the way Google is set up. Yes, although again, they're using memcached as pseudo-clustering software, and as a result are limited to what fits in RAM (RAM on 27 machines, but it's still RAM). And due to limitations on memcached, the whole thing blows whenever a server goes out (the memcached project is working on this). But any LJ user could tell you that it's a low-availability system. However, memcached (and for us, pg_memcached) is an excellent way to improve horizontal scalability by taking disposable data (like session information) out of the database and putting it in protected RAM. On some websites, adding memcached can result in as much as a 60% decrease in database traffic. -- --Josh Josh Berkus Aglio Database Solutions San Francisco
Re: [PERFORM] Partitioning / Clustering
On Wed, 2005-05-11 at 17:13 +0800, Christopher Kings-Lynne wrote: Alex Stapleton wrote Be more helpful, and less arrogant please. Simon told you all the reasons clearly and politely. Thanks Chris for your comments. PostgreSQL can always do with one more developer and my sole intent was to encourage Alex and other readers to act themselves. If my words seem arrogant, then I apologise to any and all that think so. Best Regards, Simon Riggs
Re: [PERFORM] Partitioning / Clustering
On Wed, May 11, 2005 at 08:57:57AM +0100, David Roussel wrote: For an interesting look at scalability, clustering, caching, etc for a large site have a look at how LiveJournal did it. http://www.danga.com/words/2004_lisa/lisa04.pdf They have 2.6 million active users, posting 200 new blog entries per minute, plus many comments and countless page views. Neither of which is that horribly impressive. 200 TPM is less than 4 TPS. While I haven't run high-transaction-rate databases under PostgreSQL, I suspect others who have will say that 4 TPS isn't that big of a deal. Although this system is of a different sort to the type I work on it's interesting to see how they've made it scale. They use MySQL on Dell hardware! And found single-master replication did not scale. There's a section on multimaster replication, not sure if Probably didn't scale because they used to use MyISAM. they use it. The main approach they use is to partition users into specific database clusters. Caching is done using memcached at the Which means they've got a huge amount of additional code complexity, not to mention how many times you can't post something because 'that cluster is down for maintenance'. application level to avoid hitting the db for rendered pageviews. Memcached is about the only good thing I've seen come out of LiveJournal. It's interesting that the solution LiveJournal have arrived at is quite similar in ways to the way Google is set up. Except that unlike LJ, Google stays up and it's fast. Though granted, LJ is quite a bit faster than it was 6 months ago. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what?
Re: [PERFORM] Partitioning / Clustering
However, memcached (and for us, pg_memcached) is an excellent way to improve horizontal scalability by taking disposable data (like session information) out of the database and putting it in protected RAM. So, what is the advantage of such a system versus, say, a sticky sessions system where each session is assigned to ONE application server (not PHP then) which keeps it in RAM as native objects instead of serializing and deserializing it on each request ? I'd say the sticky sessions should perform a lot better, and if one machine dies, only the sessions on this one are lost. But of course you can't do it with PHP as you need an app server which can manage sessions. Potentially the savings are huge, though. On Google, their distributed system spans a huge number of PCs and it has redundancy, ie. individual PC failure is a normal thing and is a part of the system, it is handled gracefully. I read a paper on this matter, it's pretty impressive. The google filesystem has nothing to do with databases though, it's more a massive data store / streaming storage.
Re: [PERFORM] Partitioning / Clustering
Alex Stapleton wrote: What is the status of Postgres support for any sort of multi-machine scaling support? What are you meant to do once you've upgraded your box and tuned the conf files as much as you can? But your query load is just too high for a single machine? Upgrading stock Dell boxes (I know we could be using better machines, but I am trying to tackle the real issue) is not a hugely price-efficient way of getting extra performance, nor particularly scalable in the long term. Switch from Dell Xeon boxes, and go to Opterons. :) Seriously, Dell is far away from Big Iron. I don't know what performance you are looking for, but you can easily get into inserting 10M rows/day with quality hardware. But actually, is it your SELECT load that is too high, or your INSERT load, or something in between? Because Slony is around if it is a SELECT problem. http://gborg.postgresql.org/project/slony1/projdisplay.php Basically, Slony is a Master/Slave replication system. So if you have INSERT going into the Master, you can have as many replicated slaves, which can handle your SELECT load. Slony is an asynchronous replicator, so there is a time delay from the INSERT until it will show up on a slave, but that time could be pretty small. This would require some application-level support, since an INSERT goes to a different place than a SELECT. But there has been some discussion about pg_pool being able to spread the query load, and having it be aware of the difference between a SELECT and an INSERT and have it route the query to the correct host. The biggest problem being that functions could cause a SELECT func() to actually insert a row, which pg_pool wouldn't know about. There are 2 possible solutions: a) don't do that when you are using this system, b) add some sort of comment hint so that pg_pool can understand that the select is actually an INSERT, and needs to be done on the master. So, when/is PG meant to be getting a decent partitioning system?
MySQL is getting one (eventually) which is apparently meant to be similar to Oracle's according to the docs. Clusgres does not appear to be widely/or at all used, and info on it seems pretty thin on the ground, so I am not too keen on going with that. Is the real solution to multi-machine partitioning (as in, not like MySQL's MERGE tables) on PostgreSQL actually doing it in our application API? This seems like a less than perfect solution once we want to add redundancy and things into the mix. There is also PGCluster http://pgfoundry.org/projects/pgcluster/ Which is trying to be more of a synchronous multi-master system. I haven't heard of Clusgres, so I'm guessing it is an older attempt, which has been overtaken by pgcluster. Just realize that clusters don't necessarily scale like you would want them to. Because at some point you have to insert into the same table, which means you need to hold a lock which prevents the other machine from doing anything. And with synchronous replication, you have to wait for all of the machines to get a copy of the data before you can say it has been committed, which does *not* scale well with the number of machines. If you can make it work, I think having a powerful master server, who can finish an INSERT quickly, and then having a bunch of Slony slaves with a middleman (like pg_pool) to do load balancing among them, is the best way to scale up. There are still some requirements, like not having to see the results of an INSERT instantly (though if you are using hinting to pg_pool, you could hint that this query must be done on the master, realizing that the more you do it, the more you slow everything down). John =:- PS I don't know what functionality has been actually implemented in pg_pool, just that it was discussed in the past. Slony-II is also in the works.
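The read/write splitting and comment-hint idea John describes can be sketched very simply. This is not how pg_pool itself works internally, just an illustration of the routing rule, with made-up host names and a made-up `/*master*/` hint syntax:

```python
# Sketch of SELECT/INSERT routing with a comment hint for the
# "SELECT func() that actually writes" problem discussed above.
import itertools

MASTER = "master"
SLAVES = itertools.cycle(["slave1", "slave2"])   # round-robin the read load

def route(sql):
    stmt = sql.lstrip()
    hinted = "/*master*/" in stmt     # hint: this SELECT has side effects
    if stmt.upper().startswith("SELECT") and not hinted:
        return next(SLAVES)           # plain reads go to a replicated slave
    return MASTER                     # writes and hinted selects go to master

print(route("SELECT * FROM users"))            # -> slave1
print(route("INSERT INTO users VALUES (1)"))   # -> master
print(route("/*master*/ SELECT log_hit(42)"))  # -> master
```

The hard part, as the thread notes, is not the routing itself but knowing which statements secretly write; hence the hint.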
Re: [PERFORM] Partitioning / Clustering
I think that perhaps he was trying to avoid having to buy Big Iron at all. With all the Opteron v. Xeon around here, and talk of $30,000 machines, perhaps it would be worth exploring the option of buying 10 cheapass machines for $300 each. At the moment, that $300 buys you, from Dell, a 2.5GHz Pentium 4 w/ 256MB of RAM and a 40GB hard drive and gigabit ethernet. The aggregate CPU and bandwidth is pretty stupendous, but not as easy to harness as a single machine. For those of us looking at batch and data warehousing applications, it would be really handy to be able to partition databases, tables, and processing load across banks of cheap hardware. Yes, clustering solutions can distribute the data, and can even do it on a per-table basis in some cases. This still leaves it up to the application's logic to handle reunification of the data. Ideas: 1. Create a table/storage type that consists of a select statement on another machine. While I don't think the current executor is capable of working on multiple nodes of an execution tree at the same time, it would be great if it could offload a select of tuples from a remote table to an entirely different server and merge the resulting data into the current execution. I believe MySQL has this, and Oracle may implement it in another way. 2. There is no #2 at this time, but I'm sure one can be hypothesized. ...Google and other companies have definitely proved that one can harness huge clusters of cheap hardware. It can't be _that_ hard, can it? :) -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of John A Meinel Sent: Tuesday, May 10, 2005 7:41 AM To: Alex Stapleton Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Partitioning / Clustering Alex Stapleton wrote: What is the status of Postgres support for any sort of multi-machine scaling support? What are you meant to do once you've upgraded your box and tuned the conf files as much as you can?
But your query load is just too high for a single machine? Upgrading stock Dell boxes (I know we could be using better machines, but I am trying to tackle the real issue) is not a hugely price-efficient way of getting extra performance, nor particularly scalable in the long term. Switch from Dell Xeon boxes, and go to Opterons. :) Seriously, Dell is far away from Big Iron. I don't know what performance you are looking for, but you can easily get into inserting 10M rows/day with quality hardware. But actually, is it your SELECT load that is too high, or your INSERT load, or something in between? Because Slony is around if it is a SELECT problem. http://gborg.postgresql.org/project/slony1/projdisplay.php Basically, Slony is a Master/Slave replication system. So if you have INSERT going into the Master, you can have as many replicated slaves, which can handle your SELECT load. Slony is an asynchronous replicator, so there is a time delay from the INSERT until it will show up on a slave, but that time could be pretty small. This would require some application-level support, since an INSERT goes to a different place than a SELECT. But there has been some discussion about pg_pool being able to spread the query load, and having it be aware of the difference between a SELECT and an INSERT and have it route the query to the correct host. The biggest problem being that functions could cause a SELECT func() to actually insert a row, which pg_pool wouldn't know about. There are 2 possible solutions: a) don't do that when you are using this system, b) add some sort of comment hint so that pg_pool can understand that the select is actually an INSERT, and needs to be done on the master. So, when/is PG meant to be getting a decent partitioning system? MySQL is getting one (eventually) which is apparently meant to be similar to Oracle's according to the docs.
Clusgres does not appear to be widely/or at all used, and info on it seems pretty thin on the ground, so I am not too keen on going with that. Is the real solution to multi-machine partitioning (as in, not like MySQL's MERGE tables) on PostgreSQL actually doing it in our application API? This seems like a less than perfect solution once we want to add redundancy and things into the mix. There is also PGCluster http://pgfoundry.org/projects/pgcluster/ Which is trying to be more of a synchronous multi-master system. I haven't heard of Clusgres, so I'm guessing it is an older attempt, which has been overtaken by pgcluster. Just realize that clusters don't necessarily scale like you would want them to. Because at some point you have to insert into the same table, which means you need to hold a lock which prevents the other machine from doing anything. And with synchronous replication, you have to wait for all of the machines to get a copy of the data before you can say it has been committed, which does *not* scale well with the number of machines. If you can
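Idea #1 earlier in this message (a table that is really a select on another machine, merged into the local execution) is at heart a scatter/gather step. A toy sketch, with plain lists standing in for connections to remote servers, assuming each remote scan returns rows already sorted on the key:

```python
# Scatter/gather sketch: run the same select on each remote node and
# merge the streams locally, as a federated executor node could.
import heapq

node_a = [(1, "alice"), (4, "dave")]    # pretend remote partitions,
node_b = [(2, "bob"), (3, "carol")]     # each sorted on the key

def remote_select(node):                # stand-in for a remote scan
    yield from node

merged = list(heapq.merge(remote_select(node_a), remote_select(node_b)))
print(merged)   # -> [(1, 'alice'), (2, 'bob'), (3, 'carol'), (4, 'dave')]
```

PostgreSQL eventually grew exactly this shape of machinery (foreign tables plus Append/Merge Append over them), but at the time of this thread it was, as the message says, hypothetical.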
Re: [PERFORM] Partitioning / Clustering
On 10 May 2005, at 15:41, John A Meinel wrote: Alex Stapleton wrote: What is the status of Postgres support for any sort of multi-machine scaling support? What are you meant to do once you've upgraded your box and tuned the conf files as much as you can? But your query load is just too high for a single machine? Upgrading stock Dell boxes (I know we could be using better machines, but I am trying to tackle the real issue) is not a hugely price-efficient way of getting extra performance, nor particularly scalable in the long term. Switch from Dell Xeon boxes, and go to Opterons. :) Seriously, Dell is far away from Big Iron. I don't know what performance you are looking for, but you can easily get into inserting 10M rows/day with quality hardware. Better hardware = More Efficient != More Scalable But actually, is it your SELECT load that is too high, or your INSERT load, or something in between? Because Slony is around if it is a SELECT problem. http://gborg.postgresql.org/project/slony1/projdisplay.php Basically, Slony is a Master/Slave replication system. So if you have INSERT going into the Master, you can have as many replicated slaves, which can handle your SELECT load. Slony is an asynchronous replicator, so there is a time delay from the INSERT until it will show up on a slave, but that time could be pretty small. snip So, when/is PG meant to be getting a decent partitioning system? MySQL is getting one (eventually) which is apparently meant to be similar to Oracle's according to the docs. Clusgres does not appear to be widely/or at all used, and info on it seems pretty thin on the ground, so I am not too keen on going with that. Is the real solution to multi-machine partitioning (as in, not like MySQL's MERGE tables) on PostgreSQL actually doing it in our application API? This seems like a less than perfect solution once we want to add redundancy and things into the mix.
There is also PGCluster http://pgfoundry.org/projects/pgcluster/ Which is trying to be more of a synchronous multi-master system. I haven't heard of Clusgres, so I'm guessing it is an older attempt, which has been overtaken by pgcluster. Just realize that clusters don't necessarily scale like you would want them to. Because at some point you have to insert into the same table, which means you need to hold a lock which prevents the other machine from doing anything. And with synchronous replication, you have to wait for all of the machines to get a copy of the data before you can say it has been committed, which does *not* scale well with the number of machines. This is why I mention partitioning. It solves this issue by storing different data sets on different machines under the same schema. These separate chunks of the table can then be replicated as well for data redundancy and so on. MySQL are working on these things, but PG just has a bunch of third-party extensions; I wonder why these are not being integrated into the main trunk :/ Thanks for pointing me to PGCluster though. It looks like it should be better than Slony at least.
Re: [PERFORM] Partitioning / Clustering
On 10 May 2005, at 16:02, Adam Haberlach wrote: I think that perhaps he was trying to avoid having to buy Big Iron at all. You would be right. Although we are not against paying a bit more than $300 for a server ;) With all the Opteron v. Xeon around here, and talk of $30,000 machines, perhaps it would be worth exploring the option of buying 10 cheapass machines for $300 each. At the moment, that $300 buys you, from Dell, a 2.5GHz Pentium 4 w/ 256MB of RAM and a 40GB hard drive and gigabit ethernet. The aggregate CPU and bandwidth is pretty stupendous, but not as easy to harness as a single machine. snip Yes, clustering solutions can distribute the data, and can even do it on a per-table basis in some cases. This still leaves it up to the application's logic to handle reunification of the data. If you're going to be programming that sort of logic into your API in the beginning, it's not too much more work to add basic replication, load balancing and partitioning into it either. But the DB should be able to do it for you; adding that stuff in later is often more difficult and less likely to get done. Ideas: 1. Create a table/storage type that consists of a select statement on another machine. While I don't think the current executor is capable of working on multiple nodes of an execution tree at the same time, it would be great if it could offload a select of tuples from a remote table to an entirely different server and merge the resulting data into the current execution. I believe MySQL has this, and Oracle may implement it in another way. MySQL sort of has this; it's not as good as Oracle's though. Apparently there is a much better version of it in 5.1, though, that should make it to stable sometime next year I imagine. 2. There is no #2 at this time, but I'm sure one can be hypothesized. I would have thought a particularly smart version of pg_pool could do it. It could partition data to different servers if it knew which columns to key by on each table.
...Google and other companies have definitely proved that one can harness huge clusters of cheap hardware. It can't be _that_ hard, can it? :) I shudder to think how much the Big Iron equivalent of a Google data-center would cost. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of John A Meinel Sent: Tuesday, May 10, 2005 7:41 AM To: Alex Stapleton Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Partitioning / Clustering Alex Stapleton wrote: What is the status of Postgres support for any sort of multi-machine scaling support? What are you meant to do once you've upgraded your box and tuned the conf files as much as you can? But your query load is just too high for a single machine? Upgrading stock Dell boxes (I know we could be using better machines, but I am trying to tackle the real issue) is not a hugely price-efficient way of getting extra performance, nor particularly scalable in the long term. Switch from Dell Xeon boxes, and go to Opterons. :) Seriously, Dell is far away from Big Iron. I don't know what performance you are looking for, but you can easily get into inserting 10M rows/day with quality hardware. But actually, is it your SELECT load that is too high, or your INSERT load, or something in between? Because Slony is around if it is a SELECT problem. http://gborg.postgresql.org/project/slony1/projdisplay.php Basically, Slony is a Master/Slave replication system. So if you have INSERT going into the Master, you can have as many replicated slaves, which can handle your SELECT load. Slony is an asynchronous replicator, so there is a time delay from the INSERT until it will show up on a slave, but that time could be pretty small. This would require some application-level support, since an INSERT goes to a different place than a SELECT.
But there has been some discussion about pg_pool being able to spread the query load, and having it be aware of the difference between a SELECT and an INSERT and have it route the query to the correct host. The biggest problem being that functions could cause a SELECT func() to actually insert a row, which pg_pool wouldn't know about. There are 2 possible solutions: a) don't do that when you are using this system, b) add some sort of comment hint so that pg_pool can understand that the select is actually an INSERT, and needs to be done on the master. So, when/is PG meant to be getting a decent partitioning system? MySQL is getting one (eventually) which is apparently meant to be similar to Oracle's according to the docs. Clusgres does not appear to be widely/or at all used, and info on it seems pretty thin on the ground, so I am not too keen on going with that. Is the real solution to multi-machine partitioning (as in, not like MySQL's MERGE tables) on PostgreSQL actually doing it in our application API? This seems like a less than perfect solution once we want to add