[PERFORM] Partitioning performance issue
I tried to use partitioning and ran into a problem: I get very bad performance, and I cannot understand what I am doing wrong. I set up master and child tables via inheritance, with a range CHECK constraint on the date and a trigger on INSERT, as described in the documentation. I was happy with the insertion speed: about 30 megabytes per second, more than I expected, with server idle time near 95%. I used 100 parallel clients.

However, when it came to updates, things turned very bad. I set up a test with 30 running clients making 1 update each in a random fashion. Updates via the master table took 6 times longer, server idle time dropped to 15%, user CPU rose to 75%, and the load average hit 15. Test details:

30 updates (30 processes, 1 update each):
  via master table: 134 seconds
  via child table: 20 seconds
  via master table without the "date1 >= '2017-09-06' and date1 < '2017-09-07'" clause: 180 seconds

That means constraint_exclusion works; however, the process of exclusion takes A LOT OF time. I repeated the test with selects:

30 selects (30 processes, 1 select each):
  via master table: 50 seconds
  via child table: 8 seconds

This is very bad too. The documentation says that it is not good to have 1000 partitions and that 100 is probably OK, but I have only 40 partitions and see noticeable delays with only 5 partitions.

What I also cannot understand is why the time increase for SELECT is much higher (2.5 times) than the time increase for UPDATE, considering that the WHERE clause is identical and assuming the time is spent selecting the relevant child tables.

Best regards,
Konstantin

Environment description:
Postgres 9.5 on Linux:

db=> select version();
 PostgreSQL 9.5.8 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11), 64-bit
(1 row)

16 CPUs, Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz (vendor_id GenuineIntel, cpu family 6, model 45)
128GB RAM, 32GB shared_buffers

Table statistics:

db=> select count(*) from my_log_daily;
 count
--------
 408568
(1 row)

db=> select count(*) from my_log_daily_170906;
 count
--------
 408568
(1 row)

db=> explain (ANALYZE, BUFFERS) select stage+1 from my_log_daily_170906
     where date1 >= '2017-09-06' and date1 < '2017-09-07'
     and msgid1=3414253 and msgid2=20756 and msgid3=1504712117 and instance='WS6';

 Index Scan using my_log_daily_idx_170906 on my_log_daily_170906  (cost=0.42..8.46 rows=1 width=4) (actual time=0.013..0.014 rows=1 loops=1)
   Index Cond: ((msgid1 = 3414253) AND (msgid2 = 20756) AND (msgid3 = 1504712117) AND ((instance)::text = 'WS6'::text))
   Filter: ((date1 >= '2017-09-06 00:00:00'::timestamp without time zone) AND (date1 < '2017-09-07 00:00:00'::timestamp without time zone))
   Buffers: shared hit=4
 Planning time: 0.135 ms
 Execution time: 0.029 ms
(6 rows)

db=> explain (ANALYZE, BUFFERS) select stage+1 from my_log_daily
     where date1 >= '2017-09-06' and date1 < '2017-09-07'
     and msgid1=3414253 and msgid2=20756 and msgid3=1504712117 and instance='WS6';

 Result  (cost=0.00..8.46 rows=2 width=4) (actual time=0.016..0.017 rows=1 loops=1)
   Buffers: shared hit=4
   ->  Append  (cost=0.00..8.45 rows=2 width=4) (actual time=0.013..0.014 rows=1 loops=1)
         Buffers: shared hit=4
         ->  Seq Scan on my_log_daily  (cost=0.00..0.00 rows=1 width=4) (actual time=0.000..0.000 rows=0 loops=1)
               Filter: ((date1 >= '2017-09-06 00:00:00'::timestamp without time zone) AND (date1 < '2017-09-07 00:00:00'::timestamp without time zone) AND (msgid1 = 3414253) AND (msgid2 = 20756) AND (msgid3 = 1504712117) AND ((instance)::text = 'WS6'::text))
         ->  Index Scan using my_log_daily_idx_170906 on my_log_daily_170906  (cost=0.42..8.45 rows=1 width=4) (actual time=0.012..0.013 rows=1 loops=1)
               Index Cond: ((msgid1 = 3414253) AND (msgid2 = 20756) AND (msgid3 = 1504712117) AND ((instance)::text = 'WS6'::text))
               Filter: ((date1 >= '2017-09-06 00:00:00'::timestamp without time zone) AND (date1 < '2017-09-07 00:00:00'::timestamp without time zone))
               Buffers: shared hit=4
 Planning time: 2.501 ms
 Execution time: 0.042 ms
(12 rows)

db=> explain (ANALYZE, BUFFERS) update my_log_daily_170906 set stage=stage+1
     where date1 >= '2017-09-06' and date1 < '2017-09-07'
     and msgid1=3414253 and msgid2=20756 and msgid3=1504712117 and
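For reference, the inheritance setup described at the top of this message (range CHECK by date plus an INSERT trigger on the master) looks roughly like the following. This is a sketch reconstructed from the column names visible in the plans; the real schema, trigger body, and the other ~40 daily children are not shown in the message:

```sql
CREATE TABLE my_log_daily (
    date1    timestamp NOT NULL,
    msgid1   bigint,
    msgid2   bigint,
    msgid3   bigint,
    instance varchar(10),
    stage    integer
);

-- One child per day, constrained by a range CHECK on the date.
CREATE TABLE my_log_daily_170906 (
    CHECK (date1 >= '2017-09-06' AND date1 < '2017-09-07')
) INHERITS (my_log_daily);

CREATE INDEX my_log_daily_idx_170906
    ON my_log_daily_170906 (msgid1, msgid2, msgid3, instance);

-- Trigger on the master redirects inserts to the matching child.
CREATE OR REPLACE FUNCTION my_log_daily_insert() RETURNS trigger AS $$
BEGIN
    IF NEW.date1 >= '2017-09-06' AND NEW.date1 < '2017-09-07' THEN
        INSERT INTO my_log_daily_170906 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'no partition for date %', NEW.date1;
    END IF;
    RETURN NULL;  -- the row is already stored in the child
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER my_log_daily_route
    BEFORE INSERT ON my_log_daily
    FOR EACH ROW EXECUTE PROCEDURE my_log_daily_insert();
```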
Re: [PERFORM] partitioning materialized views
On Fri, Jul 7, 2017 at 10:12 AM, Shaun Thomas wrote:
>> I don't think the downstream dependencies will let that work without
>> rebuilding them as well. The drop fails (without a cascade), and the other
>> views and matviews that are built off of this all simply point to x_old.
>
> Wow, ouch. Yeah, I'd neglected to consider dependent objects. Your
> only "out" at this point is to either add or utilize a "modified_date"
> column of some kind, so you can maintain a different MV with some
> recent window of data, and regularly merge that into a physical local
> copy (not an MV) sort of like a running ETL. Though that won't help
> with deletes, unfortunately.

You have another out: rebuild the dependent views before the drop.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
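A sketch of that "rebuild the dependents first" approach; the dependent view v, its column list, and the source table are invented for illustration and are not from the thread:

```sql
-- Build the replacement matview alongside the old one.
CREATE MATERIALIZED VIEW y AS SELECT id, val FROM source_table;

BEGIN;
-- Repoint the dependent view at the new matview first, so x no
-- longer has dependents and can be dropped without CASCADE.
CREATE OR REPLACE VIEW v AS SELECT id, val FROM y;
ALTER MATERIALIZED VIEW x RENAME TO x_old;
ALTER MATERIALIZED VIEW y RENAME TO x;
DROP MATERIALIZED VIEW x_old;
COMMIT;
```

Note that dependencies in Postgres track object identity rather than names, so v keeps following the new matview through the rename.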
Re: [PERFORM] partitioning materialized views
> I don't think the downstream dependencies will let that work without
> rebuilding them as well. The drop fails (without a cascade), and the other
> views and matviews that are built off of this all simply point to x_old.

Wow, ouch. Yeah, I'd neglected to consider dependent objects. Your only "out" at this point is to either add or utilize a "modified_date" column of some kind, so you can maintain a different MV with some recent window of data, and regularly merge that into a physical local copy (not an MV), sort of like a running ETL. Though that won't help with deletes, unfortunately.

--
Shaun M Thomas - 2ndQuadrant
PostgreSQL Training, Services and Support
shaun.tho...@2ndquadrant.com | www.2ndQuadrant.com
Re: [PERFORM] partitioning materialized views
>> If you _can't_ do that due to cloud restrictions, you'd actually be
>> better off doing an atomic swap.
>>
>> CREATE MATERIALIZED VIEW y AS ...;
>>
>> BEGIN;
>> ALTER MATERIALIZED VIEW x RENAME TO x_old;
>> ALTER MATERIALIZED VIEW y RENAME TO x;
>> DROP MATERIALIZED VIEW x_old;
>> COMMIT;
>
> This is an interesting idea. Thanks! I'll ponder that one.

I don't think the downstream dependencies will let that work without rebuilding them as well. The drop fails (without a cascade), and the other views and matviews that are built off of this all simply point to x_old.
Re: [PERFORM] partitioning materialized views
On Thu, Jul 6, 2017 at 11:25 AM, Shaun Thomas wrote:
>> I'm curious if I'm overlooking other possible architectures or tools
>> that might make this simpler to manage.
>
> One of the issues with materialized views is that they are based on
> views... For a concurrent update, it essentially performs a looped
> merge, which can be pretty ugly. That's the price you pay to be
> non-blocking. For this particular setup, I'd actually recommend using
> something like pglogical to just maintain a live copy of the remote
> table, or wait for Postgres 10's logical replication.

Unfortunately the foreign database is Hadoop. (As A Service)

> If you _can't_ do that due to cloud restrictions, you'd actually be
> better off doing an atomic swap.
>
> CREATE MATERIALIZED VIEW y AS ...;
>
> BEGIN;
> ALTER MATERIALIZED VIEW x RENAME TO x_old;
> ALTER MATERIALIZED VIEW y RENAME TO x;
> DROP MATERIALIZED VIEW x_old;
> COMMIT;

This is an interesting idea. Thanks! I'll ponder that one.

> You could still follow your partitioned plan if you don't want to
> update all of the data at once. Let's face it, 3-4 hours is still a
> ton of data transfer and calculation.

Yup.
Re: [PERFORM] partitioning materialized views
> I'm curious if I'm overlooking other possible architectures or tools that
> might make this simpler to manage.

One of the issues with materialized views is that they are based on views... For a concurrent update, it essentially performs a looped merge, which can be pretty ugly. That's the price you pay to be non-blocking. For this particular setup, I'd actually recommend using something like pglogical to just maintain a live copy of the remote table, or wait for Postgres 10's logical replication. If you _can't_ do that due to cloud restrictions, you'd actually be better off doing an atomic swap:

CREATE MATERIALIZED VIEW y AS ...;

BEGIN;
ALTER MATERIALIZED VIEW x RENAME TO x_old;
ALTER MATERIALIZED VIEW y RENAME TO x;
DROP MATERIALIZED VIEW x_old;
COMMIT;

You could still follow your partitioned plan if you don't want to update all of the data at once. Let's face it, 3-4 hours is still a ton of data transfer and calculation.

--
Shaun M Thomas - 2ndQuadrant
PostgreSQL Training, Services and Support
shaun.tho...@2ndquadrant.com | www.2ndQuadrant.com
[PERFORM] partitioning materialized views
I'm pondering approaches to partitioning large materialized views and was hoping for some feedback and thoughts on it from the [perform] minds.

PostgreSQL 9.6.3 on Ubuntu 16.04 in the Google Cloud.

I have a foreign table with 250M or so rows and 50 or so columns, with a UUID as the primary key. Queries to the foreign table have high latency (from several minutes to more than an hour to run). If I create a materialized view of this FT, including indexes, it takes about 3-4 hours. If I refresh the materialized view concurrently, it takes 4-5 DAYS.

When I run "refresh materialized view concurrently", it takes about an hour to download the 250M rows and load them onto the SSD tempspace. At that point we flatline a single core, run I/O on the main tablespace up pretty high, and stay that way until the refresh is complete.

In order to speed up the concurrent refreshes, I have the data broken into 4 materialized views, manually partitioned (by date), with a "union all view" in front of them. Refreshing the data which is changing regularly (new data, in one of the partitions) doesn't require refreshing the entire data set. This works fairly well, and I can refresh the most recent partition in 1-2 hours (daily).

However, sometimes I have to reach back in time and refresh the deeper partitions. This is taking 3 or more days to complete, even with the data broken into 4 materialized views. This approach lets me refresh all of the partitions at the same time, which uses more cores at the same time (and more tempspace). [I'd like to use as much of my system resources as possible to get the refresh to finish faster.]

Unfortunately I am finding I need to refresh the deeper data more and more often (at least once per week), and my table growth is going to jump from adding 3-5M rows per day to adding 10-20M rows per day over the next month or two. Waiting 3 or 4 days for the deeper data to be ready for consumption in PostgreSQL is no longer acceptable to the business.
It doesn't look like partman supports partitioning materialized views. It also doesn't look like PG 10's new partitioning features will work with materialized views (although I haven't tried it yet). Citus DB also seems to be unable to help in this scenario.

I could create new materialized views every time I need new data, and then swap out the view that is in front of them. There are other objects in the database which have dependencies on that view. In my experiments so far, CREATE OR REPLACE VIEW seems to let me get away with this as long as the columns don't change.

Alternatively, I could come up with a new partitioning scheme that lets me more selectively run "refresh concurrently", and run more of those at the same time. I was leaning towards this latter solution.

Suppose I make a separate materialized view for each month of data. At the beginning of each month I would have to make a new materialized view and then add it into the "union all view" on the fly. I would then need a "refresh all" script which refreshes as many of them concurrently as I am willing to dedicate cores to. And I need some handy ways to selectively refresh specific months when I know data for a particular month or set of months has changed.

So, I actually have 2 of these 250M row tables in the foreign database that I want to do this with. And maybe more coming soon?

I'm curious if I'm overlooking other possible architectures or tools that might make this simpler to manage.

Similarly, could I construct the "union all view" in front of the partitions to be partition aware, so that the query planner doesn't try to look in every one of the materialized views behind it to find the rows I want? If I go with the monthly partitions, I'll start with about 36 materialized views behind the main view.
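A sketch of the monthly scheme described above. All table and column names here are invented for illustration; the thread does not show the real schema. Each matview needs a unique index so that REFRESH ... CONCURRENTLY is allowed:

```sql
-- One matview per month of the foreign table's data.
CREATE MATERIALIZED VIEW events_2017_06 AS
    SELECT id, event_time, payload
    FROM remote_events
    WHERE event_time >= '2017-06-01' AND event_time < '2017-07-01';
CREATE UNIQUE INDEX ON events_2017_06 (id);

CREATE MATERIALIZED VIEW events_2017_07 AS
    SELECT id, event_time, payload
    FROM remote_events
    WHERE event_time >= '2017-07-01' AND event_time < '2017-08-01';
CREATE UNIQUE INDEX ON events_2017_07 (id);

-- The "union all view" in front of the partitions; recreated each
-- month when a new partition is added.
CREATE OR REPLACE VIEW events AS
    SELECT * FROM events_2017_06
    UNION ALL
    SELECT * FROM events_2017_07;

-- Selective, non-blocking refresh of just a month whose data changed:
REFRESH MATERIALIZED VIEW CONCURRENTLY events_2017_07;
```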
Re: [PERFORM] Partitioning and performance
On 5/28/15 9:31 AM, Ravi Krishna wrote:
> explain select count(*) from tstesting.account where account_row_inst = 101;
>  Aggregate  (cost=8.16..8.17 rows=1 width=0)
>    ->  Index Only Scan using account_pkey on account  (cost=0.14..8.16 rows=1 width=0)
>          Index Cond: (account_row_inst = 101)

EXPLAIN only shows what the planner thinks a query will cost. For any real testing, you need EXPLAIN ANALYZE.

Also, understand that partitioning isn't a magic bullet. It can make some operations drastically faster, but it's not going to help every scenario, and will actually slow some other operations down.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com
[PERFORM] Partitioning and performance
I am testing partitioning of a large table. I am doing range partitioning based on a sequence column, which also acts as the primary key. For inserts I am using a trigger which redirects each insert to the right table based on the value of the primary key.

Based on my testing, I see that the insert speed is less than 10% different from a non-partitioned table. I am using SET constraint_exclusion = on, and I checked via ANALYZE that the planner does not consider non-qualifying child tables. Yet selects and updates based on the primary key show anywhere from 40 to 200% slowdown compared to the non-partitioned table.

One thing I notice is that, even with partition pruning, the planner scans the base table as well as the table matching the condition. Is that the additional overhead? I am attaching below the output of analyze.

===

On a non-partitioned table:

explain select count(*) from tstesting.account where account_row_inst = 101;
 Aggregate  (cost=8.16..8.17 rows=1 width=0)
   ->  Index Only Scan using account_pkey on account  (cost=0.14..8.16 rows=1 width=0)
         Index Cond: (account_row_inst = 101)
(3 rows)

With partition pruning:

 Aggregate  (cost=8.45..8.46 rows=1 width=0)
   ->  Append  (cost=0.00..8.44 rows=2 width=0)
         ->  Seq Scan on account  (cost=0.00..0.00 rows=1 width=0)
               Filter: (account_row_inst = 101)
         ->  Index Only Scan using account_part1_pkey on account_part1  (cost=0.42..8.44 rows=1 width=0)
               Index Cond: (account_row_inst = 101)
(6 rows)

On a partitioned table, with no partition pruning:
explain analyze select count(*) from tstesting.account where account_row_inst = 101;
 Aggregate  (cost=29.77..29.78 rows=1 width=0) (actual time=0.032..0.032 rows=1 loops=1)
   ->  Append  (cost=0.00..29.76 rows=5 width=0) (actual time=0.029..0.029 rows=0 loops=1)
         ->  Seq Scan on account  (cost=0.00..0.00 rows=1 width=0) (actual time=0.000..0.000 rows=0 loops=1)
               Filter: (account_row_inst = 101)
         ->  Index Only Scan using account_part1_pkey on account_part1  (cost=0.42..4.44 rows=1 width=0) (actual time=0.008..0.008 rows=0 loops=1)
               Index Cond: (account_row_inst = 101)
               Heap Fetches: 0
         ->  Index Only Scan using account_part2_pkey on account_part2  (cost=0.42..8.44 rows=1 width=0) (actual time=0.007..0.007 rows=0 loops=1)
               Index Cond: (account_row_inst = 101)
               Heap Fetches: 0
         ->  Index Only Scan using account_part3_pkey on account_part3  (cost=0.42..8.44 rows=1 width=0) (actual time=0.007..0.007 rows=0 loops=1)
               Index Cond: (account_row_inst = 101)
               Heap Fetches: 0
         ->  Index Only Scan using account_part4_pkey on account_part4  (cost=0.42..8.44 rows=1 width=0) (actual time=0.006..0.006 rows=0 loops=1)
               Index Cond: (account_row_inst = 101)
               Heap Fetches: 0
 Planning time: 0.635 ms
 Execution time: 0.137 ms
(18 rows)
Re: [PERFORM] Partitioning and performance
On Thu, May 28, 2015 at 10:31 AM, Ravi Krishna <sravikrish...@gmail.com> wrote:
> I am testing partitioning of a large table. I am doing a range

Sorry, I forgot to clarify. I am using INHERITS for partitioning, with check constraints built for range partitioning.
[PERFORM] partitioning versus clustering
Postgres Performance Wizards,

I am a few years into developing and operating a system underpinned by Postgres that sees the arrival of a significant number of events around the clock. It is not an epic amount of data, but it is enough to be challenging to me, in particular because system downtime is not very palatable and the data is retained quasi-indefinitely.

I have various tables that house different kinds of events, and in addition to wanting to look at a small number of rows of data, users often want to generate summary reports on large swaths of data that span days or weeks. At present, these reports can trigger index scans that take minutes to service, and the parameters of the reports are user specified, making their pre-generation infeasible.

Generally the rows in these tables are write-once, but they contain a pointer to the related BLOB from which they were constructed, and every now and again some new field in the originating BLOB becomes of interest, causing me to alter the table and then do a sweep of the full table with corresponding updates, violating the otherwise INSERT-only nature.

These event tables generally have an event time column that is indexed and which is an obvious candidate for either partitioning or clustering of the table. I'm trying to make sense of which is the better option for me. As best I can tell, the decision points are as follows...
PARTITIONING

Pros:
* no outage; data just starts flowing into new partitions seamlessly
* allows more control over where the data goes, creating retrieval parallelization opportunities
* clustering cannot be inadvertently undone in a way that requires scheduled downtime to repair
* probably more resilient in the case of the event time being different from the time that I processed the event

Cons:
* does not deal with legacy data without extra migration (over time this becomes less relevant)
* requires some kind of background process to manage partition creation
* partition size will affect performance, and choosing its size is not a science

CLUSTERING

Pros:
* no particularly custom development work on my part
* once done, it puts all existing data in a good state for efficient querying without extra work

Cons:
* will lock up the system for the duration of the CLUSTER command
* somehow need to make sure that ANALYZE commands run often enough
* does not give me much control of the underlying storage layout
* may have problems when the occasional mass-UPDATE is done
* unclear whether a VACUUM FULL is required to prevent subsequent un-clustered-ness stemming from the mass-UPDATE operations, despite having a fill factor of 100
* could generate a huge number of WAL segments to archive
* could possibly be sabotaged by the event time property not being well correlated with the time that the event is processed, when upstream systems have momentary issues

As far as questions to the group go:

* Is my understanding of the pros and cons of the options reasonably correct and comprehensive?
* What has governed your decisions in making such a choice on past projects of your own?
* If I go the clustering route, will the occasional mass update really mess with things, requiring a re-cluster and possibly even a full vacuum (to prevent re-un-clustering)?
* Might it make more sense to cluster when the event time property is the time that I processed the event, but partition when it is the time that the event occurred in some other system?
* Is running a CLUSTER command actually necessary to get the performance benefits if the table ought already to be in a good order, or is a well ordered table by itself enough to get query execution to yield nice sequential access to the disk?

Many thanks in advance for your insights...

--
AWG
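For concreteness, the clustering route discussed above comes down to something like the following (table and index names are illustrative, not from the post):

```sql
-- Physically reorder the table by event time. This takes an
-- ACCESS EXCLUSIVE lock for the duration -- the downtime cost
-- listed under Cons above. The chosen index is remembered, so
-- later runs can be plain "CLUSTER events;".
CLUSTER events USING events_event_time_idx;

-- CLUSTER does not refresh planner statistics (including the
-- ordering correlation), so re-analyze afterwards.
ANALYZE events;
```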
Re: [PERFORM] partitioning performance question
On Sat, Jun 9, 2012 at 7:58 PM, Kevin Kempter <cs_...@consistentstate.com> wrote:
> Hi All;
> We have a client that has a table where large blobs (bytea) are stored.
> The table has a key column that holds numbers (like 112362), but
> unfortunately it's a varchar column, so the blobs are accessed via
> queries like:
>
> select * from bigtable where keycol = '217765'
>
> The primary reason we want to partition the table is for maintenance; the
> table is well over 1.2 terabytes in size and they have never successfully
> vacuumed it. However I don't want to make performance even worse. The
> table does have a serial key. I'm thinking the best options will be to
> partition by range on the serial key, or maybe via the keycol character
> column by using an IN statement in the check constraints, thus allowing
> the planner to actually leverage the above SQL. I suspect doing a typecast
> to integer in the check constraints will prove to be a bad idea if the
> keycol column remains a varchar.
>
> Thoughts?
>
> Here's the table:
>
>              Table "problemchild"
>     Column    |           Type           | Modifiers
> --------------+--------------------------+-----------
>  keycol       | character varying        |
>  blob_data    | bytea                    |
>  removed_date | timestamp with time zone |
>  alt_key      | bigint                   | not null default nextval('problemchild_alt_key_seq'::regclass)
> Indexes:
>  "pc_pkey" PRIMARY KEY, btree (alt_key)
>  "key2" btree (keycol)

I find it odd that you have a column keycol which is not the PK, and your PK column is named alt_key. Is keycol always the character representation of alt_key? Are they unrelated?

It would also help to know how the data in this table changes. Do you only ever add data? Is some data removed from time to time (maybe based on the removed_date)?

If the table grows continually then range partitioning sounds good. However, I think you should really make keycol a number type, because otherwise range partitioning will be a pain (you would need to include the length of the string in the criterion if you want your varchar ranges to mimic number ranges).
However, if you are deleting from time to time, and hence the table does not grow in the long run, then hash partitioning might be a better idea, because then you do not need to create new partitions all the time. Example on alt_key:

create table problemchild (
  keycol varchar(100),
  blob_data bytea,
  removed_date timestamp with time zone,
  alt_key bigint primary key
);
create table problemchild_00 ( check ( alt_key % 16 = 0 ) ) inherits (problemchild);
create table problemchild_01 ( check ( alt_key % 16 = 1 ) ) inherits (problemchild);
create table problemchild_02 ( check ( alt_key % 16 = 2 ) ) inherits (problemchild);
...

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
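To make inserts through the parent land in the right child, this scheme would also need a routing trigger along these lines. This is a sketch, not part of the original mail; the CASE arms mirror the CHECK constraints above:

```sql
CREATE OR REPLACE FUNCTION problemchild_insert() RETURNS trigger AS $$
BEGIN
    -- Route by the same expression used in the CHECK constraints.
    CASE NEW.alt_key % 16
        WHEN 0 THEN INSERT INTO problemchild_00 VALUES (NEW.*);
        WHEN 1 THEN INSERT INTO problemchild_01 VALUES (NEW.*);
        WHEN 2 THEN INSERT INTO problemchild_02 VALUES (NEW.*);
        -- one arm per partition, 16 in total
        ELSE RAISE EXCEPTION 'no partition for alt_key %', NEW.alt_key;
    END CASE;
    RETURN NULL;  -- the row is already stored in the child table
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER problemchild_route
    BEFORE INSERT ON problemchild
    FOR EACH ROW EXECUTE PROCEDURE problemchild_insert();
```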
[PERFORM] partitioning performance question
Hi All;

We have a client that has a table where large blobs (bytea) are stored. The table has a key column that holds numbers (like 112362), but unfortunately it's a varchar column, so the blobs are accessed via queries like:

select * from bigtable where keycol = '217765'

The primary reason we want to partition the table is for maintenance; the table is well over 1.2 terabytes in size and they have never successfully vacuumed it. However I don't want to make performance even worse. The table does have a serial key. I'm thinking the best options will be to partition by range on the serial key, or maybe via the keycol character column by using an IN statement in the check constraints, thus allowing the planner to actually leverage the above SQL. I suspect doing a typecast to integer in the check constraints will prove to be a bad idea if the keycol column remains a varchar.

Thoughts?

Here's the table:

             Table "problemchild"
    Column    |           Type           | Modifiers
--------------+--------------------------+-----------
 keycol       | character varying        |
 blob_data    | bytea                    |
 removed_date | timestamp with time zone |
 alt_key      | bigint                   | not null default nextval('problemchild_alt_key_seq'::regclass)
Indexes:
 "pc_pkey" PRIMARY KEY, btree (alt_key)
 "key2" btree (keycol)

Thanks in advance
[PERFORM] Partitioning / Strange optimizer behaviour
We have an optimizer problem regarding partitioned tables on 8.4.11.

We started partitioning a large table containing approx. 1 billion records. So far, there is only the master table, called edifactmsgpart (which is empty), and 1 partition, called edifactmsgpart_pact. There is a bigint column called emg_id with a btree index on it.

\d edifactmsgpart_pact
...
 "emp_emg_ept_i_pact" btree (emg_id, ept_id)
...

gdw=> select relname, reltuples from pg_class where relname in( 'edifactmsgpart', 'edifactmsgpart_pact' );
       relname       |  reltuples
---------------------+-------------
 edifactmsgpart_pact | 1.03102e+09
 edifactmsgpart      |           0

A select on the big partition yields a decent plan and performs as expected, lasting only a fraction of a second.

gdw=> explain select min( emg_id ) from edifactmsgpart_pact;
 Result  (cost=2.05..2.06 rows=1 width=0)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.00..2.05 rows=1 width=8)
           ->  Index Scan using emp_emg_ept_i_pact on edifactmsgpart_pact  (cost=0.00..2109171123.79 rows=1031020672 width=8)
                 Filter: (emg_id IS NOT NULL)

gdw=> select min( emg_id ) from edifactmsgpart_pact;
  min
-------
 58178

=> very fast.

A select on the partitioned table, however, yields a... shall we call it sub-optimal plan:

gdw=> explain select min( emg_id ) from edifactmsgpart;
 Aggregate  (cost=23521692.03..23521692.04 rows=1 width=8)
   ->  Append  (cost=0.00..20944139.42 rows=1031021042 width=8)
         ->  Seq Scan on edifactmsgpart  (cost=0.00..13.70 rows=370 width=8)
         ->  Seq Scan on edifactmsgpart_pact edifactmsgpart  (cost=0.00..20944125.72 rows=1031020672 width=8)

I would expect this to run half an hour or so, completely overloading the server...

Any ideas?

Kind regards
Marc
Re: [PERFORM] Partitioning / Strange optimizer behaviour
On 5 Březen 2012, 16:11, Marc Schablewski wrote:
> We have an optimizer problem regarding partitioned tables on 8.4.11.
> ...
> gdw=> explain select min( emg_id ) from edifactmsgpart;
>  Aggregate  (cost=23521692.03..23521692.04 rows=1 width=8)
>    ->  Append  (cost=0.00..20944139.42 rows=1031021042 width=8)
>          ->  Seq Scan on edifactmsgpart  (cost=0.00..13.70 rows=370 width=8)
>          ->  Seq Scan on edifactmsgpart_pact edifactmsgpart  (cost=0.00..20944125.72 rows=1031020672 width=8)
>
> I would expect this to run half an hour or so, completely overloading
> the server...
>
> Any Ideas?

This is a well known feature of pre-9.1 releases - it simply does not handle min/max on partitioned tables well. There's even an example of a workaround on the wiki:

https://wiki.postgresql.org/wiki/Efficient_min/max_over_partitioned_table

Another option is to upgrade to 9.1, which handles this fine.

Tomas
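The workaround on that wiki page amounts to computing min() per partition, so each child can use its own index, and then combining the per-partition results. For the two tables in this thread it would look something like this (a sketch, not a tested query against that database):

```sql
-- Pre-9.1: take min() over each table separately so the child can
-- use the btree index on emg_id, then take the min of the results.
SELECT min(emg_id) FROM (
    SELECT min(emg_id) AS emg_id FROM ONLY edifactmsgpart
    UNION ALL
    SELECT min(emg_id) FROM edifactmsgpart_pact
) AS per_partition;
```

The empty master contributes a NULL, which min() ignores, so the result matches the naive query.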
Re: [PERFORM] Partitioning / Strange optimizer behaviour
Thanks for pointing me to that article. I totally forgot that the postgres wiki existed. Updating is not an option at the moment, but we'll probably do so in the future. Until then I can live with the workaround. Kind regards, Marc
Re: [PERFORM] Partitioning by status?
On 13-01-2012 17:05, Josh Berkus wrote:
> On 1/13/12 2:44 AM, alexandre - aldeia digital wrote:
>>> Also, (2) only really works if you're going to obsolesce (remove)
>>> archive records after a certain period of time. Otherwise the
>>> sub-partitioning hurts performance.
>>
>> Is there any move to include easy table partitioning in the 9.2
>> version?
>
> Nobody has been submitting patches.

I'm sorry to hear this. Table partitioning is a very good helper in a large number of performance issues. If there was a bounty to help anyone make this happen, I would be a happy contributor. :)
Re: [PERFORM] Partitioning by status?
> Also, (2) only really works if you're going to obsolesce (remove)
> archive records after a certain period of time. Otherwise the
> sub-partitioning hurts performance.

Is there any move to include easy table partitioning in the 9.2 version?
Re: [PERFORM] Partitioning by status?
On 1/13/12 2:44 AM, alexandre - aldeia digital wrote:
>> Also, (2) only really works if you're going to obsolesce (remove)
>> archive records after a certain period of time. Otherwise the
>> sub-partitioning hurts performance.
>
> Is there any move to include easy table partitioning in the 9.2
> version?

Nobody has been submitting patches.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [PERFORM] Partitioning by status?
Mike,

> Is it practical to partition on the status column and, eg, use triggers
> to move a row between the two partitions when status is updated? Any
> surprises to watch for, given the status column is actually NULL for
> active data and contains a value when archived?

When I've done this before, I've had a setup like the following:

1. One active partition
2. Multiple archive partitions, also partitioned by time (month or year)
3. A stored procedure for archiving a record or records

I'd recommend against triggers because they'll be extremely inefficient if you need to archive a large number of rows at once.

Also, (2) only really works if you're going to obsolesce (remove) archive records after a certain period of time. Otherwise the sub-partitioning hurts performance.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
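A sketch of the stored-procedure approach from point 3. The table names, the created_at column, and the batch criterion are all invented for illustration; the point is that a set-based move is far cheaper than per-row triggers:

```sql
-- Move a batch of rows from the active partition to the archive
-- partition in one statement, returning how many rows moved.
CREATE OR REPLACE FUNCTION archive_records(before timestamptz)
RETURNS bigint AS $$
DECLARE
    moved bigint;
BEGIN
    WITH moved_rows AS (
        DELETE FROM records_active
        WHERE created_at < before
        RETURNING *
    )
    INSERT INTO records_archive SELECT * FROM moved_rows;
    GET DIAGNOSTICS moved = ROW_COUNT;
    RETURN moved;
END;
$$ LANGUAGE plpgsql;

-- e.g. archive everything older than 90 days:
-- SELECT archive_records(now() - interval '90 days');
```

Note the writable-CTE form (DELETE ... RETURNING inside WITH) requires 9.1 or later.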
[PERFORM] Partitioning by status?
We have a set of large tables. One of the columns is a status indicator (active / archived). The queries against these tables almost always include the status, so partitioning on that seems to make sense from a logical standpoint, especially given most of the data is archived and most of the processes want active records.

Is it practical to partition on the status column and, eg, use triggers to move a row between the two partitions when status is updated? Any surprises to watch for, given the status column is actually NULL for active data and contains a value when archived?

Mike
Re: [PERFORM] Partitioning by status?
Mike Blackwell mike.blackw...@rrd.com wrote: We have a set of large tables. One of the columns is a status indicator (active / archived). The queries against these tables almost always include the status, so partitioning on that seems to make sense from a logical standpoint, especially given most of the data is archived and most of the processes want active records. Is it practical to partition on the status column and, eg, use triggers to move a row between the two partitions when status is updated? Any surprises to watch for, given the status column is actually NULL for active data and contains a value when archived? If I were you, I would try a partial index where status is null. But yes, partitioning is another option; it depends on your workload. Andreas -- Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect. (Linus Torvalds) If I was god, I would recompile penguin with --enable-fly. (unknown) Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°
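Andreas's partial-index suggestion is a one-liner; the table and indexed column here are placeholders assumed from Mike's description:

```sql
-- A partial index covering only the active (status IS NULL) rows,
-- so it stays small even when most of the table is archived.
CREATE INDEX big_table_active_idx ON big_table (id) WHERE status IS NULL;

-- Queries must include a predicate the planner can prove implies
-- the index predicate, e.g. repeat it verbatim:
SELECT * FROM big_table WHERE status IS NULL AND id = 42;
```

This gets much of the benefit Mike wants (fast access to the small active set) without moving rows between partitions at all.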
Re: [PERFORM] partitioning question 1
-Original Message- From: Ben [mailto:midfi...@gmail.com] Sent: Thursday, October 28, 2010 12:37 PM To: pgsql-performance@postgresql.org Subject: partitioning question 1 hello -- my last email was apparently too long to respond to so i'll split it up into shorter pieces. my first question : my understanding of how range partitioning and constraint exclusion works leads me to believe that it does not buy any query performance that a clustered index doesn't already give you -- the advantages are all in maintainability. an index is able to eliminate pages just as well as constraint exclusion is able to eliminate table partitions. the I/O advantages of having queries target small subtables are the same as the I/O advantages of clustering the index : result pages in a small range are very close to each other on disk. finally, since constraint exclusion isn't as flexible as indexing (i've seen old mailing list posts that say that constraint exclusion only works with static constants in where clauses, and only works with simple operators like <, <=, =, >=, >, which basically forces btree indexes when i want to use gist) it is indeed likely that partitioning can be slower than one big table with a clustered index. is my intuition completely off on this? best regards, ben

If your SELECT retrieves a substantial amount of records, a table scan could be more efficient than index access. Now, if while retrieving a large amount of records the WHERE clause of this SELECT still satisfies the constraints on some partition(s), then obviously one (or a few) partition scans will be more efficient than a full table scan of the non-partitioned table. So, yes, partitioning provides performance improvements, not only maintenance convenience. Regards, Igor Neyman
Re: [PERFORM] partitioning question 1
On Oct 29, 2010, at 7:38 AM, Igor Neyman wrote: is my intuition completely off on this? best regards, ben If your SELECT retrieves substantial amount of records, table scan could be more efficient than index access. Now, if while retrieving large amount of records WHERE clause of this SELECT still satisfies constraints on some partition(s), then obviously one (or few) partition scans will be more efficient than full table scan of non-partitioned table. So, yes partitioning provides performance improvements, not only maintenance convenience. my impression was that a *clustered* index would give a lot of the same I/O benefits, in a more flexible way. if you're clustered on the column in question, then an index scan for a range is much like a sequential scan over a partition (as far as i understand.) b -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] partitioning question 1
-Original Message- From: Ben [mailto:midfi...@gmail.com] Sent: Friday, October 29, 2010 12:16 PM To: Igor Neyman Cc: pgsql-performance@postgresql.org Subject: Re: partitioning question 1 On Oct 29, 2010, at 7:38 AM, Igor Neyman wrote: is my intuition completely off on this? best regards, ben If your SELECT retrieves substantial amount of records, table scan could be more efficient than index access. Now, if while retrieving large amount of records WHERE clause of this SELECT still satisfies constraints on some partition(s), then obviously one (or few) partition scans will be more efficient than full table scan of non-partitioned table. So, yes partitioning provides performance improvements, not only maintenance convenience. my impression was that a *clustered* index would give a lot of the same I/O benefits, in a more flexible way. if you're clustered on the column in question, then an index scan for a range is much like a sequential scan over a partition (as far as i understand.) b Even with clustered index you still read index+table, which is more expensive than just table scan (in situation I described above). PG clustered index is not the same as SQL Server clustered index (which includes actual table pages on the leaf level). Igor Neyman -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] partitioning question 1
hello -- my last email was apparently too long to respond to so i'll split it up into shorter pieces. my first question : my understanding of how range partitioning and constraint exclusion works leads me to believe that it does not buy any query performance that a clustered index doesn't already give you -- the advantages are all in maintainability. an index is able to eliminate pages just as well as constraint exclusion is able to eliminate table partitions. the I/O advantages of having queries target small subtables are the same as the I/O advantages of clustering the index : result pages in a small range are very close to each other on disk. finally, since constraint exclusion isn't as flexible as indexing (i've seen old mailing list posts that say that constraint exclusion only works with static constants in where clauses, and only works with simple operators like <, <=, =, >=, >, which basically forces btree indexes when i want to use gist) it is indeed likely that partitioning can be slower than one big table with a clustered index. is my intuition completely off on this? best regards, ben
Re: [PERFORM] partitioning question 1
On Thu, 2010-10-28 at 09:36 -0700, Ben wrote: hello -- my last email was apparently too long to respond to so i'll split it up into shorter pieces. my first question : my understanding of how range partitioning and constraint exclusion works leads me to believe that it does not buy any query performance that a clustered index doesn't already give you -- the advantages are all in maintainability. an index is able to eliminate pages just as well as constraint exclusion is able to eliminate table partitions. the I/O advantages of having queries target small subtables are the same as the I/O advantages of clustering the index : result pages in a small range are very close to each other on disk. Not entirely true. For one, a clustered index will not stay clustered if you are still updating data that is in the partition. You shouldn't underestimate the benefit of smaller relations in terms of maintenance either. finally, since constraint exclusion isn't as flexible as indexing (i've seen old mailing list posts that say that constraint exclusion only works with static constants in where clauses, and only works with simple operators like <, <=, =, >=, >, which basically forces btree indexes when i want to use gist) it is indeed likely that partitioning can be slower than one big table with a clustered index. Yes, the constraints have to be static. Not sure about the operator question honestly. is my intuition completely off on this? You may actually want to look into expression indexes, not clustered ones. Sincerely, Joshua D. Drake -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
Re: [PERFORM] partitioning question 1
thanks for the prompt response. some comments / questions below : On Oct 28, 2010, at 10:31 AM, Joshua D. Drake wrote: ...constraint exclusion is able to eliminate table partitions. the I/O advantages of having queries target small subtables are the same as the I/O advantages of clustering the index : result pages in a small range are very close to each other on disk. Not entirely true. For one, a clustered index will not stay clustered if you are still updating data that is in the partition. You shouldn't underestimate the benefit of smaller relations in terms of maintenance either. in my situation, the updates come in order (it is timeseries data and the clustered index is on time.) so the table should remain relatively clustered. updates also happen relatively infrequently (once a day in one batch.) so it appears that we will continue to get the I/O benefits described above. are there any other benefits which partitioning provides for query performance (as opposed to update performance) besides the ones which i have mentioned? Yes, the constraints have to be static. Not sure about the operator question honestly. this seems to severely restrict their usefulness -- our queries are data warehouse analytical -type queries, so the constraints are usually data-driven (come from joining against other tables.) is my intuition completely off on this? You may actually want to look into expression indexes, not clustered ones. what would expression indexes give me? thanks and best regards, ben
Re: [PERFORM] partitioning question 1
On Thu, 2010-10-28 at 11:44 -0700, Ben wrote: Yes, the constraints have to be static. Not sure about the operator question honestly. this seems to severely restrict their usefulness -- our queries are data warehouse analytical -type queries, so the constraints are usually data-driven (come from joining against other tables.) Well it does and it doesn't. Keep in mind that the constraint can be: date >= '2010-10-01' and date <= '2010-10-31' What it can't be is something that contains date_part() or extract() (as an example) is my intuition completely off on this? You may actually want to look into expression indexes, not clustered ones. Take a look at the docs: http://www.postgresql.org/docs/8.4/interactive/indexes-expressional.html It could be considered partitioning without breaking up the table, just the indexes. Sincerely, Joshua D. Drake -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
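To illustrate Joshua's point, here is a sketch (hypothetical table and column names) of a WHERE clause that constraint exclusion can use versus one it cannot:

```sql
-- Partition constraint built from plain comparisons against constants.
CREATE TABLE logs (d date, msg text);
CREATE TABLE logs_2010_10
  ( CHECK (d >= DATE '2010-10-01' AND d <= DATE '2010-10-31') )
  INHERITS (logs);

-- Excludable (with constraint_exclusion on): a simple comparison
-- against a constant, provable from the CHECK constraint.
SELECT * FROM logs WHERE d = DATE '2010-10-15';

-- Not excludable: the planner will not reason through date_part()
-- or extract(), so every partition gets scanned.
SELECT * FROM logs WHERE date_part('month', d) = 10;
```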
Re: [PERFORM] partitioning question 1
On Oct 28, 2010, at 11:50 AM, Joshua D. Drake wrote: Yes, the constraints have to be static. Not sure about the operator question honestly. this seems to severely restrict their usefulness -- our queries are data warehouse analytical -type queries, so the constraints are usually data-driven (come from joining against other tables.) Well it does and it doesn't. Keep in mind that the constraint can be: date >= '2010-10-01' and date <= '2010-10-31' What it can't be is something that contains date_part() or extract() (as an example) i think we are talking about two different things here: the constraints on the table, and the where-clause constraints in a query which may or may not trigger constraint exclusion. i understand that table constraints have to be constants -- it doesn't make much sense otherwise. what i am wondering about is, will constraint exclusion be triggered for queries where the column that is being partitioned on is being constrained by things that are not static constants, for instance, in a join. (i'm pretty sure the answer is no, because i think constraint exclusion happens before real query planning.) a concrete example : create table foo (i integer not null, j float not null); create table foo_1 (check ( i >= 0 and i < 10 ) ) inherits (foo); create table foo_2 (check ( i >= 10 and i < 20 ) ) inherits (foo); create table foo_3 (check ( i >= 20 and i < 30 ) ) inherits (foo); etc.. create table bar (i integer not null, k float not null); my understanding is that a query like select * from foo join bar using (i); can't use constraint exclusion, even if the histogram of i-values on table bar says they only live in the range 0-9, and so the query will touch all of the tables. i think this is not favorable compared to a single foo table with a well-maintained btree index on i. is my intuition completely off on this? You may actually want to look into expression indexes, not clustered ones. 
Take a look at the docs: http://www.postgresql.org/docs/8.4/interactive/indexes-expressional.html It could be considered partitioning without breaking up the table, just the indexes. do you mean partial indexes? i have to confess to not understanding how this is relevant -- how could partial indexes give any advantage over a full clustered index? b
Re: [PERFORM] partitioning question 1
On Thu, 2010-10-28 at 12:25 -0700, Ben wrote: i think we are talking about two different things here: the constraints on the table, and the where-clause constraints in a query which may or may not trigger constraint exclusion. i understand that table constraints have to be constants -- it doesn't make much sense otherwise. what i am wondering about is, will constraint exclusion be triggered for queries where the column that is being partitioned on is being constrained by things that are not static constants, for instance, in a join. (i'm pretty sure the answer is no, because i think constraint exclusion happens before real query planning.) a concrete example : create table foo (i integer not null, j float not null); create table foo_1 (check ( i >= 0 and i < 10 ) ) inherits (foo); create table foo_2 (check ( i >= 10 and i < 20 ) ) inherits (foo); create table foo_3 (check ( i >= 20 and i < 30 ) ) inherits (foo); etc.. create table bar (i integer not null, k float not null); my understanding is that a query like select * from foo join bar using (i); can't use constraint exclusion, even if the histogram of i-values on table bar says they only live in the range 0-9, and so the query will touch all of the tables. i think this is not favorable compared to a single foo table with a well-maintained btree index on i. 
My tests show you are incorrect:

part_test=# explain analyze select * from foo join bar using (i) where i=9;
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Nested Loop  (cost=34.26..106.76 rows=200 width=20) (actual time=0.004..0.004 rows=0 loops=1)
   ->  Append  (cost=0.00..68.50 rows=20 width=12) (actual time=0.004..0.004 rows=0 loops=1)
         ->  Seq Scan on foo  (cost=0.00..34.25 rows=10 width=12) (actual time=0.001..0.001 rows=0 loops=1)
               Filter: (i = 9)
         ->  Seq Scan on foo_1 foo  (cost=0.00..34.25 rows=10 width=12) (actual time=0.000..0.000 rows=0 loops=1)
               Filter: (i = 9)
   ->  Materialize  (cost=34.26..34.36 rows=10 width=12) (never executed)
         ->  Seq Scan on bar  (cost=0.00..34.25 rows=10 width=12) (never executed)
               Filter: (i = 9)
 Total runtime: 0.032 ms
(10 rows)

-- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
Re: [PERFORM] partitioning question 1
On Oct 28, 2010, at 12:44 PM, Joshua D. Drake wrote: My tests show you are incorrect: part_test=# explain analyze select * from foo join bar using (i) where i=9; QUERY PLAN: Nested Loop (cost=34.26..106.76 rows=200 width=20) (actual time=0.004..0.004 rows=0 loops=1) -> Append (cost=0.00..68.50 rows=20 width=12) (actual time=0.004..0.004 rows=0 loops=1) -> Seq Scan on foo (cost=0.00..34.25 rows=10 width=12) (actual time=0.001..0.001 rows=0 loops=1) Filter: (i = 9) -> Seq Scan on foo_1 foo (cost=0.00..34.25 rows=10 width=12) (actual time=0.000..0.000 rows=0 loops=1) Filter: (i = 9) -> Materialize (cost=34.26..34.36 rows=10 width=12) (never executed) -> Seq Scan on bar (cost=0.00..34.25 rows=10 width=12) (never executed) Filter: (i = 9) Total runtime: 0.032 ms (10 rows)

strange. my tests don't agree with your tests :

create table foo (i integer not null, j float not null);
create table foo_1 ( check (i >= 0 and i < 10) ) inherits (foo);
create table foo_2 ( check (i >= 10 and i < 20) ) inherits (foo);
create table foo_3 ( check (i >= 20 and i < 30) ) inherits (foo);
create index foo_1_idx on foo_1 (i);
create index foo_2_idx on foo_2 (i);
create index foo_3_idx on foo_3 (i);
insert into foo_1 select generate_series, generate_series from generate_series(0,9);
insert into foo_2 select generate_series, generate_series from generate_series(10,19);
insert into foo_3 select generate_series, generate_series from generate_series(20,29);
create table bar (i integer not null, k float not null);
create index bar_idx on bar (i);
insert into bar select generate_series, -generate_series from generate_series(0,9);
vacuum analyze;

explain analyze select * from foo join bar using (i);
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Hash Join  (cost=1.23..42.29 rows=98 width=20) (actual time=0.056..0.118 rows=10 loops=1)
   Hash Cond: (public.foo.i = bar.i)
   ->  Append  (cost=0.00..32.70 rows=1970 width=12) (actual time=0.008..0.043 rows=30 loops=1)
         ->  Seq Scan on foo  (cost=0.00..29.40 rows=1940 width=12) (actual time=0.001..0.001 rows=0 loops=1)
         ->  Seq Scan on foo_1 foo  (cost=0.00..1.10 rows=10 width=12) (actual time=0.005..0.008 rows=10 loops=1)
         ->  Seq Scan on foo_2 foo  (cost=0.00..1.10 rows=10 width=12) (actual time=0.003..0.006 rows=10 loops=1)
         ->  Seq Scan on foo_3 foo  (cost=0.00..1.10 rows=10 width=12) (actual time=0.003..0.006 rows=10 loops=1)
   ->  Hash  (cost=1.10..1.10 rows=10 width=12) (actual time=0.025..0.025 rows=10 loops=1)
         ->  Seq Scan on bar  (cost=0.00..1.10 rows=10 width=12) (actual time=0.005..0.013 rows=10 loops=1)
 Total runtime: 0.205 ms
(10 rows)

i'm running pg 8.4.3 with constraint_exclusion=on (just to be safe.) best, b
Re: [PERFORM] partitioning question 1
On Thu, 2010-10-28 at 12:59 -0700, Ben wrote: explain analyze select * from foo join bar using (i); vs explain analyze select * from foo join bar using (i) where i=9;
Re: [PERFORM] partitioning question 1
On Thu, 2010-10-28 at 12:59 -0700, Ben wrote: On Oct 28, 2010, at 12:44 PM, Joshua D. Drake wrote: My tests show you are incorrect: part_test=# explain analyze select * from foo join bar using (i) where i=9; QUERY PLAN -- Nested Loop (cost=34.26..106.76 rows=200 width=20) (actual time=0.004..0.004 rows=0 loops=1) - Append (cost=0.00..68.50 rows=20 width=12) (actual time=0.004..0.004 rows=0 loops=1) - Seq Scan on foo (cost=0.00..34.25 rows=10 width=12) (actual time=0.001..0.001 rows=0 loops=1) Filter: (i = 9) - Seq Scan on foo_1 foo (cost=0.00..34.25 rows=10 width=12) (actual time=0.000..0.000 rows=0 loops=1) Filter: (i = 9) - Materialize (cost=34.26..34.36 rows=10 width=12) (never executed) - Seq Scan on bar (cost=0.00..34.25 rows=10 width=12) (never executed) Filter: (i = 9) Total runtime: 0.032 ms (10 rows) strange. my tests don't agree with your tests : Do you have constraint_exclusion turned on? You should verify with show constraint_exclusion (I saw what you wrote below). JD P.S. Blatant plug, you coming to http://www.postgresqlconference.org ? -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] partitioning question 1
whoops, didn't see the i=9 (linebreak! linebreak!) nonetheless that is a static constant constraint on the column i, and i was asking if constraint exclusions would work for dynamic constraints (like those derived from a table joined against.) so for example the bar table has only 0-9 in its histogram for i, but constraint exclusion can't use that to eliminate tables foo_2 and foo_3. this is precisely the kind of information an index can use via join selectivity. i am not going to the pg conference, sorry to say. b On Oct 28, 2010, at 1:48 PM, Joshua D. Drake wrote: On Thu, 2010-10-28 at 12:59 -0700, Ben wrote: On Oct 28, 2010, at 12:44 PM, Joshua D. Drake wrote: My tests show you are incorrect: part_test=# explain analyze select * from foo join bar using (i) where i=9; QUERY PLAN -- Nested Loop (cost=34.26..106.76 rows=200 width=20) (actual time=0.004..0.004 rows=0 loops=1) - Append (cost=0.00..68.50 rows=20 width=12) (actual time=0.004..0.004 rows=0 loops=1) - Seq Scan on foo (cost=0.00..34.25 rows=10 width=12) (actual time=0.001..0.001 rows=0 loops=1) Filter: (i = 9) - Seq Scan on foo_1 foo (cost=0.00..34.25 rows=10 width=12) (actual time=0.000..0.000 rows=0 loops=1) Filter: (i = 9) - Materialize (cost=34.26..34.36 rows=10 width=12) (never executed) - Seq Scan on bar (cost=0.00..34.25 rows=10 width=12) (never executed) Filter: (i = 9) Total runtime: 0.032 ms (10 rows) strange. my tests don't agree with your tests : Do you have constraint_exclusion turned on? You should verify with show constraint_exclusion (I saw what you wrote below). JD P.S. Blatant plug, you coming to http://www.postgresqlconference.org ? 
-- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
Re: [PERFORM] partitioning max() sql not using index
Kevin Kempter wrote: Hi all I have a large table (2 billion rows) that's partitioned by date based on an epoch int value. We're running a select max(id) where id is the PK. I have a PK index on each of the partitions, no indexes at all on the base table. If I hit a partition table directly I get an index scan as expected: The planner isn't smart enough to create the plan you're expecting. There was discussion and even a patch posted recently about that: http://archives.postgresql.org/pgsql-hackers/2009-07/msg01115.php It seems the thread petered out, but the concept seems sane. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [PERFORM] partitioning max() sql not using index
In case you aren't comfortable running unreleased planner patches from pgsql-hackers, a workaround was discussed on this list recently: http://archives.postgresql.org/pgsql-performance/2009-09/msg00036.php On Wed, 09 Sep 2009 06:05:22 -0400, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Kevin Kempter wrote: Hi all I have a large table (2 billion rows) that's partitioned by date based on an epoch int value. We're running a select max(id) where id is the PK. I have a PK index on each of the partitions, no indexes at all on the base table. If I hit a partition table directly I get an index scan as expected: The planner isn't smart enough to create the plan you're expecting. There was discussion and even a patch posted recently about that: http://archives.postgresql.org/pgsql-hackers/2009-07/msg01115.php It seems the thread petered out, but the concept seems sane. -- Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
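The workaround discussed in that thread boils down to taking the max per partition, where each partition's PK index can be scanned backward, and combining the handful of candidate values. A sketch, using partition names from Kevin's example (a real query would need one arm per partition):

```sql
-- Each inner max(id) can use the partition's own PK index;
-- the outer max combines one value per partition.
SELECT max(id) FROM (
  SELECT max(id) AS id FROM pwreport.bigtab_2009_09
  UNION ALL
  SELECT max(id) FROM pwreport.bigtab_2009_07
  -- ... one arm for every remaining partition ...
) AS per_partition;
```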
Re: [PERFORM] partitioning max() sql not using index
On Wednesday 09 September 2009 07:56:53 Kenneth Cox wrote: In case you aren't comfortable running unreleased planner patches from pgsql-hackers, a workaround was discussed on this list recently: http://archives.postgresql.org/pgsql-performance/2009-09/msg00036.php On Wed, 09 Sep 2009 06:05:22 -0400, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Kevin Kempter wrote: Hi all I have a large table (2 billion rows) that's partitioned by date based on an epoch int value. We're running a select max(id) where id is the PK. I have a PK index on each of the partitions, no indexes at all on the base table. If I hit a partition table directly I get an index scan as expected: The planner isn't smart enough to create the plan you're expecting. There was discussion and even a patch posted recently about that: http://archives.postgresql.org/pgsql-hackers/2009-07/msg01115.php It seems the thread petered out, but the concept seems sane. Excellent! thanks, this is quite helpful
[PERFORM] partitioning max() sql not using index
Hi all I have a large table (2 billion rows) that's partitioned by date based on an epoch int value. We're running a select max(id) where id is the PK. I have a PK index on each of the partitions, no indexes at all on the base table. If I hit a partition table directly I get an index scan as expected:

explain select max(id) from pwreport.bigtab_2009_09;
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Result  (cost=0.06..0.07 rows=1 width=0)
   InitPlan
     ->  Limit  (cost=0.00..0.06 rows=1 width=8)
           ->  Index Scan Backward using bigtab_2009_09_pk on bigtab_2009_09  (cost=0.00..12403809.95 rows=205659919 width=8)
                 Filter: (id IS NOT NULL)
(5 rows)

However if I hit the base table I get a sequential scan on every partition as opposed to index scans:

explain select max(id) from pwreport.bigtab;
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Aggregate  (cost=27214318.67..27214318.68 rows=1 width=8)
   ->  Append  (cost=0.00..24477298.53 rows=1094808053 width=8)
         ->  Seq Scan on bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2011_12 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2011_11 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2011_10 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2011_09 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2011_08 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2011_07 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2011_06 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2011_05 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2011_04 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2011_03 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2011_02 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2011_01 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2010_12 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2010_11 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2010_10 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2010_09 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2010_08 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2010_07 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2010_06 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2010_05 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2010_04 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2010_03 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2010_02 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2010_01 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2009_12 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2009_11 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2009_10 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2009_09 bigtab  (cost=0.00..4599227.19 rows=205659919 width=8)
         ->  Seq Scan on bigtab_2009_07 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2009_06 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2009_05 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2009_04 bigtab  (cost=0.00..11.70 rows=170 width=8)
         ->  Seq Scan on bigtab_2009_03 bigtab  (cost=0.00..11.70 rows=170
Re: [PERFORM] Partitioning: INSERT 0 0 but want INSERT 0 1
I have the same problem in PG 8.2. To resolve this issue I had to create a new table with the same structure as the partitioned table, with a trigger for insert and update. All the operations the application has to do are directed to this new table. When a new record is inserted into the new table, the trigger inserts a record with the same values into the partitioned table and then deletes all records from this new table. In update operations the trigger redirects the operation to the partitioned table too. With this _not elegant_ solution our Java application is able to do its job. If you find a better solution please let me know. Neil Peter Braggio [EMAIL PROTECTED] On Tue, May 13, 2008 at 11:48 AM, Nikolas Everett [EMAIL PROTECTED] wrote: I figure this subject belongs on the performance mailing list because it is about partitioning, which is a performance issue. I'm working on partitioning some of the tables used by an application that uses OpenJPA. It turns out that OpenJPA is sensitive to the numbers returned when you do an insert. So I put together a test and attached it. My postgres version is 8.3.1 compiled from source. My problem is that this: test=> INSERT INTO ttt (a, b) VALUES ('5-5-08', 'test11212'); INSERT 0 0 Time: 21.646 ms needs to show: INSERT 0 1 or OpenJPA will not accept it. The insert works, but OpenJPA does not believe it and aborts the current transaction. Is it possible to have partitioning and have insert show the right number of rows inserted? Thanks, --Nik
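For context, the INSERT 0 0 comes from the standard redirect trigger returning NULL: that tells PostgreSQL no row was stored in the parent table itself, so the command tag counts zero rows. A minimal sketch (partition name matching Nik's test table is assumed) of where the count is lost:

```sql
-- Standard pre-native-partitioning redirect trigger: the row goes to the
-- child, and RETURN NULL suppresses the insert into the parent --
-- which is exactly why the command tag reports INSERT 0 0.
CREATE OR REPLACE FUNCTION ttt_insert_redirect() RETURNS trigger AS $$
BEGIN
  INSERT INTO ttt_2008_05 VALUES (NEW.*);
  RETURN NULL;  -- RETURN NEW instead would also store the row in the parent,
                -- giving INSERT 0 1 but duplicating the data
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER ttt_insert BEFORE INSERT ON ttt
  FOR EACH ROW EXECUTE PROCEDURE ttt_insert_redirect();
```

Neil's staging-table workaround sidesteps this by letting the insert into the unpartitioned staging table count normally, then moving the row in the trigger.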
[PERFORM] Partitioning in postgres - basic question
Hi, I am new to postgres having worked with Oracle in the past. I am interested in understanding Postgres's table partition functionality better. Specifically, I have a third party application running against my postgres database, but the database is becoming rather large to maintain. I am thinking about partitioning the biggest table. Would I be able to set up partitioning on this table with it being seamless to the third party app (assuming that it performs pretty standard DML statements against the table in question)? Thanks Tore
Re: [PERFORM] Partitioning in postgres - basic question
Tore Lukashaugen wrote: Hi, I am new to postgres having worked with Oracle in the past. I am interested in understanding Postgres's table partition functionality better. Specifically, I have a third party application running against my postgres database, but the database is becoming rather large to maintain. I am thinking about partitioning the biggest table. Would I be able to set up partitioning on this table with it being seamless to the third party app (assuming that it performs pretty standard DML statements against the table in question)? http://www.postgresql.org/docs/8.2/static/ddl-partitioning.html#DDL-PARTITIONING-IMPLEMENTATION The examples use rules but some on the list have said triggers work better if you have a lot of partitions. -- Postgresql php tutorials http://www.designmagick.com/
Re: [PERFORM] Partitioning
Abu Mushayeed wrote: I have partitioned a table based on period (e.g., cdate >= '2007-01-01'::date and cdate <= '2007-03-31'::date). Now, when I issue a query like cdate >= CURRENT_DATE - 1 and cdate <= CURRENT_DATE, it scans all the partitions. But if I do cdate >= '2007-01-01'::date and cdate <= '2007-03-31'::date it picks the correct partition. Also if I join the cdate field with another table, it does not pick the correct partition. I would like to know if it is possible to pick the correct partition using the above example. From http://www.postgresql.org/docs/8.1/interactive/ddl-partitioning.html: "...For the same reason, stable functions such as CURRENT_DATE must be avoided. Joining the partition key to a column of another table will not be optimized, either." Rigmor Thanks Abu -- Rigmor Ukuhe Finestmedia Ltd | Software Development Team Manager gsm : (+372)56467729 | tel : (+372)6558043 | e-mail : [EMAIL PROTECTED]
[PERFORM] Partitioning
I have partitioned a table based on period (e.g., cdate >= '2007-01-01'::date and cdate <= '2007-03-31'::date). Now, when I issue a query like cdate >= CURRENT_DATE - 1 and cdate <= CURRENT_DATE, it scans all the partitions. But if I do cdate >= '2007-01-01'::date and cdate <= '2007-03-31'::date it picks the correct partition. Also if I join the cdate field with another table, it does not pick the correct partition. I would like to know if it is possible to pick the correct partition using the above example. Thanks Abu
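A common workaround at the time (a hedged sketch, not from the thread; table and function names are illustrative): interpolate the current date as a literal via dynamic SQL, so the planner sees a constant it can prove against the partitions' CHECK constraints at plan time:

```sql
-- Hypothetical plpgsql wrapper around the report query. Because the
-- EXECUTE string is planned at run time with literal dates, constraint
-- exclusion can apply, unlike with CURRENT_DATE in a static query.
CREATE OR REPLACE FUNCTION recent_rows() RETURNS SETOF parted AS $$
DECLARE
    r parted%ROWTYPE;
BEGIN
    FOR r IN EXECUTE
        'SELECT * FROM parted WHERE cdate >= '''
        || (CURRENT_DATE - 1) || ''' AND cdate <= ''' || CURRENT_DATE || ''''
    LOOP
        RETURN NEXT r;
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;
```

The same effect can be had by building the query string with literal dates in the application instead of in plpgsql.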
Re: [PERFORM] Partitioning
Can anybody help me out? I just wanted to know what the configuration settings would be for a partitioned table, so as to make inserts faster on the partitioned tables. Well, that depends. Many questions are important here. Will you insert into several partitions or only into a single one? Do you need to enforce some constraint between several partitioned tables? If you need to insert into several partitions, it can be faster as you can place them on different drives. If you need to insert only into the last one (that's usually the case with 'logging' tables) then this probably won't give a huge performance benefit. If you need to enforce some kind of constraint between multiple partitions (possibly from several tables), you'll have to do that manually using a plpgsql procedure (for example). This is the case with a UNIQUE constraint on a single table, a FOREIGN KEY between multiple partitioned tables, etc. This can mean a serious performance penalty, especially when you do mostly insert/update on that table. This is mostly about application architecture - if you use partitions incorrectly it's almost impossible to fix that by changing settings in postgresql.conf. Tomas
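A sketch of the manual cross-partition UNIQUE check Tomas mentions (table and column names are illustrative; this is an assumption about how one might do it, not code from the thread). A per-child unique index catches duplicates within one partition; a trigger that probes through the parent catches duplicates across all of them:

```sql
-- Hypothetical plpgsql uniqueness guard across all children of "logs".
CREATE OR REPLACE FUNCTION logs_check_unique() RETURNS trigger AS $$
BEGIN
    -- Querying the parent scans the parent and every child partition.
    PERFORM 1 FROM logs WHERE id = NEW.id;
    IF FOUND THEN
        RAISE EXCEPTION 'duplicate id % across partitions', NEW.id;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Create this trigger on each child partition:
CREATE TRIGGER logs_check_unique
    BEFORE INSERT ON logs_2007_01
    FOR EACH ROW EXECUTE PROCEDURE logs_check_unique();
```

Note this is not race-safe without additional locking between concurrent inserters, which is part of the performance penalty Tomas describes.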
[PERFORM] Partitioning
Can anybody help me out? I just wanted to know what the configuration settings would be for a partitioned table, so as to make inserts faster on the partitioned tables. -- Regards Gauri
Re: [PERFORM] Partitioning
On Wed, 10 Jan 2007 15:30:16 -0600, Scott Marlowe [EMAIL PROTECTED] wrote: [...] And I don't think the mysql partition supports tablespaces either. MySQL supports distributing partitions over multiple disks via the SUBPARTITION clause [1]. I leave it to you whether their syntax is cleaner, more powerful or easier ;) Bernd [1] http://dev.mysql.com/doc/refman/5.1/en/partitioning-subpartitions.html
Re: [PERFORM] Partitioning
On Fri, Jan 05, 2007 at 12:47:08PM +0100, Mikael Carneholm wrote: Take a look at the set of partitioning functions I wrote shortly after the 8.1 release: http://www.studenter.hb.se/~arch/files/part_functions.sql You could probably work something out using those functions (as-is, or as inspiration) together with pgAgent (http://www.pgadmin.org/docs/1.4/pgagent.html) /Mikael Those are some great functions. Well, they're less than optimal in one aspect: they add one rule per partition, making them unsuitable for OLTP type applications (actually: any application where insert performance is crucial). Someone with time and/or energy could probably fix that, I guess...patches are welcome :) /Mikael
Re: [PERFORM] Partitioning
Well - whether or not MySQL's implementation of partitioning has some deficiency, it sure is a lot easier to set up than PostgreSQL. And I don't think there is any technical reason that setting up partitioning on Postgres couldn't be very easy and still be robust. On Thu, 11 Jan 2007 13:59:20 +0100, Mikael Carneholm [EMAIL PROTECTED] said: On Fri, Jan 05, 2007 at 12:47:08PM +0100, Mikael Carneholm wrote: Take a look at the set of partitioning functions I wrote shortly after the 8.1 release: http://www.studenter.hb.se/~arch/files/part_functions.sql You could probably work something out using those functions (as-is, or as inspiration) together with pgAgent (http://www.pgadmin.org/docs/1.4/pgagent.html) /Mikael Those are some great functions. Well, they're less than optimal in one aspect: they add one rule per partition, making them unsuitable for OLTP type applications (actually: any application where insert performance is crucial). Someone with time and/or energy could probably fix that, I guess...patches are welcome :) /Mikael
Re: [PERFORM] Partitioning
Each partition can have its own disk, without using subpartitions.

CREATE TABLE th (id INT, name VARCHAR(30), adate DATE)
PARTITION BY LIST(YEAR(adate)) (
  PARTITION p1999 VALUES IN (1995, 1999, 2003)
    DATA DIRECTORY = '/var/appdata/95/data'
    INDEX DIRECTORY = '/var/appdata/95/idx',
  PARTITION p2000 VALUES IN (1996, 2000, 2004)
    DATA DIRECTORY = '/var/appdata/96/data'
    INDEX DIRECTORY = '/var/appdata/96/idx',
  PARTITION p2001 VALUES IN (1997, 2001, 2005)
    DATA DIRECTORY = '/var/appdata/97/data'
    INDEX DIRECTORY = '/var/appdata/97/idx',
  PARTITION p2002 VALUES IN (1998, 2002, 2006)
    DATA DIRECTORY = '/var/appdata/98/data'
    INDEX DIRECTORY = '/var/appdata/98/idx'
);

Subpartitions are just a way to break (parent) partitions up into smaller pieces. Those of course can be moved to other disks just like the main partitions. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Bernd Helmle Sent: Thursday, January 11, 2007 6:51 AM To: Scott Marlowe Cc: Jim C. Nasby; Jeremy Haile; pgsql-performance@postgresql.org Subject: Re: [PERFORM] Partitioning On Wed, 10 Jan 2007 15:30:16 -0600, Scott Marlowe [EMAIL PROTECTED] wrote: [...] And I don't think the mysql partition supports tablespaces either. MySQL supports distributing partitions over multiple disks via the SUBPARTITION clause [1]. I leave it to you whether their syntax is cleaner, more powerful or easier ;) Bernd [1] http://dev.mysql.com/doc/refman/5.1/en/partitioning-subpartitions.html
Re: [PERFORM] Partitioning
On Thu, 11 Jan 2007 08:18:39 -0600, Adam Rich [EMAIL PROTECTED] wrote: Subpartitions are just a way to break (parent) partitions up into smaller pieces. Those of course can be moved to other disks just like the main partitions. Ah, didn't know that (i just wondered why i need a subpartition to change the location of a partition). Thanks for your clarification... Bernd
Re: [PERFORM] Partitioning
BTW, someone coming up with a set of functions to handle partitioning for the general 'partition by time' case would make a GREAT project on pgFoundry. On Fri, Jan 05, 2007 at 12:47:08PM +0100, Mikael Carneholm wrote: Take a look at the set of partitioning functions I wrote shortly after the 8.1 release: http://www.studenter.hb.se/~arch/files/part_functions.sql You could probably work something out using those functions (as-is, or as inspiration) together with pgAgent (http://www.pgadmin.org/docs/1.4/pgagent.html) /Mikael -Original Message- From: [EMAIL PROTECTED] [mailto:pgsql-performance- [EMAIL PROTECTED] On Behalf Of Arnau Sent: den 5 januari 2007 12:02 To: pgsql-performance@postgresql.org Subject: [PERFORM] Partitioning Hi all, I'm not sure if this question fits in the topic of this list. I'm interested in partitioning and it's the first time I'd use it. There is an issue I don't know how you handle it. Lets say I'm interested in store monthly based statistical data like the example of http://www.postgresql.org/docs/8.2/static/ddl-partitioning.html. What I don't like of this approach is that the monthly tables, rules... must be created manually or at least I haven't found any other option. My question is how do you manage this? do you have a cron task that creates automatically these monthly elements (tables, rules, ... ) or there is another approach that doesn't require external things like cron only PostgreSQL. -- Arnau -- Jim Nasby [EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Re: [PERFORM] Partitioning
On Fri, Jan 05, 2007 at 12:47:08PM +0100, Mikael Carneholm wrote: Take a look at the set of partitioning functions I wrote shortly after the 8.1 release: http://www.studenter.hb.se/~arch/files/part_functions.sql You could probably work something out using those functions (as-is, or as inspiration) together with pgAgent (http://www.pgadmin.org/docs/1.4/pgagent.html) /Mikael Those are some great functions. -- erik jones [EMAIL PROTECTED] software development emma(r)
Re: [PERFORM] Partitioning
I really wish that PostgreSQL supported a nice partitioning syntax like MySQL has. Here is an example:

CREATE TABLE tr (id INT, name VARCHAR(50), purchased DATE)
PARTITION BY RANGE( YEAR(purchased) ) (
  PARTITION p0 VALUES LESS THAN (1990),
  PARTITION p1 VALUES LESS THAN (1995),
  PARTITION p2 VALUES LESS THAN (2000),
  PARTITION p3 VALUES LESS THAN (2005)
);

And to drop a partition:

ALTER TABLE tr DROP PARTITION p2;

This seems so much more intuitive and simpler than what is required to set it up in PostgreSQL. Does PostgreSQL's approach to table partitioning have any advantage over MySQL? Is a nicer syntax planned for Postgres? On Wed, 10 Jan 2007 14:20:06 -0600, Jim C. Nasby [EMAIL PROTECTED] said: BTW, someone coming up with a set of functions to handle partitioning for the general 'partition by time' case would make a GREAT project on pgFoundry. On Fri, Jan 05, 2007 at 12:47:08PM +0100, Mikael Carneholm wrote: Take a look at the set of partitioning functions I wrote shortly after the 8.1 release: http://www.studenter.hb.se/~arch/files/part_functions.sql You could probably work something out using those functions (as-is, or as inspiration) together with pgAgent (http://www.pgadmin.org/docs/1.4/pgagent.html) /Mikael -Original Message- From: [EMAIL PROTECTED] [mailto:pgsql-performance- [EMAIL PROTECTED] On Behalf Of Arnau Sent: den 5 januari 2007 12:02 To: pgsql-performance@postgresql.org Subject: [PERFORM] Partitioning Hi all, I'm not sure if this question fits in the topic of this list. I'm interested in partitioning and it's the first time I'd use it. There is an issue I don't know how you handle it. Lets say I'm interested in store monthly based statistical data like the example of http://www.postgresql.org/docs/8.2/static/ddl-partitioning.html. What I don't like of this approach is that the monthly tables, rules... must be created manually or at least I haven't found any other option. My question is how do you manage this?
do you have a cron task that creates automatically these monthly elements (tables, rules, ... ) or there is another approach that doesn't require external things like cron only PostgreSQL. -- Arnau -- Jim Nasby [EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
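For comparison, the closest hand-rolled equivalent of the MySQL example in PostgreSQL of that era is roughly the following (a sketch based on the inheritance pattern in the 8.2 documentation; partition boundaries mirror the MySQL example above):

```sql
CREATE TABLE tr (id INT, name VARCHAR(50), purchased DATE);

CREATE TABLE tr_p0 (CHECK (purchased < DATE '1990-01-01')) INHERITS (tr);
CREATE TABLE tr_p1 (CHECK (purchased >= DATE '1990-01-01'
                       AND purchased < DATE '1995-01-01')) INHERITS (tr);
CREATE TABLE tr_p2 (CHECK (purchased >= DATE '1995-01-01'
                       AND purchased < DATE '2000-01-01')) INHERITS (tr);
CREATE TABLE tr_p3 (CHECK (purchased >= DATE '2000-01-01'
                       AND purchased < DATE '2005-01-01')) INHERITS (tr);

-- Routing trigger: send each row to the matching child.
CREATE OR REPLACE FUNCTION tr_insert() RETURNS trigger AS $$
BEGIN
    IF NEW.purchased < DATE '1990-01-01' THEN
        INSERT INTO tr_p0 VALUES (NEW.*);
    ELSIF NEW.purchased < DATE '1995-01-01' THEN
        INSERT INTO tr_p1 VALUES (NEW.*);
    ELSIF NEW.purchased < DATE '2000-01-01' THEN
        INSERT INTO tr_p2 VALUES (NEW.*);
    ELSE
        INSERT INTO tr_p3 VALUES (NEW.*);
    END IF;
    RETURN NULL;  -- row handled; do not also store it in the parent
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER tr_insert BEFORE INSERT ON tr
    FOR EACH ROW EXECUTE PROCEDURE tr_insert();

-- The equivalent of ALTER TABLE tr DROP PARTITION p2 is simply:
DROP TABLE tr_p2;
```

The verbosity difference is exactly what this subthread is about: the CHECK constraints, trigger, and routing logic must all be kept in sync by hand.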
Re: [PERFORM] Partitioning
On Wed, Jan 10, 2007 at 03:28:00PM -0500, Jeremy Haile wrote: This seems so much more intuitive and simpler than what is required to set it up in PostgreSQL. Does PostgreSQL's approach to table partitioning have any advantage over MySQL? Is a nicer syntax planned for Postgres? The focus was to get the base functionality working, and working correctly. Another consideration is that there's multiple ways to accomplish the partitioning; exposing the basic functionality without enforcing a given interface provides more flexibility (ie: it appears that you can't do list partitioning with MySQL, while you can with PostgreSQL). -- Jim Nasby [EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Re: [PERFORM] Partitioning
You can do list partitioning in MySQL: http://dev.mysql.com/doc/refman/5.1/en/partitioning-list.html My comment was not meant as a criticism of PostgreSQL's current state - I'm glad that it has partitioning. I'm simply wondering if there are any plans of adopting a more user-friendly syntax in the future similar to MySQL partitioning support. Having first-class citizen support of partitions would also allow some nice administrative GUIs and views to be built for managing them. Jeremy Haile On Wed, 10 Jan 2007 15:09:31 -0600, Jim C. Nasby [EMAIL PROTECTED] said: On Wed, Jan 10, 2007 at 03:28:00PM -0500, Jeremy Haile wrote: This seems so much more intuitive and simpler than what is required to set it up in PostgreSQL. Does PostgreSQL's approach to table partitioning have any advantage over MySQL? Is a nicer syntax planned for Postgres? The focus was to get the base functionality working, and working correctly. Another consideration is that there's multiple ways to accomplish the partitioning; exposing the basic functionality without enforcing a given interface provides more flexibility (ie: it appears that you can't do list partitioning with MySQL, while you can with PostgreSQL). -- Jim Nasby [EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Re: [PERFORM] Partitioning
On Wed, 2007-01-10 at 15:15, Jeremy Haile wrote: You can do list partitioning in MySQL: http://dev.mysql.com/doc/refman/5.1/en/partitioning-list.html My comment was not meant as a criticism of PostgreSQL's current state - I'm glad that it has partitioning. I'm simply wondering if there are any plans of adopting a more user-friendly syntax in the future similar to MySQL partitioning support. Having first-class citizen support of partitions would also allow some nice administrative GUIs and views to be built for managing them. I don't think anyone took it as a negative criticism. Jim and I were both more pointing out that the development process of the two projects is somewhat different. In MySQL a small group that doesn't necessarily interact with a large user community sets out to implement a feature in a given time line with a given set of requirements, and they tend to ignore what they see as esoteric requirements. In PostgreSQL a large development community that communicates fairly well with its large user community puts somewhat of the onus of proving the need and doing the initial proof of concept on those who say they need a feature, often working in a method where the chief hackers lend a hand to someone who wants the feature so they can get a proof of concept up and running. An example would be the auditing / time travel in the contrib/spi project. After several iterations, and given the chance to learn from the mistakes of the previous incarnations, something often rises out of that to produce the feature needed. Generally speaking the PostgreSQL method takes longer, making life harder today, but produces cleaner, more easily maintained solutions, making life easier in the future. Meanwhile the MySQL method works faster, making life easier today, but makes compromises that might make life harder in the future. Something that embodies that difference is the table handler philosophy of both databases.
PostgreSQL has the abstraction to have more than one table handler, but in practice has exactly one table handler. MySQL has the ability to have many table handlers, and in fact uses many of them. With PostgreSQL this means that things like query parsing / execution and the table handler are tightly coupled. This results in things like transactional DDL. Sometimes this results in suggestions being dismissed out of hand because they would have unintended consequences. In MySQL, because of the multiple table handlers, many compromises in query parsing have to be made. The most common one being that you can define constraints / foreign keys in a column item, and they will simply be ignored with no error or notice. The fk constraints have to go at the end of the column list to be parsed and executed. So, partitioning, being something that will touch a lot of parts of the database, isn't gonna just show up one afternoon in pgsql. It will likely take a few people making proof of concept versions before a consensus is reached and someone who has the ability codes it up.
[PERFORM] Partitioning
Hi all, I'm not sure if this question fits in the topic of this list. I'm interested in partitioning and it's the first time I'd use it. There is an issue I don't know how you handle. Let's say I'm interested in storing monthly based statistical data like the example of http://www.postgresql.org/docs/8.2/static/ddl-partitioning.html. What I don't like about this approach is that the monthly tables, rules... must be created manually, or at least I haven't found any other option. My question is how do you manage this? Do you have a cron task that automatically creates these monthly elements (tables, rules, ...), or is there another approach that doesn't require external things like cron, only PostgreSQL? -- Arnau
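One common answer (a hedged sketch, not from the thread; table and column names are illustrative): a plpgsql function built on dynamic SQL that creates next month's child table, driven by cron or pgAgent:

```sql
-- Hypothetical helper: create one monthly partition of "parent",
-- with a range CHECK on a cdate column and a matching index.
CREATE OR REPLACE FUNCTION make_monthly_partition(parent text, month date)
RETURNS void AS $$
DECLARE
    start_d date := date_trunc('month', month);
    end_d   date := start_d + interval '1 month';
    child   text := parent || '_' || to_char(start_d, 'YYYYMM');
BEGIN
    EXECUTE 'CREATE TABLE ' || child
         || ' (CHECK (cdate >= DATE ''' || start_d
         || ''' AND cdate < DATE ''' || end_d
         || ''')) INHERITS (' || parent || ')';
    EXECUTE 'CREATE INDEX ' || child || '_cdate_idx ON '
         || child || ' (cdate)';
END;
$$ LANGUAGE plpgsql;

-- e.g. from a nightly cron / pgAgent job:
-- SELECT make_monthly_partition('stats', CURRENT_DATE + interval '1 month');
```

Any insert rules or routing trigger would still need to be refreshed alongside the new child, which is exactly the manual bookkeeping Arnau is asking about.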
Re: [PERFORM] Partitioning
Take a look at the set of partitioning functions I wrote shortly after the 8.1 release: http://www.studenter.hb.se/~arch/files/part_functions.sql You could probably work something out using those functions (as-is, or as inspiration) together with pgAgent (http://www.pgadmin.org/docs/1.4/pgagent.html) /Mikael -Original Message- From: [EMAIL PROTECTED] [mailto:pgsql-performance- [EMAIL PROTECTED] On Behalf Of Arnau Sent: den 5 januari 2007 12:02 To: pgsql-performance@postgresql.org Subject: [PERFORM] Partitioning Hi all, I'm not sure if this question fits in the topic of this list. I'm interested in partitioning and it's the first time I'd use it. There is an issue I don't know how you handle it. Lets say I'm interested in store monthly based statistical data like the example of http://www.postgresql.org/docs/8.2/static/ddl-partitioning.html. What I don't like of this approach is that the monthly tables, rules... must be created manually or at least I haven't found any other option. My question is how do you manage this? do you have a cron task that creates automatically these monthly elements (tables, rules, ... ) or there is another approach that doesn't require external things like cron only PostgreSQL. -- Arnau
Re: [PERFORM] Partitioning / constrain exlusion not working with %-operator
Tom Lane [EMAIL PROTECTED] writes: It's usually better to use partitioning rules that have something to do with the WHERE-clauses you'd be using anyway. For instance, try to partition on ranges. I agree and tried to create new partitioned tables. But now I ran into some other performance-related trouble when inserting (parts of) the old (unpartitioned) table into the new one:

CREATE TABLE t_unparted (id1 int, cont varchar);
-- Populate table with 1000 records with id1 from 1 to 1000 and ANALYZE
CREATE TABLE t_parted (id1 int, cont varchar);
CREATE TABLE t_parted_000 (check (id1 >= 0 AND id1 < 100)) INHERITS (t_parted);
CREATE RULE ins_000 AS ON INSERT TO t_parted WHERE id1 >= 0 AND id1 < 100 DO INSTEAD INSERT INTO t_parted_000 VALUES (new.*);
-- ... 8 more tables + 8 more rules
CREATE TABLE t_parted_900 (check (id1 >= 900 AND id1 < 1000)) INHERITS (t_parted);
CREATE RULE ins_900 AS ON INSERT TO t_parted WHERE id1 >= 900 AND id1 < 1000 DO INSTEAD INSERT INTO t_parted_900 VALUES (new.*);

And now:

EXPLAIN INSERT INTO t_parted SELECT * FROM t_parted WHERE id1 >= 0 AND id1 < 100;

Result (cost=0.00..170.80 rows=12 width=36)
  ->  Append (cost=0.00..170.80 rows=12 width=36)
        ->  Seq Scan on t_parted (cost=0.00..85.40 rows=6 width=36)
              Filter: ((id1 >= 0) AND (id1 < 100) AND (((id1 >= 0) AND (id1 < 100)) IS NOT TRUE) AND (((id1 >= 100) AND (id1 < 200)) IS NOT TRUE) AND (((id1 >= 200) AND (id1 < 300)) IS NOT TRUE) AND (((id1 >= 300) AND (id1 < 400)) IS NOT TRUE) AND (((id1 >= 400) AND (id1 < 500)) IS NOT TRUE) AND (((id1 >= 500) AND (id1 < 600)) IS NOT TRUE) AND (((id1 >= 600) AND (id1 < 700)) IS NOT TRUE) AND (((id1 >= 700) AND (id1 < 800)) IS NOT TRUE) AND (((id1 >= 800) AND (id1 < 900)) IS NOT TRUE) AND (((id1 >= 900) AND (id1 < 1000)) IS NOT TRUE))
        ->  Seq Scan on t_parted_000 t_parted (cost=0.00..85.40 rows=6 width=36)
              Filter: ((id1 >= 0) AND (id1 < 100) AND (((id1 >= 0) AND (id1 < 100)) IS NOT TRUE) AND (((id1 >= 100) AND (id1 < 200)) IS NOT TRUE) AND (((id1 >= 200) AND (id1 < 300)) IS NOT TRUE) AND (((id1 >= 300) AND (id1 < 400)) IS NOT TRUE) AND (((id1 >= 400) AND (id1 < 500)) IS NOT TRUE) AND (((id1 >= 500) AND (id1 < 600)) IS NOT TRUE) AND (((id1 >= 600) AND (id1 < 700)) IS NOT TRUE) AND (((id1 >= 700) AND (id1 < 800)) IS NOT TRUE) AND (((id1 >= 800) AND (id1 < 900)) IS NOT TRUE) AND (((id1 >= 900) AND (id1 < 1000)) IS NOT TRUE))
Result (cost=0.00..66.40 rows=12 width=36)
  ->  Append (cost=0.00..66.40 rows=12 width=36)
        ->  Seq Scan on t_parted (cost=0.00..33.20 rows=6 width=36)
              Filter: ((id1 >= 0) AND (id1 < 100) AND (id1 >= 0) AND (id1 < 100))
        ->  Seq Scan on t_parted_000 t_parted (cost=0.00..33.20 rows=6 width=36)
              Filter: ((id1 >= 0) AND (id1 < 100) AND (id1 >= 0) AND (id1 < 100))
...
Result (cost=0.00..33.20 rows=6 width=36)
  ->  Append (cost=0.00..33.20 rows=6 width=36)
        ->  Seq Scan on t_parted (cost=0.00..33.20 rows=6 width=36)
              Filter: ((id1 >= 0) AND (id1 < 100) AND (id1 >= 900) AND (id1 < 1000))
(58 rows)

The filters appended by the planner do not make any sense and cost too much time if the old table is huge. (constraint_exclusion was ON.) Is there a better way to partition an existing table with a large number of rows (100 million)? TIA, Martin
[PERFORM] Partitioning / constrain exlusion not working with %-operator
I try to partition a large table (~120 million rows) into 50 smaller tables, but using the IMO immutable %-function constraint exclusion does not work as expected:

CREATE TABLE tt_m (id1 int, cont varchar);
CREATE TABLE tt_0 (check (id1 % 50 = 0)) INHERITS (tt_m);
CREATE TABLE tt_1 (check (id1 % 50 = 1)) INHERITS (tt_m);
CREATE RULE ins_tt_0 AS ON INSERT TO tt_m WHERE id1 % 50 = 0 DO INSTEAD INSERT INTO tt_0 VALUES (new.*);
CREATE RULE ins_tt_1 AS ON INSERT TO tt_m WHERE id1 % 50 = 1 DO INSTEAD INSERT INTO tt_1 VALUES (new.*);
...
INSERT INTO tt_m (id1,cont) VALUES (0,'Test1');
INSERT INTO tt_m (id1,cont) VALUES (1,'Test2');

EXPLAIN SELECT * FROM tt_m WHERE id1=1;

QUERY PLAN
---
Result (cost=0.00..73.50 rows=18 width=36)
  ->  Append (cost=0.00..73.50 rows=18 width=36)
        ->  Seq Scan on tt_m (cost=0.00..24.50 rows=6 width=36)
              Filter: (id1 = 1)
        ->  Seq Scan on tt_0 tt_m (cost=0.00..24.50 rows=6 width=36)
              Filter: (id1 = 1)
        ->  Seq Scan on tt_1 tt_m (cost=0.00..24.50 rows=6 width=36)
              Filter: (id1 = 1)
...

Only adding an explicit %-call to the query results in the expected plan:

EXPLAIN SELECT * FROM tt_m WHERE id1=1 AND id1 % 50 = 1;

QUERY PLAN
---
Result (cost=0.00..60.60 rows=2 width=36)
  ->  Append (cost=0.00..60.60 rows=2 width=36)
        ->  Seq Scan on tt_m (cost=0.00..30.30 rows=1 width=36)
              Filter: ((id1 = 1) AND ((id1 % 50) = 1))
        ->  Seq Scan on tt_1 tt_m (cost=0.00..30.30 rows=1 width=36)
              Filter: ((id1 = 1) AND ((id1 % 50) = 1))

Did I miss something and/or how could I force the planner to use constraint exclusion without adding the explicit second condition above? TIA, Martin
Re: [PERFORM] Partitioning / constrain exlusion not working with %-operator
Martin Lesser [EMAIL PROTECTED] writes: I try to partition a large table (~120 million rows) into 50 smaller tables but using the IMO immutable %-function constraint exclusion does not work as expected: The constraint exclusion mechanism is not as bright as you think. There are some very limited cases where it can make a deduction that a WHERE clause implies a CHECK constraint that's not an exact textual equivalent ... but all those cases have to do with related b-tree operators, and % is not one. It's usually better to use partitioning rules that have something to do with the WHERE-clauses you'd be using anyway. For instance, try to partition on ranges of id1 instead of id1 % 50. That works because the CHECK clauses will be like id1 >= x AND id1 < y, and those operators are btree-related to the id1 = z clauses you'll have in the query. regards, tom lane
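Tom's suggestion in DDL form (a sketch; the range boundaries are illustrative). With range CHECKs, an equality predicate like id1 = 42 can be proven against the constraints, because =, >=, and < all belong to the same b-tree operator family:

```sql
SET constraint_exclusion = on;

CREATE TABLE tt_m (id1 int, cont varchar);
CREATE TABLE tt_0 (CHECK (id1 >= 0       AND id1 < 1000000)) INHERITS (tt_m);
CREATE TABLE tt_1 (CHECK (id1 >= 1000000 AND id1 < 2000000)) INHERITS (tt_m);

-- EXPLAIN SELECT * FROM tt_m WHERE id1 = 42;
-- should now scan only tt_m and tt_0; tt_1 is excluded at plan time,
-- with no explicit % condition needed in the query.
```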
Re: [PERFORM] partitioning
On Sun, Mar 19, 2006 at 01:31:42PM +0100, Antoine wrote: Hi, Is there any work on the cards for implementing other partitioning strategies? I see mysql 5.1 will have support for hashes and stuff but didn't see anything in the todos for postgres. You'd have to provide a pretty convincing argument for providing hash partitioning, I think. I can't really think of any real-world scenarios where it's better than other forms. In any case, the next logical step on the partitioning front is to add some 'syntactic sugar' to make it easier for people to work with partitions. I seem to remember some discussion about that, but I don't recall where it led. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
[PERFORM] partitioning
Hi, Is there any work on the cards for implementing other partitioning strategies? I see mysql 5.1 will have support for hashes and stuff but didn't see anything in the todos for postgres. Cheers Antoine -- This is where I should put some witty comment.
Re: [PERFORM] partitioning and locking problems
Hi, Marc, Marc Morin wrote: 1- long running report is running on view 2- continuous inserters into view into a table via a rule 3- truncate or rule change occurs, taking an exclusive lock. Must wait for #1 to finish. 4- new reports and inserters must now wait for #3. 5- now everyone is waiting for a single query in #1. Results in loss of insert data granularity (important for our application). Apart from having two separate views (one for report, one for insert) as Richard suggested: If you have fixed times for #3, don't start any #1 that won't finish before it's time for #3. You could also use the LOCK command on an empty lock table at the beginning of each #1 or #3 transaction to prevent #3 from getting the view lock before #1 is finished. HTH, Markus -- Markus Schaber | Logical Tracking & Tracing International AG Dipl. Inf. | Software Development GIS Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
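Markus's empty-lock-table idea as a sketch (the table name and lock modes here are assumptions about one reasonable setup, not from the thread):

```sql
-- An empty table used purely as a lock object; it never holds rows.
CREATE TABLE report_truncate_lock ();

-- Long-running report (#1): SHARE locks don't conflict with each other,
-- so any number of reports can run concurrently.
BEGIN;
LOCK TABLE report_truncate_lock IN SHARE MODE;
-- ... run the report against the view ...
COMMIT;

-- Truncate / rule change (#3): ACCESS EXCLUSIVE conflicts with SHARE,
-- so this waits here until running reports finish, *before* touching
-- the view and taking its exclusive lock.
BEGIN;
LOCK TABLE report_truncate_lock IN ACCESS EXCLUSIVE MODE;
-- TRUNCATE some_partition;  -- or CREATE OR REPLACE RULE ...
COMMIT;
```

Inserters simply never touch report_truncate_lock, which is how Marc's follow-up keeps them from being stalled.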
Re: [PERFORM] partitioning and locking problems
Using a separate lock table is what we've decided to do in this particular case to serialize #1 and #3. Inserters don't take this lock and as such will not be stalled.
Re: [PERFORM] partitioning and locking problems
Marc Morin wrote:
> Under both these circumstances (truncate and create / replace rule) the locking behaviour of these commands can cause locking problems for us. The scenario is best illustrated as a series of steps:
> 1- long running report is running on view
> 2- continuous inserters into view into a table via a rule
> 3- truncate or rule change occurs, taking an exclusive lock. Must wait for #1 to finish.
> 4- new reports and inserters must now wait for #3.
> 5- now everyone is waiting for a single query in #1. Results in loss of insert data granularity (important for our application).

How much would you get from splitting the view into two: reporting and inserting?
--
Richard Huxton Archonet Ltd
Re: [PERFORM] partitioning and locking problems
Tom,

Do you mean it would be impossible to change the code so that existing selects continue to use the pre-truncated table until they commit? Or would it just require a more extensive change? The update/insert rule change appears to be more doable, no?
Re: [PERFORM] partitioning and locking problems
Marc Morin [EMAIL PROTECTED] writes:
> Do you mean it would be impossible to change the code so that existing selects continue to use the pre-truncated table until they commit?

Yes, because that table won't exist any more (as in, the file has been unlinked) once the TRUNCATE commits.

> The update/insert rule change appears to be more doable, no?

You've still got race conditions there: do onlooker transactions see the old set of rules, or the new set, or some unholy mixture? Removing the lock as you suggest would make it possible for the rule rewriter to pick up non-self-consistent data from the system catalogs, leading to arbitrarily bad behavior ... if you're lucky, it'll just crash; if you're not lucky, the incorrect rule will do a fandango on your data.

regards, tom lane
[PERFORM] partitioning and locking problems
We have a large database system designed around partitioning. Our application is characterized by:
- terabytes of data
- billions of rows in dozens of base tables (and 100s of partitions)
- 24x7 insert load of new data that cannot be stopped; data is time sensitive
- periodic reports that can have long running queries, with query times measured in hours

We have 2 classes of maintenance activities that are causing us problems:
- periodically we need to change an insert rule on a view to point to a different partition.
- periodically we need to delete data that is no longer needed. Performed via truncate.

Under both these circumstances (truncate and create / replace rule) the locking behaviour of these commands can cause locking problems for us. The scenario is best illustrated as a series of steps:

1- long running report is running on view
2- continuous inserters into view into a table via a rule
3- truncate or rule change occurs, taking an exclusive lock. Must wait for #1 to finish.
4- new reports and inserters must now wait for #3.
5- now everyone is waiting for a single query in #1. Results in loss of insert data granularity (important for our application).

We would like to understand the implications of changing postgres' code/locking so that rule changes and truncate do not require locking out select statements. The following is a simplified schema to help illustrate the problem:

create table a_1 ( pkey int primary key );
create table a_2 ( pkey int primary key );
create view a as select * from a_1 union all select * from a_2;

create function change_rule(int) returns void as '
begin
    execute ''create or replace rule insert as on insert to a
              do instead insert into a_''||$1||''(pkey) values(NEW.pkey)'';
end;
' language plpgsql;

-- change rule, execute something like the following periodically
select change_rule(1);

We've looked at the code and the rule changes appear easy, but we are concerned about the required changes for truncate.
Thanks Marc
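A minimal sketch of the maintenance cycle against the simplified schema above (which partition gets truncated, and when, is hypothetical; this is exactly the step that takes the problematic exclusive locks):

```sql
-- Point the insert rule at the other partition, then reclaim the old one.
SELECT change_rule(2);   -- new inserts now land in a_2

-- Reclaim a_1. TRUNCATE takes an ACCESS EXCLUSIVE lock on a_1, and the
-- CREATE OR REPLACE RULE above takes an exclusive lock on the view "a",
-- so both queue behind any long-running report still reading the view.
TRUNCATE TABLE a_1;
```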
Re: [PERFORM] partitioning and locking problems
Marc Morin [EMAIL PROTECTED] writes:
> Would like to understand the implications of changing postgres' code/locking for rule changes and truncate to not require locking out select statements?

It won't work...

regards, tom lane
[PERFORM] partitioning
Hi, I've been working on trying to partition a big table (I've never partitioned a table in any other database till now). Everything went ok, except one query that didn't work afterwards. I've put the partition description, indexes, etc., and the explain plan attached. The query is extremely fast without partitioning (index scan backwards on the primary key). The query is:

select * from logs order by id desc limit 100;

id is the primary key. It is indexed on all partitions. But the explain plan does a full table scan on all partitions. While I think I understand why it is choosing this plan right now, is there something that could be done to optimize this case? Or put a warning in the docs about this kind of behaviour. I guess normally someone would partition to get faster queries :) Anyway, I thought I should mention this, as it has been quite a surprise.

CREATE TABLE logs_150 (CHECK ( id_machine = 150)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_163 (CHECK ( id_machine = 163)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_289 (CHECK ( id_machine = 289)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_319 (CHECK ( id_machine = 319)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_238 (CHECK ( id_machine = 238)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_148 (CHECK ( id_machine = 148)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_176 (CHECK ( id_machine = 176)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_164 (CHECK ( id_machine = 164)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_316 (CHECK ( id_machine = 316)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_313 (CHECK ( id_machine = 313)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_217 (CHECK ( id_machine = 217)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_167 (CHECK ( id_machine = 167)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_287 (CHECK ( id_machine = 287)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_279 (CHECK ( id_machine = 279)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_248 (CHECK ( id_machine = 248)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_280 (CHECK ( id_machine = 280)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_165 (CHECK ( id_machine = 165)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_330 (CHECK ( id_machine = 330)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_149 (CHECK ( id_machine = 149)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_157 (CHECK ( id_machine = 157)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_272 (CHECK ( id_machine = 272)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_311 (CHECK ( id_machine = 311)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_309 (CHECK ( id_machine = 309)) INHERITS (logs) TABLESPACE data_logs;
CREATE TABLE logs_318 (CHECK ( id_machine = 318)) INHERITS (logs) TABLESPACE data_logs;

CREATE UNIQUE INDEX logs_150_pkey ON logs_150 (id) TABLESPACE index_logs;
CREATE INDEX date_150 ON logs_150 (date) TABLESPACE index_logs;
CREATE INDEX event_150 ON logs_150 (evenement) TABLESPACE index_logs;
.
logs=> explain select * from logs order by id desc limit 100;
                                   QUERY PLAN
 Limit  (cost=16524647.29..16524647.54 rows=100 width=295)
   ->  Sort  (cost=16524647.29..16568367.11 rows=17487927 width=295)
         Sort Key: public.logs.id
         ->  Result  (cost=0.00..827622.27 rows=17487927 width=295)
               ->  Append  (cost=0.00..827622.27 rows=17487927 width=295)
                     ->  Seq Scan on logs  (cost=0.00..826232.78 rows=17451978 width=165)
                     ->  Seq Scan on logs_150 logs  (cost=0.00..199.04 rows=6104 width=144)
                     ->  Seq Scan on logs_163 logs  (cost=0.00..261.79 rows=7079 width=169)
                     ->  Seq Scan on logs_289 logs  (cost=0.00..428.93 rows=10693 width=200)
                     ->  Seq Scan on logs_319 logs  (cost=0.00..31.92 rows=992 width=129)
                     ->  Seq Scan on logs_238 logs  (cost=0.00..28.01 rows=701 width=199)
                     ->  Seq Scan on logs_148 logs  (cost=0.00..80.15 rows=2015 width=195)
                     ->  Seq Scan on logs_176 logs  (cost=0.00..12.40 rows=240 width=295)
                     ->  Seq Scan on logs_164 logs  (cost=0.00..12.40 rows=240 width=295)
                     ->  Seq Scan on logs_316 logs  (cost=0.00..12.40 rows=240 width=295)
                     ->  Seq Scan on logs_313 logs  (cost=0.00..12.40 rows=240 width=295)
                     ->  Seq Scan on logs_217 logs  (cost=0.00..12.40 rows=240 width=295)
                     ->  Seq Scan on logs_167 logs  (cost=0.00..57.36 rows=1536 width=170)
                     ->  Seq Scan on logs_287 logs  (cost=0.00..12.40
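For what it's worth, on PostgreSQL versions of this era (before 9.1 introduced Merge Append) the usual workaround is to push the ORDER BY ... LIMIT down into each child by hand, so every partition can use its own index; a sketch against the schema above, showing only two of the partitions:

```sql
-- Take the top 100 from each partition (index scan backwards on each
-- child's pkey index), then re-sort the small candidate set.
SELECT * FROM (
    (SELECT * FROM logs_150 ORDER BY id DESC LIMIT 100)
    UNION ALL
    (SELECT * FROM logs_163 ORDER BY id DESC LIMIT 100)
    -- ... one branch per partition, plus one for the parent "logs"
    --     table if it holds rows of its own, as it does here ...
) AS candidates
ORDER BY id DESC
LIMIT 100;
```

This trades a longer query text for turning 17M-row seq scans into a handful of 100-row index scans.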
Re: [PERFORM] partitioning
Did you set constraint_exclusion = true in the postgresql.conf file?
--
Regards Pandu
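For reference, constraint exclusion is off by default and only kicks in when the query's WHERE clause can be compared against the children's CHECK constraints; a minimal sketch against the schema in this thread:

```sql
-- In postgresql.conf: constraint_exclusion = on
-- or, per session:
SET constraint_exclusion = on;

-- With a predicate on the CHECK column, the planner can prove most
-- partitions irrelevant and skip them entirely:
EXPLAIN SELECT * FROM logs
WHERE id_machine = 150
ORDER BY id DESC LIMIT 100;
```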
Re: [PERFORM] partitioning
I just saw that there is no where clause in the query that you fed to explain. You need to include a where clause based on the id_machine column to see the effect.
--
Regards Pandu
Re: [PERFORM] partitioning
Yes, that's how I solved it... and I totally agree that it's hard for the planner to guess what to do on the partitions. But maybe there should be something in the docs explaining the limitations... I'm only asking for the biggest 100 ids from the table, so I thought maybe the planner would take the 100 biggest from all partitions or something like that and return me the 100 biggest from those results. It didn't, and that's quite logical. What I meant is that I understand why the planner chooses this plan, but maybe it should be written somewhere in the docs that some plans will be worse after partitioning.

Le Mardi 13 Décembre 2005 12:50, vous avez écrit :
> I just saw that there is no where clause in the query, that you had fed to explain plan. You need to include a where clause based on the id_machine column to see the effect.
Re: [PERFORM] Partitioning / Clustering
> If you make the assertion that you are transferring equal or less session data between your session server (lets say an RDBMS) and the app server than you are between the app server and the client, an out of band 100Mb network for session information is plenty of bandwidth.

So if you count on a mean page size of 6-8 kbytes gzipped, that will prevent you from caching the N first results of the Big Slow Search Query in a native object in the user session state (say, a list of integers indicating which rows match), so you will have to redo the Big Slow Search Query every time the user clicks on Next Page instead of grabbing a set of cached row ids and doing a fast SELECT WHERE id IN ... This is the worst case... I'd gzip() the row ids and stuff them in the session; that's always better than blowing up the database with the Big Slow Search Query every time someone does Next Page...

> This also represents OLTP style traffic, which postgresql is pretty good at. You should easily be able to get over 100Tps. 100 hits per second is an awful lot of traffic, more than any website I've managed will ever see.

On the latest anandtech benchmarks, 100 hits per second on blog/forum software is a big bi-opteron server running dotNET at 99% load... it's a lot if you count only dynamic page hits.
Re: [PERFORM] Partitioning / Clustering
On 11 May 2005, at 23:35, PFC wrote:
>> However, memcached (and for us, pg_memcached) is an excellent way to improve horizontal scalability by taking disposable data (like session information) out of the database and putting it in protected RAM.
> So, what is the advantage of such a system versus, say, a sticky sessions system where each session is assigned to ONE application server (not PHP then) which keeps it in RAM as native objects instead of serializing and deserializing it on each request? I'd say the sticky sessions should perform a lot better, and if one machine dies, only the sessions on this one are lost. But of course you can't do it with PHP as you need an app server which can manage sessions. Potentially the savings are huge, though.

There's no reason it couldn't be done with PHP, to be fair, as long as you could ensure that the client was always routed back to the same machines. Which has its own set of issues entirely. I am not entirely sure that memcached actually does serialize data when it's committed into memcached either, although I could be wrong; I have not looked at the source.

Certainly if you can ensure that a client always goes back to the same machine you can simplify the whole thing hugely. It's generally not that easy though; you need a proxy server of some description capable of understanding the HTTP traffic and maintaining a central session lookup table to redirect with. Which isn't really solving the problem so much as moving it somewhere else. Instead of needing huge memcached pools, you need hardcore load balancers. Load balancers tend to cost $ in comparison. Distributed sticky sessions are a rather nice idea; I would like to hear a way of implementing them cheaply (and on PHP) as well. I may have to give that some thought in fact. Oh yeah, and load balancer software often sucks in annoying (if not always important) ways.

> On Google, their distributed system spans a huge number of PCs and it has redundancy, ie. individual PC failure is a normal thing and is a part of the system, it is handled gracefully. I read a paper on this matter, it's pretty impressive. The google filesystem has nothing to do with databases though, it's more a massive data store / streaming storage.

Since when did massive data stores have nothing to do with DBs? Isn't Oracle Cluster entirely based on forming an enormous scalable disk array to store your DB on?
Re: [PERFORM] Partitioning / Clustering
> I am not entirely sure that memcached actually does serialize data when it's committed into memcached either, although I could be wrong, I have not looked at the source.

I think it does, ie. it's a simple mapping of [string key] => [string value].

> Certainly if you can ensure that a client always goes back to the same machine you can simplify the whole thing hugely. It's generally not that easy though, you need a proxy server of some description capable of understanding the HTTP traffic and maintaining a central session lookup table.

Yes... You could implement it by mapping servers to the hash of the user session id. Statistically, the servers would get the same number of sessions on each of them, but you have to trust statistics... It does eliminate the lookup table though.

> Distributed sticky sessions are a rather nice idea, I would like to hear a way of implementing them cheaply (and on PHP) as well. Oh yeah, and load balancer software often sucks in annoying (if not always important) ways.

You can use lighttpd as a load balancer; I believe it has a sticky sessions plugin (or you could code one in, it's open source after all). It definitely supports simple round-robin load balancing, acting as a proxy to any number of independent servers.

> Since when did Massive Data stores have nothing to do with DBs? Isn't Oracle Cluster entirely based on forming an enormous scalable disk array to store your DB on?

Um, well, the Google Filesystem is (like its name implies) a filesystem designed to store huge files in a distributed and redundant manner. Files are structured as a stream of records (which are themselves big in size) and it's designed to support appending records to these stream files efficiently and without worrying about locking. It has no querying features, however, which is why I said it was not a database. I wish I could find the whitepaper; I think the URL was on this list some day, maybe it's on Google's site?
Re: [PERFORM] Partitioning / Clustering
Having local sessions is unnecessary, and here is my logic:

Generally most people have less than 100Mb of bandwidth to the internet. If you make the assertion that you are transferring equal or less session data between your session server (let's say an RDBMS) and the app server than you are between the app server and the client, an out of band 100Mb network for session information is plenty of bandwidth. This also represents OLTP style traffic, which postgresql is pretty good at. You should easily be able to get over 100Tps. 100 hits per second is an awful lot of traffic, more than any website I've managed will ever see.

Why solve the complicated clustered sessions problem when you don't really need to?

Alex Turner
netEconomist

On 5/11/05, PFC [EMAIL PROTECTED] wrote:
> However, memcached (and for us, pg_memcached) is an excellent way to improve horizontal scalability by taking disposable data (like session information) out of the database and putting it in protected RAM. So, what is the advantage of such a system versus, say, a sticky sessions system where each session is assigned to ONE application server (not PHP then) which keeps it in RAM as native objects instead of serializing and deserializing it on each request?
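As a sketch of the out-of-band session store being proposed here (the table and column names are hypothetical; the point is that each page hit becomes one tiny OLTP transaction, well within the quoted 100Tps):

```sql
-- Minimal session store on the dedicated session server.
CREATE TABLE sessions (
    sid     text PRIMARY KEY,             -- session cookie value
    data    bytea,                        -- e.g. gzip'd list of matching row ids
    updated timestamp NOT NULL DEFAULT now()
);

-- One small round trip per page hit:
--   read:  SELECT data FROM sessions WHERE sid = '...';
--   write: UPDATE sessions SET data = '...', updated = now() WHERE sid = '...';
```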
Re: [PERFORM] Partitioning / Clustering
On 12 May 2005, at 15:08, Alex Turner wrote:
> Having local sessions is unnecessary, and here is my logic: Generally most people have less than 100Mb of bandwidth to the internet. ... Why solve the complicated clustered sessions problem, when you don't really need to?

100 hits a second = 8,640,000 hits a day. I work on a site which does 100 million dynamic pages a day. In comparison Yahoo probably does 100,000,000,000 (100 billion) views a day if I am interpreting Alexa's charts correctly. Which is about 1,150,000 a second. Now considering the site I work on is not even in the top 1000 on Alexa, there are a lot of sites out there which need to solve this problem, I would assume. There are also only so many hash table lookups a single machine can do, even if it's a Quad Opteron behemoth.

Alex Turner
netEconomist
Re: [PERFORM] Partitioning / Clustering
Ok - my common sense alarm is going off here... There are only 6.446 billion people worldwide. 100 billion page views would require every person in the world to view 18 pages of Yahoo every day. Not very likely. http://www.internetworldstats.com/stats.htm suggests that there are around 1 billion people actually on the internet. That means each and every person on the internet has to view 100 pages per day of Yahoo. Pretty unlikely IMHO. I for one don't even use Yahoo ;)

100 million page views per day suggests that 1 in 100 people on the internet each viewed 10 pages of a site. That's a pretty high percentage if you ask me. If I visit 20 web sites in a day, and see an average of 10 pages per site, that means only about 2000 or so sites generate 100 million page views in a day or better.

100 million pageviews averages to 1157/sec, which we'll double for peak load to 2314. I can easily see a system doing 2314 hash lookups per second. Hell, I wrote a system that could do a thousand times that four years ago on a single 1GHz Athlon. Heck - you can get 2314 lookups/sec on a 486 ;)

Given that session information doesn't _have_ to persist to storage and can be kept in RAM, a single server could readily manage session information for even very large sites (of course, over a million concurrent users could really start chewing into RAM, but if you are Yahoo, you can probably afford a box with 100GB of RAM ;). We get over 1000 tps on a dual opteron with a couple of mid size RAID arrays on 10k discs with fsync on for small transactions. I'm sure that could easily be bettered with a few more dollars.

Maybe my numbers are off, but somehow it doesn't seem like that many people need a highly complex session solution to me.

Alex Turner
netEconomist

On 5/12/05, Alex Stapleton [EMAIL PROTECTED] wrote:
> 100 hits a second = 8,640,000 hits a day. I work on a site which does 100 million dynamic pages a day. In comparison Yahoo probably does 100,000,000,000 (100 billion) views a day if I am interpreting Alexa's charts correctly. Which is about 1,150,000 a second. There are also only so many hash table lookups a single machine can do, even if its a Quad Opteron behemoth.
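The arithmetic in the exchange above checks out against 86,400 seconds per day (the 2x peak factor is the poster's own assumption):

```latex
\frac{100{,}000{,}000\ \text{views/day}}{86{,}400\ \text{s/day}} \approx 1157\ \text{views/s},
\qquad
\frac{100 \times 10^{9}\ \text{views/day}}{86{,}400\ \text{s/day}} \approx 1{,}157{,}000\ \text{views/s}.
```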
Re: [PERFORM] Partitioning / Clustering
Alex Turner wrote:
> Ok - my common sense alarm is going off here... There are only 6.446 billion people worldwide. 100 Billion page views would require every person in the world to view 18 pages of yahoo every day. Not very likely. ... 100 million page views per day suggests that 1 in 100 people on the internet each viewed 10 pages of a site. Thats a pretty high percentage if you ask me.

In general I think your point is valid. Just remember that it probably also matters how you count page views. Because technically images are a separate page (and this thread did discuss serving up images). So if there are 20 graphics on a specific page, that is 20 server hits just for that one page. I could easily see an image-heavy site getting 100 hits / page. Which starts meaning that if 1M users hit 10 pages, then you get 1M*10*100 = 1G. I still think 100G views on a single website is a lot, but 100M is certainly possible.

John =:->
Re: [PERFORM] Partitioning / Clustering
100 hits a second = 8,640,000 hits a day. I work on a site which does 100 million dynamic pages a day. In comparison Yahoo probably does 100,000,000,000 (100 billion) views a day if I am interpreting Alexa's charts correctly. Which is about 1,150,000 a second. Read the help on Alexa's site... ;)
Re: [PERFORM] Partitioning / Clustering
People, In general I think your point is valid. Just remember that it probably also matters how you count page views. Because technically images are a separate page (and this thread did discuss serving up images). So if there are 20 graphics on a specific page, that is 20 server hits just for that one page. Also, there's bots and screen-scrapers and RSS, web e-mails, and web services and many other things which create hits but are not people. I'm currently working on clickstream for a site which is nowhere in the top 100, and is getting 3 million real hits a day ... and we know for a fact that at least 1/4 of that is bots. Regardless, the strategy you should be employing for a high-traffic site is that if your users hit the database for anything other than direct interaction (like filling out a webform) then you're lost. Use memcached, squid, lighttpd caching, ASP.NET caching, pools, etc. Keep the load off the database except for the stuff that only the database can do. -- Josh Berkus Aglio Database Solutions San Francisco
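The "keep the load off the database" strategy above is the classic cache-aside pattern. A minimal sketch, with a dict standing in for memcached and a stub standing in for a real query:

```python
# Cache-aside sketch: hit the cache first, fall back to the database
# only on a miss.  The dict and the fetch function are stand-ins for
# memcached and an actual expensive query.
cache = {}
db_hits = 0

def query_db(key):            # stand-in for the expensive DB round trip
    global db_hits
    db_hits += 1
    return f"row-for-{key}"

def get(key):
    if key not in cache:      # miss: do the expensive work once
        cache[key] = query_db(key)
    return cache[key]         # every later request is served from RAM

get("user:42"); get("user:42"); get("user:42")
print(db_hits)   # -> 1: the database saw one query for three requests
```

Real memcached additionally needs expiry and explicit invalidation on writes, which is exactly why the thread restricts this to disposable data.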
Re: [PERFORM] Partitioning / Clustering
On 12 May 2005, at 18:33, Josh Berkus wrote: People, In general I think your point is valid. Just remember that it probably also matters how you count page views. Because technically images are a separate page (and this thread did discuss serving up images). So if there are 20 graphics on a specific page, that is 20 server hits just for that one page. Also, there's bots and screen-scrapers and RSS, web e-mails, and web services and many other things which create hits but are not people. I'm currently working on clickstream for a site which is nowhere in the top 100, and is getting 3 million real hits a day ... and we know for a fact that at least 1/4 of that is bots. I doubt bots are generally Alexa-toolbar enabled. Regardless, the strategy you should be employing for a high-traffic site is that if your users hit the database for anything other than direct interaction (like filling out a webform) then you're lost. Use memcached, squid, lighttpd caching, ASP.NET caching, pools, etc. Keep the load off the database except for the stuff that only the database can do. This is the approach I would take as well. There is no point storing stuff in a DB if you're only doing direct lookups on it and it isn't the sort of data that you care so much about the integrity of. -- Josh Berkus Aglio Database Solutions San Francisco
Re: [PERFORM] Partitioning / Clustering
Ross, Memcached is a PG memory store, I gather, Nope. It's a hyperfast resident-in-memory hash that allows you to stash stuff like user session information and even materialized query set results. Thanks to SeanC, we even have a plugin, pgmemcached. but...what is squid, lighttpd? anything directly PG-related? No. These are all related to making the web server do more. The idea is NOT to hit the database every time you have to serve up a web page, and possibly not to hit the web server either. For example, you can use squid 3 for reverse caching in front of your web server, and serve far more page views than you could with Apache alone. -- --Josh Josh Berkus Aglio Database Solutions San Francisco
Re: [PERFORM] Partitioning / Clustering
On Tue, 2005-05-10 at 11:03 +0100, Alex Stapleton wrote: So, when/is PG meant to be getting a decent partitioning system? ISTM that your question seems to confuse where code comes from. Without meaning to pick on you, or reply rudely, I'd like to explore that question. Perhaps it should be a FAQ entry. All code is written by someone, and those people need to eat. Some people are fully or partly funded to perform their tasks on this project (coding, patching, etc). Others contribute their time for a variety of reasons where involvement has a positive benefit. You should ask these questions: - Is anyone currently working on (Feature X)? - If not, can I do it myself? - If not, and I still want it, can I fund someone else to build it for me? Asking "when is Feature X going to happen?" is almost certainly going to get the answer "never" otherwise, if the initial development is large and complex. There are many TODO items that have lain untouched for years, even though adding the feature has been discussed and agreed. Best Regards, Simon Riggs
Re: [PERFORM] Partitioning / Clustering
For an interesting look at scalability, clustering, caching, etc for a large site, have a look at how LiveJournal did it. http://www.danga.com/words/2004_lisa/lisa04.pdf They have 2.6 million active users, posting 200 new blog entries per minute, plus many comments and countless page views. Although this system is of a different sort to the type I work on, it's interesting to see how they've made it scale. They use MySQL on Dell hardware! And found single-master replication did not scale. There's a section on multimaster replication; not sure if they use it. The main approach they use is to partition users into specific database clusters. Caching is done using memcached at the application level to avoid hitting the db for rendered pageviews. It's interesting that the solution LiveJournal have arrived at is quite similar in ways to the way Google is set up. David
Re: [PERFORM] Partitioning / Clustering
On 11 May 2005, at 08:57, David Roussel wrote: For an interesting look at scalability, clustering, caching, etc for a large site have a look at how LiveJournal did it. http://www.danga.com/words/2004_lisa/lisa04.pdf I have implemented similar systems in the past; it's a pretty good technique. Unfortunately it's not very plug-and-play, as you have to base most of your API on memcached (I imagine MySQL's NDB tables might work as well, actually) for it to work well. They have 2.6 million active users, posting 200 new blog entries per minute, plus many comments and countless page views. Although this system is of a different sort to the type I work on it's interesting to see how they've made it scale. They use MySQL on Dell hardware! And found single-master replication did not scale. There's a section on multimaster replication, not sure if they use it. The main approach they use is to partition users into specific database clusters. Caching is done using memcached at the application level to avoid hitting the db for rendered pageviews I don't think they are storing pre-rendered pages (or bits of) in memcached, but are principally storing the data for the pages in it. Gluing pages together is not a hugely intensive process usually :) The only problem with memcached is that the client's clustering/partitioning system will probably break if a node dies, and probably get confused if you add new nodes onto it as well. Easily extensible clustering (no complete redistribution of data required when you add/remove nodes) with the data distributed across nodes seems to be nothing but a pipe dream right now. It's interesting that the solution LiveJournal have arrived at is quite similar in ways to the way Google is set up. Don't Google use indexing servers which keep track of where data is?
So that you only need to update them when you add or move data; deletes don't even have to be propagated among indexes immediately, really, because you'll find out if data isn't there when you visit where it should be. Or am I talking crap? David
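The "easily extensible clustering" wished for above (adding or removing nodes without redistributing everything) is essentially consistent hashing, which was already known at the time even if the early memcached clients didn't use it. A minimal sketch, assuming a hash ring with virtual nodes; everything here is illustrative, not a real client:

```python
import hashlib
from bisect import bisect

def h(s):
    """Map a string to a point on the hash ring."""
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    """Consistent-hash ring: each node owns many virtual points, and a
    key belongs to the first node point clockwise from its hash."""
    def __init__(self, nodes, vnodes=64):
        self._ring = sorted(
            (h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
    def node_for(self, key):
        points = [p for p, _ in self._ring]
        i = bisect(points, h(key)) % len(self._ring)
        return self._ring[i][1]

before = Ring(["a", "b", "c"])
after = Ring(["a", "b", "c", "d"])       # one node added
keys = [f"key{i}" for i in range(1000)]
moved = sum(before.node_for(k) != after.node_for(k) for k in keys)
print(moved)   # only the keys the new node takes over move (~1/4 of them);
               # a naive hash(key) % N rehash would move ~3/4 of all keys
```

So the redistribution problem is not quite a pipe dream; it just wasn't solved in the clients being discussed.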
Re: [PERFORM] Partitioning / Clustering
On 11 May 2005, at 09:50, Alex Stapleton wrote: On 11 May 2005, at 08:57, David Roussel wrote: For an interesting look at scalability, clustering, caching, etc for a large site have a look at how LiveJournal did it. http://www.danga.com/words/2004_lisa/lisa04.pdf I have implemented similar systems in the past; it's a pretty good technique. Unfortunately it's not very plug-and-play, as you have to base most of your API on memcached (I imagine MySQL's NDB tables might work as well, actually) for it to work well. They have 2.6 million active users, posting 200 new blog entries per minute, plus many comments and countless page views. Although this system is of a different sort to the type I work on it's interesting to see how they've made it scale. They use MySQL on Dell hardware! And found single-master replication did not scale. There's a section on multimaster replication, not sure if they use it. The main approach they use is to partition users into specific database clusters. Caching is done using memcached at the application level to avoid hitting the db for rendered pageviews I don't think they are storing pre-rendered pages (or bits of) in memcached, but are principally storing the data for the pages in it. Gluing pages together is not a hugely intensive process usually :) The only problem with memcached is that the client's clustering/partitioning system will probably break if a node dies, and probably get confused if you add new nodes onto it as well. Easily extensible clustering (no complete redistribution of data required when you add/remove nodes) with the data distributed across nodes seems to be nothing but a pipe dream right now. It's interesting that the solution LiveJournal have arrived at is quite similar in ways to the way Google is set up. Don't Google use indexing servers which keep track of where data is?
So that you only need to update them when you add or move data; deletes don't even have to be propagated among indexes immediately, really, because you'll find out if data isn't there when you visit where it should be. Or am I talking crap? That will teach me to RTFA first ;) OK, so LJ maintain an index of which cluster each user is on, kind of like Google do :) David
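The "index of which cluster each user is on" that LJ maintain is directory-based partitioning: a small global map from user to cluster, so moving a user means copying their rows and flipping one directory entry. A toy sketch (dicts stand in for the global directory DB and the per-cluster databases; all names are made up):

```python
# Directory-based partitioning, LiveJournal style.
directory = {}              # user -> cluster id (a small global lookup table)
clusters = {0: {}, 1: {}}   # stand-ins for independent database clusters

def assign(user, cluster_id):
    directory[user] = cluster_id

def write(user, key, value):
    clusters[directory[user]][(user, key)] = value   # route by directory

def read(user, key):
    return clusters[directory[user]].get((user, key))

assign("frank", 0)
write("frank", "entry:1", "first post")
# Rebalancing: copy the user's rows to the new cluster, then flip the
# directory entry -- no other user's data moves at all.
clusters[1][("frank", "entry:1")] = clusters[0].pop(("frank", "entry:1"))
assign("frank", 1)
print(read("frank", "entry:1"))   # -> first post
```

The directory adds one extra lookup per request, which is why it is kept tiny and heavily cached.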
Re: [PERFORM] Partitioning / Clustering
Acceptable answers to 'So, when/is PG meant to be getting a decent partitioning system?': 1. Person X is working on it, I believe. 2. It's on the list, but nobody has done anything about it yet. 3. You're welcome to take a stab at it; I expect the community would support your efforts as well. 4. If you have a huge pile of money you could probably buy the Moon. Thinking along those lines, you can probably pay someone to write it for you. 5. It's a stupid idea, it's never going to work, and here's why... Unacceptable answers to the same question: 1. Yours. Be more helpful, and less arrogant please. Everyone else who has contributed to this thread has been very helpful in clarifying the state of affairs and pointing out what work is and isn't being done, and alternatives to just waiting for PG to do it for you. Please YOU be more helpful and less arrogant. I thought your initial email was arrogant, demanding and insulting. Your followup email has done nothing to dispel my impression. Simon (one of PostgreSQL's major contributors AND one of the very few people working on partitioning in PostgreSQL, as you requested) told you all the reasons clearly and politely. Chris
Re: [PERFORM] Partitioning / Clustering
Mischa Sandberg [EMAIL PROTECTED] writes: So, simplicity dictates something like: table pg_remote(schemaname text, connectby text, remoteschema text) Previous discussion of this sort of thing concluded that we wanted to follow the SQL-MED standard. regards, tom lane
Re: [PERFORM] Partitioning / Clustering
David, It's interesting that the solution LiveJournal have arrived at is quite similar in ways to the way Google is set up. Yes, although again, they're using memcached as pseudo-clustering software, and as a result are limited to what fits in RAM (RAM on 27 machines, but it's still RAM). And due to limitations on memcached, the whole thing blows whenever a server goes out (the memcached project is working on this). But any LJ user could tell you that it's a low-availability system. However, memcached (and for us, pg_memcached) is an excellent way to improve horizontal scalability by taking disposable data (like session information) out of the database and putting it in protected RAM. On some websites, adding memcached can result in as much as a 60% decrease in database traffic. -- --Josh Josh Berkus Aglio Database Solutions San Francisco
Re: [PERFORM] Partitioning / Clustering
On Wed, 2005-05-11 at 17:13 +0800, Christopher Kings-Lynne wrote: Alex Stapleton wrote Be more helpful, and less arrogant please. Simon told you all the reasons clearly and politely. Thanks Chris for your comments. PostgreSQL can always do with one more developer and my sole intent was to encourage Alex and other readers to act themselves. If my words seem arrogant, then I apologise to any and all that think so. Best Regards, Simon Riggs
Re: [PERFORM] Partitioning / Clustering
On Wed, May 11, 2005 at 08:57:57AM +0100, David Roussel wrote: For an interesting look at scalability, clustering, caching, etc for a large site have a look at how LiveJournal did it. http://www.danga.com/words/2004_lisa/lisa04.pdf They have 2.6 million active users, posting 200 new blog entries per minute, plus many comments and countless page views. Neither of which is that horribly impressive. 200 TPM is less than 4 TPS. While I haven't run high-transaction-rate databases under PostgreSQL, I suspect others who have will say that 4 TPS isn't that big of a deal. Although this system is of a different sort to the type I work on it's interesting to see how they've made it scale. They use MySQL on Dell hardware! And found single-master replication did not scale. There's a section on multimaster replication, not sure if Probably didn't scale because they used to use MyISAM. they use it. The main approach they use is to partition users into specific database clusters. Caching is done using memcached at the Which means they've got a huge amount of additional code complexity, not to mention how many times you can't post something because 'that cluster is down for maintenance'. application level to avoid hitting the db for rendered pageviews. Memcached is about the only good thing I've seen come out of LiveJournal. It's interesting that the solution LiveJournal have arrived at is quite similar in ways to the way Google is set up. Except that unlike LJ, Google stays up and it's fast. Though granted, LJ is quite a bit faster than it was 6 months ago. -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what?
Re: [PERFORM] Partitioning / Clustering
However, memcached (and for us, pg_memcached) is an excellent way to improve horizontal scalability by taking disposable data (like session information) out of the database and putting it in protected RAM. So, what is the advantage of such a system versus, say, a sticky sessions system where each session is assigned to ONE application server (not PHP then) which keeps it in RAM as native objects instead of serializing and deserializing it on each request ? I'd say the sticky sessions should perform a lot better, and if one machine dies, only the sessions on this one are lost. But of course you can't do it with PHP as you need an app server which can manage sessions. Potentially the savings are huge, though. On Google, their distributed system spans a huge number of PCs and it has redundancy, ie. individual PC failure is a normal thing and is a part of the system, it is handled gracefully. I read a paper on this matter, it's pretty impressive. The google filesystem has nothing to do with databases though, it's more a massive data store / streaming storage.
Re: [PERFORM] Partitioning / Clustering
Alex Stapleton wrote: What is the status of Postgres support for any sort of multi-machine scaling support? What are you meant to do once you've upgraded your box and tuned the conf files as much as you can? But your query load is just too high for a single machine? Upgrading stock Dell boxes (I know we could be using better machines, but I am trying to tackle the real issue) is not a hugely price-efficient way of getting extra performance, nor particularly scalable in the long term. Switch from Dell Xeon boxes, and go to Opterons. :) Seriously, Dell is far away from Big Iron. I don't know what performance you are looking for, but you can easily get into inserting 10M rows/day with quality hardware. But actually, is it your SELECT load that is too high, or your INSERT load, or something in between? Because Slony is around if it is a SELECT problem. http://gborg.postgresql.org/project/slony1/projdisplay.php Basically, Slony is a Master/Slave replication system. So if you have INSERT going into the Master, you can have as many replicated slaves, which can handle your SELECT load. Slony is an asynchronous replicator, so there is a time delay from the INSERT until it will show up on a slave, but that time could be pretty small. This would require some application-level support, since an INSERT goes to a different place than a SELECT. But there has been some discussion about pg_pool being able to spread the query load, and having it be aware of the difference between a SELECT and an INSERT and have it route the query to the correct host. The biggest problem being that functions could cause a SELECT func() to actually insert a row, which pg_pool wouldn't know about. There are 2 possible solutions: a) don't do that when you are using this system, b) add some sort of comment hint so that pg_pool can understand that the select is actually an INSERT, and needs to be done on the master. So, when/is PG meant to be getting a decent partitioning system?
MySQL is getting one (eventually) which is apparently meant to be similar to Oracle's according to the docs. Clusgres does not appear to be widely/or at all used, and info on it seems pretty thin on the ground, so I am not too keen on going with that. Is the real solution to multi-machine partitioning (as in, not like MySQL's MERGE tables) on PostgreSQL actually doing it in our application API? This seems like a less than perfect solution once we want to add redundancy and things into the mix. There is also PGCluster http://pgfoundry.org/projects/pgcluster/ Which is trying to be more of a synchronous multi-master system. I haven't heard of Clusgres, so I'm guessing it is an older attempt, which has been overtaken by pgcluster. Just realize that clusters don't necessarily scale like you would want them to. Because at some point you have to insert into the same table, which means you need to hold a lock which prevents the other machine from doing anything. And with synchronous replication, you have to wait for all of the machines to get a copy of the data before you can say it has been committed, which does *not* scale well with the number of machines. If you can make it work, I think having a powerful master server, who can finish an INSERT quickly, and then having a bunch of Slony slaves with a middleman (like pg_pool) to do load balancing among them, is the best way to scale up. There are still some requirements, like not having to see the results of an INSERT instantly (though if you are using hinting to pg_pool, you could hint that this query must be done on the master, realizing that the more you do it, the more you slow everything down). John =:- PS I don't know what functionality has been actually implemented in pg_pool, just that it was discussed in the past. Slony-II is also in the works.
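The read/write splitting and comment-hint idea John describes can be sketched very simply. This is not how pg_pool itself works internally, just an illustration of the routing rule, with made-up host names and a made-up `/*master*/` hint syntax:

```python
# Sketch of SELECT/INSERT routing with a comment hint for the
# "SELECT func() that actually writes" problem discussed above.
import itertools

MASTER = "master"
SLAVES = itertools.cycle(["slave1", "slave2"])   # round-robin the read load

def route(sql):
    stmt = sql.lstrip()
    hinted = "/*master*/" in stmt     # hint: this SELECT has side effects
    if stmt.upper().startswith("SELECT") and not hinted:
        return next(SLAVES)           # plain reads go to a replicated slave
    return MASTER                     # writes and hinted selects go to master

print(route("SELECT * FROM users"))            # -> slave1
print(route("INSERT INTO users VALUES (1)"))   # -> master
print(route("/*master*/ SELECT log_hit(42)"))  # -> master
```

The hard part, as the thread notes, is not the routing itself but knowing which statements secretly write; hence the hint.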
Re: [PERFORM] Partitioning / Clustering
I think that perhaps he was trying to avoid having to buy Big Iron at all. With all the Opteron v. Xeon around here, and talk of $30,000 machines, perhaps it would be worth exploring the option of buying 10 cheapass machines for $300 each. At the moment, that $300 buys you, from Dell, a 2.5GHz Pentium 4 w/ 256MB of RAM and a 40GB hard drive and gigabit ethernet. The aggregate CPU and bandwidth is pretty stupendous, but not as easy to harness as a single machine. For those of us looking at batch and data warehousing applications, it would be really handy to be able to partition databases, tables, and processing load across banks of cheap hardware. Yes, clustering solutions can distribute the data, and can even do it on a per-table basis in some cases. This still leaves it up to the application's logic to handle reunification of the data. Ideas: 1. Create a table/storage type that consists of a select statement on another machine. While I don't think the current executor is capable of working on multiple nodes of an execution tree at the same time, it would be great if it could offload a select of tuples from a remote table to an entirely different server and merge the resulting data into the current execution. I believe MySQL has this, and Oracle may implement it in another way. 2. There is no #2 at this time, but I'm sure one can be hypothesized. ...Google and other companies have definitely proved that one can harness huge clusters of cheap hardware. It can't be _that_ hard, can it? :) -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of John A Meinel Sent: Tuesday, May 10, 2005 7:41 AM To: Alex Stapleton Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Partitioning / Clustering Alex Stapleton wrote: What is the status of Postgres support for any sort of multi-machine scaling support? What are you meant to do once you've upgraded your box and tuned the conf files as much as you can?
But your query load is just too high for a single machine? Upgrading stock Dell boxes (I know we could be using better machines, but I am trying to tackle the real issue) is not a hugely price-efficient way of getting extra performance, nor particularly scalable in the long term. Switch from Dell Xeon boxes, and go to Opterons. :) Seriously, Dell is far away from Big Iron. I don't know what performance you are looking for, but you can easily get into inserting 10M rows/day with quality hardware. But actually, is it your SELECT load that is too high, or your INSERT load, or something in between? Because Slony is around if it is a SELECT problem. http://gborg.postgresql.org/project/slony1/projdisplay.php Basically, Slony is a Master/Slave replication system. So if you have INSERT going into the Master, you can have as many replicated slaves, which can handle your SELECT load. Slony is an asynchronous replicator, so there is a time delay from the INSERT until it will show up on a slave, but that time could be pretty small. This would require some application-level support, since an INSERT goes to a different place than a SELECT. But there has been some discussion about pg_pool being able to spread the query load, and having it be aware of the difference between a SELECT and an INSERT and have it route the query to the correct host. The biggest problem being that functions could cause a SELECT func() to actually insert a row, which pg_pool wouldn't know about. There are 2 possible solutions: a) don't do that when you are using this system, b) add some sort of comment hint so that pg_pool can understand that the select is actually an INSERT, and needs to be done on the master. So, when/is PG meant to be getting a decent partitioning system? MySQL is getting one (eventually) which is apparently meant to be similar to Oracle's according to the docs.
Clusgres does not appear to be widely/or at all used, and info on it seems pretty thin on the ground, so I am not too keen on going with that. Is the real solution to multi-machine partitioning (as in, not like MySQL's MERGE tables) on PostgreSQL actually doing it in our application API? This seems like a less than perfect solution once we want to add redundancy and things into the mix. There is also PGCluster http://pgfoundry.org/projects/pgcluster/ Which is trying to be more of a synchronous multi-master system. I haven't heard of Clusgres, so I'm guessing it is an older attempt, which has been overtaken by pgcluster. Just realize that clusters don't necessarily scale like you would want them to. Because at some point you have to insert into the same table, which means you need to hold a lock which prevents the other machine from doing anything. And with synchronous replication, you have to wait for all of the machines to get a copy of the data before you can say it has been committed, which does *not* scale well with the number of machines. If you can
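Idea #1 earlier in this message (a table that is really a select on another machine, merged into the local execution) is at heart a scatter/gather step. A toy sketch, with plain lists standing in for connections to remote servers, assuming each remote scan returns rows already sorted on the key:

```python
# Scatter/gather sketch: run the same select on each remote node and
# merge the streams locally, as a federated executor node could.
import heapq

node_a = [(1, "alice"), (4, "dave")]    # pretend remote partitions,
node_b = [(2, "bob"), (3, "carol")]     # each sorted on the key

def remote_select(node):                # stand-in for a remote scan
    yield from node

merged = list(heapq.merge(remote_select(node_a), remote_select(node_b)))
print(merged)   # -> [(1, 'alice'), (2, 'bob'), (3, 'carol'), (4, 'dave')]
```

PostgreSQL eventually grew exactly this shape of machinery (foreign tables plus Append/Merge Append over them), but at the time of this thread it was, as the message says, hypothetical.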
Re: [PERFORM] Partitioning / Clustering
On 10 May 2005, at 15:41, John A Meinel wrote: Alex Stapleton wrote: What is the status of Postgres support for any sort of multi-machine scaling support? What are you meant to do once you've upgraded your box and tuned the conf files as much as you can? But your query load is just too high for a single machine? Upgrading stock Dell boxes (I know we could be using better machines, but I am trying to tackle the real issue) is not a hugely price-efficient way of getting extra performance, nor particularly scalable in the long term. Switch from Dell Xeon boxes, and go to Opterons. :) Seriously, Dell is far away from Big Iron. I don't know what performance you are looking for, but you can easily get into inserting 10M rows/day with quality hardware. Better hardware = More Efficient != More Scalable But actually, is it your SELECT load that is too high, or your INSERT load, or something in between? Because Slony is around if it is a SELECT problem. http://gborg.postgresql.org/project/slony1/projdisplay.php Basically, Slony is a Master/Slave replication system. So if you have INSERT going into the Master, you can have as many replicated slaves, which can handle your SELECT load. Slony is an asynchronous replicator, so there is a time delay from the INSERT until it will show up on a slave, but that time could be pretty small. snip So, when/is PG meant to be getting a decent partitioning system? MySQL is getting one (eventually) which is apparently meant to be similar to Oracle's according to the docs. Clusgres does not appear to be widely/or at all used, and info on it seems pretty thin on the ground, so I am not too keen on going with that. Is the real solution to multi-machine partitioning (as in, not like MySQL's MERGE tables) on PostgreSQL actually doing it in our application API? This seems like a less than perfect solution once we want to add redundancy and things into the mix.
There is also PGCluster http://pgfoundry.org/projects/pgcluster/ Which is trying to be more of a synchronous multi-master system. I haven't heard of Clusgres, so I'm guessing it is an older attempt, which has been overtaken by pgcluster. Just realize that clusters don't necessarily scale like you would want them to. Because at some point you have to insert into the same table, which means you need to hold a lock which prevents the other machine from doing anything. And with synchronous replication, you have to wait for all of the machines to get a copy of the data before you can say it has been committed, which does *not* scale well with the number of machines. This is why I mention partitioning. It solves this issue by storing different data sets on different machines under the same schema. These separate chunks of the table can then be replicated as well for data redundancy and so on. MySQL are working on these things, but PG just has a bunch of third-party extensions; I wonder why these are not being integrated into the main trunk :/ Thanks for pointing me to PGCluster though. It looks like it should be better than Slony at least.
Re: [PERFORM] Partitioning / Clustering
On 10 May 2005, at 16:02, Adam Haberlach wrote: I think that perhaps he was trying to avoid having to buy Big Iron at all. You would be right. Although we are not against paying a bit more than $300 for a server ;) With all the Opteron v. Xeon around here, and talk of $30,000 machines, perhaps it would be worth exploring the option of buying 10 cheapass machines for $300 each. At the moment, that $300 buys you, from Dell, a 2.5GHz Pentium 4 w/ 256MB of RAM and a 40GB hard drive and gigabit ethernet. The aggregate CPU and bandwidth is pretty stupendous, but not as easy to harness as a single machine. snip Yes, clustering solutions can distribute the data, and can even do it on a per-table basis in some cases. This still leaves it up to the application's logic to handle reunification of the data. If you're going to be programming that sort of logic into your API in the beginning, it's not too much more work to add basic replication, load balancing and partitioning into it either. But the DB should be able to do it for you; adding that stuff in later is often more difficult and less likely to get done. Ideas: 1. Create a table/storage type that consists of a select statement on another machine. While I don't think the current executor is capable of working on multiple nodes of an execution tree at the same time, it would be great if it could offload a select of tuples from a remote table to an entirely different server and merge the resulting data into the current execution. I believe MySQL has this, and Oracle may implement it in another way. MySQL sort of has this; it's not as good as Oracle's though. Apparently there is a much better version of it in 5.1, though, that should make it to stable sometime next year I imagine. 2. There is no #2 at this time, but I'm sure one can be hypothesized. I would have thought a particularly smart version of pg_pool could do it. It could partition data to different servers if it knew which columns to key by on each table.
...Google and other companies have definitely proved that one can harness huge clusters of cheap hardware. It can't be _that_ hard, can it? :) I shudder to think how much the Big Iron equivalent of a Google data-center would cost. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of John A Meinel Sent: Tuesday, May 10, 2005 7:41 AM To: Alex Stapleton Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Partitioning / Clustering Alex Stapleton wrote: What is the status of Postgres support for any sort of multi-machine scaling support? What are you meant to do once you've upgraded your box and tuned the conf files as much as you can? But your query load is just too high for a single machine? Upgrading stock Dell boxes (I know we could be using better machines, but I am trying to tackle the real issue) is not a hugely price-efficient way of getting extra performance, nor particularly scalable in the long term. Switch from Dell Xeon boxes, and go to Opterons. :) Seriously, Dell is far away from Big Iron. I don't know what performance you are looking for, but you can easily get into inserting 10M rows/day with quality hardware. But actually, is it your SELECT load that is too high, or your INSERT load, or something in between? Because Slony is around if it is a SELECT problem. http://gborg.postgresql.org/project/slony1/projdisplay.php Basically, Slony is a Master/Slave replication system. So if you have INSERT going into the Master, you can have as many replicated slaves, which can handle your SELECT load. Slony is an asynchronous replicator, so there is a time delay from the INSERT until it will show up on a slave, but that time could be pretty small. This would require some application-level support, since an INSERT goes to a different place than a SELECT.
But there has been some discussion about pg_pool being able to spread the query load, and having it be aware of the difference between a SELECT and an INSERT and have it route the query to the correct host. The biggest problem being that functions could cause a SELECT func() to actually insert a row, which pg_pool wouldn't know about. There are 2 possible solutions: a) don't do that when you are using this system, b) add some sort of comment hint so that pg_pool can understand that the select is actually an INSERT, and needs to be done on the master. So, when/is PG meant to be getting a decent partitioning system? MySQL is getting one (eventually) which is apparently meant to be similar to Oracle's according to the docs. Clusgres does not appear to be widely/or at all used, and info on it seems pretty thin on the ground, so I am not too keen on going with that. Is the real solution to multi-machine partitioning (as in, not like MySQL's MERGE tables) on PostgreSQL actually doing it in our application API? This seems like a less than perfect solution once we want to add