Re: [PERFORM] perf problem with huge table

2010-02-11 Thread Leo Mannhart
Dave Crooke wrote:
 On Wed, Feb 10, 2010 at 5:30 PM, Jon Lewison jlewis...@gmail.com wrote:
 
 
 
 Just a nit, but Oracle implements MVCC.  90% of the databases out
 there do.
 
 
 Sorry, I spoke imprecisely. What I meant was the difference in how the
 rows are stored internally: in Oracle, the main tablespace contains
 only the newest version of a row, which is (where possible) updated in
 place. Queries in a transaction that can still see an older version
 have to pull it from the UNDO tablespace (rollback segments in Oracle 8
 and older).
  
 In Postgres, all versions of all rows are in the main table, and have
 validity ranges associated with them (this version of this row existed
 between transaction ids x and y). Once a version goes out of scope, it
 has to be garbage collected by the vacuuming process so the space can be
 re-used.
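A minimal sketch of how those validity ranges can be observed in PostgreSQL via the xmin/xmax system columns (the table here is made up purely for illustration):

    CREATE TABLE t (id int, val text);
    INSERT INTO t VALUES (1, 'a');
    UPDATE t SET val = 'b' WHERE id = 1;

    -- xmin is the inserting transaction id, xmax the deleting/updating one
    -- (0 while this row version is still live); the pre-update version
    -- remains in the heap, invisible, until VACUUM reclaims the space.
    SELECT xmin, xmax, ctid, * FROM t;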
 
 In general, this means Oracle is faster *if* you're only doing lots of
 small transactions (consider how these different models handle an update
 to a single field in a single row), but it is more sensitive to the scale
 of transactions: doing a really big transaction against a database
 with an OLTP workload can upset Oracle's digestion because it causes a lot
 of UNDO lookups. PG's performance is a lot more predictable in this regard.
 
 Both models have benefits and drawbacks ... when designing a schema for
 performance it's important to understand these differences.
 
 
 I find partitioning pretty useful in this scenario if the data
 allows it.  Aging out data just means dropping a partition rather
 than running a delete statement.
 
 
 Forgot to say this - yes, absolutely agree: dropping a table is a
 lot cheaper than a transactional delete.
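As a minimal sketch, aging out by dropping a partition in PostgreSQL might look like this (8.x-style inheritance partitioning; all table and column names are made up for illustration):

    -- parent table plus one child table per year
    CREATE TABLE audit_log (id bigint, logged_at date, payload text);
    CREATE TABLE audit_log_2009 (
        CHECK (logged_at >= DATE '2009-01-01' AND logged_at < DATE '2010-01-01')
    ) INHERITS (audit_log);

    -- aging out a whole year is a cheap metadata operation:
    DROP TABLE audit_log_2009;

    -- the transactional alternative writes a dead tuple for every row,
    -- which VACUUM then has to reclaim:
    -- DELETE FROM audit_log WHERE logged_at < DATE '2010-01-01';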
 
 In general, I think partitioning is more important / beneficial with
 PG's style of MVCC than with Oracle or SQL-Server (which I think is
 closer to Oracle than PG).

I would like to disagree here a little bit.

Where Oracle's table partitioning comes in very handy is, for example,
when you have to replace the data of a big (read-only) table on a
regular basis (typically data replicated from another system).
In this case, you just create a partitioned table with exactly the same
columns, indexes, etc. as the data table.

To load, you then load the data into the partitioned table, i.e.:
- truncate the partitioned table, disable constraints, drop indexes
- load the data into the partitioned table
- rebuild all indexes etc. on the partitioned table

During all this time (even if it takes hours) the application can still
access the data in the data table without being affected by the bulk load.

Once you have prepared the data in the partitioned table, you
- exchange the partition with the data table
which is a dictionary operation; that means the application is (if at all)
only blocked during this operation, which is in the sub-second range.
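A minimal sketch of this workflow in Oracle SQL, with made-up object names (constraint and index handling will vary with the actual setup):

    -- staging side: a partitioned copy of the live table with one partition
    ALTER TABLE data_tab_stage TRUNCATE PARTITION p_all;
    -- (disable constraints / drop or mark indexes unusable as needed)

    -- bulk load into the staging partition while the application keeps
    -- reading the live table data_tab
    INSERT /*+ APPEND */ INTO data_tab_stage PARTITION (p_all)
    SELECT * FROM replicated_source;

    -- rebuild indexes on the staging table, then swap the segments;
    -- the exchange is a dictionary-only operation
    ALTER TABLE data_tab_stage EXCHANGE PARTITION p_all WITH TABLE data_tab
        INCLUDING INDEXES WITHOUT VALIDATION;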

If you had to do this with conventional updates or deletes/inserts
instead, it might not even be possible in the given timeframe.

just as an example
Leo

p.s. just to make it a little bit clearer about the famous ORA-01555:
Oracle does not forget the data, as the Oracle RDBMS is of course
also ACID-compliant. The ORA-01555 can happen

- when the rollback tablespace is really too small to hold all the data
changed in the transaction (which I consider a configuration error)

- when a long-running (read) transaction needs to read a record
which has already been updated AND COMMITTED by another transaction. The key
here is that a second transaction has changed a record which is also
needed by the first transaction, and the second transaction committed the
work. Committing the change means the data in the rollback segment is
no longer needed, as it can be read directly from the data block (after
all, it is committed, and that means valid and visible to other
transactions). If the first transaction now tries to read the data from
the rollback segment to see the unchanged state, it will still succeed
(it is still there; nothing has happened to the rollback segment so far).
The problem of the ORA-01555 shows up only if a third transaction now
needs space in the rollback segment. As the entry from the first/second
transaction is marked committed (and therefore no longer needed), it is
perfectly valid for transaction #3 to grab this rollback segment and to
store its old value there. If THEN (and only then) transaction #1 comes
back, asking for the old, unchanged value as of when its transaction
started, THEN the famous ORA-01555 is raised, as this value has now been
overwritten by transaction #3.
That's why in newer versions you have to set the retention time of the
rollback blocks/segments to a value bigger than your expected longest
transaction. This will decrease the likelihood of the ORA-01555
drastically (but it is still not zero, as you could easily construct an
example where it still fails with ORA-01555, because a transaction can
still run longer than you expect).
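A hedged sketch of setting that retention, assuming automatic undo management (the undo tablespace name is an assumption):

    -- keep committed undo around for at least 3 hours (value is in seconds)
    ALTER SYSTEM SET UNDO_RETENTION = 10800;

    -- optionally force Oracle to honour the retention even under space
    -- pressure (at the price of possibly failing DML instead)
    ALTER TABLESPACE undotbs1 RETENTION GUARANTEE;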

Re: [PERFORM] some problems when i use postgresql 8.4.2 in my projects .

2010-02-03 Thread Leo Mannhart
wyx6...@sina.com wrote:
 after shaming, I think I should pick out some of my points:
 the unique constraints actually kill concurrent write transactions: when
 concurrent inserts violate the unique constraint, they block each
 other. I tested this in Oracle 10g, which shows the same behaviour. I think
 this may be reasonable because the unique check must be a serializable check.
 To work around this problem, I do the unique check in the application where
 possible, but in a highly concurrent environment this is not a good way.
 

How can you enforce uniqueness in the application? If you implement it
correctly, it will take considerably longer than letting PostgreSQL do it.
Even if you use some kind of magic, I cannot imagine how you could
implement a unique constraint in the application and guarantee
uniqueness while at the same time being faster than the RDBMS.
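For illustration, a minimal sketch of letting the database do the check (table and column names are made up):

    CREATE TABLE accounts (
        id    bigint PRIMARY KEY,
        email text NOT NULL UNIQUE
    );

    -- Two sessions inserting the same email concurrently: the second INSERT
    -- blocks until the first session commits (then it fails with a
    -- unique_violation) or rolls back (then it succeeds). That blocking is
    -- the behaviour described above, and it is the price of a watertight
    -- uniqueness guarantee.
    INSERT INTO accounts (id, email) VALUES (1, 'someone@example.com');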

Leo



Re: [PERFORM] Massive table (500M rows) update nightmare

2010-01-07 Thread Leo Mannhart
Carlo Stonebanks wrote:
 Our DB has an audit table which is 500M rows and growing. (FYI the
 objects being audited are grouped semantically, not individual field
 values).
 
 Recently we wanted to add a new feature and we altered the table to add
 a new column. We are backfilling this varchar(255) column by writing a
 TCL script to page through the rows (where every update is a UPDATE ...
 WHERE id >= x AND id < x+10 and a commit is performed after every 1000
 update statements, i.e. every 10,000 rows.)
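A minimal sketch of the batching pattern described above (table and column names and the backfill value are illustrative):

    -- one id-range slice per statement, driven from the TCL script
    UPDATE audit_table
       SET new_col = 'backfilled value'
     WHERE id >= 1 AND id < 11;
    -- next slices: id >= 11 AND id < 21, id >= 21 AND id < 31, ...
    -- COMMIT after every 1000 such statements, i.e. every 10,000 rows.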
 
 We have 10 columns, six of which are indexed. Rough calculations suggest
 that this will take two to three weeks to complete on an 8-core CPU with
 more than enough memory.
 
 As a ballpark estimate - is this sort of performance for 500M updates
 what one would expect of PG given the table structure (detailed below),
 or should I dig deeper to look for performance issues?
 
 As always, thanks!
 
 Carlo
 

If it is possible to lock this audit table exclusively (maybe during
off-peak hours) I would look into:
- create new_audit_table as select col1, col2, col3 ... col9,
'new_col_value' from old_audit_table;
- create all indexes
- drop old_audit_table
- rename new_audit_table to old_audit_table

That is probably the fastest method available, even if you have to join
the new_col_value from an extra helper table with the corresponding id.
Remember, databases are born to join.
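A minimal sketch of that rewrite, assuming the new values come from such a helper table (all names are illustrative; the remaining indexes would be recreated like the one shown):

    BEGIN;
    LOCK TABLE old_audit_table IN ACCESS EXCLUSIVE MODE;

    CREATE TABLE new_audit_table AS
        SELECT o.col1, o.col2, o.col3, o.col4, o.col5,
               o.col6, o.col7, o.col8, o.col9,
               h.new_col_value
          FROM old_audit_table o
          LEFT JOIN helper_table h ON h.id = o.id;

    CREATE INDEX new_audit_table_col1_idx ON new_audit_table (col1);
    -- ... repeat for the other indexed columns ...

    DROP TABLE old_audit_table;
    ALTER TABLE new_audit_table RENAME TO old_audit_table;
    COMMIT;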

You could also try to just update the whole table in one go, it is
probably faster than you expect.

just a thought
Leo



Re: [PERFORM] Massive table (500M rows) update nightmare

2010-01-07 Thread Leo Mannhart
Kevin Grittner wrote:
 Leo Mannhart leo.mannh...@beecom.ch wrote:
  
 You could also try to just update the whole table in one go, it is
 probably faster than you expect.
  
 That would, of course, bloat the table and indexes horribly.  One
 advantage of the incremental approach is that there is a chance for
 autovacuum or scheduled vacuums to make space available for re-use
 by subsequent updates.
  
 -Kevin
 

ouch...
thanks for correcting this.
... and forgive an old man coming from Oracle ;)

Leo
