Re: [PERFORM] Large rows number, and large objects

2011-07-20 Thread Jose Ildefonso Camargo Tolosa
On Tue, Jul 19, 2011 at 3:57 PM, Robert Haas robertmh...@gmail.com wrote:

 On Sun, Jun 19, 2011 at 10:19 PM, Jose Ildefonso Camargo Tolosa
 ildefonso.cama...@gmail.com wrote:
  So, the question is: if I were to store 8TB worth of data in the large
  objects system, would it actually make the pg_largeobject table slow,
  unless it was automatically partitioned?

 I think it's a bit of an oversimplification to say that large,
 unpartitioned tables are automatically going to be slow.  Suppose you
 had 100 tables that were each 80GB instead of one table that is 8TB.
 The index lookups would be a bit faster on the smaller tables, but it
 would take you some non-zero amount of time to figure out which index
 to read in the first place.  It's not clear that you are really
 gaining all that much.


Certainly, but it is still very blurry to me *when* it is better to
partition than not.



 Many of the advantages of partitioning have to do with maintenance
 tasks.  For example, if you gather data on a daily basis, it's faster
 to drop the partition that contains Thursday's data than it is to do a
 DELETE that finds the rows and deletes them one at a time.  And VACUUM
 can be a problem on very large tables as well, because only one VACUUM
 can run on a table at any given time.  If the frequency with which the
 table needs to be vacuumed is less than the time it takes for VACUUM
 to complete, then you've got a problem.


And the pg_largeobject table doesn't get vacuumed? I mean, isn't that table
just like any other table?



 But I think that if we want to optimize pg_largeobject, we'd probably
 gain a lot more by switching to a different storage format than we
 could ever gain by partitioning the table.  For example, we might
 decide that any object larger than 16MB should be stored in its own
 file.  Even somewhat smaller objects would likely benefit from being
 stored in larger chunks - say, a bunch of 64kB chunks, with any
 overage stored in the 2kB chunks we use now.  While this might be an
 interesting project, it's probably not going to be anyone's top
 priority, because it would be a lot of work for the amount of benefit
 you'd get.  There's an easy workaround: store the files in the
 filesystem, and a path to those files in the database.


Ok, one reason for storing a file *in* the DB is to be able to do PITR of
wrongly deleted files (or overwritten ones, and that kind of stuff); at the
filesystem level you would need a versioning filesystem (and I don't yet
know of any that is stable in the Linux world).

Also, you can use streaming replication: at the same time your data is
streamed, your files are also streamed to a secondary server (yes, at the
FS level you could use drbd or similar).

Ildefonso.


Re: [PERFORM] Large rows number, and large objects

2011-07-20 Thread Robert Haas
On Wed, Jul 20, 2011 at 11:57 AM, Jose Ildefonso Camargo Tolosa
ildefonso.cama...@gmail.com wrote:
 On Tue, Jul 19, 2011 at 3:57 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sun, Jun 19, 2011 at 10:19 PM, Jose Ildefonso Camargo Tolosa
 ildefonso.cama...@gmail.com wrote:
  So, the question is: if I were to store 8TB worth of data in the large
  objects system, would it actually make the pg_largeobject table slow,
  unless it was automatically partitioned?

 I think it's a bit of an oversimplification to say that large,
 unpartitioned tables are automatically going to be slow.  Suppose you
 had 100 tables that were each 80GB instead of one table that is 8TB.
 The index lookups would be a bit faster on the smaller tables, but it
 would take you some non-zero amount of time to figure out which index
 to read in the first place.  It's not clear that you are really
 gaining all that much.

 Certainly, but it is still very blurry to me *when* it is better to
 partition than not.

I think that figuring that out is as much an art as it is a science.
It's better to partition when most of your queries are going to touch
only a single partition; when you are likely to want to remove
partitions in their entirety; when VACUUM starts to have trouble
keeping up... but the reality is that in some cases you probably have
to try it both ways and see which one works better.
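
For concreteness, the drop-versus-DELETE contrast in the paragraph quoted
below might look something like this (a sketch only; table and partition
names are made up, using the inheritance partitioning we have today):

-- Hypothetical daily partition holding Thursday's data.
CREATE TABLE measurements (logdate date NOT NULL, payload text);
CREATE TABLE measurements_2011_07_14
    (CHECK (logdate = DATE '2011-07-14')) INHERITS (measurements);

-- Dropping the partition is a quick catalog operation and leaves no dead
-- tuples behind.
DROP TABLE measurements_2011_07_14;

-- The unpartitioned equivalent removes rows one at a time and leaves dead
-- tuples for VACUUM to reclaim later.
DELETE FROM measurements WHERE logdate = DATE '2011-07-14';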

 Many of the advantages of partitioning have to do with maintenance
 tasks.  For example, if you gather data on a daily basis, it's faster
 to drop the partition that contains Thursday's data than it is to do a
 DELETE that finds the rows and deletes them one at a time.  And VACUUM
 can be a problem on very large tables as well, because only one VACUUM
 can run on a table at any given time.  If the frequency with which the
 table needs to be vacuumed is less than the time it takes for VACUUM
 to complete, then you've got a problem.

 And the pg_largeobject table doesn't get vacuumed? I mean, isn't that table
 just like any other table?

Yes, it is.  So, I agree: putting 8TB of data in there is probably
going to hurt.

 But I think that if we want to optimize pg_largeobject, we'd probably
 gain a lot more by switching to a different storage format than we
 could ever gain by partitioning the table.  For example, we might
 decide that any object larger than 16MB should be stored in its own
 file.  Even somewhat smaller objects would likely benefit from being
 stored in larger chunks - say, a bunch of 64kB chunks, with any
 overage stored in the 2kB chunks we use now.  While this might be an
 interesting project, it's probably not going to be anyone's top
 priority, because it would be a lot of work for the amount of benefit
 you'd get.  There's an easy workaround: store the files in the
 filesystem, and a path to those files in the database.

 Ok, one reason for storing a file *in* the DB is to be able to do PITR of
 wrongly deleted files (or overwritten ones, and that kind of stuff); at the
 filesystem level you would need a versioning filesystem (and I don't yet
 know of any that is stable in the Linux world).

 Also, you can use streaming replication: at the same time your data is
 streamed, your files are also streamed to a secondary server (yes, at the
 FS level you could use drbd or similar).

Well, those are good arguments for putting the functionality in the
database and making it all play nicely with write-ahead logging.  But
nobody's felt motivated to write the code yet, so...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Large rows number, and large objects

2011-07-20 Thread Andrzej Nakonieczny

On 20.07.2011 17:57, Jose Ildefonso Camargo Tolosa wrote:

[...]


Many of the advantages of partitioning have to do with maintenance
tasks.  For example, if you gather data on a daily basis, it's faster
to drop the partition that contains Thursday's data than it is to do a
DELETE that finds the rows and deletes them one at a time.  And VACUUM
can be a problem on very large tables as well, because only one VACUUM
can run on a table at any given time.  If the frequency with which the
table needs to be vacuumed is less than the time it takes for VACUUM
to complete, then you've got a problem.


And the pg_largeobject table doesn't get vacuumed? I mean, isn't that
table just like any other table?


Vacuum is a real problem on a big pg_largeobject table. I have a 1.6 TB
database consisting mostly of large objects, and vacuuming that table on a
fast SAN takes about 4 hours:


         now         |        start        |   time   | datname |                current_query
---------------------+---------------------+----------+---------+-----------------------------------------------
 2011-07-20 20:12:03 | 2011-07-20 16:21:20 | 03:50:43 | bigdb   | autovacuum: VACUUM pg_catalog.pg_largeobject
(1 row)
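
A query along these lines could produce output like that (a sketch; it
assumes a pre-9.2 pg_stat_activity, where the running statement is in the
current_query column):

SELECT now(),
       query_start AS start,
       now() - query_start AS "time",
       datname,
       current_query
  FROM pg_stat_activity
 WHERE current_query LIKE 'autovacuum:%';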


LO generates a lot of dead tuples when objects are added:

    relname     | n_dead_tup
----------------+------------
 pg_largeobject |     246731
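
Those counts come from the statistics collector; since pg_largeobject is a
system catalog, a query to check them would look at pg_stat_all_tables
rather than pg_stat_user_tables, roughly:

SELECT relname, n_dead_tup
  FROM pg_stat_all_tables
 WHERE relname = 'pg_largeobject';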

Adding LOs is very fast when the table is vacuumed. But when there are a
lot of dead tuples, adding LOs is very slow (50-100 times slower) and eats
100% of CPU.


It looks like the better way is writing objects directly as bytea into
partitioned tables, although that is a bit slower than the LO interface on
a vacuumed table.
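
A rough sketch of that approach, using the inheritance-style partitioning
available today (table names and id ranges here are made up, and rows can
be routed with an INSERT trigger or in the application):

CREATE TABLE blobs (id bigserial, data bytea);
CREATE TABLE blobs_p0 (CHECK (id < 1000000)) INHERITS (blobs);
CREATE TABLE blobs_p1 (CHECK (id >= 1000000 AND id < 2000000)) INHERITS (blobs);
-- ...more partitions as data grows; each child stays small enough for
-- VACUUM to keep up.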



Regards,
Andrzej

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance