Re: [GENERAL] SUBSTRING performance for large BYTEA

2007-08-19 Thread Karsten Hilbert
On Sat, Aug 18, 2007 at 10:54:11PM -0700, Vance Maverick wrote:

> This is my experience with a Java client too.  Writing the data with
> PreparedStatement.setBinaryStream works great for long strings, but
> reading it with the complementary method ResultSet.getBinaryStream runs
> into the memory problem, killing the Java VM.
Again, I am observing this under Python with a libpq-based driver
running on Windows and during retrieval only.

Are we seeing a pattern?

Karsten



Re: [GENERAL] SUBSTRING performance for large BYTEA

2007-08-18 Thread Tom Lane
Vance Maverick <[EMAIL PROTECTED]> writes:
> My question is about performance in the postgres server.  When I execute
> SELECT SUBSTRING (my_bytea FROM ? FOR ?) FROM my_table WHERE id = ?,
> does it fetch the whole BYTEA into memory?  Or does it access only the
> pages that contain the requested substring?

Recent releases will do what you want if the column has been marked
SET STORAGE EXTERNAL (before storing anything in it...)  See the
ALTER TABLE reference page.

regards, tom lane
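
For concreteness, a minimal sketch of the setup Tom describes, reusing
the table and column names from Vance's query (the 64 kB chunk size and
the id value are arbitrary):

-- must be done before any data is stored in the column, or already-stored
-- values stay compressed (see the conversion tips further down-thread)
ALTER TABLE my_table ALTER COLUMN my_bytea SET STORAGE EXTERNAL;

-- with EXTERNAL (uncompressed) storage, a chunked read like this only
-- touches the TOAST pages covering the requested byte range
SELECT substring(my_bytea FROM 1 FOR 65536) FROM my_table WHERE id = 1;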



Re: [GENERAL] SUBSTRING performance for large BYTEA

2007-08-18 Thread Joshua D. Drake

Tom Lane wrote:
> Vance Maverick <[EMAIL PROTECTED]> writes:
>> My question is about performance in the postgres server.  When I execute
>> SELECT SUBSTRING (my_bytea FROM ? FOR ?) FROM my_table WHERE id = ?,
>> does it fetch the whole BYTEA into memory?  Or does it access only the
>> pages that contain the requested substring?
>
> Recent releases will do what you want if the column has been marked
> SET STORAGE EXTERNAL (before storing anything in it...)  See the
> ALTER TABLE reference page.

Should we consider setting storage external by default for the type?

Joshua D. Drake

 






Re: [GENERAL] SUBSTRING performance for large BYTEA

2007-08-18 Thread Tom Lane
Joshua D. Drake <[EMAIL PROTECTED]> writes:
> Should we consider setting storage external by default for the type?

No.  That would be counterproductive for the more typical case of bytea
values in the range of some-small-number-of-kilobytes.  Or at least
I think that's more typical than values that are so large you have to go
out of your way to fetch them in chunks.

regards, tom lane
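
As an aside (a sketch, not from the thread): the strategy currently in
force for a column can be checked in the pg_attribute catalog, where
attstorage is 'p' (plain), 'm' (main), 'e' (external), or 'x' (extended,
the default for bytea):

SELECT attname, attstorage
FROM pg_attribute
WHERE attrelid = 'my_table'::regclass
  AND attname = 'my_bytea';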



Re: [GENERAL] SUBSTRING performance for large BYTEA

2007-08-18 Thread Joshua D. Drake

Tom Lane wrote:
> Joshua D. Drake <[EMAIL PROTECTED]> writes:
>> Should we consider setting storage external by default for the type?
>
> No.  That would be counterproductive for the more typical case of bytea
> values in the range of some-small-number-of-kilobytes.  Or at least
> I think that's more typical than values that are so large you have to go
> out of your way to fetch them in chunks.

The typical case I run into with bytea is storing data large enough
for this to be useful: specifically items such as PDF and .doc files.

Sincerely,

Joshua D. Drake

 





Re: [GENERAL] SUBSTRING performance for large BYTEA

2007-08-18 Thread Karsten Hilbert
On Sat, Aug 18, 2007 at 12:20:42PM -0400, Tom Lane wrote:

> Vance Maverick <[EMAIL PROTECTED]> writes:
>> My question is about performance in the postgres server.  When I execute
>> SELECT SUBSTRING (my_bytea FROM ? FOR ?) FROM my_table WHERE id = ?,
>> does it fetch the whole BYTEA into memory?  Or does it access only the
>> pages that contain the requested substring?
>
> Recent releases will do what you want if the column has been marked
> SET STORAGE EXTERNAL (before storing anything in it...)  See the
> ALTER TABLE reference page.
Ah, thanks, good to know!

"Recent releases" seems to mean at least as far back as 8.1,
going by the docs.

Now, to convert an existing bytea column I would need to add
a new bytea column with SET STORAGE EXTERNAL, move the
data from the old column to the new column, remove the old
column, and give the new column the original name, correct?

Or is there an easier way?

Karsten



Re: [GENERAL] SUBSTRING performance for large BYTEA

2007-08-18 Thread Joshua D. Drake

Karsten Hilbert wrote:
> On Sat, Aug 18, 2007 at 12:20:42PM -0400, Tom Lane wrote:
>
>> Vance Maverick <[EMAIL PROTECTED]> writes:
>>> My question is about performance in the postgres server.  When I execute
>>> SELECT SUBSTRING (my_bytea FROM ? FOR ?) FROM my_table WHERE id = ?,
>>> does it fetch the whole BYTEA into memory?  Or does it access only the
>>> pages that contain the requested substring?
>>
>> Recent releases will do what you want if the column has been marked
>> SET STORAGE EXTERNAL (before storing anything in it...)  See the
>> ALTER TABLE reference page.
>
> Ah, thanks, good to know!
>
> "Recent releases" seems to mean at least as far back as 8.1,
> going by the docs.
>
> Now, to convert an existing bytea column I would need to add
> a new bytea column with SET STORAGE EXTERNAL, move the
> data from the old column to the new column, remove the old
> column, and give the new column the original name, correct?
>
> Or is there an easier way?

Well this is a guess, but:

Set the existing column to STORAGE EXTERNAL, then update the
existing column with its existing data:

UPDATE foo SET bar = bar;

Now, the downside to this is that you are going to create a dead row
for every update, which means a vacuum (probably full) afterward; but
the way you describe above will do the same thing as well.
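
Spelled out as SQL, a sketch of the recipe above (foo and bar are the
placeholder names from the message; note Gregory Stark's caveat further
down-thread that the bare self-assignment may not actually force
re-toasting of already-stored values):

ALTER TABLE foo ALTER COLUMN bar SET STORAGE EXTERNAL;
UPDATE foo SET bar = bar;  -- rewrites every row, leaving dead tuples
VACUUM FULL foo;           -- reclaim the space afterwards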

Sincerely,

Joshua D. Drake


 






Re: [GENERAL] SUBSTRING performance for large BYTEA

2007-08-18 Thread Karsten Hilbert
On Sat, Aug 18, 2007 at 12:49:09PM -0400, Tom Lane wrote:

> Joshua D. Drake <[EMAIL PROTECTED]> writes:
>> Should we consider setting storage external by default for the type?
>
> No.  That would be counterproductive for the more typical case of bytea
> values in the range of some-small-number-of-kilobytes.  Or at least
> I think that's more typical than values that are so large you have to go
> out of your way to fetch them in chunks.

Would it be feasible to add an ALTER TABLE mode

... SET STORAGE externally-extended CUTOFF <size> ...

where <size> is the user-configurable size of the column
data at which PostgreSQL switches from extended to external
storage strategy?

Such that large bytea values would be chunkable while
smaller ones wouldn't be, at the discretion of the DBA.

Karsten



Re: [GENERAL] SUBSTRING performance for large BYTEA

2007-08-18 Thread Tom Lane
Karsten Hilbert <[EMAIL PROTECTED]> writes:
> Would it be feasible to add an ALTER TABLE mode
>   ... SET STORAGE externally-extended CUTOFF <size> ...
> where <size> is the user-configurable size of the column
> data at which PostgreSQL switches from extended to external
> storage strategy?

Actually, it just occurred to me that this ties into the recent
discussion of compression parameters
http://archives.postgresql.org/pgsql-hackers/2007-08/msg00082.php
(which hasn't gone further than discussion yet).  Perhaps we need
an additional parameter which is a maximum input size to attempt
compression at all.  IOW, the current force_input_size is not
only useless but exactly backwards ...

There was some discussion in that thread (or maybe the earlier
one on -patches) of exposing the lzcompress parameters directly
to users, perhaps as an extended form of the current SET STORAGE
command.  That won't happen for 8.3 but it might later.  In the
meantime, if the defaults included not attempting to compress
multi-megabyte values, I think it'd Just Work for cases like
yours.

regards, tom lane



Re: [GENERAL] SUBSTRING performance for large BYTEA

2007-08-18 Thread Karsten Hilbert
On Sat, Aug 18, 2007 at 10:23:42AM -0700, Joshua D. Drake wrote:

>>> SET STORAGE EXTERNAL (before storing anything in it...)  See the
>>> ALTER TABLE reference page.
>
>> Now, to convert an existing bytea column I would need to add
>> a new bytea column with SET STORAGE EXTERNAL, move the
>> data from the old column to the new column, remove the old
>> column, and give the new column the original name, correct?
>
> Set the existing column to STORAGE EXTERNAL, then update the
> existing column with its existing data:
>
> UPDATE foo SET bar = bar;
>
> Now, the downside to this is that you are going to create a dead row
> for every update, which means a vacuum (probably full) afterward; but
> the way you describe above will do the same thing as well.
Sure.

I was a bit uneasy about the docs saying that SET STORAGE
doesn't affect existing data but only sets the strategy for new
inserts/updates, and hence thought using a wholly new column would
somehow be safer. But maybe this can be nefariously interpreted
such that I could sort of implement cutoff-based extended/external
switching by prepending "ALTER TABLE ... SET STORAGE
EXTERNAL/EXTENDED ..." to INSERTs/UPDATEs based on bytea parameter
size. Or even writing a trigger issuing ALTER TABLE depending on
the size of the insert?

Karsten



Re: [GENERAL] SUBSTRING performance for large BYTEA

2007-08-18 Thread Gregory Stark

Joshua D. Drake <[EMAIL PROTECTED]> writes:

> Well this is a guess, but:
>
> Set the existing column to STORAGE EXTERNAL, then update the
> existing column with its existing data:
>
> UPDATE foo SET bar = bar;

Well, not quite. That would actually reuse the toast pointer without
decompressing it. We try to be clever about not decompressing and duplicating
toast pointers unnecessarily on updates -- in this case too clever.

You could do this:

postgres=# ALTER TABLE foo ALTER bar TYPE bytea, ALTER bar SET STORAGE external;
ALTER TABLE

(Note that you have to include the 'ALTER bar SET STORAGE external' in the
same command or the storage will get reset to the default 'extended' for bytea
even if it was previously set to 'external'.)

When I tested this though I noticed it did *not* decompress compressed data
which was small enough to store internally. This may actually be desirable for
your case since anything small enough to be stored internally is probably not
worth bothering decompressing so it can be streamed out. It will still not be
compressed next time you update it so it's not really helpful for the long
term.

If you want to decompress everything you have to do something like:

postgres=# ALTER TABLE foo ALTER bar TYPE bytea USING bar||'', ALTER bar SET STORAGE external;
ALTER TABLE

However note that this will require extra memory for both the decompressed
original value and the new value after appending the empty string.

Another option would be to update only the records which need to be
decompressed with something like

UPDATE foo SET bar=bar||'' WHERE length(bar) > pg_column_size(bar)

This at least gives you the option of doing them in small groups or even one
by one. I would suggest vacuuming between each update.
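
A minimal sketch of that batch-wise variant (assuming foo has an
integer primary key named id; repeat the pair until the UPDATE touches
zero rows):

UPDATE foo SET bar = bar || ''
WHERE id IN (SELECT id FROM foo
             WHERE length(bar) > pg_column_size(bar)
             LIMIT 100);
VACUUM foo;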



I do have to wonder how you're getting the data *in* though. If it's large
enough to have to stream out like this then how do you initially load the
data?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [GENERAL] SUBSTRING performance for large BYTEA

2007-08-18 Thread Karsten Hilbert
On Sat, Aug 18, 2007 at 09:32:33PM +0100, Gregory Stark wrote:

> I do have to wonder how you're getting the data *in* though. If it's large
> enough to have to stream out like this then how do you initially load the
> data?
Well, in my particular case it isn't so much that I *want*
to access bytea in chunks but rather that under certain
not-yet-pinned-down circumstances Windows clients tend to go
out-of-memory on the socket during *retrieval* (insertion is
fine, as is put/get access from Linux clients). Doing
chunked retrieval works on those boxen, too, so it's an
option in our application (the user defines a chunk size
that works; a size of 0 is treated as no-chunking).
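
In SQL terms the chunked retrieval is just a client-side loop over
substring; a sketch, reusing Vance's table and column names with an
arbitrary 64 kB chunk size:

-- first learn the total size, then fetch successive slices
SELECT octet_length(my_bytea) FROM my_table WHERE id = 1;
SELECT substring(my_bytea FROM 1 FOR 65536) FROM my_table WHERE id = 1;
SELECT substring(my_bytea FROM 65537 FOR 65536) FROM my_table WHERE id = 1;
-- ...advancing the offset by the chunk size until octet_length is reached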

Karsten



Re: [GENERAL] SUBSTRING performance for large BYTEA

2007-08-18 Thread Karsten Hilbert
On Sat, Aug 18, 2007 at 01:51:18PM -0400, Tom Lane wrote:

> Karsten Hilbert <[EMAIL PROTECTED]> writes:
>> Would it be feasible to add an ALTER TABLE mode
>>   ... SET STORAGE externally-extended CUTOFF <size> ...
>> where <size> is the user-configurable size of the column
>> data at which PostgreSQL switches from extended to external
>> storage strategy?
>
> Actually, it just occurred to me that this ties into the recent
> discussion of compression parameters
> http://archives.postgresql.org/pgsql-hackers/2007-08/msg00082.php
> (which hasn't gone further than discussion yet).  Perhaps we need
> an additional parameter which is a maximum input size to attempt
> compression at all.  IOW, the current force_input_size is not
> only useless but exactly backwards ...

I can see that a maximum size can be relevant for the
decision as to whether to *attempt* compression since large
things compress slowly and may unduly slow down queries.

As well as a minimum size to use compression on, quite
obviously.

OTOH, I'd like to be able to tell PostgreSQL to be so kind
and refrain from attempting to compress values above a
certain size even if it thought it'd make sense.

> There was some discussion in that thread (or maybe the earlier
> one on -patches) of exposing the lzcompress parameters directly
> to users, perhaps as an extended form of the current SET STORAGE
> command.  That won't happen for 8.3 but it might later.

Sounds good.

> In the meantime, if the defaults included not attempting to compress
> multi-megabyte values, I think it'd Just Work for cases like
> yours.

Not as tweakable as I'd eventually want it but, yes, that
would sort of Just Work.

Karsten



Re: [GENERAL] SUBSTRING performance for large BYTEA

2007-08-18 Thread Gregory Stark

Karsten Hilbert <[EMAIL PROTECTED]> writes:

> But maybe this can be nefariously interpreted such that I could sort of
> implement cutoff-based extended/external switching by prepending "ALTER
> TABLE ... SET STORAGE EXTERNAL/EXTENDED ..." to INSERTs/UPDATEs based on
> bytea parameter size. Or even writing a trigger issuing ALTER TABLE
> depending on the size of the insert?

I wouldn't suggest doing that. It will bloat the pg_attribute catalog table
and require a lot of extra vacuums. I think it would also create some lock
contention issues.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [GENERAL] SUBSTRING performance for large BYTEA

2007-08-18 Thread Vance Maverick
Karsten Hilbert writes:
> Well, in my particular case it isn't so much that I *want*
> to access bytea in chunks but rather that under certain
> not-yet-pinned-down circumstances Windows clients tend to go
> out-of-memory on the socket during *retrieval* (insertion is
> fine, as is put/get access from Linux clients). Doing
> chunked retrieval works on those boxen, too, so it's an
> option in our application (the user defines a chunk size
> that works; a size of 0 is treated as no-chunking).

This is my experience with a Java client too.  Writing the data with
PreparedStatement.setBinaryStream works great for long strings, but
reading it with the complementary method ResultSet.getBinaryStream runs
into the memory problem, killing the Java VM.

Thanks to all for the useful feedback.  I'm going to post a note to the
JDBC list as well to make this easier to find in the future.

Vance
