Re: [GENERAL] SUBSTRING performance for large BYTEA
On Sat, Aug 18, 2007 at 10:54:11PM -0700, Vance Maverick wrote:
> This is my experience with a Java client too. Writing the data with
> PreparedStatement.setBinaryStream works great for long strings, but
> reading it with the complementary method ResultSet.getBinaryStream runs
> into the memory problem, killing the Java VM.

Again, I am observing this under Python with a libpq-based driver running on Windows, and during retrieval only. Are we seeing a pattern?

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [GENERAL] SUBSTRING performance for large BYTEA
Vance Maverick [EMAIL PROTECTED] writes:
> My question is about performance in the postgres server. When I execute
> SELECT SUBSTRING (my_bytea FROM ? FOR ?) FROM my_table WHERE id = ?,
> does it fetch the whole BYTEA into memory? Or does it access only the
> pages that contain the requested substring?

Recent releases will do what you want if the column has been marked SET STORAGE EXTERNAL (before storing anything in it...) See the ALTER TABLE reference page.

regards, tom lane
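Tom's advice boils down to two statements, sketched below with the placeholder names from the original question (my_table, my_bytea; the 64 KB slice size is illustrative, not prescribed):

```sql
-- Store this column out-of-line and uncompressed in the TOAST table.
-- Must be set before any data is stored in the column.
ALTER TABLE my_table ALTER COLUMN my_bytea SET STORAGE EXTERNAL;

-- With EXTERNAL storage, a substring reads only the TOAST chunks
-- that overlap the requested byte range, not the whole value:
SELECT substring(my_bytea FROM 1 FOR 65536) FROM my_table WHERE id = 1;
```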
Re: [GENERAL] SUBSTRING performance for large BYTEA
Tom Lane wrote:
> Vance Maverick [EMAIL PROTECTED] writes:
>> My question is about performance in the postgres server. When I execute
>> SELECT SUBSTRING (my_bytea FROM ? FOR ?) FROM my_table WHERE id = ?,
>> does it fetch the whole BYTEA into memory? Or does it access only the
>> pages that contain the requested substring?
> Recent releases will do what you want if the column has been marked SET
> STORAGE EXTERNAL (before storing anything in it...) See the ALTER TABLE
> reference page.

Should we consider setting storage external by default for the type?

Joshua D. Drake

--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997  http://www.commandprompt.com/
UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
Re: [GENERAL] SUBSTRING performance for large BYTEA
Joshua D. Drake [EMAIL PROTECTED] writes:
> Should we consider setting storage external by default for the type?

No. That would be counterproductive for the more typical case of bytea values in the range of some small number of kilobytes. Or at least I think that's more typical than values that are so large you have to go out of your way to fetch them in chunks.

regards, tom lane
Re: [GENERAL] SUBSTRING performance for large BYTEA
Tom Lane wrote:
> Joshua D. Drake [EMAIL PROTECTED] writes:
>> Should we consider setting storage external by default for the type?
> No. That would be counterproductive for the more typical case of bytea
> values in the range of some small number of kilobytes. Or at least I
> think that's more typical than values that are so large you have to go
> out of your way to fetch them in chunks.

The typical case that I run into with bytea is storing data large enough that this would be useful -- specifically items such as .pdf and .doc files.

Sincerely,

Joshua D. Drake
Re: [GENERAL] SUBSTRING performance for large BYTEA
On Sat, Aug 18, 2007 at 12:20:42PM -0400, Tom Lane wrote:
> Vance Maverick [EMAIL PROTECTED] writes:
>> My question is about performance in the postgres server. When I execute
>> SELECT SUBSTRING (my_bytea FROM ? FOR ?) FROM my_table WHERE id = ?,
>> does it fetch the whole BYTEA into memory? Or does it access only the
>> pages that contain the requested substring?
> Recent releases will do what you want if the column has been marked SET
> STORAGE EXTERNAL (before storing anything in it...) See the ALTER TABLE
> reference page.

Ah, thanks, good to know! "Recent releases" seems to mean at least as far back as 8.1, going by the docs.

Now, to convert an existing bytea column I would need to add a new bytea column with SET STORAGE EXTERNAL, move the data from the old column to the new column, remove the old column, and give the new column the original name, correct? Or is there an easier way?

Karsten
Re: [GENERAL] SUBSTRING performance for large BYTEA
Karsten Hilbert wrote:
> Now, to convert an existing bytea column I would need to add a new bytea
> column with SET STORAGE EXTERNAL, move the data from the old column to
> the new column, remove the old column, and give the new column the
> original name, correct? Or is there an easier way?

Well, this is a guess, but: set the existing column to storage external, then update the column in place with its existing data:

UPDATE foo SET bar = bar;

The downside to this is that you are going to create a dead row for every update, which means a vacuum (probably full) afterward -- but the approach you describe above will do the same thing as well.

Sincerely,

Joshua D. Drake

Karsten
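Spelled out, this suggested in-place conversion would look like the following (foo and bar are the placeholder names from above; note that a later message in this thread points out the no-op UPDATE is optimized to reuse the existing toast pointer, so it may not have the intended effect):

```sql
-- Change the storage strategy for values written from now on:
ALTER TABLE foo ALTER COLUMN bar SET STORAGE EXTERNAL;

-- Rewrite every row so existing values are re-toasted:
UPDATE foo SET bar = bar;

-- Reclaim the dead row versions the full-table update left behind:
VACUUM FULL foo;
```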
Re: [GENERAL] SUBSTRING performance for large BYTEA
On Sat, Aug 18, 2007 at 12:49:09PM -0400, Tom Lane wrote:
> Joshua D. Drake [EMAIL PROTECTED] writes:
>> Should we consider setting storage external by default for the type?
> No. That would be counterproductive for the more typical case of bytea
> values in the range of some small number of kilobytes. Or at least I
> think that's more typical than values that are so large you have to go
> out of your way to fetch them in chunks.

Would it be feasible to add an ALTER TABLE mode ... set storage externally-extended cutoff size ... where size is the user-configurable size of the column data at which PostgreSQL switches from the extended to the external storage strategy? That way large bytea values would be chunkable while smaller ones wouldn't, at the discretion of the DBA.

Karsten
Re: [GENERAL] SUBSTRING performance for large BYTEA
Karsten Hilbert [EMAIL PROTECTED] writes:
> Would it be feasible to add an ALTER TABLE mode ... set storage
> externally-extended cutoff size ... where size is the user-configurable
> size of the column data at which PostgreSQL switches from extended to
> external storage strategy?

Actually, it just occurred to me that this ties into the recent discussion of compression parameters
http://archives.postgresql.org/pgsql-hackers/2007-08/msg00082.php
(which hasn't gone further than discussion yet). Perhaps we need an additional parameter which is a maximum input size to attempt compression at all. IOW, the current force_input_size is not only useless but exactly backwards ...

There was some discussion in that thread (or maybe the earlier one on -patches) of exposing the lzcompress parameters directly to users, perhaps as an extended form of the current SET STORAGE command. That won't happen for 8.3 but it might later. In the meantime, if the defaults included not attempting to compress multi-megabyte values, I think it'd Just Work for cases like yours.

regards, tom lane
Re: [GENERAL] SUBSTRING performance for large BYTEA
On Sat, Aug 18, 2007 at 10:23:42AM -0700, Joshua D. Drake wrote:
>>> SET STORAGE EXTERNAL (before storing anything in it...) See the
>>> ALTER TABLE reference page.
>> Now, to convert an existing bytea column I would need to add a new
>> bytea column with SET STORAGE EXTERNAL, move the data from the old
>> column to the new column, remove the old column, and give the new
>> column the original name, correct?
> Set existing column to storage external, update existing column with
> existing data: UPDATE foo SET bar = bar;
> Now the down side to this is you are going to create a dead row for
> every update which means a vacuum (probably full) afterward, but the way
> you describe above will do the same thing as well.

Sure. I was a bit uneasy about the docs saying SET STORAGE doesn't affect existing data but only sets the strategy for new inserts/updates, and hence thought using a wholly new column would somehow be safer.

But maybe this can be nefariously interpreted such that I could sort-of implement cutoff-based extended/external switching by prepending ALTER TABLE ... SET STORAGE EXTERNAL/EXTENDED ... to INSERTs/UPDATEs based on bytea parameter size. Or even writing a trigger issuing ALTER TABLE depending on the size of the insert?

Karsten
Re: [GENERAL] SUBSTRING performance for large BYTEA
Joshua D. Drake [EMAIL PROTECTED] writes:
> Well, this is a guess, but: set the existing column to storage external,
> then update the column with its existing data:
> UPDATE foo SET bar = bar;

Well, not quite. That would actually reuse the toast pointer without decompressing it. We try to be clever about not decompressing and duplicating toast pointers unnecessarily on updates -- in this case, too clever. You could do this:

postgres=# ALTER TABLE foo ALTER bar TYPE bytea, ALTER bar SET STORAGE external;
ALTER TABLE

(Note that you have to include the 'ALTER bar SET STORAGE external' in the same command, or the storage will get reset to the default 'extended' for bytea even if it was previously set to 'external'.)

When I tested this, though, I noticed it did *not* decompress compressed data which was small enough to store internally. This may actually be desirable for your case, since anything small enough to be stored internally is probably not worth decompressing so it can be streamed out. It will still not be compressed the next time you update it, so it's not really helpful for the long term. If you want to decompress everything you have to do something like:

postgres=# ALTER TABLE foo ALTER bar TYPE bytea USING bar||'', ALTER bar SET STORAGE external;
ALTER TABLE

However, note that this will require extra memory for both the decompressed original value and the new value after appending the empty string. Another option would be to update only the records which need to be decompressed, with something like:

UPDATE foo SET bar = bar||'' WHERE length(bar) > pg_column_size(bar)

This at least gives you the option of doing them in small groups, or even one by one. I would suggest vacuuming between each update.

I do have to wonder how you're getting the data *in* though. If it's large enough to have to stream out like this, then how do you initially load the data?
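The "small groups" idea can be sketched as a batched loop; the key-range batching below is an assumption (any batching criterion works), with foo/bar as before:

```sql
-- Decompress one batch of rows; length() > pg_column_size() holds
-- only for values that are currently stored compressed:
UPDATE foo SET bar = bar || ''
WHERE id BETWEEN 1 AND 1000
  AND length(bar) > pg_column_size(bar);

-- Vacuum between batches to reclaim the dead rows as you go:
VACUUM foo;
```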
--
Gregory Stark
EnterpriseDB  http://www.enterprisedb.com
Re: [GENERAL] SUBSTRING performance for large BYTEA
On Sat, Aug 18, 2007 at 09:32:33PM +0100, Gregory Stark wrote:
> I do have to wonder how you're getting the data *in* though. If it's
> large enough to have to stream out like this, then how do you initially
> load the data?

Well, in my particular case it isn't so much that I *want* to access bytea in chunks, but rather that under certain not-yet-pinned-down circumstances Windows clients tend to go out-of-memory on the socket during *retrieval* (insertion is fine, as is put/get access from Linux clients). Doing chunked retrieval works on those boxen too, so it's an option in our application (the user defines a chunk size that works; a size of 0 is treated as no-chunking).

Karsten
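Chunked retrieval along those lines can be sketched as plain SQL issued in a client-side loop (placeholder names again; the 64 KB chunk size stands in for the user-configured value):

```sql
-- Learn the total size first:
SELECT octet_length(my_bytea) FROM my_table WHERE id = 42;

-- Then fetch one slice per round trip; substring offsets are 1-based,
-- so the loop asks FROM 1, 65537, 131073, ... until the size is reached:
SELECT substring(my_bytea FROM 1 FOR 65536) FROM my_table WHERE id = 42;
```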
Re: [GENERAL] SUBSTRING performance for large BYTEA
On Sat, Aug 18, 2007 at 01:51:18PM -0400, Tom Lane wrote:
> Karsten Hilbert [EMAIL PROTECTED] writes:
>> Would it be feasible to add an ALTER TABLE mode ... set storage
>> externally-extended cutoff size ... where size is the user-configurable
>> size of the column data at which PostgreSQL switches from extended to
>> external storage strategy?
> Actually, it just occurred to me that this ties into the recent
> discussion of compression parameters
> http://archives.postgresql.org/pgsql-hackers/2007-08/msg00082.php
> (which hasn't gone further than discussion yet). Perhaps we need an
> additional parameter which is a maximum input size to attempt
> compression at all. IOW, the current force_input_size is not only
> useless but exactly backwards ...

I can see that a maximum size can be relevant for the decision as to whether to *attempt* compression, since large things compress slowly and may unduly slow down queries. As well as a minimum size to use compression on, quite obviously. OTOH, I'd like to be able to tell PostgreSQL to be so kind and refrain from attempting to compress values above a certain size even if it thought it'd make sense.

> There was some discussion in that thread (or maybe the earlier one on
> -patches) of exposing the lzcompress parameters directly to users,
> perhaps as an extended form of the current SET STORAGE command. That
> won't happen for 8.3 but it might later.

Sounds good.

> In the meantime, if the defaults included not attempting to compress
> multi-megabyte values, I think it'd Just Work for cases like yours.

Not as tweakable as I'd eventually want it but, yes, that would sort of Just Work.

Karsten
Re: [GENERAL] SUBSTRING performance for large BYTEA
Karsten Hilbert [EMAIL PROTECTED] writes:
> But maybe this can be nefariously interpreted such that I could sort-of
> implement cutoff-based extended/external switching by prepending ALTER
> TABLE ... SET STORAGE EXTERNAL/EXTENDED ... to INSERTs/UPDATEs based on
> bytea parameter size. Or even writing a trigger issuing ALTER TABLE
> depending on the size of the insert?

I wouldn't suggest doing that. It will bloat the pg_attribute catalog table and require a lot of extra vacuums. I think it would also create some lock contention issues.

--
Gregory Stark
EnterpriseDB  http://www.enterprisedb.com
Re: [GENERAL] SUBSTRING performance for large BYTEA
Karsten Hilbert writes:
> Well, in my particular case it isn't so much that I *want* to access
> bytea in chunks, but rather that under certain not-yet-pinned-down
> circumstances Windows clients tend to go out-of-memory on the socket
> during *retrieval* (insertion is fine, as is put/get access from Linux
> clients). Doing chunked retrieval works on those boxen too, so it's an
> option in our application (the user defines a chunk size that works; a
> size of 0 is treated as no-chunking).

This is my experience with a Java client too. Writing the data with PreparedStatement.setBinaryStream works great for long strings, but reading it with the complementary method ResultSet.getBinaryStream runs into the memory problem, killing the Java VM.

Thanks to all for the useful feedback. I'm going to post a note to the JDBC list as well, to make this easier to find in the future.

Vance