Re: [GENERAL] storing large files in database - performance
I am pleased to report that with Merlin's suggestion of using the pg-large-object middleware, I now have a test case showing that I can write a 25MB buffer from Node.js to Postgres in roughly 700 milliseconds. Here is the JavaScript code, which is nearly verbatim from the example in the pg-large-object docs:

packages.testLargeObjects = function(callback) {
   var pgp = require('pg-promise')();
   var LargeObjectManager = require('pg-large-object').LargeObjectManager;
   var PassThrough = require('stream').PassThrough;

   // 25MB test payload
   var bufSize = 1024 * 1024 * 25;
   var buf = new Buffer(bufSize);
   buf.fill("pgrocks");

   var connInfo = {
      host:     'localhost',
      port:     5432,
      database: 'mydb',
      user:     'postgres',
      password: 'secret'
   };

   var db = pgp(connInfo);

   // Create a new large object and stream the buffer into it,
   // all within a single transaction.
   db.tx(function(tx) {
      const lObjMgr = new LargeObjectManager({pgPromise: tx});
      const bufferSize = 16384;

      return lObjMgr.createAndWritableStreamAsync(bufferSize)
         .then(([oid, stream]) => {
            let bufferStream = new PassThrough();
            bufferStream.end(buf);
            bufferStream.pipe(stream);
            return new Promise(function(resolve, reject) {
               stream.on('finish', resolve);
               stream.on('error', reject);
            });
         });
   })
   .then(function() {
      callback();
      pgp.end();
   })
   .catch(function(err) {
      callback(err);
      pgp.end();
   });
};

Thanks very much!

Eric

-----Original Message-----
From: Merlin Moncure [mailto:mmonc...@gmail.com]

Another point: some googling turned up
https://www.npmjs.com/package/pg-large-object which is definitely
something to consider trying.

merlin
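For the read side, a companion sketch along the same lines, assuming pg-large-object's documented openAndReadableStreamAsync API (readLargeObject is a hypothetical helper, not code from this thread; the oid would be the one returned by createAndWritableStreamAsync above):

function readLargeObject(db, oid, callback) {
   var LargeObjectManager = require('pg-large-object').LargeObjectManager;

   db.tx(function(tx) {
      const lObjMgr = new LargeObjectManager({pgPromise: tx});
      const bufferSize = 16384;

      // Open the large object and collect its contents into one Buffer.
      // The stream must be fully consumed inside the transaction.
      return lObjMgr.openAndReadableStreamAsync(oid, bufferSize)
         .then(([size, stream]) => {
            let chunks = [];
            stream.on('data', (chunk) => chunks.push(chunk));
            return new Promise(function(resolve, reject) {
               stream.on('end', () => resolve(Buffer.concat(chunks)));
               stream.on('error', reject);
            });
         });
   })
   .then(function(buf) { callback(null, buf); })
   .catch(callback);
}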
Re: [GENERAL] storing large files in database - performance
Thanks, Merlin - lots of good information here, and I had not yet stumbled across pg-large-object - I will look into it.

Eric

-----Original Message-----
From: Merlin Moncure [mailto:mmonc...@gmail.com]
Sent: Thursday, May 18, 2017 9:49 AM
To: Eric Hill
Cc: Thomas Kellerer; PostgreSQL General
Subject: Re: storing large files in database - performance

On Thu, May 18, 2017 at 7:34 AM, Eric Hill wrote:
> I would be thrilled to get 76 MB per second, and it is comforting to know
> that we have that as a rough upper bound on performance. I've got work to do
> to figure out how to approach that upper bound from Node.js.
>
> In the meantime, I've been looking at performance on the read side. For
> that, I can bypass all my Node.js layers and just run a query from pgAdmin 4.
> I ran this query, where indexFile.contents for the row in question is 25MB
> in size. The query itself took 4 seconds in pgAdmin 4. Better than the 12
> seconds I'm getting in Node.js, but still on the order of 6MB per second, not
> 76. Do you suppose pgAdmin 4 and I are doing similarly inefficient things in
> querying bytea values?

Probably. I haven't spent a lot of time with pgAdmin 4, so I'm not entirely sure. If you want a quick and dirty comparison, try running your query in psql in unaligned mode for a comparison point. You can also do \copy BINARY in the case of bytea transfers.

The basic problem is not really the database; it's that database interaction APIs tend not to be directed at this kind of problem. The big-picture issues are:

*) Driver overhead marshaling from wire format to managed types
*) Driver overhead for memory management
*) Wire format issues. Certain types are *much* faster with the binary wire format and are additionally much more memory efficient. Your bytea transfers are probably being serialized to text and back in both directions, which is very wasteful, especially for very large transfers, since it also wastes memory.

If I were to seriously look at node.js performance, my rough thinking is that I'd want to be setting up the JavaScript variables directly in C somehow using plv8 internal routines. Short of that, I would probably be querying all data out of Postgres as json rather than serializing individual fields (which is what I generally do in practice).

Another point: some googling turned up https://www.npmjs.com/package/pg-large-object which is definitely something to consider trying.

merlin
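Merlin's quick-and-dirty comparison might look roughly like this from a shell (a sketch, not from the thread: -qAt gives quiet, unaligned, tuples-only output, and the \copy writes the same row out in binary format; table and column names are taken from the query shown later in this digest, the output path is a placeholder):

time psql mydb -qAt -c 'SELECT "contents" FROM "mySchema"."indexFiles" WHERE "_id" = $$591c609bb56d0849404e4720$$' > /dev/null

time psql mydb -c '\copy (SELECT "contents" FROM "mySchema"."indexFiles" WHERE "_id" = $$591c609bb56d0849404e4720$$) TO /tmp/contents.bin WITH BINARY'

The gap between the two timings gives a rough sense of what the text-format path costs relative to binary.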
Re: [GENERAL] storing large files in database - performance
My apologies: I said I ran "this query" but failed to include the query. It was merely this:

SELECT "indexFile"."_id", "indexFile"."contents"
FROM "mySchema"."indexFiles" AS "indexFile"
WHERE "indexFile"."_id" = '591c609bb56d0849404e4720';

Eric

-----Original Message-----
From: Eric Hill [mailto:eric.h...@jmp.com]
Sent: Thursday, May 18, 2017 8:35 AM
To: Merlin Moncure; Thomas Kellerer
Cc: PostgreSQL General
Subject: Re: storing large files in database - performance

I would be thrilled to get 76 MB per second, and it is comforting to know that we have that as a rough upper bound on performance. I've got work to do to figure out how to approach that upper bound from Node.js.

In the meantime, I've been looking at performance on the read side. For that, I can bypass all my Node.js layers and just run a query from pgAdmin 4. I ran this query, where indexFile.contents for the row in question is 25MB in size. The query itself took 4 seconds in pgAdmin 4. Better than the 12 seconds I'm getting in Node.js, but still on the order of 6MB per second, not 76. Do you suppose pgAdmin 4 and I are doing similarly inefficient things in querying bytea values?

Thanks,

Eric
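The same fetch can also be timed from Node.js with the pg driver alone, bypassing both pgAdmin 4 and any higher layers. A minimal sketch (connection settings are placeholders; node-postgres returns bytea columns as Node Buffers):

var pg = require('pg');

var client = new pg.Client({
   host: 'localhost', port: 5432, database: 'mydb',
   user: 'postgres', password: 'secret'
});

client.connect(function(err) {
   if (err) throw err;
   console.time('bytea read');
   client.query(
      'SELECT "contents" FROM "mySchema"."indexFiles" WHERE "_id" = $1',
      ['591c609bb56d0849404e4720'],
      function(err, result) {
         if (err) throw err;
         console.timeEnd('bytea read');
         // contents arrives as a Buffer
         console.log('read', result.rows[0].contents.length, 'bytes');
         client.end();
      });
});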
Re: [GENERAL] storing large files in database - performance
I would be thrilled to get 76 MB per second, and it is comforting to know that we have that as a rough upper bound on performance. I've got work to do to figure out how to approach that upper bound from Node.js.

In the meantime, I've been looking at performance on the read side. For that, I can bypass all my Node.js layers and just run a query from pgAdmin 4. I ran this query, where indexFile.contents for the row in question is 25MB in size. The query itself took 4 seconds in pgAdmin 4. Better than the 12 seconds I'm getting in Node.js, but still on the order of 6MB per second, not 76. Do you suppose pgAdmin 4 and I are doing similarly inefficient things in querying bytea values?

Thanks,

Eric

-----Original Message-----
From: Merlin Moncure [mailto:mmonc...@gmail.com]
Sent: Wednesday, May 17, 2017 10:21 AM
To: Thomas Kellerer
Cc: PostgreSQL General
Subject: Re: storing large files in database - performance

On Tue, May 16, 2017 at 9:51 AM, Thomas Kellerer wrote:
> John R Pierce wrote on 16.05.2017 at 16:44:
>> On 5/16/2017 7:35 AM, Thomas Kellerer wrote:
>>> When my (JDBC based) SQL client and the database server are on the same
>>> computer...
>>
>> node.js is Javascript, not java w/ jdbc
>
> I know that.
>
> I mentioned JDBC so that it's clear that the timings were done using a
> different technology.
>
> Maybe it's Node.js or the JavaScript "driver" that causes the problems.

When writing large objects to the database, the method of transmission will very much determine performance until you start hitting the natural boundaries imposed by the database. Via (hastily written):

#include "libpqtypes.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
   /* 256MB payload of 'x' bytes, NUL-terminated so it can be sent as text */
   int s = 1024 * 1024 * 256;
   char *p = malloc(s);
   memset(p, 'x', s);
   p[s-1] = 0;

   PGconn *conn = PQconnectdb("");
   PQinitTypes(conn);

   /* send the whole buffer as a single text parameter via libpqtypes */
   PGresult *res = PQexecf(conn, "insert into foo values(1,%text)", p);
   if (!res)
      fprintf(stderr, "*ERROR: %s\n", PQgeterror());

   PQclear(res);
   PQfinish(conn);
   return 0;
}

mmoncure@mernix2 09:13 AM /tmp$ gcc -otest test.c -lpq -lpqtypes -I /home/mmoncure/src/libpqtypes-1.5.1/src -I /home/mmoncure/pg94/include/ -L /home/mmoncure/src/libpqtypes-1.5.1/.libs/
mmoncure@mernix2 09:13 AM /tmp$ psql -c "create table foo(i int, f text)"
CREATE TABLE
mmoncure@mernix2 09:13 AM /tmp$ psql -c "alter table foo alter f set storage external"
ALTER TABLE
mmoncure@mernix2 09:14 AM /tmp$ time LD_LIBRARY_PATH=/home/mmoncure/src/libpqtypes-1.5.1/.libs ./test

real    0m3.245s
user    0m0.092s
sys     0m0.232s

mmoncure@mernix2 09:15 AM /tmp$ psql -c "select pg_size_pretty(pg_table_size('foo'))"
 pg_size_pretty
----------------
 266 MB
(1 row)

...that's over 76MB/sec (to a local server) for a 256MB transfer. That's pretty good, I think. We have a 1GB barrier on bytea/text, and (at least in C, with certain reasonable precautions) you can work comfortably under that limit. There might be other, better strategies, but it can be done.

merlin
Re: [GENERAL] storing large files in database - performance
OK, thanks very much. It seems like my process is somehow flawed. I'll try removing some layers and see if I can figure out what is killing the performance.

Eric

> Do these numbers surprise you? Are these files just too large for
> storage in PostgreSQL to be practical? Could there be something about
> my methodology that is slowing things down?

Yes, it does surprise me. I just tested inserting an 11MB file using psycopg2 (Python) and it took less than a second.
[GENERAL] storing large files in database - performance
Hey,

I searched and found a few discussions of storing large files in the database in the archives, but none that specifically address performance and how large a file can realistically be stored in the database.

I have a Node.js application using PostgreSQL to store uploaded files. The column in which I am storing the file contents is of type "bytea" with "Storage" type set to "EXTENDED". Storing a 12.5 MB file is taking 10 seconds, and storing a 25 MB file is taking 37 seconds. Two notable things about those numbers: it seems like a long time, and the time grows much faster than linearly with file size (twice the bytes, nearly four times the time).

Do these numbers surprise you? Are these files just too large for storage in PostgreSQL to be practical? Could there be something about my methodology that is slowing things down? I do have the Sequelize ORM and the pg driver in between my code and the database.

Thanks,

Eric
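One way to separate the ORM cost from the driver cost is to time the same insert through the pg driver alone. A minimal sketch (the files table and connection settings are placeholders; node-postgres sends a Buffer parameter as bytea):

var pg = require('pg');

var client = new pg.Client({
   host: 'localhost', port: 5432, database: 'mydb',
   user: 'postgres', password: 'secret'
});

client.connect(function(err) {
   if (err) throw err;

   // 25MB test payload, matching the slow case above
   var buf = new Buffer(25 * 1024 * 1024);
   buf.fill('x');

   console.time('bytea insert');
   client.query(
      'INSERT INTO files (contents) VALUES ($1)',
      [buf],
      function(err) {
         if (err) throw err;
         console.timeEnd('bytea insert');
         client.end();
      });
});

If this bare insert is fast, the bottleneck is in the Sequelize and serialization layers rather than in PostgreSQL itself.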