Re: [GENERAL] storing large files in database - performance

2017-05-19 Thread Eric Hill
I am pleased to report that with Merlin's suggestion of using the 
pg-large-object middleware, I have a test case now showing that I can write a 
25MB buffer from Node.js to Postgres in roughly 700 milliseconds.  Here is the 
JavaScript code, which is nearly verbatim from the example in the 
pg-large-object doc:

packages.testLargeObjects = function(callback) {
   var pgp = require('pg-promise')();
   var LargeObjectManager = require('pg-large-object').LargeObjectManager;
   var PassThrough = require('stream').PassThrough;

   // 25 MB test payload
   var bufSize = 1024 * 1024 * 25;
   var buf = Buffer.alloc(bufSize);   // Buffer.alloc() rather than the deprecated new Buffer()
   buf.fill("pgrocks");

   var connInfo = {
      host:     'localhost',
      port:     5432,
      database: 'mydb',
      user:     'postgres',
      password: 'secret'
   };

   var db = pgp(connInfo);

   db.tx(function(tx) {
      // Large object I/O has to happen inside a transaction
      const lObjMgr = new LargeObjectManager({pgPromise: tx});
      const bufferSize = 16384;

      return lObjMgr.createAndWritableStreamAsync(bufferSize)
         .then(([oid, stream]) => {
            // Pipe the in-memory buffer into the large object's write stream
            let bufferStream = new PassThrough();
            bufferStream.end(buf);
            bufferStream.pipe(stream);
            return new Promise(function(resolve, reject) {
               stream.on('finish', resolve);
               stream.on('error', reject);
            });
         });
   })
   .then(function() {
      callback();
      pgp.end();
   })
   .catch(function(err) {
      callback(err);
      pgp.end();
   });
};
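
For the read side, here is a minimal sketch along the same lines (untested, adapted from the read example in the pg-large-object doc rather than taken from my code; it assumes you kept the oid returned by the write step, and the function name, connection settings, and 16 KB chunk size just mirror the write example above):

packages.testReadLargeObject = function(oid, callback) {
   var pgp = require('pg-promise')();
   var LargeObjectManager = require('pg-large-object').LargeObjectManager;

   var db = pgp({
      host:     'localhost',
      port:     5432,
      database: 'mydb',
      user:     'postgres',
      password: 'secret'
   });

   db.tx(function(tx) {
      // Large object reads also have to happen inside a transaction
      const lObjMgr = new LargeObjectManager({pgPromise: tx});
      const bufferSize = 16384;

      return lObjMgr.openAndReadableStreamAsync(oid, bufferSize)
         .then(([size, stream]) => {
            // Re-assemble the streamed chunks into a single Buffer
            let chunks = [];
            stream.on('data', (chunk) => chunks.push(chunk));
            return new Promise(function(resolve, reject) {
               stream.on('end', () => resolve(Buffer.concat(chunks)));
               stream.on('error', reject);
            });
         });
   })
   .then(function(contents) {
      callback(null, contents);
      pgp.end();
   })
   .catch(function(err) {
      callback(err);
      pgp.end();
   });
};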

Thanks very much!

Eric

-Original Message-
From: Merlin Moncure [mailto:mmonc...@gmail.com] 

Another point, some googling turned up
https://www.npmjs.com/package/pg-large-object which is definitely something to 
consider trying.

merlin



Re: [GENERAL] storing large files in database - performance

2017-05-18 Thread Eric Hill
Thanks, Merlin - lots of good information here, and I had not yet stumbled 
across pg-large-object - I will look into it.

Eric

-Original Message-
From: Merlin Moncure [mailto:mmonc...@gmail.com] 
Sent: Thursday, May 18, 2017 9:49 AM
To: Eric Hill 
Cc: Thomas Kellerer ; PostgreSQL General 

Subject: Re: storing large files in database - performance


On Thu, May 18, 2017 at 7:34 AM, Eric Hill  wrote:
> I would be thrilled to get 76 MB per second, and it is comforting to know 
> that we have that as a rough upper bound on performance.  I've got work to do 
> to figure out how to approach that upper bound from Node.js.
>
> In the meantime, I've been looking at performance on the read side.  For 
> that, I can bypass all my Node.js layers and just run a query from pgAdmin 4. 
>  I ran this query, where indexFile.contents for the row in question is 25MB 
> in size.  The query itself took 4 seconds in pgAdmin 4.  Better than the 12 
> seconds I'm getting in Node.js, but still on the order of 6MB per second, not 
> 76.  Do you suppose pgAdmin 4 and I are doing similarly inefficient things in 
> querying bytea values?

Probably.  I haven't spent a lot of time with pgAdmin 4 so I'm not entirely sure.  If you want a quick and dirty comparison, try running your query in psql in unaligned mode for a comparison point.  You can also do a \copy in BINARY format in the case of bytea transfers.
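
Something along these lines, for instance (a rough sketch; the database name, id value, and output paths are placeholders to adjust for your setup, and both commands also pay for writing the local file, so treat them as comparison points rather than absolute numbers):

$ psql -At mydb
mydb=# \timing on
mydb=# SELECT "contents" FROM "mySchema"."indexFiles" WHERE "_id" = '591c609bb56d0849404e4720' \g /tmp/contents.txt
mydb=# \copy (SELECT "contents" FROM "mySchema"."indexFiles" WHERE "_id" = '591c609bb56d0849404e4720') TO '/tmp/contents.bin' WITH (FORMAT binary)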

The basic problem is not really the database, it's that database interaction 
APIs tend not to be directed to this kind of problem.
The big picture issues are:

*) Driver overhead marshaling from wire format to managed types

*) Driver overhead for memory management

*) Wire format issues.  Certain types are *much* faster with the binary wire 
format and are additionally much more memory efficient.
Your bytea transfers are probably being serialized to text and back in both directions, which is very wasteful for very large transfers, both in time and in memory.
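
As a rough illustration (assuming the default hex bytea_output and a text-format result set): a 25 MB bytea value arrives as roughly 50 MB of hex text, which the driver then has to allocate, scan, and decode back into a 25 MB binary buffer before your code ever sees it.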

If I were to seriously look at node.js performance, my rough thinking is that 
I'd want to be setting up the javascript variables directly in C somehow using 
plv8 internal routines.  Short of that, I would probably be querying all data 
out of postgres in json rather than serializing individual fields (which is 
what I generally do in practice).

Another point, some googling turned up
https://www.npmjs.com/package/pg-large-object which is definitely something to 
consider trying.

merlin



Re: [GENERAL] storing large files in database - performance

2017-05-18 Thread Eric Hill
My apologies: I said I ran "this query" but failed to include the query.  It 
was merely this:

SELECT "indexFile"."_id", "indexFile"."contents"
FROM "mySchema"."indexFiles" AS "indexFile"
WHERE "indexFile"."_id" = '591c609bb56d0849404e4720';

Eric

-Original Message-
From: Eric Hill [mailto:eric.h...@jmp.com] 
Sent: Thursday, May 18, 2017 8:35 AM
To: Merlin Moncure ; Thomas Kellerer 
Cc: PostgreSQL General 
Subject: Re: storing large files in database - performance

I would be thrilled to get 76 MB per second, and it is comforting to know that 
we have that as a rough upper bound on performance.  I've got work to do to 
figure out how to approach that upper bound from Node.js.  

In the meantime, I've been looking at performance on the read side.  For that, 
I can bypass all my Node.js layers and just run a query from pgAdmin 4.  I ran 
this query, where indexFile.contents for the row in question is 25MB in size.  
The query itself took 4 seconds in pgAdmin 4.  Better than the 12 seconds I'm 
getting in Node.js, but still on the order of 6MB per second, not 76.  Do you 
suppose pgAdmin 4 and I are doing similarly inefficient things in querying 
bytea values?

Thanks,

Eric



Re: [GENERAL] storing large files in database - performance

2017-05-18 Thread Eric Hill
I would be thrilled to get 76 MB per second, and it is comforting to know that 
we have that as a rough upper bound on performance.  I've got work to do to 
figure out how to approach that upper bound from Node.js.  

In the meantime, I've been looking at performance on the read side.  For that, 
I can bypass all my Node.js layers and just run a query from pgAdmin 4.  I ran 
this query, where indexFile.contents for the row in question is 25MB in size.  
The query itself took 4 seconds in pgAdmin 4.  Better than the 12 seconds I'm 
getting in Node.js, but still on the order of 6MB per second, not 76.  Do you 
suppose pgAdmin 4 and I are doing similarly inefficient things in querying 
bytea values?

Thanks,

Eric

-Original Message-
From: Merlin Moncure [mailto:mmonc...@gmail.com] 
Sent: Wednesday, May 17, 2017 10:21 AM
To: Thomas Kellerer 
Cc: PostgreSQL General 
Subject: Re: storing large files in database - performance

On Tue, May 16, 2017 at 9:51 AM, Thomas Kellerer  wrote:
> John R Pierce wrote on 16.05.2017 at 16:44:
>> On 5/16/2017 7:35 AM, Thomas Kellerer wrote:
>>> When my (JDBC based) SQL client and the database server are on the same 
>>> computer...
>>
>> node.js is Javascript, not java w/ jdbc
>
> I know that.
>
> I mentioned JDBC so that it's clear that the timings were done using a different technology.
>
> Maybe it's Node.js or the JavaScript "driver" that causes the problems.

When writing large objects to the database, the method of transmission will very much determine performance until you start hitting the natural boundaries imposed by the database.

via libpqtypes (hastily written):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "libpq-fe.h"
#include "libpqtypes.h"

int main()
{
  /* 256 MB payload: all 'x', null-terminated so it can be sent as text */
  int s = 1024 * 1024 * 256;

  char *p = malloc(s);
  memset(p, 'x', s);
  p[s-1] = 0;

  PGconn *conn = PQconnectdb("");
  PQinitTypes(conn);

  /* %text pushes the whole string as a single text parameter */
  PGresult *res = PQexecf(conn, "insert into foo values(1,%text)", p);

  if(!res)
    fprintf(stderr, "*ERROR: %s\n", PQgeterror());

  PQclear(res);
  PQfinish(conn);
  return 0;
}

mmoncure@mernix2 09:13 AM /tmp$ gcc -otest test.c -lpq -lpqtypes -I /home/mmoncure/src/libpqtypes-1.5.1/src -I /home/mmoncure/pg94/include/ -L /home/mmoncure/src/libpqtypes-1.5.1/.libs/
mmoncure@mernix2 09:13 AM /tmp$ psql -c "create table foo(i int, f text)"
CREATE TABLE
mmoncure@mernix2 09:13 AM /tmp$ psql -c "alter table foo alter f set storage external"
ALTER TABLE
mmoncure@mernix2 09:14 AM /tmp$ time LD_LIBRARY_PATH=/home/mmoncure/src/libpqtypes-1.5.1/.libs ./test

real 0m3.245s
user 0m0.092s
sys 0m0.232s
mmoncure@mernix2 09:15 AM /tmp$ psql -c "select pg_size_pretty(pg_table_size('foo'))"
 pg_size_pretty
----------------
 266 MB
(1 row)


...that's over 76 MB/sec (to a local server) for a 256 MB transfer.  That's pretty good, I think.  We have a 1 GB limit on bytea/text values, and (at least in C, with certain reasonable precautions) you can work comfortably under that limit.  There might be other, better strategies, but it can be done.

merlin




Re: [GENERAL] storing large files in database - performance

2017-05-16 Thread Eric Hill
OK, thanks very much.  It seems like my process is somehow flawed.  I'll try 
removing some layers and see if I can figure out what is killing the 
performance.

Eric



>
> Do these numbers surprise you?  Are these files just too large for 
> storage in PostgreSQL to be practical?  Could there be something about 
> my methodology that is slowing things down?

Yes, it does surprise me.  I just tested inserting an 11 MB file using psycopg2 (Python) and it took less than a second.





[GENERAL] storing large files in database - performance

2017-05-16 Thread Eric Hill
Hey,

I searched the archives and found a few discussions of storing large files in the database, but none that specifically address performance or how large a file can realistically be stored in the database.

I have a Node.js application using PostgreSQL to store uploaded files.  The column in which I am storing the file contents is of type "bytea" with "Storage" type set to "EXTENDED".  Storing a 12.5 MB file is taking 10 seconds, and storing a 25 MB file is taking 37 seconds.  Two things about those numbers stand out: they seem like a long time, and the time seems to grow much faster than linearly with file size.
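
For concreteness, the table is roughly this shape (a simplified sketch rather than the exact DDL; as I understand it, EXTENDED is already the default storage for bytea, so the ALTER just makes the setting explicit):

CREATE TABLE "mySchema"."indexFiles" (
    "_id"      text PRIMARY KEY,
    "contents" bytea
);

ALTER TABLE "mySchema"."indexFiles" ALTER COLUMN "contents" SET STORAGE EXTENDED;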

Do these numbers surprise you?  Are these files just too large for storage in 
PostgreSQL to be practical?  Could there be something about my methodology that 
is slowing things down?

I do have the Sequelize ORM and the pg driver in between my code and the 
database.

Thanks,

Eric