Re: [GENERAL] storing large files in database - performance

2017-05-19 Thread Merlin Moncure
On Fri, May 19, 2017 at 2:04 PM, Eric Hill  wrote:
> I am pleased to report that with Merlin's suggestion of using the 
> pg-large-object middleware, I have a test case now showing that I can write a 
> 25MB buffer from Node.js to Postgres in roughly 700 milliseconds.  Here is 
> the JavaScript code, which is nearly verbatim from the example in the 
> pg-large-object doc:
>
> packages.testLargeObjects = function(callback) {
>var pgp = require('pg-promise')();
>var LargeObjectManager = require('pg-large-object').LargeObjectManager;
>var PassThrough = require('stream').PassThrough;
>
>var bufSize = 1024 * 1024 * 25;
>var buf = new Buffer(bufSize);
>buf.fill("pgrocks");
>
>var connInfo = {
>   host:   'localhost',
>   port:   5432,
>   database:'mydb',
>   user:   'postgres',
>   password:'secret'
>};
>
>var db = pgp(connInfo);
>
>db.tx(function(tx) {
>   const lObjMgr = new LargeObjectManager({pgPromise: tx});
>   const bufferSize = 16384;
>
>   return lObjMgr.createAndWritableStreamAsync(bufferSize)
>  .then( ([oid, stream]) => {
> let bufferStream = new PassThrough();
> bufferStream.end(buf);
> bufferStream.pipe(stream);
> return new Promise(function(resolve, reject) {
>stream.on('finish', resolve);
>stream.on('error', reject);
> });
>  });
>})
>.then(function() {
>   callback();
>   pgp.end();
>})
>.catch(function(err) {
>   callback(err);
>   pgp.end();
>});
> };
>
> Thanks very much!

well done sir! that's probably as fast as you're going to get in node,
at least without a large investment at the driver level.
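
For the read side you should be able to do the mirror image with
openAndReadableStreamAsync and stream the large object back out.  Untested
sketch, reusing the names from your snippet ('oid' is whatever you saved at
write time, and the output path is just a placeholder):

   var fs = require('fs');

   db.tx(function(tx) {
      const lObjMgr = new LargeObjectManager({pgPromise: tx});
      const bufferSize = 16384;

      // read the stored large object back out in 16KB chunks
      return lObjMgr.openAndReadableStreamAsync(oid, bufferSize)
         .then( ([size, stream]) => {
            let out = fs.createWriteStream('/tmp/indexFile.out');
            stream.pipe(out);
            return new Promise(function(resolve, reject) {
               out.on('finish', resolve);
               stream.on('error', reject);
            });
         });
   })
   .then(function() { pgp.end(); })
   .catch(function(err) { console.error(err); pgp.end(); });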

merlin




Re: [GENERAL] storing large files in database - performance

2017-05-19 Thread Eric Hill
I am pleased to report that with Merlin's suggestion of using the 
pg-large-object middleware, I have a test case now showing that I can write a 
25MB buffer from Node.js to Postgres in roughly 700 milliseconds.  Here is the 
JavaScript code, which is nearly verbatim from the example in the 
pg-large-object doc:

packages.testLargeObjects = function(callback) {
   var pgp = require('pg-promise')();
   var LargeObjectManager = require('pg-large-object').LargeObjectManager;
   var PassThrough = require('stream').PassThrough;

   var bufSize = 1024 * 1024 * 25;
   var buf = new Buffer(bufSize);
   buf.fill("pgrocks");

   var connInfo = {
  host:   'localhost',
  port:   5432,
  database:'mydb',
  user:   'postgres',
  password:'secret'
   };

   var db = pgp(connInfo);

   db.tx(function(tx) {
  const lObjMgr = new LargeObjectManager({pgPromise: tx});
  const bufferSize = 16384;

  return lObjMgr.createAndWritableStreamAsync(bufferSize)
 .then( ([oid, stream]) => {
let bufferStream = new PassThrough();
bufferStream.end(buf);
bufferStream.pipe(stream);
return new Promise(function(resolve, reject) {
   stream.on('finish', resolve);
   stream.on('error', reject);
});
 });
   })
   .then(function() {
  callback();
  pgp.end();
   })
   .catch(function(err) {
  callback(err);
  pgp.end();
   });
};

Thanks very much!

Eric

-Original Message-
From: Merlin Moncure [mailto:mmonc...@gmail.com] 

Another point, some googling turned up
https://www.npmjs.com/package/pg-large-object which is definitely something to 
consider trying.

merlin



Re: [GENERAL] storing large files in database - performance

2017-05-18 Thread Eric Hill
Thanks, Merlin - lots of good information here, and I had not yet stumbled 
across pg-large-object - I will look into it.

Eric

-Original Message-
From: Merlin Moncure [mailto:mmonc...@gmail.com] 
Sent: Thursday, May 18, 2017 9:49 AM
To: Eric Hill 
Cc: Thomas Kellerer ; PostgreSQL General 

Subject: Re: storing large files in database - performance


On Thu, May 18, 2017 at 7:34 AM, Eric Hill  wrote:
> I would be thrilled to get 76 MB per second, and it is comforting to know 
> that we have that as a rough upper bound on performance.  I've got work to do 
> to figure out how to approach that upper bound from Node.js.
>
> In the meantime, I've been looking at performance on the read side.  For 
> that, I can bypass all my Node.js layers and just run a query from pgAdmin 4. 
>  I ran this query, where indexFile.contents for the row in question is 25MB 
> in size.  The query itself took 4 seconds in pgAdmin 4.  Better than the 12 
> seconds I'm getting in Node.js, but still on the order of 6MB per second, not 
> 76.  Do you suppose pgAdmin 4 and I are doing similarly inefficient things in 
> querying bytea values?

Probably.  I haven't spent a lot of time with pgadmin 4 so I'm not entirely 
sure.  If you want a quick and dirty comparison, try running your query 
in psql unaligned mode for a comparison point.  You can also do \copy BINARY in 
the case of byte transfers.

The basic problem is not really the database, it's that database interaction 
APIs tend not to be directed to this kind of problem.
The big picture issues are:

*) Driver overhead marshaling from wire format to managed types

*) Driver overhead for memory management

*) Wire format issues.  Certain types are *much* faster with the binary wire 
format and are additionally much more memory efficient.
Your bytea transfers are probably being serialized to text and back in both 
directions, which is very wasteful for very large transfers, in both processing 
time and memory.

If I were to seriously look at node.js performance, my rough thinking is that 
I'd want to be setting up the javascript variables directly in C somehow using 
plv8 internal routines.  Short of that, I would probably be querying all data 
out of postgres in json rather than serializing individual fields (which is 
what I generally do in practice).

Another point, some googling turned up
https://www.npmjs.com/package/pg-large-object which is definitely something to 
consider trying.

merlin



Re: [GENERAL] storing large files in database - performance

2017-05-18 Thread Eric Hill
My apologies: I said I ran "this query" but failed to include the query.  It 
was merely this:

SELECT "indexFile"."_id", "indexFile"."contents"
FROM "mySchema"."indexFiles" AS "indexFile"
WHERE "indexFile"."_id" = '591c609bb56d0849404e4720';

Eric

-Original Message-
From: Eric Hill [mailto:eric.h...@jmp.com] 
Sent: Thursday, May 18, 2017 8:35 AM
To: Merlin Moncure ; Thomas Kellerer 
Cc: PostgreSQL General 
Subject: Re: storing large files in database - performance

I would be thrilled to get 76 MB per second, and it is comforting to know that 
we have that as a rough upper bound on performance.  I've got work to do to 
figure out how to approach that upper bound from Node.js.  

In the meantime, I've been looking at performance on the read side.  For that, 
I can bypass all my Node.js layers and just run a query from pgAdmin 4.  I ran 
this query, where indexFile.contents for the row in question is 25MB in size.  
The query itself took 4 seconds in pgAdmin 4.  Better than the 12 seconds I'm 
getting in Node.js, but still on the order of 6MB per second, not 76.  Do you 
suppose pgAdmin 4 and I are doing similarly inefficient things in querying 
bytea values?

Thanks,

Eric



Re: [GENERAL] storing large files in database - performance

2017-05-18 Thread Merlin Moncure
On Thu, May 18, 2017 at 7:34 AM, Eric Hill  wrote:
> I would be thrilled to get 76 MB per second, and it is comforting to know 
> that we have that as a rough upper bound on performance.  I've got work to do 
> to figure out how to approach that upper bound from Node.js.
>
> In the meantime, I've been looking at performance on the read side.  For 
> that, I can bypass all my Node.js layers and just run a query from pgAdmin 4. 
>  I ran this query, where indexFile.contents for the row in question is 25MB 
> in size.  The query itself took 4 seconds in pgAdmin 4.  Better than the 12 
> seconds I'm getting in Node.js, but still on the order of 6MB per second, not 
> 76.  Do you suppose pgAdmin 4 and I are doing similarly inefficient things in 
> querying bytea values?

Probably.  I haven't spent a lot of time with pgadmin 4 so I'm not
entirely sure.  If you want a quick and dirty comparison, try running
your query in psql unaligned mode for a comparison point.  You
can also do \copy BINARY in the case of byte transfers.
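
For instance, a rough sketch against your indexFiles table (the output path
is just a placeholder):

\timing on
\pset format unaligned
\pset tuples_only on
\o /dev/null
SELECT "indexFile"."contents"
FROM "mySchema"."indexFiles" AS "indexFile"
WHERE "indexFile"."_id" = '591c609bb56d0849404e4720';
\o
\copy (SELECT "contents" FROM "mySchema"."indexFiles" WHERE "_id" = '591c609bb56d0849404e4720') TO '/tmp/contents.bin' WITH (FORMAT binary)

That gives you a text-mode timing and a binary-mode timing for the same row
with no driver in the way.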

The basic problem is not really the database, it's that database
interaction APIs tend not to be directed to this kind of problem.
The big picture issues are:

*) Driver overhead marshaling from wire format to managed types

*) Driver overhead for memory management

*) Wire format issues.  Certain types are *much* faster with the
binary wire format and are additionally much more memory efficient.
Your bytea transfers are probably being serialized to text and back in
both directions, which is very wasteful for very large transfers, in
both processing time and memory.

If I were to seriously look at node.js performance, my rough thinking
is that I'd want to be setting up the javascript variables directly in
C somehow using plv8 internal routines.  Short of that, I would
probably be querying all data out of postgres in json rather than
serializing individual fields (which is what I generally do in
practice).
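
As a generic illustration of the json approach (made-up table and column
names, and not aimed at bytea itself):

-- the server builds one json document per row, so the driver only has to
-- hand back a single text column instead of many individually typed fields
SELECT row_to_json(t) AS doc
FROM (SELECT id, filename, byte_size, uploaded_at FROM files) t;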

Another point, some googling turned up
https://www.npmjs.com/package/pg-large-object which is definitely
something to consider trying.

merlin




Re: [GENERAL] storing large files in database - performance

2017-05-18 Thread Eric Hill
I would be thrilled to get 76 MB per second, and it is comforting to know that 
we have that as a rough upper bound on performance.  I've got work to do to 
figure out how to approach that upper bound from Node.js.  

In the meantime, I've been looking at performance on the read side.  For that, 
I can bypass all my Node.js layers and just run a query from pgAdmin 4.  I ran 
this query, where indexFile.contents for the row in question is 25MB in size.  
The query itself took 4 seconds in pgAdmin 4.  Better than the 12 seconds I'm 
getting in Node.js, but still on the order of 6MB per second, not 76.  Do you 
suppose pgAdmin 4 and I are doing similarly inefficient things in querying 
bytea values?

Thanks,

Eric

-Original Message-
From: Merlin Moncure [mailto:mmonc...@gmail.com] 
Sent: Wednesday, May 17, 2017 10:21 AM
To: Thomas Kellerer 
Cc: PostgreSQL General 
Subject: Re: storing large files in database - performance

On Tue, May 16, 2017 at 9:51 AM, Thomas Kellerer  wrote:
> John R Pierce wrote on 16.05.2017 at 16:44:
>> On 5/16/2017 7:35 AM, Thomas Kellerer wrote:
>>> When my (JDBC based) SQL client and the database server are on the same 
>>> computer...
>>
>> node.js is Javascript, not java w/ jdbc
>
> I know that.
>
> I mentioned JDBC so that it's clear that the timings were done using a 
> different technology
>
> Maybe it's Node.js or the JavaScript "driver" that causes the problems.

When writing large objects to the database, method of transmission will very 
much determine performance until you start hitting the natural boundaries 
imposed by the database.

via (hastily written):
#include "libpqtypes.h"
#include "stdlib.h"
#include "string.h"

int main()
{
  int s = 1024 * 1024 * 256;

  char *p = malloc(s);
  memset(p, 'x', s);
  p[s-1] = 0;

  PGconn *conn = PQconnectdb("");
  PQinitTypes(conn);

  PGresult *res = PQexecf(conn, "insert into foo values(1,%text)", p);

  if(!res)
fprintf(stderr, "*ERROR: %s\n", PQgeterror());

  PQclear(res);
}

mmoncure@mernix2 09:13 AM /tmp$ gcc -otest test.c -lpq -lpqtypes -I 
/home/mmoncure/src/libpqtypes-1.5.1/src -I /home/mmoncure/pg94/include/ -L 
/home/mmoncure/src/libpqtypes-1.5.1/.libs/
mmoncure@mernix2 09:13 AM /tmp$ psql -c "create table foo(i int, f text)"
CREATE TABLE
mmoncure@mernix2 09:13 AM /tmp$ psql -c "alter table foo alter f set storage 
external"
ALTER TABLE
mmoncure@mernix2 09:14 AM /tmp$ time
LD_LIBRARY_PATH=/home/mmoncure/src/libpqtypes-1.5.1/.libs ./test

real 0m3.245s
user 0m0.092s
sys 0m0.232s
mmoncure@mernix2 09:15 AM /tmp$ psql -c "select 
pg_size_pretty(pg_table_size('foo'))"
 pg_size_pretty
----------------
 266 MB
(1 row)


...that's over 76mb/sec (to local server) for 256mb transfer.  That's
pretty good I think.   We have a 1GB barrier on bytea/text and (at
least in C, with certain reasonable precautions) you can work comfortably under 
that limit.  There might be other better strategies but it can be done.

merlin




Re: [GENERAL] storing large files in database - performance

2017-05-17 Thread Merlin Moncure
On Tue, May 16, 2017 at 9:51 AM, Thomas Kellerer  wrote:
> John R Pierce wrote on 16.05.2017 at 16:44:
>> On 5/16/2017 7:35 AM, Thomas Kellerer wrote:
>>> When my (JDBC based) SQL client and the database server are on the same 
>>> computer...
>>
>> node.js is Javascript, not java w/ jdbc
>
> I know that.
>
> I mentioned JDBC so that it's clear that the timings were done using a 
> different technology
>
> Maybe it's Node.js or the JavaScript "driver" that causes the problems.

When writing large objects to the database, method of transmission
will very much determine performance until you start hitting the
natural boundaries imposed by the database.

via (hastily written):
#include "libpqtypes.h"
#include "stdlib.h"
#include "string.h"

int main()
{
  int s = 1024 * 1024 * 256;

  char *p = malloc(s);
  memset(p, 'x', s);
  p[s-1] = 0;

  PGconn *conn = PQconnectdb("");
  PQinitTypes(conn);

  PGresult *res = PQexecf(conn, "insert into foo values(1,%text)", p);

  if(!res)
fprintf(stderr, "*ERROR: %s\n", PQgeterror());

  PQclear(res);
}

mmoncure@mernix2 09:13 AM /tmp$ gcc -otest test.c -lpq -lpqtypes -I
/home/mmoncure/src/libpqtypes-1.5.1/src -I
/home/mmoncure/pg94/include/ -L
/home/mmoncure/src/libpqtypes-1.5.1/.libs/
mmoncure@mernix2 09:13 AM /tmp$ psql -c "create table foo(i int, f text)"
CREATE TABLE
mmoncure@mernix2 09:13 AM /tmp$ psql -c "alter table foo alter f set
storage external"
ALTER TABLE
mmoncure@mernix2 09:14 AM /tmp$ time
LD_LIBRARY_PATH=/home/mmoncure/src/libpqtypes-1.5.1/.libs ./test

real 0m3.245s
user 0m0.092s
sys 0m0.232s
mmoncure@mernix2 09:15 AM /tmp$ psql -c "select
pg_size_pretty(pg_table_size('foo'))"
 pg_size_pretty
----------------
 266 MB
(1 row)


...that's over 76mb/sec (to local server) for 256mb transfer.  That's
pretty good I think.   We have a 1GB barrier on bytea/text and (at
least in C, with certain reasonable precautions) you can work
comfortably under that limit.  There might be other better strategies
but it can be done.

merlin




Re: [GENERAL] storing large files in database - performance

2017-05-16 Thread Peter J. Holzer
On 2017-05-16 12:25:03 +, Eric Hill wrote:
> I searched and found a few discussions of storing large files in the database
> in the archives, but none that specifically address performance and how large
> of files can realistically be stored in the database.
> 
>  
> 
> I have a node.js application using PostgreSQL to store uploaded files.  The
> column in which I am storing the file contents is of type “bytea” with
> “Storage” type set to “EXTENDED”.

I have mentioned this little experiment before, but I finally put the
results on my web site: https://www.hjp.at/databases/blob-bench/

(Please note that so far I have run this only on one system.
Generalizing to other systems might be premature).


> Storing a 12.5 MB file is taking 10 seconds, and storing a 25MB file
> is taking 37 seconds.  Two notable things about those numbers:  It
> seems like a long time, and the time seems to grow exponentially with
> file size rather than linearly.
> 
>  
> 
> Do these numbers surprise you?

Yes. On my system, storing a 25 MB bytea value takes well under 1 second.

hp

-- 
   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | h...@hjp.at        | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson 




Re: [GENERAL] storing large files in database - performance

2017-05-16 Thread Eric Hill
OK, thanks very much.  It seems like my process is somehow flawed.  I'll try 
removing some layers and see if I can figure out what is killing the 
performance.

Eric



>
> Do these numbers surprise you?  Are these files just too large for 
> storage in PostgreSQL to be practical?  Could there be something about 
> my methodology that is slowing things down?

Yes, it does surprise me. I just tested inserting an 11MB file using
psycopg2 (Python) and it took less than a second.





Re: [GENERAL] storing large files in database - performance

2017-05-16 Thread Thomas Kellerer
John R Pierce wrote on 16.05.2017 at 16:44:
> On 5/16/2017 7:35 AM, Thomas Kellerer wrote:
>> When my (JDBC based) SQL client and the database server are on the same 
>> computer...
> 
> node.js is Javascript, not java w/ jdbc

I know that. 

I mentioned JDBC so that it's clear that the timings were done using a 
different technology

Maybe it's Node.js or the JavaScript "driver" that causes the problems.






Re: [GENERAL] storing large files in database - performance

2017-05-16 Thread Adrian Klaver

On 05/16/2017 07:44 AM, John R Pierce wrote:

On 5/16/2017 7:35 AM, Thomas Kellerer wrote:
When my (JDBC based) SQL client and the database server are on the 
same computer...


node.js is Javascript, not java w/ jdbc


I think it was more a point of comparison, like my using a Python 
example. Something to show that the problem is probably not on the Postgres end.








--
Adrian Klaver
adrian.kla...@aklaver.com




Re: [GENERAL] storing large files in database - performance

2017-05-16 Thread John R Pierce

On 5/16/2017 5:25 AM, Eric Hill wrote:
I do have the Sequelize ORM and the pg driver in between my code and 
the database. 



Can you try a similar test without the ORM, just going straight from 
node.js to SQL?
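
Untested sketch of what I mean, with placeholder connection settings and
table/column names -- just the pg driver, a Buffer parameter (which pg sends
as bytea), and a timer around the insert:

   var fs = require('fs');
   var pg = require('pg');

   var pool = new pg.Pool({host: 'localhost', database: 'mydb',
                           user: 'postgres', password: 'secret'});

   // load the test file into a Buffer (placeholder path)
   var buf = fs.readFileSync('/path/to/your/25MB-file');

   console.time('bytea insert');
   pool.query('INSERT INTO files (id, contents) VALUES ($1, $2)',
              ['test-1', buf])
      .then(function() {
         console.timeEnd('bytea insert');
         pool.end();
      })
      .catch(function(err) {
         console.error(err);
         pool.end();
      });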



--
john r pierce, recycling bits in santa cruz





Re: [GENERAL] storing large files in database - performance

2017-05-16 Thread John R Pierce

On 5/16/2017 7:35 AM, Thomas Kellerer wrote:

When my (JDBC based) SQL client and the database server are on the same 
computer...


node.js is Javascript, not java w/ jdbc


--
john r pierce, recycling bits in santa cruz





Re: [GENERAL] storing large files in database - performance

2017-05-16 Thread Adrian Klaver

On 05/16/2017 05:25 AM, Eric Hill wrote:

Hey,

I searched and found a few discussions of storing large files in the 
database in the archives, but none that specifically address performance 
and how large of files can realistically be stored in the database.


I have a node.js application using PostgreSQL to store uploaded files.  
The column in which I am storing the file contents is of type “bytea” 
with “Storage” type set to “EXTENDED”.  Storing a 12.5 MB file is taking 
10 seconds, and storing a 25MB file is taking 37 seconds.  Two notable 
things about those numbers:  It seems like a long time, and the time 
seems to grow exponentially with file size rather than linearly.


Do these numbers surprise you?  Are these files just too large for 
storage in PostgreSQL to be practical?  Could there be something about 
my methodology that is slowing things down?


Yes, it does surprise me. I just tested inserting an 11MB file using 
psycopg2 (Python) and it took less than a second.




I do have the Sequelize ORM and the pg driver in between my code and the 
database.


Thanks,

Eric




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: [GENERAL] storing large files in database - performance

2017-05-16 Thread Thomas Kellerer
Eric Hill wrote on 16.05.2017 at 14:25:
> I have a node.js application using PostgreSQL to store uploaded
> files. The column in which I am storing the file contents is of type
> “bytea” with “Storage” type set to “EXTENDED”. Storing a 12.5 MB file
> is taking 10 seconds, and storing a 25MB file is taking 37 seconds.
> Two notable things about those numbers: It seems like a long time,
> and the time seems to grow exponentially with file size rather than
> linearly.
> 
> Do these numbers surprise you? 

When my (JDBC based) SQL client and the database server are on the same 
computer, I can store a 45MB file in about 4 seconds, and a 240MB file in about 
20 seconds. 
When I do the same with a server on the (same) network, the 45MB file takes about 
9 seconds, and the 240MB file about 60 seconds.

So yes, these numbers sound a bit surprising. 

> Could there be something about my methodology that is slowing things down?

How far are your app server and the database server apart? 
Maybe it's just a case of a slow network.

Thomas





Re: [GENERAL] storing large files in database - performance

2017-05-16 Thread Daniel Verite
Eric Hill wrote:

> The column in which I am storing the file contents is of type "bytea" with
> "Storage" type set to "EXTENDED".  Storing a 12.5 MB file is taking 10 seconds

That seems really slow indeed.
Can you import the same file to the same server with psql's
\lo_import command and see how much time it takes?
That would give a baseline to compare against.
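
For example (placeholder path, with timing turned on):

\timing on
\lo_import '/path/to/the/same/file'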


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

