Re: [PERFORM] Array interface

2010-11-03 Thread Mladen Gogala

Conor Walsh wrote:


I generally suspect this is a Perl problem rather than a Postgres
problem, 


So do I. I had the same situation with Oracle, until John Scoles fixed the 
DBD::Oracle driver to utilize the Oracle array interface.



but can't say more without code.  Maybe try pastebin if
you're having email censorship issues.

-Conor

  

I posted it to comp.databases.postgresql.


--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions







Re: [PERFORM] Test

2010-11-03 Thread Robert Gravsjö



On 2010-11-02 22.21, Mladen Gogala wrote:

Can you hear me now?


sure


--
Regards,
Robert roppert Gravsjö



Re: [PERFORM] Buffer cache replacement LRU algorithm?

2010-11-03 Thread Kenneth Marshall
Mladen,

You would need to check the mailing lists. The release notes
describe it as a clock-sweep algorithm starting in version
8. Later releases added changes to eliminate the cache
blowout caused by a sequential scan and by vacuum/autovacuum.
I do not believe there are any parameters available other
than the total size of the pool and whether sequential scans
are synchronized.
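
For reference, a minimal sketch of the relevant postgresql.conf
settings (names as of 8.3; the values are only examples):

    shared_buffers = 512MB       # total size of the buffer pool
    synchronize_seqscans = on    # concurrent seq scans piggyback on one another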

Regards,
Ken

On Wed, Nov 03, 2010 at 12:35:33PM -0400, Mladen Gogala wrote:
 Where can I find the documentation describing the buffer replacement 
 policy? Are there any parameters governing the page replacement policy?

 -- 
 Mladen Gogala Sr. Oracle DBA
 1500 Broadway
 New York, NY 10036
 (212) 329-5251
 http://www.vmsinfo.com The Leader in Integrated Media Intelligence 
 Solutions




 -- 
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Simple (hopefully) throughput question?

2010-11-03 Thread Nick Matheson

Hello

We have an application that needs to do bulk reads of ENTIRE
Postgres tables very quickly (i.e. select * from table).  We have
observed that such sequential scans run roughly 20 times slower
than observed raw disk reads (5 MB/s versus 100 MB/s).  Part of
this is due to the storage overhead we have observed in Postgres.
In the example below, it takes 1 GB to store 350 MB of nominal
data.  Even allowing for that overhead, though, we would expect
bulk read rates of about 35 MB/s (100 MB/s of raw throughput x
350 MB of nominal data per 1 GB on disk), not 5 MB/s.

Observations using iostat and top during these bulk reads suggest
that the queries are CPU bound, not I/O bound.  In fact, repeating the
queries yields similar response times.  Presumably if it were an I/O
issue the response times would be much shorter the second time through
with the benefit of caching.
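
(Concretely, the observation comes from watching something like

    iostat -x 5    # disk utilization stays low during the scan
    top            # while the backend process sits pegged on CPU

while the query runs; the exact flags are incidental.)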

We have tried these simple queries using psql, JDBC, pl/java stored
procedures, and libpq.  In all cases the client code ran on the same
box as the server. We have experimented with Postgres 8.1, 8.3 and 9.0.

We also tried playing around with some of the server tuning parameters such as 
shared_buffers to no avail.

Here is uname -a for a machine we have tested on:

Linux nevs-bdb1.fsl.noaa.gov 2.6.18-194.17.1.el5 #1 SMP Mon Sep 20 07:12:06 EDT 
2010 x86_64 x86_64 x86_64 GNU/Linux

A sample dataset that reproduces these results looks like the following (there 
are no indexes):

  Table "bulk_performance.counts"
 Column |  Type   | Modifiers
--------+---------+-----------
 i1     | integer |
 i2     | integer |
 i3     | integer |
 i4     | integer |

There are 22 million rows in this case.

We HAVE observed that summation queries run considerably faster.  In this case,

select sum(i1), sum(i2), sum(i3), sum(i4) from bulk_performance.counts

runs at 35 MB/s.

Our business logic does operations on the resulting data such that
the output is several orders of magnitude smaller than the input.  So
we had hoped that by putting our business logic into stored procedures
(and thus drastically reducing the amount of data flowing to the
client) our throughput would go way up.  This did not happen.

So our questions are as follows:

Is there any way using stored procedures (maybe C code that calls
SPI directly) or some other approach to get close to the expected 35
MB/s doing these bulk reads?  Or is this the price we have to pay for
using SQL instead of some NoSQL solution?  (We actually tried Tokyo
Cabinet and found it to perform quite well. However it does not measure
up to Postgres in terms of replication, data interrogation, community
support, acceptance, etc).

Thanks

Dan Schaffer
Paul Hamer
Nick Matheson




Re: [PERFORM] Buffer cache replacement LRU algorithm?

2010-11-03 Thread Kevin Grittner
Kenneth Marshall k...@rice.edu wrote:
 On Wed, Nov 03, 2010 at 12:35:33PM -0400, Mladen Gogala wrote:
 Where can I find the documentation describing the buffer
 replacement policy? Are there any parameters governing the page
 replacement policy?
 
 You would need to check the mailing lists. The release notes
 describe it as a clock-sweep algorithm starting in version
 8. Later releases added changes to eliminate the cache
 blowout caused by a sequential scan and by vacuum/autovacuum.
 I do not believe there are any parameters available other
 than the total size of the pool and whether sequential scans
 are synchronized.
 
The background writer settings might be considered relevant, too.
 
Also keep in mind that PostgreSQL goes through the OS cache and
filesystems; the filesystem choice and OS settings will have an
impact on how that level of caching behaves.  Since there is often
much more cache at the OS level than in PostgreSQL shared buffers,
you don't want to overlook that aspect of things.
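
For example, the background-writer knobs in postgresql.conf look
like this (8.3+ names; the values shown are the defaults, given
only as a sketch):

    bgwriter_delay = 200ms          # sleep between background writer rounds
    bgwriter_lru_maxpages = 100     # max buffers written per round
    bgwriter_lru_multiplier = 2.0   # multiple of recent demand to clean ahead of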
 
-Kevin



Re: [PERFORM] Simple (hopefully) throughput question?

2010-11-03 Thread Heikki Linnakangas

On 03.11.2010 17:52, Nick Matheson wrote:

We have an application that needs to do bulk reads of ENTIRE
Postgres tables very quickly (i.e. select * from table). [...]


Try COPY, i.e. COPY bulk_performance.counts TO STDOUT BINARY.
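
For example, from psql (8.x syntax; 9.0 also accepts COPY ... TO
STDOUT (FORMAT binary)):

    COPY bulk_performance.counts TO STDOUT BINARY;

or, easily timed from the shell (the database name is a placeholder):

    time psql -d yourdb -c "COPY bulk_performance.counts TO STDOUT" > /dev/null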

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PERFORM] Simple (hopefully) throughput question?

2010-11-03 Thread Marti Raudsepp
Just some ideas that went through my mind when reading your post.

On Wed, Nov 3, 2010 at 17:52, Nick Matheson nick.d.mathe...@noaa.gov wrote:
 than observed raw disk reads (5 MB/s versus 100 MB/s).  Part of this is
 due to the storage overhead we have observed in Postgres.  In the
 example below, it takes 1 GB to store 350 MB of nominal data.

PostgreSQL 8.3 and later have roughly 24 bytes of header overhead
per row (plus a 4-byte line pointer per tuple), plus page-level
overhead and internal fragmentation. You can't do anything about
the row overheads, but you can recompile the server with larger
pages to reduce the per-page overhead.

 Is there any way using stored procedures (maybe C code that calls
 SPI directly) or some other approach to get close to the expected 35
 MB/s doing these bulk reads?

Perhaps a simpler alternative would be writing your own aggregate
function with four arguments.

If you write this aggregate function in C, it should have
performance similar to the sum() query.
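
The SQL-side wiring could look roughly like this (a sketch only;
sum4_step and sum4_final are hypothetical C functions you would
have to write):

    CREATE AGGREGATE sum4(integer, integer, integer, integer) (
        sfunc     = sum4_step,   -- C transition function; accumulates all four columns
        stype     = internal,    -- opaque state managed by the C code
        finalfunc = sum4_final   -- turns the accumulated state into the result
    );

One sequential scan then computes everything at once:

    select sum4(i1, i2, i3, i4) from bulk_performance.counts;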

Regards,
Marti



Re: [PERFORM] Simple (hopefully) throughput question?

2010-11-03 Thread Andy Colson

On 11/3/2010 10:52 AM, Nick Matheson wrote:

We have an application that needs to do bulk reads of ENTIRE
Postgres tables very quickly (i.e. select * from table). [...]




I have no idea if this would be helpful or not, never tried it, but when 
you fire off select * from bigtable, the client library (libpq, at 
least) will buffer the entire result set in memory (and maybe swap?) 
before your code sees the first row.  You might try declaring a cursor 
and fetching 100-1000 rows at a time from it.  No idea if it would be 
faster or slower.
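
Something along these lines, as a sketch (the fetch size is arbitrary):

    BEGIN;
    DECLARE bulk_cur CURSOR FOR SELECT * FROM bulk_performance.counts;
    FETCH 1000 FROM bulk_cur;   -- repeat until it returns no rows
    CLOSE bulk_cur;
    COMMIT;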


-Andy
