Re: [PERFORM] select on 22 GB table causes "An I/O error occured while sending to the backend." exception

2008-08-29 Thread Valentin Bogdanov
In 2003 I met a guy who was doing Computational Fluid Dynamics and had to use 
software written by physics engineers in FORTRAN. 1 GB of RAM wasn't yet 
standard for a desktop PC at the time, but the software required at least 1 GB 
just to get started. So I thought: what's the problem? After all, you are 
supposed to be able to allocate up to 2 GB on a 32-bit system even if you don't 
quite have the memory, provided you have a sufficiently big swap space. Still, 
the software didn't load on Windows. So it seems that Windows does not 
overcommit.

regards


--- On Fri, 29/8/08, Matthew Wakeling <[EMAIL PROTECTED]> wrote:

> From: Matthew Wakeling <[EMAIL PROTECTED]>
> Subject: Re: [PERFORM] select on 22 GB table causes "An I/O error occured while sending to the backend." exception
> To: pgsql-performance@postgresql.org
> Date: Friday, 29 August, 2008, 4:56 PM
> On Fri, 29 Aug 2008, Craig James wrote:
> > Disable overcommitted memory.  There is NO REASON to use it on any modern
> > server-class computer, and MANY REASONS WHY IT IS A BAD IDEA.
> 
> As far as I can see, the main reason nowadays for overcommit is when a large
> process forks and then execs. Are there any other modern programs that
> allocate lots of RAM and never use it?
> 
> Matthew
> 
> -- 
> Nog: Look! They've made me into an ensign!
> O'Brien: I didn't know things were going so badly.
> Nog: Frightening, isn't it?
> 




Re: [PERFORM] Using PK value as a String

2008-08-11 Thread Valentin Bogdanov
--- On Mon, 11/8/08, Gregory Stark <[EMAIL PROTECTED]> wrote:

> From: Gregory Stark <[EMAIL PROTECTED]>
> Subject: Re: [PERFORM] Using PK value as a String
> To: "Jay" <[EMAIL PROTECTED]>
> Cc: pgsql-performance@postgresql.org
> Date: Monday, 11 August, 2008, 10:30 AM
> "Jay" <[EMAIL PROTECTED]> writes:
> 
> > I have a table named table_Users:
> >
> > CREATE TABLE table_Users (
> >UserID   character(40)  NOT NULL default
> '',
> >Username   varchar(256)  NOT NULL default
> '',
> >Email  varchar(256) NOT NULL default
> ''
> >etc...
> > );
> >
> > The UserID is a character(40) and is generated using
> UUID function. We
> > started making making other tables and ended up not
> really using
> > UserID, but instead using Username as the unique
> identifier for the
> > other tables. Now, we pass and insert the Username to
> for discussions,
> > wikis, etc, for all the modules we have developed. I
> was wondering if
> > it would be a performance improvement to use the 40
> Character UserID
> > instead of Username when querying the other tables, or
> if we should
> > change the UserID to a serial value and use that to
> query the other
> > tables. Or just keep the way things are because it
> doesn't really make
> > much a difference.
> 
> Username would not be any slower than UserID unless you have a lot of
> usernames longer than 40 characters.
> 
> However making UserID an integer would be quite a bit more efficient. It
> would take 4 bytes instead of the length of the Username, which adds up when
> it's in all your other tables... Also internationalized text collations are
> quite a bit more expensive than a simple integer comparison.
> 
> But the real question here is what's the better design. If you use Username
> you'll be cursing if you ever want to provide a facility to allow people to
> change their usernames. You may not want such a facility now but one day...
> 

I don't understand Gregory's suggestion about the design. I thought using 
natural primary keys, as opposed to surrogate ones, was the better design 
strategy, even when it comes to performance, and even more so if there are 
complex relationships within the database.
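
For concreteness, the surrogate-key layout Gregory is advocating would look 
roughly like the sketch below. Only table_Users comes from Jay's original 
schema; the "discussions" table and its columns are made up for illustration:

    -- Surrogate integer key: each reference costs 4 bytes instead of a
    -- 40-character string, and comparisons avoid text collation rules.
    CREATE TABLE table_Users (
        UserID    serial        PRIMARY KEY,
        Username  varchar(256)  NOT NULL UNIQUE,
        Email     varchar(256)  NOT NULL DEFAULT ''
    );

    -- Other modules reference the integer key, so renaming a user only
    -- touches the one row in table_Users.
    CREATE TABLE discussions (
        DiscussionID  serial   PRIMARY KEY,
        UserID        integer  NOT NULL REFERENCES table_Users (UserID),
        Body          text
    );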

Regards,
Valentin


> -- 
>   Gregory Stark
>   EnterpriseDB  http://www.enterprisedb.com
>   Ask me about EnterpriseDB's On-Demand Production Tuning
> 





Re: [PERFORM] Database size Vs performance degradation

2008-07-30 Thread Valentin Bogdanov
I am guessing that you are using DELETE to remove the 75,000 unimportant rows. 
Change your batch job to CREATE a new table consisting only of the 5,000 
important ones. You can use the "CREATE TABLE table_name AS select_statement" 
command. Then drop the old table. After that you can use ALTER TABLE to rename 
the new table to the old table's name.
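
As a rough sketch (the table name and the "important" flag below are made up, 
and any indexes, constraints, and permissions on the old table would have to be 
recreated on the new one):

    BEGIN;
    CREATE TABLE log_keep AS
        SELECT * FROM log WHERE important;   -- keep only the ~5,000 rows you need
    DROP TABLE log;                           -- rows written after the snapshot are discarded with it
    ALTER TABLE log_keep RENAME TO log;
    COMMIT;

Because DDL is transactional in PostgreSQL, the whole swap commits atomically 
as far as other sessions are concerned.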

I am not an expert but if this is a viable solution for you then I think doing 
it this way will rid you of your bloating problem.

Regards,
Val


--- On Wed, 30/7/08, Dave North <[EMAIL PROTECTED]> wrote:

> From: Dave North <[EMAIL PROTECTED]>
> Subject: [PERFORM] Database size Vs performance degradation
> To: pgsql-performance@postgresql.org
> Date: Wednesday, 30 July, 2008, 1:09 PM
> Morning folks,
>   Long time listener, first time poster.  Having an interesting problem
> related to performance which I'll try and describe below and hopefully get
> some enlightenment.  First the environment:
> 
> Postgres 8.1.8
>   shared_buffers = 2000
>   max_fsm_pages = 40
> Redhat Enterprise 4
> Running on HP DL380 w/ 4GB RAM, dual 10K HDDs in RAID 0+1
> Also running on the server is a tomcat web server and other ancillaries
> 
> Now, the problem.  We have an application that continually writes a bunch of
> data to a few tables which is then deleted by a batch job each night.  We're
> adding around 80,000 rows to one table per day and removing around 75,000
> that are deemed to be "unimportant".  Now, the problem we see is that after
> a period of time, the database access becomes very 'slow' and the load avg
> on the machine gets up around 5.  When this happens, the application using
> the DB basically grinds to a halt.  Checking the stats, the DB size is
> around 7.5GB; no tables or indexes look to be 'bloated' (we have been using
> psql since 7.3 with the classic index bloat problem) and the auto-vac has
> been running solidly.
> 
> We had this problem around a month ago and again yesterday.  Because the
> application needs reasonably high availability, we couldn't full vacuum so
> what we did was a dump and load to another system.  What I found here was
> that after the load, the DB size was around 2.7GB - a decrease of 5GB.
> Re-loading this back onto the main system, and the world is good.
> 
> One observation I've made on the DB system is the disk I/O seems dreadfully
> slow...we're at around 75% I/O wait sometimes and the read rates seem quite
> slow (hdparm says around 2.2MB/sec - 20MB/sec for un-cached reads).  I've
> also observed that the OS cache seems to be using all of the remaining
> memory for it's cache (around 3GB) which seems probably the best it can do
> with the available memory.
> 
> Now, clearly we need to examine the need for the application to write and
> remove so much data but my main question is:
> 
> Why does the size of the database with so much "un-used" space seem to
> impact performance so much?  If (in this case) the extra 5GB of space is
> essentially "unallocated", does it factor into any of the caching or
> performance metrics that the DBMS uses?  And if so, would I be better
> having a higher shared_buffers rather than relying so much on OS cache?
> 
> Yes, I know we need to upgrade to 8.3 but that's going to take some time :)
> 
> Many thanks in advance.
> 
> Dave
> 
> ___
> Dave North
> [EMAIL PROTECTED]
> Signiant - Making Media Move
> Visit Signiant at: www.signiant.com
>   
> 
> 





Re: [PERFORM] Perl/DBI vs Native

2008-07-23 Thread Valentin Bogdanov
Thanks guys, this is really useful, especially the pg_service.conf tip. I have 
an app where the connection parameters have to be set in 3 different places. I 
was thinking of writing something myself, but now that I know of 
pg_service.conf, problem solved.
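
For anyone else following along, a pg_service.conf entry looks roughly like 
this (the service name and connection values below are made up):

    [myapp]
    host=dbhost.example.com
    port=5432
    dbname=myappdb
    user=appuser

With that in place, DBI connects with the "dbi:Pg:service=myapp" DSN and psql 
picks up the same entry via PGSERVICE=myapp, as Jeffrey describes below, so the 
connection details live in exactly one file.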

Regards,
Val


--- On Tue, 22/7/08, Jeffrey Baker <[EMAIL PROTECTED]> wrote:

> From: Jeffrey Baker <[EMAIL PROTECTED]>
> Subject: Re: [PERFORM] Perl/DBI vs Native
> To: "Greg Sabino Mullane" <[EMAIL PROTECTED]>
> Cc: pgsql-performance@postgresql.org
> Date: Tuesday, 22 July, 2008, 9:35 PM
> On Tue, Jul 22, 2008 at 9:48 AM, Greg Sabino Mullane <[EMAIL PROTECTED]> wrote:
> >> In case someone is wondering, the way to force DBI to use unix
> >> sockets is by not specifying a host and port in the connect call.
> >
> > Actually, the host defaults to the local socket. Specifying the port may
> > still be needed: if you leave it out, it simply uses the default value
> > (5432). Thus, for most purposes, just leaving the host out is enough to
> > cause a socket connection on the default port.
> 
> For the further illumination of the historical record, the best practice
> here is probably to use the pg_service.conf file, which may or may not live
> in /etc depending on your operating system.  Then you can connect in DBI
> using dbi:Pg:service=whatever, and change the definition of "whatever" in
> pg_service.conf.  This has the same semantics as PGSERVICE=whatever when
> using psql.  It's a good idea to keep these connection details out of your
> program code.
> -jwb
> 





Re: [PERFORM] Perl/DBI vs Native

2008-07-22 Thread Valentin Bogdanov
Thanks to everyone who replied. There were some really good points.

However, I found what was causing the difference. The Perl program was 
connecting to the database via a TCP socket while the C version was using a 
Unix socket. I changed the connect in my Perl script so that it now uses Unix 
sockets as well, ran the tests again, and got identical results for both 
programs.

In case someone is wondering, the way to force DBI to use unix sockets is by 
not specifying a host and port in the connect call.


Cheers,

Val





[PERFORM] Perl/DBI vs Native

2008-07-21 Thread Valentin Bogdanov
Hi,

I have run quite a few tests comparing how long a query takes to execute from 
Perl/DBI as compared to psql/libpq. No matter how many times I ran the test, 
the results were always the same.

I ran a SELECT of all rows in a fairly big table and enabled the 
log_min_duration_statement option. With psql, postgres consistently logs half a 
second, while the exact same query executed with Perl/DBI consistently takes 2 
seconds.

If I were timing the applications themselves I wouldn't have been too surprised 
by these results; obviously, processing with Perl would be slower than a native 
application. But it's the postmaster that is reporting these timings. Could it 
be because the DBI module is slower at assimilating the data?

Any light on the subject would be greatly appreciated.


Regards,

Val





[PERFORM] postgres overall performance seems to degrade when large SELECT are requested

2008-05-14 Thread Valentin Bogdanov
Hi,

I have an application that maintains 150 open connections to a Postgres DB 
server. The application works fine without a problem most of the time.

The problem seems to arise when a SELECT that returns a lot of rows is executed 
or the SELECT is run on a large object. These selects are run from time to time 
by a separate process whose purpose is to generate reports from the db data.

The problem is that when the SELECTs are run, the main application starts 
running out of available connections, which means that postgres is not 
returning the query results fast enough. What I find a little bit strange is 
that the report engine's SELECTs operate on a different set of tables than the 
ones the main application is using. Also, the db box is hardly breaking a 
sweat: CPU and memory utilization are ridiculously low and I/O waits are 
typically less than 10%.

Has anyone experienced this? Are there any settings I can change to improve 
throughput? Any help will be greatly appreciated.


Thanks,
val


