Re: [PERFORM] Final decision

2005-04-27 Thread Bruce Momjian
Joshua D. Drake wrote:
> Dave Page wrote:
> >  
> > 
> > 
> >>-Original Message-
> >>From: Joshua D. Drake [mailto:[EMAIL PROTECTED] 
> >>Sent: 27 April 2005 17:46
> >>To: Dave Page
> >>Cc: Josh Berkus; Joel Fradkin; PostgreSQL Perform
> >>Subject: Re: [PERFORM] Final decision
> >>
> >>
> >>>It is? No-one told the developers...
> >>
> >>We have mentioned it on the list.
> > 
> > 
> > Err, yes. But that's not quite the same as core telling us the current
> > driver is being replaced.
> 
> Well, I don't think anyone knew that the current driver was still being
> maintained.

We have been looking for someone to take over ODBC and Pervasive agreed
to do it, but there wasn't a big announcement about it.  I have
discussed this with them.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: ODBC driver overpopulation (was Re: [PERFORM] Final decision)

2005-04-27 Thread Bruce Momjian
Joshua D. Drake wrote:
> >>Mind you, having 2 different teams working on two different ODBC drivers is 
> >>a 
> >>problem for another list ...
> > 
> > 
> > Only two?  I thought another commercial entity was also working on their
> > own ODBC driver, so there may be three of them.
> 
> Well, I only know of one company actively working on ODBC, and that is
> Command Prompt. If there are others I would like to hear about it, because
> I would rather work with someone than against them.

Well, you should talk to Pervasive because they have a team working on
improving the existing driver.  I am sure they would want to work
together too.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Final decision

2005-04-27 Thread Joshua D. Drake
Dave Page wrote:
 


-Original Message-
From: Joshua D. Drake [mailto:[EMAIL PROTECTED] 
Sent: 27 April 2005 17:46
To: Dave Page
Cc: Josh Berkus; Joel Fradkin; PostgreSQL Perform
Subject: Re: [PERFORM] Final decision


It is? No-one told the developers...
We have mentioned it on the list.

Err, yes. But that's not quite the same as core telling us the current
driver is being replaced.
Well, I don't think anyone knew that the current driver was still being 
maintained.

Sincerely,
Joshua D. Drake

Regards, Dave.

--
Your PostgreSQL solutions provider, Command Prompt, Inc.
24x7 support - 1.800.492.2240, programming, and consulting
Home of PostgreSQL Replicator, plPHP, plPerlNG and pgPHPToolkit
http://www.commandprompt.com / http://www.postgresql.org
---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: ODBC driver overpopulation (was Re: [PERFORM] Final decision)

2005-04-27 Thread Joshua D. Drake
Mind you, having 2 different teams working on two different ODBC drivers is a 
problem for another list ...

Only two?  I thought another commercial entity was also working on their
own ODBC driver, so there may be three of them.
Well, I only know of one company actively working on ODBC, and that is 
Command Prompt. If there are others I would like to hear about it, because 
I would rather work with someone than against them.

Sincerely,
Joshua D. Drake



--
Your PostgreSQL solutions provider, Command Prompt, Inc.
24x7 support - 1.800.492.2240, programming, and consulting
Home of PostgreSQL Replicator, plPHP, plPerlNG and pgPHPToolkit
http://www.commandprompt.com / http://www.postgresql.org
---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


ODBC driver overpopulation (was Re: [PERFORM] Final decision)

2005-04-27 Thread Alvaro Herrera
On Wed, Apr 27, 2005 at 08:09:27PM -0700, Josh Berkus wrote:

> Mind you, having 2 different teams working on two different ODBC drivers is a 
> problem for another list ...

Only two?  I thought another commercial entity was also working on their
own ODBC driver, so there may be three of them.

-- 
Alvaro Herrera (<[EMAIL PROTECTED]>)
"Always assume the user will do much worse than the stupidest thing
you can imagine."(Julien PUYDT)

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Final decision

2005-04-27 Thread Josh Berkus
Dave, folks,

> Err, yes. But that's not quite the same as core telling us the current
> driver is being replaced.

Sorry, I spoke off the cuff. I also was unaware that work on the current 
driver had resumed. Us Core people are not omniscient, believe it or don't.

Mind you, having 2 different teams working on two different ODBC drivers is a 
problem for another list ...

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Why is this system swapping?

2005-04-27 Thread Josh Berkus
Greg,

> In fact I think it's generally superior to having a layer like pgpool
> having to hand off all your database communication. Having to do an extra
> context switch to handle every database communication is crazy.

Although, one of their issues is that their database connection pooling is 
per-server, which means that a safety margin of pre-allocated connections 
(something they need since they get bursts of 1000 new users in a few 
seconds) has to be maintained per server, increasing the total number of 
connections.

So a pooling system that allowed them to hold 100 free connections centrally 
rather than 10 per server might be a win.

Better would be getting some of this stuff offloaded onto database replication 
slaves.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Why is this system swapping?

2005-04-27 Thread Greg Stark
Jeff <[EMAIL PROTECTED]> writes:

> Are you (Anjan) using real or fake connection pooling - ie pgpool versus php's
> persistent connections ?  I'd strongly recommend looking at pgpool. it does
> connection pooling correctly (A set of X connections shared among the entire
> box rather than 1 per web server)

Having one connection per web process isn't "fake connection pooling", it's a
completely different arrangement. And there's nothing "incorrect" about it. 

In fact I think it's generally superior to having a layer like pgpool having
to hand off all your database communication. Having to do an extra context
switch to handle every database communication is crazy. 

For typical web sites where the database is the only slow component there's
not much point in having more web server processes than connections anyways.
All you're doing is transferring the wait time from waiting for a web server
process to waiting for a database process.

Most applications that find they need connection pooling are using it to work
around a poorly architected system that is mixing static requests (like
images) and database driven requests in the same web server.

However, your application sounds like it's more involved than a typical web
server. If it's handling many slow resources, such as connections to multiple
databases, SOAP services, mail, or other network services then you may well
need that many processes. In which case you'll need something like pgpool.

-- 
greg


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon

2005-04-27 Thread Jim C. Nasby
BTW, http://stats.distributed.net/~decibel/base.log is a test I ran;
select count(*) was ~6x faster than explain analyze select *.
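
For anyone who wants to repeat that kind of comparison, a minimal psql sketch
(hypothetical table and predicate) looks like this:

    \timing
    -- time the query itself, with only a single count row coming back over the wire:
    SELECT count(*) FROM (SELECT * FROM orders WHERE order_date > '2005-01-01') AS sub;
    -- the same query through EXPLAIN ANALYZE, which adds its own per-node timing overhead:
    EXPLAIN ANALYZE SELECT * FROM orders WHERE order_date > '2005-01-01';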

On Tue, Apr 26, 2005 at 07:46:52PM -0700, Kevin Brown wrote:
> Josh Berkus wrote:
> > Jim, Kevin,
> > 
> > > > Hrm... I was about to suggest that for timing just the query (and not
> > > > output/data transfer time) using explain analyze, but then I remembered
> > > > that explain analyze can incur some non-trivial overhead with the timing
> > > > calls. Is there a way to run the query but have psql ignore the output?
> > > > If so, you could use \timing.
> > >
> > > Would timing "SELECT COUNT(*) FROM (query)" work?
> > 
> > Just \timing would work fine; PostgreSQL doesn't return anything until it 
> > has 
> > the whole result set.  
> 
> Hmm...does \timing show the amount of elapsed time between query start
> and the first results handed to it by the database (even if the
> database itself has prepared the entire result set for transmission by
> that time), or between query start and the last result handed to it by
> the database?
> 
> Because if it's the latter, then things like server<->client network
> bandwidth are going to affect the results that \timing shows, and it
> won't necessarily give you a good indicator of how well the database
> backend is performing.  I would expect that timing SELECT COUNT(*)
> FROM (query) would give you an idea of how the backend is performing,
> because the amount of result set data that has to go over the wire is
> trivial.
> 
> Each is, of course, useful in its own right, and you want to be able
> to measure both (so, for instance, you can get an idea of just how
> much your network affects the overall performance of your queries).
> 
> 
> > That's why MSSQL vs. PostgreSQL timing comparisons are 
> > deceptive unless you're careful:  MSSQL returns the results on block at a 
> > time, and reports execution time as the time required to return the *first* 
> > block, as opposed to Postgres which reports the time required to return the 
> > whole dataset.
> 
> Interesting.  I had no idea MSSQL did that, but I can't exactly say
> I'm surprised.  :-)
> 
> 
> -- 
> Kevin Brown [EMAIL PROTECTED]
> 
> ---(end of broadcast)---
> TIP 3: if posting/reading through Usenet, please send an appropriate
>   subscribe-nomail command to [EMAIL PROTECTED] so that your
>   message can get through to the mailing list cleanly
> 

-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] Why is this system swapping?

2005-04-27 Thread Anjan Dave
Using Resin's connection pooling. We are looking into pgpool alongside
slony to separate some reporting functionality.

-anjan

-Original Message-
From: Jeff [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, April 27, 2005 3:29 PM
To: Greg Stark
Cc: Anjan Dave; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Why is this system swapping?


On Apr 27, 2005, at 2:29 PM, Greg Stark wrote:

> "AI would seriously look at tuning those connection pools down. A lot.

> If your
> server processes are sitting idle over half the time I would at least 
> cut it
> by a factor of 2.
>

Are you (Anjan) using real or fake connection pooling - ie pgpool 
versus php's persistent connections ?  I'd strongly recommend looking 
at pgpool. it does connection pooling correctly (A set of X connections 
shared among the entire box rather than 1 per web server)

--

Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] Why is this system swapping?

2005-04-27 Thread Anjan Dave
Yes, HT is turned off (I haven't seen any recommendations to keep it
on).

This is when we were seeing 30 to 50% less traffic (users) than today -
we didn't want the idle connections in the pool to expire too soon
(default 30 secs, after which it goes back to pool) and reopen it
quickly, or not have sufficient available (default 20 conns, we raised
it to 50), so we figured a number per app server (50) and set that to
expire after a very long time, so as to avoid any overhead, and always
have the connection available whenever needed, without opening a new
one. 

But now, for *some* reason, in some part of the day, we use up almost
all connections in each app's pool. After that since they are set to
expire after a long time, they remain there, taking up DB resources.

I will be trimming down the idle-timeout to a few minutes first, see if
that helps.

Thanks,
Anjan

-Original Message-
From: Greg Stark [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, April 27, 2005 2:29 PM
To: Anjan Dave
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Why is this system swapping?

"Anjan Dave" <[EMAIL PROTECTED]> writes:

> Some background: 
> 
> This is a quad XEON (yes, Dell) with 12GB of RAM, pg 7.4...pretty heavy
> on concurrent usage. With peak traffic (db allows 1000 connections, in
> line with the number of app servers and connection pools for each)
> following is from 'top' (sorted by mem) Shared_buffers is 170MB,
> sort_mem 2MB. Both WAL and pgdata are on separate LUNs on fibre channel
> storage, RAID10.
> 
> 972 processes: 971 sleeping, 1 running, 0 zombie, 0 stopped
> 
> CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
>     total   57.2%    0.0%   23.2%   0.0%     3.6%   82.8%  232.4%

This looks to me like most of your server processes are sitting around idle
most of the time.

> 21397 postgres  22   0  181M 180M  175M D    25.9  1.5  85:17   0
> postmaster
> 
> 23820 postgres  15   0  178M 177M  175M S     0.0  1.5   1:53   3
> postmaster

So each process is taking up 8-11M of ram beyond the shared memory. 1,000 x
10M is 10G. Add in some memory for page tables and kernel data structures, as
well as the kernel's need to keep some memory set aside for filesystem buffers
(what you really want all that memory being used for anyways) and you've used
up all your 12G.

I would seriously look at tuning those connection pools down. A lot. If your
server processes are sitting idle over half the time I would at least cut it
by a factor of 2.

Working the other direction: you have four processors (I guess you have
hyperthreading turned off?) so ideally what you want is four runnable
processes at all times and as few others as possible. If your load typically
spends about half the time waiting on i/o (which is what that top output says)
then you want a total of 8 connections.

Realistically you might not be able to predict which app server will be
providing the load at any given time, so you might want 8 connections per app
server.

And you might have some load that's more i/o intensive than the 50% i/o load
shown here. Say you think some loads will be 80% i/o, you might want 20
connections for those loads. If you had 10 app servers with 20 connections
each for a total of 200 connections I suspect that would be closer to right
than having 1,000 connections.

200 connections would consume 2G of ram leaving you with 10G of filesystem
cache. Which might in turn decrease the percentage of time waiting on i/o,
which would decrease the number of processes you need even further...

-- 

greg



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Final decision

2005-04-27 Thread Dave Page
 

> -Original Message-
> From: Joshua D. Drake [mailto:[EMAIL PROTECTED] 
> Sent: 27 April 2005 17:46
> To: Dave Page
> Cc: Josh Berkus; Joel Fradkin; PostgreSQL Perform
> Subject: Re: [PERFORM] Final decision
> 
> > It is? No-one told the developers...
> 
> We have mentioned it on the list.

Err, yes. But that's not quite the same as core telling us the current
driver is being replaced.

Regards, Dave.

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Final decision

2005-04-27 Thread John A Meinel
Joel Fradkin wrote:
...
I am guessing our app is like 75% data entry and 25% reporting, but the
reporting is taking the toll SQL wise.
This was from my insert test with 15 users.
Test type: Dynamic
 Simultaneous browser connections: 15
 Warm up time (secs): 0
 Test duration: 00:00:03:13
 Test iterations: 200
 Detailed test results generated: Yes
Response Codes
 Response Code: 403 - The server understood the request, but is refusing to
fulfill it.
  Count: 15
  Percent (%): 0.29
 Response Code: 302 - The requested resource resides temporarily under a
different URI (Uniform Resource Identifier).
  Count: 200
  Percent (%): 3.85
 Response Code: 200 - The request completed successfully.
  Count: 4,980
  Percent (%): 95.86
My select test with 25 users had this
Properties
 Test type: Dynamic
 Simultaneous browser connections: 25
 Warm up time (secs): 0
 Test duration: 00:00:06:05
 Test iterations: 200
 Detailed test results generated: Yes
Summary
 Total number of requests: 187
 Total number of connections: 200
 Average requests per second: 0.51
 Average time to first byte (msecs): 30,707.42
 Average time to last byte (msecs): 30,707.42
 Average time to last byte per iteration (msecs): 28,711.44
 Number of unique requests made in test: 1
 Number of unique response codes: 1
Well, having a bandwidth of 392Bps seems *really* low. I mean that is a
very old modem speed (3200 baud).
I'm wondering if you are doing a lot of aggregating in the web server,
and if you couldn't move some of that into the database by using plpgsql
functions.
That would take some of the load off of your IIS servers, and possibly
improve your overall bandwidth.
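
As a rough illustration (hypothetical table and column names), a summary that
is currently rolled up in the web tier could instead come back pre-aggregated,
so only the summary rows cross the wire:

    SELECT audit_date, count(*) AS events
    FROM audit_history
    WHERE audit_date BETWEEN '2005-04-01' AND '2005-04-27'
    GROUP BY audit_date
    ORDER BY audit_date;

The same idea can be wrapped in a plpgsql or SQL function if the logic is more
involved.
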
But I do agree, it looks like the select side is where you are hurting.
If I understand the numbers correctly, you can do 5k inserts in 3min,
but are struggling to do 200 selects in 6min.
John
=:->
Errors Counts
 HTTP: 0
 DNS: 0
 Socket: 26
Additional Network Statistics
 Average bandwidth (bytes/sec): 392.08
 Number of bytes sent (bytes): 64,328
 Number of bytes received (bytes): 78,780
 Average rate of sent bytes (bytes/sec): 176.24
 Average rate of received bytes (bytes/sec): 215.84
 Number of connection errors: 0
 Number of send errors: 13
 Number of receive errors: 13
 Number of timeout errors: 0
Response Codes
 Response Code: 200 - The request completed successfully.
  Count: 187
  Percent (%): 100.00

Joel





Re: [PERFORM] Why is this system swapping?

2005-04-27 Thread Jeff
On Apr 27, 2005, at 2:29 PM, Greg Stark wrote:
"AI would seriously look at tuning those connection pools down. A lot. 
If your
server processes are sitting idle over half the time I would at least 
cut it
by a factor of 2.

Are you (Anjan) using real or fake connection pooling - ie pgpool 
versus php's persistent connections ?  I'd strongly recommend looking 
at pgpool. it does connection pooling correctly (A set of X connections 
shared among the entire box rather than 1 per web server)

--
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/
---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM] Final decision

2005-04-27 Thread Steve Poe
Joshua,
This article was in July 2002, so is there an update to this information? 
When will a new ODBC driver be available for testing? Is there a release 
of the ODBC driver with better performance than 7.0.3.0200 for a 7.4.x 
database?

Steve Poe
We have mentioned it on the list.
http://www.linuxdevcenter.com/pub/a/linux/2002/07/16/drake.html
Regards, Dave
[and yes, I know Joshua said Command Prompt are rewriting /their/
driver]

:) No we are rewriting a complete OSS driver.
Sincerely,
Joshua D. Drake
Command Prompt, Inc.

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Why is this system swapping?

2005-04-27 Thread Anjan Dave
Sorry, I didn't attach vmstat, the system does actively swap pages. Not
to the point where it crawls, but for some brief periods the console
becomes a bit unresponsive. I am taking this as a sign to prevent future
problems.

anjan

-Original Message-
From: Jeff [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, April 27, 2005 2:30 PM
To: Anjan Dave
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Why is this system swapping?


On Apr 27, 2005, at 1:48 PM, Anjan Dave wrote:

> As you can see the system starts utilizing swap at some point, with so

> many processes. Some time ago we had decided to keep the connections 
> from the pool open for longer

You've shown the system has used swap but not that it is swapping.  
Having swap in use is fine - there is likely plenty of code and whatnot 
that is not being used so it dumped it out to swap. However if you are 
actively moving data to/from swap that is bad. Very bad. Especially on 
linux.

To tell if you are swapping you need to watch the output of say, vmstat 
1 and look at the si and so columns.

Linux is very swap happy and likes to swap things for fun and profit.

--

Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/



---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Final decision

2005-04-27 Thread Joel Fradkin
Just realize, you probably *don't* want to set that in postgresql.conf.
You just want to issue an "SET enable_seqscan TO off" before issuing one
of the queries that are mis-planned.

I believe all the tested queries (90-some-odd views) saw an improvement.
I will however take the time to verify this and take your suggestion, as I
can certainly put the appropriate settings in each as opposed to using the
config option. Thanks for the good advice (I believe Josh from
Commandprompt.com also suggested this approach and I, in my lazy self,
somehow blurred the concept).


Also, I second the notion of getting a confidentiality contract. There
have been several times where someone had a pathological case, and by
sending the data to someone (Tom Lane), they were able to track down and
fix the problem.

Excellent point, Our data is confidential, but I should write something to
allow me to ship concept without confidential, so in the future I can just
send a backup and not have it break our agreements, but allow minds greater
then my own to see, and feel my issues.


What do you mean by "blew up"? 
IIS testing was being done with an old 2300 and an Optiplex; both machines
reached 100% CPU utilization and the test suite (ASP code written in house by
one of our programmers) was not returning memory correctly, so it ran out of
memory and died. Prior to death I did see CPU utilization on the 4proc Linux
box running postgres fluctuate and at times hit the 100% level, but the
server seemed very stable. I did fix the memory usage of the suite and was
able to see 50 concurrent users with fairly high RPS, especially on select
testing; the insert and update testing seemed to fall apart (many 404 errors,
etc.).


I assume you have IIS on a different
machine than the database. Are you saying that the database slowed down
dramatically, or that the machine crashed, or just that the web
interface became unresponsive? Just the web interface.

It probably depends on what queries are being done, and what kind of
times you need. Usually the update machine needs the stronger hardware,
so that it can do the writing.

But it depends if you can wait longer to update data than to query data,
obviously the opposite is true. It all depends on load, and that is
pretty much application defined.

I am guessing our app is like 75% data entry and 25% reporting, but the
reporting is taking the toll SQL wise.

This was from my insert test with 15 users.
Test type: Dynamic 
 Simultaneous browser connections: 15 
 Warm up time (secs): 0 
 Test duration: 00:00:03:13 
 Test iterations: 200 
 Detailed test results generated: Yes
Response Codes 

 Response Code: 403 - The server understood the request, but is refusing to
fulfill it. 
  Count: 15 
  Percent (%): 0.29 
 
 
 Response Code: 302 - The requested resource resides temporarily under a
different URI (Uniform Resource Identifier). 
  Count: 200 
  Percent (%): 3.85 
 
 
 Response Code: 200 - The request completed successfully. 
  Count: 4,980 
  Percent (%): 95.86 
 
My select test with 25 users had this
Properties 

 Test type: Dynamic 
 Simultaneous browser connections: 25 
 Warm up time (secs): 0 
 Test duration: 00:00:06:05 
 Test iterations: 200 
 Detailed test results generated: Yes 
  
Summary 

 Total number of requests: 187 
 Total number of connections: 200 
  
 Average requests per second: 0.51 
 Average time to first byte (msecs): 30,707.42 
 Average time to last byte (msecs): 30,707.42 
 Average time to last byte per iteration (msecs): 28,711.44 
  
 Number of unique requests made in test: 1 
 Number of unique response codes: 1 
  
Errors Counts 

 HTTP: 0 
 DNS: 0 
 Socket: 26 
  
Additional Network Statistics 

 Average bandwidth (bytes/sec): 392.08 
  
 Number of bytes sent (bytes): 64,328 
 Number of bytes received (bytes): 78,780 
  
 Average rate of sent bytes (bytes/sec): 176.24 
 Average rate of received bytes (bytes/sec): 215.84 
  
 Number of connection errors: 0 
 Number of send errors: 13 
 Number of receive errors: 13 
 Number of timeout errors: 0 
  
Response Codes 

 Response Code: 200 - The request completed successfully. 
  Count: 187 
  Percent (%): 100.00 
 



Joel


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Why is this system swapping?

2005-04-27 Thread Greg Stark
"Anjan Dave" <[EMAIL PROTECTED]> writes:

> Some background: 
> 
> This is a quad XEON (yes, Dell) with 12GB of RAM, pg 7.4...pretty heavy
> on concurrent usage. With peak traffic (db allows 1000 connections, in
> line with the number of app servers and connection pools for each)
> following is from 'top' (sorted by mem) Shared_buffers is 170MB,
> sort_mem 2MB. Both WAL and pgdata are on separate LUNs on fibre channel
> storage, RAID10.
> 
> 972 processes: 971 sleeping, 1 running, 0 zombie, 0 stopped
> 
> CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
>     total   57.2%    0.0%   23.2%   0.0%     3.6%   82.8%  232.4%

This looks to me like most of your server processes are sitting around idle
most of the time.

> 21397 postgres  22   0  181M 180M  175M D    25.9  1.5  85:17   0
> postmaster
> 
> 23820 postgres  15   0  178M 177M  175M S     0.0  1.5   1:53   3
> postmaster

So each process is taking up 8-11M of ram beyond the shared memory. 1,000 x
10M is 10G. Add in some memory for page tables and kernel data structures, as
well as the kernel's need to keep some memory set aside for filesystem buffers
(what you really want all that memory being used for anyways) and you've used
up all your 12G.

I would seriously look at tuning those connection pools down. A lot. If your
server processes are sitting idle over half the time I would at least cut it
by a factor of 2.

Working the other direction: you have four processors (I guess you have
hyperthreading turned off?) so ideally what you want is four runnable
processes at all times and as few others as possible. If your load typically
spends about half the time waiting on i/o (which is what that top output says)
then you want a total of 8 connections.

Realistically you might not be able to predict which app server will be
providing the load at any given time, so you might want 8 connections per app
server. 

And you might have some load that's more i/o intensive than the 50% i/o load
shown here. Say you think some loads will be 80% i/o, you might want 20
connections for those loads. If you had 10 app servers with 20 connections
each for a total of 200 connections I suspect that would be closer to right
than having 1,000 connections.

200 connections would consume 2G of ram leaving you with 10G of filesystem
cache. Which might in turn decrease the percentage of time waiting on i/o,
which would decrease the number of processes you need even further...
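
A rough way to confirm how many of those backends really are idle at a given
moment (assuming stats_command_string is enabled, so pg_stat_activity reports
query state) is:

    SELECT count(*) AS idle_backends
    FROM pg_stat_activity
    WHERE current_query = '<IDLE>';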

-- 
greg


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Why is this system swapping?

2005-04-27 Thread Jeff
On Apr 27, 2005, at 1:48 PM, Anjan Dave wrote:
As you can see the system starts utilizing swap at some point, with so 
many processes. Some time ago we had decided to keep the connections 
from the pool open for longer
You've shown the system has used swap but not that it is swapping.  
Having swap in use is fine - there is likely plenty of code and whatnot 
that is not being used so it dumped it out to swap. However if you are 
actively moving data to/from swap that is bad. Very bad. Especially on 
linux.

To tell if you are swapping you need to watch the output of say, vmstat 
1 and look at the si and so columns.

Linux is very swap happy and likes to swap things for fun and profit.
--
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/
---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] Final decision

2005-04-27 Thread Joel Fradkin
BTW, your performance troubleshooting will continue to be hampered if you 
can't share actual queries and data structure.   I strongly suggest that you

make a confidentiality contract with  a support provider so that you can
give them detailed (rather than general) problem reports.

I am glad to hear your perspective, maybe my rollout is not as off base as I
thought.

FYI, it is not that I cannot share specifics (I have posted a few table
structures and views here and on pgsql); I just cannot back up the entire
database and ship it off to a consultant.

What I had suggested with Commandprompt was to use remote connectivity for
him to have access to our server directly. In this way I can learn by
watching what types of test he does and it allows him to do tests with our
data set.

Once I am in production that will not be something I want tests done on, so
it may have to wait until we get a development box with a similar deployment
(at the moment development is on an XP machine and production will be on
Linux; the 4 proc is Linux and will be our production box).

Thank you for letting me know what I can hope to see in the way of disk
access on the next hardware procurement. I may email you off list to get the
specific brands etc. that you found that kind of throughput with.




---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


[PERFORM] Why is this system swapping?

2005-04-27 Thread Anjan Dave
Hello,

I am trying to understand what I need to do for this system to stop using
swap. Maybe it's something simple, or obvious for the situation. I'd
appreciate some thoughts/suggestions.

Some background:

This is a quad XEON (yes, Dell) with 12GB of RAM, pg 7.4...pretty heavy on
concurrent usage. With peak traffic (db allows 1000 connections, in line with
the number of app servers and connection pools for each) following is from
'top' (sorted by mem). Shared_buffers is 170MB, sort_mem 2MB. Both WAL and
pgdata are on separate LUNs on fibre channel storage, RAID10.

972 processes: 971 sleeping, 1 running, 0 zombie, 0 stopped

CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total   57.2%    0.0%   23.2%   0.0%     3.6%   82.8%  232.4%
           cpu00   22.0%    0.0%    9.1%   0.1%     0.9%   18.7%   48.8%
           cpu01   17.5%    0.0%    5.8%   0.0%     2.3%   19.7%   54.4%
           cpu02    7.8%    0.0%    3.7%   0.0%     0.0%   20.8%   67.5%
           cpu03    9.7%    0.0%    4.4%   0.0%     0.5%   23.6%   61.5%

Mem:  12081744k av, 12055220k used,   26524k free,       0k shrd,   71828k buff
                     9020480k actv, 1741348k in_d,  237396k in_c
Swap:  4096532k av,   472872k used, 3623660k free              9911176k cached

  PID USER      PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
21397 postgres   22   0  181M 180M  175M D    25.9  1.5  85:17   0 postmaster
23820 postgres   15   0  178M 177M  175M S     0.0  1.5   1:53   3 postmaster
24428 postgres   15   0  178M 177M  175M S     0.0  1.5   1:35   3 postmaster
24392 postgres   15   0  178M 177M  175M S     2.7  1.5   2:07   2 postmaster
23610 postgres   15   0  178M 177M  175M S     0.0  1.5   0:29   2 postmaster
24395 postgres   15   0  178M 177M  175M S     0.0  1.5   1:12   1 postmaster
...

-bash-2.05b$ free
             total       used       free     shared    buffers     cached
Mem:      12081744   12055536      26208          0      66704    9943988
-/+ buffers/cache:    2044844   10036900
Swap:      4096532     512744    3583788

As you can see the system starts utilizing swap at some point, with so many
processes. Some time ago we had decided to keep the connections from the pool
open for longer periods of time, possibly to avoid connection maintenance
overhead on the db. At that time the traffic was not as high as it is today,
which might be causing this, because for the most part, non-idle postmaster
processes are only a few, except when the system becomes busy and suddenly
you see a lot of selects piling up, and load averages shooting upwards. I am
thinking closing out connections sooner might help the system release some
memory to the kernel. Swapping adds up to the IO, although OS is on separate
channel than postgres.

I can add more memory, but I want to make sure I haven't missed out something
obvious.

Thanks!
Anjan

Re: [PERFORM] Final decision

2005-04-27 Thread John A Meinel
Joel Fradkin wrote:
I spent a great deal of time over the past week looking seriously at
Postgres and MYSQL.
Objectively I am not seeing that much of an improvement in speed with
MYSQL, and we have a huge investment in postgrs.
So I am planning on sticking with postgres fro our production database
(going live this weekend).
Glad to hear it. Good luck.

...
Things I still have to make better are my settings in config, I have it
set to no merge joins and no seq scans.
Just realize, you probably *don't* want to set that in postgresql.conf.
You just want to issue an "SET enable_seqscan TO off" before issuing one
of the queries that are mis-planned.
Because there are lots of times when merge join and seq scan is actually
faster than the alternatives. And since I don't think you tested every
query you are going to run, you probably want to let the planner handle
the ones it gets right. (Usually it does quite a good job.)
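
For example, a per-query override (hypothetical view and column names) would
look something like:

    SET enable_seqscan TO off;
    SELECT * FROM v_audit_cube WHERE clientnum = 42;  -- the one query known to be mis-planned
    RESET enable_seqscan;                             -- back to the default for everything else
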
Also, I second the notion of getting a confidentiality contract. There
have been several times where someone had a pathological case, and by
sending the data to someone (Tom Lane), they were able to track down and
fix the problem.
I am going to have to use flattened history files for reporting (I saw
huge difference here the view for audit cube took 10 minutes in explain
analyze and the flattened file took under one second).

I understand both of these practices are not desirable, but I am at a
place where I have to get it live and these are items I could not resolve.
Nothing wrong with a properly updated flattened table. You just need to
be careful to keep it consistent with the rest of the data. (Update
triggers/lazy materialization, etc)
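
A minimal sketch of the trigger approach (hypothetical table and column names,
and assuming plpgsql is installed in the database):

    CREATE OR REPLACE FUNCTION audit_flat_sync() RETURNS trigger AS '
    BEGIN
        -- mirror each new detail row into the flattened reporting table
        INSERT INTO audit_flat (audit_id, clientnum, audit_date, action)
        VALUES (NEW.audit_id, NEW.clientnum, NEW.audit_date, NEW.action);
        RETURN NEW;
    END;
    ' LANGUAGE plpgsql;

    CREATE TRIGGER audit_flat_sync_trg
        AFTER INSERT ON audit
        FOR EACH ROW EXECUTE PROCEDURE audit_flat_sync();
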
I may try some more time with Commanpromt.com, or seek other
professional help.

In stress testing I found Postgres was holding up very well (but my IIS
servers could not handle much of a load to really push the server).
I have a few desktops acting as IIS servers at the moment and if I
pushed past 50 consecutive users it pretty much blew the server up.
On inserts that number was like 7 consecutive users and updates was also
like 7 users.

I believe that was totally IIS not postgres, but I am curious as to if
using postgres odbc will put more stress on the IIS side then MSSQL did.
What do you mean by "blew up"? I assume you have IIS on a different
machine than the database. Are you saying that the database slowed down
dramatically, or that the machine crashed, or just that the web
interface became unresponsive?
I did have a question if any folks are using two servers one for
reporting and one for data entry what system should be the beefier?
I have a 2proc machine I will be using and I can either put Sears off by
themselves on this machine or split up functionality and have one for
reporting and one for inserts and updates; so not sure which machine
would be best for which spot (reminder the more robust is a 4proc with 8
gigs and 2 proc is 4 gigs, both dells).
It probably depends on what queries are being done, and what kind of
times you need. Usually the update machine needs the stronger hardware,
so that it can do the writing.
But it depends if you can wait longer to update data than to query data,
obviously the opposite is true. It all depends on load, and that is
pretty much application defined.

Thank you for any ideas in this arena.

Joel Fradkin
John
=:->




Re: [PERFORM] Suggestions for a data-warehouse migration routine

2005-04-27 Thread John A Meinel
Richard Rowell wrote:
I've ported enough of my companies database to Postgres to make
warehousing on PG a real possibility.  I thought I would toss my data
migration architecture ideas out for the list to shoot apart..
1.  Script on production server dumps the production database (MSSQL) to
a set of delimited text files.
2.  Script on production server moves files via FTP to a Postgres
database server.
3.  File Alteration Monitor trigger on PG server executes script when
last file is transferred.
4.  Script on PG server drops the target database (by issuing a "dropdb"
command).
5.  Script on PG server re-creates target database. (createdb command)
6.  Script on PG server re-creates the tables.
7.  Script on PG server issues COPY commands to import data.
8.  Script on PG server indexes tables.
9.  Script on PG server builds de-normalized reporting tables.
10. Script on PG server indexes the reporting tables.
11. Script on PG server creates needed reporting functions.
12. Vacuum analyze?
My question revolves around the drop/create for the database.  Is their
significant downside to this approach?  I'm taking this approach because
it is simpler from a scripting point of view to simply start from
scratch on each warehouse update.  If I do not drop the database I would
need to delete the contents of each table and drop all indexes prior to
the COPY/data import.  My assumption is all the table deletes and index
drops would be more expensive then just droping/re-creating the entire
database.
I believe you are correct. If you are going to completely wipe the
database, just drop it and re-create. Deleting is much slower than
dropping. (One of the uses of partitioning is so that you can just drop
one of the tables, rather than deleting the entries). Dropping the whole
db skips any Foreign Key checks, etc.
Also, is the Vacuum analyze step needed on a freshly minted database
where the indexes have all been newly created?
Thanks in advance for all feedback.
ANALYZE is needed, since you haven't updated any of your statistics yet.
So the planner doesn't really know how many rows there are.
VACUUM probably isn't since everything should be pretty well aligned.
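
In other words, the tail end of the load script (hypothetical table, column,
and file names) only needs something like:

    COPY report_fact FROM '/tmp/report_fact.txt' WITH DELIMITER '|';
    CREATE INDEX report_fact_date_idx ON report_fact (report_date);
    ANALYZE report_fact;  -- refresh planner statistics; a full VACUUM isn't needed on a fresh load
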
John
=:->




Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?

2005-04-27 Thread Greg Stark

"Dave Held" <[EMAIL PROTECTED]> writes:

> > Actually, it's more to characterize how large of a sample
> > we need.  For example, if we sample 0.005 of disk pages, and
> > get an estimate, and then sample another 0.005 of disk pages
> > and get an estimate which is not even close to the first
> > estimate, then we have an idea that this is a table which 
> > defies analysis based on small samples.  
> 
> I buy that.

Better yet is to use the entire sample you've gathered of .01 and then perform
analysis on that sample to see what the confidence interval is. Which is
effectively the same as what you're proposing except looking at every possible
partition.

Unfortunately the reality according to the papers that were sent earlier is
that you will always find the results disappointing. Until your sample is
nearly the entire table your estimates for n_distinct will be extremely
unreliable.

-- 
greg


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Final decision

2005-04-27 Thread Joshua D. Drake
It is? No-one told the developers...

We have mentioned it on the list.
http://www.linuxdevcenter.com/pub/a/linux/2002/07/16/drake.html
Ooops ;)
http://archives.postgresql.org/pgsql-odbc/2005-03/msg00109.php
Sincerely,
Joshua D. Drake
Command Prompt, Inc.
--
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedication Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/
---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Final decision

2005-04-27 Thread Joshua D. Drake
Dave Page wrote:
 


-Original Message-
From: [EMAIL PROTECTED] 
[mailto:[EMAIL PROTECTED] On Behalf Of 
Josh Berkus
Sent: 27 April 2005 17:14
To: Joel Fradkin
Cc: PostgreSQL Perform
Subject: Re: [PERFORM] Final decision

Actually, I think the problem may be ODBC.   Our ODBC driver 
is not the best 
and is currently being re-built from scratch.   

It is? No-one told the developers...
We have mentioned it on the list.
http://www.linuxdevcenter.com/pub/a/linux/2002/07/16/drake.html
Regards, Dave
[and yes, I know Joshua said Command Prompt are rewriting /their/
driver]
:) No we are rewriting a complete OSS driver.
Sincerely,
Joshua D. Drake
Command Prompt, Inc.

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

--
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedication Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/
---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] Final decision

2005-04-27 Thread Josh Berkus
Dave,

> > Actually, I think the problem may be ODBC.   Our ODBC driver
> > is not the best
> > and is currently being re-built from scratch.
>
> It is? No-one told the developers...
>
> Regards, Dave
>
> [and yes, I know Joshua said Command Prompt are rewriting /their/
> driver]

OK.   Well, let's put it this way:  the v3 and v3.5 drivers will not be based 
on the current driver, unless you suddenly have a bunch of free time.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Final decision

2005-04-27 Thread Dave Page
 

> -Original Message-
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of 
> Josh Berkus
> Sent: 27 April 2005 17:14
> To: Joel Fradkin
> Cc: PostgreSQL Perform
> Subject: Re: [PERFORM] Final decision
> 
> Actually, I think the problem may be ODBC.   Our ODBC driver 
> is not the best 
> and is currently being re-built from scratch.   

It is? No-one told the developers...

Regards, Dave

[and yes, I know Joshua said Command Prompt are rewriting /their/
driver]

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Final decision

2005-04-27 Thread Josh Berkus
Joel,

> So I am planning on sticking with postgres fro our production database
> (going live this weekend).

Glad to have you.

> I did not find any resolutions to my issues with Commandprompt.com (we only
> worked together 2.5 hours).

BTW, your performance troubleshooting will continue to be hampered if you 
can't share actual queries and data structure.   I strongly suggest that you 
make a confidentiality contract with  a support provider so that you can give 
them detailed (rather than general) problem reports.

> Most of my application is working about the same speed as MSSQL server
> (unfortunately its twice the speed box, but as many have pointed out it
> could be an issue with the 4 proc dell). I spent considerable time with
> Dell and could see my drives are delivering 40 meg per sec.

FWIW, on a v40z I get 180mb/s.   So your disk array on the Dell is less than 
ideal ... basically, what you have is a more expensive box, not a faster 
one :-(

> Things I still have to make better are my settings in config, I have it set
> to no merge joins and no seq scans.

Yeah, I'm also finding that our estimator underestimates the real cost of 
merge joins on some systems. Basically we need a sort-cost variable, 
because I've found an up to 2x difference in sort cost depending on 
architecture.

> I am going to have to use flattened history files for reporting (I saw huge
> difference here the view for audit cube took 10 minutes in explain analyze
> and the flattened file took under one second).
> I understand both of these practices are not desirable, but I am at a place
> where I have to get it live and these are items I could not resolve.

Flattening data for reporting is completely reasonable; I do it all the time.
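
A simple refresh pattern for a flattened reporting table (hypothetical names,
run nightly or whenever the source data settles) is:

    BEGIN;
    DELETE FROM audit_cube_flat;
    INSERT INTO audit_cube_flat SELECT * FROM v_audit_cube;
    COMMIT;
    ANALYZE audit_cube_flat;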

> I believe that was totally IIS not postgres, but I am curious as to if
> using postgres odbc will put more stress on the IIS side then MSSQL did.

Actually, I think the problem may be ODBC.   Our ODBC driver is not the best 
and is currently being re-built from scratch.   Is using npgsql, a much 
higher-performance driver (for .NET) out of the question?  According to one 
company, npgsql performs better than drivers supplied by Microsoft.

> I did have a question if any folks are using two servers one for reporting
> and one for data entry what system should be the beefier?

Depends on the relative # of users. This is often a good approach, because 
the requirements for DW reporting and OLTP are completely different.  
Basically:
OLTP: Many slow processors, disk array set up for fast writes, moderate shared 
mem, low work_mem.
DW: Few fast processors, disk array set up for fast reads, high shared mem and 
work mem.

If reporting is at least 1/4 of your workload, I'd suggest spinning that off 
to the 2nd machine before putting one client on that machine. That way you 
can also use the 2nd machine as a failover back-up.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


[PERFORM] Suggestions for a data-warehouse migration routine

2005-04-27 Thread Richard Rowell
I've ported enough of my companies database to Postgres to make
warehousing on PG a real possibility.  I thought I would toss my data
migration architecture ideas out for the list to shoot apart..

1.  Script on production server dumps the production database (MSSQL) to
a set of delimited text files.  
2.  Script on production server moves files via FTP to a Postgres
database server.  
3.  File Alteration Monitor trigger on PG server executes script when
last file is transferred.
4.  Script on PG server drops the target database (by issuing a "dropdb"
command).
5.  Script on PG server re-creates target database. (createdb command)
6.  Script on PG server re-creates the tables.
7.  Script on PG server issues COPY commands to import data.
8.  Script on PG server indexes tables.
9.  Script on PG server builds de-normalized reporting tables.
10. Script on PG server indexes the reporting tables.
11. Script on PG server creates needed reporting functions.
12. Vacuum analyze?

My question revolves around the drop/create for the database.  Is their
significant downside to this approach?  I'm taking this approach because
it is simpler from a scripting point of view to simply start from
scratch on each warehouse update.  If I do not drop the database I would
need to delete the contents of each table and drop all indexes prior to
the COPY/data import.  My assumption is all the table deletes and index
drops would be more expensive then just droping/re-creating the entire
database.

Also, is the Vacuum analyze step needed on a freshly minted database
where the indexes have all been newly created?

Thanks in advance for all feedback.


-- 


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?

2005-04-27 Thread Dave Held
> -Original Message-
> From: Josh Berkus [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, April 27, 2005 10:25 AM
> To: Andrew Dunstan
> Cc: Mischa Sandberg; pgsql-perform; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks
> suggested?
> 
> [...]
> Actually, it's more to characterize how large of a sample
> we need.  For example, if we sample 0.005 of disk pages, and
> get an estimate, and then sample another 0.005 of disk pages
> and get an estimate which is not even close to the first
> estimate, then we have an idea that this is a table which 
> defies analysis based on small samples.  

I buy that.

> Whereas if the two estimates are < 1.0 stdev apart, we can
> have good confidence that the table is easily estimated. 

I don't buy that.  A negative indication is nothing more than
proof by contradiction.  A positive indication is mathematical
induction over the set, which in this type of context is 
logically unsound.  There is no reason to believe that two
small samples with a small difference imply that a table is
easily estimated rather than that you got unlucky in your
samples.

> [...]
> Yes, actually.   We need 3 different estimation methods:
> 1 for tables where we can sample a large % of pages
> (say, >= 0.1)
> 1 for tables where we sample a small % of pages but are 
> "easily estimated"
> 1 for tables which are not easily estimated but we can't 
> afford to sample a large % of pages.

I don't buy that the first and second need to be different
estimation methods.  I think you can use the same block
sample estimator for both, and simply stop sampling at
different points.  If you set the default to be a fixed
number of blocks, you could get a large % of pages on
small tables and a small % of pages on large tables, which
is exactly how you define the first two cases.  However,
I think such a default should also be overridable to a
% of the table or a desired accuracy.

Of course, I would recommend the distinct sample technique
for the third case.

> If we're doing sampling-based estimation, I really don't
> want people to lose sight of the fact that page-based random
> sampling is much less expensive than row-based random
> sampling.   We should really be focusing on methods which 
> are page-based.

Of course, that savings comes at the expense of having to
account for factors like clustering within blocks.  So block
sampling is more efficient, but can also be less accurate.
Nonetheless, I agree that of the sampling estimators, block
sampling is the better technique.

__
David B. Held
Software Engineer/Array Services Group
200 14th Ave. East,  Sartell, MN 56377
320.534.3637 320.253.7800 800.752.8129

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Final decision

2005-04-27 Thread Rod Taylor

> 
> I did have a question if any folks are using two servers one for
> reporting and one for data entry what system should be the beefier?

Yeah. We started putting up slaves for reporting purposes and
application specific areas using Slony replicating partial data sets to
various locations -- some for reporting.

If your reports have a long runtime and don't require transactional
safety for writes (daily summary written or results aren't recorded in
the DB at all) this is probably something to consider.

I understand that PGAdmin makes Slony fairly painless to setup, but it
can be time consuming to get going and Slony can add new complications
depending on the data size and what you're doing with it -- but they're
working hard to reduce the impact of those complications.

> I have a 2proc machine I will be using and I can either put Sears off
> by themselves on this machine or split up functionality and have one
> for reporting and one for inserts and updates; so not sure which
> machine would be best for which spot (reminder the more robust is a
> 4proc with 8 gigs and 2 proc is 4 gigs, both dells).
> 
> Thank you for any ideas in this arena.
> 
> Joel Fradkin
> 
-- 


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?

2005-04-27 Thread Josh Berkus
Mischa,

> >Perhaps I can save you some time (yes, I have a degree in Math). If I
> >understand correctly, you're trying extrapolate from the correlation
> >between a tiny sample and a larger sample. Introducing the tiny sample
> >into any decision can only produce a less accurate result than just
> >taking the larger sample on its own; GIGO. Whether they are consistent
> >with one another has no relationship to whether the larger sample
> >correlates with the whole population. You can think of the tiny sample
> >like "anecdotal" evidence for wonderdrugs.

Actually, it's more to characterize how large of a sample we need.  For 
example, if we sample 0.005 of disk pages, and get an estimate, and then 
sample another 0.005 of disk pages and get an estimate which is not even 
close to the first estimate, then we have an idea that this is a table which 
defies analysis based on small samples.   Whereas if the two estimates are < 
1.0 stdev apart, we can have good confidence that the table is easily 
estimated.  Note that this doesn't require progressively larger samples; any 
two samples would work.

> I'm with Tom though in being very wary of solutions that require even
> one-off whole table scans. Maybe we need an additional per-table
> statistics setting which could specify the sample size, either as an
> absolute number or as a percentage of the table. It certainly seems that
> where D/N ~ 0.3, the estimates on very large tables at least are way way
> out.

Oh, I think there are several other cases where estimates are way out.  
Basically the estimation method we have doesn't work for samples smaller than 
0.10.   

> Or maybe we need to support more than one estimation method.

Yes, actually.   We need 3 different estimation methods:
1 for tables where we can sample a large % of pages (say, >= 0.1)
1 for tables where we sample a small % of pages but are "easily estimated"
1 for tables which are not easily estimated but we can't afford to sample a 
large % of pages.

If we're doing sampling-based estimation, I really don't want people to lose 
sight of the fact that page-based random sampling is much less expensive than 
row-based random sampling.   We should really be focusing on methods which 
are page-based.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Final decision

2005-04-27 Thread Joel Fradkin

Sorry, I am using Red Hat AS4 and postgres 8.0.2.

Joel

> You didn't tell us what OS are you using, windows?
> If you want good performance you must install unix on that machine.


Re: [PERFORM] Final decision

2005-04-27 Thread mmiranda

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Joel Fradkin
> Sent: Wednesday, April 27, 2005 9:02 AM
> To: PostgreSQL Perform
> Subject: [PERFORM] Final decision
> 
> I spent a great deal of time over the past week looking seriously at
> Postgres and MySQL.
> Objectively I am not seeing that much of an improvement in speed with
> MySQL, and we have a huge investment in postgres.
> So I am planning on sticking with postgres for our production database
> (going live this weekend).
> 
> Many people have offered a great deal of help and I appreciate all that
> time and energy.
> I did not find any resolutions to my issues with Commandprompt.com (we
> only worked together 2.5 hours).
> 
> Most of my application is working about the same speed as MSSQL server
> (unfortunately it's twice the speed box, but as many have pointed out it
> could be an issue with the 4 proc dell). I spent considerable time with
> Dell and could see my drives are delivering 40 meg per sec.
> 
> Things I still have to make better are my settings in config; I have it
> set to no merge joins and no seq scans.
> I am going to have to use flattened history files for reporting (I saw a
> huge difference here: the view for the audit cube took 10 minutes in
> explain analyze and the flattened file took under one second).
> 
> I understand both of these practices are not desirable, but I am at a
> place where I have to get it live and these are items I could not
> resolve.
> I may try some more time with Commandprompt.com, or seek other
> professional help.
> 
> In stress testing I found Postgres was holding up very well (but my IIS
> servers could not handle much of a load to really push the server).
> I have a few desktops acting as IIS servers at the moment and if I
> pushed past 50 concurrent users it pretty much blew the server up.
> On inserts that number was like 7 concurrent users and updates was also
> like 7 users.
> 
> I believe that was totally IIS not postgres, but I am curious as to
> whether using the postgres odbc driver will put more stress on the IIS
> side than MSSQL did.
> I did have a question: for any folks who are using two servers, one for
> reporting and one for data entry, which system should be the beefier?
> I have a 2proc machine I will be using and I can either put Sears off by
> themselves on this machine or split up functionality and have one for
> reporting and one for inserts and updates; so not sure which machine
> would be best for which spot (reminder: the more robust is a 4proc with
> 8 gigs and the 2proc is 4 gigs, both dells).
> 
> Thank you for any ideas in this arena.
> 
> Joel Fradkin

You didn't tell us what OS are you using, windows?
If you want good performance you must install unix on that machine.

---


[PERFORM] Final decision

2005-04-27 Thread Joel Fradkin

I spent a great deal of time over the past week looking seriously at
Postgres and MySQL.

Objectively I am not seeing that much of an improvement in speed with
MySQL, and we have a huge investment in postgres.

So I am planning on sticking with postgres for our production database
(going live this weekend).

Many people have offered a great deal of help and I appreciate all that
time and energy.

I did not find any resolutions to my issues with Commandprompt.com (we
only worked together 2.5 hours).

Most of my application is working about the same speed as MSSQL server
(unfortunately it's twice the speed box, but as many have pointed out it
could be an issue with the 4 proc dell). I spent considerable time with
Dell and could see my drives are delivering 40 meg per sec.

Things I still have to make better are my settings in config; I have it
set to no merge joins and no seq scans.

I am going to have to use flattened history files for reporting (I saw a
huge difference here: the view for the audit cube took 10 minutes in
explain analyze and the flattened file took under one second).

 

I understand both of these practices are not desirable, but I am at a
place where I have to get it live and these are items I could not
resolve.

I may try some more time with Commandprompt.com, or seek other
professional help.

In stress testing I found Postgres was holding up very well (but my IIS
servers could not handle much of a load to really push the server).

I have a few desktops acting as IIS servers at the moment and if I
pushed past 50 concurrent users it pretty much blew the server up.

On inserts that number was like 7 concurrent users and updates was also
like 7 users.

 

I believe that was totally IIS not postgres, but I am curious as to
whether using the postgres odbc driver will put more stress on the IIS
side than MSSQL did.

I did have a question: for any folks who are using two servers, one for
reporting and one for data entry, which system should be the beefier?

I have a 2proc machine I will be using and I can either put Sears off by
themselves on this machine or split up functionality and have one for
reporting and one for inserts and updates; so not sure which machine
would be best for which spot (reminder: the more robust is a 4proc with
8 gigs and the 2proc is 4 gigs, both dells).

Thank you for any ideas in this arena.

Joel Fradkin


Re: [PERFORM] What needs to be done for real Partitioning?

2005-04-27 Thread Yann Michel
Hi,

On Sun, Mar 20, 2005 at 06:01:49PM -0500, Tom Lane wrote:
> Global indexes would seriously reduce the performance of both vacuum and
> cluster for a single partition, and if you want seq scans you don't need
> an index for that at all.  So the above doesn't strike me as a strong
> argument for global indexes ...

I'd like to describe a use case where a global index is useful.

We have a data warehouse with invoices for a rolling window of a few
years. Each invoice has several positions, so a unique key (uk) is
(invoice, position). Due to the fact that most of the queries are only on
a few months or some quarters of a year, our pk starts with the
time attribute (followed by the dimension ids), which is the partition
key (range). During the nightly update, we receive each updated invoice,
so we have to update that specific (globally unique) row, which is resolved
very fast by using the uk.

So you can see that there is a useful case for providing a global
index while using partitioning and local indexes as well.
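
A stripped-down sketch of that shape (8.x-style inheritance partitions;
table and column names are invented, not Yann's schema). The nightly UPDATE
below is keyed only on the uk, so without a global index it has to be
checked against every partition's local index:

# Illustrative only: range partitions by inheritance, each with its own
# local unique index on (invoice_id, position). A global unique index
# would let the nightly update resolve with a single index probe.
import psycopg2

schema_sql = """
CREATE TABLE invoice_fact (
    invoice_month date    NOT NULL,   -- leading (partition) attribute
    invoice_id    bigint  NOT NULL,
    position      integer NOT NULL,
    amount        numeric
);
CREATE TABLE invoice_fact_2005q1 (
    CHECK (invoice_month >= DATE '2005-01-01'
       AND invoice_month <  DATE '2005-04-01')
) INHERITS (invoice_fact);
CREATE UNIQUE INDEX invoice_fact_2005q1_uk
    ON invoice_fact_2005q1 (invoice_id, position);
"""

update_sql = """
UPDATE invoice_fact
SET    amount = %s
WHERE  invoice_id = %s AND position = %s   -- uk only, no partition key
"""

conn = psycopg2.connect("dbname=dwh")  # hypothetical
cur = conn.cursor()
cur.execute(schema_sql)
cur.execute(update_sql, (42.00, 1001, 1))
conn.commit()
cur.close()
conn.close()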

Regards,
Yann

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?

2005-04-27 Thread Simon Riggs
On Tue, 2005-04-26 at 15:00 -0700, Gurmeet Manku wrote:

>  2. In a single scan, it is possible to estimate n_distinct by using
> a very simple algorithm:
> 
>  "Distinct sampling for highly-accurate answers to distinct value
>   queries and event reports" by Gibbons, VLDB 2001.
> 
>  http://www.aladdin.cs.cmu.edu/papers/pdfs/y2001/dist_sampl.pdf

That looks like the one...

...though it looks like it would take some more complex changes to the
current algorithm to use it, and we want the other stats as well...
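
For readers who don't want to open the paper, the flavor of the single-scan
estimator is roughly this; a loose sketch only, not a faithful
reimplementation of Gibbons' algorithm:

# Loose sketch of distinct sampling: hash every value, keep only values
# whose hash falls in an ever-shrinking region, and scale the survivors
# back up. One pass, bounded memory.
import hashlib

def estimate_ndistinct(stream, max_sample=1024):
    def h(v):
        return int(hashlib.md5(str(v).encode()).hexdigest(), 16)
    level = 0
    sample = set()
    for value in stream:
        if h(value) & ((1 << level) - 1) == 0:   # low `level` bits all zero
            sample.add(value)
            while len(sample) > max_sample:
                level += 1                        # tighten the filter...
                sample = {v for v in sample       # ...and evict non-qualifiers
                          if h(v) & ((1 << level) - 1) == 0}
    return len(sample) * (1 << level)

# ~100k distinct values hidden in 1M rows
print(estimate_ndistinct(i % 100000 for i in range(1000000)))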

>  3. In fact, Gibbon's basic idea has been extended to "sliding windows" 
> (this extension is useful in streaming systems like Aurora / Stream):
> 
>  "Distributed streams algorithms for sliding windows"
>  by Gibbons and Tirthapura, SPAA 2002.
> 
>  http://home.eng.iastate.edu/~snt/research/tocs.pdf
> 

...and this offers the possibility of calculating statistics at load
time, as part of the COPY command.

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])