Re: [PERFORM] Help with performance on current status column

2005-09-14 Thread Richard Huxton

Chris Kratz wrote:

Hello All,

We are struggling with a specific query that is killing us.  When doing 
explain analyze on the entire query, we *seem* to be getting killed by the 
estimated number of rows on a case statement calculation.


I've included a snippet from the explain analyze of the much larger query.  The 
line in question, (cost=0.00..106.52 rows=1 width=16) (actual 
time=0.048..67.728 rows=4725 loops=1), shows that it returned 4725 rows 
instead of 1, which, when coupled with a later join, causes the statement to 
run for over 3 minutes. [1] 

It seems that the planner thinks the scan on role_id is going to return 1 row, 
but in reality it returns 4725 rows.  The case statement causing the problem 
uses today's date to see if a particular row is still active.  Here is a test 
case showing how far off the estimate is from reality. [2]



[2] A much simpler statement triggers the incorrect row counts here.

explain analyze
select * 
from roles rol
where  


CASE
   WHEN rol.role_id IS NULL
   THEN NULL
   WHEN rol.begin IS NOT NULL and rol.end IS NOT NULL
   THEN
 CASE WHEN TIMESTAMP 'now' >= rol.begin and TIMESTAMP 'now' <= rol.end
 THEN 'Active'
 ELSE 'Inactive' END
   WHEN rol.begin IS NOT NULL
   THEN
 CASE WHEN TIMESTAMP 'now' >= rol.begin
 THEN 'Active'
 ELSE 'Inactive' END
   WHEN rol.end IS NOT NULL
   THEN
 CASE WHEN TIMESTAMP 'now' <= rol.end
 THEN 'Active'
 ELSE 'Inactive' END
   ELSE 'Active'
   END = 'Active'


Aside #1 - I'm not entirely clear how role_id can be null since you 
seemed to be joining against it in the real query.


Aside #2 - You're probably better off with CURRENT_DATE since begin/end 
seem to be dates, rather than TIMESTAMP 'now' - and in any case you 
wanted timestamp with time zone
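
(For comparison, the three behave quite differently:

   SELECT CURRENT_DATE, TIMESTAMP 'now', CURRENT_TIMESTAMP;

CURRENT_DATE is a date; TIMESTAMP 'now' is a timestamp without time zone 
that gets evaluated when the constant is parsed - a classic gotcha in 
column defaults and prepared statements; CURRENT_TIMESTAMP is a timestamp 
with time zone taken at transaction start.)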


OK, I think the root of your problem is your use of NULL to mean not 
ended or not started (whatever 'not started' means). PostgreSQL has 
the handy timestamptz value 'infinity', but only for timestamps and not 
for dates. I'd probably cheat a little and use an end date of 
'9999-12-31' or similar to simulate infinity. Then your test is simply:


WHERE
  ...
  AND (rol.begin <= CURRENT_DATE AND rol.end >= CURRENT_DATE)

That should estimate simply enough.
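
Untested, but the cleanup might look something like this ("end" is a 
reserved word, so I've quoted both column names here):

   -- replace the open-ended NULLs with sentinel dates
   UPDATE roles SET "end"   = DATE '9999-12-31' WHERE "end"   IS NULL;
   UPDATE roles SET "begin" = DATE '0001-01-01' WHERE "begin" IS NULL;

   -- the activity test is then a plain range check the planner can estimate
   SELECT *
   FROM roles rol
   WHERE rol."begin" <= CURRENT_DATE
 AND rol."end"   >= CURRENT_DATE;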

--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Low performance on Windows problem

2005-09-14 Thread Dalibor Sramek
On Tue, Sep 13, 2005 at 11:05:00AM -0400, Merlin Moncure wrote:
 5. do select array_accum(q::text) from generate_series(1,1) q;

I made the tests you suggested and the pattern is clear. The difference
between local and remote command execution is caused by moving data over
the network. E.g. the command above takes 700 ms locally and 1500 ms
remotely. A remote explain analyze takes exactly the same 700 ms.

I downloaded PCATTCP - http://www.pcausa.com/Utilities/pcattcp.htm
and the measured throughput between the two machines is over 1 kB/s.
PCATTCP allows setting TCP_NODELAY but it had no effect on the transfer
speed. So the difference between local and remote execution should IMHO stay
in the 10 ms range. Definitely not 800 ms. PostgreSQL 8.1 has the same problem.

Just for the record: the server PC is Dell Precision 330 with 3Com 3C920
integrated network card. OS MS Windows Professional 2002 with service pack
2. There is Symantec Antivirus installed - which I have (hopefully)
completely disabled.

Thanks for any help

Dalibor Sramek

-- 
Dalibor Sramek  http://www.insula.cz/dali \ In the eyes of cats
 /  [EMAIL PROTECTED]   \  all things
/  H blog  http://www.transhumanismus.cz/blog.php  \   belong to cats.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Help with performance on current status column

2005-09-14 Thread Chris Kratz
Hello Richard,

Thank you for the response.  I did forget to mention that the columns have the 
following meanings.

One, if a begin or end date is null, it means that the role is open ended in 
that direction.  For example, if there is no end date, the role goes on 
forever from the start date.  Your idea of using minimum and maximum dates 
is an interesting one and not one that I had considered.  I will do some 
testing later today and see if that makes a difference.

The other option I am toying with is simply having a status column which is 
updated nightly via a cron job.  This will probably be the most efficient and 
can be indexed.
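
Roughly what I have in mind (untested sketch; the column and index names 
are made up):

   ALTER TABLE roles ADD COLUMN status text;
   CREATE INDEX roles_status_idx ON roles (status);

   -- recomputed nightly from cron, e.g. via psql -c
   UPDATE roles
   SET status = CASE
  WHEN ("begin" IS NULL OR "begin" <= CURRENT_DATE)
   AND ("end" IS NULL OR "end" >= CURRENT_DATE)
  THEN 'Active'
  ELSE 'Inactive'
END;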

I also forgot to say that we have seen this behavior on 2 boxes, both on Linux 
(Red Hat ES and Mandrake) and both running Postgres 8.0 (8.0.1 and 8.0.3).  
Strangely, after playing with statistics some yesterday (setting from 10 to 
100 to 1000 and back to 10 and analyzing), the 8.0.1 machine picks a 
different plan and runs in 101.104ms.  The larger machine (dual proc Opteron, 6 
disk raid 10, etc) with 8.0.3 still takes 3-5 minutes to run the same query 
with the same data set, even after playing with statistics and repeated 
analyzes of the same table.  It just seems odd.  It seems to be picking the 
incorrect plan based on an overly optimistic estimate of the rows returned 
from the calculation.
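
For reference, the statistics fiddling was along these lines (substituting 
100, 1000, and then 10 again), re-analyzing each time:

   ALTER TABLE roles ALTER COLUMN role_id SET STATISTICS 100;
   ANALYZE roles;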

The other frustration with this is that this SQL is machine generated, which is 
why we have some of the awkwardness in the calculation.  That calc gets used 
for a lot of different things, including column definitions when people want 
to see the column on screen.

Thanks,

-Chris

On Wednesday 14 September 2005 05:13 am, Richard Huxton wrote:
 Chris Kratz wrote:
  Hello All,
 
  We are struggling with a specific query that is killing us.  When doing
  explain analyze on the entire query, we *seem* to be getting killed by
  the estimated number of rows on a case statement calculation.
 
  I've included a snippet from the explain analyze of the much larger
  query.  The line in question, (cost=0.00..106.52 rows=1 width=16) (actual
  time=0.048..67.728 rows=4725 loops=1), shows that it returned 4725 rows
  instead of 1, which, when coupled with a later join, causes the statement
  to run for over 3 minutes. [1]
 
  It seems that the planner thinks the scan on role_id is going to return 1
  row, but in reality it returns 4725 rows.  The case statement causing the
  problem uses today's date to see if a particular row is still active.
  Here is a test case showing how far off the estimate is from reality. [2]
 
  [2] A much simpler statement triggers the incorrect row counts here.
 
  explain analyze
  select *
  from roles rol
  where
 
  CASE
 WHEN rol.role_id IS NULL
 THEN NULL
 WHEN rol.begin IS NOT NULL and rol.end IS NOT NULL
 THEN
   CASE WHEN TIMESTAMP 'now' >= rol.begin and TIMESTAMP 'now' <= rol.end
   THEN 'Active'
   ELSE 'Inactive' END
 WHEN rol.begin IS NOT NULL
 THEN
   CASE WHEN TIMESTAMP 'now' >= rol.begin
   THEN 'Active'
   ELSE 'Inactive' END
 WHEN rol.end IS NOT NULL
 THEN
   CASE WHEN TIMESTAMP 'now' <= rol.end
   THEN 'Active'
   ELSE 'Inactive' END
 ELSE 'Active'
 END = 'Active'

 Aside #1 - I'm not entirely clear how role_id can be null since you
 seemed to be joining against it in the real query.

 Aside #2 - You're probably better off with CURRENT_DATE since begin/end
 seem to be dates, rather than TIMESTAMP 'now' - and in any case you
 wanted timestamp with time zone

 OK, I think the root of your problem is your use of NULL to mean not
 ended or not started (whatever 'not started' means). PostgreSQL has
 the handy timestamptz value 'infinity', but only for timestamps and not
 for dates. I'd probably cheat a little and use an end date of
 '9999-12-31' or similar to simulate infinity. Then your test is simply:

 WHERE
...
AND (rol.begin <= CURRENT_DATE AND rol.end >= CURRENT_DATE)

 That should estimate simply enough.

 --
Richard Huxton
Archonet Ltd

 ---(end of broadcast)---
 TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

-- 
Chris Kratz

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Low performance on Windows problem

2005-09-14 Thread Kevin Grittner
(1)  Latency and throughput don't necessarily correlate well.  When blasting
quantities of data to test throughput, TCP_NODELAY might not matter
much -- a full buffer will be sent without a delay anyway.  What do you get
on a ping while running the throughput test?
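
A cheap way to see the round-trip cost from the database side is to time a
query whose result is tiny, so nearly all of the elapsed time is the network
round trip -- for example, from psql:

  \timing
  SELECT 1;  -- almost no server work; elapsed time is mostly latency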
 
(2)  Besides the TCP_NODELAY issue, another issue which has caused
similar problems is a mismatch between half duplex and full duplex in the
configuration of the switch and the server.  Sometimes auto-negotiate
doesn't work as advertised; you might want to try setting the configuration
explicitly, if you aren't already doing so.
 
-Kevin
 
 
 Dalibor Sramek [EMAIL PROTECTED] 09/14/05 8:02 AM 
On Tue, Sep 13, 2005 at 11:05:00AM -0400, Merlin Moncure wrote:
 5. do select array_accum(q::text) from generate_series(1,1) q;

I made the tests you suggested and the pattern is clear. The difference
between local and remote command execution is caused by moving data over
the network. E.g. the command above takes 700 ms locally and 1500 ms
remotely. A remote explain analyze takes exactly the same 700 ms.

I downloaded PCATTCP - http://www.pcausa.com/Utilities/pcattcp.htm
and the measured throughput between the two machines is over 1 kB/s.
PCATTCP allows setting TCP_NODELAY but it had no effect on the transfer
speed. So the difference between local and remote execution should IMHO stay
in the 10 ms range. Definitely not 800 ms. PostgreSQL 8.1 has the same problem.

Just for the record: the server PC is Dell Precision 330 with 3Com 3C920
integrated network card. OS MS Windows Professional 2002 with service pack
2. There is Symantec Antivirus installed - which I have (hopefully)
completely disabled.

Thanks for any help

Dalibor Sramek


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Low performance on Windows problem

2005-09-14 Thread Merlin Moncure
 in the 10 ms range. Definitely not 800 ms. PostgreSQL 8.1 has the same
 problem.

 Just for the record: the server PC is Dell Precision 330 with 3Com 3C920
 integrated network card. OS MS Windows Professional 2002 with service
 pack 2. There is Symantec Antivirus installed - which I have (hopefully)
 completely disabled.

Try throwing in another network card and see if it helps.  Next step is
to try tweaking tcp settings
(http://support.microsoft.com/default.aspx?scid=kb;en-us;314053) and see
if that helps.  Beyond that, try playing the update-driver game.  If you
are still having problems, try receiving bigger and bigger results to
see where the problem occurs: the 1-2k range suggests an mtu problem, the
4-8k range suggests a tcp receive window problem.
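
To generate arbitrary result sizes, something along these lines works
(sizes are ballpark, not exact wire sizes):

select repeat('x', 1024);  -- ~1k result
select repeat('x', 2048);  -- ~2k
select repeat('x', 4096);  -- ~4k
select repeat('x', 8192);  -- ~8k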

Beyond that, I'm stumped, uh, buy Opteron? :)

Merlin


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[PERFORM] Battery Backed Cache for RAID

2005-09-14 Thread Peter Darley
Folks,
I'm getting a new server for our database, and I have a quick question
about RAID controllers with a battery backed cache.  I understand that the
battery will allow the cache to be written out if the power fails to the box,
which allows it to report a write as committed safely when it's not actually
committed.
My question is: if the power goes off and the drives stop, how does the
battery backed cache save things out to the dead drives?  Is there another
implied component that will provide power to the drives that I should be
looking into as well?
Thanks,
Peter Darley


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Battery Backed Cache for RAID

2005-09-14 Thread Jeffrey W. Baker
On Wed, 2005-09-14 at 11:25 -0700, Peter Darley wrote:
   I'm getting a new server for our database, and I have a quick question
 about RAID controllers with a battery backed cache.  I understand that the
 battery will allow the cache to be written out if the power fails to the box,
 which allows it to report a write as committed safely when it's not actually
 committed.

Actually the cache will just hold its contents while the power is out.
When the power is restored, the RAID controller will complete the writes
to disk.  If the battery does not last through the outage, the data is
lost.

   My question is: if the power goes off and the drives stop, how does the
 battery backed cache save things out to the dead drives?  Is there another
 implied component that will provide power to the drives that I should be
 looking into as well?

A UPS would allow you to do an orderly shutdown and write contents to
disk during a power failure.  However a UPS can be an extra point of
failure.

-jwb

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Battery Backed Cache for RAID

2005-09-14 Thread Alvaro Herrera
On Wed, Sep 14, 2005 at 11:28:43AM -0700, Jeffrey W. Baker wrote:
 On Wed, 2005-09-14 at 11:25 -0700, Peter Darley wrote:
  I'm getting a new server for our database, and I have a quick question
  about RAID controllers with a battery backed cache.  I understand that the
  battery will allow the cache to be written out if the power fails to the box,
  which allows it to report a write as committed safely when it's not actually
  committed.
 
 Actually the cache will just hold its contents while the power is out.
 When the power is restored, the RAID controller will complete the writes
 to disk.  If the battery does not last through the outage, the data is
 lost.

Just curious: how long are the batteries supposed to last?

-- 
Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com
Hi! I'm a .signature virus!
cp me into your .signature file to help me spread!

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Battery Backed Cache for RAID

2005-09-14 Thread Merlin Moncure
 On Wed, Sep 14, 2005 at 11:28:43AM -0700, Jeffrey W. Baker wrote:
  On Wed, 2005-09-14 at 11:25 -0700, Peter Darley wrote:
   I'm getting a new server for our database, and I have a quick question
   about RAID controllers with a battery backed cache.  I understand that
   the battery will allow the cache to be written out if the power fails
   to the box, which allows it to report a write as committed safely when
   it's not actually committed.
 
  Actually the cache will just hold its contents while the power is out.
  When the power is restored, the RAID controller will complete the
  writes to disk.  If the battery does not last through the outage, the
  data is lost.
 
 Just curious: how long are the batteries supposed to last?
 
For the length of time it will take for you to get fired for not getting
the server running, plus one hour :).

Merlin

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Battery Backed Cache for RAID

2005-09-14 Thread John A Meinel
Alvaro Herrera wrote:
 On Wed, Sep 14, 2005 at 11:28:43AM -0700, Jeffrey W. Baker wrote:

On Wed, 2005-09-14 at 11:25 -0700, Peter Darley wrote:

 I'm getting a new server for our database, and I have a quick question
about RAID controllers with a battery backed cache.  I understand that the
battery will allow the cache to be written out if the power fails to the box,
which allows it to report a write as committed safely when it's not actually
committed.

Actually the cache will just hold its contents while the power is out.
When the power is restored, the RAID controller will complete the writes
to disk.  If the battery does not last through the outage, the data is
lost.


 Just curious: how long are the batteries supposed to last?


The recent *cheap* version of a ramdisk had battery backup for 16 hours.
(Very expensive ramdisks actually have enough battery power to power a
small hard-drive to dump the contents into).

I'm guessing for a RAID controller, the time would be in the max 1 day
range.

John
=:->





Re: [PERFORM] Battery Backed Cache for RAID

2005-09-14 Thread Welty, Richard

John A Meinel wrote:
The recent *cheap* version of a ramdisk had battery backup for 16 hours.
(Very expensive ramdisks actually have enough battery power to power a
small hard-drive to dump the contents into).

I'm guessing for a RAID controller, the time would be in the max 1 day
range.

i think some will go a bit longer. i have seen an IBM ServeRaid (rebranded
mylex in this particular case) keep its memory after being pulled for a
remarkably long period of time.

no guarantees, though, so i'm not actually going to say how long so that nobody
gets unreasonable expectations.

richard

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Battery Backed Cache for RAID

2005-09-14 Thread Arjen van der Meijden

On 14-9-2005 22:03, Alvaro Herrera wrote:

On Wed, Sep 14, 2005 at 11:28:43AM -0700, Jeffrey W. Baker wrote:


On Wed, 2005-09-14 at 11:25 -0700, Peter Darley wrote:

Actually the cache will just hold its contents while the power is out.
When the power is restored, the RAID controller will complete the writes
to disk.  If the battery does not last through the outage, the data is
lost.



Just curious: how long are the batteries supposed to last?


For the LSI-Logic MegaRaid 320-2e it's about 72 hours for the standard 
128MB version. Their SATA2 solution offers 32 and 72 hour options. So I 
assume it's on the order of days for most RAID controllers.


Best regards,

Arjen van der Meijden

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Battery Backed Cache for RAID

2005-09-14 Thread mudfoot
Bear in mind you will lose data if the raid controller itself fails (or the
cache memory module).  Many solutions have mirrored cache for this reason.  But
that's more $$, depending on the risks you want to take.

Quoting Arjen van der Meijden [EMAIL PROTECTED]:

 On 14-9-2005 22:03, Alvaro Herrera wrote:
  On Wed, Sep 14, 2005 at 11:28:43AM -0700, Jeffrey W. Baker wrote:
  
 On Wed, 2005-09-14 at 11:25 -0700, Peter Darley wrote:
 
 Actually the cache will just hold its contents while the power is out.
 When the power is restored, the RAID controller will complete the writes
 to disk.  If the battery does not last through the outage, the data is
 lost.
  
  
  Just curious: how long are the batteries supposed to last?
 
 For the LSI-Logic MegaRaid 320-2e it's about 72 hours for the standard 
 128MB version. Their SATA2 solution offers 32 and 72 hour options. So I 
 assume it's on the order of days for most RAID controllers.
 
 Best regards,
 
 Arjen van der Meijden
 
 ---(end of broadcast)---
 TIP 6: explain analyze is your friend
 



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] How many tables is too many tables?

2005-09-14 Thread Qingqing Zhou

[EMAIL PROTECTED] wrote

 One machine is simply not going to be able to scale with the quantities
 of links we hope to store information about, and we want to move to some
 kind of cluster.  Because of the quantities of data, it seems to make
 sense to go for a cluster setup such that in a 4 machine cluster, each
 machine has a quarter of the data (is this 'shared nothing' or 'shared
 everything'?).  To that end, we figured a good first step was to
 partition the data on one machine into multiple tables defining the
 logic which would find the appropriate table given a piece of data.
 Then, we assumed, adding the logic to find the appropriate machine and
 database in our cluster would only be an incremental upgrade.


So you set up 4 separate copies of PG on 4 machines? This is neither SN 
nor SE.

Partitioning is good for performance if you distribute IO and CPU. In 
your design, I believe the IO is distributed (across 4 machines), but since 
you sliced the data into too-small pieces, you will pay penalties elsewhere. 
For example, each table has to maintain its own indices (an index becomes a 
useless burden when its table is too small), so there will be that many more 
btree roots ...  The system tables (pg_class/pg_attribute, etc.) have to 
contain many rows to record all your tables ... and though system-table rows 
are cached, the memory space is limited ...

In short: too many tables. When designing your new partitioning method, just 
keep in mind that the database accesses data with page-wise IO.
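
For illustration only, a sketch of the one-big-table alternative (table and 
column names are made up):

CREATE TABLE links (
site_id  integer NOT NULL,
url  text NOT NULL,
last_crawled timestamp
);
-- the old per-table split key becomes an ordinary indexed column
CREATE INDEX links_site_idx ON links (site_id);

-- what used to be "pick the right table" is now just a filter
SELECT url FROM links WHERE site_id = 42;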

Regards,
Qingqing 



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster