Re: [PERFORM] "iowait" bug?

2009-03-22 Thread Laurent Wandrebeck
2009/3/21 M. Edward (Ed) Borasky :
> I just discovered this on a LinkedIn user group:
>
> http://bugzilla.kernel.org/show_bug.cgi?id=12309
>
> Is anyone here seeing evidence of this in PostgreSQL??
I've been hit by an I/O wait problem, as described here:
https://bugzilla.redhat.com/show_bug.cgi?id=444759
I mentioned it on that other bug, but no one seems to have followed that path.
Regards,
Laurent



Re: [PERFORM] "iowait" bug?

2009-03-22 Thread Laurent Laborde
On Sun, Mar 22, 2009 at 8:49 AM, Laurent Wandrebeck wrote:
> 2009/3/21 M. Edward (Ed) Borasky :
>> I just discovered this on a LinkedIn user group:
>>
>> http://bugzilla.kernel.org/show_bug.cgi?id=12309
>>
>> Is anyone here seeing evidence of this in PostgreSQL??
> I've been hit by an I/O wait problem, as described here:
> https://bugzilla.redhat.com/show_bug.cgi?id=444759
> I mentioned it on that other bug, but no one seems to have followed that path.

We applied this MWI patch on 3 pgsql servers and saw a great
performance improvement, using 3ware controllers, 8 SAS HDDs, an
eight-core (2x quad-core) Xeon, and 32GB RAM on a custom 2.6.18 kernel.

-- 
Laurent Laborde
http://www.over-blog.com/



Re: [PERFORM] "iowait" bug?

2009-03-22 Thread Greg Smith

On Fri, 20 Mar 2009, M. Edward (Ed) Borasky wrote:


I just discovered this on a LinkedIn user group:
http://bugzilla.kernel.org/show_bug.cgi?id=12309


I would bet there are at least 3 different bugs in that one.  That bug 
report got a lot of press via Slashdot a few months ago, and it has picked 
up all sorts of people who all have I/O wait issues, but they don't all 
have the same cause.  The 3ware-specific problem Laurent mentioned is an 
example.  That's not the same thing most of the people there are running 
into; the typical reporter there has disks attached directly to their 
motherboard.  The irony here is that #12309 was a fork of #7372 to start 
over with a clean discussion slate, because the same thing happened to 
that earlier one.


The original problem reported there showed up in 2.6.20, so I've been able 
to avoid this whole thing by sticking to the stock RHEL5 kernel (2.6.18) 
on most of the production systems I deal with.  (Except for my system with 
an Areca card--that one needs 2.6.22 or later to be stable, and seems to 
have no unexpected I/O wait issues.  I think this is because it's taking 
over the lowest level I/O scheduling from Linux, when it pushes from the 
card's cache onto the disks).


Some of the people there reported significant improvement from tuning the 
pdflush tunables; that's something I've had to do a few times now on 
systems to get rid of unexpected write lulls.  I wrote up a walkthrough on 
one of them at 
http://notemagnet.blogspot.com/2008/08/linux-write-cache-mystery.html that 
goes over how to tell whether you're running into that problem and what to 
do about it; something else I wrote on the topic already made it into the 
bug report in comment #150.


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD



Re: [PERFORM] LIMIT confuses the planner

2009-03-22 Thread Tom Lane
Kouber Saparev  writes:
> Tom Lane wrote:
>> Hmph, that's still not real good.  Ideally it should be estimating
>> *less* than the average frequency, because the estimate is made after
>> excluding all the most-common-values, which evidently 'kouber' is not
>> one of.

> I altered the statistics for that column to 1000, so now the planner 
> assumes exactly 492 rows for the aforementioned query, which is indeed 
> the average. It never went *less* than that value, it was always higher; 
> i.e. for a statistics value of 600, it was 588, and for 800, it became 540.

I got some time to think more about this and experiment a bit further.
As far as I can tell there is no fundamental bug here --- given
reasonably accurate stats the rowcount estimate behaves as expected, ie,
you get an estimate that's less than the actual average number of values
if the target value is not one of the known MCVs.  However, as the
n_distinct estimate falls below the actual number of distinct values,
that rowcount estimate necessarily rises.  What had surprised me about
this report is that the estimate matched the true average number of rows
so closely; I wondered if there was some property of the way we estimate
n_distinct that would make that happen.  But I now think that that was
just chance: there doesn't seem to be any underlying behavior that would
cause it.  I did some experiments with random data matching a Zipfian
distribution (1/k law) and did not observe that the rowcount estimate
converged to the true average when the n_distinct value was too low.
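
(For concreteness, here is a minimal sketch of the sort of generator 
involved--an illustration only, not anything from the PostgreSQL sources: 
value k is drawn with probability proportional to 1/k, and the output can 
be loaded into a one-column table and ANALYZEd.)

#include <stdio.h>
#include <stdlib.h>

#define N_VALUES 1000    /* number of distinct values */
#define N_DRAWS  100000  /* number of rows to generate */

int main(void)
{
    static double cdf[N_VALUES];
    double total = 0.0;
    int i, d;

    /* Cumulative weights for P(k) proportional to 1/k. */
    for (i = 0; i < N_VALUES; i++)
    {
        total += 1.0 / (i + 1);
        cdf[i] = total;
    }

    srand(42);
    for (d = 0; d < N_DRAWS; d++)
    {
        double u = (double) rand() / RAND_MAX * total;

        /* Linear scan for clarity; binary search would be faster. */
        for (i = 0; i < N_VALUES - 1 && cdf[i] < u; i++)
            ;
        printf("%d\n", i + 1);   /* one value per line */
    }
    return 0;
}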

So the bottom line here is just that the estimated n_distinct is too
low.  We've seen before that the equation we use tends to do that more
often than not.  I doubt that consistently erring on the high side would
be better though :-(.  Estimating n_distinct from a limited sample of
the population is known to be a statistically hard problem, so we'll
probably not ever have perfect answers, but doing better is on the
to-do list.

regards, tom lane



Re: [PERFORM] "iowait" bug?

2009-03-22 Thread Laurent Wandrebeck
2009/3/22 Greg Smith :
> On Fri, 20 Mar 2009, M. Edward (Ed) Borasky wrote:
>
>> I just discovered this on a LinkedIn user group:
>> http://bugzilla.kernel.org/show_bug.cgi?id=12309
>
> I would bet there are at least 3 different bugs in that one.  That bug
> report got a lot of press via Slashdot a few months ago, and it has picked
> up all sorts of people who all have I/O wait issues, but they don't all
> have the same cause.  The 3ware-specific problem Laurent mentioned is an
> example.  That's not the same thing most of the people there are running
> into; the typical reporter there has disks attached directly to their
> motherboard.  The irony here is that #12309 was a fork of #7372 to start
> over with a clean discussion slate, because the same thing happened to
> that earlier one.
That I/O wait problem is not 3ware-specific. A friend of mine has the
same problem/fix with aacraid.
I'd bet a couple of coins that the controllers that show this problem
do not set MWI (Memory-Write-Invalidate).
Quickly grepping the Linux sources (2.6.28.8) for pci_try_set_mwi
(only disk controllers shown here):
230:pata_cs5530.c
3442:sata_mv.c
2016:3w-9xxx.c
147:qla_init.c
2412:lpfc_init.c
171:cs5530.c
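
For reference, here is roughly where such a call sits in a driver--a 
skeleton for illustration (only pci_try_set_mwi() itself is the real 
kernel API; the surrounding probe routine is made up):

#include <linux/pci.h>

static int example_probe(struct pci_dev *pdev,
                         const struct pci_device_id *id)
{
    int err;

    err = pci_enable_device(pdev);
    if (err)
        return err;

    pci_set_master(pdev);

    /* Ask for Memory-Write-Invalidate bursts; harmless if the
     * device or chipset can't do it, hence "try". */
    if (pci_try_set_mwi(pdev))
        dev_info(&pdev->dev, "MWI unavailable, continuing without it\n");

    return 0;
}
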
>
> The original problem reported there showed up in 2.6.20, so I've been able
> to avoid this whole thing by sticking to the stock RHEL5 kernel (2.6.18) on
> most of the production systems I deal with.  (Except for my system with an
> Areca card--that one needs 2.6.22 or later to be stable, and seems to have
> no unexpected I/O wait issues.  I think this is because it's taking over the
> lowest level I/O scheduling from Linux, when it pushes from the card's cache
> onto the disks).
I thought about the Completely Fair Scheduler at first, but that one came
in around 2.6.21.
Some tests were done with different I/O schedulers, and they do not
seem to be the real cause of the I/O wait.
A bad interaction between a hardware RAID card's cache and the system
asking the card to write at the same time could be a reason.
Unfortunately, I've also met the problem on a now-retired box at work
that was running a single disk plugged into the motherboard controller.
So there's something else under the hood... but my (very) limited
kernel knowledge can't help more here.
>
> Some of the people there reported significant improvement by tuning the
> pdflush tunables; now that I've had to do a few times on systems to get rid
> of unexpected write lulls.  I wrote up a walkthrough on one of them at
> http://notemagnet.blogspot.com/2008/08/linux-write-cache-mystery.html that
> goes over how to tell if you're running into that problem, and what to do
> about it; something else I wrote on that already made it into the bug report
> in comment #150.
I think that forcing the system to write smaller amounts of data more
often just hides the problem rather than correcting it.
But well, that's just a feeling, not science. I hope some real hacker
will be able to spot the problem(s) so they can be fixed.
Anyway, I'll keep a couple of coins on MWI as a source of the problem :-)
Regards,
Laurent



Re: [PERFORM] LIMIT confuses the planner

2009-03-22 Thread marcin mank
> So the bottom line here is just that the estimated n_distinct is too
> low.  We've seen before that the equation we use tends to do that more
> often than not.  I doubt that consistently erring on the high side would
> be better though :-(.  Estimating n_distinct from a limited sample of
> the population is known to be a statistically hard problem, so we'll
> probably not ever have perfect answers, but doing better is on the
> to-do list.
>

I hit an interesting paper on n_distinct calculation:

http://www.pittsburgh.intel-research.net/people/gibbons/papers/distinct-values-chapter.pdf

The PCSA algorithm described there requires O(1) calculation per
value. Page 22 describes what to do with update streams.

This, I think (disclaimer: I know little about PG internals), means that
the n_distinct estimation could be done at vacuum time (it would
play well with the visibility map addition).
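
To give an idea of how light the per-value work is, here is a minimal
sketch of the PCSA estimator (my reading of the paper, for illustration
only--not PostgreSQL code):

#include <stdio.h>
#include <stdint.h>
#include <math.h>

#define NMAPS 64        /* bitmaps for stochastic averaging */
#define PHI   0.77351   /* Flajolet-Martin correction factor */

static uint32_t bitmaps[NMAPS];

/* Any decent 32-bit hash works; this one is just a stand-in. */
static uint32_t hash32(uint32_t x)
{
    x = (x ^ 61) ^ (x >> 16);
    x = x + (x << 3);
    x = x ^ (x >> 4);
    x = x * 0x27d4eb2d;
    x = x ^ (x >> 15);
    return x;
}

/* O(1) per value: pick a bitmap, set the bit at the rank of the
 * lowest 1-bit of the remaining hash bits. */
static void pcsa_add(uint32_t value)
{
    uint32_t h = hash32(value);
    uint32_t map = h % NMAPS;
    uint32_t rest = h / NMAPS;
    int r = 0;

    while (r < 31 && !(rest & (1u << r)))
        r++;
    bitmaps[map] |= 1u << r;
}

static double pcsa_estimate(void)
{
    double sum = 0.0;
    int j, r;

    for (j = 0; j < NMAPS; j++)
    {
        /* r = position of the lowest zero bit in this bitmap. */
        for (r = 0; r < 32 && (bitmaps[j] & (1u << r)); r++)
            ;
        sum += r;
    }
    return NMAPS / PHI * pow(2.0, sum / NMAPS);
}

int main(void)
{
    uint32_t v;

    /* 100000 values, only 5000 distinct ones. */
    for (v = 0; v < 100000; v++)
        pcsa_add(v % 5000);
    printf("estimated n_distinct: %.0f\n", pcsa_estimate());
    return 0;
}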

What do you think?

Greetings
Marcin



Re: [PERFORM] LIMIT confuses the planner

2009-03-22 Thread marcin mank
> I hit an interesting paper on n_distinct calculation:
>
> http://www.pittsburgh.intel-research.net/people/gibbons/papers/distinct-values-chapter.pdf
>
> The PCSA algorithm described there requires O(1) calculation per
> value. Page 22 describes what to do with update streams.
>
> This, I think (disclaimer: I know little about PG internals), means that
> the n_distinct estimation could be done at vacuum time (it would
> play well with the visibility map addition).
>
> What do you think?

OK, if you think that calculating a hash function of every data field
for each insert or delete is prohibitive, just say so and don't bother
reading the paper :]

Greetings
Marcin



Re: [PERFORM] LIMIT confuses the planner

2009-03-22 Thread Tom Lane
marcin mank  writes:
> I hit an interesting paper on n_distinct calculation:
> http://www.pittsburgh.intel-research.net/people/gibbons/papers/distinct-values-chapter.pdf

I don't think we're quite ready to make ANALYZE read every row of a
table in order to estimate n_distinct.  It is an interesting paper
in that it says that you have to do that in order to get *provably*
good estimates, but I've not abandoned the hope of getting *usually*
good estimates without so much work.

regards, tom lane



Re: [PERFORM] "iowait" bug?

2009-03-22 Thread Greg Smith

On Mon, 23 Mar 2009, Laurent Wandrebeck wrote:


I thought about completely fair scheduler at first, but that one came
in around 2.6.21.


CFS showed up in 2.6.23.


I think that forcing the system to write down more often, and smaller
data just hides the problem, and doesn't correct it.


That's one possibility.  I've been considering things like whether the OS 
is getting bogged down managing things like the elevator sorting for 
outstanding writes.  If there were something about that process that gets 
really inefficient in proportion to the size of the pending queue, that 
would both match the kinds of symptoms people are reporting and would get 
better just from reducing the maximum size of the queue by lowering the 
pdflush tunables.


Anyway, the point I was trying to make is that there sure seem to be 
multiple problems mixed into that one bug report, and it's starting to 
look just as unmanageably messy as the older bug that had to be abandoned. 
It would have been nice if somebody had kicked out all the diversions it 
wandered into, to keep the focus a bit better.  Anybody using an SSD 
device, USB, or ext4 should have been punted somewhere else, for example. 
There are plenty of reports that don't require any of those things.


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD
