Re: [PERFORM] how could select id=xx so slow?

2012-07-10 Thread Yan Chunlu
I have logged one day data and found the checkpoint is rather
frequently(detail: https://gist.github.com/3088338). Not sure if it is
normal, but the average time of checkpoint is about 100sec~200sec, it seems
related with my settings:

574 checkpoint_segments = 64
575 wal_keep_segments = 5000

I set checkpoint_segments as a very large value which is because otherwise
the slave server always can not follow the master, should I lower that
value?

or the slow query is about something else?  thanks!

On Tue, Jul 10, 2012 at 10:46 AM, Craig Ringer wrote:

>  On 07/10/2012 10:25 AM, Yan Chunlu wrote:
>
> I didn't set log_min_duration_statement in the postgresql.conf, but execute
> *dbapi_con.cursor().execute("SET log_min_duration_statement to 30")*
> *for every connection.*
>
>
> OK, same effect: You're only logging slow statements.
>
> It's not at all surprising that BEGIN doesn't appear when a
> log_min_duration_statement is set. It's an incredibly fast operation.
> What's amazing is that it appears even once - that means your database must
> be in serious performance trouble, as BEGIN should take tenths of a
> millisecond on an unloaded system. For example my quick test here:
>
> LOG:  statement: BEGIN;
> LOG:  duration: 0.193 ms
>
> ... which is actually a lot slower than I expected, but hardly slow
> statement material.
>
> The frequent appearance of slow (multi-second) COMMIT statements in your
> slow statement logs suggests there's enough load on your database that
> there's real contention for disk, and/or that checkpoints are stalling
> transactions.
>
>
> First, you need to set log_min_messages = 'info' to allow Pg to complain
> about things like checkpoint frequency.
>
> Now temporarily set log_checkpoints = on to record when checkpoints happen
> and how long they take. Most likely you'll find you need to tune checkpoint
> behaviour. Some information, albeit old, on that is here:
>
>   http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm
>
> Basically you might want to try increasing your
> checkpoint_completion_target and making the bgwriter more aggressive -
> assuming that your performance issues are in fact checkpoint related.
>
> It's also possible that they're just overall load, especially if you have
> lots and lots (hundreds) of connections to the database all trying to do
> work at once without any kind of admission control or pooling/queuing. In
> that case, introducing a connection pool like PgBouncer may help.
>
> --
> Craig Ringer
>


Re: [PERFORM] DELETE vs TRUNCATE explanation

2012-07-10 Thread Daniel Farina
On Tue, Jul 10, 2012 at 5:37 PM, Craig Ringer  wrote:
> Hi
>
> After seeing a few discussions here and on Stack Overflow I've put together
> a quick explanation of why "DELETE FROM table;" may be faster than "TRUNCATE
> table" for people doing unit testing on lots of tiny tables, people who're
> doing this so often they care how long it takes.
>
> I'd love it if a few folks who know the guts were to take a look and verify
> its correctness:

I haven't said this before, but think it every time someone asks me
about this, so I'll say it now:

This is a papercut that should be solved with improved mechanics.
TRUNCATE should simply be very nearly the fastest way to remove data
from a table while retaining its type information, and if that means
doing DELETE without triggers when the table is small, then it should.
 The only person who could thwart me is someone who badly wants their
128K table to be exactly 8 or 0K, which seems unlikely given the 5MB
of catalog anyway.

Does that sound reasonable?  As in, would anyone object if TRUNCATE
learned this behavior?

-- 
fdr

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] DELETE vs TRUNCATE explanation

2012-07-10 Thread Craig Ringer

Hi

After seeing a few discussions here and on Stack Overflow I've put 
together a quick explanation of why "DELETE FROM table;" may be faster 
than "TRUNCATE table" for people doing unit testing on lots of tiny 
tables, people who're doing this so often they care how long it takes.


I'd love it if a few folks who know the guts were to take a look and 
verify its correctness:


http://stackoverflow.com/a/11423886/398670

--
Craig Ringer

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-10 Thread Reza Taheri
Hi Merlin,
We are moving up to a larger testbed, and are planning to use 9.2. But the 
results will not comparable to our 8.4 results due to differences in hardware. 
But that comparison is a useful one. I'll try for a quick test on the new 
hardware with 8.4 before moving to 9.2.

Thanks,
Reza

> -Original Message-
> From: Merlin Moncure [mailto:mmonc...@gmail.com]
> Sent: Tuesday, July 10, 2012 12:06 PM
> To: Reza Taheri
> Cc: Greg Smith; pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] The need for clustered indexes to boost TPC-V
> performance
> 
> On Thu, Jul 5, 2012 at 10:33 PM, Reza Taheri  wrote:
> > Just to be clear, we have a number of people from different companies
> working on the kit. This is not a VMware project, it is a TPC project. But I
> hear you regarding coming in from the cold and asking for a major db engine
> feature. I know that I have caused a lot of rolling eyes. Believe me, I have
> had the same (no, worse!) reaction from every one of the commercial
> database companies in response to similar requests over the past 25 years.
> 
> No rolling of eyes from me.  Clustered indexes work and if your table access
> mainly hits the table through that index you'll see enormous reductions in
> i/o.  Index only scans naturally are a related optimization in the same vein.
> Denying that is just silly.  BTW, putting postgres through a standard non
> trivial benchmark suite over reasonable hardware, reporting results,
> identifying bottlenecks, etc.
> is incredibly useful.  Please keep it up, and don't be afraid to ask for help
> here.  (one thing I'd love to see is side by side results comparing 8.4 to 
> 9.1 to
> 9.2).
> 
> merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-10 Thread Merlin Moncure
On Thu, Jul 5, 2012 at 10:33 PM, Reza Taheri  wrote:
> Just to be clear, we have a number of people from different companies working 
> on the kit. This is not a VMware project, it is a TPC project. But I hear you 
> regarding coming in from the cold and asking for a major db engine feature. I 
> know that I have caused a lot of rolling eyes. Believe me, I have had the 
> same (no, worse!) reaction from every one of the commercial database 
> companies in response to similar requests over the past 25 years.

No rolling of eyes from me.  Clustered indexes work and if your table
access mainly hits the table through that index you'll see enormous
reductions in i/o.  Index only scans naturally are a related
optimization in the same vein.  Denying that is just silly.  BTW,
putting postgres through a standard non trivial benchmark suite over
reasonable hardware, reporting results, identifying bottlenecks, etc.
is incredibly useful.  Please keep it up, and don't be afraid to ask
for help here.  (one thing I'd love to see is side by side results
comparing 8.4 to 9.1 to 9.2).

merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Massive I/O spikes during checkpoint

2012-07-10 Thread Andres Freund
On Tuesday, July 10, 2012 03:36:35 PM Jeff Janes wrote:
> On Tue, Jul 10, 2012 at 5:44 AM, Andres Freund  
wrote:
> > On Tuesday, July 10, 2012 08:14:00 AM Maxim Boguk wrote:
> >> So kernel doesn't start write any pages out in background before it has
> >> at least 13Gb dirty pages in kernel memory.
> >> And at end of the checkpoint kernel trying flush all dirty pages to
> >> disk.
> > 
> > Thast not entirely true. The kernel will also writeout pages which
> > haven't been written to for dirty_expire_centisecs.
> 
> There seems to be many situations in which it totally fails to do that.
Totally as in diry pages sitting around without any io activity? Or just not 
agressive enough?

Currently its a bit hard to speculate about all without specifying the kernel 
because there have been massive rewrites of all that stuff in several kernels 
in the last two years...

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Massive I/O spikes during checkpoint

2012-07-10 Thread Jeff Janes
On Tue, Jul 10, 2012 at 5:44 AM, Andres Freund  wrote:
> On Tuesday, July 10, 2012 08:14:00 AM Maxim Boguk wrote:
>>
>> So kernel doesn't start write any pages out in background before it has at
>> least 13Gb dirty pages in kernel memory.
>> And at end of the checkpoint kernel trying flush all dirty pages to disk.

> Thast not entirely true. The kernel will also writeout pages which haven't
> been written to for dirty_expire_centisecs.

There seems to be many situations in which it totally fails to do that.

Although I've never been able to categorize just what those situations are.

Cheers,

Jeff

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Massive I/O spikes during checkpoint

2012-07-10 Thread Andres Freund
On Tuesday, July 10, 2012 08:14:00 AM Maxim Boguk wrote:
> On Tue, Jul 10, 2012 at 4:03 PM, David Kerr  wrote:
> > On Jul 9, 2012, at 10:51 PM, Maxim Boguk wrote:
> >> But what appears to be happening is that all of the data is being
> >> written out at the end of the checkpoint.
> >> 
> >> This happens at every checkpoint while the system is under load.
> >> 
> >> I get the feeling that this isn't the correct behavior and i've done
> >> something wrong.
> > 
> > It's not an actual checkpoints.
> > It's is a fsync after checkpoint which create write spikes hurting
> > server.
> > 
> > You should set sysctl vm.dirty_background_bytes and vm.dirty_bytes to
> > reasonable low values
> > 
> > 
> > So use bla_bytes instead of bla_ratio?
> 
> Yes because on 256GB server
> echo 10 > /proc/sys/vm/dirty_ratio
> is equivalent to 26Gb dirty_bytes
> 
> and
> echo 5 >/proc/sys/vm/dirty_background_ratio
> is equivalent to 13Gb dirty_background_bytes
> 
> It is really huge values.
> 
> So kernel doesn't start write any pages out in background before it has at
> least 13Gb dirty pages in kernel memory.
> And at end of the checkpoint kernel trying flush all dirty pages to disk.
Thast not entirely true. The kernel will also writeout pages which haven't 
been written to for dirty_expire_centisecs.

But yes, adjusting dirty_* is definitely a good idea.

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Fw: Re: [PERFORM] Custom function in where clause

2012-07-10 Thread Pena Kupen

Hello again,

Seems to be ok, by adding normal outer join and some fields on where-part.

Previous, I use to used with Oracle and Sybase databases as much as possible 
functions/procedures.
There ware something to do with performance: "Do it on server, not in client".
Typically all programs were c/s, maybe that or am I missing something?

--
kupen

Maxim Boguk [maxim.bo...@gmail.com] kirjoitti: 

On Tue, Jul 10, 2012 at 6:36 PM, Pena Kupen  wrote:

> Hi,
>
> I have searched solution to my problem a few days. On my query, there is
> big performance problem.
> It seems to me, that problem is on where-part of sql and it's function.
>
> My sql is:
> select count(*)
> from table_hwhere   level <=
> get_level_value(11268,id,area) and (date1 >= '2011-1-1'
> or date2>='2011-1-1') and  (date1 <= '2012-07-09' or
> date2<='2012-07-09')
> This takes about 40sek.
>
> select count(*)
> from table_hwhere   (date1 >=
> '2011-1-1' or date2>='2011-1-1') and  (date1 <=
> '2012-07-09' or date2<='2012-07-09')
> when ignoring function, it takes <1sek.
>
> Function is:
> CREATE OR REPLACE FUNCTION get_level_value(_user integer, _id, _area)
> RETURNS integer
>AS $$
> DECLARE found integer;
> BEGIN
>   SELECT 1 INTO found
>  FROM table_o
>  WHERE userid=_user AND
>id=_id AND
>area=_area;
>   IF (found) THEN
>  return 3;
>   ELSE
>  return 1;
>   END IF;
> END;
> $$
> LANGUAGE plpgsql;
>
> On explain, it seems to me that this function is on filter and it will
> execute on every row. Total resultset contains 1 700 000 rows.
> QUERY PLAN
> Aggregate  (cost=285543.89..285543.90 rows=1 width=0) (actual
> time=32391.380..32391.380 rows=1 loops=1)
>  ->  Bitmap Heap Scan on table_h  (cost=11017.63..284987.40 rows=222596
> width=0) (actual time=326.946..31857.145 rows=631818 loops=1)
>Recheck Cond: ((date1 >= '2011-01-01'::date) OR (date2 >=
> '2011-01-01'::date))
>Filter: (((date1 <= '2012-07-09'::date) OR (date2 <=
> '2012-07-09'::date)) AND (level <= get_level_value(11268, id, area)))
>->  BitmapOr  (cost=11017.63..11017.63 rows=669412 width=0) (actual
> time=321.635..321.635 rows=0 loops=1)
>  ->  Bitmap Index Scan on date1  (cost=0.00..10626.30
> rows=652457 width=0) (actual time=84.555..84.555 rows=647870 loops=1)
>Index Cond: (date1 >= '2011-01-01'::date)
>  ->  Bitmap Index Scan on date2_table_h  (cost=0.00..280.03
> rows=16955 width=0) (actual time=237.074..237.074 rows=15222 loops=1)
>Index Cond: (date2 >= '2011-01-01'::date)
>
> How should I handle this situation and use function?
>
>
You could not have good performance using function in case where direct
JOIN is only way to have reasonable performance.
Stop using function and write join with table_o instead, or put whole query
with join inside a function.

--
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.ru/ 

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

Skype: maxim.boguk
Jabber: maxim.bo...@gmail.com
>9@C3: http://mboguk.moikrug.ru/

"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."






--
Wippies-vallankumous on täällä! Varmista paikkasi vallankumouksen eturintamassa 
ja liity Wippiesiin heti!
http://www.wippies.com/


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Custom function in where clause

2012-07-10 Thread Pena Kupen

Hi and thank's guys!

First trying this Brendan's recommendation.

It seems only a small difference between sql and PL/pgSQL. from 40-->37. Not so 
good yet.
I will try Maxim's little later and you all know.

--
kupen

Brendan Jurd [dire...@gmail.com] kirjoitti: 

On 10 July 2012 18:36, Pena Kupen  wrote:
> Hi,
>
> I have searched solution to my problem a few days. On my query, there is big
> performance problem.
> It seems to me, that problem is on where-part of sql and it's function.
>

> How should I handle this situation and use function?
>

I would start by rewriting your function in plain SQL rather than
PL/pgSQL.  As a general rule, don't write a function in PL/pgSQL
unless you really need procedurality.  This function does not.

For example:

CREATE OR REPLACE FUNCTION get_level_value(_user integer, _id, _area)
RETURNS integer
   AS $$
  -- Return 3 if there are matching records in table_o, otherwise return 1.
  SELECT CASE WHEN EXISTS (
 SELECT id
 FROM table_o
 WHERE userid=_user AND
   id=_id AND
   area=_area
  ) THEN 3 ELSE 1 END;
$$
LANGUAGE sql STABLE;

Cheers,
BJ




--
Wippies-vallankumous on täällä! Varmista paikkasi vallankumouksen eturintamassa 
ja liity Wippiesiin heti!
http://www.wippies.com/


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Custom function in where clause

2012-07-10 Thread Brendan Jurd
On 10 July 2012 18:36, Pena Kupen  wrote:
> Hi,
>
> I have searched solution to my problem a few days. On my query, there is big
> performance problem.
> It seems to me, that problem is on where-part of sql and it's function.
>

> How should I handle this situation and use function?
>

I would start by rewriting your function in plain SQL rather than
PL/pgSQL.  As a general rule, don't write a function in PL/pgSQL
unless you really need procedurality.  This function does not.

For example:

CREATE OR REPLACE FUNCTION get_level_value(_user integer, _id, _area)
RETURNS integer
   AS $$
  -- Return 3 if there are matching records in table_o, otherwise return 1.
  SELECT CASE WHEN EXISTS (
 SELECT id
 FROM table_o
 WHERE userid=_user AND
   id=_id AND
   area=_area
  ) THEN 3 ELSE 1 END;
$$
LANGUAGE sql STABLE;

Cheers,
BJ

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Custom function in where clause

2012-07-10 Thread Maxim Boguk
On Tue, Jul 10, 2012 at 6:36 PM, Pena Kupen  wrote:

> Hi,
>
> I have searched solution to my problem a few days. On my query, there is
> big performance problem.
> It seems to me, that problem is on where-part of sql and it's function.
>
> My sql is:
> select count(*)
> from table_hwhere   level <=
> get_level_value(11268,id,area) and (date1 >= '2011-1-1'
> or date2>='2011-1-1') and  (date1 <= '2012-07-09' or
> date2<='2012-07-09')
> This takes about 40sek.
>
> select count(*)
> from table_hwhere   (date1 >=
> '2011-1-1' or date2>='2011-1-1') and  (date1 <=
> '2012-07-09' or date2<='2012-07-09')
> when ignoring function, it takes <1sek.
>
> Function is:
> CREATE OR REPLACE FUNCTION get_level_value(_user integer, _id, _area)
> RETURNS integer
>AS $$
> DECLARE found integer;
> BEGIN
>   SELECT 1 INTO found
>  FROM table_o
>  WHERE userid=_user AND
>id=_id AND
>area=_area;
>   IF (found) THEN
>  return 3;
>   ELSE
>  return 1;
>   END IF;
> END;
> $$
> LANGUAGE plpgsql;
>
> On explain, it seems to me that this function is on filter and it will
> execute on every row. Total resultset contains 1 700 000 rows.
> QUERY PLAN
> Aggregate  (cost=285543.89..285543.90 rows=1 width=0) (actual
> time=32391.380..32391.380 rows=1 loops=1)
>  ->  Bitmap Heap Scan on table_h  (cost=11017.63..284987.40 rows=222596
> width=0) (actual time=326.946..31857.145 rows=631818 loops=1)
>Recheck Cond: ((date1 >= '2011-01-01'::date) OR (date2 >=
> '2011-01-01'::date))
>Filter: (((date1 <= '2012-07-09'::date) OR (date2 <=
> '2012-07-09'::date)) AND (level <= get_level_value(11268, id, area)))
>->  BitmapOr  (cost=11017.63..11017.63 rows=669412 width=0) (actual
> time=321.635..321.635 rows=0 loops=1)
>  ->  Bitmap Index Scan on date1  (cost=0.00..10626.30
> rows=652457 width=0) (actual time=84.555..84.555 rows=647870 loops=1)
>Index Cond: (date1 >= '2011-01-01'::date)
>  ->  Bitmap Index Scan on date2_table_h  (cost=0.00..280.03
> rows=16955 width=0) (actual time=237.074..237.074 rows=15222 loops=1)
>Index Cond: (date2 >= '2011-01-01'::date)
>
> How should I handle this situation and use function?
>
>
You could not have good performance using function in case where direct
JOIN is only way to have reasonable performance.
Stop using function and write join with table_o instead, or put whole query
with join inside a function.

-- 
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.ru/ 

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

Skype: maxim.boguk
Jabber: maxim.bo...@gmail.com
МойКруг: http://mboguk.moikrug.ru/

"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."


[PERFORM] Custom function in where clause

2012-07-10 Thread Pena Kupen

Hi,

I have searched solution to my problem a few days. On my query, there is big 
performance problem.
It seems to me, that problem is on where-part of sql and it's function.

My sql is:
select count(*)
		from table_h 
		where 
			level <= get_level_value(11268,id,area) and 
			(date1 >= '2011-1-1' or date2>='2011-1-1') and 
			(date1 <= '2012-07-09' or date2<='2012-07-09')

This takes about 40sek.

select count(*)
		from table_h 
		where 
			(date1 >= '2011-1-1' or date2>='2011-1-1') and 
			(date1 <= '2012-07-09' or date2<='2012-07-09')

when ignoring function, it takes <1sek.

Function is:
CREATE OR REPLACE FUNCTION get_level_value(_user integer, _id, _area) RETURNS 
integer
   AS $$
DECLARE found integer;
BEGIN
  SELECT 1 INTO found
 FROM table_o
 WHERE userid=_user AND
   id=_id AND
   area=_area;
  IF (found) THEN
 return 3;
  ELSE
 return 1;
  END IF;
END;
$$
LANGUAGE plpgsql;

On explain, it seems to me that this function is on filter and it will execute 
on every row. Total resultset contains 1 700 000 rows.
QUERY PLAN
Aggregate  (cost=285543.89..285543.90 rows=1 width=0) (actual 
time=32391.380..32391.380 rows=1 loops=1)
 ->  Bitmap Heap Scan on table_h  (cost=11017.63..284987.40 rows=222596 
width=0) (actual time=326.946..31857.145 rows=631818 loops=1)
   Recheck Cond: ((date1 >= '2011-01-01'::date) OR (date2 >= 
'2011-01-01'::date))
   Filter: (((date1 <= '2012-07-09'::date) OR (date2 <= '2012-07-09'::date)) 
AND (level <= get_level_value(11268, id, area)))
   ->  BitmapOr  (cost=11017.63..11017.63 rows=669412 width=0) (actual 
time=321.635..321.635 rows=0 loops=1)
 ->  Bitmap Index Scan on date1  (cost=0.00..10626.30 rows=652457 
width=0) (actual time=84.555..84.555 rows=647870 loops=1)
   Index Cond: (date1 >= '2011-01-01'::date)
 ->  Bitmap Index Scan on date2_table_h  (cost=0.00..280.03 
rows=16955 width=0) (actual time=237.074..237.074 rows=15222 loops=1)
   Index Cond: (date2 >= '2011-01-01'::date)

How should I handle this situation and use function?

--
kupen

--
Wippies-vallankumous on täällä! Varmista paikkasi vallankumouksen eturintamassa 
ja liity Wippiesiin heti!
http://www.wippies.com/


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance