[PERFORM] Re: Sub-optimal plan for a paginated query on a view with another view inside of it.

2013-08-05 Thread slapo

I apologise; I neglected to mention the Postgres versions tested. The issue occurs 
with both 9.0 and 9.2.
There is a typo in my previous message; the sentence about vacuum, reindex and 
analyze should read:
"I had also run vacuum, reindex and analyze on the whole database, but it seems to 
have had no effect."
 
Thanks for any thoughts on the issue.
 
Peter Slapansky


__

From: 
To: 
Date: 02.08.2013 15:43
Subject: Sub-optimal plan for a paginated query on a view with another view 
inside of it.



Good day,
 
I have a performance issue when JOINing a view within another view more than 
once.
The query takes over three seconds to execute, which is too long in this case. 
It's not a problem if the tables are nearly empty, but that isn't the case on 
the production database.
 
I suspect the planner thinks it's better to first put together the v_address view and 
JOIN it to the parcel table later on, but the function 
"fx_get_user_tree_subordinates_by_id" should be JOINed to the parcel table 
first, as it reduces the number of rows to less than 200 and any following JOINs would be 
much faster.
 
I have also ran vacuum, reindex and analyze on the whole database, but it seems 
to have had to effect.
 
Is there any way to nudge the planner toward that way of execution?
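For illustration, the kind of rewrite I have in mind would look something like this (a sketch only; apart from parcel, v_address and fx_get_user_tree_subordinates_by_id, the argument and join columns below are made-up placeholders):

-- Join the row-reducing function to parcel first, and fence that part with
-- OFFSET 0 so the planner cannot flatten it back into the outer query:
SELECT p.*
FROM (
    SELECT par.*
    FROM parcel par
    JOIN fx_get_user_tree_subordinates_by_id(:user_id) AS sub  -- :user_id is a placeholder
      ON sub.id = par.user_id                                  -- hypothetical join column
    OFFSET 0                                                   -- acts as an optimization fence
) AS p
LEFT JOIN v_address a ON a.id = p.address_id;                  -- hypothetical join column

-- Alternatively, "SET join_collapse_limit = 1;" makes the planner keep joins
-- in the order they are written in the query.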
 
This is the query:
https://app.box.com/s/jzxiuuxoyj28q4q8rzxr 

 
This is the query plan:
https://app.box.com/s/u8nk6qvkjs4ae7l7dh4h 
 (plain text)
https://app.box.com/s/jzxiuuxoyj28q4q8rzxr 
 (graphical output)
 
These are the views:
https://app.box.com/s/uibzidsazwv3eeauovuk 
 (paginated view)
https://app.box.com/s/v71vyexmdyl97m4f3m6u 
 (used three times in the paginated 
view).
 
 
Thank you.
 
Peter Slapansky



Re: [PERFORM] PG performance issues related to storage I/O waits

2013-08-05 Thread Tomas Vondra
Hi,

On 5.8.2013 17:55, Tasos Petalas wrote:
> 
> Seems most of the I/O is caused by SELECT backend processes (READ), 
> whereas (WRITE) requests of wal writer and checkpointer processes do
> not appear as top IO processes (correct me if I am wrong)
> 
> E.g. check the following heavy write process that reports 0% I/O ...!
>  
>  14:09:40   769 be/4 enterpri  0.00 B/s   33.65 M/s  0.00 %  0.00 %
> postgres: wal writer process 

That's because the WAL writer does sequential I/O (writes), which is a
perfect match for SAS drives.

OTOH the queries do a lot of random reads, which is a terrible match for
spinners.
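If it helps to narrow down where those reads are going, a quick check (just a
sketch using the standard statistics views) is to look at which tables and
indexes account for most of the block reads:

SELECT relname,
       heap_blks_read,   -- table blocks not found in shared_buffers
       heap_blks_hit,    -- table blocks served from shared_buffers
       idx_blks_read,    -- index blocks not found in shared_buffers
       idx_blks_hit
FROM pg_statio_user_tables
ORDER BY heap_blks_read + idx_blks_read DESC
LIMIT 10;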

> That however still doesn't say which processes are responsible
> for that.
> Is that background writer, backends running queries or what? The
> iotop
> should give you answer to this (or at least a hint).
> 
> 
> It seems most of I/O reported from backends running heavy concurrent
> select queries (See iotop attachment in previous email)

Yes, that seems to be the case.

> Also, how are these volumes defined? Do they use distinct sets
> of disks?
> How many disks are used for each volume?
> 
> 
> These are LUNs from SAN (we have dedicated 16 SAS 2,5'' disks in RAID-10
> topology in Storage)

I do understand these are LUNs from the SAN. I was asking whether there
are separate sets of disks for the data directory (which you mentioned
to be RAID-10) and pg_archives (which you mentioned to be RAID-5).

Although I doubt it'd be possible to use the same disk for two LUNs.

> > Yes we are using 15K SAS disks in RAID 10. (253-2 dev refers
> to sar
> > output for disks)
> 
> OK, so the pg_archives is probably for xlog archive, right?
> 
> NO.
> /pg_archives is the target mount_point where we copy archive_logs to
> (archive_command = 'test ! -f /pg_archives/%f && cp %p /pg_archives/%f')

... which is exactly what WAL archive is. That's why the GUC is called
archive_command.

> I've checked the conf, and I think you should really consider
> increasing
> checkpoint_segments - it's set to 3 (= 64MB) but I think
> something like
> 32 (=512MB) or even more would be more appropriate.
> 
> We use EDB dynatune. Actual setting can be found in  file
> (Ticket.Usual.Info.27.07.13.txt) of initial e-mail --> check show all;
> section
> Current checkpoint_segments is set to 64

OK, I'm not familiar with dynatune, and I got confused by the
postgresql.conf that you sent. 64 seems fine to me.
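For context, a back-of-the-envelope figure for what 64 segments means (assuming
the default 16 MB WAL segment size):

-- 64 segments x 16 MB = 1024 MB, i.e. a checkpoint is forced after roughly
-- 1 GB of WAL has been written, unless checkpoint_timeout expires first.
SHOW checkpoint_segments;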

> I see you've enabled log_checkpoints - can you check your logs
> how often
> the checkpoints happen?
> 
> 
> This is the output of the checkpoints during peak hours (avg. every 2-5
> minutes)
>  
> 2013-08-02 14:00:20 UTC [767]: [19752]: [0]LOG:  checkpoint complete:
> wrote 55926 buffers (5.3%); 0 transaction log file(s) added, 0 removed,
> 41 recycled; write=220.619 s, sync=5.443 s, total=226.152 s;
> sync files=220, longest=1.433 s, average=0.024 s
> 2013-08-02 14:05:14 UTC [767]: [19754]: [0]LOG:  checkpoint complete:
> wrote 109628 buffers (10.5%); 0 transaction log file(s) added, 0 removed,
> 31 recycled; write=209.714 s, sync=9.513 s, total=219.252 s;
> sync files=222, longest=3.472 s, average=0.042 s

Meh, seems OK to me. This was based on the incorrect number of
checkpoint segments ...
> 
>  
> 
> Also, can you check pg_stat_bgwriter view? I'd bet the value in
> checkpoints_timed is very low, compared to checkpoints_req. Or even
> better, get the values from this view before / after running the
> batch jobs.
> 
> Results during load:
> checkpoints_timed : 12432 , checkpoints_req = 3058

Again, seems fine.
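For reference, the kind of snapshot I mean (a sketch; take it once before and
once after the batch window and compare the differences):

SELECT checkpoints_timed,     -- checkpoints started because checkpoint_timeout expired
       checkpoints_req,       -- checkpoints forced by WAL volume (checkpoint_segments)
       buffers_checkpoint,    -- buffers written during checkpoints
       buffers_clean,         -- buffers written by the background writer
       buffers_backend        -- buffers written directly by backends
FROM pg_stat_bgwriter;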

> In the afternoon it's a different story - for 253-2 it looks
> like this:
> 
>                DEV       tps  rd_sec/s  wr_sec/s    await    %util
> 15:50:01  dev253-2   4742.91  33828.98  29156.17    84.84   105.14
> 16:00:01  dev253-2   2781.05  12737.41  18878.52    19.24    80.53
> 16:10:01  dev253-2   3661.51  20950.64  23758.96    36.86    99.03
> 16:20:01  dev253-2   5011.45  32454.33  31895.05    72.75   102.38
> 16:30:01  dev253-2   2638.08  14661.23  17853.16    25.24    75.64
> 16:40:01  dev253-2   1988.95   5764.73  14190.12    45.05    58.80
> 16:50:01  dev253-2   2185.15  88296.73  11806.38     7.46    84.37
> 17:00:01  dev253-2   2031.19  12835.56  12997.34     8.90    82.62
> 17:10:01  dev253-2   4009.24  34288.71  23974.92    38.07   103.01
> 17:20:01  dev253-2   3605.86  26107.83  22457.41    45.76    90.90
> 17:30:01  dev253-2   2550.47   7496.85  18267.07    19.10    65.87
> 
> 
> This is when the actual  problem arises

Well, then I think it's mostly about the SELECT queries.
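If you want to pin down which SELECTs are responsible, the pg_stat_statements
extension (assuming it is available on this EDB build) reports per-query block
counts, e.g.:

SELECT query,
       calls,
       total_time,            -- total execution time in milliseconds
       shared_blks_read,      -- blocks that were not in shared_buffers
       shared_blks_hit
FROM pg_stat_statements
ORDER BY shared_blks_read DESC
LIMIT 20;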

> What I think you could/should do:
> 
> * move pg_xlog to a separate dev

[PERFORM] ORDER BY, LIMIT and indexes

2013-08-05 Thread Ivan Voras
Hello,

Assuming I have a huge table (doesn't fit in RAM), of which the most
important fields are "id" which is a SERIAL PRIMARY KEY and "active"
which is a boolean, and I'm issuing a query like:

SELECT * FROM table ORDER BY id DESC LIMIT 10

... is pgsql smart enough to use the index to fetch only the 10
required rows instead of reading the whole table, then sorting it,
then trimming the result set? How about in the following queries:

SELECT * FROM table ORDER BY id DESC LIMIT 10 OFFSET 10

SELECT * FROM table WHERE active ORDER BY id DESC LIMIT 10 OFFSET 10

Or, more generally, is there some set of circumstances under which the
catastrophic scenario will happen?


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] ORDER BY, LIMIT and indexes

2013-08-05 Thread Claudio Freire
On Mon, Aug 5, 2013 at 8:04 PM, Ivan Voras  wrote:
> SELECT * FROM table ORDER BY id DESC LIMIT 10 OFFSET 10
>
> SELECT * FROM table WHERE active ORDER BY id DESC LIMIT 10 OFFSET 10

Did you try explain?
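For example (a sketch; "table" is the placeholder name from the original
question, so substitute the real table name), the plan shows directly whether
the index is used to avoid a full sort:

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM table ORDER BY id DESC LIMIT 10 OFFSET 10;

-- A good plan contains something like
--   Limit -> Index Scan Backward using table_pkey on table
-- rather than a Seq Scan followed by a Sort.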


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] ORDER BY, LIMIT and indexes

2013-08-05 Thread Michael Paquier
On Tue, Aug 6, 2013 at 8:25 AM, Claudio Freire wrote:

> On Mon, Aug 5, 2013 at 8:04 PM, Ivan Voras  wrote:
> > SELECT * FROM table ORDER BY id DESC LIMIT 10 OFFSET 10
> >
> > SELECT * FROM table WHERE active ORDER BY id DESC LIMIT 10 OFFSET 10
>
> Did you try explain?
>
And did you run ANALYZE on your table to be sure that you generate correct
plans?
-- 
Michael


Re: [PERFORM] ORDER BY, LIMIT and indexes

2013-08-05 Thread Josh Berkus
Ivan,

> Or, more generally, is there some set of circumstances under which the
> catastrophic scenario will happen?

Yes:

SELECT * FROM table ORDER BY id DESC LIMIT 10 OFFSET 100000

This is the "high offset" problem, and affects all databases which
support applications with paginated results, including non-relational
ones like SOLR.  The basic problem is that you can't figure out what is
at OFFSET 100000 without first sorting through the first 100000 results.

The easiest solution is to limit the number of pages your users can
"flip through".  Generally anyone asking for page 10,000 is a bot
screen-scraping your site, anyway.
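To put a rough number on it (with the 10-rows-per-page figure used above):

-- With 10 rows per page, page n starts at OFFSET (n - 1) * 10,
-- so OFFSET 100000 corresponds to roughly page 10,000.
SELECT * FROM table ORDER BY id DESC LIMIT 10 OFFSET 100000;
-- The server has to produce and discard 100,000 rows before it can return
-- the 10 requested ones, on every request for that page.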

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] ORDER BY, LIMIT and indexes

2013-08-05 Thread Sergey Konoplev
On Mon, Aug 5, 2013 at 6:22 PM, Josh Berkus  wrote:
>> Or, more generally, is there some set of circumstances under which the
>> catastrophic scenario will happen?
>
> Yes:
>
> SELECT * FROM table ORDER BY id DESC LIMIT 10 OFFSET 100000
>
> This is the "high offset" problem, and affects all databases which
> support applications with paginated results, including non-relational
> ones like SOLR.  The basic problem is that you can't figure out what is
> at OFFSET 100000 without first sorting through the first 100000 results.
>
> The easiest solution is to limit the number of pages your users can
> "flip through".  Generally anyone asking for page 10,000 is a bot
> screen-scraping your site, anyway.

In addition to Josh's answer I would like to mention that it might be
worth using a partial index like this

CREATE INDEX i_table_id_active ON table (id) WHERE active

in this particular case

SELECT * FROM table
WHERE active
ORDER BY id DESC
LIMIT 10 OFFSET 10

so it avoids filtering through tons of rows in case of long
"NOT active" gaps at the beginning of the scanning sequence.

As an alternative solution to the pagination (OFFSET) problem you might
also use the "prev/next" technique, like

SELECT * FROM table
WHERE id > :current_last_id
ORDER BY id LIMIT 10

for "next", and

SELECT * FROM (
SELECT * FROM table
WHERE id < :current_first_id
ORDER BY id DESC
LIMIT 10
) AS sq ORDER BY id

for "prev". It will be very fast.

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray...@gmail.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] ORDER BY, LIMIT and indexes

2013-08-05 Thread David Johnston
Sergey Konoplev-2 wrote
> As an alternative solution for pagination (OFFSET) problem you might
> also use the "prev/next" technique, like
> 
> SELECT * FROM table
> WHERE id > :current_last_id
> ORDER BY id LIMIT 10
> 
> for "next", and
> 
> SELECT * FROM (
> SELECT * FROM table
> WHERE id < :current_first_id
> ORDER BY id DESC
> LIMIT 10
> ) AS sq ORDER BY id
> 
> for "prev". It will be very fast.

Even being fairly experienced with SQL generally, I haven't explored
pagination that much, so my awareness of the OFFSET issue had led me to expect
bad things.  Thank you for taking the time to offer a brief moment of
enlightenment on something you likely take for granted by now.

I'm curious how much slower/faster these queries would run if you added:

SELECT *, first_value(id) OVER (...), last_value(id) OVER (...)
-- note: the window specifications need to overcome the "ORDER BY" limitation
-- noted in the documentation.

to the query.  Using the window functions, you know for each record what the
first and last ids are within its window.  Applicability would be
application/need specific, but it would avoid having to calculate/maintain
these two values in a separate part of the application.
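For what it's worth, one concrete shape this could take (a sketch only; the
explicit frame is what works around the default window-frame behaviour of
last_value() mentioned in the documentation):

SELECT sq.*,
       first_value(id) OVER w AS page_first_id,
       last_value(id)  OVER w AS page_last_id
FROM (
    SELECT * FROM table
    WHERE active
    ORDER BY id DESC
    LIMIT 10
) AS sq
WINDOW w AS (ORDER BY id DESC
             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
ORDER BY id DESC;

-- Each returned row then carries the first and last id of the page, which can
-- feed the :current_first_id / :current_last_id bounds of the prev/next queries
-- without separate bookkeeping in the application.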

David J.





--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/ORDER-BY-LIMIT-and-indexes-tp5766413p5766429.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PG performance issues related to storage I/O waits

2013-08-05 Thread Tasos Petalas
On Mon, Aug 5, 2013 at 11:28 PM, Tomas Vondra  wrote:

> Hi,
>
> On 5.8.2013 17:55, Tasos Petalas wrote:
> >
> > Seems most of the I/O is caused by SELECT backend processes (READ),
> > whereas (WRITE) requests of wal writer and checkpointer processes do
> > not appear as top IO processes (correct me if I am wrong)
> >
> > E.g. check the following heavy write process that reports 0% I/O ...!
> >
> >  14:09:40   769 be/4 enterpri  0.00 B/s   33.65 M/s  0.00 %  0.00 %
> > postgres: wal writer process
>
> That's because the WAL writer does sequential I/O (writes), which is a
> perfect match for SAS drives.
>
> OTOH the queries do a lot of random reads, which is a terrible match for
> spinners.
>
> > That however still doesn't say which processes are responsible
> > for that.
> > Is that background writer, backends running queries or what? The
> > iotop
> > should give you answer to this (or at least a hint).
> >
> >
> > It seems most of I/O reported from backends running heavy concurrent
> > select queries (See iotop attachment in previous email)
>
> Yes, that seems to be the case.
>
> > Also, how are these volumes defined? Do they use distinct sets
> > of disks?
> > How many disks are used for each volume?
> >
> >
> > These are LUNs from SAN (we have dedicated 16 SAS 2,5'' disks in RAID-10
> > topology in Storage)
>
> I do understand these are LUNs from the SAN. I was asking whether there
> are separate sets of disks for the data directory (which you mentioned
> to be RAID-10) and pg_archives (which you mentioned to be RAID-5).
>
> Although I doubt it'd be possible to use the same disk for two LUNs.
>

Sorry, I didn't get your question right. Yes, there are different disk sets
for RAID-10 (data) and RAID-5 (WAL archives).

>
> > > Yes we are using 15K SAS disks in RAID 10. (253-2 dev refers
> > to sar
> > > output for disks)
> >
> > OK, so the pg_archives is probably for xlog archive, right?
> >
> > NO.
> > /pg_archives is the target mount_point where we copy archive_logs to
> > (archive_command = 'test ! -f /pg_archives/%f && cp %p /pg_archives/%f')
>
> ... which is exactly what WAL archive is. That's why the GUC is called
> archive_command.
>

Again, I misunderstood your question. I wrongly thought you were asking about a
separate LUN for WAL (pg_xlog on a separate device), not about the WAL archives.

>
> > I've checked the conf, and I think you should really consider
> > increasing
> > checkpoint_segments - it's set to 3 (= 64MB) but I think
> > something like
> > 32 (=512MB) or even more would be more appropriate.
> >
> > We use EDB dynatune. Actual setting can be found in  file
> > (Ticket.Usual.Info.27.07.13.txt) of initial e-mail --> check show all;
> > section
> > Current checkpoint_segments is set to 64
>
> OK, I'm not familiar with dynatune, and I got confused by the
> postgresql.conf that you sent. 64 seems fine to me.
>

Understood. EDB dynatune is a specific feature that ships with the EDB PG
versions and is supposed to take care of most of the PG configuration parameters
(found in postgresql.conf) automatically, adjusting them at run time (you can
always override them).

The "SHOW ALL" command at the psql prompt gives you the actual values at any
given time.
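For example (plain psql, nothing EDB-specific assumed):

SHOW ALL;                    -- every parameter and its current value
SHOW checkpoint_segments;    -- a single parameter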


> > I see you've enabled log_checkpoints - can you check your logs
> > how often
> > the checkpoints happen?
> >
> >
> > This is the output of the checkpoints during peak hours (avg. every 2-5
> > minutes)
> >
> > 2013-08-02 14:00:20 UTC [767]: [19752]: [0]LOG:  checkpoint complete:
> > wrote 55926 buffers (5.3%); 0 transaction log file(s) added, 0 removed,
> > 41 recycled; write=220.619 s, sync=5.443 s, total=226.152 s;
> > sync files=220, longest=1.433 s, average=0.024 s
> > 2013-08-02 14:05:14 UTC [767]: [19754]: [0]LOG:  checkpoint complete:
> > wrote 109628 buffers (10.5%); 0 transaction log file(s) added, 0 removed,
> > 31 recycled; write=209.714 s, sync=9.513 s, total=219.252 s;
> > sync files=222, longest=3.472 s, average=0.042 s
>
> Meh, seems OK to me. This was based on the incorrect number of
> checkpoint segments ...
> >
> >
> >
> > Also, can you check pg_stat_bgwriter view? I'd bet the value in
> > checkpoints_timed is very low, compared to checkpoints_req. Or
> even
> > better, get the values from this view before / after running the
> > batch jobs.
> >
> > Results during load:
> > checkpoints_timed : 12432 , checkpoints_req = 3058
>
> Again, seems fine.
>
>
Updated values for pg_stat_bgwriter after the batch activity (off-peak):
 checkpoints_timed : 12580, checkpoints_req : 3070

I don't see any significant difference here.
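Just to quantify it, the deltas over that window (using the "during load"
figures earlier in the thread) work out to:

  checkpoints_timed : 12580 - 12432 = 148
  checkpoints_req   :  3070 -  3058 =  12

so only a small fraction of the checkpoints in that period were forced by WAL
volume.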


> > In the afternoon it's a different story - for 253-2 it looks
> > like this:
> >
> >                DEV       tps  rd_sec/s  wr_sec/s    await    %util
> > 15:50:01 dev253-2   4742.91  33828.98  29156.17   84.84