Re: [PERFORM] query slowdown after 9.0 -> 9.4 migration

2016-10-27 Thread Filip Rembiałkowski
On Thu, Oct 27, 2016 at 7:38 AM, Andreas Kretschmer <
akretsch...@spamfence.net> wrote:

> Tomas Vondra  wrote:
>
> >
> > Perhaps the table is organized / sorted differently, or something like
> > that. How did you do the upgrade?
>
>
Nothing special, dump + reload. The table in question is tiny - 280 kB, 859
rows.



> Maybe table-bloat? Filip, check if autovacuum runs properly.
>
Yes, it does. Just to be sure I ran VACUUM FULL, ANALYZE and REINDEX on all
tables and indexes - no change :-(
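For the record, one way to confirm autovacuum is actually touching the table
(a generic check; 'my_table' is just a placeholder for the table in question):

SELECT relname, n_live_tup, n_dead_tup,
       last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'my_table';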

Any other ideas (before drawing on heavy tools like strace)?


Does it make sense to ask on postgis-users list?

Thanks,
Filip


[PERFORM] query slowdown after 9.0 -> 9.4 migration

2016-10-26 Thread Filip Rembiałkowski
Hi.

Query run time degraded after migration from Pg 9.0 + postgis 1.5 to Pg 9.4
+ postgis 2.2.

1 ms versus 7 ms.

Same query, same data, same schema, similar hardware. Data is small and
fits in cache.

EXPLAIN shows an increase in heap scan cost. What could be the reason for a
40-fold increase in the number of pages scanned by the Bitmap Heap Scan
(with Filter and Recheck)?

GIST index performance looks OK.
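To quantify the extra page traffic, I would compare buffer counts on both
servers with something like the following (BUFFERS is available from 9.0 on;
this is simply the query quoted below wrapped in EXPLAIN):

EXPLAIN (ANALYZE, BUFFERS)
SELECT round(meters_to_miles(st_distance_sphere(
           ST_GeomFromText('POINT(-77.0364 38.89524)', 4326), llpoint))::numeric, 2) AS _distance
FROM storelocator
WHERE st_expand(ST_GeomFromText('POINT(-77.0364 38.89524)', 4326),
                miles_to_degree(50, 38.89524)) && llpoint
  AND st_distance_sphere(ST_GeomFromText('POINT(-77.0364 38.89524)', 4326), llpoint)
      <= miles_to_meters(50)
ORDER BY _distance
LIMIT 10;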


PostgreSQL 9.0.23 on x86_64-suse-linux-gnu, compiled by GCC gcc (SUSE
Linux) 4.8.3 20140627 [gcc-4_8-branch revision 212064], 64-bit
POSTGIS="1.5.4" GEOS="3.4.2-CAPI-1.8.2 r3921" PROJ="Rel. 4.8.0, 6 March
2012" LIBXML="2.7.8" USE_STATS (procs from 1.5 r5976 need upgrade)
-> https://explain.depesz.com/s/C3Vw

PostgreSQL 9.4.7 on x86_64-suse-linux-gnu, compiled by gcc (SUSE Linux)
4.8.5, 64-bit
POSTGIS="2.2.2 r14797" GEOS="3.5.0-CAPI-1.9.0 r4084" PROJ="Rel. 4.9.2, 08
September 2015" GDAL="GDAL 2.1.0, released 2016/04/25 GDAL_DATA not found"
LIBXML="2.9.1" LIBJSON="0.12" (core procs from "2.2.1 r14555" need upgrade)
TOPOLOGY (topology procs from "2.2.1 r14555" need upgrade) RASTER (raster
procs from "2.2.1 r14555" need upgrade)
-> https://explain.depesz.com/s/24GA


Query:

SELECT round(meters_to_miles(st_distance_sphere(
           ST_GeomFromText('POINT(-77.0364 38.89524)', 4326), llpoint))::numeric, 2) AS _distance
FROM storelocator
WHERE st_expand(ST_GeomFromText('POINT(-77.0364 38.89524)', 4326),
                miles_to_degree(50, 38.89524)) && llpoint
  AND st_distance_sphere(ST_GeomFromText('POINT(-77.0364 38.89524)', 4326), llpoint)
      <= miles_to_meters(50)
ORDER BY _distance
LIMIT 10;



thanks for any suggestions / ideas.



Filip


Re: [PERFORM] bad COPY performance with NOTIFY in a trigger

2016-02-06 Thread Filip Rembiałkowski
Thanks for the feedback.

This patch is my first and obvious approach.

@Merlin, I'm not sure I get your idea:
- keep the previous behaviour as obligatory (that is, automatic
de-duplication of incoming messages by channel+payload)?
- instead of the trivial search (browsing), use some kind of
faster lookup?

I'm not sure if this statement in async.c is carved in stone:

* Duplicate notifications from the same transaction are sent out as one
* notification only. This is done to save work when for example a trigger
* on a 2 million row table fires a notification for each row that has been
* changed. If the application needs to receive every single notification
* that has been sent, it can easily add some unique string into the extra
* payload parameter.

1) the "work-saving" is disputable in some cases

2) the idea to "add some unique string" is OK logically, but it's not
OK performance-wise.

Current search code is a sequential search:
https://github.com/filiprem/postgres/blob/master/src/backend/commands/async.c#L2139

I'm not that smart to devise an algorithm for faster lookups -
probably you guys can give some advice.

Again, my rationale is: this feature can burn a lot of CPU for
nothing. I was hoping to use NOTIFY/LISTEN as a superfast notification
mechanism - superfast regardless of whether you insert 100, 10k or 1m
rows.
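As a stopgap in my own loader (this is not the patch being discussed, and it
assumes the loader may briefly take an exclusive lock on the table), I could
skip the per-row notifications during bulk loads and send a single one
afterwards:

BEGIN;
ALTER TABLE test.tab DISABLE TRIGGER trg;   -- heavy lock, acceptable for a maintenance-style load
COPY test.tab FROM '/tmp/test.dat';
ALTER TABLE test.tab ENABLE TRIGGER trg;
NOTIFY test, 'bulk load finished - rescan test.tab for new rows';
COMMIT;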




On Fri, Feb 5, 2016 at 8:52 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
> On Fri, Feb 5, 2016 at 9:33 AM, Filip Rembiałkowski
> <filip.rembialkow...@gmail.com> wrote:
>> patch submitted on -hackers list.
>> http://www.postgresql.org/message-id/CAP_rwwn2z0gPOn8GuQ3qDVS5+HgEcG2EzEOyiJtcA=vpdeh...@mail.gmail.com
>>
>> results after the patch:
>>
>> trigger= BEGIN RETURN NULL; END
>> rows=4
>>   228ms COPY test.tab FROM '/tmp/test.dat'
>>   205ms COPY test.tab FROM '/tmp/test.dat'
>> rows=8
>>   494ms COPY test.tab FROM '/tmp/test.dat'
>>   395ms COPY test.tab FROM '/tmp/test.dat'
>> rows=12
>>   678ms COPY test.tab FROM '/tmp/test.dat'
>>   652ms COPY test.tab FROM '/tmp/test.dat'
>> rows=16
>>   956ms COPY test.tab FROM '/tmp/test.dat'
>>   822ms COPY test.tab FROM '/tmp/test.dat'
>> rows=20
>>  1184ms COPY test.tab FROM '/tmp/test.dat'
>>  1072ms COPY test.tab FROM '/tmp/test.dat'
>> trigger= BEGIN PERFORM pg_notify('test',NEW.id::text); RETURN NULL; END
>> rows=4
>>   440ms COPY test.tab FROM '/tmp/test.dat'
>>   406ms COPY test.tab FROM '/tmp/test.dat'
>> rows=8
>>   887ms COPY test.tab FROM '/tmp/test.dat'
>>   769ms COPY test.tab FROM '/tmp/test.dat'
>> rows=12
>>  1346ms COPY test.tab FROM '/tmp/test.dat'
>>  1171ms COPY test.tab FROM '/tmp/test.dat'
>> rows=16
>>  1710ms COPY test.tab FROM '/tmp/test.dat'
>>  1709ms COPY test.tab FROM '/tmp/test.dat'
>> rows=20
>>  2189ms COPY test.tab FROM '/tmp/test.dat'
>>  2206ms COPY test.tab FROM '/tmp/test.dat'
>
> I'm not so sure that this is a great idea.  Generally, we tend to
> discourage GUCs that control behavior at the SQL level.  Are you 100%
> certain that there is no path to optimizing this case without changing
> behvior?
>
> merlin




Re: [PERFORM] bad COPY performance with NOTIFY in a trigger

2016-02-05 Thread Filip Rembiałkowski
patch submitted on -hackers list.
http://www.postgresql.org/message-id/CAP_rwwn2z0gPOn8GuQ3qDVS5+HgEcG2EzEOyiJtcA=vpdeh...@mail.gmail.com

results after the patch:

trigger= BEGIN RETURN NULL; END
rows=4
  228ms COPY test.tab FROM '/tmp/test.dat'
  205ms COPY test.tab FROM '/tmp/test.dat'
rows=8
  494ms COPY test.tab FROM '/tmp/test.dat'
  395ms COPY test.tab FROM '/tmp/test.dat'
rows=12
  678ms COPY test.tab FROM '/tmp/test.dat'
  652ms COPY test.tab FROM '/tmp/test.dat'
rows=16
  956ms COPY test.tab FROM '/tmp/test.dat'
  822ms COPY test.tab FROM '/tmp/test.dat'
rows=20
 1184ms COPY test.tab FROM '/tmp/test.dat'
 1072ms COPY test.tab FROM '/tmp/test.dat'
trigger= BEGIN PERFORM pg_notify('test',NEW.id::text); RETURN NULL; END
rows=4
  440ms COPY test.tab FROM '/tmp/test.dat'
  406ms COPY test.tab FROM '/tmp/test.dat'
rows=8
  887ms COPY test.tab FROM '/tmp/test.dat'
  769ms COPY test.tab FROM '/tmp/test.dat'
rows=12
 1346ms COPY test.tab FROM '/tmp/test.dat'
 1171ms COPY test.tab FROM '/tmp/test.dat'
rows=16
 1710ms COPY test.tab FROM '/tmp/test.dat'
 1709ms COPY test.tab FROM '/tmp/test.dat'
rows=20
 2189ms COPY test.tab FROM '/tmp/test.dat'
 2206ms COPY test.tab FROM '/tmp/test.dat'



On Fri, Feb 5, 2016 at 1:45 PM, Filip Rembiałkowski <
filip.rembialkow...@gmail.com> wrote:

> On Thu, Feb 4, 2016 at 11:41 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>
>> Filip Rembiałkowski <filip.rembialkow...@gmail.com> writes:
>> > A table has a trigger.
>> > The trigger sends a NOTIFY.
>> > Test with COPY FROM shows non-linear correlation between number of
>> inserted
>> > rows and COPY duration.
>>
>> No surprise, see AsyncExistsPendingNotify.  You would have a lot of other
>> performance issues with sending hundreds of thousands of distinct notify
>> events from one transaction anyway, so I can't get terribly excited about
>> this.
>>
>
>
> What kind of issues? Do you mean, problems in postgres or problems in
> client?
>
> Is there an additional non-linear cost on COMMIT (extra to the cost I
> already showed)?
>
> The 8GB internal queue (referenced in a Note at
> http://www.postgresql.org/docs/current/static/sql-notify.html) should be
> able to keep ~ 1E8 such notifications (assumed one notification will fit in
> 80 bytes).
>
> On client side, this seems legit - the LISTENer deamon will collect these
> notifications and process them in line.
> There might be no LISTENer running at all.
>
> Still, the main problem I get with this approach is quadratic cost on big
> insert transactions.
> I wonder if this behavior is possible to change in future postgres
> versions. And how much programming work does it require.
>
> Is duplicate-elimination a fundamental, non-negotiable requirement?
>
>
>
> Thank you,
> Filip
>
>


Re: [PERFORM] bad COPY performance with NOTIFY in a trigger

2016-02-05 Thread Filip Rembiałkowski
On Thu, Feb 4, 2016 at 11:41 PM, Tom Lane  wrote:

> Filip Rembiałkowski writes:
> > A table has a trigger.
> > The trigger sends a NOTIFY.
> > Test with COPY FROM shows non-linear correlation between number of
> inserted
> > rows and COPY duration.
>
> No surprise, see AsyncExistsPendingNotify.  You would have a lot of other
> performance issues with sending hundreds of thousands of distinct notify
> events from one transaction anyway, so I can't get terribly excited about
> this.
>


What kind of issues? Do you mean, problems in postgres or problems in
client?

Is there an additional non-linear cost on COMMIT (extra to the cost I
already showed)?

The 8GB internal queue (referenced in a Note at
http://www.postgresql.org/docs/current/static/sql-notify.html) should be
able to hold ~ 1E8 such notifications (assuming one notification fits in
80 bytes).

On the client side this seems legit - the LISTENing daemon will collect these
notifications and process them in turn.
There might also be no LISTENer running at all.

Still, the main problem I have with this approach is the quadratic cost on big
insert transactions.
I wonder whether this behavior could be changed in future Postgres versions,
and how much programming work that would require.

Is duplicate-elimination a fundamental, non-negotiable requirement?



Thank you,
Filip


[PERFORM] bad COPY performance with NOTIFY in a trigger

2016-02-04 Thread Filip Rembiałkowski
Hi.

A table has a trigger.
The trigger sends a NOTIFY.

Test with COPY FROM shows non-linear correlation between number of inserted
rows and COPY duration.

                           Table "test.tab"
 Column  |  Type   |                       Modifiers
---------+---------+--------------------------------------------------------
 id      | integer | not null default nextval('test.tab_id_seq'::regclass)
 payload | text    |
Indexes:
    "tab_pkey" PRIMARY KEY, btree (id)
Triggers:
    trg AFTER INSERT ON test.tab FOR EACH ROW EXECUTE PROCEDURE test.fun()
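For anyone who wants to reproduce this, a sketch of the setup (the payload text
and the generate_series bound below are placeholders, not my exact data):

CREATE SCHEMA test;

CREATE TABLE test.tab (
    id      serial PRIMARY KEY,   -- creates test.tab_id_seq and tab_pkey as shown above
    payload text
);

CREATE OR REPLACE FUNCTION test.fun() RETURNS trigger AS $$
BEGIN
    RETURN NULL;   -- body is swapped between the two test series below
END $$ LANGUAGE plpgsql;

CREATE TRIGGER trg AFTER INSERT ON test.tab
    FOR EACH ROW EXECUTE PROCEDURE test.fun();

-- hypothetical generation of the server-side data file:
COPY (SELECT i, 'payload ' || i FROM generate_series(1, 100000) i)
    TO '/tmp/test.dat';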


Test Series 1. Trigger code:
BEGIN RETURN NULL; END
You can see linear scaling.

rows=4
  191ms COPY test.tab FROM '/tmp/test.dat'
  201ms COPY test.tab FROM '/tmp/test.dat'
rows=8
  426ms COPY test.tab FROM '/tmp/test.dat'
  415ms COPY test.tab FROM '/tmp/test.dat'
rows=12
  634ms COPY test.tab FROM '/tmp/test.dat'
  616ms COPY test.tab FROM '/tmp/test.dat'
rows=16
  843ms COPY test.tab FROM '/tmp/test.dat'
  834ms COPY test.tab FROM '/tmp/test.dat'
rows=20
 1101ms COPY test.tab FROM '/tmp/test.dat'
 1094ms COPY test.tab FROM '/tmp/test.dat'


Test Series 2. Trigger code:
BEGIN PERFORM pg_notify('test',NEW.id::text); RETURN NULL;
You can see non-linear scaling.

rows=4
 9767ms COPY test.tab FROM '/tmp/test.dat'
 8901ms COPY test.tab FROM '/tmp/test.dat'
rows=8
37409ms COPY test.tab FROM '/tmp/test.dat'
38015ms COPY test.tab FROM '/tmp/test.dat'
rows=12
90227ms COPY test.tab FROM '/tmp/test.dat'
87838ms COPY test.tab FROM '/tmp/test.dat'
rows=16
   160080ms COPY test.tab FROM '/tmp/test.dat'
   159801ms COPY test.tab FROM '/tmp/test.dat'
rows=20
   247330ms COPY test.tab FROM '/tmp/test.dat'
   251191ms COPY test.tab FROM '/tmp/test.dat'


O(N^2) 


[PERFORM] 9.0 performance degradation with kernel 3.11

2014-11-13 Thread Filip Rembiałkowski
Hi

After upgrading our 9.0 database server

from:
openSUSE 11.4, kernel 2.6.37.6-24-default, Pg 9.0.13

to:
openSUSE 13.1, kernel v 3.11.10-21-default, Pg 9.0.15

... the overall server load has been roughly +1 higher ever since.

We did not add any new services/daemons.

It's hard to track down to individual queries - when I tested, most
individual query times were the same as before the migration.
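One thing I may try next is pg_stat_statements (a contrib module available
for 9.0; it has to be installed and listed in shared_preload_libraries) to
see where the extra CPU actually goes - roughly:

SELECT calls, total_time, total_time / calls AS avg_time, query
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 20;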


Any - ANY - hints will be much appreciated.

Thanks
Filip




Re: [PERFORM] Simple join doesn't use index

2013-01-29 Thread Filip Rembiałkowski
On Tue, Jan 29, 2013 at 8:24 AM, Alex Vinnik alvinni...@gmail.com wrote:

 On Mon, Jan 28, 2013 at 6:55 PM, Filip Rembiałkowski 
 plk.zu...@gmail.comwrote:


 do you know pgtune?
 it's a good tool for starters, if you want a fast postgres and don't
 really want to learn what's behind the scenes.

 Yeah.. I came across pgtune but noticed that latest version dated
 2009-10-29 http://pgfoundry.org/frs/?group_id=1000416 which is kind of
 outdated. Tar file has settings for pg 8.3. Is still relevant?


Yes, I'm sure it will not do anything bad to your config.



 random_page_cost=1 might be not what you really want.
 it would mean that random reads are as fast as as sequential reads, which
 probably is true only for SSD

 What randon_page_cost would be more appropriate for EC2 EBS Provisioned
 volume that can handle 2,000 IOPS?



I'd say: don't guess. Measure.
Use any tool that can test sequential disk block reads versus random disk
block reads.
bonnie++ is quite popular.



Filip


Re: [PERFORM] Simple join doesn't use index

2013-01-28 Thread Filip Rembiałkowski
On Mon, Jan 28, 2013 at 5:43 PM, Alex Vinnik alvinni...@gmail.com wrote:

 It sure turned out that default settings are not a good fit.


do you know pgtune?
it's a good tool for starters, if you want a fast postgres and don't really
want to learn what's behind the scenes.

random_page_cost=1 might not be what you really want.
it would mean that random reads are as fast as sequential reads, which
is probably true only for SSDs.


Filip


Re: [PERFORM] problem with large inserts

2012-12-13 Thread Filip Rembiałkowski
Just an idea - how long does it take to run _only_
CREATE TEMP TABLE foo AS <your SELECT here>?
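Spelled out against your query (the literal s_id = 708 variant quoted below),
the idea is simply a sketch like this:

CREATE TEMP TABLE foo AS
SELECT DISTINCT PPid, LID
FROM (SELECT * FROM PP WHERE s_id = 708) pp
INNER JOIN has_protein hp1 ON pp.p1id = hp1.pid
INNER JOIN has_protein hp2 ON pp.p2_id = hp2.pid
INNER JOIN (SELECT * FROM L WHERE s_id = 708) l
    ON (hp1.pid = l.p1id AND hp2.pid = l.p2id AND hp1.ppos + pp.s1 = l.s1 AND hp2.ppos + pp.s2 = l.s2)
    OR (hp1.pid = l.p2id AND hp2.pid = l.p1id AND hp1.ppos + pp.s1 = l.s2 AND hp2.ppos + pp.s2 = l.s1);

-- if the step above is as fast as the bare SELECT, the slow part is the INSERT, not the SELECT:
INSERT INTO LPP (PPID, LID) SELECT * FROM foo;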




On Thu, Dec 13, 2012 at 4:37 PM, Lutz Fischer
lfisc...@staffmail.ed.ac.uk wrote:
 Hi

 I have currently some trouble with inserts into a table

 INSERT INTO LPP (PPID, LID)
 SELECT DISTINCT PPid, LID FROM
 (SELECT * FROM PP WHERE s_id = sid) pp
 INNER JOIN
 has_protein hp1
 ON pp.p1id = hp1.pid
 INNER JOIN
 has_protein hp2
 ON pp.p2_id = hp2.pid
 INNER JOIN
 (SELECT * FROM L WHERE s_id = sid) l
 ON (hp1.pid = l.p1id AND hp2.pid = l.p2id AND hp1.ppos +
 pp.s1 = l.s1 AND hp2.ppos + pp.s2 = l.s2)
 OR (hp1.pid = l.p2id AND hp2.pid = l.p1id AND hp1.ppos +
 pp.s1 = l.s2 AND hp2.ppos + pp.s2 = l.s1)
 ;

 If I run only

 SELECT DISTINCT PPid, LID FROM
 (SELECT * FROM PP WHERE s_id = 708) pp
 INNER JOIN
 has_protein hp1
 ON pp.p1id = hp1.pid
 INNER JOIN
 has_protein hp2
 ON pp.p2_id = hp2.pid
 INNER JOIN
 (SELECT * FROM L WHERE s_id = 708) l
 ON (hp1.pid = l.p1id AND hp2.pid = l.p2id AND hp1.ppos +
 pp.s1 = l.s1 AND hp2.ppos + pp.s2 = l.s2)
 OR (hp1.pid = l.p2id AND hp2.pid = l.p1id AND hp1.ppos +
 pp.s1 = l.s2 AND hp2.ppos + pp.s2 = l.s1)
 ;

 it returns 200620 rows in 170649 ms  ( thats just under 3  minutes). I
 stopped the actual insert after about 8h.

 The table that the insert happens to, is following:
 CREATE TABLE LPP
 (
   ppid bigint NOT NULL,
   lid bigint NOT NULL,
   CONSTRAINT pk_lpp PRIMARY KEY (ppid,lid)
 )

 I also tried without the primary key but that one is still running for
 more that a day.

 Currently the table LPP holds 471139 rows. Its linking the PP and the L
 table.

 There are no foreign keys referring to that table nor are there any
 other constraints on it.
 Previously I had foreign keys on lid and ppid refering to the L and PP
 table. But in a desperate try to get some speed up I deleted these. -
 But still...

 I am running postgresql 9.2 on a windows 2008  R2 server with 256 GB and
 the database is on something like a raid 1+0 (actually a raid1e)
 consisting of 3x4TB disks (limit of what could easily be fitted into the
 server).

 At the given time  there were no concurrent access to any of the
 involved tables.

 Has anybody some idea why the insert takes so long and/or how to speed
 things up a bit? I could live with something like half an hour - better
 would be in minutes.


 Thanks for any responds,

 Lutz Fischer


 --
 The University of Edinburgh is a charitable body, registered in
 Scotland, with registration number SC005336.



 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance




Re: [PERFORM] problem with large inserts

2012-12-13 Thread Filip Rembiałkowski
Hmm, so it is some kind of file / table locking issue, not general IO
system malfunction.

It would be interesting and useful to run this use case on other
postgres instance (or several instances), including non-Windows ones.

OTOH Pg on Windows housekeeping was always fun - I advise all my
clients to avoid it for production purposes.




On Thu, Dec 13, 2012 at 5:33 PM, Lutz Fischer
lfisc...@staffmail.ed.ac.uk wrote:
 Thanks a lot you saved my day

 create temp table foo AS SELECT DISTINCT ...
 did take a mere 77464.744 ms
 And an additional
 Insert into LPP select * from foo;
 Just 576.909 ms

 I don't really understand why it's working via a temp table but not
 directly (or in any reasonable amount of time) - but at least I have a
 solution I can work with.


 On 13/12/12 16:09, Filip Rembiałkowski wrote:
 Just an idea - how long does it take to run _only_
 CREATE TEMP TABLE foo AS your SELECT here




 On Thu, Dec 13, 2012 at 4:37 PM, Lutz Fischer
 lfisc...@staffmail.ed.ac.uk wrote:
 Hi

 I have currently some trouble with inserts into a table

 INSERT INTO LPP (PPID, LID)
 SELECT DISTINCT PPid, LID FROM
 (SELECT * FROM PP WHERE s_id = sid) pp
 INNER JOIN
 has_protein hp1
 ON pp.p1id = hp1.pid
 INNER JOIN
 has_protein hp2
 ON pp.p2_id = hp2.pid
 INNER JOIN
 (SELECT * FROM L WHERE s_id = sid) l
 ON (hp1.pid = l.p1id AND hp2.pid = l.p2id AND hp1.ppos +
 pp.s1 = l.s1 AND hp2.ppos + pp.s2 = l.s2)
 OR (hp1.pid = l.p2id AND hp2.pid = l.p1id AND hp1.ppos +
 pp.s1 = l.s2 AND hp2.ppos + pp.s2 = l.s1)
 ;

 If I run only

 SELECT DISTINCT PPid, LID FROM
 (SELECT * FROM PP WHERE s_id = 708) pp
 INNER JOIN
 has_protein hp1
 ON pp.p1id = hp1.pid
 INNER JOIN
 has_protein hp2
 ON pp.p2_id = hp2.pid
 INNER JOIN
 (SELECT * FROM L WHERE s_id = 708) l
 ON (hp1.pid = l.p1id AND hp2.pid = l.p2id AND hp1.ppos +
 pp.s1 = l.s1 AND hp2.ppos + pp.s2 = l.s2)
 OR (hp1.pid = l.p2id AND hp2.pid = l.p1id AND hp1.ppos +
 pp.s1 = l.s2 AND hp2.ppos + pp.s2 = l.s1)
 ;

 it returns 200620 rows in 170649 ms  ( thats just under 3  minutes). I
 stopped the actual insert after about 8h.

 The table that the insert happens to, is following:
 CREATE TABLE LPP
 (
   ppid bigint NOT NULL,
   lid bigint NOT NULL,
   CONSTRAINT pk_lpp PRIMARY KEY (ppid,lid)
 )

 I also tried without the primary key but that one is still running for
 more that a day.

 Currently the table LPP holds 471139 rows. Its linking the PP and the L
 table.

 There are no foreign keys referring to that table nor are there any
 other constraints on it.
 Previously I had foreign keys on lid and ppid refering to the L and PP
 table. But in a desperate try to get some speed up I deleted these. -
 But still...

 I am running postgresql 9.2 on a windows 2008  R2 server with 256 GB and
 the database is on something like a raid 1+0 (actually a raid1e)
 consisting of 3x4TB disks (limit of what could easily be fitted into the
 server).

 At the given time  there were no concurrent access to any of the
 involved tables.

 Has anybody some idea why the insert takes so long and/or how to speed
 things up a bit? I could live with something like half an hour - better
 would be in minutes.


 Thanks for any responds,

 Lutz Fischer


 --
 The University of Edinburgh is a charitable body, registered in
 Scotland, with registration number SC005336.



 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance


 --
 Lutz Fischer
 lfisc...@staffmail.ed.ac.uk
 +44 131 6517057


 The University of Edinburgh is a charitable body, registered in
 Scotland, with registration number SC005336.





Re: [PERFORM] parse - bind take more time than execute

2011-12-27 Thread Filip Rembiałkowski
hello.

1. planning time > execute time can happen normally for some
fast-executing queries, so it is not bad per se.

2. what are your statistics settings? they influence planning time - I mean
default_statistics_target and per-column SET STATISTICS (a quick check is
sketched below).

3. upgrade to 8.4.10 - it's a quick upgrade (minimal downtime) and there
were some planner improvements.

4. what is "considerably more time" in absolute units?
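a quick check sketch for point 2 (table and column names are placeholders):

SHOW default_statistics_target;

SELECT attname, attstattarget
FROM pg_attribute
WHERE attrelid = 'your_table'::regclass
  AND attnum > 0 AND NOT attisdropped;

-- a per-column target takes effect at the next ANALYZE, e.g.:
-- ALTER TABLE your_table ALTER COLUMN your_column SET STATISTICS 200;
-- ANALYZE your_table;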


Filip


2011/12/27 MirrorX mirr...@gmail.com:
 there are some performance issues on a server and by searching in the logs i
 noticed that the phases of parse and bind take considerably more time than
 execute for most of the queries. i guess that the right thing to do in this
 case is to use functions or prepare statements but in any case, what could
 be the cause of this?

 information about the server-
 -CentOS 5.6
 -4-cores
 -12GB ram


 shared_buffers: 1 GB
 temp_buffers = 100MB
 work_mem : 30 MB
 maintenance_mem: 512 MB

 database_size: 1,5 GB
 archive_mode is ON
 vacuum/analyze (vacuum_scale_factor 0.1, analyze 0.05)


 this behaviour is not related with checkpoints on the database (as indicated
 by the logs, i dont see this latency when a checkpoint occurs, i see it most
 of the time)

 so my question is the following; what can cause the bind/parse phases to
 take so much longer than the execute? if you need any more info the server i
 ll be glad to provide it. thank you in advance for your advice

 --
 View this message in context: 
 http://postgresql.1045698.n5.nabble.com/parse-bind-take-more-time-than-execute-tp5102940p5102940.html
 Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance



Re: [PERFORM] will the planner ever use an index when the condition is <>?

2011-12-17 Thread Filip Rembiałkowski
Normally there is no chance it could work,
because (a) the planner does not know all possible values of a column,
and (b) btree indexes cannot search on the not-equal (<>) operator.


BTW I've just made a case where - logically - it could work, but it
still does not:

create table nums ( num int4 not null, check(num=1 or num=2) );
insert into nums select case when random()>=0.99 then 1 else 2 end
from generate_series(1,100);
create index nums_idx on nums(num);
analyze nums;
set constraint_exclusion to 'on';
explain select * from nums where num<>1;
-- planner could estimate selectivity as 1%, and use the index with a num=2
-- filter based on the check constraint?




2011/12/17 Roxanne Reid-Bennett r...@tara-lu.com:
 I have a query that used <> against an indexed column. In this
 case I can use the reverse and use IN or = and get the performance
 I need... but in general... will the planner ever use an index when the
 related column is compared using <>?

 I feel like the answer is no, but wanted to ask.

 Roxanne
 Postgres Version 8.4.9 PostGIS version 1.5.2



 Context for question:

 I have the following query:

 select *
 from op_region opr, yield_segment_info ysi, data_location dl
 where opr.op_region_id in
         (select distinct op_region_id
          from yield_point
          where yield <> 0
          and area <> 0
          and ST_GeometryType(location) <> 'ST_Point'
         )
 and ysi.op_region_id = opr.op_region_id
 and dl.data_set_id = opr.data_set_id

 Yield_Point has 161,575,599 records
 where yield <> 0 and area <> 0 has 161,263,193 records,
 where ST_GeometryType(location) <> 'ST_Point' has just 231 records

 yield_segment_info has 165,929 records
 op_region has 566,212 records
 data_location has 394,763

 All of these have a high volume of insert/delete's.
 The tables have recently been vacuum full'd and the indexes reindexed.
 [they are under the management of the autovacuum, but we forced a cleanup on
 the chance that things had degraded...]

 If I run an explain analyze:

 Nested Loop
    (cost=5068203.00..5068230.31 rows=3 width=225308)
    (actual time=192571.730..193625.728 rows=236 loops=1)
 ->  Nested Loop
      (cost=5068203.00..5068219.66 rows=1 width=57329)
      (actual time=192522.573..192786.698 rows=230 loops=1)
   ->  Nested Loop
        (cost=5068203.00..5068211.36 rows=1 width=57268)
        (actual time=192509.822..192638.446 rows=230 loops=1)
     ->  HashAggregate
           (cost=5068203.00..5068203.01 rows=1 width=4)
           (actual time=192471.507..192471.682 rows=230 loops=1)
        ->  Seq Scan on yield_point
              (cost=0.00..5068203.00 rows=1 width=4)
              (actual time=602.174..192471.177 rows=230 loops=1)
              Filter: ((yield <> 0::double precision) AND
                       (area <> 0::double precision) AND
                       (st_geometrytype(location) <> 'ST_Point'::text))
      ->  Index Scan using op_region_pkey on op_region opr
            (cost=0.00..8.33 rows=1 width=57264)
            (actual time=0.723..0.723 rows=1 loops=230)
            Index Cond: (opr.op_region_id = yield_point.op_region_id)
   ->  Index Scan using yield_segment_info_key on yield_segment_info ysi
       (cost=0.00..8.29 rows=1 width=65)
       (actual time=0.643..0.643 rows=1 loops=230)
       Index Cond: (ysi.op_region_id = opr.op_region_id)
 ->  Index Scan using data_location_data_set_idx on data_location dl
    (cost=0.00..10.61 rows=3 width=167979)
    (actual time=3.611..3.646 rows=1 loops=230)
  Index Cond: (dl.data_set_id = opr.data_set_id)
 Total runtime: 193625.955 ms

 yield_point has the following indexes:
      btree on ST_GeometryType(location)
      gist on location
      btree on op_region_id

 I've also tried an index on
       ((yield <> 0::double precision) AND (area <> 0::double precision) AND
  (st_geometrytype(location) <> 'ST_Point'::text))
 ... it still goes for the sequential scan.

 But if I change it to st_geometrytype(location) = 'ST_Polygon' or
 even in ('ST_Polygon','ST_MultiPolygon')

 the planner uses the index.

 Roxanne

 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance



Re: [PERFORM] slow query on tables with new columns added.

2011-09-26 Thread Filip Rembiałkowski
2011/9/26 M. D. li...@turnkey.bz

  I have full access to the database, but no access to the application
 source code.  If creating an index will help, I can do that, but with the
 columns I don't see it helping as I don't have access to the application
 source to change that.

 So yes, by changing settings, I would like to know if there's any memory
 settings I can change to help or create an index.  There is an index on the
 customer_id column in the gltx table, so I'm not sure what else could be
 done.

 If there was a way to create a select trigger, I would do it and return 0
 for both columns on that customer_id as it should always be 0.



Hi

I didn't respond earlier, because I actually don't see any easy way of
speeding up the query.

The memory settings seem fine for this size of data.

It does not look like you can change things by simply adding indexes. I
mean, you can certainly add a specially crafted partial index on
gltx.customer_id WHERE (gltx.inactive_on IS NULL) AND (gltx.posted = 'Y') -
this can earn you a few percent max.

The problem here might be the type of the join columns - we can see they are
about 10 characters, which is not an ideal choice (that's one of the reasons
why I'm a fan of artificial integer pkeys).

You _could_ try running the query with enable_mergejoin = off and see what
happens.
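To be concrete, those two cheap experiments could look like this (the index
name is made up, and the SET should only be done in a test session):

CREATE INDEX gltx_active_posted_customer_idx
    ON gltx (customer_id)
    WHERE inactive_on IS NULL AND posted = 'Y';
ANALYZE gltx;

SET enable_mergejoin = off;
-- re-run the query with EXPLAIN ANALYZE here, then:
RESET enable_mergejoin;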

You can check if the problem persists after dumping and reloading to another
db.

If app modification was possible, you could materialize the data _before_ it
must be queried - using a summary table and appropriate triggers to keep it
up to date.
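A sketch of what I mean - the column types are guesses, and the triggers that
would keep it current are omitted here:

CREATE TABLE customer_points_summary (
    customer_id text PRIMARY KEY,
    item_points numeric NOT NULL DEFAULT 0,
    disc_points numeric NOT NULL DEFAULT 0
);

-- initial load, same join as the application query:
INSERT INTO customer_points_summary
SELECT g.customer_id, sum(i.item_points), sum(i.disc_points)
FROM invoice i
JOIN gltx g ON i.invoice_id = g.gltx_id
WHERE g.inactive_on IS NULL AND g.posted = 'Y'
GROUP BY g.customer_id;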

Regarding your last comment - that for this customer_id the values should
always be 0 - if it's a persistent business rule, I would try to create a
CHECK constraint to reflect it. With some luck and fiddling,
constraint_exclusion might then help speed up your query.

Also, if there is something special about customer_id distribution - table
partitioning might be an option.

Ok, that's a long list - hope this helps, and good luck.

After all you can throw more hardware at the problem - or hire some Pg
magician :-)


Re: [PERFORM] slow query on tables with new columns added.

2011-09-24 Thread Filip Rembiałkowski
2011/9/23 M. D. li...@turnkey.bz


 I did a software upgrade, and with it came a new feature where when
 selecting a customer it queries for the sum of a few columns.  This takes 7
 seconds for the 'Cash Sale' customer - by far the most active customer. I'd
 like to see if it's possible to get it down a bit by changing settings.


To make things clear before we search for a solution: you wrote "by changing
settings". Is that the only option? Can't you change the query in the software?
Can't you change the database schema (add indexes etc.)?


Query:
 explain analyse select sum(item_points),sum(disc_points) from invoice
 left join gltx on invoice.invoice_id = gltx.gltx_id
 where gltx.inactive_on is null and gltx.posted = 'Y' and gltx.customer_id =
 'A0ZQ2gsACIsEKLI638ikyg'


Aside from other things, do you know that the LEFT JOIN here is useless? The
planner should collapse it to a normal join, but I'd check.


Filip


Re: [PERFORM] the XID question

2011-01-19 Thread Filip Rembiałkowski
2011/1/19 Charles.Hou giveme...@gmail.com:
 what's the definetion of XID?

XID == Transaction ID.

  select * from mybook SQL command also increase the XID ?

Yes. Single SELECT is a transaction. Hence, it needs a transaction ID.


greets,
Filip



Re: [PERFORM] Auto-clustering?

2010-12-17 Thread Filip Rembiałkowski
2010/12/17 Royce Ausburn ro...@inomial.com

 Hi all,

 I have a table that in the typical case holds two minute sample data for a
 few thousand sources.  Often we need to report on these data for a
 particular source over a particular time period and we're finding this query
 tends to get a bit slow.

 The structure of the table:

                        Table public.sample
       Column       |           Type           |                    Modifiers
 -------------------+--------------------------+--------------------------------------------------
  client            | integer                  | not null
  aggregateid       | bigint                   | not null
  sample            | bigint                   | not null default nextval('samplekey'::regclass)
  customer           | integer                  |
  period            | integer                  | not null
  starttime         | integer                  | not null
  duration          | integer                  | not null
  ip                | text                     |
  tariff            | integer                  |
  bytessentrate     | bigint                   |
  bytessent         | bigint                   |
  bytesreceived     | bigint                   |
  packets           | integer                  | not null
  queuetype         | integer                  | not null default 0
  collection        | integer                  |
  bytesreceivedrate | bigint                   |
  greatestrate      | bigint                   |
  invalidated       | timestamp with time zone |
 Indexes:
     sample_pkey PRIMARY KEY, btree (sample)
     sample_collection_starttime_idx btree (collection, starttime)
     sample_customer_starttime_idx btree (customer, starttime)
     sample_sample_idx btree (client, sample)
 Foreign-key constraints:
     sample_client_fkey FOREIGN KEY (client) REFERENCES client(client)


 fc=# explain  analyse select collection, period, tariff, sum(bytesSent),
 sum(bytesReceived), sum(packets), max(sample), (starttime / 3600) * 3600 as
 startchunk from sample_20101001 where starttime between 1287493200 and
  1290171599  and collection=128 and ip = '10.9.125.207' group by
 startchunk, tariff, collection, period;
 QUERY PLAN


 -
  HashAggregate  (cost=34959.01..34959.03 rows=1 width=44) (actual time=67047.850..67047.850 rows=0 loops=1)
    ->  Bitmap Heap Scan on sample_20101001  (cost=130.56..34958.91 rows=5 width=44) (actual time=67047.847..67047.847 rows=0 loops=1)
          Recheck Cond: ((collection = 128) AND (starttime >= 1287493200) AND (starttime <= 1290171599))
          Filter: (ip = '10.9.125.207'::text)
          ->  Bitmap Index Scan on sample_20101001_collection_starttime_idx  (cost=0.00..130.56 rows=9596 width=0) (actual time=9806.115..9806.115 rows=6830 loops=1)
                Index Cond: ((collection = 128) AND (starttime >= 1287493200) AND (starttime <= 1290171599))
  Total runtime: 67048.201 ms
 (7 rows)


how about (auto)vacuuming?



 I figure at most there should only be ~20,000 rows to be read from disk,
 and I expect that the index is doing a pretty good job of making sure only
 the rows that need reading are read. inclusion of the ip in the query is
 almost redundant as most of the time an ip has its own collection  My
 suspicion is that the rows that we're interested in are very sparsely
 distributed on disk, so we're having to read too many pages for the query...



you can test this suspicion in a very simple way (a concrete sketch follows this list):
- create a test table (like yours, including indexes and constraints, but
with no data)
- insert into the test table, selecting everything from yours ordered by the
columns your reporting queries filter on
- analyze the test table
- test the query on the new table
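Roughly like this (the ORDER BY column choice below is mine, matching the
reporting query's access pattern; adjust as needed):

CREATE TABLE sample_test (LIKE sample_20101001
    INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING INDEXES);

INSERT INTO sample_test
SELECT * FROM sample_20101001 ORDER BY collection, starttime;

ANALYZE sample_test;
-- now run the reporting query against sample_test and compare timings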

If new query is much faster, and if you have intensive random UPD/DEL/INS
activity, periodic CLUSTER could be a good idea...
but it depends on actual usage patterns (SELECT/modify ratio, types of
updates, and so on).




 All of the queries on this table are reporting on a single collection, so
 ideally a collection's data would all be stored in the same part of the
 disk... or at least clumped together.  This can be achieved using cluster,
 however as far as I know there's no automated, non-cronesque means of
 clustering and having the table become unusable during the cluster is not
 ideal.


cron is a way of automation, isn't it :-)





 I've considered partitioning, but I don't think that's going to give the
 effect I need.  Apparently clustering is only going to scale to a few dozen
 child tables, so that's only going to give one order of magnitude
 performance for significant complexity.




regarding partitioning: I guess it starts to make sense around 10M rows or
10G Bytes in one table.

regarding clustering: it does not help with index bloat.

and finally, you did not specify what 

Re: [PERFORM] Auto-clustering?

2010-12-17 Thread Filip Rembiałkowski
you are right, I must have missed it...

 Table public.u
 Column |Type | Modifiers
+-+---
 id | integer |
 t  | timestamp without time zone |
 d  | text|
Indexes:
u_d btree (d)
u_id btree (id)
u_t btree (t)

fi...@filip=# select oid, relname, pg_Relation_size(oid) from pg_class where
relname in('u','u_id','u_t','u_d');
  oid  | relname | pg_relation_size
---+-+--
 64283 | u   | 15187968
 64289 | u_id|  6758400
 64290 | u_t |  6086656
 64291 | u_d | 16482304

fi...@filip=# CLUSTER u USING u_t;
CLUSTER
fi...@filip=# select oid, relname, pg_Relation_size(oid) from pg_class where
relname in('u','u_id','u_t','u_d');
  oid  | relname | pg_relation_size
---+-+--
 64283 | u   | 12115968
 64289 | u_id|  3391488
 64290 | u_t |  3391488
 64291 | u_d |  8216576
(4 rows)


So CLUSTER is effectively CLUSTER + REINDEX... nice.


On 17 December 2010 at 10:41, Marti Raudsepp ma...@juffo.org wrote:

 2010/12/17 Filip Rembiałkowski filip.rembialkow...@gmail.com:
  regarding clustering: it does not help with index bloat.

 I'm almost sure it does, CLUSTER re-creates all indexes from scratch
 after copying the tuples.

 Regards,
 Marti