Re: 8.2 Autovacuum BUG ?

2018-01-23 Thread Claudio Freire
On Wed, Jan 24, 2018 at 3:54 AM, pavan95 
wrote:

> Hi Claudio,
>
> We didn't configure any replication to our production server. Which strace
> are you talking about?
>

This one: https://linux.die.net/man/1/strace

You can attach it to a process (assuming you have the necessary
permissions) and it will report all the syscalls the process does. That
does slow down the process though.

Then lsof ( https://linux.die.net/man/8/lsof ) can be used to map file
descriptor numbers to file paths. You have to do it as soon as you read the
output, because files get closed and file descriptors reused. So it's
better to have a script that directly reads from /proc/pid/fd or fdinfo,
but that takes some programming.

It is nontrivial, but sometimes it's the only tool in your belt. You may
want to try something else first though.


> We made a keen observation that the so-called archive logs are generated only
> at the 9th and 39th minute of each hour, even when nobody is connecting from
> the application (outside business hours).


Well, if you don't know what happens at those times (and only at those
times), it's not that useful.

Since you don't know what is causing this for certain, first thing you have
to do is ascertain that. Try increasing logging as much as you can,
especially around those times, and see what turns on then and not at other
times. You can monitor autovacuum processes as well in pg_stat_activity, so
make sure you check that as well, as autovacuum will only log once it's
done.
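
For example, something like this (untested; on 9.1 the columns are procpid and
current_query, on 9.2+ they are pid and query) shows any autovacuum workers
running at that moment:

SELECT procpid, query_start, current_query
  FROM pg_stat_activity
 WHERE current_query LIKE 'autovacuum:%';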

You do know autovacuum is running at those times; what you have to check is
whether it is also running when WAL isn't being generated, and whether it is
vacuuming the same tables over and over. Your earlier mails show autoanalyze
runs, not vacuum. Those shouldn't cause this much WAL, but if they run very
often and you have lots of stats, then maybe.

You can also try pg_stat_statements:
https://www.postgresql.org/docs/9.1/static/pgstatstatements.html

Again, concentrate on the differential - what happens at those times, that
doesn't at other times.
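
If the extension is installed (it has to be in shared_preload_libraries and
created in the database; the 9.1 column set is smaller than in later versions),
a rough starting point could be:

SELECT calls, total_time, query
  FROM pg_stat_statements
 ORDER BY total_time DESC
 LIMIT 20;

Run it during a quiet window and a busy window (pg_stat_statements_reset() in
between) and compare the two.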

Another idea would be to check for freeze runs in autovacuum. Ie, what's
described here: https://wiki.postgresql.org/wiki/VacuumHeadaches#FREEZE

There's a nice blog post with some queries to help you with that here:
http://www.databasesoup.com/2012/09/freezing-your-tuples-off-part-1.html
(and its continuation here:
http://www.databasesoup.com/2012/10/freezing-your-tuples-off-part-2.html ).
I'm not saying you should tune those parameters, since what you were showing
was autoanalyze activity, not vacuum freeze, but you should check whether you
need to anyway.
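
As a rough check (plain catalog queries, nothing version-specific), you can see
how close tables and databases are to their freeze thresholds:

SELECT relname, age(relfrozenxid) AS xid_age
  FROM pg_class
 WHERE relkind = 'r'
 ORDER BY age(relfrozenxid) DESC
 LIMIT 20;

SELECT datname, age(datfrozenxid) FROM pg_database;

Compare those ages against autovacuum_freeze_max_age to see whether
anti-wraparound vacuums are due around those times.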


Re: need help on memory allocation

2018-01-23 Thread pavan95
Then we should find out whether there are any idle sessions with uncommitted
transactions. Those might be the culprits.
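
Something like this (untested) should list them; this is the 9.1 form, on 9.2
and later use pid/query and state = 'idle in transaction' instead:

SELECT procpid, usename, xact_start, current_query
  FROM pg_stat_activity
 WHERE current_query = '<IDLE> in transaction'
 ORDER BY xact_start;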

Regards,
Pavan



--
Sent from: 
http://www.postgresql-archive.org/PostgreSQL-performance-f2050081.html



Re: need help on memory allocation

2018-01-23 Thread Michael Paquier
On Tue, Jan 23, 2018 at 10:54:01PM -0700, pavan95 wrote:
> If you are finding some idle sessions then of course your database is
> perfectly alright. As idle sessions won't consume any memory.

Those have a cost as well when building transaction snapshots. Too many of
them is no good either, let's not forget that.
--
Michael




Re: Performance impact of lowering max_files_per_process

2018-01-23 Thread Thomas Kellerer
Thomas Kellerer wrote on 19.01.2018 at 17:48:
>
> I wonder what performance implications that has on a server with
> around 50-100 active connections (through pgBouncer).
> 
> My understanding of the documentation is that Postgres will work
> just fine if we lower the limit; it simply releases the cached file
> handles if the limit is reached. But I have no idea how expensive
> opening a file handle is in Linux.
> 
> So assuming the sessions (and thus the queries) actually do need that
> many file handles, what kind of performance impact (if any) is to be
> expected by lowering that value for Postgres to e.g. 500?

I would be really interested in an answer. 





Re: 8.2 Autovacuum BUG ?

2018-01-23 Thread pavan95
Hi Claudio,

We didn't configure any replication to our production server. Which strace
are you talking about?

We made a keen observation that the so-called archive logs are generated only
at the 9th and 39th minute of each hour, even when nobody is connecting from
the application (outside business hours). A minimum of 76 files is being
produced in these two intervals of an hour. We tried to monitor the DMLs, but
they are the same DMLs as in the past. Any idea?

Thanks in advance.



--
Sent from: 
http://www.postgresql-archive.org/PostgreSQL-performance-f2050081.html



Re: need help on memory allocation

2018-01-23 Thread pavan95
Hi Rambabu,

If you are finding some idle sessions then of course your database is
perfectly alright. As idle sessions won't consume any memory.

Kindly specify the issue briefly.

Regards,
Pavan



--
Sent from: 
http://www.postgresql-archive.org/PostgreSQL-performance-f2050081.html



Re: 8.2 Autovacuum BUG ?

2018-01-23 Thread Claudio Freire
On Tue, Jan 23, 2018 at 1:16 PM, Pavan Teja 
wrote:

> On Jan 23, 2018 9:37 PM, "Claudio Freire"  wrote:
>
>
>
> On Tue, Jan 23, 2018 at 11:39 AM, Pavan Teja 
> wrote:
>
>> Yes so many wals are continuing to be produced.
>>
>> Deleting the wals after a backup of the database.
>>
>> Yes archiving mode is on. And the warning message in log file is
>>
>> " checkpoints are frequently occurring (1second apart). Consider
>> increasing checkpoint_segements parameter".
>>
>> My doubt is previously the same are the parameters which are reflected as
>> of now. Then what is the point in considering altering those values.
>> Correct me if I am wrong.
>>
>
> You can use pg_xlogdump to inspect those logs and see which
> relations/transactions are generating so much WAL.
>
> Then you can hunt within your apps which code is responsible for that
> traffic, or whether it in fact is autovacuum.
>
>
>
> Hi Claudio,
>
> Is pg_xlogdump available for postgres 9.1, as my current production is
> postgres 9.1.
>

Right, it was added in 9.3

I'm unsure whether it can parse pre-9.3 WAL. I know technically speaking,
WAL doesn't have to stay compatible across versions, but it might be for
the limited purposes of xlogdump.

> Yes, investigated in that area; found DMLs and also autovacuum statements
> for some relations. And the DMLs are the same during this huge WAL traffic
> as during normal WAL traffic.
>
> Anyways, thanks for your timely response 😊
>

While looking at current query activity makes sense, if you can't identify
a culprit doing that, inspecting the WAL directly will let you know with
precision what is causing all that WAL. Hence the suggestion.

If xlogdump doesn't work in 9.1, I'm not sure what you can do.

One idea that pops to mind, though there's probably a better one: you may
want to consider attaching an strace to a recovery process on a replica,
preferably one you're not worried about slowing down. Analyzing output
from that is much harder, but it may give you some insight. You'll have to
correlate file handles to file names to relations manually, which can be
quite a chore.
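
Once you have the file names (they look like base/<database oid>/<relfilenode>,
possibly with a .N segment suffix), mapping them back to relations is just a
catalog lookup in the matching database, e.g. (16385 is a made-up example):

SELECT oid::regclass AS relation
  FROM pg_class
 WHERE relfilenode = 16385;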


Re: need help on memory allocation

2018-01-23 Thread Jeff Janes
On Tue, Jan 23, 2018 at 5:59 AM, Rambabu V  wrote:

> > cat PostgreSQL-2018-01-23_06.csv|grep FATAL

What about ERROR, not just FATAL?  Or grep for "out of memory"



>> *$ free -mh*
>>                     total       used       free     shared    buffers     cached
>> Mem:                 58G        58G       358M        16G       3.6M        41G
>> -/+ buffers/cache:              16G        42G
>> Swap:               9.5G       687M       8.9G
>>
>
This does not seem like it should be a problem.  Is this data collected
near the time of the failure?


> work_mem = 256MB # min 64kB
>> max_connections = 600
>>
>
These look pretty high, especially in combination.  Why do you need that
number of connections?  Could you use a connection pooler instead?  Or do
you just have an application bug (leaked connection handles) that needs to
be fixed?  Why do you need that amount of work_mem?


> *ps -ef|grep postgres|grep idle|wc -l*
>> 171
>>
>> *ps -ef|grep postgres|wc -l*
>> 206
>>
>
How close to the time of the problem was this recorded?  How many of the
idle are 'idle in transaction'?


>> PID USER  PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
>>  109063 postgres   20   0 16.7G 16.4G 16.3G S  0.0 27.8 39:55.61
>> postgres: test sss 10.20.2.228(55174) idle
>>   24910 postgres   20   0 16.7G 16.4G 16.3G S  0.0 27.8 27:45.35
>> postgres: testl sss 10.20.2.228(55236) idle
>>  115539 postgres   20   0 16.7G 16.4G 16.3G S  0.0 27.8 28:22.89
>> postgres: test sss 10.20.2.228(55184) idle
>>9816 postgres   20   0 16.7G 16.4G 16.3G S  0.0 27.8 40:19.57
>> postgres: test sss   10.20.2.228(55216) idle
>>
>
How close to the time of the problem was this recorded?  Nothing here seems
to be a problem, because almost all the memory they have resident is shared
memory.

It looks like all your clients decide to run a memory hungry query
simultaneously, consume a lot of work_mem, and cause a problem.  Then by
the time you notice the problem and start collecting information, they are
done and things are back to normal.

Cheers,

Jeff


Re: need help on memory allocation

2018-01-23 Thread Laurenz Albe
On Tue, 2018-01-23 at 19:29 +0530, Rambabu V wrote:
> Any Update, this is continuously hitting our production database.
> 
> > OOM error not recording in server level, it is only recording in our 
> > database logs.
> > 
> > below is the error message:
> > 
> > cat PostgreSQL-2018-01-23_06.csv|grep FATAL
> > 2018-01-23 06:08:01.684 
> > UTC,"postgres","rpx",68034,"[local]",5a66d141.109c2,2,"authentication",2018-01-23
> >  06:08:01 UTC,174/89066,0,FATAL,28000,"Peer authentication failed for user 
> > ""postgres""","Connection matched pg_hba.conf line 5: ""local   all 
> > all peer map=supers"""""
> > 2018-01-23 06:25:52.286 
> > UTC,"postgres","rpx",22342,"[local]",5a66d570.5746,2,"authentication",2018-01-23
> >  06:25:52 UTC,173/107122,0,FATAL,28000,"Peer authentication failed for user 
> > ""postgres""","Connection matched pg_hba.conf line 5: ""local   all 
> > all peer map=supers"""""
> > 2018-01-23 06:37:10.916 
> > UTC,"portal_etl_app","rpx",31226,"10.50.13.151:41052",5a66d816.79fa,1,"authentication",2018-01-23
> >  06:37:10 UTC,,0,FATAL,53200,"out of memory","Failed on request of size 
> > 78336."""
> > 
> > $ free -mh
> >                     total       used       free     shared    buffers     cached
> > Mem:                 58G        58G       358M        16G       3.6M        41G
> > -/+ buffers/cache:              16G        42G
> > Swap:               9.5G       687M       8.9G
> > 
> > postgresql.conf parameters:
> > =
> > work_mem = 256MB# min 64kB
> > maintenance_work_mem = 256MB# min 1MB
> > shared_buffers = 16GB   # min 128kB
> > temp_buffers = 16MB # min 800kB
> > wal_buffers = 64MB
> > effective_cache_size = 64GB
> > max_connections = 600

It would be interesting to know the output from

   sysctl vm.overcommit_memory
   sysctl vm.overcommit_ratio

Also interesting:

   sar -r 1 1

I think that max_connections = 600 is way too high.

Are you running large, complicated queries on that machine? That could
be a problem with such a high connection limit.

Is the machine dedicated to PostgreSQL?

Yours,
Laurenz Albe



Re: Bad plan

2018-01-23 Thread Matthew Bellew
In my opinion this is the Achilles heel of the postgres optimizer.  Row
estimates should never return 1, unless the estimate is provably <=1.  This
is particularly a problem with join estimates.  A dumb fix for this is to
change clamp_join_row_est() to never return a value <2.  This fixes most of
my observed poor plans.  The real fix is to track uniqueness (or provable
max rows) along with the selectivity estimate.

Here's the dumb fix.

https://github.com/labkey-matthewb/postgres/commit/b1fd99f4deffbbf3db2172ccaba51a34f18d1b1a



On Tue, Jan 23, 2018 at 7:59 AM, Laurent Martelli  wrote:

> I've have a look to the plan with pgadmin, and I think the problem is
> rather here :
>
> ->  Sort  (cost=4997.11..4997.11 rows=1 width=69) (actual
> time=27.427..28.896 rows=7359 loops=1)
>   Sort Key: amendment.id
>   Sort Method: quicksort  Memory: 1227kB
>   ->  Nested Loop  (cost=183.44..4997.10 rows=1 width=69) (actual
> time=1.115..24.616 rows=7359 loops=1)
> ->  Nested Loop  (cost=183.15..4996.59 rows=1 width=49)
> (actual time=1.107..9.091 rows=7360 loops=1)
>   ->  Index Scan using uk_3b1y5vw9gmh7u3jj8aa2uy0b9 on
> contact_partner businessprovider  (cost=0.42..8.44 rows=1 width=13)
> (actual time=0.010..0.010 rows=1 loops=1)
> Index Cond: ((business_provider_code)::text =
> 'BRZH'::text)
>   ->  Bitmap Heap Scan on contract_contract_line
> contractline  (cost=182.73..4907.58 rows=8057 width=52) (actual
> time=1.086..5.231 rows=7360 loops=1)
> Recheck Cond: (business_provider_partner =
> businessprovider.id)
> Heap Blocks: exact=3586
> ->  Bitmap Index Scan on
> contract_contract_line_business_provider_partner_idx
> (cost=0.00..180.72 rows=8057 width=0) (actual time=0.655..0.655
> rows=7360 loops=1)
>   Index Cond: (business_provider_partner =
> businessprovider.id)
> ->  Index Scan using contract_amendment_pkey on
> contract_amendment amendment  (cost=0.29..0.50 rows=1 width=28)
> (actual time=0.001..0.002 rows=1 loops=7360)
>   Index Cond: (id = contractline.amendment)
>
> The bitmap scan on contract_contract_line is good (8057 vs 7360 rows),
> and so is the index scan (1 row), but the JOIN with "contact_partner
> businessProvider" should give the 8057 rows from the bitmap scan,
> shouldn't it ?
>
>
> 2018-01-23 16:38 GMT+01:00 Laurent Martelli :
> > 2018-01-23 16:18 GMT+01:00 Justin Pryzby :
> >> On Tue, Jan 23, 2018 at 01:03:49PM +0100, Laurent Martelli wrote:
> >>
> >>> Here is the default plan :
> >>
> >> Can you resend without line breaks or paste a link to explain.depesz?
> >
> > I hope it's better like that. I've attached it too, just in case.
> >
> >>
> >> The problem appears to be here:
> >>
> >> ->  Nested Loop Left Join  (cost=32067.09..39197.85 rows=1 width=276)
> (actual time=342.725..340775.031 rows=7359 loops=1)
> >> Join Filter: (sh.share_holder_partner = partner.id)
> >> Rows Removed by Join Filter: 204915707
> >>
> >> Justin
> >
> >
> >
> > QUERY PLAN
> > ------------------------------------------------------------------------------------------
> >  Sort  (cost=39200.76..39200.76 rows=1 width=1066) (actual
> > time=341273.300..341274.244 rows=7359 loops=1)
> >Sort Key: ((array_agg(subscribed_power.subscribed_power))[1]) DESC,
> > status.name, contractline.id
> >Sort Method: quicksort  Memory: 3930kB
> >->  Nested Loop Left Join  (cost=32069.19..39200.75 rows=1
> > width=1066) (actual time=342.806..341203.151 rows=7359 loops=1)
> >  ->  Nested Loop Left Join  (cost=32069.05..39200.50 rows=1
> > width=508) (actual time=342.784..341102.848 rows=7359 loops=1)
> >->  Nested Loop Left Join  (cost=32068.77..39200.20
> > rows=1 width=500) (actual time=342.778..341070.310 rows=7359 loops=1)
> >  ->  Nested Loop Left Join
> > (cost=32068.64..39200.04 rows=1 width=507) (actual
> > time=342.776..341058.256 rows=7359 loops=1)
> >Join Filter: (cca.address = adr_contact.id)
> >Rows Removed by Join Filter: 2254
> >->  Nested Loop Left Join
> > (cost=32068.22..39199.55 rows=1 width=515) (actual
> > time=342.767..340997.058 rows=7359 loops=1)
> >  ->  Nested Loop Left Join
> > (cost=32067.79..39198.84 rows=1 width=447) (actual
> > time=342.753..340932.286 rows=7359 loops=1)
> >->  Nested Loop Left Join
> > (cost=32067.65..39198.67 rows=1 width=421) (actual
> > time=342.748..34

Re: 8.2 Autovacuum BUG ?

2018-01-23 Thread Pavan Teja
On Jan 23, 2018 9:37 PM, "Claudio Freire"  wrote:



On Tue, Jan 23, 2018 at 11:39 AM, Pavan Teja 
wrote:

> Yes so many wals are continuing to be produced.
>
> Deleting the wals after a backup of the database.
>
> Yes archiving mode is on. And the warning message in log file is
>
> " checkpoints are frequently occurring (1second apart). Consider
> increasing checkpoint_segements parameter".
>
> My doubt is previously the same are the parameters which are reflected as
> of now. Then what is the point in considering altering those values.
> Correct me if I am wrong.
>

You can use pg_xlogdump to inspect those logs and see which
relations/transactions are generating so much WAL.

Then you can hunt within your apps which code is responsible for that
traffic, or whether it in fact is autovacuum.



Hi Claudio,

Is pg_xlogdump available for postgres 9.1, as my current production is
postgres 9.1.

Yes, investigated in that area; found DMLs and also autovacuum statements
for some relations. And the DMLs are the same during this huge WAL traffic
as during normal WAL traffic.

Anyways, thanks for your timely response 😊

Regards,
Pavan


Re: 8.2 Autovacuum BUG ?

2018-01-23 Thread Claudio Freire
On Tue, Jan 23, 2018 at 11:39 AM, Pavan Teja 
wrote:

> Yes so many wals are continuing to be produced.
>
> Deleting the wals after a backup of the database.
>
> Yes archiving mode is on. And the warning message in log file is
>
> " checkpoints are frequently occurring (1second apart). Consider
> increasing checkpoint_segements parameter".
>
> My doubt is previously the same are the parameters which are reflected as
> of now. Then what is the point in considering altering those values.
> Correct me if I am wrong.
>

You can use pg_xlogdump to inspect those logs and see which
relations/transactions are generating so much WAL.

Then you can hunt within your apps which code is responsible for that
traffic, or whether it in fact is autovacuum.


RE: Inefficient full seq scan on pg_largeobject instead of index scan

2018-01-23 Thread Jean-Marc Lessard
Hi,

I have to provide a summary of how much space is used in the large objects
table, based on a group by condition.
I would expect an index only scan on the large object table, but a full seq
scan that lasts for hours is performed.

BigSql distribution
PostgreSQL 9.6.5 on x86_64-pc-mingw64, compiled by gcc.exe (Rev5, Built by 
MSYS2 project) 4.9.2, 64-bit
Win Server 2012 R2, 8GB RAM
pg server mem settings:
effective_cache_size | 6GB
maintenance_work_mem | 819MB
random_page_cost | 2
shared_buffers   | 2GB
work_mem | 32MB

Testcase 1: Here is a simplified query, timing and the explain plan:
SELECT ima.sit_cod, COUNT(*)*2048*4/3
  FROM images ima JOIN pg_largeobject ON (loid=ima.val)
GROUP BY ima.sit_cod;
Time: 343997.661 ms (about 6 min) on a small DB; it took 4 hrs on a ~1TB table

HashAggregate  (cost=2452378.86..2452379.01 rows=15 width=14)
  Group Key: ima.sit_cod
  ->  Hash Join  (cost=1460.40..2418245.74 rows=6826625 width=6)
        Hash Cond: (pg_largeobject.loid = ima.val)
        ->  Seq Scan on pg_largeobject  (cost=0.00..2322919.25 rows=6826625 width=4)
        ->  Hash  (cost=1114.62..1114.62 rows=27662 width=10)
              ->  Seq Scan on images ima  (cost=0.00..1114.62 rows=27662 width=10)


Testcase 2: A simple count(*) for a specific group (a small group) performs an
Index Only Scan and takes a few seconds.
SELECT COUNT(*)
  FROM images ima JOIN pg_largeobject ON (loid=ima.val)
WHERE sit_cod='W8213';
count
---
  8599
Time: 12.090 ms

Aggregate  (cost=11930.30..11930.31 rows=1 width=8)
  ->  Nested Loop  (cost=2.87..11918.58 rows=4689 width=0)
        ->  Bitmap Heap Scan on images ima  (cost=2.43..37.81 rows=19 width=4)
              Recheck Cond: ((sit_cod)::text = 'W8213'::text)
              ->  Bitmap Index Scan on ima_pk  (cost=0.00..2.43 rows=19 width=0)
                    Index Cond: ((sit_cod)::text = 'W8213'::text)
        ->  Index Only Scan using pg_largeobject_loid_pn_index on pg_largeobject  (cost=0.43..621.22 rows=408 width=4)
              Index Cond: (loid = ima.val)


Testcase 3: However, larger groups still perform a full seq scan
SELECT COUNT(*)
  FROM images ima JOIN pg_largeobject ON (loid=ima.val)
WHERE sit_cod='W8317';
  count
-
2209704
Time: 345638.118 ms (about 6 min)

Aggregate  (cost=2369363.01..2369363.02 rows=1 width=8)
  ->  Hash Join  (cost=1125.63..2365419.35 rows=1577463 width=0)
        Hash Cond: (pg_largeobject.loid = ima.val)
        ->  Seq Scan on pg_largeobject  (cost=0.00..2322919.25 rows=6826625 width=4)
        ->  Hash  (cost=1045.73..1045.73 rows=6392 width=4)
              ->  Bitmap Heap Scan on images ima  (cost=127.83..1045.73 rows=6392 width=4)
                    Recheck Cond: ((sit_cod)::text = 'W8317'::text)
                    ->  Bitmap Index Scan on ima_pk  (cost=0.00..126.23 rows=6392 width=0)
                          Index Cond: ((sit_cod)::text = 'W8317'::text)

Pretty sure that using the index would lead to much better perf.
Any idea of what can be done?

Jean-Marc Lessard
Administrateur de base de données / Database Administrator
Ultra Electronics Forensic Technology Inc.
T +1 514 489 4247 x4164
www.ultra-forensictechnology.com



Re: 8.2 Autovacuum BUG ?

2018-01-23 Thread pavan95
Hello,

Is there any way to check how many transactions have happened to date, from
the point the database was created and started accepting transactions?

The reason for this doubt is to find whether my database has crossed 2
million transactions or not. 
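
For example, is something like this the right thing to look at (just a guess on
my part)?

-- roughly, the transactions that were assigned an xid since initdb
-- (note: this call itself consumes an xid)
SELECT txid_current();

-- unfrozen transaction age per database
SELECT datname, age(datfrozenxid) FROM pg_database;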

Strangely, I had an interesting observation: when I tried a VACUUM FULL, it
generated 1GB of archive logs per second, and yes, it's true.


So I have a doubt whether this is related to vacuum.

Please help me cope with this.

Regards,
Pavan



--
Sent from: 
http://www.postgresql-archive.org/PostgreSQL-performance-f2050081.html



Re: Bad plan

2018-01-23 Thread Laurent Martelli
I've have a look to the plan with pgadmin, and I think the problem is
rather here :

->  Sort  (cost=4997.11..4997.11 rows=1 width=69) (actual
time=27.427..28.896 rows=7359 loops=1)
  Sort Key: amendment.id
  Sort Method: quicksort  Memory: 1227kB
  ->  Nested Loop  (cost=183.44..4997.10 rows=1 width=69) (actual
time=1.115..24.616 rows=7359 loops=1)
->  Nested Loop  (cost=183.15..4996.59 rows=1 width=49)
(actual time=1.107..9.091 rows=7360 loops=1)
  ->  Index Scan using uk_3b1y5vw9gmh7u3jj8aa2uy0b9 on
contact_partner businessprovider  (cost=0.42..8.44 rows=1 width=13)
(actual time=0.010..0.010 rows=1 loops=1)
Index Cond: ((business_provider_code)::text =
'BRZH'::text)
  ->  Bitmap Heap Scan on contract_contract_line
contractline  (cost=182.73..4907.58 rows=8057 width=52) (actual
time=1.086..5.231 rows=7360 loops=1)
Recheck Cond: (business_provider_partner =
businessprovider.id)
Heap Blocks: exact=3586
->  Bitmap Index Scan on
contract_contract_line_business_provider_partner_idx
(cost=0.00..180.72 rows=8057 width=0) (actual time=0.655..0.655
rows=7360 loops=1)
  Index Cond: (business_provider_partner =
businessprovider.id)
->  Index Scan using contract_amendment_pkey on
contract_amendment amendment  (cost=0.29..0.50 rows=1 width=28)
(actual time=0.001..0.002 rows=1 loops=7360)
  Index Cond: (id = contractline.amendment)

The bitmap scan on contract_contract_line is good (8057 vs 7360 rows),
and so is the index scan (1 row), but the JOIN with "contact_partner
businessProvider" should give the 8057 rows from the bitmap scan,
shouldn't it ?


2018-01-23 16:38 GMT+01:00 Laurent Martelli :
> 2018-01-23 16:18 GMT+01:00 Justin Pryzby :
>> On Tue, Jan 23, 2018 at 01:03:49PM +0100, Laurent Martelli wrote:
>>
>>> Here is the default plan :
>>
>> Can you resend without line breaks or paste a link to explain.depesz?
>
> I hope it's better like that. I've attached it too, just in case.
>
>>
>> The problem appears to be here:
>>
>> ->  Nested Loop Left Join  (cost=32067.09..39197.85 rows=1 width=276) 
>> (actual time=342.725..340775.031 rows=7359 loops=1)
>> Join Filter: (sh.share_holder_partner = partner.id)
>> Rows Removed by Join Filter: 204915707
>>
>> Justin
>
>
>
> QUERY PLAN
> --
>  Sort  (cost=39200.76..39200.76 rows=1 width=1066) (actual
> time=341273.300..341274.244 rows=7359 loops=1)
>Sort Key: ((array_agg(subscribed_power.subscribed_power))[1]) DESC,
> status.name, contractline.id
>Sort Method: quicksort  Memory: 3930kB
>->  Nested Loop Left Join  (cost=32069.19..39200.75 rows=1
> width=1066) (actual time=342.806..341203.151 rows=7359 loops=1)
>  ->  Nested Loop Left Join  (cost=32069.05..39200.50 rows=1
> width=508) (actual time=342.784..341102.848 rows=7359 loops=1)
>->  Nested Loop Left Join  (cost=32068.77..39200.20
> rows=1 width=500) (actual time=342.778..341070.310 rows=7359 loops=1)
>  ->  Nested Loop Left Join
> (cost=32068.64..39200.04 rows=1 width=507) (actual
> time=342.776..341058.256 rows=7359 loops=1)
>Join Filter: (cca.address = adr_contact.id)
>Rows Removed by Join Filter: 2254
>->  Nested Loop Left Join
> (cost=32068.22..39199.55 rows=1 width=515) (actual
> time=342.767..340997.058 rows=7359 loops=1)
>  ->  Nested Loop Left Join
> (cost=32067.79..39198.84 rows=1 width=447) (actual
> time=342.753..340932.286 rows=7359 loops=1)
>->  Nested Loop Left Join
> (cost=32067.65..39198.67 rows=1 width=421) (actual
> time=342.748..340896.132 rows=7359 loops=1)
>  ->  Nested Loop Left Join
>  (cost=32067.23..39198.01 rows=1 width=279) (actual
> time=342.739..340821.987 rows=7359 loops=1)
>->  Nested Loop
> Left Join  (cost=32067.09..39197.85 rows=1 width=276) (actual
> time=342.725..340775.031 rows=7359 loops=1)
>  Join Filter:
> (sh.share_holder_partner = partner.id)
>  Rows Removed
> by Join Filter: 204915707
>  ->  Nested
> Loop Left Join  (cost=28514.61..34092.46 rows=1 width=244) (actual
> time=287.323..610.192 rows=7359 loops=1)
>   

Re: Bad plan

2018-01-23 Thread Laurent Martelli
2018-01-23 16:18 GMT+01:00 Justin Pryzby :
> On Tue, Jan 23, 2018 at 01:03:49PM +0100, Laurent Martelli wrote:
>
>> Here is the default plan :
>
> Can you resend without line breaks or paste a link to explain.depesz?

I hope it's better like that. I've attached it too, just in case.

>
> The problem appears to be here:
>
> ->  Nested Loop Left Join  (cost=32067.09..39197.85 rows=1 width=276) (actual 
> time=342.725..340775.031 rows=7359 loops=1)
> Join Filter: (sh.share_holder_partner = partner.id)
> Rows Removed by Join Filter: 204915707
>
> Justin



QUERY PLAN
--
 Sort  (cost=39200.76..39200.76 rows=1 width=1066) (actual
time=341273.300..341274.244 rows=7359 loops=1)
   Sort Key: ((array_agg(subscribed_power.subscribed_power))[1]) DESC,
status.name, contractline.id
   Sort Method: quicksort  Memory: 3930kB
   ->  Nested Loop Left Join  (cost=32069.19..39200.75 rows=1
width=1066) (actual time=342.806..341203.151 rows=7359 loops=1)
 ->  Nested Loop Left Join  (cost=32069.05..39200.50 rows=1
width=508) (actual time=342.784..341102.848 rows=7359 loops=1)
   ->  Nested Loop Left Join  (cost=32068.77..39200.20
rows=1 width=500) (actual time=342.778..341070.310 rows=7359 loops=1)
 ->  Nested Loop Left Join
(cost=32068.64..39200.04 rows=1 width=507) (actual
time=342.776..341058.256 rows=7359 loops=1)
   Join Filter: (cca.address = adr_contact.id)
   Rows Removed by Join Filter: 2254
   ->  Nested Loop Left Join
(cost=32068.22..39199.55 rows=1 width=515) (actual
time=342.767..340997.058 rows=7359 loops=1)
 ->  Nested Loop Left Join
(cost=32067.79..39198.84 rows=1 width=447) (actual
time=342.753..340932.286 rows=7359 loops=1)
   ->  Nested Loop Left Join
(cost=32067.65..39198.67 rows=1 width=421) (actual
time=342.748..340896.132 rows=7359 loops=1)
 ->  Nested Loop Left Join
 (cost=32067.23..39198.01 rows=1 width=279) (actual
time=342.739..340821.987 rows=7359 loops=1)
   ->  Nested Loop
Left Join  (cost=32067.09..39197.85 rows=1 width=276) (actual
time=342.725..340775.031 rows=7359 loops=1)
 Join Filter:
(sh.share_holder_partner = partner.id)
 Rows Removed
by Join Filter: 204915707
 ->  Nested
Loop Left Join  (cost=28514.61..34092.46 rows=1 width=244) (actual
time=287.323..610.192 rows=7359 loops=1)
   ->
Nested Loop Left Join  (cost=28514.47..34092.30 rows=1 width=239)
(actual time=287.318..573.234 rows=7359 loops=1)

->  Hash Right Join  (cost=28513.48..34090.65 rows=1 width=159)
(actual time=287.293..379.564 rows=7359 loops=1)

Hash Cond: (ws.contract_line = contractline.id)

->  Seq Scan on shareholder_web_subscription ws
(cost=0.00..5378.84 rows=52884 width=24) (actual time=0.006..12.307
rows=52884 loops=1)

->  Hash  (cost=28513.47..28513.47 rows=1 width=143) (actual
time=287.243..287.243 rows=7359 loops=1)

  Buckets: 8192 (originally 1024)  Batches: 1 (originally 1)
Memory Usage: 1173kB

  ->  Nested Loop Left Join  (cost=17456.16..28513.47 rows=1
width=143) (actual time=85.005..284.689 rows=7359 loops=1)

->  Nested Loop  (cost=17456.03..28513.31 rows=1
width=148) (actual time=85.000..276.599 rows=7359 loops=1)

  ->  Nested Loop Left Join
(cost=17455.73..28512.84 rows=1 width=148) (actual
time=84.993..261.954 rows=7359 loops=1)

->  Nested Loop  (cost=17455.60..28512.67
rows=1 width=140) (actual time=84.989..253.715 rows=7359 loops=1)

  ->  Nested Loop
(cost=17455.18..28511.93 rows=1 width=93) (actual time=84.981..230.977
rows=7359 loops=1)

->  Merge Right Join
(cost=17454.89..28511.52 rows=1 width=93) (actual time=84.974..211.200
rows=7359 loops=1)

  Merge Cond:
(subscribed_power.amendment = amendment.id)

  ->  GroupAggregate
(cost=12457.78..22574.03 rows=75229 width=168) (actual
time=57.500..175.674 rows=83432 loops=1)

Group Key:
subscribed_power.amendment

->  Mer

Re: 8.2 Autovacuum BUG ?

2018-01-23 Thread Andreas Kretschmer



On 23.01.2018 at 16:20, Pavan Teja wrote:

Hi David,

If it's yes what needs to be done in order to stabilize this issue??



Don't top-post ;-)


You can't prevent the generation of WAL (apart from using unlogged
tables, but I'm sure that will not be your solution).


Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com




Re: 8.2 Autovacuum BUG ?

2018-01-23 Thread Pavan Teja
Hi David,

If yes, what needs to be done in order to stabilize this issue?

Thanks in advance.

Regards,
Pavan

On Jan 23, 2018 8:15 PM, "David G. Johnston" 
wrote:

> On Tue, Jan 23, 2018 at 7:39 AM, Pavan Teja 
> wrote:
>
>> " checkpoints are frequently occurring (1second apart). Consider
>> increasing checkpoint_segements parameter".
>>
>
> The custom on these lists is to bottom or inline post.​
>
> ​This tends to appear when someone decide to write a load script of the
> form:
>
> INSERT INTO tbl (cols) VALUES (...);
> INSERT INTO ​tbl (cols) VALUES (...);
> [repeat many, many, times]
>
> (note the lack of BEGIN/END, single transaction help mitigate it somewhat)
>
> David J.
>
>


Re: Bad plan

2018-01-23 Thread Justin Pryzby
On Tue, Jan 23, 2018 at 01:03:49PM +0100, Laurent Martelli wrote:
> Hello all,
> 
> So I have a view, for which I can select all rows in about 3s (returns ~80k
> rows), but if I add a where clause on a column, it takes +300s to return
> the ~8k lines.
> 
> From the plan, I see that it expects to return only 1 row and so choose to
> perform some nested loops. Of course, I did run "ANALYZE", but with no
> success.
> 
> I managed to speed things up with "set enable_nestloop = false;", but is
> that the only choice I have ? Should I report a bug ?


> Here is the default plan :

Can you resend without line breaks or paste a link to explain.depesz?

The problem appears to be here:

->  Nested Loop Left Join  (cost=32067.09..39197.85 rows=1 width=276) (actual 
time=342.725..340775.031 rows=7359 loops=1)
Join Filter: (sh.share_holder_partner = partner.id)
Rows Removed by Join Filter: 204915707

Justin



Re: 8.2 Autovacuum BUG ?

2018-01-23 Thread Andreas Kretschmer

Please don't top-post.


On 23.01.2018 at 15:39, Pavan Teja wrote:

Yes so many wals are continuing to be produced.


you have to identify why. Please check pg_stat_activity for
* autovacuum
* large inserts
* large updates
* large deletes

Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com




Re: 8.2 Autovacuum BUG ?

2018-01-23 Thread David G. Johnston
On Tue, Jan 23, 2018 at 7:39 AM, Pavan Teja 
wrote:

> " checkpoints are frequently occurring (1second apart). Consider
> increasing checkpoint_segements parameter".
>

The custom on these lists is to bottom-post or inline-post.

This tends to appear when someone decides to write a load script of the
form:

INSERT INTO tbl (cols) VALUES (...);
INSERT INTO tbl (cols) VALUES (...);
[repeat many, many, times]

(note the lack of BEGIN/END; a single transaction would help mitigate it somewhat)
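
i.e. the mitigated form would look like this (tbl and cols are placeholders,
as above):

BEGIN;
INSERT INTO tbl (cols) VALUES (...);
INSERT INTO tbl (cols) VALUES (...);
-- ... many more rows ...
COMMIT;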

David J.


Re: 8.2 Autovacuum BUG ?

2018-01-23 Thread Pavan Teja
Yes, so many WALs are continuing to be produced.

We are deleting the WALs after a backup of the database.

Yes, archiving mode is on. And the warning message in the log file is

" checkpoints are frequently occurring (1 second apart). Consider increasing
the checkpoint_segments parameter".

My doubt is: these are the same parameter values as before. Then what is the
point in considering altering those values? Correct me if I am wrong.

Regards,
Pavan

On Jan 23, 2018 7:47 PM, "Andreas Kretschmer" 
wrote:



On 23.01.2018 at 12:51, pavan95 wrote:

> Hi Andreas,
>
> Yes I'm facing problem because of this huge WAL(archive log) generation. As
> it is seriously consuming a lot of disk space almost close to 50GB per day
> even if the DML's don't have that impact in this WAL generation.
>
> Previously the archive_log size is nearly 2 to 3 GB a day. Now with the
> same
> set of DML's how is it being generated to 50GB is my burning doubt.
>

Will so many wals continue to be produced?




> I just wanted to know how to stabilize this issue, as checking and deleting
> the archive logs on hourly basis is not a good idea.
>
Don't delete wal's!



Finally, I'm looking how to reduce this back to normal. Thanks in Advance.
>

have you set archive_mode to on and defined an archive_command? Wal-files
will be reused after 2 checkpoints.
Is there something in the logs?


Regards, Andreas
-- 
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com


Re: 8.2 Autovacuum BUG ?

2018-01-23 Thread Andreas Kretschmer



On 23.01.2018 at 12:51, pavan95 wrote:

Hi Andreas,

Yes I'm facing problem because of this huge WAL(archive log) generation. As
it is seriously consuming a lot of disk space almost close to 50GB per day
even if the DML's don't have that impact in this WAL generation.

Previously the archive_log size is nearly 2 to 3 GB a day. Now with the same
set of DML's how is it being generated to 50GB is my burning doubt.


Will so many wals continue to be produced?




I just wanted to know how to stabilize this issue, as checking and deleting
the archive logs on hourly basis is not a good idea.

Don't delete wal's!



Finally, I'm looking how to reduce this back to normal. Thanks in Advance.


have you set archive_mode to on and defined an archive_command? 
Wal-files will be reused after 2 checkpoints.

Is there something in the logs?


Regards, Andreas
--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com



Re: need help on memory allocation

2018-01-23 Thread Andreas Kretschmer



On 23.01.2018 at 14:59, Rambabu V wrote:

             total       used       free     shared buffers     cached
Mem:           58G        58G       358M        16G  3.6M        41G
-/+ buffers/cache:        16G        42G
Swap:         9.5G       687M       8.9G

*postgresql.conf parameters:*
*=*
work_mem = 256MB# min 64kB
maintenance_work_mem = 256MB# min 1MB
shared_buffers = 16GB# min 128kB
temp_buffers = 16MB# min 800kB
wal_buffers = 64MB
effective_cache_size = 64GB
max_connections = 600



how many active concurrent connections do you have? With work_mem = 
256MB and 600 active connections and only 1 allocation of work_mem per 
connection you will need more than 150GB of RAM.
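
The arithmetic is simply 600 connections * 256 MB = 153,600 MB, roughly 150 GB
in the worst case, and a single query can even allocate work_mem more than once
(one allocation per sort/hash step). To see how many connections are actually
active, something like this helps (column names assume 9.2 or later; on older
releases look at current_query instead of state):

SELECT state, count(*)
  FROM pg_stat_activity
 GROUP BY state;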



In other words: you should lower work_mem and/or max_connections.


Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com




Re: need help on memory allocation

2018-01-23 Thread Rambabu V
Hi Laurenz,

Any update? This is continuously hitting our production database.

Regards,
Rambabu Vakada,
PostgreSQL DBA.


On Tue, Jan 23, 2018 at 6:12 PM, Rambabu V  wrote:

> Hi Laurenz,
>
> The OOM error is not recorded at the server level; it is only recorded in
> our database logs.
>
> below is the error message:
>
> *cat PostgreSQL-2018-01-23_06.csv|grep FATAL*
> 2018-01-23 06:08:01.684 UTC,"postgres","rpx",68034,"[
> local]",5a66d141.109c2,2,"authentication",2018-01-23 06:08:01
> UTC,174/89066,0,FATAL,28000,"Peer authentication failed for user
> ""postgres""","Connection matched pg_hba.conf line 5: ""local all all peer
> map=supers"""""
> 2018-01-23 06:25:52.286 UTC,"postgres","rpx",22342,"[
> local]",5a66d570.5746,2,"authentication",2018-01-23 06:25:52
> UTC,173/107122,0,FATAL,28000,"Peer authentication failed for user
> ""postgres""","Connection matched pg_hba.conf line 5: ""local all all peer
> map=supers"""""
> 2018-01-23 06:37:10.916 UTC,"portal_etl_app","rpx",31226,"
> 10.50.13.151:41052",5a66d816.79fa,1,"authentication",2018-01-23 06:37:10
> UTC,,0,FATAL,53200,"out of memory","Failed on request of size
> 78336."""
>
> *below log from /var/log messages:*
>
> root@prp:~# cat /var/log/syslog*|grep 'out of memory'
> root@prp:~# cat /var/log/syslog*|grep error
> root@prp:~# cat /var/log/syslog*|grep warning
> root@prp:~#
>
> *$ free -mh*
>                     total       used       free     shared    buffers     cached
> Mem:                 58G        58G       358M        16G       3.6M        41G
> -/+ buffers/cache:              16G        42G
> Swap:               9.5G       687M       8.9G
>
> *postgresql.conf parameters:*
> *=*
> work_mem = 256MB # min 64kB
> maintenance_work_mem = 256MB # min 1MB
> shared_buffers = 16GB # min 128kB
> temp_buffers = 16MB # min 800kB
> wal_buffers = 64MB
> effective_cache_size = 64GB
> max_connections = 600
>
> *cat /etc/sysctl.conf|grep kernel*
> #kernel.domainname = example.com
> #kernel.printk = 3 4 1 3
> kernel.shmmax = 38654705664
> kernel.shmall = 8388608
>
> *ps -ef|grep postgres|grep idle|wc -l*
> 171
>
> *ps -ef|grep postgres|wc -l*
> 206
>
> *ps -ef|wc -l*
> 589
>
> *Database Size: 1.5 TB*
>
> *below is the htop output:*
> *---*
>   
> Mem[|||17045/60382MB]
>Tasks: 250, 7 thr; 8 running
>   Swp[||
>   686/9765MB] Load average: 8.63 9.34 8.62
>
>   Uptime: 52 days, 07:07:07
>
> PID USER  PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
>  109063 postgres   20   0 16.7G 16.4G 16.3G S  0.0 27.8 39:55.61 postgres:
> test sss 10.20.2.228(55174) idle
>   24910 postgres   20   0 16.7G 16.4G 16.3G S  0.0 27.8 27:45.35 postgres:
> testl sss 10.20.2.228(55236) idle
>  115539 postgres   20   0 16.7G 16.4G 16.3G S  0.0 27.8 28:22.89 postgres:
> test sss 10.20.2.228(55184) idle
>9816 postgres   20   0 16.7G 16.4G 16.3G S  0.0 27.8 40:19.57 postgres:
> test sss   10.20.2.228(55216) idle
>
>
>
> Please help us on this; how can we overcome this OOM issue?
>
>
>
> Regards,
>
> Rambabu Vakada,
> PostgreSQL DBA,
> +91 9849137684.
>
>
>
> On Fri, Jan 19, 2018 at 3:37 PM, Laurenz Albe 
> wrote:
>
>> Rambabu V wrote:
>> > we are seeing idle sessions consuming memory in our database, could you
>> please help me
>> > how much memory an idle session can use max and how can we find how
>> much work_mem
>> > consuming for single process.
>> >
>> > we are getting out of memory error,for this i'm asking above questions.
>>
>> Are you sure that you see the private memory of the process and not the
>> shared memory common to all processes?
>>
>> An "idle" connection should not hav a lot of private memory.
>>
>> If you get OOM on the server, the log entry with the memory context dump
>> might be useful information.
>>
>> Yours,
>> Laurenz Albe
>>
>>
>


Bad plan

2018-01-23 Thread Laurent Martelli
Hello all,

So I have a view, for which I can select all rows in about 3s (returns ~80k
rows), but if I add a where clause on a column, it takes +300s to return
the ~8k lines.

From the plan, I see that it expects to return only 1 row and so chooses to
perform some nested loops. Of course, I did run "ANALYZE", but with no
success.

I managed to speed things up with "set enable_nestloop = false;", but is
that the only choice I have ? Should I report a bug ?

The view is this :

CREATE VIEW export_contract_par_region AS
SELECT
contractLine.id as id_contrat,
partner.id as id_partner,
partner.name,
title.name AS contact_civ,
mc.name AS contact_nom,
mc.first_name AS contact_prenom,
(CASE WHEN is_physique(partner.person_category_select) THEN
coalesce(mc.email,mc.email_pro) ELSE coalesce(mc.email_pro,mc.email) END)
AS contact_email,
(CASE WHEN is_physique(partner.person_category_select)
   THEN concat_ws('/',mc.fixed_phone1,mc.mobile_phone_perso)
   ELSE concat_ws('/',mc.fixed_phone_pro,mc.mobile_phone_pro)
END) AS contact_phones,
adr_contact.addressl2 AS contact_addressl2,
adr_contact.addressl3 AS contact_addressl3,
adr_contact.addressl4num AS contact_addressl4num,
adr_contact.addressl4street AS contact_addressl4street,
adr_contact.addressl5 AS contact_addressl5,
adr_contact.addressl6zip AS contact_addressl6zip,
adr_contact.addressl6city AS contact_addressl6city,
coalesce(npai.moved_ok,false) AS npai,
coalesce(mc.address,mc.address_pro) IS NULL AS sans_adresse,
amendment.user_sub_segment_select as type_select,
UserSegment.code as user_segment,
contractLine.real_start_date AS date_mise_en_service,
to_char(contractLine.real_start_date,'/MM') AS datemes_mm,
(ws.created_on::date) AS date_souscription,
status.name AS statut,
power.first AS subscribed_power,
a.addressl2 AS pdl_addressl2,
a.addressl3 AS pdl_addressl3,
a.addressl4num AS pdl_addressl4num,
a.addressl4street AS pdl_addressl4street,
a.addressl5 AS pdl_addressl5,
a.addressl6zip AS pdl_adressel6zip,
a.addressl6city AS pdl_adressel6city,
a.dept AS pdl_code_dept,
a.dept_name AS pdl_nom_dept,
a.region_code AS pdl_code_region,
a.region AS pdl_nom_region,
businessProvider.business_provider_code AS codeCoop,
soc.soc AS company_societaire,
co.code AS connu_enercoop,
ClientNature.name as segment_client,
to_char(ws.created_on,'') as annee_souscription,
to_char(ws.created_on,'MM') as mois_souscription,
mesProductSubFamily.name as type_mes
FROM contract_contract_line contractLine
JOIN contract_contract contract on contractLine.contract = contract.id
JOIN contact_partner partner on partner.id =
contract.main_client_partner
JOIN contact_partner businessProvider on businessProvider.id =
contractLine.business_provider_partner
LEFT JOIN contact_client_nature ClientNature on ClientNature.id =
partner.client_nature
JOIN contract_amendment amendment on contractLine.amendment =
amendment.id
JOIN territory_mpt mpt on contractLine.mpt = mpt.id
LEFT JOIN subscribed_power power ON power.amendment = amendment.id
LEFT JOIN contract_user_segment UserSegment ON UserSegment.id =
amendment.user_segment
LEFT JOIN contact_company company on company.id = contract.company
LEFT JOIN address a on mpt.address = a.id
LEFT JOIN administration_status status ON status.id =
contractLine.status
LEFT JOIN shareholder_summary soc ON soc.partner = partner.id
LEFT JOIN shareholder_web_subscription ws ON ws.contract_line =
contractLine.id
LEFT JOIN crm_origin co ON co.id = ws.how_meet_enercoop
LEFT JOIN contact_contact mc ON partner.main_contact = mc.id
LEFT JOIN contact_title title ON mc.title = title.id
LEFT JOIN contact_address adr_contact ON adr_contact.id = (CASE WHEN
is_physique(partner.person_category_select) THEN
coalesce(mc.address,mc.address_pro) ELSE
coalesce(mc.address_pro,mc.address) END)
LEFT JOIN contact_contact_address cca ON cca.contact = mc.id AND
cca.address = adr_contact.id
LEFT JOIN contact_contact_address_status npai ON
cca.contact_address_status = npai.id
LEFT JOIN crm_crm_request mesRequest ON
mesRequest.original_contract_line = contractLine.id
LEFT JOIN sale_product_sub_family mesProductSubFamily ON
mesProductSubFamily.id = mesRequest.product_sub_family AND
mesProductSubFamily.new_contract_ok is true
ORDER BY subscribed_power DESC, statut,id_contrat;

And the query is : select * from export_contract_par_region where codecoop
= 'BRZH';

Here is the default plan :


QUERY
PLAN

-

Re: 8.2 Autovacuum BUG ?

2018-01-23 Thread pavan95
Hi Andreas,

Yes, I'm facing a problem because of this huge WAL (archive log) generation, as
it is seriously consuming a lot of disk space, almost close to 50GB per day,
even though the DMLs don't have that much impact on WAL generation.

Previously the archive_log size was nearly 2 to 3 GB a day. Now, with the same
set of DMLs, how it has grown to 50GB is my burning doubt.

I just wanted to know how to stabilize this issue, as checking and deleting
the archive logs on an hourly basis is not a good idea.

Finally, I'm looking at how to reduce this back to normal. Thanks in advance.

Regards,
Pavan  



--
Sent from: 
http://www.postgresql-archive.org/PostgreSQL-performance-f2050081.html