Re: blending fast and temp space volumes

2018-02-22 Thread Claudio Freire
On Wed, Feb 21, 2018 at 5:09 PM, Peter Geoghegan <p...@bowt.ie> wrote:
> On Wed, Feb 21, 2018 at 12:07 PM, Claudio Freire <klaussfre...@gmail.com> 
> wrote:
>> On Wed, Feb 21, 2018 at 4:50 PM, Peter Geoghegan <p...@bowt.ie> wrote:
>>> On Wed, Feb 21, 2018 at 7:53 AM, Rick Otten <rottenwindf...@gmail.com> 
>>> wrote:
>>>> side note:  The disadvantage of local SSD is that it won't survive "hitting
>>>> the virtual power button" on an instance, nor can it migrate automatically
>>>> to other hardware.  (We have to hit the power button to add memory/cpu to
>>>> the system, and sometimes the power button might get hit by accident.)  This
>>>> is OK for temp space.  I never have my database come up automatically on
>>>> boot, and I have scripted the entire setup of the temp space volume and data
>>>> structures.  I can run that script before starting the database.  I've done
>>>> some tests and it seems to work great.  I don't mind rolling back any
>>>> transaction that might be in play during a power failure.
>>>
>>> It sounds like you're treating a temp_tablespaces tablespace as
>>> ephemeral, which IIRC can have problems that an ephemeral
>>> stats_temp_directory does not have.
>>
>> For instance?
>>
>> I've been doing that for years without issue. If you're careful to
>> restore the skeleton directory structure at server boot up, I haven't
>> had any issues.
>
> Then you clearly know what I mean already. That's not documented as
> either required or safe anywhere.

Ah, ok.

But the OP did mention he was doing that already. So it should be safe.



Re: blending fast and temp space volumes

2018-02-21 Thread Claudio Freire
On Wed, Feb 21, 2018 at 4:50 PM, Peter Geoghegan  wrote:
> On Wed, Feb 21, 2018 at 7:53 AM, Rick Otten  wrote:
>> side note:  The disadvantage of local SSD is that it won't survive "hitting
>> the virtual power button" on an instance, nor can it migrate automatically
>> to other hardware.  (We have to hit the power button to add memory/cpu to
>> the system, and sometimes the power button might get hit by accident.)  This
>> is OK for temp space.  I never have my database come up automatically on
>> boot, and I have scripted the entire setup of the temp space volume and data
>> structures.  I can run that script before starting the database.   I've done
>> some tests and it seems to work great.  I don't mind rolling back any
>> transaction that might be in play during a power failure.
>
> It sounds like you're treating a temp_tablespaces tablespace as
> ephemeral, which IIRC can have problems that an ephemeral
> stats_temp_directory does not have.

For instance?

I've been doing that for years without issue. If you're careful to
restore the skeleton directory structure at server boot up, I haven't
had any issues.
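
For illustration, a minimal boot-time sketch; the device, the mount point,
and the PG_* catalog-version directory name are all placeholders that must
match what CREATE TABLESPACE originally created on your system:

mkfs.ext4 /dev/nvme1n1                 # ephemeral local SSD, empty at boot
mkdir -p /mnt/pgtemp
mount /dev/nvme1n1 /mnt/pgtemp
# Recreate the skeleton the tablespace expects, e.g. PG_9.6_201608131;
# check an existing tablespace directory for the exact name.
mkdir -p /mnt/pgtemp/PG_9.6_201608131
chown -R postgres:postgres /mnt/pgtemp
chmod 700 /mnt/pgtemp
systemctl start postgresql             # only start the server afterwards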



On Wed, Feb 21, 2018 at 4:22 PM, Craig James  wrote:
>
> On Wed, Feb 21, 2018 at 7:53 AM, Rick Otten wrote:
>> I was wondering if there anyone had ideas for how to make that possible.
>> I don't think I want to add the SAN disk to the same LVM volume group as the
>> local disk, but maybe that would work, since I'm already building it with a
>> script anyhow ... Is LVM smart enough to optimize radically different disk
>> performances?
>
>
> Couldn't you configure both devices into a single 6T device via RAID0 using
> md?

That would probably perform about as slowly as the slowest disk: RAID0
stripes every sizeable request across both devices, so throughput is
gated by the slower one.



Re: effective_io_concurrency on EBS/gp2

2018-02-05 Thread Claudio Freire
On Mon, Feb 5, 2018 at 8:26 AM, Vitaliy Garnashevich wrote:
>> I mean that the issue is indeed affected by the order of rows in the
>> table. Random heap access patterns result in sparse bitmap heap scans,
>> whereas less random heap access patterns result in denser bitmap heap
>> scans. Dense scans have large portions of contiguous fetches, a
>> pattern that is quite adversely affected by the current prefetch
>> mechanism in Linux.
>>
>
> Thanks for your input.
>
> How can I test a sparse bitmap scan? Can you think of any SQL commands which
> would generate data and run such scans?
>
> Would a bitmap scan over expression index ((aid%1000)=0) do a sparse bitmap
> scan?

If you have a minimally correlated index (i.e., one whose order is
essentially random relative to the heap), and you have N tuples per page,
you need to select less (much less) than 1/Nth of the table for the
bitmap heap scan to be sparse.
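
As a sketch of how to test it (table and column names are invented here),
something like this should produce a sparse bitmap heap scan, because the
matching rows land on essentially random pages:

CREATE TABLE sparse_test AS
SELECT i AS id,
       (random() * 1000000)::int AS k,
       repeat('x', 100) AS pad      -- keeps a few dozen tuples per page
FROM generate_series(1, 1000000) i;
CREATE INDEX sparse_test_k_idx ON sparse_test (k);
ANALYZE sparse_test;
-- Roughly 0.05% of rows match, far below 1/Nth of the table, so nearly
-- every page fetched holds a single matching tuple:
EXPLAIN (ANALYZE, BUFFERS)
SELECT sum(id) FROM sparse_test WHERE k BETWEEN 0 AND 500;

If the planner picks a plain index scan instead, SET enable_indexscan =
off will force the bitmap path.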



Re: effective_io_concurrency on EBS/gp2

2018-02-04 Thread Claudio Freire
On Sat, Feb 3, 2018 at 8:05 PM, Vitaliy Garnashevich wrote:
> Looks like this behavior is not caused by, and does not depend on:
> - variable performance in the cloud
> - order of rows in the table
> - whether the disk is EBS (backed by SSD or HDD), or ordinary SSD
> - kernel version
>
> Does this mean that the default setting for eic on Linux is just inadequate
> for how the modern kernels behave? Or am I missing something else in the
> tests?
>
> Regards,
> Vitaliy

I have analyzed this issue quite extensively in the past, and I can
say with high confidence that your analysis on point 2 is most
likely wrong.

Now, I don't have all the information to make that a categorical
assertion; you might have a point, but I believe you're
misinterpreting the data.

I mean that the issue is indeed affected by the order of rows in the
table. Random heap access patterns result in sparse bitmap heap scans,
whereas less random heap access patterns result in denser bitmap heap
scans. Dense scans have large portions of contiguous fetches, a
pattern that is quite adversely affected by the current prefetch
mechanism in Linux.

This analysis does point to the fact that I should probably revisit
this issue. There's a rather simple workaround: postgres should just
avoid issuing prefetch orders for sequential block patterns, since
those are already handled much better by the kernel itself.



Re: effective_io_concurrency on EBS/gp2

2018-02-01 Thread Claudio Freire
On Wed, Jan 31, 2018 at 11:21 PM, hzzhangjiazhi wrote:
> Hi,
>
> I think this parameter will be useful when the storage uses RAID
> striping; otherwise, turning this parameter up is meaningless when
> there is only one device.

Not at all. Especially on EBS, where keeping a relatively full queue
is necessary to get max throughput out of the drive.

Problem is, if you're scanning a highly correlated index, the
mechanism is counterproductive. I had worked on some POC patches for
correcting that, I guess I could work something out, but it's
low-priority for me. Especially since it's actually a kernel "bug" (or
shortcoming), that could be fixed in the kernel rather than worked
around by postgres.
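
For reference, a sketch of where the knob lives (the value and the
tablespace name below are made up, not recommendations; benchmark on your
own volumes). Since 9.6 it can also be set per tablespace:

ALTER SYSTEM SET effective_io_concurrency = 32;  -- global; reload applies it
SELECT pg_reload_conf();
ALTER TABLESPACE fast_ebs SET (effective_io_concurrency = 32);  -- 9.6+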



Re: effective_io_concurrency on EBS/gp2

2018-01-31 Thread Claudio Freire
On Wed, Jan 31, 2018 at 1:57 PM, Vitaliy Garnashevich
 wrote:
> More tests:
>
> io1, 100 GB:
>
> effective_io_concurrency=0
>  Execution time: 40333.626 ms
> effective_io_concurrency=1
>  Execution time: 163840.500 ms

In my experience playing with prefetch, e_i_c>0 interferes with kernel
read-ahead. What you've got there would make sense if what postgres
thinks will be random I/O ends up being sequential. With e_i_c=0, the
kernel will optimize the hell out of it, because it's a predictable
pattern. But with e_i_c=1, the kernel's optimization gets disabled but
postgres isn't reading much ahead, so you get the worst possible case.
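
A sketch of reproducing that comparison in one session, assuming the
pgbench schema discussed earlier in this thread plus an expression index
to get a bitmap heap scan (flush the OS cache between runs, or the second
query will just be served from memory):

CREATE INDEX ON pgbench_accounts ((aid % 1000));
SET effective_io_concurrency = 0;   -- kernel read-ahead only
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM pgbench_accounts WHERE aid % 1000 = 0;
SET effective_io_concurrency = 1;   -- minimal prefetch, read-ahead defeated
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM pgbench_accounts WHERE aid % 1000 = 0;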



Re: 8.2 Autovacuum BUG ?

2018-01-30 Thread Claudio Freire
On Tue, Jan 30, 2018 at 10:55 AM, pavan95  wrote:
> Hello all,
>
> Will a sudden restart (stop/start) of a postgres database generate this
> huge WAL?

It shouldn't.



Re: 8.2 Autovacuum BUG ?

2018-01-24 Thread Claudio Freire
On Wed, Jan 24, 2018 at 8:50 AM, pavan95 wrote:

> Hello all,
>
> One more interesting observation made by me.
>
> I have ran the below query(s) on production:
>
> SELECT
> relname,
> age(relfrozenxid) as xid_age,
> pg_size_pretty(pg_table_size(oid)) as table_size
> FROM pg_class
> WHERE relkind = 'r' and pg_table_size(oid) > 1073741824
> ORDER BY age(relfrozenxid) DESC ;
>  relname  | xid_age | table_size
> ----------+---------+------------
>  hxx      | 7798262 | 3245 MB
>  hrx      | 7797554 | 4917 MB
>  irxx     | 7796771 | 2841 MB
>  hr_      | 7744262 | 4778 MB
>  reimbxxx | 6767712 | 1110 MB
>
> show autovacuum_freeze_max_age;
>  autovacuum_freeze_max_age
> ---------------------------
>  200000000
> (1 row)
>

You seem to be rather far from the freeze_max_age. Unless you're consuming
txids at a very high rate, I don't think that's your problem.


Re: 8.2 Autovacuum BUG ?

2018-01-23 Thread Claudio Freire
On Wed, Jan 24, 2018 at 3:54 AM, pavan95 wrote:

> Hi Claudio,
>
> We didn't configure any replication to our production server. Which strace
> are you talking about?
>

This one: https://linux.die.net/man/1/strace

You can attach it to a process (assuming you have the necessary
permissions) and it will report all the syscalls the process does. That
does slow down the process though.

Then lsof ( https://linux.die.net/man/8/lsof ) can be used to map file
descriptor numbers to file paths. You have to do it as soon as you read the
output, because files get closed and file descriptors reused. So it's
better to have a script that directly reads from /proc/pid/fd or fdinfo,
but that takes some programming.

It is nontrivial, but sometimes it's the only tool in your belt. You may
want to try something else first though.
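
A sketch of that workflow (the PID is a placeholder for whichever backend
you're watching):

strace -p 12345 -f -e trace=open,read,write,fsync -o /tmp/pg.strace
lsof -p 12345                 # map fd numbers from the trace to paths
ls -l /proc/12345/fd          # or read the kernel's fd table directly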


> We made a keen observation that only at the 9th minute and the 39th
> minute of the hour are the so-called archive logs generated, even when
> nobody is connecting from the application (outside business hours).


Well, if you don't know what happens at those times (and only at those
times), it's not that useful.

Since you don't know what is causing this for certain, the first thing you
have to do is ascertain that. Try increasing logging as much as you can,
especially around those times, and see what runs then that doesn't at
other times. You can monitor autovacuum processes in pg_stat_activity as
well, so make sure you check that too, since autovacuum only logs once
it's done.
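
For example, autovacuum workers are visible while they run, so polling
around those minutes shows what they're working on (column names here
assume 9.2+; older releases use procpid and current_query instead):

SELECT pid, query_start, query
FROM pg_stat_activity
WHERE query LIKE 'autovacuum:%';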

You do know autovacuum is running at those times; you have to check
whether it is also running when WAL isn't being generated, and whether
autovacuum is vacuuming the same tables over and over, or what. Your
earlier mails show autoanalyze runs, not vacuum. Those shouldn't cause so
much WAL, but if they run very often and you have lots of stats, then
maybe.

You can also try pg_stat_statements:
https://www.postgresql.org/docs/9.1/static/pgstatstatements.html

Again, concentrate on the differential - what happens at those times, that
doesn't at other times.

Another idea would be to check for freeze runs in autovacuum. Ie, what's
described here: https://wiki.postgresql.org/wiki/VacuumHeadaches#FREEZE

There's a nice blog post with some queries to help you with that here:
http://www.databasesoup.com/2012/09/freezing-your-tuples-off-part-1.html
(and its continuation here:
http://www.databasesoup.com/2012/10/freezing-your-tuples-off-part-2.html ).
I'm not saying you should tune those parameters; what you were showing was
autoanalyze activity, not vacuum freeze, but you should check whether you
need to anyway.
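
As a quick first check, here is a sketch of the kind of query those posts
walk through; the databases closest to autovacuum_freeze_max_age are the
first to get whole-table freeze vacuums:

SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;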


Re: Batch insert heavily affecting query performance.

2018-01-09 Thread Claudio Freire
On Wed, Dec 27, 2017 at 2:10 PM, Mike Sofen  wrote:

> In my experience, that 77ms will stay quite constant even if your db grew
> to > 1TB.  Postgres IS amazing.  BTW, for a db, you should always have
> provisioned IOPS or else your performance can vary wildly, since the SSDs
> are shared.
>
>
>
> Re Lambda:  another team is working on a new web app using Lambda calls
> and they were also experiencing horrific performance, just like yours (2
> seconds per call).  They discovered it was the Lambda connection/spin-up
> time causing the problem.  They solved it by keeping several Lambdas
> “hot”, for an instant connection…solved the problem, the last I heard.
> Google for that topic, you’ll find solutions.
>

You should try to implement an internal connection pool in your lambda.

Lambda functions are reused. You have no guarantees as to how long these
processes will live, but they will live for more than one request. So if
you keep a persistent connection in your lambda code, the first invocation
may be slow, but further invocations will be fast. Lambda will try to batch
several calls at once. In fact, you can usually configure batching in the
event source to try to maximize this effect.

In my experience, your lambda will most probably be network-bound.
Increase the lambda's memory allocation to get a bigger chunk of the
available network bandwidth (why they decided to call that "memory",
nobody will ever be able to tell).


Re: Need Help on wal_compression

2018-01-09 Thread Claudio Freire
On Tue, Jan 9, 2018 at 3:53 AM, Rambabu V  wrote:

> Hi Team,
>
> Daily, 4000 archive files are generated, and these are occupying a lot
> of space. We are trying to compress WAL files using the wal_compression
> parameter, but we are not seeing any change in the WAL file count.
> Could you please help us with this?
>

That's very little information to go on.

You'll probably want to inspect WAL record stats before and after enabling
wal_compression to see whether it makes sense to do so. Take a look at
pg_xlogdump --stats

For example:

$ pg_xlogdump --stats -p /path/to/pg_xlog 00010002C36400F0 00010002C36400FA
Type                      N      (%)  Record size      (%)  FPI size      (%)  Combined size      (%)
----                      -      ---  -----------      ---  --------      ---  -------------      ---
XLOG                      0 (  0.00)            0 (  0.00)         0 (  0.00)              0 (  0.00)
Transaction              11 (  0.00)          352 (  0.00)         0 (  0.00)            352 (  0.00)
Storage                   0 (  0.00)            0 (  0.00)         0 (  0.00)              0 (  0.00)
CLOG                      0 (  0.00)            0 (  0.00)         0 (  0.00)              0 (  0.00)
Database                  0 (  0.00)            0 (  0.00)         0 (  0.00)              0 (  0.00)
Tablespace                0 (  0.00)            0 (  0.00)         0 (  0.00)              0 (  0.00)
MultiXact                 4 (  0.00)          208 (  0.00)         0 (  0.00)            208 (  0.00)
RelMap                    0 (  0.00)            0 (  0.00)         0 (  0.00)              0 (  0.00)
Standby                   2 (  0.00)          116 (  0.00)         0 (  0.00)            116 (  0.00)
Heap2                  2504 (  0.18)        78468 (  0.20)   1385576 (  3.55)        1464044 (  1.89)
Heap                 667619 ( 48.23)     19432159 ( 50.47)  28641357 ( 73.35)       48073516 ( 61.99)
Btree                712093 ( 51.45)     18643846 ( 48.42)   9021270 ( 23.10)       27665116 ( 35.67)
Hash                      0 (  0.00)            0 (  0.00)         0 (  0.00)              0 (  0.00)
Gin                       0 (  0.00)            0 (  0.00)         0 (  0.00)              0 (  0.00)
Gist                      0 (  0.00)            0 (  0.00)         0 (  0.00)              0 (  0.00)
Sequence               1918 (  0.14)       349076 (  0.91)         0 (  0.00)         349076 (  0.45)
SPGist                    0 (  0.00)            0 (  0.00)         0 (  0.00)              0 (  0.00)
BRIN                      0 (  0.00)            0 (  0.00)         0 (  0.00)              0 (  0.00)
CommitTs                  0 (  0.00)            0 (  0.00)         0 (  0.00)              0 (  0.00)
ReplicationOrigin         0 (  0.00)            0 (  0.00)         0 (  0.00)              0 (  0.00)
                   --------             --------            --------              --------
Total               1384151             38504225 [49.65%]   39048203 [50.35%]       77552428 [100%]


That shows that 50% of the WAL volume there is full-page images, and that
is with compression already enabled. WAL compression only helps full-page
writes, so if you don't have a large volume of FPWs, or they don't
compress well, you won't benefit much.
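
If FPWs do dominate and compress well, enabling it is cheap to try, since
wal_compression can be changed with a reload (a sketch, 9.5+):

ALTER SYSTEM SET wal_compression = on;
SELECT pg_reload_conf();
-- then re-run pg_xlogdump --stats on newly generated segments and compare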