Hi,

On 24.01.2017 at 18:41, Andres Freund wrote:
Hi,

On 2017-01-24 18:37:14 +0100, Tobias Oberstein wrote:
assume that it'd get more than swamped with doing actual work, and with
buffering the frequently accessed stuff in memory.


What I am trying to say is: the syscall overhead of doing lseek/read/write
instead of pread/pwrite does become visible and hurts at a certain point.
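
To make the comparison concrete, here is a minimal sketch (Python on a Unix-like system, not PostgreSQL's actual code) of the two access patterns. The helper names and the 8192-byte block size are assumptions chosen for illustration only. The first variant issues two syscalls per block, the second issues one and leaves the file offset alone.

```python
import os
import tempfile

BLOCK = 8192  # assumed block size, for illustration only


def read_block_seek(fd, blockno):
    """Two syscalls per block: move the file offset, then read from it."""
    os.lseek(fd, blockno * BLOCK, os.SEEK_SET)
    return os.read(fd, BLOCK)


def read_block_pread(fd, blockno):
    """One syscall per block: read at an explicit offset.

    The descriptor's file offset is not changed by pread.
    """
    return os.pread(fd, BLOCK, blockno * BLOCK)


def demo():
    """Write four blocks, then fetch block 2 both ways and compare."""
    with tempfile.TemporaryFile() as f:
        for i in range(4):
            f.write(bytes([i]) * BLOCK)
        f.flush()
        fd = f.fileno()
        via_seek = read_block_seek(fd, 2)
        via_pread = read_block_pread(fd, 2)
        return via_seek == via_pread, via_seek[0]
```

Both helpers return the same bytes; the only difference is the number of kernel entries per block, which is the overhead under discussion here.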

Sure - but the question is whether it's measurable when you do actual
work.

The syscall overhead is visible in production too .. I watched PG using perf
live, and lseeks regularly appear at the top of the list.

Could you show such perf profiles? That'll help us.

oberstet@bvr-sql18:~$ psql -U postgres -d adr
psql (9.5.4)
Type "help" for help.

adr=# select * from svc_sqlbalancer.f_perf_syscalls();
NOTICE: starting Linux perf syscalls sampling - be patient, this can take some time ..
NOTICE: sudo /usr/bin/perf stat -e "syscalls:sys_enter_*" -x ";" -a sleep 30 2>&1
 pid |                syscall                |   cnt   | cnt_per_sec
-----+---------------------------------------+---------+-------------
     | syscalls:sys_enter_lseek              | 4091584 |      136386
     | syscalls:sys_enter_newfstat           | 2054988 |       68500
     | syscalls:sys_enter_read               |  767990 |       25600
     | syscalls:sys_enter_close              |  503803 |       16793
     | syscalls:sys_enter_newstat            |  434080 |       14469
     | syscalls:sys_enter_open               |  380382 |       12679
     | syscalls:sys_enter_mmap               |  301491 |       10050
     | syscalls:sys_enter_munmap             |  182313 |        6077
     | syscalls:sys_enter_getdents           |  162443 |        5415
     | syscalls:sys_enter_rt_sigaction       |  158947 |        5298
     | syscalls:sys_enter_openat             |   85325 |        2844
     | syscalls:sys_enter_readlink           |   77439 |        2581
     | syscalls:sys_enter_rt_sigprocmask     |   60929 |        2031
     | syscalls:sys_enter_mprotect           |   58372 |        1946
     | syscalls:sys_enter_futex              |   49726 |        1658
     | syscalls:sys_enter_access             |   40845 |        1362
     | syscalls:sys_enter_write              |   39513 |        1317
     | syscalls:sys_enter_brk                |   33656 |        1122
     | syscalls:sys_enter_epoll_wait         |   23776 |         793
     | syscalls:sys_enter_ioctl              |   19764 |         659
     | syscalls:sys_enter_wait4              |   17371 |         579
     | syscalls:sys_enter_newlstat           |   13008 |         434
     | syscalls:sys_enter_exit_group         |   10135 |         338
     | syscalls:sys_enter_recvfrom           |    8595 |         286
     | syscalls:sys_enter_sendto             |    8448 |         282
     | syscalls:sys_enter_poll               |    7200 |         240
     | syscalls:sys_enter_lgetxattr          |    6477 |         216
     | syscalls:sys_enter_dup2               |    5790 |         193

<snip>

Note: there isn't a lot of load currently (this is from production).
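
For readers who want to reproduce the cnt_per_sec column: the figures above are consistent with dividing each count by the 30-second sampling window and rounding. The sketch below is hypothetical and is not the body of f_perf_syscalls(). It assumes that `perf stat -x ";"` emits one record per event with the counter value in the first field and the event name in the third; the function name is made up for illustration.

```python
def parse_perf_stat(lines, seconds):
    """Turn `perf stat -x ";"` records into (event, count, per_second) rows.

    Assumed record layout: count;unit;event;...  (fields after the third
    are ignored). Records without a numeric count are skipped.
    """
    rows = []
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) < 3 or not fields[0].isdigit():
            continue  # e.g. "<not counted>" records, blank lines, other text
        count = int(fields[0])
        event = fields[2]
        rows.append((event, count, round(count / seconds)))
    # Busiest syscalls first, as in the table above.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows
```

Fed the lseek count from the table, `parse_perf_stat(["4091584;;syscalls:sys_enter_lseek"], 30)` gives a rate of 136386 per second, matching the first row.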

I'm much less against this change than Tom, but doing artificial syscall
microbenchmarks seems unlikely to make a big case for using it in

This isn't a syscall benchmark, but FIO.

There's not really a difference between those, when you use fio to
benchmark seek vs pseek.

Sorry, I don't understand what you are talking about.

postgres, where it's part of vastly more expensive operations (like
actually reading data afterwards, exclusive locks, ...).

PG is very CPU hungry, yes.

Indeed - working on it ;)


But there are quite a few system-related effects
too .. e.g. we've managed to get down the system load with huge pages (big
improvement).

Glad to hear it.

With 3TB RAM, huge pages are absolutely essential (otherwise, the system bogs down in TLB etc. overhead).

I'd welcome seeing profiles of that - I'm working quite heavily on
speeding up analytics workloads for pg.

Here:

https://github.com/oberstet/scratchbox/raw/master/cruncher/adr_stats/ADR-PostgreSQL-READ-Statistics.pdf

https://github.com/oberstet/scratchbox/tree/master/cruncher/adr_stats

Thanks, unfortunately those appear to mostly have io / cache hit ratio
related stats?

Yep, this was just to prove that we are really running a DWH workload at scale ;)

Cheers,
/Tobias


Greetings,

Andres Freund




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
