Hi,

I can't figure out an I/O bottleneck on a database server. I have a
script that makes write-intensive changes to a Firebird database of
about 5GB, but based on other investigations I'm inclined to think the
problem isn't in the db/script/indexes etc., but rather somewhere in
the filesystem (ext4).

Relevant (I hope) information below. The system has no custom tuning
or parameters (nobarrier, noatime, writeback, commit, vm.dirty_*) -
everything is at defaults on a Debian squeeze amd64 box. There is
plenty of memory, the process doesn't need swap, the CPU is mostly
idle and isn't waiting on I/O either, the run is local so there's no
waiting on the network, ionice has no effect, and so on.

The only suspicious thing seems to be reported by strace: the pwrite()
calls take very long compared to the write() ones, and I can't tell
whether that is acceptable or not. I can say, however, that pwrite
always writes 8K at a time (the page size of the db), while the
write() calls are for much smaller amounts (128 bytes); even so, that
doesn't explain the time difference: 8K is 64 times larger than 128
bytes, yet a pwrite call takes roughly 1400 times longer than a write
call (see the strace output below).
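
One hypothesis worth ruling out first: if Firebird's Forced Writes
setting is on, the database file is typically opened with O_SYNC, so
every pwrite() has to reach the platters before returning, while the
128-byte write() calls (presumably to some log/trace file) just land
in the page cache. The open flags of the two descriptors from the
strace output below (fd 8 for pwrite, fd 12 for write) can be checked
directly; the flags field is octal, and if I read the headers right,
on this 2.6.32 kernel the O_SYNC bit is 010000:

# cat /proc/6146/fdinfo/8
# cat /proc/6146/fdinfo/12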

Any suggestion for further testing is welcome - it's also possible
that I've misread some iostat/vmstat parameter along the way.
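
For reference, the buffered-vs-synchronous gap can be reproduced
outside Firebird with dd (the path below is just a scratch file;
oflag=dsync makes dd sync after every 8K block, mimicking an O_SYNC
writer, while the first run only dirties the page cache):

# dd if=/dev/zero of=/tmp/ddtest bs=8k count=1000
# dd if=/dev/zero of=/tmp/ddtest bs=8k count=1000 oflag=dsync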

Thanks,
Silviu

2x Seagate Cheetah 15K.5 73GB 15K 3.0Gbps Serial SCSI / SAS Hard Drive
ST373454SS, in software RAID1

# uname -a
Linux sab09 2.6.32-5-amd64 #1 SMP Sun May 6 04:00:17 UTC 2012 x86_64
GNU/Linux

# lsb_release -a
No LSB modules are available.
Distributor ID:    Debian
Description:    Debian GNU/Linux 6.0.5 (squeeze)
Release:    6.0.5
Codename:    squeeze


# mdadm --query --detail /dev/md0
/dev/md0:
        Version : 1.2
  Creation Time : Wed Jul 27 19:29:10 2011
     Raid Level : raid1
     Array Size : 66404280 (63.33 GiB 68.00 GB)
  Used Dev Size : 66404280 (63.33 GiB 68.00 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Sat Apr 13 01:21:38 2013
          State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           Name : sab09:0  (local to host sab09)
           UUID : 1ecf91cd:38ebe28d:e9691655:6656a7d6
         Events : 763

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1

# tune2fs -l /dev/md0
tune2fs 1.41.12 (17-May-2010)
Filesystem volume name:   <none>
Last mounted on:          /
Filesystem UUID:          18efff63-743e-4411-b826-574bad5d51dc
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index
filetype needs_recovery extent flex_bg sparse_super large_file huge_file
uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              4153344
Block count:              16601070
Reserved block count:     830053
Free blocks:              12170173
Free inodes:              4122661
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      1020
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Flex block group size:    16
Filesystem created:       Wed Jul 27 19:31:08 2011
Last mount time:          Mon Jul 23 20:48:17 2012
Last write time:          Tue Feb 14 18:22:01 2012
Mount count:              9
Maximum mount count:      31
Last checked:             Tue Feb 14 18:22:01 2012
Check interval:           15552000 (6 months)
Next check after:         Sun Aug 12 19:22:01 2012
Lifetime writes:          3306 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:              256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      a934d2d2-5f44-4850-8bb9-94ce3f425c45
Journal backup:           inode blocks

# strace -c -p 6146
Process 6146 attached - interrupt to quit
^CProcess 6146 detached

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.67    0.064003          26      2454           pwrite
  0.14    0.000093           0      8707           pread
  0.07    0.000048           0      7782           read
  0.07    0.000045           0      6618           write
  0.05    0.000029           0     10021           lseek
  0.00    0.000000           0         6           fstat
  0.00    0.000000           0       752           rt_sigaction
  0.00    0.000000           0       846           rt_sigprocmask
------ ----------- ----------- --------- --------- ----------------
100.00    0.064218                 37186           total
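
One caveat with the summary above: if I read the strace man page
right, the -c time column accounts CPU time spent in each call, not
wall-clock time, so the ~26 usecs/call for pwrite probably hides the
time spent blocked on the disk. Per-call wall-clock durations can be
captured with -T, which appends the elapsed time of each call in
angle brackets:

# strace -T -e trace=pwrite,write -p 6146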

# strace -e pwrite -p 6146
pwrite(8, "\5\000909\1\0\0\0\0\0\0\0\0\0\0\303d\0\0\237\0#\1\0\0\0\0\0\0\0\0"..., 8192, 5040791552) = 8192
[...]

# strace -e write -p 6146
write(12, "\0\376\0\0\0\0\0\0007\1\0\0\0\0\0\0\r\0Ianuarie 2006\0"..., 128) = 128
[...]
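
To see what fds 8 and 12 actually point to (I'd expect 8 to be the
5GB database file and 12 something like a log), the fd symlinks can
be listed:

# ls -l /proc/6146/fd/8 /proc/6146/fd/12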

# cat /proc/6146/io
rchar: 498882100537
wchar: 140103265784
syscr: 138456608
syscw: 82931241
read_bytes: 557056
write_bytes: 131345989632
cancelled_write_bytes: 0
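
(Note how read_bytes is only ~544KB while rchar is ~499GB -
essentially all reads are served from the page cache - whereas
write_bytes is ~131GB that actually went to the block devices, so the
workload is effectively pure writes as far as the disks are
concerned.)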

# free -m
             total       used       free     shared    buffers     cached
Mem:         12046      11584        462          0          6      11286
-/+ buffers/cache:        291      11754
Swap:        10311          4      10307

# iostat 1 /dev/md0 (averages over the course of the run)
[...]
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.49    0.00    0.16    1.96    0.00   97.39

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
md0             355.00         0.00      1888.00          0       1888
[...]

# iostat -x 1 /dev/md0
[...]
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.65    0.00    0.26    1.56    0.00   97.53

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
md0               0.00     0.00    0.00  365.00     0.00  1952.00     5.35     0.00    0.00   0.00   0.00
[...]
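
As far as I can tell, the md layer on this kernel doesn't account
await/svctm/%util at all, which is why they show up as zero above;
the underlying members should tell the real story (a %util near 100
with only ~1MB/s written would confirm the array is saturated by
small synchronous writes):

# iostat -x 1 /dev/sda /dev/sdb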

# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0   4848 472976   6492 11557044    0    0     2     5    0    0  1  0 99  0
 0  0   4848 473092   6492 11557044    0    0     0   888  997 1192  0  0 98  2
 0  0   4848 473092   6492 11557044    0    0     0  1032 1064 1375  0  0 97  3
 1  0   4848 473092   6500 11557036    0    0     0   876  965 1140  0  0 97  2
 0  1   4848 473092   6500 11557044    0    0     0  1032 1085 1387  0  0 98  2
 0  0   4848 473092   6500 11557044    0    0     0   912 1015 1216  1  0 97  3
 0  1   4848 473092   6500 11557044    0    0     0   944 1000 1267  0  0 98  2
 0  0   4848 473092   6500 11557044    0    0     0  1040 1083 1429  0  0 98  2
 0  0   4848 473092   6508 11557040    0    0     0   868  965 1144  0  0 98  2
 0  0   4848 473092   6508 11557044    0    0     0   968 1032 1322  0  0 97  2
^C
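
A back-of-the-envelope check on the numbers above: ~355 synchronous
8K writes per second works out to ~2.8ms per write, which is roughly
what a 15K-RPM disk can deliver per forced write (about 2ms average
rotational latency, plus seek time), so the drives may simply be at
their sync-write ceiling rather than ext4 misbehaving:

# echo 'scale=2; 1000/355' | bc
2.81
# echo '60000/15000/2' | bc
2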