Re: [ceph-users] Luminous cephfs maybe not so stable as expected?
I decided to restart osd.0; after that the cephfs load on all osd nodes 
dropped. I still have this on the first server:

[@ ~]# cat /sys/kernel/debug/ceph/0f1701f5-453a-4a3b-928d-f652a2bbbcb0.client3574310/osdc
REQUESTS 0 homeless 0
LINGER REQUESTS
BACKOFFS

[@ ~]# cat /sys/kernel/debug/ceph/0f1701f5-453a-4a3b-928d-f652a2bbbcb0.client3584224/osdc
REQUESTS 2 homeless 0
317841  osd0  20.d6ec44c1  20.1   [0,28,5]/0  [0,28,5]/0  e65040  10001b44a70.  0x40001c  102023  read
317853  osd0  20.5956d31b  20.1b  [0,5,10]/0  [0,5,10]/0  e65040  10001ad8962.  0x40001c  40731   read
LINGER REQUESTS
BACKOFFS

And dmesg -T keeps giving me these (again with the wrong timestamps):

[Thu Jul 11 11:23:21 2019] libceph: mon1 192.168.10.112:6789 session established
[Thu Jul 11 11:23:21 2019] libceph: mon1 192.168.10.112:6789 io error
[Thu Jul 11 11:23:21 2019] libceph: mon1 192.168.10.112:6789 session lost, hunting for new mon
[Thu Jul 11 11:23:21 2019] libceph: mon0 192.168.10.111:6789 session established
[Thu Jul 11 11:23:21 2019] libceph: mon0 192.168.10.111:6789 io error
[Thu Jul 11 11:23:21 2019] libceph: mon0 192.168.10.111:6789 session lost, hunting for new mon
[Thu Jul 11 11:23:21 2019] libceph: mon2 192.168.10.113:6789 session established
[Thu Jul 11 11:23:21 2019] libceph: mon2 192.168.10.113:6789 io error
[Thu Jul 11 11:23:21 2019] libceph: mon2 192.168.10.113:6789 session lost, hunting for new mon
[Thu Jul 11 11:23:21 2019] libceph: mon1 192.168.10.112:6789 session established
[Thu Jul 11 11:23:21 2019] libceph: mon1 192.168.10.112:6789 io error
[Thu Jul 11 11:23:21 2019] libceph: mon1 192.168.10.112:6789 session lost, hunting for new mon
[Thu Jul 11 11:23:21 2019] libceph: mon0 192.168.10.111:6789 session established
[Thu Jul 11 11:23:21 2019] libceph: mon0 192.168.10.111:6789 io error
[Thu Jul 11 11:23:21 2019] libceph: mon0 192.168.10.111:6789 session lost, hunting for new mon

What to do now? Restarting the monitor did not help.
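In case it helps anyone looking at the same thing: this is roughly how I 
dump the state of every kernel client on a node in one go. Just a sketch 
over the debugfs paths shown above; monc shows the mon session, osdc and 
mdsc the in-flight osd/mds requests.

# sketch: for every kernel cephfs client on this node, print the mon
# session state and any in-flight osd/mds requests (run as root,
# requires debugfs to be mounted, which it normally is)
for d in /sys/kernel/debug/ceph/*.client*; do
    echo "== $d"
    for f in monc osdc mdsc; do
        echo "-- $f"
        cat "$d/$f"
    done
done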
-----Original Message-----
Subject: Re: [ceph-users] Luminous cephfs maybe not so stable as expected?

Forgot to add these:

[@ ~]# cat /sys/kernel/debug/ceph/0f1701f5-453a-4a3b-928d-f652a2bbbcb0.client3574310/osdc
REQUESTS 0 homeless 0
LINGER REQUESTS
BACKOFFS

[@ ~]# cat /sys/kernel/debug/ceph/0f1701f5-453a-4a3b-928d-f652a2bbbcb0.client3584224/osdc
REQUESTS 38 homeless 0
317841  osd0   20.d6ec44c1  20.1   [0,28,5]/0    [0,28,5]/0    e65040  10001b44a70.  0x40001c  101139  read
317853  osd0   20.5956d31b  20.1b  [0,5,10]/0    [0,5,10]/0    e65040  10001ad8962.  0x40001c  39847   read
317835  osd3   20.ede889de  20.1e  [3,12,27]/3   [3,12,27]/3   e65040  10001ad80f6.  0x40001c  87758   read
317838  osd3   20.7b730a4e  20.e   [3,31,9]/3    [3,31,9]/3    e65040  10001ad89d8.  0x40001c  83444   read
317844  osd3   20.feead84c  20.c   [3,13,18]/3   [3,13,18]/3   e65040  10001ad8733.  0x40001c  77267   read
317852  osd3   20.bd2658e   20.e   [3,31,9]/3    [3,31,9]/3    e65040  10001ad7e00.  0x40001c  39331   read
317830  osd4   20.922e6d04  20.4   [4,16,27]/4   [4,16,27]/4   e65040  10001ad80f2.  0x40001c  86326   read
317837  osd4   20.fe93d4ab  20.2b  [4,14,25]/4   [4,14,25]/4   e65040  10001ad80fb.  0x40001c  78951   read
317839  osd4   20.d7af926b  20.2b  [4,14,25]/4   [4,14,25]/4   e65040  10001ad80ee.  0x40001c  77556   read
317849  osd5   20.5fcb95c5  20.5   [5,18,29]/5   [5,18,29]/5   e65040  10001ad7f75.  0x40001c  61147   read
317857  osd5   20.28764e9a  20.1a  [5,7,28]/5    [5,7,28]/5    e65040  10001ad8a10.  0x40001c  30369   read
317859  osd5   20.7bb79985  20.5   [5,18,29]/5   [5,18,29]/5   e65040  10001ad7fe8.  0x40001c  27942   read
317836  osd8   20.e7bf5cf4  20.34  [8,5,10]/8    [8,5,10]/8    e65040  10001ad7d79.  0x40001c  133699  read
317842  osd8   20.abbb9df4  20.34  [8,5,10]/8    [8,5,10]/8    e65040  10001d5903f.  0x40001c  125308  read
317850  osd8   20.ecd0034   20.34  [8,5,10]/8    [8,5,10]/8    e65040  10001ad89b2.  0x40001c  68348   read
317854  osd8   20.cef50134  20.34  [8,5,10]/8    [8,5,10]/8    e65040  10001ad8728.  0x40001c  57431   read
317861  osd8   20.3e859bb4  20.34  [8,5,10]/8    [8,5,10]/8    e65040  10001ad8108.  0x40001c  50642   read
317847  osd9   20.fc9e9f43  20.3   [9,29,17]/9   [9,29,17]/9   e65040  10001ad8101.  0x40001c  88464   read
317848  osd9   20.d32b6ac3  20.3   [9,29,17]/9   [9,29,17]/9   e65040  10001ad8100.  0x40001c  85929   read
317862  osd11  20           20.11  [25,4,0]/25   [25,4,0]/25   e65040  10001ad8023.  0x40001c  147589  read
317856  osd26  20.febb382a  20.2a  [26,0,18]/26  [26,0,18]/26  e65040  10001ad8145.  0x40001c  65169   read
317845  osd27  20.5b433067  20.27  [27,7,14]/27  [27,7,14]/27  e65040  10001ad8965.  0x40001c  124461  read
LINGER REQUESTS
BACKOFFS
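Side note, in case anyone wants to map such a stuck request back to a 
file: the object name in an osdc line starts with the file's inode 
number in hex. Something like this should find the path (a sketch only; 
10001ad8962 is taken from the dump above, and /home/mail-archive is just 
one of my two cephfs mounts, adjust for yours):

# the inode number is the hex part of the osdc object name, before the dot
ino_hex=10001ad8962
ino_dec=$(printf '%d' "0x$ino_hex")
# search one cephfs mount for that inode (can be slow on large trees)
find /home/mail-archive -xdev -inum "$ino_dec" 2>/dev/null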
[ceph-users] Luminous cephfs maybe not so stable as expected?
Maybe this requires some attention. I have a default CentOS 7 (maybe 
not the most recent kernel though) ceph luminous setup, i.e. no unusual 
kernels. This is the 2nd or 3rd time that a vm has gone into a high 
load (151) and stopped its services. I have two vm's, both mounting the 
same 2 cephfs 'shares'. After the last incident I unmounted the shares 
on the 2nd server. (We are migrating to a new environment, so this 2nd 
server is not doing anything.) Last time I thought it could be related 
to my work on the switch from the stupid allocator to the bitmap 
allocator. Anyway, yesterday I thought let's mount the 2 shares on the 
2nd server again and see what happens. And this morning the high load 
was back. AFAIK the 2nd server is only doing a cron job on the cephfs 
mounts, creating snapshots.

1) I still have increased load on the osd nodes, coming from cephfs. 
How can I see which client is causing this? I don't seem to get this 
from 'ceph daemon mds.c session ls', however 'ceph osd pool stats | 
grep client -B 1' indicates it is cephfs.

2) ceph osd blacklist ls
No blacklist entries

3) The first server keeps generating messages like these, while there 
is no issue with connectivity:

[Thu Jul 11 10:41:22 2019] libceph: mon0 192.168.10.111:6789 session lost, hunting for new mon
[Thu Jul 11 10:41:22 2019] libceph: mon1 192.168.10.112:6789 session established
[Thu Jul 11 10:41:22 2019] libceph: mon1 192.168.10.112:6789 io error
[Thu Jul 11 10:41:22 2019] libceph: mon1 192.168.10.112:6789 session lost, hunting for new mon
[Thu Jul 11 10:41:22 2019] libceph: mon0 192.168.10.111:6789 session established
[Thu Jul 11 10:41:22 2019] libceph: mon0 192.168.10.111:6789 io error
[Thu Jul 11 10:41:22 2019] libceph: mon0 192.168.10.111:6789 session lost, hunting for new mon
[Thu Jul 11 10:41:22 2019] libceph: mon2 192.168.10.113:6789 session established
[Thu Jul 11 10:41:22 2019] libceph: mon2 192.168.10.113:6789 io error
[Thu Jul 11 10:41:22 2019] libceph: mon2 192.168.10.113:6789 session lost, hunting for new mon
[Thu Jul 11 10:41:22 2019] libceph: mon0 192.168.10.111:6789 session established
[Thu Jul 11 10:41:22 2019] libceph: mon0 192.168.10.111:6789 io error
[Thu Jul 11 10:41:22 2019] libceph: mon0 192.168.10.111:6789 session lost, hunting for new mon
[Thu Jul 11 10:41:22 2019] libceph: mon2 192.168.10.113:6789 session established
[Thu Jul 11 10:41:22 2019] libceph: mon2 192.168.10.113:6789 io error
[Thu Jul 11 10:41:22 2019] libceph: mon2 192.168.10.113:6789 session lost, hunting for new mon
[Thu Jul 11 10:41:22 2019] libceph: osd25 192.168.10.114:6804 io error
[Thu Jul 11 10:41:22 2019] libceph: mon1 192.168.10.112:6789 session established
[Thu Jul 11 10:41:22 2019] libceph: mon1 192.168.10.112:6789 io error
[Thu Jul 11 10:41:22 2019] libceph: mon1 192.168.10.112:6789 session lost, hunting for new mon
[Thu Jul 11 10:41:22 2019] libceph: mon2 192.168.10.113:6789 session established
[Thu Jul 11 10:41:22 2019] libceph: mon2 192.168.10.113:6789 io error
[Thu Jul 11 10:41:22 2019] libceph: mon2 192.168.10.113:6789 session lost, hunting for new mon
[Thu Jul 11 10:41:22 2019] libceph: osd18 192.168.10.112:6802 io error
[Thu Jul 11 10:41:22 2019] libceph: mon1 192.168.10.112:6789 session established
[Thu Jul 11 10:41:22 2019] libceph: mon1 192.168.10.112:6789 io error
[Thu Jul 11 10:41:22 2019] libceph: mon1 192.168.10.112:6789 session lost, hunting for new mon
[Thu Jul 11 10:41:22 2019] libceph: mon2 192.168.10.113:6789 session established
[Thu Jul 11 10:41:22 2019] libceph: mon2 192.168.10.113:6789 io error
[Thu Jul 11 10:41:22 2019] libceph: mon2 192.168.10.113:6789 session lost, hunting for new mon
[Thu Jul 11 10:41:22 2019] libceph: osd22 192.168.10.111:6811 io error
[Thu Jul 11 10:41:22 2019] libceph: mon1 192.168.10.112:6789 session established
[Thu Jul 11 10:41:22 2019] libceph: mon1 192.168.10.112:6789 io error
[Thu Jul 11 10:41:22 2019] libceph: mon1 192.168.10.112:6789 session lost, hunting for new mon
[Thu Jul 11 10:41:22 2019] libceph: mon0 192.168.10.111:6789 session established
PS: dmesg -T gives me strange times; as you can see these are in the 
future. The os time is 2 minutes behind (and is the correct one, ntpd 
synced).

[@ ]# uptime
 10:39:17 up 50 days, 13:31,  2 users,  load average: 3.60, 3.02, 2.57

4) Unmounting the filesystem on the first server fails.

5) Evicting the cephfs sessions of the first server does not change the 
cephfs load on the osd nodes.

6) Unmounting all cephfs clients still leaves me with cephfs activity 
on the data pool and on the osd nodes.

[@c03 ~]# ceph daemon mds.c session ls
[]

7) On the first server:

[@ ~]# ps -auxf | grep D
USER       PID %CPU %MEM    VSZ  RSS TTY   STAT START  TIME COMMAND
root      6716  3.0  0.0      0    0 ?     D    10:18  0:59  \_ [kworker/0:2]
root     20039  0.0  0.0 123520 1212 pts/0 D+   10:28  0:00  |  \_ umount /home/mail-archive/

[@ ~]# cat /proc/6716/stack
[] __wait_on_freeing_inode+0xb0/0xf0
[] find_inode+0x99/0xc0
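To get the stacks of all blocked tasks at once, rather than reading 
/proc/<pid>/stack per process, something like this should work. A 
sketch only; the mount point and the client id 3584224 are just the 
ones from the output above.

# enable sysrq if needed, then ask the kernel to log every blocked
# (D-state) task with its stack; the result appears in dmesg
sysctl -w kernel.sysrq=1
echo w > /proc/sysrq-trigger
dmesg | tail -n 100

# if the mount stays wedged: lazy-unmount it, then evict the dead
# session on the mds side (id as listed by 'session ls')
umount -l /home/mail-archive
ceph tell mds.c client evict id=3584224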