Hi Dan,

That makes sense - the time between the blacklist and the magic comeback was around 1 hour - thanks for the explanation.  Is this a safe default?  At eviction, the MDS takes all caps away from the client, so if it comes back in an hour, doesn't it then write to files that it perhaps shouldn't have access to any more?
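For context, this is roughly what I can check from the admin side while a client is blacklisted - a rough sketch, assuming an admin keyring on the node and that mds.0 is our single active rank:

     # blacklist entries, with the time at which each one expires
     ceph osd blacklist ls

     # sessions (and their cap counts) the MDS currently knows about -
     # the evicted client should no longer appear here
     ceph tell mds.0 session ls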

Then there is the other strange thing ceph-fuse was doing for that hour (I had increased the objecter log level to 20).

Here is the eviction:
2020-11-09 15:56:32.762 7fffda7fc700 -1 client.111995680 I was blacklisted at osd epoch 1717894
2020-11-09 15:56:32.762 7fffda7fc700 -1 client.111995680 remove_session_caps still has dirty|flushing caps on 0x100673a2613.head(faked_ino=0 ref=5 ll_ref=1 cap_refs={4=0,1024=0,4096=0,8192=0} open={3=1} mode=100640 size=106/4194304 nlink=1 btime=0.000000 mtime=2020-11-09 14:34:28.139916 ctime=2020-11-09 14:34:28.139916 caps=- dirty_caps=Fw objectset[0x100673a2613 ts 0/0 objects 1 dirty_or_tx 0] parents=0x10067375a7c.head["pwaf-00680.ene"] 0x7fffd034b4d0)
2020-11-09 15:56:32.762 7fffda7fc700 -1 client.111995680 remove_session_caps still has dirty|flushing caps on 0x100673a2614.head(faked_ino=0 ref=5 ll_ref=1 cap_refs={4=0,1024=0,4096=0,8192=0} open={3=1} mode=100640 size=106/4194304 nlink=1 btime=0.000000 mtime=2020-11-09 14:34:28.145199 ctime=2020-11-09 14:34:28.145199 caps=- dirty_caps=Fw objectset[0x100673a2614 ts 0/0 objects 1 dirty_or_tx 0] parents=0x10067375a7c.head["pwaf-00685.ene"] 0x7fffd034bc20)
2020-11-09 15:56:32.762 7fffda7fc700 -1 client.111995680 remove_session_caps still has dirty|flushing caps on 0x100673a2615.head(faked_ino=0 ref=5 ll_ref=1 cap_refs={4=0,1024=0,4096=0,8192=0} open={3=1} mode=100640 size=106/4194304 nlink=1 btime=0.000000 mtime=2020-11-09 14:34:28.150306 ctime=2020-11-09 14:34:28.150306 caps=- dirty_caps=Fw objectset[0x100673a2615 ts 0/0 objects 1 dirty_or_tx 0] parents=0x10067375a7c.head["pwaf-00682.ene"] 0x7fffd034c1d0)
... and a lot more of these ...

then the following types of messages repeat:

2020-11-09 16:51:10.236 7fffdaffd700  7 objectcacher bh_write_scattered bh[ 0x7fff60a0ae40 2026998~4 0x7fffac4d0460 (4) v 131065 dirty firstbyte=32] waiters = {}
2020-11-09 16:51:10.236 7fffdaffd700  7 objectcacher bh_write_scattered bh[ 0x7ffe6c405f80 2051562~328804 0x7fffac4d0460 (328804) v 131065 dirty firstbyte=-42] waiters = {}
2020-11-09 16:51:10.236 7fffdaffd700  7 objectcacher bh_write_scattered bh[ 0x7fff9b14d850 2380366~4 0x7fffac4d0460 (4) v 131065 dirty firstbyte=32] waiters = {}
2020-11-09 16:51:10.236 7fffdaffd700  7 objectcacher bh_write_scattered bh[ 0x7fff9bc966f0 2380370~8176 0x7fffac4d0460 (8176) v 131065 dirty firstbyte=96] waiters = {}
... about 200 or so of these ...

followed by

2020-11-09 16:51:10.896 7fffdb7fe700 10 objectcacher bh_write_commit marking dirty again due to error bh[ 0x7fff60a0ae40 2026998~4 0x7fffac4d0460 (4) v 131183 dirty firstbyte=32] waiters = {} r = -108 (108) Cannot send after transport endpoint shutdown
2020-11-09 16:51:10.896 7fffdb7fe700 10 objectcacher bh_write_commit marking dirty again due to error bh[ 0x7ffe6c405f80 2051562~328804 0x7fffac4d0460 (328804) v 131183 dirty firstbyte=-42] waiters = {} r = -108 (108) Cannot send after transport endpoint shutdown
2020-11-09 16:51:10.896 7fffdb7fe700 10 objectcacher bh_write_commit marking dirty again due to error bh[ 0x7fff9b14d850 2380366~4 0x7fffac4d0460 (4) v 131183 dirty firstbyte=32] waiters = {} r = -108 (108) Cannot send after transport endpoint shutdown
2020-11-09 16:51:10.896 7fffdb7fe700 10 objectcacher bh_write_commit marking dirty again due to error bh[ 0x7fff9bc966f0 2380370~8176 0x7fffac4d0460 (8176) v 131183 dirty firstbyte=96] waiters = {} r = -108 (108) Cannot send after transport endpoint shutdown
... about 200 or so of these ...

then again:

2020-11-09 16:51:11.260 7fffdaffd700  7 objectcacher bh_write_scattered bh[ 0x7fff60a0ae40 2026998~4 0x7fffac4d0460 (4) v 131183 dirty firstbyte=32] waiters = {}
2020-11-09 16:51:11.260 7fffdaffd700  7 objectcacher bh_write_scattered bh[ 0x7ffe6c405f80 2051562~328804 0x7fffac4d0460 (328804) v 131183 dirty firstbyte=-42] waiters = {}
2020-11-09 16:51:11.260 7fffdaffd700  7 objectcacher bh_write_scattered bh[ 0x7fff9b14d850 2380366~4 0x7fffac4d0460 (4) v 131183 dirty firstbyte=32] waiters = {}
2020-11-09 16:51:11.260 7fffdaffd700  7 objectcacher bh_write_scattered bh[ 0x7fff9bc966f0 2380370~8176 0x7fffac4d0460 (8176) v 131183 dirty firstbyte=96] waiters = {}

rejected again:

2020-11-09 16:51:11.772 7fffdb7fe700 10 objectcacher bh_write_commit marking dirty again due to error bh[ 0x7fff60a0ae40 2026998~4 0x7fffac4d0460 (4) v 131349 dirty firstbyte=32] waiters = {} r = -108 (108) Cannot send after transport endpoint shutdown
2020-11-09 16:51:11.772 7fffdb7fe700 10 objectcacher bh_write_commit marking dirty again due to error bh[ 0x7ffe6c405f80 2051562~328804 0x7fffac4d0460 (328804) v 131349 dirty firstbyte=-42] waiters = {} r = -108 (108) Cannot send after transport endpoint shutdown
2020-11-09 16:51:11.772 7fffdb7fe700 10 objectcacher bh_write_commit marking dirty again due to error bh[ 0x7fff9b14d850 2380366~4 0x7fffac4d0460 (4) v 131349 dirty firstbyte=32] waiters = {} r = -108 (108) Cannot send after transport endpoint shutdown
2020-11-09 16:51:11.772 7fffdb7fe700 10 objectcacher bh_write_commit marking dirty again due to error bh[ 0x7fff9bc966f0 2380370~8176 0x7fffac4d0460 (8176) v 131349 dirty firstbyte=96] waiters = {} r = -108 (108) Cannot send after transport endpoint shutdown

This repeated for the whole hour, using about 1 Gbps of bandwidth from just this one node.  Then the blacklist was lifted, and presumably the writes went through:

2020-11-09 16:56:36.840 7fffdb7fe700 10 objectcacher bh_write_commit clean bh[ 0x7fff60a0ae40 2026998~4 0x7fffac4d0460 (4) v 160426 clean firstbyte=32] waiters = {}
2020-11-09 16:56:36.840 7fffdb7fe700 10 objectcacher.object(100673a2670.00000008/head) try_merge_bh bh[ 0x7fff60a0ae40 2026998~4 0x7fffac4d0460 (4) v 160426 clean firstbyte=32] waiters = {}
2020-11-09 16:56:36.840 7fffdb7fe700 10 objectcacher.object(100673a2670.00000008/head) merge_left bh[ 0x7fffac4d05b0 0~2026998 0x7fffac4d0460 (2026998) v 6079 clean firstbyte=-122] waiters = {} + bh[ 0x7fff60a0ae40 2026998~4 0x7fffac4d0460 (4) v 160426 clean firstbyte=32] waiters = {}

This looks unsafe to me - after the eviction the client no longer held any caps on these files, yet once the blacklist entry expired it flushed the stale cached writes anyway?

Andras


On 11/9/20 7:21 PM, Dan van der Ster wrote:
Hi Andras,

The osd blocklist entries expire after 1hr by default:

     Option("mon_osd_blacklist_default_expire", Option::TYPE_FLOAT,
Option::LEVEL_ADVANCED)
     .set_default(1_hr)
     .add_service("mon")
     .set_description("Duration in seconds that blacklist entries for clients "
                      "remain in the OSD map"),

(Check mon/OSDMonitor.cc for the implementation)
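If you want a different window, you should be able to query and override
it at runtime via the centralized config in Nautilus, something like the
following (86400 here is only an example value, not a recommendation):

     ceph config get mon mon_osd_blacklist_default_expire
     ceph config set mon mon_osd_blacklist_default_expire 86400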

Cheers, Dan

On Mon, Nov 9, 2020 at 11:59 PM Andras Pataki
<apat...@flatironinstitute.org> wrote:
We had some network problems (high packet drops) to some cephfs client
nodes that run ceph-fuse (14.2.13) against a Nautilus cluster (on
version 14.2.8).  As a result a couple of clients got evicted (as one
would expect).  What was really odd is that the clients kept trying to
flush data they had in cache and kept getting rejected by the OSDs for
almost an hour, and then magically the data flush worked.  When queried
afterwards, the client reported that it was no longer blacklisted.  How
would that happen?  I certainly didn't run any commands to un-blacklist
the client, and the docs say that otherwise the client will stay
blacklisted until the file system gets remounted.
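For reference, as far as I know lifting the entry by hand would look
something like this (using the instance address from the status output
below) - and I did not run anything like it:

     # manually remove this client's blacklist entry
     ceph osd blacklist rm 10.254.65.33:0/410851087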

Here is the status of the client when it was blacklisted:
[root@worker2033 ceph]# ceph daemon /var/run/ceph/ceph-client.cephfs2.7698.93825141588944.asok status
{
      "metadata": {
          "ceph_sha1": "1778d63e55dbff6cedb071ab7d367f8f52a8699f",
          "ceph_version": "ceph version 14.2.13
(1778d63e55dbff6cedb071ab7d367f8f52a8699f) nautilus (stable)",
          "entity_id": "cephfs2",
          "hostname": "worker2033",
          "mount_point": "/mnt/ceph",
          "pid": "7698",
          "root": "/"
      },
      "dentry_count": 252,
      "dentry_pinned_count": 9,
      "id": 111995680,
      "inst": {
          "name": {
              "type": "client",
              "num": 111995680
          },
          "addr": {
              "type": "v1",
              "addr": "10.254.65.33:0",
              "nonce": 410851087
          }
      },
      "addr": {
          "type": "v1",
          "addr": "10.254.65.33:0",
          "nonce": 410851087
      },
      "inst_str": "client.111995680 10.254.65.33:0/410851087",
      "addr_str": "10.254.65.33:0/410851087",
      "inode_count": 251,
      "mds_epoch": 3376260,
      "osd_epoch": 1717896,
      "osd_epoch_barrier": 1717893,
      "blacklisted": true
}

This corresponds to these server-side log messages:
2020-11-09 15:56:31.578 7fffe59a4700  1 mds.0.3376160 Evicting (and blacklisting) client session 111995680 (10.254.65.33:0/410851087)
2020-11-09 15:56:31.578 7fffe59a4700  0 log_channel(cluster) log [INF] : Evicting (and blacklisting) client session 111995680 (10.254.65.33:0/410851087)
2020-11-09 15:56:31.706 7fffe59a4700  1 mds.0.3376160 Evicting (and blacklisting) client session 111995680 (10.254.65.33:0/410851087)
2020-11-09 15:56:31.706 7fffe59a4700  0 log_channel(cluster) log [INF] : Evicting (and blacklisting) client session 111995680 (10.254.65.33:0/410851087)

and then some time later (perhaps half an hour or so) I got this from
the client:

[root@worker2033 ceph]# ceph daemon /var/run/ceph/ceph-client.cephfs2.7698.93825141588944.asok status
{
      "metadata": {
          "ceph_sha1": "1778d63e55dbff6cedb071ab7d367f8f52a8699f",
          "ceph_version": "ceph version 14.2.13
(1778d63e55dbff6cedb071ab7d367f8f52a8699f) nautilus (stable)",
          "entity_id": "cephfs2",
          "hostname": "worker2033",
          "mount_point": "/mnt/ceph",
          "pid": "7698",
          "root": "/"
      },
      "dentry_count": 252,
      "dentry_pinned_count": 9,
      "id": 111995680,
      "inst": {
          "name": {
              "type": "client",
              "num": 111995680
          },
          "addr": {
              "type": "v1",
              "addr": "10.254.65.33:0",
              "nonce": 410851087
          }
      },
      "addr": {
          "type": "v1",
          "addr": "10.254.65.33:0",
          "nonce": 410851087
      },
      "inst_str": "client.111995680 10.254.65.33:0/410851087",
      "addr_str": "10.254.65.33:0/410851087",
      "inode_count": 251,
      "mds_epoch": 3376260,
      "osd_epoch": 1717897,
      "osd_epoch_barrier": 1717893,
      "blacklisted": false
}

The cluster was otherwise healthy - nothing wrong with the MDSs, or any
placement groups, etc.  I also don't see any further log messages
regarding eviction/blacklisting in the MDS logs.  I didn't run any ceph
commands that would change the state of the cluster - I was just looking
around, increasing log levels.

Any ideas how could that have happened?

A separate problem (which perhaps needs a ticket filed) is that while
the ceph-fuse client was in a blacklisted state, it kept retrying in an
infinite loop to flush data to the OSDs and got rejected every time.  I
have some logs with the details of this too.

Andras
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io