Hi Dan,

That makes sense - the time between the blacklist and the magic comeback was around 1 hour - thanks for the explanation.  Is this a safe default?  At eviction, the MDS takes all caps away from the client, so if it comes back in an hour, doesn't it then write to files that it perhaps shouldn't have access to any more?
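For context, this is roughly what I can check from the admin side while a client is blacklisted - a rough sketch, assuming an admin keyring on the node and that mds.0 is our single active rank:

     # blacklist entries, with the time at which each one expires
     ceph osd blacklist ls

     # sessions (and their cap counts) the MDS currently knows about -
     # the evicted client should no longer appear here
     ceph tell mds.0 session ls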

Then there is the other strange thing ceph-fuse was doing for that hour (I had increased the objecter log level to 20).

Here is the eviction:
2020-11-09 15:56:32.762 7fffda7fc700 -1 client.111995680 I was blacklisted at osd epoch 1717894
2020-11-09 15:56:32.762 7fffda7fc700 -1 client.111995680 remove_session_caps still has dirty|flushing caps on 0x100673a2613.head(faked_ino=0 ref=5 ll_ref=1 cap_refs={4=0,1024=0,4096=0,8192=0} open={3=1} mode=100640 size=106/4194304 nlink=1 btime=0.000000 mtime=2020-11-09 14:34:28.139916 ctime=2020-11-09 14:34:28.139916 caps=- dirty_caps=Fw objectset[0x100673a2613 ts 0/0 objects 1 dirty_or_tx 0] parents=0x10067375a7c.head["pwaf-00680.ene"] 0x7fffd034b4d0)
2020-11-09 15:56:32.762 7fffda7fc700 -1 client.111995680 remove_session_caps still has dirty|flushing caps on 0x100673a2614.head(faked_ino=0 ref=5 ll_ref=1 cap_refs={4=0,1024=0,4096=0,8192=0} open={3=1} mode=100640 size=106/4194304 nlink=1 btime=0.000000 mtime=2020-11-09 14:34:28.145199 ctime=2020-11-09 14:34:28.145199 caps=- dirty_caps=Fw objectset[0x100673a2614 ts 0/0 objects 1 dirty_or_tx 0] parents=0x10067375a7c.head["pwaf-00685.ene"] 0x7fffd034bc20)
2020-11-09 15:56:32.762 7fffda7fc700 -1 client.111995680 remove_session_caps still has dirty|flushing caps on 0x100673a2615.head(faked_ino=0 ref=5 ll_ref=1 cap_refs={4=0,1024=0,4096=0,8192=0} open={3=1} mode=100640 size=106/4194304 nlink=1 btime=0.000000 mtime=2020-11-09 14:34:28.150306 ctime=2020-11-09 14:34:28.150306 caps=- dirty_caps=Fw objectset[0x100673a2615 ts 0/0 objects 1 dirty_or_tx 0] parents=0x10067375a7c.head["pwaf-00682.ene"] 0x7fffd034c1d0)
... and a lot more of these ...

then the following types of messages repeat:

2020-11-09 16:51:10.236 7fffdaffd700  7 objectcacher bh_write_scattered bh[ 0x7fff60a0ae40 2026998~4 0x7fffac4d0460 (4) v 131065 dirty firstbyte=32] waiters = {}
2020-11-09 16:51:10.236 7fffdaffd700  7 objectcacher bh_write_scattered bh[ 0x7ffe6c405f80 2051562~328804 0x7fffac4d0460 (328804) v 131065 dirty firstbyte=-42] waiters = {}
2020-11-09 16:51:10.236 7fffdaffd700  7 objectcacher bh_write_scattered bh[ 0x7fff9b14d850 2380366~4 0x7fffac4d0460 (4) v 131065 dirty firstbyte=32] waiters = {}
2020-11-09 16:51:10.236 7fffdaffd700  7 objectcacher bh_write_scattered bh[ 0x7fff9bc966f0 2380370~8176 0x7fffac4d0460 (8176) v 131065 dirty firstbyte=96] waiters = {}
... about 200 or so of these ...

followed by

2020-11-09 16:51:10.896 7fffdb7fe700 10 objectcacher bh_write_commit marking dirty again due to error bh[ 0x7fff60a0ae40 2026998~4 0x7fffac4d0460 (4) v 131183 dirty firstbyte=32] waiters = {} r = -108 (108) Cannot send after transport endpoint shutdown
2020-11-09 16:51:10.896 7fffdb7fe700 10 objectcacher bh_write_commit marking dirty again due to error bh[ 0x7ffe6c405f80 2051562~328804 0x7fffac4d0460 (328804) v 131183 dirty firstbyte=-42] waiters = {} r = -108 (108) Cannot send after transport endpoint shutdown
2020-11-09 16:51:10.896 7fffdb7fe700 10 objectcacher bh_write_commit marking dirty again due to error bh[ 0x7fff9b14d850 2380366~4 0x7fffac4d0460 (4) v 131183 dirty firstbyte=32] waiters = {} r = -108 (108) Cannot send after transport endpoint shutdown
2020-11-09 16:51:10.896 7fffdb7fe700 10 objectcacher bh_write_commit marking dirty again due to error bh[ 0x7fff9bc966f0 2380370~8176 0x7fffac4d0460 (8176) v 131183 dirty firstbyte=96] waiters = {} r = -108 (108) Cannot send after transport endpoint shutdown
... about 200 or so of these ...

then again:

2020-11-09 16:51:11.260 7fffdaffd700  7 objectcacher bh_write_scattered bh[ 0x7fff60a0ae40 2026998~4 0x7fffac4d0460 (4) v 131183 dirty firstbyte=32] waiters = {}
2020-11-09 16:51:11.260 7fffdaffd700  7 objectcacher bh_write_scattered bh[ 0x7ffe6c405f80 2051562~328804 0x7fffac4d0460 (328804) v 131183 dirty firstbyte=-42] waiters = {}
2020-11-09 16:51:11.260 7fffdaffd700  7 objectcacher bh_write_scattered bh[ 0x7fff9b14d850 2380366~4 0x7fffac4d0460 (4) v 131183 dirty firstbyte=32] waiters = {}
2020-11-09 16:51:11.260 7fffdaffd700  7 objectcacher bh_write_scattered bh[ 0x7fff9bc966f0 2380370~8176 0x7fffac4d0460 (8176) v 131183 dirty firstbyte=96] waiters = {}

rejected again:

2020-11-09 16:51:11.772 7fffdb7fe700 10 objectcacher bh_write_commit marking dirty again due to error bh[ 0x7fff60a0ae40 2026998~4 0x7fffac4d0460 (4) v 131349 dirty firstbyte=32] waiters = {} r = -108 (108) Cannot send after transport endpoint shutdown
2020-11-09 16:51:11.772 7fffdb7fe700 10 objectcacher bh_write_commit marking dirty again due to error bh[ 0x7ffe6c405f80 2051562~328804 0x7fffac4d0460 (328804) v 131349 dirty firstbyte=-42] waiters = {} r = -108 (108) Cannot send after transport endpoint shutdown
2020-11-09 16:51:11.772 7fffdb7fe700 10 objectcacher bh_write_commit marking dirty again due to error bh[ 0x7fff9b14d850 2380366~4 0x7fffac4d0460 (4) v 131349 dirty firstbyte=32] waiters = {} r = -108 (108) Cannot send after transport endpoint shutdown
2020-11-09 16:51:11.772 7fffdb7fe700 10 objectcacher bh_write_commit marking dirty again due to error bh[ 0x7fff9bc966f0 2380370~8176 0x7fffac4d0460 (8176) v 131349 dirty firstbyte=96] waiters = {} r = -108 (108) Cannot send after transport endpoint shutdown

This repeated for the whole hour, using about 1 Gbps of bandwidth from just this one node.  Then the blacklist was lifted, and presumably the writes went through:

2020-11-09 16:56:36.840 7fffdb7fe700 10 objectcacher bh_write_commit clean bh[ 0x7fff60a0ae40 2026998~4 0x7fffac4d0460 (4) v 160426 clean firstbyte=32] waiters = {}
2020-11-09 16:56:36.840 7fffdb7fe700 10 objectcacher.object(100673a2670.00000008/head) try_merge_bh bh[ 0x7fff60a0ae40 2026998~4 0x7fffac4d0460 (4) v 160426 clean firstbyte=32] waiters = {}
2020-11-09 16:56:36.840 7fffdb7fe700 10 objectcacher.object(100673a2670.00000008/head) merge_left bh[ 0x7fffac4d05b0 0~2026998 0x7fffac4d0460 (2026998) v 6079 clean firstbyte=-122] waiters = {} + bh[ 0x7fff60a0ae40 2026998~4 0x7fffac4d0460 (4) v 160426 clean firstbyte=32] waiters = {}

This looks unsafe to me - after the eviction the client no longer held any caps on these files, yet once the blacklist entry expired it flushed the stale cached writes anyway?

Andras


On 11/9/20 7:21 PM, Dan van der Ster wrote:
Hi Andras,

The osd blocklist entries expire after 1hr by default:

     Option("mon_osd_blacklist_default_expire", Option::TYPE_FLOAT,
Option::LEVEL_ADVANCED)
     .set_default(1_hr)
     .add_service("mon")
     .set_description("Duration in seconds that blacklist entries for clients "
                      "remain in the OSD map"),

(Check mon/OSDMonitor.cc for the implementation)
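If you want a different window, you should be able to query and override
it at runtime via the centralized config in Nautilus, something like the
following (86400 here is only an example value, not a recommendation):

     ceph config get mon mon_osd_blacklist_default_expire
     ceph config set mon mon_osd_blacklist_default_expire 86400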

Cheers, Dan

On Mon, Nov 9, 2020 at 11:59 PM Andras Pataki
<apat...@flatironinstitute.org> wrote:
We had some network problems (high packet drops) to some cephfs client
nodes that run ceph-fuse (14.2.13) against a Nautilus cluster (on
version 14.2.8).  As a result a couple of clients got evicted (as one
would expect).  What was really odd is that the clients kept trying to
flush data they had in cache and kept getting rejected by the OSDs for
almost an hour, and then magically the data flush worked.  When queried
afterwards, the client reported that it was no longer blacklisted.  How
would that happen?  I certainly didn't run any commands to un-blacklist
the client, and the docs say that otherwise the client will stay
blacklisted until the file system gets remounted.
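For reference, as far as I know lifting the entry by hand would look
something like this (using the instance address from the status output
below) - and I did not run anything like it:

     # manually remove this client's blacklist entry
     ceph osd blacklist rm 10.254.65.33:0/410851087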

Here is the status of the client when it was blacklisted:
[root@worker2033 ceph]# ceph daemon /var/run/ceph/ceph-client.cephfs2.7698.93825141588944.asok status
{
      "metadata": {
          "ceph_sha1": "1778d63e55dbff6cedb071ab7d367f8f52a8699f",
          "ceph_version": "ceph version 14.2.13
(1778d63e55dbff6cedb071ab7d367f8f52a8699f) nautilus (stable)",
          "entity_id": "cephfs2",
          "hostname": "worker2033",
          "mount_point": "/mnt/ceph",
          "pid": "7698",
          "root": "/"
      },
      "dentry_count": 252,
      "dentry_pinned_count": 9,
      "id": 111995680,
      "inst": {
          "name": {
              "type": "client",
              "num": 111995680
          },
          "addr": {
              "type": "v1",
              "addr": "10.254.65.33:0",
              "nonce": 410851087
          }
      },
      "addr": {
          "type": "v1",
          "addr": "10.254.65.33:0",
          "nonce": 410851087
      },
      "inst_str": "client.111995680 10.254.65.33:0/410851087",
      "addr_str": "10.254.65.33:0/410851087",
      "inode_count": 251,
      "mds_epoch": 3376260,
      "osd_epoch": 1717896,
      "osd_epoch_barrier": 1717893,
      "blacklisted": true
}

This corresponds to these server-side log messages:
2020-11-09 15:56:31.578 7fffe59a4700  1 mds.0.3376160 Evicting (and blacklisting) client session 111995680 (10.254.65.33:0/410851087)
2020-11-09 15:56:31.578 7fffe59a4700  0 log_channel(cluster) log [INF] : Evicting (and blacklisting) client session 111995680 (10.254.65.33:0/410851087)
2020-11-09 15:56:31.706 7fffe59a4700  1 mds.0.3376160 Evicting (and blacklisting) client session 111995680 (10.254.65.33:0/410851087)
2020-11-09 15:56:31.706 7fffe59a4700  0 log_channel(cluster) log [INF] : Evicting (and blacklisting) client session 111995680 (10.254.65.33:0/410851087)

and then some time later (perhaps half an hour or so) I got this from
the client:

[root@worker2033 ceph]# ceph daemon /var/run/ceph/ceph-client.cephfs2.7698.93825141588944.asok status
{
      "metadata": {
          "ceph_sha1": "1778d63e55dbff6cedb071ab7d367f8f52a8699f",
          "ceph_version": "ceph version 14.2.13
(1778d63e55dbff6cedb071ab7d367f8f52a8699f) nautilus (stable)",
          "entity_id": "cephfs2",
          "hostname": "worker2033",
          "mount_point": "/mnt/ceph",
          "pid": "7698",
          "root": "/"
      },
      "dentry_count": 252,
      "dentry_pinned_count": 9,
      "id": 111995680,
      "inst": {
          "name": {
              "type": "client",
              "num": 111995680
          },
          "addr": {
              "type": "v1",
              "addr": "10.254.65.33:0",
              "nonce": 410851087
          }
      },
      "addr": {
          "type": "v1",
          "addr": "10.254.65.33:0",
          "nonce": 410851087
      },
      "inst_str": "client.111995680 10.254.65.33:0/410851087",
      "addr_str": "10.254.65.33:0/410851087",
      "inode_count": 251,
      "mds_epoch": 3376260,
      "osd_epoch": 1717897,
      "osd_epoch_barrier": 1717893,
      "blacklisted": false
}

The cluster was otherwise healthy - nothing wrong with the MDSs, or any
placement groups, etc.  I also don't see any further log messages
regarding eviction/blacklisting in the MDS logs.  I didn't run any ceph
commands that would change the state of the cluster - I was just looking
around, increasing log levels.

Any ideas how could that have happened?

A separate problem (which perhaps needs a ticket filed) is that while
the ceph-fuse client was in a blacklisted state, it kept retrying in an
infinite loop to flush data to the OSDs and got rejected every time.  I
have some logs with the details of this too.

Andras
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io