Thanks!

It's true that I've seen continuous memory growth, but I hadn't thought of it
as a memory leak. I don't remember exactly how many hours it took to fill the
memory, but I estimate it was about 14 hours.

With the new configuration, memory seems to grow slowly and stops once it
reaches 5-6 GB. Sometimes the daemon appears to flush its memory, dropping
back to less than 1 GB and then slowly growing to 5-6 GB again.

Just today, I don't know why or how (I haven't changed anything on the Ceph
cluster), the memory dropped to less than 1 GB and is still there 8 hours
later. The only thing I've done is deploy a git repository with some changes.

Some of my nodes are still on version 12.2.5: I detected this problem and
didn't know whether it was caused by the latest version, so I paused the
update. The node that is the active MDS is on the latest version (12.2.7),
and I've scheduled an update of the remaining nodes for Thursday.

Here is a graph of the memory usage over the last few days with that configuration:
https://imgur.com/a/uSsvBi4

I have no data from when the problem was at its worst (512 MB MDS memory
limit and 15-16 GB of usage), because memory usage was not being logged. I
only have heap stats that were dumped while the daemon was filling up the
memory:

# ceph tell mds.kavehome-mgto-pro-fs01  heap stats
2018-07-19 00:43:46.142560 7f5a7a7fc700  0 client.1318388 ms_handle_reset on
 10.22.0.168:6800/1129848128
2018-07-19 00:43:46.181133 7f5a7b7fe700  0 client.1318391 ms_handle_reset on
 10.22.0.168:6800/1129848128
mds.kavehome-mgto-pro-fs01 tcmalloc heap stats:------------------------
------------------------
MALLOC:     9982980144 ( 9520.5 MiB) Bytes in use by application
MALLOC: +            0 (    0.0 MiB) Bytes in page heap freelist
MALLOC: +    172148208 (  164.2 MiB) Bytes in central cache freelist
MALLOC: +     19031168 (   18.1 MiB) Bytes in transfer cache freelist
MALLOC: +     23987552 (   22.9 MiB) Bytes in thread cache freelists
MALLOC: +     20869280 (   19.9 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =  10219016352 ( 9745.6 MiB) Actual memory used (physical + swap)
MALLOC: +   3913687040 ( 3732.4 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =  14132703392 (13478.0 MiB) Virtual address space used
MALLOC:
MALLOC:          63875              Spans in use
MALLOC:             16              Thread heaps in use
MALLOC:           8192              Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via
madvise()).
Bytes released to the OS take up virtual address space but no physical
memory.
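Since the missing data was exactly this kind of heap sample over time, here is a small sketch of how it could be collected going forward. It is a hypothetical helper script, not something from the thread: it assumes the tcmalloc output format shown above and shells out to the real `ceph tell mds.<name> heap stats` command.

```python
import re
import subprocess
import time

def parse_bytes_in_use(heap_stats_output):
    """Extract the 'Bytes in use by application' count from tcmalloc heap stats."""
    m = re.search(r"MALLOC:\s+(\d+)\s+\(\s*[\d.]+ MiB\) Bytes in use by application",
                  heap_stats_output)
    return int(m.group(1)) if m else None

def log_heap_usage(mds_name, interval=60):
    """Poll `ceph tell mds.<name> heap stats` and print one sample per interval."""
    while True:
        out = subprocess.run(["ceph", "tell", "mds." + mds_name, "heap", "stats"],
                             capture_output=True, text=True).stdout
        print(time.strftime("%F %T"), parse_bytes_in_use(out))
        time.sleep(interval)
```

Redirecting its output to a file would give a usable growth curve to correlate with the graph linked above.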



Here's the config diff:
--------------------------------------------------------------------------------------------------------------------
{
    "diff": {
        "current": {
            "admin_socket":
"/var/run/ceph/ceph-mds.kavehome-mgto-pro-fs01.asok",
            "auth_client_required": "cephx",
            "bluestore_cache_size_hdd": "80530636",
            "bluestore_cache_size_ssd": "80530636",
            "err_to_stderr": "true",
            "fsid": "f015f888-6e0c-4203-aea8-ef0f69ef7bd8",
            "internal_safe_to_start_threads": "true",
            "keyring":
"/var/lib/ceph/mds/ceph-kavehome-mgto-pro-fs01/keyring",
            "log_file": "/var/log/ceph/ceph-mds.kavehome-mgto-pro-fs01.log",
            "log_max_recent": "10000",
            "log_to_stderr": "false",
            "mds_cache_memory_limit": "53687091",
            "mds_data": "/var/lib/ceph/mds/ceph-kavehome-mgto-pro-fs01",
            "mgr_data": "/var/lib/ceph/mgr/ceph-kavehome-mgto-pro-fs01",
            "mon_cluster_log_file":
"default=/var/log/ceph/ceph.$channel.log cluster=/var/log/ceph/ceph.log",
            "mon_data": "/var/lib/ceph/mon/ceph-kavehome-mgto-pro-fs01",
            "mon_debug_dump_location":
"/var/log/ceph/ceph-mds.kavehome-mgto-pro-fs01.tdump",
            "mon_host": "10.22.0.168,10.22.0.140,10.22.0.127",
            "mon_initial_members": "kavehome-mgto-pro-fs01,
kavehome-mgto-pro-fs02, kavehome-mgto-pro-fs03",
            "osd_data": "/var/lib/ceph/osd/ceph-kavehome-mgto-pro-fs01",
            "osd_journal":
"/var/lib/ceph/osd/ceph-kavehome-mgto-pro-fs01/journal",
            "public_addr": "10.22.0.168:0/0",
            "public_network": "10.22.0.0/24",
            "rgw_data": "/var/lib/ceph/radosgw/ceph-kavehome-mgto-pro-fs01",
            "setgroup": "ceph",
            "setuser": "ceph"
        },
        "defaults": {
            "admin_socket": "",
            "auth_client_required": "cephx, none",
            "bluestore_cache_size_hdd": "1073741824",
            "bluestore_cache_size_ssd": "3221225472",
            "err_to_stderr": "false",
            "fsid": "00000000-0000-0000-0000-000000000000",
            "internal_safe_to_start_threads": "false",
            "keyring":
"/etc/ceph/$cluster.$name.keyring,/etc/ceph/$cluster.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,",
            "log_file": "",
            "log_max_recent": "500",
            "log_to_stderr": "true",
            "mds_cache_memory_limit": "1073741824",
            "mds_data": "/var/lib/ceph/mds/$cluster-$id",
            "mgr_data": "/var/lib/ceph/mgr/$cluster-$id",
            "mon_cluster_log_file":
"default=/var/log/ceph/$cluster.$channel.log
cluster=/var/log/ceph/$cluster.log",
            "mon_data": "/var/lib/ceph/mon/$cluster-$id",
            "mon_debug_dump_location": "/var/log/ceph/$cluster-$name.tdump",
            "mon_host": "",
            "mon_initial_members": "",
            "osd_data": "/var/lib/ceph/osd/$cluster-$id",
            "osd_journal": "/var/lib/ceph/osd/$cluster-$id/journal",
            "public_addr": "-",
            "public_network": "",
            "rgw_data": "/var/lib/ceph/radosgw/$cluster-$id",
            "setgroup": "",
            "setuser": ""
        }
    },
    "unknown": []
}
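As a side note on the diff above: `mds_cache_memory_limit` is specified in bytes, which makes unit mistakes easy. A quick conversion of the two values shown (this is just arithmetic on numbers copied from the diff, nothing cluster-specific):

```python
# Values taken from the config diff above; the option is expressed in bytes.
current_limit = 53687091      # configured mds_cache_memory_limit
default_limit = 1073741824    # default (1 GiB)

def to_mib(n_bytes):
    """Convert a byte count to MiB, rounded to one decimal."""
    return round(n_bytes / 2**20, 1)

print(to_mib(current_limit))  # about 51.2 MiB
print(to_mib(default_limit))  # 1024.0 MiB, i.e. 1 GiB
```

So the currently configured limit is roughly 51 MiB, about 5% of the 1 GiB default.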
----------------------------------------------------------------------------------------------------------



Perf Dump
---------------------------------------------------------------------------------------------------------
{
    "AsyncMessenger::Worker-0": {
        "msgr_recv_messages": 1350895,
        "msgr_send_messages": 1593759,
        "msgr_recv_bytes": 301786293,
        "msgr_send_bytes": 341807191,
        "msgr_created_connections": 148,
        "msgr_active_connections": 45,
        "msgr_running_total_time": 119.217157290,
        "msgr_running_send_time": 39.714493374,
        "msgr_running_recv_time": 127.455260807,
        "msgr_running_fast_dispatch_time": 0.117634930
    },
    "AsyncMessenger::Worker-1": {
        "msgr_recv_messages": 2996114,
        "msgr_send_messages": 3113274,
        "msgr_recv_bytes": 804875332,
        "msgr_send_bytes": 1231962873,
        "msgr_created_connections": 151,
        "msgr_active_connections": 48,
        "msgr_running_total_time": 248.962533700,
        "msgr_running_send_time": 83.497214869,
        "msgr_running_recv_time": 547.534653813,
        "msgr_running_fast_dispatch_time": 0.125151678
    },
    "AsyncMessenger::Worker-2": {
        "msgr_recv_messages": 1793419,
        "msgr_send_messages": 2117240,
        "msgr_recv_bytes": 1425674729,
        "msgr_send_bytes": 871324466,
        "msgr_created_connections": 325,
        "msgr_active_connections": 54,
        "msgr_running_total_time": 160.001753142,
        "msgr_running_send_time": 49.679463024,
        "msgr_running_recv_time": 205.535692064,
        "msgr_running_fast_dispatch_time": 4.350479591
    },
    "finisher-PurgeQueue": {
        "queue_len": 0,
        "complete_latency": {
            "avgcount": 755,
            "sum": 0.022316252,
            "avgtime": 0.000029557
        }
    },
    "mds": {
        "request": 4942944,
        "reply": 489638,
        "reply_latency": {
            "avgcount": 489638,
            "sum": 771.955019623,
            "avgtime": 0.001576583
        },
        "forward": 4453296,
        "dir_fetch": 101036,
        "dir_commit": 3,
        "dir_split": 0,
        "dir_merge": 0,
        "inode_max": 2147483647,
        "inodes": 505,
        "inodes_top": 96,
        "inodes_bottom": 398,
        "inodes_pin_tail": 11,
        "inodes_pinned": 367,
        "inodes_expired": 1556356,
        "inodes_with_caps": 325,
        "caps": 1192,
        "subtrees": 16,
        "traverse": 4956673,
        "traverse_hit": 496867,
        "traverse_forward": 4450841,
        "traverse_discover": 166,
        "traverse_dir_fetch": 1657,
        "traverse_remote_ino": 0,
        "traverse_lock": 19,
        "load_cent": 494278118,
        "q": 0,
        "exported": 1187,
        "exported_inodes": 664127,
        "imported": 947,
        "imported_inodes": 76628
    },
    "mds_cache": {
        "num_strays": 0,
        "num_strays_delayed": 0,
        "num_strays_enqueuing": 0,
        "strays_created": 124,
        "strays_enqueued": 124,
        "strays_reintegrated": 0,
        "strays_migrated": 0,
        "num_recovering_processing": 0,
        "num_recovering_enqueued": 0,
        "num_recovering_prioritized": 0,
        "recovery_started": 0,
        "recovery_completed": 0,
        "ireq_enqueue_scrub": 0,
        "ireq_exportdir": 1189,
        "ireq_flush": 0,
        "ireq_fragmentdir": 0,
        "ireq_fragstats": 0,
        "ireq_inodestats": 0
    },
    "mds_log": {
        "evadd": 125666,
        "evex": 116984,
        "evtrm": 116984,
        "ev": 117582,
        "evexg": 0,
        "evexd": 933,
        "segadd": 138,
        "segex": 138,
        "segtrm": 138,
        "seg": 129,
        "segexg": 0,
        "segexd": 1,
        "expos": 25715287703,
        "wrpos": 25862332030,
        "rdpos": 25663431097,
        "jlat": {
            "avgcount": 23473,
            "sum": 98.111299299,
            "avgtime": 0.004179751
        },
        "replayed": 108900
    },
    "mds_mem": {
        "ino": 507,
        "ino+": 1579334,
        "ino-": 1578827,
        "dir": 312,
        "dir+": 101932,
        "dir-": 101620,
        "dn": 529,
        "dn+": 1580751,
        "dn-": 1580222,
        "cap": 1192,
        "cap+": 1825843,
        "cap-": 1824651,
        "rss": 258840,
        "heap": 313880,
        "buf": 0
    },
    "mds_server": {
        "dispatch_client_request": 5081829,
        "dispatch_server_request": 540,
        "handle_client_request": 4942944,
        "handle_client_session": 233505,
        "handle_slave_request": 846,
        "req_create": 128,
        "req_getattr": 38805,
        "req_getfilelock": 0,
        "req_link": 0,
        "req_lookup": 242216,
        "req_lookuphash": 0,
        "req_lookupino": 0,
        "req_lookupname": 2,
        "req_lookupparent": 0,
        "req_lookupsnap": 0,
        "req_lssnap": 0,
        "req_mkdir": 0,
        "req_mknod": 0,
        "req_mksnap": 0,
        "req_open": 2155,
        "req_readdir": 206315,
        "req_rename": 21,
        "req_renamesnap": 0,
        "req_rmdir": 0,
        "req_rmsnap": 0,
        "req_rmxattr": 0,
        "req_setattr": 2,
        "req_setdirlayout": 0,
        "req_setfilelock": 0,
        "req_setlayout": 0,
        "req_setxattr": 0,
        "req_symlink": 0,
        "req_unlink": 122
    },
    "mds_sessions": {
        "session_count": 10,
        "session_add": 128,
        "session_remove": 118
    },
    "objecter": {
        "op_active": 0,
        "op_laggy": 0,
        "op_send": 136767,
        "op_send_bytes": 202196534,
        "op_resend": 0,
        "op_reply": 136767,
        "op": 136767,
        "op_r": 101193,
        "op_w": 35574,
        "op_rmw": 0,
        "op_pg": 0,
        "osdop_stat": 5,
        "osdop_create": 0,
        "osdop_read": 150,
        "osdop_write": 23587,
        "osdop_writefull": 11750,
        "osdop_writesame": 0,
        "osdop_append": 0,
        "osdop_zero": 2,
        "osdop_truncate": 0,
        "osdop_delete": 228,
        "osdop_mapext": 0,
        "osdop_sparse_read": 0,
        "osdop_clonerange": 0,
        "osdop_getxattr": 100784,
        "osdop_setxattr": 0,
        "osdop_cmpxattr": 0,
        "osdop_rmxattr": 0,
        "osdop_resetxattrs": 0,
        "osdop_tmap_up": 0,
        "osdop_tmap_put": 0,
        "osdop_tmap_get": 0,
        "osdop_call": 0,
        "osdop_watch": 0,
        "osdop_notify": 0,
        "osdop_src_cmpxattr": 0,
        "osdop_pgls": 0,
        "osdop_pgls_filter": 0,
        "osdop_other": 3,
        "linger_active": 0,
        "linger_send": 0,
        "linger_resend": 0,
        "linger_ping": 0,
        "poolop_active": 0,
        "poolop_send": 0,
        "poolop_resend": 0,
        "poolstat_active": 0,
        "poolstat_send": 0,
        "poolstat_resend": 0,
        "statfs_active": 0,
        "statfs_send": 0,
        "statfs_resend": 0,
        "command_active": 0,
        "command_send": 0,
        "command_resend": 0,
        "map_epoch": 468,
        "map_full": 0,
        "map_inc": 39,
        "osd_sessions": 3,
        "osd_session_open": 479,
        "osd_session_close": 476,
        "osd_laggy": 0,
        "omap_wr": 7,
        "omap_rd": 202074,
        "omap_del": 1
    },
    "purge_queue": {
        "pq_executing_ops": 0,
        "pq_executing": 0,
        "pq_executed": 124
    },
    "throttle-msgr_dispatch_throttler-mds": {
        "val": 0,
        "max": 104857600,
        "get_started": 0,
        "get": 6140428,
        "get_sum": 2077944682,
        "get_or_fail_fail": 0,
        "get_or_fail_success": 6140428,
        "take": 0,
        "take_sum": 0,
        "put": 6140428,
        "put_sum": 2077944682,
        "wait": {
            "avgcount": 0,
            "sum": 0.000000000,
            "avgtime": 0.000000000
        }
    },
    "throttle-objecter_bytes": {
        "val": 0,
        "max": 104857600,
        "get_started": 0,
        "get": 0,
        "get_sum": 0,
        "get_or_fail_fail": 0,
        "get_or_fail_success": 0,
        "take": 136767,
        "take_sum": 339484250,
        "put": 136523,
        "put_sum": 339484250,
        "wait": {
            "avgcount": 0,
            "sum": 0.000000000,
            "avgtime": 0.000000000
        }
    },
    "throttle-objecter_ops": {
        "val": 0,
        "max": 1024,
        "get_started": 0,
        "get": 0,
        "get_sum": 0,
        "get_or_fail_fail": 0,
        "get_or_fail_success": 0,
        "take": 136767,
        "take_sum": 136767,
        "put": 136767,
        "put_sum": 136767,
        "wait": {
            "avgcount": 0,
            "sum": 0.000000000,
            "avgtime": 0.000000000
        }
    },
    "throttle-write_buf_throttle": {
        "val": 0,
        "max": 3758096384,
        "get_started": 0,
        "get": 124,
        "get_sum": 11532,
        "get_or_fail_fail": 0,
        "get_or_fail_success": 124,
        "take": 0,
        "take_sum": 0,
        "put": 109,
        "put_sum": 11532,
        "wait": {
            "avgcount": 0,
            "sum": 0.000000000,
            "avgtime": 0.000000000
        }
    },
    "throttle-write_buf_throttle-0x55faf5ba4220": {
        "val": 0,
        "max": 3758096384,
        "get_started": 0,
        "get": 125666,
        "get_sum": 198900816,
        "get_or_fail_fail": 0,
        "get_or_fail_success": 125666,
        "take": 0,
        "take_sum": 0,
        "put": 23473,
        "put_sum": 198900816,
        "wait": {
            "avgcount": 0,
            "sum": 0.000000000,
            "avgtime": 0.000000000
        }
    }
}
----------------------------------------------------------------------------------------------



dump_mempools
----------------------------------------------------------------------------------------------
{
    "bloom_filter": {
        "items": 120,
        "bytes": 120
    },
    "bluestore_alloc": {
        "items": 0,
        "bytes": 0
    },
    "bluestore_cache_data": {
        "items": 0,
        "bytes": 0
    },
    "bluestore_cache_onode": {
        "items": 0,
        "bytes": 0
    },
    "bluestore_cache_other": {
        "items": 0,
        "bytes": 0
    },
    "bluestore_fsck": {
        "items": 0,
        "bytes": 0
    },
    "bluestore_txc": {
        "items": 0,
        "bytes": 0
    },
    "bluestore_writing_deferred": {
        "items": 0,
        "bytes": 0
    },
    "bluestore_writing": {
        "items": 0,
        "bytes": 0
    },
    "bluefs": {
        "items": 0,
        "bytes": 0
    },
    "buffer_anon": {
        "items": 96401,
        "bytes": 16010198
    },
    "buffer_meta": {
        "items": 1,
        "bytes": 88
    },
    "osd": {
        "items": 0,
        "bytes": 0
    },
    "osd_mapbl": {
        "items": 0,
        "bytes": 0
    },
    "osd_pglog": {
        "items": 0,
        "bytes": 0
    },
    "osdmap": {
        "items": 80,
        "bytes": 3296
    },
    "osdmap_mapping": {
        "items": 0,
        "bytes": 0
    },
    "pgmap": {
        "items": 0,
        "bytes": 0
    },
    "mds_co": {
        "items": 17604,
        "bytes": 2330840
    },
    "unittest_1": {
        "items": 0,
        "bytes": 0
    },
    "unittest_2": {
        "items": 0,
        "bytes": 0
    },
    "total": {
        "items": 114206,
        "bytes": 18344542
    }
}
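A quick sanity check on the mempool dump above, summing the non-zero pools (values copied from the output): the pools account for only about 17.5 MiB, which is tiny compared to the multi-GB RSS the process has shown.

```python
# Non-zero byte counts copied from the dump_mempools output above.
pools = {
    "bloom_filter": 120,
    "buffer_anon": 16010198,
    "buffer_meta": 88,
    "osdmap": 3296,
    "mds_co": 2330840,
}

total_bytes = sum(pools.values())
print(total_bytes)                    # matches the reported "total" of 18344542
print(round(total_bytes / 2**20, 1))  # about 17.5 MiB
```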
-------------------------------------------------------------------------------------------------------------------


Sorry for my English!


Greetings!!



On 23 Jul 2018 at 20:08, "Patrick Donnelly" <pdonn...@redhat.com> wrote:

On Mon, Jul 23, 2018 at 5:48 AM, Daniel Carrasco <d.carra...@i2tic.com>
wrote:
> Hi, thanks for your response.
>
> There are about 6 clients, and 4 of them are on standby most of the time.
> Only two are active servers serving the webpage. We also have Varnish in
> front, so they are not getting all the load (below 30% in PHP is not much).
> About the MDS cache, I now have mds_cache_memory_limit at 8 MB.

What! Please post `ceph daemon mds.<name> config diff`,  `... perf
dump`, and `... dump_mempools `  from the server the active MDS is on.


> I've also tested 512 MB, but the CPU usage is the same and the MDS RAM
> usage grows to 15 GB (on a 16 GB server it starts to swap and everything
> fails). With 8 MB, at least the memory usage is stable at less than 6 GB
> (right now it is using about 1 GB of RAM).

We've seen reports of possible memory leaks before and the potential
fixes for those were in 12.2.6. How fast does your MDS reach 15GB?
Your MDS cache size should be configured to 1-8GB (depending on your
preference) so it's disturbing to see you set it so low.


-- 
Patrick Donnelly
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
