[ceph-users] Re: [Suspicious newsletter] Re: RGW memory consumption

2021-08-14 Thread Szabo, Istvan (Agoda)
I’d say if it is not round-robined, the same hosts will keep going to the same
endpoints and you can end up in an unbalanced situation.
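(For illustration only, not from the original mails: a minimal sketch of that
difference, assuming an HAProxy-style balancer in front of the gateways. The
backend name and hostnames are placeholders; port 8010 is simply the beast
port quoted further down the thread.)

backend rgw_gateways
    # "balance source" hashes the client IP, so every request from a given
    # client lands on the same gateway; a few busy clients can then pin
    # themselves to one radosgw and leave it far hotter than the others.
    # "balance roundrobin" spreads requests evenly across all endpoints.
    balance roundrobin
    server rgw1 host1.example:8010 check
    server rgw2 host2.example:8010 check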

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---

On 2021. Aug 14., at 16:57, mhnx  wrote:


I use a hardware load balancer. I think it was round-robin but I'm not sure.
When I watch the requests, RGW usage seems equal.
What is your point in asking about the load balancer? If you think that the
leaking RGW is getting more traffic, that's not the case. I couldn't find any
difference. The RGWs and hosts are identical. I've restarted that RGW and the
memory usage is gone. I will be watching all RGWs to understand this better.
I give it a +1 vote; it's probably a memory leak.
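(A hedged aside, not from the original thread: before restarting a suspect
gateway next time, it may be worth capturing its memory accounting through the
admin socket so the leak can be reported with data. The asok path is a
placeholder following the $run_dir/$cluster-$name.$pid.$cctid.asok pattern in
the config diff further down; whether dump_mempools is exposed depends on the
build.)

# dump the per-mempool allocation counters of the suspect radosgw (if the
# build exposes dump_mempools on the gateway's admin socket)
ceph daemon /var/run/ceph/ceph-client.rgw.NAME.PID.CCTID.asok dump_mempools

# and its perf counters, to compare against a healthy gateway
ceph daemon /var/run/ceph/ceph-client.rgw.NAME.PID.CCTID.asok perf dump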


On Sat, 14 Aug 2021 at 09:56, Szabo, Istvan (Agoda) <istvan.sz...@agoda.com> wrote:
Are you using a load balancer? Maybe you use a source-based balancing method?

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---

On 2021. Aug 13., at 16:14, mhnx <morphinwith...@gmail.com> wrote:


Hello Martin. I'm using 14.2.16. Our S3 usage is similar.
I have 10 RGWs. They're running on the OSD nodes.
9 RGWs are between 3G and 5G, but one of my RGWs is using 85G. I have 256G of
RAM and that's why I didn't notice it before. Thanks for the warning.

But the question is: why are 9 RGWs low and one of them very high? Weird...

My ceph.conf:

[radosgw.9]
rgw data = /var/lib/ceph/radosgw/ceph-radosgw.9
rgw zonegroup = xx
rgw zone = xxx
rgw zonegroup root pool = xx.root
rgw zone root pool = xx.root
host = HOST9
rgw dns name = s3..com
rgw frontends = beast port=8010
rgw user max buckets=999
log file = /var/log/ceph/radosgw.9.log
rgw run sync thread = false
rgw_dynamic_resharding = false
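(Another hedged aside: to compare gateways, the settings a running radosgw
actually uses can also be read back over its admin socket rather than from
ceph.conf. The asok path is again a placeholder in the same
$cluster-$name.$pid.$cctid form, and the diff subcommand produces output
shaped like the dump quoted further down.)

# only the options that differ from the defaults
ceph daemon /var/run/ceph/ceph-client.rgw.NAME.PID.CCTID.asok config diff

# or the full effective configuration of the running gateway
ceph daemon /var/run/ceph/ceph-client.rgw.NAME.PID.CCTID.asok config show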



On Fri, 13 Aug 2021 at 14:53, Martin Traxl <martin.tr...@1und1.de> wrote:

We have been experiencing this behaviour ever since this cluster went into
production and started getting "some load". We started with this cluster in
May this year, running Ceph 14.2.15, and already had this same issue. It just
took a little longer until all RAM was consumed, as the load was a little
lower than it is now.

This is my config diff (I stripped some hostnames/IPs):


{
   "diff": {
   "admin_socket": {
   "default": "$run_dir/$cluster-$name.$pid.$cctid.asok",
   "final":
"/var/run/ceph/ceph-client.rgw.#.882549.94336165049544.asok"
   },
   "bluefs_buffered_io": {
   "default": true,
   "file": true,
   "final": true
   },
   "cluster_network": {
   "default": "",
   "file": "#/26",
   "final": "#/26"
   },
   "daemonize": {
   "default": true,
   "override": false,
   "final": false
   },
   "debug_rgw": {
   "default": "1/5",
   "final": "1/5"
   },
   "filestore_fd_cache_size": {
   "default": 128,
   "file": 2048,
   "final": 2048
   },
   "filestore_op_threads": {
   "default": 2,
   "file": 8,
   "final": 8
   },
   "filestore_queue_max_ops": {
   "default": 50,
   "file": 100,
   "final": 100
   },
   "fsid": {
   "default": "----",
   "file": "#",
   "override": "#",
   "final": "#"
   },
   "keyring": {
   "default": "$rgw_data/keyring",
   "final": "/var/lib/ceph/radosgw/ceph-rgw.#/keyring"
   },
   "mon_host": {
   "default": "",
   "file": "# # #",
   "final": "# # #"
   },

   "mon_osd_down_out_interval": {
   "default": 600,
   "file": 1800,
   "final": 1800
   },
   "mon_osd_down_out_subtree_limit": {
   "default": "rack",
   "file": "host",
   "final": "host"
   },
   "mon_osd_initial_require_min_compat_client": {
   "default": "jewel",
   "file": "jewel",
   "final": "jewel"
   },
   "mon_osd_min_down_reporters": {
   "default": 2,
   "file": 2,
   "final": 2
   },
   "mon_osd_reporter_subtree_level": {
   "default": "host",
   "file": "host",
   "final": "host"
   },
   "ms_client_mode": {
   "default": "crc secure",
   "file": "secure",
   "final": 

[ceph-users] Re: [Suspicious newsletter] Re: RGW memory consumption

2021-08-14 Thread mhnx
I use a hardware load balancer. I think it was round-robin but I'm not sure.
When I watch the requests, RGW usage seems equal.
What is your point in asking about the load balancer? If you think that the
leaking RGW is getting more traffic, that's not the case. I couldn't find any
difference. The RGWs and hosts are identical. I've restarted that RGW and the
memory usage is gone. I will be watching all RGWs to understand this better.
I give it a +1 vote; it's probably a memory leak.


On Sat, 14 Aug 2021 at 09:56, Szabo, Istvan (Agoda) <istvan.sz...@agoda.com> wrote:

> Are you using a load balancer? Maybe you use a source-based balancing method?
>
> Istvan Szabo
> Senior Infrastructure Engineer
> ---
> Agoda Services Co., Ltd.
> e: istvan.sz...@agoda.com
> ---
>
> On 2021. Aug 13., at 16:14, mhnx  wrote:
>
>
> Hello Martin. I'm using 14.2.16. Our S3 usage is similar.
> I have 10 RGWs. They're running on the OSD nodes.
> 9 RGWs are between 3G and 5G, but one of my RGWs is using 85G. I have 256G
> of RAM and that's why I didn't notice it before. Thanks for the warning.
>
> But the question is: why are 9 RGWs low and one of them very high? Weird...
>
> My ceph.conf:
>
> [radosgw.9]
> rgw data = /var/lib/ceph/radosgw/ceph-radosgw.9
> rgw zonegroup = xx
> rgw zone = xxx
> rgw zonegroup root pool = xx.root
> rgw zone root pool = xx.root
> host = HOST9
> rgw dns name = s3..com
> rgw frontends = beast port=8010
> rgw user max buckets=999
> log file = /var/log/ceph/radosgw.9.log
> rgw run sync thread = false
> rgw_dynamic_resharding = false
>
>
>
> On Fri, 13 Aug 2021 at 14:53, Martin Traxl wrote:
>
> We have been experiencing this behaviour ever since this cluster went into
> production and started getting "some load". We started with this cluster in
> May this year, running Ceph 14.2.15, and already had this same issue. It
> just took a little longer until all RAM was consumed, as the load was a
> little lower than it is now.
>
> This is my config diff (I stripped some hostnames/IPs):
>
> {
>    "diff": {
>    "admin_socket": {
>    "default": "$run_dir/$cluster-$name.$pid.$cctid.asok",
>    "final":
> "/var/run/ceph/ceph-client.rgw.#.882549.94336165049544.asok"
>    },
>    "bluefs_buffered_io": {
>    "default": true,
>    "file": true,
>    "final": true
>    },
>    "cluster_network": {
>    "default": "",
>    "file": "#/26",
>    "final": "#/26"
>    },
>    "daemonize": {
>    "default": true,
>    "override": false,
>    "final": false
>    },
>    "debug_rgw": {
>    "default": "1/5",
>    "final": "1/5"
>    },
>    "filestore_fd_cache_size": {
>    "default": 128,
>    "file": 2048,
>    "final": 2048
>    },
>    "filestore_op_threads": {
>    "default": 2,
>    "file": 8,
>    "final": 8
>    },
>    "filestore_queue_max_ops": {
>    "default": 50,
>    "file": 100,
>    "final": 100
>    },
>    "fsid": {
>    "default": "----",
>    "file": "#",
>    "override": "#",
>    "final": "#"
>    },
>    "keyring": {
>    "default": "$rgw_data/keyring",
>    "final": "/var/lib/ceph/radosgw/ceph-rgw.#/keyring"
>    },
>    "mon_host": {
>    "default": "",
>    "file": "# # #",
>    "final": "# # #"
>    },
>
>    "mon_osd_down_out_interval": {
>    "default": 600,
>    "file": 1800,
>    "final": 1800
>    },
>    "mon_osd_down_out_subtree_limit": {
>    "default": "rack",
>    "file": "host",
>    "final": "host"
>    },
>    "mon_osd_initial_require_min_compat_client": {
>    "default": "jewel",
>    "file": "jewel",
>    "final": "jewel"
>    },
>    "mon_osd_min_down_reporters": {
>    "default": 2,
>    "file": 2,
>    "final": 2
>    },
>    "mon_osd_reporter_subtree_level": {
>    "default": "host",
>    "file": "host",
>    "final": "host"
>    },
>    "ms_client_mode": {
>    "default": "crc secure",
>    "file": "secure",
>    "final": "secure"
>    },
>    "ms_cluster_mode": {
>    "default": "crc secure",
>    "file": "secure",
>    "final": "secure"
>    },
>    "ms_mon_client_mode": {

[ceph-users] Re: [Suspicious newsletter] Re: RGW memory consumption

2021-08-14 Thread Szabo, Istvan (Agoda)
Are you using a load balancer? Maybe you use a source-based balancing method?

Istvan Szabo
Senior Infrastructure Engineer
---
Agoda Services Co., Ltd.
e: istvan.sz...@agoda.com
---

On 2021. Aug 13., at 16:14, mhnx  wrote:


Hello Martin. I'm using 14.2.16. Our S3 usage is similar.
I have 10 RGWs. They're running on the OSD nodes.
9 RGWs are between 3G and 5G, but one of my RGWs is using 85G. I have 256G of
RAM and that's why I didn't notice it before. Thanks for the warning.

But the question is: why are 9 RGWs low and one of them very high? Weird...

My ceph.conf:

[radosgw.9]
rgw data = /var/lib/ceph/radosgw/ceph-radosgw.9
rgw zonegroup = xx
rgw zone = xxx
rgw zonegroup root pool = xx.root
rgw zone root pool = xx.root
host = HOST9
rgw dns name = s3..com
rgw frontends = beast port=8010
rgw user max buckets=999
log file = /var/log/ceph/radosgw.9.log
rgw run sync thread = false
rgw_dynamic_resharding = false



On Fri, 13 Aug 2021 at 14:53, Martin Traxl wrote:

We have been experiencing this behaviour ever since this cluster went into
production and started getting "some load". We started with this cluster in
May this year, running Ceph 14.2.15, and already had this same issue. It just
took a little longer until all RAM was consumed, as the load was a little
lower than it is now.

This is my config diff (I stripped some hostnames/IPs):


{
   "diff": {
   "admin_socket": {
   "default": "$run_dir/$cluster-$name.$pid.$cctid.asok",
   "final":
"/var/run/ceph/ceph-client.rgw.#.882549.94336165049544.asok"
   },
   "bluefs_buffered_io": {
   "default": true,
   "file": true,
   "final": true
   },
   "cluster_network": {
   "default": "",
   "file": "#/26",
   "final": "#/26"
   },
   "daemonize": {
   "default": true,
   "override": false,
   "final": false
   },
   "debug_rgw": {
   "default": "1/5",
   "final": "1/5"
   },
   "filestore_fd_cache_size": {
   "default": 128,
   "file": 2048,
   "final": 2048
   },
   "filestore_op_threads": {
   "default": 2,
   "file": 8,
   "final": 8
   },
   "filestore_queue_max_ops": {
   "default": 50,
   "file": 100,
   "final": 100
   },
   "fsid": {
   "default": "----",
   "file": "#",
   "override": "#",
   "final": "#"
   },
   "keyring": {
   "default": "$rgw_data/keyring",
   "final": "/var/lib/ceph/radosgw/ceph-rgw.#/keyring"
   },
   "mon_host": {
   "default": "",
   "file": "# # #",
   "final": "# # #"
   },

   "mon_osd_down_out_interval": {
   "default": 600,
   "file": 1800,
   "final": 1800
   },
   "mon_osd_down_out_subtree_limit": {
   "default": "rack",
   "file": "host",
   "final": "host"
   },
   "mon_osd_initial_require_min_compat_client": {
   "default": "jewel",
   "file": "jewel",
   "final": "jewel"
   },
   "mon_osd_min_down_reporters": {
   "default": 2,
   "file": 2,
   "final": 2
   },
   "mon_osd_reporter_subtree_level": {
   "default": "host",
   "file": "host",
   "final": "host"
   },
   "ms_client_mode": {
   "default": "crc secure",
   "file": "secure",
   "final": "secure"
   },
   "ms_cluster_mode": {
   "default": "crc secure",
   "file": "secure",
   "final": "secure"
   },
   "ms_mon_client_mode": {
   "default": "secure crc",
   "file": "secure",
   "final": "secure"
   },
   "ms_mon_cluster_mode": {
   "default": "secure crc",
   "file": "secure",
   "final": "secure"
   },
   "ms_mon_service_mode": {
   "default": "secure crc",
   "file": "secure",
   "final": "secure"
   },

   "ms_service_mode": {
   "default": "crc secure",
   "file": "secure",
   "final": "secure"
   },
   "objecter_inflight_ops": {
   "default": 24576,
   "final": 24576
   },
   "osd_backfill_scan_max": {
   "default": 512,
   "file": 16,
   "final": 16
   },
   "osd_backfill_scan_min": {
   "default": 64,
   "file": 8,
   "final": 8
   },
   "osd_deep_scrub_stride": {
   "default": "524288",
   "file": "1048576",
   "final": "1048576"
   },