Hi all,
When the object store service is accessed under high traffic, a PG going down causes
rgw connections to be tied up by requests for objects in the failed PG, until rgw
becomes unavailable for new connections.
Is there a workaround for this behavior?

system-graph (repost)
                                           default.rgw.buckets.data (4k+2m, size=6, min_size=5)
+---+           +----+           +-----+    +-pg1----------------------------+
USER+-s3cmd--->-+RGW +-get Obj->-+ceph +--+-+*osd1,*osd2,*osd3,osd4,osd5,osd6| DOWN
+---+           +----+           +-----+  | +--------------------------------+
               x100 connection pool       |
               (100 workers)              | +-pg2----------------------------+
                                          +-+osd7,osd8,osd9,osd10,osd11,osd12| ACTIVE
                                            +--------------------------------+
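
For reference, the pool layout above can be confirmed with something like the following
(the profile name depends on the setup):

  ceph osd pool ls detail | grep default.rgw.buckets.data
  ceph osd erasure-code-profile get <ec-profile>    # should show k=4, m=2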

PROBLEM CONDITION
 *The number of rgw connections is limited by the default num_threads value of 100.
  (rgw can handle 100 simultaneous I/O requests.)
 *pg1 goes down due to OSD failures (osd.1, osd.2, osd.3).
 *A get request for an object in the down PG blocks until the PG is restored.
 *Each such blocked request occupies one rgw connection.
  This repeats with every new request that maps to pg1.
 *Eventually all num_threads (default=100) workers are blocked, and clients can no
  longer connect to rgw (see the check below).
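
As far as I can tell, the blocked requests can be seen on the rgw admin socket with
something like this (the socket path and daemon name are just examples from my
environment):

  # in-flight librados ops held by the rgw process; reads against pg1 stay queued here
  ceph daemon /var/run/ceph/ceph-client.rgw.gw1.asok objecter_requests

  # rgw perf counters (e.g. qlen / qactive) show the worker pool filling up
  ceph daemon /var/run/ceph/ceph-client.rgw.gw1.asok perf dump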

PROBLEM
I think the problem is that a single down PG ends up taking all of the rgw connections.
I tried increasing num_threads (rgw_thread_pool_size) and deploying multiple rgw
instances, but neither was useful in a high-traffic environment.
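
For reference, the changes I tried were along these lines (section name and values
are just examples):

  [client.rgw.gw1]
      # larger worker pool; this only delays the point where everything is blocked
      rgw_thread_pool_size = 512
      rgw_frontends = beast port=8080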

Since rgw just issues the read to ceph, nothing in the request path knows that pg1
is down.
Is there any way to detect the down PG earlier and abort the get Obj request instead
of letting it block?
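
For example, would a client-side timeout such as rados_osd_op_timeout in the rgw
client section be a reasonable workaround? Roughly this (the value is only an
example, and I have not verified the side effects):

  [client.rgw.gw1]
      # make librados ops that cannot be serviced fail after 30s
      # instead of blocking an rgw worker until the PG recovers
      rados_osd_op_timeout = 30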

-Tsuyoshi.
