Hi list,

I'm currently setting up a sync between Ceph and a MinIO cluster to 
continuously replicate the buckets and objects to an offsite location. I 
followed the guide at https://croit.io/blog/setting-up-ceph-cloud-sync-module
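
For context, the setup followed the standard cloud sync module steps from that 
guide, roughly like this (zone/zonegroup names here are illustrative, not my 
exact commands):
------------
# create the secondary zone with the cloud sync tier type
radosgw-admin zone create --rgw-zonegroup=default --rgw-zone=backup --tier-type=cloud
radosgw-admin period update --commit
------------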

After the sync starts, it successfully creates the first bucket, but then it 
tries over and over again to create that same bucket instead of uploading the 
objects themselves. This is from the MinIO logs:

------------
2022-11-21T10:20:55.776 [200 OK] s3.PutBucket [2a02::...]:9000/rgw-default-61727a643fba391a [2a02::...]  1.592ms  ↑ 78 B ↓ 0 B
2022-11-21T10:20:55.778 [409 Conflict] s3.PutBucket [2a02::...]:9000/rgw-default-61727a643fba391a [2a02::...]  649µs  ↑ 78 B ↓ 386 B
repeats over and over again
------------


This is my cloud sync config:
------------
{
    "id": "7185f1a9-f33b-41d3-8906-634ac096d4a9",
    "name": "backup",
    "domain_root": "backup.rgw.meta:root",
    "control_pool": "backup.rgw.control",
    "gc_pool": "backup.rgw.log:gc",
    "lc_pool": "backup.rgw.log:lc",
    "log_pool": "backup.rgw.log",
    "intent_log_pool": "backup.rgw.log:intent",
    "usage_log_pool": "backup.rgw.log:usage",
    "roles_pool": "backup.rgw.meta:roles",
    "reshard_pool": "backup.rgw.log:reshard",
    "user_keys_pool": "backup.rgw.meta:users.keys",
    "user_email_pool": "backup.rgw.meta:users.email",
    "user_swift_pool": "backup.rgw.meta:users.swift",
    "user_uid_pool": "backup.rgw.meta:users.uid",
    "otp_pool": "backup.rgw.otp",
    "system_key": {
        "access_key": "<REDACTED>",
        "secret_key": "<REDACTED>"
    },
    "placement_pools": [
        {
            "key": "default-placement",
            "val": {
                "index_pool": "backup.rgw.buckets.index",
                "storage_classes": {
                    "STANDARD": {
                        "data_pool": "backup.rgw.buckets.data"
                    }
                },
                "data_extra_pool": "backup.rgw.buckets.non-ec",
                "index_type": 0
            }
        }
    ],
    "tier_config": {
        "connection": {
            "access_key": "<REDACTED>",
            "endpoint": "http://[<REDACTED>]:9000",
            "secret": "<REDACTED>"
        }
    },
    "realm_id": "d1e9f0cd-c965-44c6-a4bd-b7704cab9c4e",
    "notif_pool": "backup.rgw.log:notif"
}

------------
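
The tier_config/connection section above was set roughly like this (values 
redacted, key names as in the zone dump above):
------------
radosgw-admin zone modify --rgw-zone=backup \
    --tier-config=connection.endpoint=http://[<REDACTED>]:9000,connection.access_key=<REDACTED>,connection.secret=<REDACTED>
radosgw-admin period update --commit
------------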

This is the sync status:
------------
# radosgw-admin sync status --rgw-zone=backup
          realm d1e9f0cd-c965-44c6-a4bd-b7704cab9c4e (defaultrealm)
      zonegroup cee2848e-368f-45d3-8310-caab37b022a7 (default)
           zone 7185f1a9-f33b-41d3-8906-634ac096d4a9 (backup)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: a14cce61-8951-438f-89f6-4e65637e2941 (default)
                        syncing
                        full sync: 128/128 shards
                        full sync: 3404 buckets to sync
                        incremental sync: 0/128 shards
                        data is behind on 128 shards
                        behind shards: 
[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127]
                        1 shards are recovering
                        recovering shards: [18]
------------
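
For what it's worth, the sync error log and per-bucket status can also be 
inspected; these are the kinds of commands I would use to drill down further 
(the bucket name is a placeholder):
------------
radosgw-admin sync error list --rgw-zone=backup
radosgw-admin bucket sync status --bucket=<some-bucket> --source-zone=default
------------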
This is the output on the receiving RGW:

Nov 21 10:52:15 <hostname> radosgw[510538]: rgw rados thread: AWS: creating bucket rgw-default-61727a643fba391a
Nov 21 10:52:15 <hostname> radosgw[510538]: rgw rados thread: AWS: download begin: z=a14cce61-8951-438f-89f6-4e65637e2941 b=:<REDACTED>[a14cce61-8951-438f-89f6-4e65637e2941.28999987.12]) k=0002cb99-3faa-42e1-a760-364c4ffba982 size=23537 mtime=2022-04-03T14:48:06.449064+0000 etag=0325bf0901634ce13405bea67767b8f4 zone_short_id=0 pg_ver=744888
Nov 21 10:52:15 <hostname> radosgw[510538]: rgw rados thread: AWS: creating bucket rgw-default-61727a643fba391a
Nov 21 10:52:15 <hostname> radosgw[510538]: failed to wait for op, ret=-39: PUT http://[<REDACTED>]:9000/rgw-default-61727a643fba391a
------------

After some retries, the receiving RGW crashes with the following message:
------------
Nov 21 10:52:15 prod-backup-201.ceph.plusline.net radosgw[510538]: *** Caught signal (Segmentation fault) **
 in thread 7f8cff660700 thread_name:data-sync

 ceph version 17.2.1 (ec95624474b1871a821a912b8c3af68f8f8e7aa1) quincy (stable)
 1: /lib64/libpthread.so.0(+0x12ce0) [0x7f8d29e6ece0]
 2: /lib64/libc.so.6(+0xcfd02) [0x7f8d28540d02]
 3: (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_assign(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x95) [0x7f8d2cae2ef5]
 4: (RGWAWSStreamObjToCloudPlainCR::operate(DoutPrefixProvider const*)+0x255) [0x7f8d2ce581f5]
 5: (RGWCoroutinesStack::operate(DoutPrefixProvider const*, RGWCoroutinesEnv*)+0x15c) [0x7f8d2cefe03c]
 6: (RGWCoroutinesManager::run(DoutPrefixProvider const*, std::__cxx11::list<RGWCoroutinesStack*, std::allocator<RGWCoroutinesStack*> >&)+0x296) [0x7f8d2cefee66]
 7: (RGWCoroutinesManager::run(DoutPrefixProvider const*, RGWCoroutine*)+0x91) [0x7f8d2cf00131]
 8: (RGWRemoteDataLog::run_sync(DoutPrefixProvider const*, int)+0x1b4) [0x7f8d2cdef104]
 9: (RGWDataSyncProcessorThread::process(DoutPrefixProvider const*)+0x59) [0x7f8d2cfcfde9]
 10: (RGWRadosThread::Worker::entry()+0x13a) [0x7f8d2cf8e30a]
 11: /lib64/libpthread.so.0(+0x81ca) [0x7f8d29e641ca]
 12: clone()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
------------

I can also see the objects being requested successfully on the originating 
RGW servers. Any ideas about the root cause would be welcome.
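
If more detail would help, I can rerun with higher RGW debug levels on the 
backup zone's gateway, e.g. something like:
------------
ceph config set client.rgw debug_rgw 20
ceph config set client.rgw debug_ms 1
------------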

Best regards,
Matthias