[ceph-users] Re: Client kernel crashes on cephfs access

2024-04-08 Thread Xiubo Li

Hi Marc,

Thanks for reporting this. I have generated a patch to fix it and will send it 
out once testing is done.


- Xiubo

On 4/8/24 16:01, Marc Ruhmann wrote:

Hi everyone,

I would like to ask for help regarding client kernel crashes that happen
on cephfs access. We have been struggling with this for over a month now
with over 100 crashes on 7 hosts during that time.

Our cluster runs version 18.2.1. Our clients run CentOS Stream.

On CentOS Stream 9 the problem started with kernel version
5.14.0-425.el9. Version 5.14.0-419.el9 is the last one without problems.
It also occurred on CentOS Stream 8, starting with version
4.18.0-546.el8 (4.18.0-544.el8 being the last good one).

The problem presents itself as the client kernel crashing, forcing a
reboot of the machine. Apparently it is triggered by a certain level of
I/O on the cephfs mount. Everything works perfectly fine when we roll
back to the last good kernel version.
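
Until a fixed kernel is available, one mitigation is to pin the last known
good kernel so routine updates do not reinstall an affected one. The sketch
below assumes a dnf-based system with the versionlock plugin; the package
versions are the ones reported above:

```shell
# Hedged sketch: pin the last good kernel on CentOS Stream 9.
# Assumes the dnf versionlock plugin; adjust the version for el8 hosts.
dnf install -y python3-dnf-plugin-versionlock
dnf versionlock add 'kernel-5.14.0-419.el9*'

# Make sure the good kernel is the default boot entry:
grubby --set-default /boot/vmlinuz-5.14.0-419.el9.x86_64
```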

The exact call trace in vmcore-dmesg.txt differs between occurrences.
Here are two typical examples:

```
[ 8641.382499] list_del corruption. next->prev should be 
88bd0a4d4c80, but was 88bcefdfd280

[ 8641.382521] ------------[ cut here ]------------
[ 8641.382521] kernel BUG at lib/list_debug.c:54!
[ 8641.382528] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 8641.382591] CPU: 2 PID: 83929 Comm: kworker/2:0 Kdump: loaded Not 
tainted 5.14.0-432.el9.x86_64 #1
[ 8641.382610] Hardware name: oVirt RHEL/RHEL-AV, BIOS 
edk2-20230524-4.el9_3 05/24/2023

[ 8641.382624] Workqueue: ceph-cap ceph_cap_unlink_work [ceph]
[ 8641.382662] RIP: 0010:__list_del_entry_valid.cold+0x1d/0x47
[ 8641.382681] Code: c7 c7 78 42 d8 b1 e8 f9 87 fe ff 0f 0b 48 89 fe 
48 c7 c7 08 43 d8 b1 e8 e8 87 fe ff 0f 0b 48 c7 c7 b8 43 d8 b1 e8 da 
87 fe ff <0f> 0b 48 89 f2 48 89 fe 48 c7 c7 78 43 d8 b1 e8 c6 87 fe ff 
0f 0b

[ 8641.382711] RSP: 0018:95a000d6be60 EFLAGS: 00010246
[ 8641.382722] RAX: 0054 RBX: 88bced76dc00 RCX: 

[ 8641.382734] RDX:  RSI: 88c02eea0840 RDI: 
88c02eea0840
[ 8641.382746] RBP: 88bd0a4d4c80 R08: 80008434 R09: 
0010
[ 8641.382758] R10: 000f R11: 000f R12: 
88c02eeb2800
[ 8641.382779] R13: 88bcc4610258 R14: 88bcc46101b8 R15: 
88bcc46101c8
[ 8641.382793] FS:  () GS:88c02ee8() 
knlGS:

[ 8641.382809] CS:  0010 DS:  ES:  CR0: 80050033
[ 8641.382819] CR2: 7f35cee8a000 CR3: 000105708004 CR4: 
007706e0

[ 8641.382832] PKRU: 5554
[ 8641.382838] Call Trace:
[ 8641.382844]  
[ 8641.382850]  ? show_trace_log_lvl+0x1c4/0x2df
[ 8641.382860]  ? show_trace_log_lvl+0x1c4/0x2df
[ 8641.382870]  ? ceph_cap_unlink_work+0x3f/0x140 [ceph]
[ 8641.382893]  ? __die_body.cold+0x8/0xd
[ 8641.382902]  ? die+0x2b/0x50
[ 8641.382911]  ? do_trap+0xce/0x120
[ 8641.382919]  ? __list_del_entry_valid.cold+0x1d/0x47
[ 8641.382930]  ? do_error_trap+0x65/0x80
[ 8641.382938]  ? __list_del_entry_valid.cold+0x1d/0x47
[ 8641.382948]  ? exc_invalid_op+0x4e/0x70
[ 8641.382958]  ? __list_del_entry_valid.cold+0x1d/0x47
[ 8641.382975]  ? asm_exc_invalid_op+0x16/0x20
[ 8641.382988]  ? __list_del_entry_valid.cold+0x1d/0x47
[ 8641.382998]  ceph_cap_unlink_work+0x3f/0x140 [ceph]
[ 8641.383021]  process_one_work+0x1e2/0x3b0
[ 8641.383032]  ? __pfx_worker_thread+0x10/0x10
[ 8641.383043]  worker_thread+0x50/0x3a0
[ 8641.383051]  ? __pfx_worker_thread+0x10/0x10
[ 8641.383061]  kthread+0xdd/0x100
[ 8641.383069]  ? __pfx_kthread+0x10/0x10
[ 8641.383078]  ret_from_fork+0x29/0x50
[ 8641.383090]  
[ 8641.383095] Modules linked in: tls ceph libceph dns_resolver 
fscache netfs nft_counter ipt_REJECT xt_owner xt_conntrack nft_compat 
nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet 
nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat 
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables 
libcrc32c nfnetlink vfat fat intel_rapl_msr intel_rapl_common 
intel_uncore_frequency_common isst_if_common nfit virtio_gpu iTCO_wdt 
iTCO_vendor_support libnvdimm lpc_ich virtio_dma_buf drm_shmem_helper 
drm_kms_helper i2c_i801 rapl syscopyarea sysfillrect sysimgblt 
virtio_balloon fb_sys_fops i2c_smbus pcspkr joydev fuse drm ext4 
mbcache jbd2 sr_mod cdrom sd_mod ahci t10_pi sg libahci 
crct10dif_pclmul crc32_pclmul crc32c_intel libata ghash_clmulni_intel 
virtio_net virtio_console virtio_scsi net_failover failover serio_raw

```

```
[ 3538.365469] list_del corruption. next->prev should be 
8d2b75997c80, but was 8d2afcfaae80

[ 3538.365488] ------------[ cut here ]------------
[ 3538.365488] kernel BUG at lib/list_debug.c:54!
[ 3538.365493] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 3538.365553] CPU: 0 PID: 910 Comm: php-fpm Kdump: loaded Not tainted 
5.14.0-432.el9.x86_64 #1
[ 3538.365569] Hardware name: oVirt RHEL/RHEL-AV, BIOS 
edk2-20230524-4.el9_3 05/24/2023

[ 3538.365582] RIP: 

[ceph-users] Re: Ceph Leadership Team Meeting, 2024-04-08

2024-04-08 Thread Satoru Takeuchi
2024年4月9日(火) 8:06 Laura Flores :

> I've added them!
>

Thank you very much for your quick response!


>
> cc @Yuri Weinstein 
>
> On Mon, Apr 8, 2024 at 5:39 PM Satoru Takeuchi 
> wrote:
>
>> 2024年4月9日(火) 0:43 Laura Flores :
>>
>>> Hi all,
>>>
>>> Today we discussed:
>>>
>>> 2024/04/08
>>>
>>>- [Zac] CQ#4 is going out this week -
>>>https://pad.ceph.com/p/ceph_quarterly_2024_04
>>>
>>>
>>>- Last chance to review!
>>>
>>>
>>>- [Zac] IcePic Initiative - context-sensitive help - do we regard
>>>the docs as a part of the online help?
>>>
>>>
>>>- https://pad.ceph.com/p/2024_04_08_cephadm_context_sensitive_help
>>>
>>>
>>>- docs.ceph.com should be main source of truth; can link to this or
>>>reference it generally as "see docs.ceph.com"
>>>
>>>
>>>- Squid RC status
>>>
>>>
>>>- Blockers tracked in: https://pad.ceph.com/p/squid-upgrade-failures
>>>
>>>
>>>- rgw: topic changes merged to main, but introduced some test
>>>failures. account changes blocked on topics
>>>
>>>
>>>- Non-blocker for RC0
>>>
>>>
>>>- centos 9 containerization (status unknown?)
>>>
>>>
>>>- Non-blocker for RC0
>>>
>>>
>>>- Follow up with Dan / Guillaume
>>>
>>>
>>>- RADOS has one outstanding blocker awaiting QA
>>>
>>>
>>>- Failing to register new account at Ceph tracker - error 404.
>>>
>>>
>>>- Likely related to Redmine upgrade over the weekend
>>>
>>>
>>>- Pacific eol:
>>>
>>>
>>>- Action item: in https://docs.ceph.com/en/latest/releases/, move to
>>>"archived"
>>>
>>>
>>>- 18.2.3
>>>
>>>
>>>- one or two PRs from cephfs left
>>>
>>>
>>>- Milestone: https://github.com/ceph/ceph/milestone/19
>>>
>>>
>> Could you add the following PRs to 18.2.3 milestone? Without these PRs,
>> debian package users
>> can't use metrics from ceph-exporter at all.
>>
>> reef: debian: add ceph-exporter package #56541
>> https://github.com/ceph/ceph/pull/56541
>>
>> reef: debian: add missing bcrypt to ceph-mgr .requires to fix resulting
>> package dependencies #54662
>> https://github.com/ceph/ceph/pull/54662
>>
>
>>
>> Thanks,
>> Satoru
>>
>>
>>>
>>> Thanks,
>>> Laura
>>> --
>>>
>>> Laura Flores
>>>
>>> She/Her/Hers
>>>
>>> Software Engineer, Ceph Storage 
>>>
>>> Chicago, IL
>>>
>>> lflo...@ibm.com | lflo...@redhat.com 
>>> M: +17087388804
>>>
>>>
>>> ___
>>> Dev mailing list -- d...@ceph.io
>>> To unsubscribe send an email to dev-le...@ceph.io
>>>
>>
>
> --
>
> Laura Flores
>
> She/Her/Hers
>
> Software Engineer, Ceph Storage 
>
> Chicago, IL
>
> lflo...@ibm.com | lflo...@redhat.com 
> M: +17087388804
>
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Leadership Team Meeting, 2024-04-08

2024-04-08 Thread Laura Flores
I've added them!

cc @Yuri Weinstein 

On Mon, Apr 8, 2024 at 5:39 PM Satoru Takeuchi 
wrote:

> 2024年4月9日(火) 0:43 Laura Flores :
>
>> Hi all,
>>
>> Today we discussed:
>>
>> 2024/04/08
>>
>>- [Zac] CQ#4 is going out this week -
>>https://pad.ceph.com/p/ceph_quarterly_2024_04
>>
>>
>>- Last chance to review!
>>
>>
>>- [Zac] IcePic Initiative - context-sensitive help - do we regard the
>>docs as a part of the online help?
>>
>>
>>- https://pad.ceph.com/p/2024_04_08_cephadm_context_sensitive_help
>>
>>
>>- docs.ceph.com should be main source of truth; can link to this or
>>reference it generally as "see docs.ceph.com"
>>
>>
>>- Squid RC status
>>
>>
>>- Blockers tracked in: https://pad.ceph.com/p/squid-upgrade-failures
>>
>>
>>- rgw: topic changes merged to main, but introduced some test
>>failures. account changes blocked on topics
>>
>>
>>- Non-blocker for RC0
>>
>>
>>- centos 9 containerization (status unknown?)
>>
>>
>>- Non-blocker for RC0
>>
>>
>>- Follow up with Dan / Guillaume
>>
>>
>>- RADOS has one outstanding blocker awaiting QA
>>
>>
>>- Failing to register new account at Ceph tracker - error 404.
>>
>>
>>- Likely related to Redmine upgrade over the weekend
>>
>>
>>- Pacific eol:
>>
>>
>>- Action item: in https://docs.ceph.com/en/latest/releases/, move to
>>"archived"
>>
>>
>>- 18.2.3
>>
>>
>>- one or two PRs from cephfs left
>>
>>
>>- Milestone: https://github.com/ceph/ceph/milestone/19
>>
>>
> Could you add the following PRs to 18.2.3 milestone? Without these PRs,
> debian package users
> can't use metrics from ceph-exporter at all.
>
> reef: debian: add ceph-exporter package #56541
> https://github.com/ceph/ceph/pull/56541
>
> reef: debian: add missing bcrypt to ceph-mgr .requires to fix resulting
> package dependencies #54662
> https://github.com/ceph/ceph/pull/54662
>
> Thanks,
> Satoru
>
>
>>
>> Thanks,
>> Laura
>> --
>>
>> Laura Flores
>>
>> She/Her/Hers
>>
>> Software Engineer, Ceph Storage 
>>
>> Chicago, IL
>>
>> lflo...@ibm.com | lflo...@redhat.com 
>> M: +17087388804
>>
>>
>> ___
>> Dev mailing list -- d...@ceph.io
>> To unsubscribe send an email to dev-le...@ceph.io
>>
>

-- 

Laura Flores

She/Her/Hers

Software Engineer, Ceph Storage 

Chicago, IL

lflo...@ibm.com | lflo...@redhat.com 
M: +17087388804
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Leadership Team Meeting, 2024-04-08

2024-04-08 Thread Satoru Takeuchi
2024年4月9日(火) 0:43 Laura Flores :

> Hi all,
>
> Today we discussed:
>
> 2024/04/08
>
>- [Zac] CQ#4 is going out this week -
>https://pad.ceph.com/p/ceph_quarterly_2024_04
>
>
>- Last chance to review!
>
>
>- [Zac] IcePic Initiative - context-sensitive help - do we regard the
>docs as a part of the online help?
>
>
>- https://pad.ceph.com/p/2024_04_08_cephadm_context_sensitive_help
>
>
>- docs.ceph.com should be main source of truth; can link to this or
>reference it generally as "see docs.ceph.com"
>
>
>- Squid RC status
>
>
>- Blockers tracked in: https://pad.ceph.com/p/squid-upgrade-failures
>
>
>- rgw: topic changes merged to main, but introduced some test
>failures. account changes blocked on topics
>
>
>- Non-blocker for RC0
>
>
>- centos 9 containerization (status unknown?)
>
>
>- Non-blocker for RC0
>
>
>- Follow up with Dan / Guillaume
>
>
>- RADOS has one outstanding blocker awaiting QA
>
>
>- Failing to register new account at Ceph tracker - error 404.
>
>
>- Likely related to Redmine upgrade over the weekend
>
>
>- Pacific eol:
>
>
>- Action item: in https://docs.ceph.com/en/latest/releases/, move to
>"archived"
>
>
>- 18.2.3
>
>
>- one or two PRs from cephfs left
>
>
>- Milestone: https://github.com/ceph/ceph/milestone/19
>
>
Could you add the following PRs to 18.2.3 milestone? Without these PRs,
debian package users
can't use metrics from ceph-exporter at all.

reef: debian: add ceph-exporter package #56541
https://github.com/ceph/ceph/pull/56541

reef: debian: add missing bcrypt to ceph-mgr .requires to fix resulting
package dependencies #54662
https://github.com/ceph/ceph/pull/54662

Thanks,
Satoru


>
> Thanks,
> Laura
> --
>
> Laura Flores
>
> She/Her/Hers
>
> Software Engineer, Ceph Storage 
>
> Chicago, IL
>
> lflo...@ibm.com | lflo...@redhat.com 
> M: +17087388804
>
>
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Dashboard and Object Gateway

2024-04-08 Thread Lawson, Nathan
I am running into this issue as well with a cephadm Reef deployment. Are there 
any updates?


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RGW/Lua script does not show logs

2024-04-08 Thread soyoon . lee
Hello, I wrote a Lua script in order to retrieve RGW log fields such as the 
bucket name, bucket owner, etc.
However, when I apply the script with the command below, I do not see any log 
lines starting with "Lua INFO:":

radosgw-admin script put --infile=/usr/tmp/testPreRequest.lua 
--context=postrequest


function print_bucket_log(bucket)
  RGWDebugLog("  Name: " .. bucket.Name)
end

if Request.Bucket then
  RGWDebugLog("bucket operation logs: ")
  print_bucket_log(Request.Bucket)
end


According to the official documentation on Lua scripting:
"The RGWDebugLog() function accepts a string and prints it to the debug log 
with priority 20. Each log message is prefixed Lua INFO:. This function has no 
return value."
Yet even though I set debug_rgw = 20, I do not see any logs.

However, when I apply the below Lua script, which also references bucket.Id, I 
get a Lua ERROR like this:
Lua ERROR: [string "function print_bucket_log(bucket)..."]:3: attempt to 
concatenate a nil value (field 'Id')


function print_bucket_log(bucket)
  RGWDebugLog("  Name: " .. bucket.Name)
  RGWDebugLog("  Id: " .. bucket.Id)
end

if Request.Bucket then
  RGWDebugLog("bucket operation logs: ")
  print_bucket_log(Request.Bucket)
end
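
The nil error above suggests that not every request context populates every
bucket field, so bucket.Id can be nil and Lua refuses to concatenate it. A
defensive variant of the script (a sketch; the "<nil>" placeholder string is
my own choice, not from the RGW docs) avoids the error:

```lua
-- Guard against nil fields before concatenating.
-- bucket.Id may be nil for some requests, so substitute a placeholder
-- instead of triggering "attempt to concatenate a nil value".
function print_bucket_log(bucket)
  RGWDebugLog("  Name: " .. tostring(bucket.Name))
  RGWDebugLog("  Id: " .. (bucket.Id or "<nil>"))
end

if Request.Bucket then
  RGWDebugLog("bucket operation logs: ")
  print_bucket_log(Request.Bucket)
end
```

Note also that the RGWDebugLog output only appears once debug_rgw is raised
to 20 on the running radosgw process itself, so it may be worth confirming
the setting actually took effect there.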


Any help would be very appreciated!
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] feature_map differs across mon_status

2024-04-08 Thread Joel Davidow
Just curious why the feature_map portion of the mon_status output differs 
across mons in a cluster. Below is an example from each of the five mons in a 
healthy 16.2.10 cephadm cluster:

root@mon.d:~# ceph tell mon.a mon_status | jq .feature_map
{
  "mon": [
{
  "features": "0x3f01cfb9fffd",
  "release": "luminous",
  "num": 1
}
  ],
  "osd": [
{
  "features": "0x3f01cfb9fffd",
  "release": "luminous",
  "num": 42
}
  ],
  "client": [
{
  "features": "0x3f01cfb9fffd",
  "release": "luminous",
  "num": 1
}
  ],
  "mgr": [
{
  "features": "0x3f01cfb9fffd",
  "release": "luminous",
  "num": 1
}
  ]
}
root@mon.d:~# ceph tell mon.b mon_status | jq .feature_map
{
  "mon": [
{
  "features": "0x3f01cfb9fffd",
  "release": "luminous",
  "num": 1
}
  ],
  "osd": [
{
  "features": "0x3f01cfb9fffd",
  "release": "luminous",
  "num": 36
}
  ],
  "client": [
{
  "features": "0x3f01cfb9fffd",
  "release": "luminous",
  "num": 2
}
  ]
}
root@mon.d:~# ceph tell mon.c mon_status | jq .feature_map
{
  "mon": [
{
  "features": "0x3f01cfb9fffd",
  "release": "luminous",
  "num": 1
}
  ],
  "osd": [
{
  "features": "0x3f01cfb9fffd",
  "release": "luminous",
  "num": 81
}
  ],
  "client": [
{
  "features": "0x3f01cfb9fffd",
  "release": "luminous",
  "num": 4
}
  ],
  "mgr": [
{
  "features": "0x3f01cfb9fffd",
  "release": "luminous",
  "num": 2
}
  ]
}
root@mon.d:~# ceph tell mon.d mon_status | jq .feature_map
{
  "mon": [
{
  "features": "0x3f01cfb9fffd",
  "release": "luminous",
  "num": 1
}
  ],
  "osd": [
{
  "features": "0x3f01cfb9fffd",
  "release": "luminous",
  "num": 112
}
  ],
  "client": [
{
  "features": "0x3f01cfb9fffd",
  "release": "luminous",
  "num": 7
}
  ]
}
root@mon.d:~# ceph tell mon.e mon_status | jq .feature_map
{
  "mon": [
{
  "features": "0x3f01cfb9fffd",
  "release": "luminous",
  "num": 1
}
  ],
  "osd": [
{
  "features": "0x3f01cfb9fffd",
  "release": "luminous",
  "num": 88
}
  ],
  "client": [
{
  "features": "0x3f01cfb9fffd",
  "release": "luminous",
  "num": 4
}
  ]
}
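
A likely explanation for the differences is that each mon's feature_map only
describes the sessions currently connected to that mon, so the per-type
counts (and which daemon types appear at all, e.g. mgr) vary with where
daemons happen to be connected. A small sketch of comparing the maps while
ignoring the connection counts (the JSON shapes mirror the output above; the
specific values are illustrative):

```python
import json

def feature_set(feature_map: dict) -> set:
    """Reduce a feature_map to (daemon_type, features, release) tuples,
    dropping the per-mon connection counts ("num")."""
    return {
        (dtype, entry["features"], entry["release"])
        for dtype, entries in feature_map.items()
        for entry in entries
    }

# Two mons' feature_map outputs: the client counts differ (7 vs 2),
# but the advertised features are identical.
mon_a = json.loads('{"mon": [{"features": "0x3f01cfb9fffd",'
                   ' "release": "luminous", "num": 1}],'
                   ' "client": [{"features": "0x3f01cfb9fffd",'
                   ' "release": "luminous", "num": 7}]}')
mon_b = json.loads('{"mon": [{"features": "0x3f01cfb9fffd",'
                   ' "release": "luminous", "num": 1}],'
                   ' "client": [{"features": "0x3f01cfb9fffd",'
                   ' "release": "luminous", "num": 2}]}')

print(feature_set(mon_a) == feature_set(mon_b))  # True: only "num" differed
```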
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: DB/WALL and RGW index on the same NVME

2024-04-08 Thread Lukasz Borek
>
> My understanding is that omap and EC are incompatible, though.

Is that the reason why multipart upload uses a non-EC pool to save its
metadata to an omap database?




On Mon, 8 Apr 2024 at 20:21, Anthony D'Atri  wrote:

> My understanding is that omap and EC are incompatible, though.
>
> > On Apr 8, 2024, at 09:46, David Orman  wrote:
> >
> > I would suggest that you might consider EC vs. replication for index
> data, and the latency implications. There's more than just the nvme vs.
> rotational discussion to entertain, especially if using the more widely
> spread EC modes like 8+3. It would be worth testing for your particular
> workload.
> >
> > Also make sure to factor in storage utilization if you expect to see
> versioning/object lock in use. This can be the source of a significant
> amount of additional consumption that isn't planned for initially.
> >
> > On Mon, Apr 8, 2024, at 01:42, Daniel Parkes wrote:
> >> Hi Lukasz,
> >>
> >> RGW uses Omap objects for the index pool; Omaps are stored in Rocksdb
> >> database of each osd, not on the actual index pool, so by putting
> DB/WALL
> >> on an NVMe as you mentioned, you are already configuring the index pool
> on
> >> a non-rotational drive, you don't need to do anything else.
> >>
> >> You just need to size your DB/WALL partition accordingly. For RGW/object
> >> storage, a good starting point for the DB/Wall sizing is 4%.
> >>
> >> Example of Omap entries in the index pool using 0 bytes, as they are
> stored
> >> in Rocksdb:
> >>
> >> # rados -p default.rgw.buckets.index listomapkeys
> >> .dir.7fb0a3df-9553-4a76-938d-d23711e67677.34162.1.2
> >> file1
> >> file2
> >> file4
> >> file10
> >>
> >> rados df -p default.rgw.buckets.index
> >> POOL_NAME  USED  OBJECTS  CLONES  COPIES
> >> MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS   RD  WR_OPS  WR
> >> USED COMPR  UNDER COMPR
> >> default.rgw.buckets.index   0 B   11   0  33
> >>00 0 208  207 KiB  41  20 KiB 0 B
> >>0 B
> >>
> >> # rados -p default.rgw.buckets.index stat
> >> .dir.7fb0a3df-9553-4a76-938d-d23711e67677.34162.1.2
> >>
> default.rgw.buckets.index/.dir.7fb0a3df-9553-4a76-938d-d23711e67677.34162.1.2
> >> mtime 2022-12-20T07:32:11.00-0500, size 0
> >>
> >>
> >> On Sun, Apr 7, 2024 at 10:06 PM Lukasz Borek 
> wrote:
> >>
> >>> Hi!
> >>>
> >>> I'm working on a POC cluster setup dedicated to backup app writing
> objects
> >>> via s3 (large objects, up to 1TB transferred via multipart upload
> process).
> >>>
> >>> Initial setup is 18 storage nodes (12HDDs + 1 NVME card for DB/WALL) +
> EC
> >>> pool.  Plan is to use cephadm.
> >>>
> >>> I'd like to follow good practice and put the RGW index pool on a
> >>> no-rotation drive. Question is how to do it?
> >>>
> >>>   - replace a few HDDs (1 per node) with a SSD (how many? 4-6-8?)
> >>>   - reserve space on NVME drive on each node, create lv based OSD and
> let
> >>>   rgb index use the same NVME drive as DB/WALL
> >>>
> >>> Thoughts?
> >>>
> >>> --
> >>> Lukasz
> >>> ___
> >>> ceph-users mailing list -- ceph-users@ceph.io
> >>> To unsubscribe send an email to ceph-users-le...@ceph.io
> >>>
> >>>
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>

-- 
Łukasz Borek
luk...@borek.org.pl
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: DB/WALL and RGW index on the same NVME

2024-04-08 Thread Anthony D'Atri
My understanding is that omap and EC are incompatible, though.

> On Apr 8, 2024, at 09:46, David Orman  wrote:
> 
> I would suggest that you might consider EC vs. replication for index data, 
> and the latency implications. There's more than just the nvme vs. rotational 
> discussion to entertain, especially if using the more widely spread EC modes 
> like 8+3. It would be worth testing for your particular workload.
> 
> Also make sure to factor in storage utilization if you expect to see 
> versioning/object lock in use. This can be the source of a significant amount 
> of additional consumption that isn't planned for initially.
> 
> On Mon, Apr 8, 2024, at 01:42, Daniel Parkes wrote:
>> Hi Lukasz,
>> 
>> RGW uses Omap objects for the index pool; Omaps are stored in Rocksdb
>> database of each osd, not on the actual index pool, so by putting DB/WALL
>> on an NVMe as you mentioned, you are already configuring the index pool on
>> a non-rotational drive, you don't need to do anything else.
>> 
>> You just need to size your DB/WALL partition accordingly. For RGW/object
>> storage, a good starting point for the DB/Wall sizing is 4%.
>> 
>> Example of Omap entries in the index pool using 0 bytes, as they are stored
>> in Rocksdb:
>> 
>> # rados -p default.rgw.buckets.index listomapkeys
>> .dir.7fb0a3df-9553-4a76-938d-d23711e67677.34162.1.2
>> file1
>> file2
>> file4
>> file10
>> 
>> rados df -p default.rgw.buckets.index
>> POOL_NAME  USED  OBJECTS  CLONES  COPIES
>> MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS   RD  WR_OPS  WR
>> USED COMPR  UNDER COMPR
>> default.rgw.buckets.index   0 B   11   0  33
>>00 0 208  207 KiB  41  20 KiB 0 B
>>0 B
>> 
>> # rados -p default.rgw.buckets.index stat
>> .dir.7fb0a3df-9553-4a76-938d-d23711e67677.34162.1.2
>> default.rgw.buckets.index/.dir.7fb0a3df-9553-4a76-938d-d23711e67677.34162.1.2
>> mtime 2022-12-20T07:32:11.00-0500, size 0
>> 
>> 
>> On Sun, Apr 7, 2024 at 10:06 PM Lukasz Borek  wrote:
>> 
>>> Hi!
>>> 
>>> I'm working on a POC cluster setup dedicated to backup app writing objects
>>> via s3 (large objects, up to 1TB transferred via multipart upload process).
>>> 
>>> Initial setup is 18 storage nodes (12HDDs + 1 NVME card for DB/WALL) + EC
>>> pool.  Plan is to use cephadm.
>>> 
>>> I'd like to follow good practice and put the RGW index pool on a
>>> no-rotation drive. Question is how to do it?
>>> 
>>>   - replace a few HDDs (1 per node) with a SSD (how many? 4-6-8?)
>>>   - reserve space on NVME drive on each node, create lv based OSD and let
>>>   rgb index use the same NVME drive as DB/WALL
>>> 
>>> Thoughts?
>>> 
>>> --
>>> Lukasz
>>> ___
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>> 
>>> 
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph Leadership Team Meeting, 2024-04-08

2024-04-08 Thread Laura Flores
Hi all,

Today we discussed:

2024/04/08

   - [Zac] CQ#4 is going out this week -
   https://pad.ceph.com/p/ceph_quarterly_2024_04


   - Last chance to review!


   - [Zac] IcePic Initiative - context-sensitive help - do we regard the
   docs as a part of the online help?


   - https://pad.ceph.com/p/2024_04_08_cephadm_context_sensitive_help


   - docs.ceph.com should be main source of truth; can link to this or
   reference it generally as "see docs.ceph.com"


   - Squid RC status


   - Blockers tracked in: https://pad.ceph.com/p/squid-upgrade-failures


   - rgw: topic changes merged to main, but introduced some test failures.
   account changes blocked on topics


   - Non-blocker for RC0


   - centos 9 containerization (status unknown?)


   - Non-blocker for RC0


   - Follow up with Dan / Guillaume


   - RADOS has one outstanding blocker awaiting QA


   - Failing to register new account at Ceph tracker - error 404.


   - Likely related to Redmine upgrade over the weekend


   - Pacific eol:


   - Action item: in https://docs.ceph.com/en/latest/releases/, move to
   "archived"


   - 18.2.3


   - one or two PRs from cephfs left


   - Milestone: https://github.com/ceph/ceph/milestone/19


Thanks,
Laura
-- 

Laura Flores

She/Her/Hers

Software Engineer, Ceph Storage 

Chicago, IL

lflo...@ibm.com | lflo...@redhat.com 
M: +17087388804
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Regarding write on CephFS - Operation not permitted

2024-04-08 Thread elite_stu
Dear all,
The CephFS mount is successful; however, I encountered an issue writing files, 
with the error message "Operation not permitted".
It happens even when the file is chmod 777. Please help me solve this, thanks 
a lot!


[root@vm-04 mycephfs]# df -Th
Filesystem Type  Size  Used 
Avail Use% Mounted on
/dev/mapper/centos-rootxfs46G  1.3G 
  44G   3% /
devtmpfs   devtmpfs  1.9G 0 
 1.9G   0% /dev
tmpfs  tmpfs 1.9G 0 
 1.9G   0% /dev/shm
tmpfs  tmpfs 1.9G  8.5M 
 1.9G   1% /run
tmpfs  tmpfs 1.9G 0 
 1.9G   0% /sys/fs/cgroup
/dev/vda1  xfs  1014M  145M 
 870M  15% /boot
tmpfs  tmpfs 379M 0 
 379M   0% /run/user/0
192.168.100.7:6789,192.168.100.8:6789,192.168.100.9:6789:/ ceph   19G 0 
  19G   0% /mnt/mycephfs
[root@vm-04 mycephfs]#
[root@vm-04 mycephfs]# pwd
/mnt/mycephfs
[root@vm-04 mycephfs]#
[root@vm-04 mycephfs]# ls -rlt
total 0
-rw-r--r-- 1 root root 0 Apr  8 20:44 aa.txt
-rw-r--r-- 1 root root 0 Apr  8 20:54 bb.txt
[root@vm-04 mycephfs]# cp aa.txt cc.txt
cp: error reading ‘aa.txt’: Operation not permitted
cp: failed to extend ‘cc.txt’: Operation not permitted
[root@vm-04 mycephfs]#
[root@vm-04 mycephfs]# echo "123" > aa.txt
-bash: echo: write error: Operation not permitted
[root@vm-04 mycephfs]#
[root@vm-04 mycephfs]#
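
Since creating empty files succeeds but writing data fails, one common cause
is a client key whose MDS caps allow the metadata operations but whose OSD
caps do not grant write access to the data pool (file data goes to the OSDs,
so POSIX mode bits like 777 never enter into it). A sketch of checking and
fixing the caps; the key name "client.foo" and pool name "cephfs_data" are
assumptions, substitute your own:

```
# Inspect the caps of the key used for the mount:
ceph auth get client.foo

# Grant read/write on the filesystem and on the data pool:
ceph auth caps client.foo \
    mds 'allow rw' \
    mon 'allow r' \
    osd 'allow rw pool=cephfs_data'
```

After updating the caps, remount the filesystem so the client picks up the
new permissions.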
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: DB/WALL and RGW index on the same NVME

2024-04-08 Thread Daniel Parkes
Hi,

Yes, the documentation you are linking is from Red Hat Ceph Storage 3.x with
Filestore. With BlueStore this is no longer the case; the latest Red Hat doc
version is here:

https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/7/html-single/object_gateway_guide/index#index-pool_rgw

I see they have this block of text there:

"For Red Hat Ceph Storage running Bluestore, Red Hat recommends deploying
an NVMe drive as a block.db device, rather than as a separate pool.
Ceph Object Gateway index data is written only into an object map (OMAP).
OMAP data for BlueStore resides on the block.db device on an OSD. When an
NVMe drive functions as a block.db device for an HDD OSD and when the index
pool is backed by HDD OSDs, the index data will ONLY be written to the
block.db device. As long as the block.db partition/lvm is sized properly at
4% of block, this configuration is all that is needed for BlueStore."
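
Applied to the node layout discussed in this thread (12 HDDs sharing one NVMe
per node), the 4% guideline translates into a quick capacity check. A sketch;
the 16 TB drive size is an illustrative assumption, not something stated in
the thread:

```python
# 4% block.db sizing guideline for BlueStore, per the quote above.
hdd_size_tb = 16        # assumed HDD capacity (illustrative)
hdds_per_node = 12      # from the setup described in this thread
db_fraction = 0.04      # 4% of the block device

db_per_osd_tb = hdd_size_tb * db_fraction
nvme_needed_tb = db_per_osd_tb * hdds_per_node

print(f"block.db per OSD: {db_per_osd_tb:.2f} TB")       # 0.64 TB
print(f"NVMe needed per node: {nvme_needed_tb:.2f} TB")  # 7.68 TB
```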

On Mon, Apr 8, 2024 at 12:02 PM Lukasz Borek  wrote:

> Thanks for clarifying.
>
> So the Red Hat doc
> 
> is outdated?
>
> 3.6. Selecting SSDs for Bucket Indexes
>
>
> When selecting OSD hardware for use with a Ceph Object
>> Gateway—irrespective of the use case—Red Hat recommends considering an OSD
>> node that has at least one SSD drive used exclusively for the bucket index
>> pool. This is particularly important when buckets will contain a large
>> number of objects.
>
>
> A bucket index entry is approximately 200 bytes of data, stored as an
>> object map (omap) in leveldb. While this is a trivial amount of data, some
>> uses of Ceph Object Gateway can result in tens or hundreds of millions of
>> objects in a single bucket. By mapping the bucket index pool to a CRUSH
>> hierarchy of SSD nodes, the reduced latency provides a dramatic performance
>> improvement when buckets contain very large numbers of objects.
>
>
>> Important
>> In a production cluster, a typical OSD node will have at least one SSD
>> for the bucket index, AND at least one SSD for the journal.
>
>
> Is the current utilisation what the osd df command shows in the OMAP field?:
>
> root@cephbackup:/# ceph osd df
>> ID  CLASS  WEIGHTREWEIGHT  SIZE RAW USE   DATA OMAP META
>> AVAIL%USE   VAR   PGS  STATUS
>>  0hdd   7.39870   1.0  7.4 TiB   894 GiB  769 GiB  1.5 MiB  3.4
>> GiB  6.5 TiB  11.80  1.45   40  up
>>  1hdd   7.39870   1.0  7.4 TiB   703 GiB  578 GiB  6.0 MiB  2.9
>> GiB  6.7 TiB   9.27  1.14   37  up
>>  2hdd   7.39870   1.0  7.4 TiB   700 GiB  576 GiB  3.1 MiB  3.1
>> GiB  6.7 TiB   9.24  1.13   39  up
>
>
>
>
>
> On Mon, 8 Apr 2024 at 08:42, Daniel Parkes  wrote:
>
>> Hi Lukasz,
>>
>> RGW uses Omap objects for the index pool; Omaps are stored in Rocksdb
>> database of each osd, not on the actual index pool, so by putting DB/WALL
>> on an NVMe as you mentioned, you are already configuring the index pool on
>> a non-rotational drive, you don't need to do anything else.
>>
>> You just need to size your DB/WALL partition accordingly. For RGW/object
>> storage, a good starting point for the DB/Wall sizing is 4%.
>>
>> Example of Omap entries in the index pool using 0 bytes, as they are
>> stored in Rocksdb:
>>
>> # rados -p default.rgw.buckets.index listomapkeys 
>> .dir.7fb0a3df-9553-4a76-938d-d23711e67677.34162.1.2
>> file1
>> file2
>> file4
>> file10
>>
>> rados df -p default.rgw.buckets.index
>> POOL_NAME  USED  OBJECTS  CLONES  COPIES  MISSING_ON_PRIMARY 
>>  UNFOUND  DEGRADED  RD_OPS   RD  WR_OPS  WR  USED COMPR  UNDER COMPR
>> default.rgw.buckets.index   0 B   11   0  33   0 
>>0 0 208  207 KiB  41  20 KiB 0 B  0 B
>>
>> # rados -p default.rgw.buckets.index stat 
>> .dir.7fb0a3df-9553-4a76-938d-d23711e67677.34162.1.2
>> default.rgw.buckets.index/.dir.7fb0a3df-9553-4a76-938d-d23711e67677.34162.1.2
>>  mtime 2022-12-20T07:32:11.00-0500, size 0
>>
>>
>> On Sun, Apr 7, 2024 at 10:06 PM Lukasz Borek  wrote:
>>
>>> Hi!
>>>
>>> I'm working on a POC cluster setup dedicated to backup app writing
>>> objects
>>> via s3 (large objects, up to 1TB transferred via multipart upload
>>> process).
>>>
>>> Initial setup is 18 storage nodes (12HDDs + 1 NVME card for DB/WALL) + EC
>>> pool.  Plan is to use cephadm.
>>>
>>> I'd like to follow good practice and put the RGW index pool on a
>>> no-rotation drive. Question is how to do it?
>>>
>>>- replace a few HDDs (1 per node) with a SSD (how many? 4-6-8?)
>>>- reserve space on NVME drive on each node, create lv based OSD and
>>> let
>>>rgb index use the same NVME drive as DB/WALL
>>>
>>> Thoughts?
>>>
>>> --
>>> Lukasz
>>> ___
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>>
>>>
>
> 

[ceph-users] Re: DB/WALL and RGW index on the same NVME

2024-04-08 Thread David Orman
I would suggest that you might consider EC vs. replication for index data, and 
the latency implications. There's more than just the nvme vs. rotational 
discussion to entertain, especially if using the more widely spread EC modes 
like 8+3. It would be worth testing for your particular workload.

Also make sure to factor in storage utilization if you expect to see 
versioning/object lock in use. This can be the source of a significant amount 
of additional consumption that isn't planned for initially.

On Mon, Apr 8, 2024, at 01:42, Daniel Parkes wrote:
> Hi Lukasz,
>
> RGW uses Omap objects for the index pool; Omaps are stored in Rocksdb
> database of each osd, not on the actual index pool, so by putting DB/WALL
> on an NVMe as you mentioned, you are already configuring the index pool on
> a non-rotational drive, you don't need to do anything else.
>
> You just need to size your DB/WALL partition accordingly. For RGW/object
> storage, a good starting point for the DB/Wall sizing is 4%.
>
> Example of Omap entries in the index pool using 0 bytes, as they are stored
> in Rocksdb:
>
> # rados -p default.rgw.buckets.index listomapkeys
> .dir.7fb0a3df-9553-4a76-938d-d23711e67677.34162.1.2
> file1
> file2
> file4
> file10
>
> rados df -p default.rgw.buckets.index
> POOL_NAME                  USED  OBJECTS  CLONES  COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS   RD       WR_OPS  WR      USED COMPR  UNDER COMPR
> default.rgw.buckets.index   0 B       11       0      33                   0        0         0     208  207 KiB      41  20 KiB         0 B          0 B
>
> # rados -p default.rgw.buckets.index stat
> .dir.7fb0a3df-9553-4a76-938d-d23711e67677.34162.1.2
> default.rgw.buckets.index/.dir.7fb0a3df-9553-4a76-938d-d23711e67677.34162.1.2
> mtime 2022-12-20T07:32:11.00-0500, size 0
>
>
> On Sun, Apr 7, 2024 at 10:06 PM Lukasz Borek  wrote:
>
>> Hi!
>>
>> I'm working on a POC cluster setup dedicated to backup app writing objects
>> via s3 (large objects, up to 1TB transferred via multipart upload process).
>>
>> Initial setup is 18 storage nodes (12HDDs + 1 NVME card for DB/WALL) + EC
>> pool.  Plan is to use cephadm.
>>
>> I'd like to follow good practice and put the RGW index pool on a
>> non-rotational drive. Question is how to do it?
>>
>>- replace a few HDDs (1 per node) with a SSD (how many? 4-6-8?)
>>- reserve space on NVME drive on each node, create lv based OSD and let
>>the RGW index use the same NVME drive as DB/WALL
>>
>> Thoughts?
>>
>> --
>> Lukasz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS Behind on Trimming...

2024-04-08 Thread Xiubo Li


On 4/8/24 12:32, Erich Weiler wrote:

Ah, I see.  Yes, we are already running version 18.2.1 on the server side (we 
just installed this cluster a few weeks ago from scratch).  So I guess if the 
fix has already been backported to that version, then we still have a problem.

Does that mean it could be the locker order bug 
(https://tracker.ceph.com/issues/62123), as Xiubo suggested?


Then it's possibly the lock order issue. Need to check it later.

Thanks

- Xiubo



Thanks again,
Erich


On Apr 7, 2024, at 9:00 PM, Alexander E. Patrakov  wrote:

Hi Erich,


On Mon, Apr 8, 2024 at 11:51 AM Erich Weiler  wrote:

Hi Xiubo,


Thanks for your logs; it should be the same issue as
https://tracker.ceph.com/issues/62052. Could you test with this
fix again?

This sounds good - but I'm not clear on what I should do?  I see a patch
in that tracker page, is that what you are referring to?  If so, how
would I apply such a patch?  Or is there simply a binary update I can
apply somehow to the MDS server software?

The backport of this patch (https://github.com/ceph/ceph/pull/53241)
was merged on October 18, 2023, and Ceph 18.2.1 was released on
December 18, 2023. Therefore, if you are running Ceph 18.2.1 on the
server side, you already have the fix. If you are already running
version 18.2.1 or 18.2.2 (to which you should upgrade anyway), please
complain, as the purported fix is then ineffective.


Thanks for helping!

-erich


Please let me know if you can still see this bug; then it should be the
locker order bug, as in https://tracker.ceph.com/issues/62123.

Thanks

- Xiubo


On 3/28/24 04:03, Erich Weiler wrote:

Hi All,

I've been battling this for a while and I'm not sure where to go from
here.  I have a Ceph health warning as such:

# ceph -s
  cluster:
id: 58bde08a-d7ed-11ee-9098-506b4b4da440
health: HEALTH_WARN
1 MDSs report slow requests
1 MDSs behind on trimming

  services:
mon: 5 daemons, quorum
pr-md-01,pr-md-02,pr-store-01,pr-store-02,pr-md-03 (age 5d)
mgr: pr-md-01.jemmdf(active, since 3w), standbys: pr-md-02.emffhz
mds: 1/1 daemons up, 2 standby
osd: 46 osds: 46 up (since 9h), 46 in (since 2w)

  data:
volumes: 1/1 healthy
pools:   4 pools, 1313 pgs
objects: 260.72M objects, 466 TiB
usage:   704 TiB used, 424 TiB / 1.1 PiB avail
pgs: 1306 active+clean
 4active+clean+scrubbing+deep
 3active+clean+scrubbing

  io:
client:   123 MiB/s rd, 75 MiB/s wr, 109 op/s rd, 1.40k op/s wr

And the specifics are:

# ceph health detail
HEALTH_WARN 1 MDSs report slow requests; 1 MDSs behind on trimming
[WRN] MDS_SLOW_REQUEST: 1 MDSs report slow requests
mds.slugfs.pr-md-01.xdtppo(mds.0): 99 slow requests are blocked >
30 secs
[WRN] MDS_TRIM: 1 MDSs behind on trimming
mds.slugfs.pr-md-01.xdtppo(mds.0): Behind on trimming (13884/250)
max_segments: 250, num_segments: 13884

That "num_segments" number slowly keeps increasing.  I suspect I just
need to tell the MDS servers to trim faster but after hours of
googling around I just can't figure out the best way to do it. The
best I could come up with was to decrease "mds_cache_trim_decay_rate"
from 1.0 to 0.8 (to start), based on this page:

https://www.suse.com/support/kb/doc/?id=19740

But it doesn't seem to help, maybe I should decrease it further? I am
guessing this must be a common issue...?  I am running Reef on the MDS
servers, but most clients are on Quincy.
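
For what it's worth, the decay rate can be changed at runtime without a restart; a minimal sketch, assuming the active MDS daemon name from the health output above (whether this helps depends on whether the MDS is merely trimming slowly or the slow requests are pinning segments):

```
# Lower the decay rate so trimming is throttled less aggressively (default 1.0)
ceph config set mds mds_cache_trim_decay_rate 0.5

# Confirm the running value on the active MDS (daemon name taken from `ceph -s`)
ceph config show mds.slugfs.pr-md-01.xdtppo mds_cache_trim_decay_rate

# Watch whether num_segments actually starts falling
ceph health detail | grep -A 2 MDS_TRIM
```

If num_segments keeps climbing regardless of the decay rate, trimming is blocked (e.g. by the slow requests), not just rate-limited.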

Thanks for any advice!

cheers,
erich



--
Alexander E. Patrakov

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: DB/WALL and RGW index on the same NVME

2024-04-08 Thread Lukasz Borek
Thanks for clarifying.

So the Red Hat doc is outdated?

3.6. Selecting SSDs for Bucket Indexes


When selecting OSD hardware for use with a Ceph Object Gateway—irrespective
> of the use case—Red Hat recommends considering an OSD node that has at
> least one SSD drive used exclusively for the bucket index pool. This is
> particularly important when buckets will contain a large number of objects.


A bucket index entry is approximately 200 bytes of data, stored as an
> object map (omap) in leveldb. While this is a trivial amount of data, some
> uses of Ceph Object Gateway can result in tens or hundreds of millions of
> objects in a single bucket. By mapping the bucket index pool to a CRUSH
> hierarchy of SSD nodes, the reduced latency provides a dramatic performance
> improvement when buckets contain very large numbers of objects.


> Important
> In a production cluster, a typical OSD node will have at least one SSD for
> the bucket index, AND at least one SSD for the journal.


Is current utilisation what the osd df command shows in the OMAP field?:

root@cephbackup:/# ceph osd df
> ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS
>  0  hdd    7.39870   1.0      7.4 TiB  894 GiB  769 GiB  1.5 MiB  3.4 GiB  6.5 TiB  11.80  1.45   40  up
>  1  hdd    7.39870   1.0      7.4 TiB  703 GiB  578 GiB  6.0 MiB  2.9 GiB  6.7 TiB   9.27  1.14   37  up
>  2  hdd    7.39870   1.0      7.4 TiB  700 GiB  576 GiB  3.1 MiB  3.1 GiB  6.7 TiB   9.24  1.13   39  up
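
Yes — the OMAP column is exactly that RocksDB omap footprint. To put a cluster-wide number on it, one can sum the per-OSD figures from the JSON output; a small sketch (the `kb_used_omap` field name is what recent releases emit in `ceph osd df -f json`):

```python
import json

def total_omap_bytes(osd_df_json: str) -> int:
    """Sum per-OSD omap usage (bytes) from `ceph osd df -f json` output."""
    nodes = json.loads(osd_df_json)["nodes"]
    return sum(n.get("kb_used_omap", 0) for n in nodes) * 1024

# Trimmed, hypothetical document standing in for the real command output:
doc = '{"nodes": [{"id": 0, "kb_used_omap": 1536}, {"id": 1, "kb_used_omap": 6144}]}'
print(total_omap_bytes(doc))  # 7864320 bytes (~7.5 MiB)
```

On a real cluster, feed it the stdout of `ceph osd df -f json` (e.g. via subprocess).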





On Mon, 8 Apr 2024 at 08:42, Daniel Parkes  wrote:

> Hi Lukasz,
>
> RGW uses omap objects for the index pool; omaps are stored in the RocksDB
> database of each OSD, not in the actual index pool, so by putting DB/WALL
> on an NVMe as you mentioned, you are already placing the index data on
> a non-rotational drive; you don't need to do anything else.
>
> You just need to size your DB/WALL partition accordingly. For RGW/object
> storage, a good starting point for the DB/Wall sizing is 4%.
>
> Example of Omap entries in the index pool using 0 bytes, as they are
> stored in Rocksdb:
>
> # rados -p default.rgw.buckets.index listomapkeys 
> .dir.7fb0a3df-9553-4a76-938d-d23711e67677.34162.1.2
> file1
> file2
> file4
> file10
>
> rados df -p default.rgw.buckets.index
> POOL_NAME  USED  OBJECTS  CLONES  COPIES  MISSING_ON_PRIMARY  
> UNFOUND  DEGRADED  RD_OPS   RD  WR_OPS  WR  USED COMPR  UNDER COMPR
> default.rgw.buckets.index   0 B   11   0  33   0  
>   0 0 208  207 KiB  41  20 KiB 0 B  0 B
>
> # rados -p default.rgw.buckets.index stat 
> .dir.7fb0a3df-9553-4a76-938d-d23711e67677.34162.1.2
> default.rgw.buckets.index/.dir.7fb0a3df-9553-4a76-938d-d23711e67677.34162.1.2 
> mtime 2022-12-20T07:32:11.00-0500, size 0
>
>
> On Sun, Apr 7, 2024 at 10:06 PM Lukasz Borek  wrote:
>
>> Hi!
>>
>> I'm working on a POC cluster setup dedicated to backup app writing objects
>> via s3 (large objects, up to 1TB transferred via multipart upload
>> process).
>>
>> Initial setup is 18 storage nodes (12HDDs + 1 NVME card for DB/WALL) + EC
>> pool.  Plan is to use cephadm.
>>
>> I'd like to follow good practice and put the RGW index pool on a
>> non-rotational drive. Question is how to do it?
>>
>>- replace a few HDDs (1 per node) with a SSD (how many? 4-6-8?)
>>- reserve space on NVME drive on each node, create lv based OSD and let
>>the RGW index use the same NVME drive as DB/WALL
>>
>> Thoughts?
>>
>> --
>> Lukasz

-- 
Łukasz Borek
luk...@borek.org.pl
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Bucket usage per storage classes

2024-04-08 Thread Tobias Urdin
Sending again, to you directly as well; I'm unsure whether my email was rejected 
by the mailing list.
—

Hello,

There is no such usage collected today, see [1] and [2] – where [2] is a 
specification of how one community member wanted to implement the feature,
but nobody has put in the work yet, as far as we know.

[1] https://tracker.ceph.com/issues/47342
[2] https://tracker.ceph.com/issues/54972

Best regards
Tobias


> On 8 Apr 2024, at 10:39, Ondřej Kukla  wrote:
> 
> Does my example make sense, or are we still not on the same page?
> 
> Ondrej
> 
>> On 4. 4. 2024, at 23:59, Ondřej Kukla  wrote:
>> 
>> Let's take for example a situation where I have a standard storage class 
>> backed by HDDs and a fast one on SSDs. The user will mix the classes in the 
>> bucket, and I would like to know how much space he is taking on the HDDs and 
>> how much on the SSDs so I can bill him.
>> 
>> In this scenario I don't care that the head object is on HDDs. I would just 
>> like to transparently know how much is stored where.
>> 
>> I hope it makes sense.
>> 
>> Ondrej
>> 
>> 
>> On Apr 4, 2024 23:20, Anthony D'Atri  wrote:
>> A bucket may contain objects spread across multiple storage classes, and 
>> AIUI the head object is always in the default storage class, so I'm not sure 
>> *exactly* what you're after here. 
>> 
>>> On Apr 4, 2024, at 17:09, Ondřej Kukla  wrote: 
>>> 
>>> Hello, 
>>> 
>>> I’m playing around with Storage classes in rgw and I’m looking for ways to 
>>> see per bucket statistics for the diferent storage classes (for billing 
>>> purposes etc.). 
>>> 
>>> I thought that I would add another object to the bucket usage response, like 
>>> the one for multiparts - rgw.multimeta, but it's counted under rgw.main. 
>>> 
>>> Is there some option to get this info? 
>>> 
>>> Ondrej 
>>> 
>>> 
>>> Bucket usage I’m referring to 
>>> 
>>> "usage": { 
>>>   "rgw.main": { 
>>>   "size": 1333179575, 
>>>   "size_actual": 1333190656, 
>>>   "size_utilized": 1333179575, 
>>>   "size_kb": 1301934, 
>>>   "size_kb_actual": 1301944, 
>>>   "size_kb_utilized": 1301934, 
>>>   "num_objects": 4 
>>>   }, 
>>>   "rgw.multimeta": { 
>>>   "size": 0, 
>>>   "size_actual": 0, 
>>>   "size_utilized": 0, 
>>>   "size_kb": 0, 
>>>   "size_kb_actual": 0, 
>>>   "size_kb_utilized": 0, 
>>>   "num_objects": 0 
>>>   } 
>>> } 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RBD Unmap busy while no "normal" process holds it.

2024-04-08 Thread Nicolas FOURNIL
Hello,

I've got a strange issue with Ceph (cephadm).

We use Incus (LXD reborn) and we routinely add/remove containers.

We moved to Ceph for better scalability, but sometimes we hit this bug:

https://discuss.linuxcontainers.org/t/howto-delete-container-with-ceph-rbd-volume-giving-device-or-resource-busy/5910

And I revived this old bug report in Incus:
https://discuss.linuxcontainers.org/t/incus-0-x-and-ceph-rbd-map-is-sometimes-busy/19585/6
(thanks to Stéphane Graber for his help!)

After working on it for several hours, I found this:

I run a loop with this script (create image / map it / format it / mount
it / write to it / unmount / unmap / delete image):

rbd create image1 --size 1024 --pool customers-clouds.ix-mrs2.fr.eho || exit $?
RBD_DEVICE=$(rbd map customers-clouds.ix-mrs2.fr.eho/image1) || exit $?
mkfs.ext4 ${RBD_DEVICE} || exit $?
mount ${RBD_DEVICE} /media/test || exit $?
dd if=/dev/zero of=/media/test/test.out
sleep 10
rm /media/test/test.out || exit $?
umount ${RBD_DEVICE} || exit $?
rbd unmap ${RBD_DEVICE} || exit $?
rbd rm customers-clouds.ix-mrs2.fr.eho/image1 || exit $?
sleep 1

It works for hours without any issue BUT if I add an OSD while doing this
loop I get this :

+ sleep 10
+ rm /media/test/test.out
+ umount /dev/rbd0
+ rbd unmap /dev/rbd0
rbd: sysfs write failed
rbd: unmap failed: (16) Device or resource busy
+ exit 16

And, of course, the culprit is always the podman container of the latest OSD
I've added (it holds a copy of the mount point):
root@ceph01:~# grep rbd0 /proc/*/mountinfo
/proc/1415299/mountinfo:1959 1837 252:0 / /rootfs/media/test rw,relatime - ext4 /dev/rbd0 rw,stripe=16
/proc/1415301/mountinfo:1959 1837 252:0 / /rootfs/media/test rw,relatime - ext4 /dev/rbd0 rw,stripe=16
root@ceph01:~# cat /proc/1415299/cmdline
/run/podman-init -- /usr/bin/ceph-osd -n osd.26 -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-journald=true --default-log-to-stderr=false
root@ceph01:~# cat /proc/1415301/cmdline
/usr/bin/ceph-osd -n osd.26 -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-journald=true --default-log-to-stderr=false

Our setup is straightforward: latest stable Ceph release, latest stable
Debian release, deployed via cephadm.

Has anyone had such a problem? Where is the best place to report this bug?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Bucket usage per storage classes

2024-04-08 Thread Ondřej Kukla
Does my example make sense, or are we still not on the same page?

Ondrej

> On 4. 4. 2024, at 23:59, Ondřej Kukla  wrote:
> 
> Let's take for example a situation where I have a standard storage class 
> backed by HDDs and a fast one on SSDs. The user will mix the classes in the 
> bucket, and I would like to know how much space he is taking on the HDDs and 
> how much on the SSDs so I can bill him.
> 
> In this scenario I don't care that the head object is on HDDs. I would just 
> like to transparently know how much is stored where.
> 
> I hope it makes sense.
> 
> Ondrej
> 
> 
> On Apr 4, 2024 23:20, Anthony D'Atri  wrote:
> A bucket may contain objects spread across multiple storage classes, and AIUI 
> the head object is always in the default storage class, so I'm not sure 
> *exactly* what you're after here. 
> 
> > On Apr 4, 2024, at 17:09, Ondřej Kukla  wrote: 
> > 
> > Hello, 
> > 
> > I’m playing around with Storage classes in rgw and I’m looking for ways to 
> > see per bucket statistics for the diferent storage classes (for billing 
> > purposes etc.). 
> > 
> > I thought that I would add another object to the bucket usage response, like 
> > the one for multiparts - rgw.multimeta, but it's counted under rgw.main. 
> > 
> > Is there some option to get this info? 
> > 
> > Ondrej 
> > 
> > 
> > Bucket usage I’m referring to 
> > 
> > "usage": { 
> >"rgw.main": { 
> >"size": 1333179575, 
> >"size_actual": 1333190656, 
> >"size_utilized": 1333179575, 
> >"size_kb": 1301934, 
> >"size_kb_actual": 1301944, 
> >"size_kb_utilized": 1301934, 
> >"num_objects": 4 
> >}, 
> >"rgw.multimeta": { 
> >"size": 0, 
> >"size_actual": 0, 
> >"size_utilized": 0, 
> >"size_kb": 0, 
> >"size_kb_actual": 0, 
> >"size_kb_utilized": 0, 
> >"num_objects": 0 
> >} 
> > } 

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Client kernel crashes on cephfs access

2024-04-08 Thread Marc
> 
> I would like to ask for help regarding client kernel crashes that happen
> on cephfs access. We have been struggling with this for over a month now
> with over 100 crashes on 7 hosts during that time.
> 
> Our cluster runs version 18.2.1. Our clients run CentOS Stream.
> 
> On CentOS Stream 9 the problem started with kernel version
> 5.14.0-425.el9. Version 5.14.0-419.el9 is the last one without problems.
> It also occurred on CentOS Stream 8, starting with version
> 4.18.0-546.el8 (4.18.0-544.el8 being the last good one).
> 
> The problem presents itself by the client kernel crashing, forcing a
> reboot of the machine. Apparently it is triggered by a certain level of
> IO on the cephfs mount. It works perfectly fine when we rollback to the
> last good kernel version.
> 
> The exact call trace in vmcore-dmesg.txt differs between occurrences.
> Here are two typical examples:
> 
> ```
> [ 8641.382499] list_del corruption. next->prev should be
> 88bd0a4d4c80, but was 88bcefdfd280
> [ 8641.382521] [ cut here ]
> [ 8641.382521] kernel BUG at lib/list_debug.c:54!
> [ 8641.382528] invalid opcode:  [#1] PREEMPT SMP NOPTI
> [ 8641.382591] CPU: 2 PID: 83929 Comm: kworker/2:0 Kdump: loaded Not
> tainted 5.14.0-432.el9.x86_64 #1
> [ 8641.382610] Hardware name: oVirt RHEL/RHEL-AV, BIOS edk2-20230524-
> 4.el9_3 05/24/2023
> [ 8641.382624] Workqueue: ceph-cap ceph_cap_unlink_work [ceph]
> [ 8641.382662] RIP: 0010:__list_del_entry_valid.cold+0x1d/0x47
> [ 8641.382681] Code: c7 c7 78 42 d8 b1 e8 f9 87 fe ff 0f 0b 48 89 fe 48
> c7 c7 08 43 d8 b1 e8 e8 87 fe ff 0f 0b 48 c7 c7 b8 43 d8 b1 e8 da 87 fe
> ff <0f> 0b 48 89 f2 48 89 fe 48 c7 c7 78 43 d8 b1 e8 c6 87 fe ff 0f 0b
> [ 8641.382711] RSP: 0018:95a000d6be60 EFLAGS: 00010246
> [ 8641.382722] RAX: 0054 RBX: 88bced76dc00 RCX:
> 
> [ 8641.382734] RDX:  RSI: 88c02eea0840 RDI:
> 88c02eea0840
> [ 8641.382746] RBP: 88bd0a4d4c80 R08: 80008434 R09:
> 0010
> [ 8641.382758] R10: 000f R11: 000f R12:
> 88c02eeb2800
> [ 8641.382779] R13: 88bcc4610258 R14: 88bcc46101b8 R15:
> 88bcc46101c8
> [ 8641.382793] FS:  () GS:88c02ee8()
> knlGS:
> [ 8641.382809] CS:  0010 DS:  ES:  CR0: 80050033
> [ 8641.382819] CR2: 7f35cee8a000 CR3: 000105708004 CR4:
> 007706e0
> [ 8641.382832] PKRU: 5554
> [ 8641.382838] Call Trace:
> [ 8641.382844]  
> [ 8641.382850]  ? show_trace_log_lvl+0x1c4/0x2df
> [ 8641.382860]  ? show_trace_log_lvl+0x1c4/0x2df
> [ 8641.382870]  ? ceph_cap_unlink_work+0x3f/0x140 [ceph]
> [ 8641.382893]  ? __die_body.cold+0x8/0xd
> [ 8641.382902]  ? die+0x2b/0x50
> [ 8641.382911]  ? do_trap+0xce/0x120
> [ 8641.382919]  ? __list_del_entry_valid.cold+0x1d/0x47
> [ 8641.382930]  ? do_error_trap+0x65/0x80
> [ 8641.382938]  ? __list_del_entry_valid.cold+0x1d/0x47
> [ 8641.382948]  ? exc_invalid_op+0x4e/0x70
> [ 8641.382958]  ? __list_del_entry_valid.cold+0x1d/0x47
> [ 8641.382975]  ? asm_exc_invalid_op+0x16/0x20
> [ 8641.382988]  ? __list_del_entry_valid.cold+0x1d/0x47
> [ 8641.382998]  ceph_cap_unlink_work+0x3f/0x140 [ceph]
> [ 8641.383021]  process_one_work+0x1e2/0x3b0
> [ 8641.383032]  ? __pfx_worker_thread+0x10/0x10
> [ 8641.383043]  worker_thread+0x50/0x3a0
> [ 8641.383051]  ? __pfx_worker_thread+0x10/0x10
> [ 8641.383061]  kthread+0xdd/0x100
> [ 8641.383069]  ? __pfx_kthread+0x10/0x10
> [ 8641.383078]  ret_from_fork+0x29/0x50
> [ 8641.383090]  
> [ 8641.383095] Modules linked in: tls ceph libceph dns_resolver fscache
> netfs nft_counter ipt_REJECT xt_owner xt_conntrack nft_compat
> nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
> nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat
> nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables
> libcrc32c nfnetlink vfat fat intel_rapl_msr intel_rapl_common
> intel_uncore_frequency_common isst_if_common nfit virtio_gpu iTCO_wdt
> iTCO_vendor_support libnvdimm lpc_ich virtio_dma_buf drm_shmem_helper
> drm_kms_helper i2c_i801 rapl syscopyarea sysfillrect sysimgblt
> virtio_balloon fb_sys_fops i2c_smbus pcspkr joydev fuse drm ext4 mbcache
> jbd2 sr_mod cdrom sd_mod ahci t10_pi sg libahci crct10dif_pclmul
> crc32_pclmul crc32c_intel libata ghash_clmulni_intel virtio_net
> virtio_console virtio_scsi net_failover failover serio_raw
> ```
> 
> ```
> [ 3538.365469] list_del corruption. next->prev should be
> 8d2b75997c80, but was 8d2afcfaae80
> [ 3538.365488] [ cut here ]
> [ 3538.365488] kernel BUG at lib/list_debug.c:54!
> [ 3538.365493] invalid opcode:  [#1] PREEMPT SMP NOPTI
> [ 3538.365553] CPU: 0 PID: 910 Comm: php-fpm Kdump: loaded Not tainted
> 5.14.0-432.el9.x86_64 #1
> [ 3538.365569] Hardware name: oVirt RHEL/RHEL-AV, BIOS edk2-20230524-
> 4.el9_3 05/24/2023
> [ 3538.365582] RIP: 

[ceph-users] Setup Ceph over RDMA

2024-04-08 Thread Vahideh Alinouri
Hi guys,

I need to set up Ceph over RDMA, but I have faced many issues!
The info regarding my cluster:
Ceph version is Reef
The network cards are Broadcom RDMA NICs.
The RDMA connection between the OSD nodes is OK.

I found the ms_type = async+rdma option in the documentation and applied it using
ceph config set global ms_type async+rdma
After this the cluster crashed. To bring it back, I did the following:
- Put ms_type = async+posix in ceph.conf
- Restarted all MON services

The cluster is back, but I don't have any active mgr. All OSDs are down too.
Is there a procedure to follow for setting up Ceph over RDMA?
Thanks
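
Not an authoritative answer, but one pitfall is visible here: pushing ms_type into the mon config database changes the messenger for every daemon at once, and the change itself breaks the TCP connectivity the daemons need in order to read it. The usual approach is to set it in ceph.conf on every node and restart all daemons in a coordinated way; a sketch (the device name is an assumption for a Broadcom NIC — check `ibv_devices` on your hosts):

```
[global]
# Keep the public network on TCP; try RDMA on the cluster network first
ms_cluster_type = async+rdma
ms_async_rdma_device_name = bnxt_re0   # hypothetical; take from `ibv_devices`
```

The RDMA messenger also needs unlimited locked memory for the daemons (e.g. LimitMEMLOCK=infinity in the systemd unit or container settings), otherwise OSDs fail to start.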
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Client kernel crashes on cephfs access

2024-04-08 Thread Marc Ruhmann

Hi everyone,

I would like to ask for help regarding client kernel crashes that happen
on cephfs access. We have been struggling with this for over a month now
with over 100 crashes on 7 hosts during that time.

Our cluster runs version 18.2.1. Our clients run CentOS Stream.

On CentOS Stream 9 the problem started with kernel version
5.14.0-425.el9. Version 5.14.0-419.el9 is the last one without problems.
It also occurred on CentOS Stream 8, starting with version
4.18.0-546.el8 (4.18.0-544.el8 being the last good one).

The problem presents itself by the client kernel crashing, forcing a
reboot of the machine. Apparently it is triggered by a certain level of
IO on the cephfs mount. It works perfectly fine when we rollback to the
last good kernel version.

The exact call trace in vmcore-dmesg.txt differs between occurrences.
Here are two typical examples:

```
[ 8641.382499] list_del corruption. next->prev should be 88bd0a4d4c80, but 
was 88bcefdfd280
[ 8641.382521] [ cut here ]
[ 8641.382521] kernel BUG at lib/list_debug.c:54!
[ 8641.382528] invalid opcode:  [#1] PREEMPT SMP NOPTI
[ 8641.382591] CPU: 2 PID: 83929 Comm: kworker/2:0 Kdump: loaded Not tainted 
5.14.0-432.el9.x86_64 #1
[ 8641.382610] Hardware name: oVirt RHEL/RHEL-AV, BIOS edk2-20230524-4.el9_3 
05/24/2023
[ 8641.382624] Workqueue: ceph-cap ceph_cap_unlink_work [ceph]
[ 8641.382662] RIP: 0010:__list_del_entry_valid.cold+0x1d/0x47
[ 8641.382681] Code: c7 c7 78 42 d8 b1 e8 f9 87 fe ff 0f 0b 48 89 fe 48 c7 c7 08 43 
d8 b1 e8 e8 87 fe ff 0f 0b 48 c7 c7 b8 43 d8 b1 e8 da 87 fe ff <0f> 0b 48 89 f2 
48 89 fe 48 c7 c7 78 43 d8 b1 e8 c6 87 fe ff 0f 0b
[ 8641.382711] RSP: 0018:95a000d6be60 EFLAGS: 00010246
[ 8641.382722] RAX: 0054 RBX: 88bced76dc00 RCX: 
[ 8641.382734] RDX:  RSI: 88c02eea0840 RDI: 88c02eea0840
[ 8641.382746] RBP: 88bd0a4d4c80 R08: 80008434 R09: 0010
[ 8641.382758] R10: 000f R11: 000f R12: 88c02eeb2800
[ 8641.382779] R13: 88bcc4610258 R14: 88bcc46101b8 R15: 88bcc46101c8
[ 8641.382793] FS:  () GS:88c02ee8() 
knlGS:
[ 8641.382809] CS:  0010 DS:  ES:  CR0: 80050033
[ 8641.382819] CR2: 7f35cee8a000 CR3: 000105708004 CR4: 007706e0
[ 8641.382832] PKRU: 5554
[ 8641.382838] Call Trace:
[ 8641.382844]  
[ 8641.382850]  ? show_trace_log_lvl+0x1c4/0x2df
[ 8641.382860]  ? show_trace_log_lvl+0x1c4/0x2df
[ 8641.382870]  ? ceph_cap_unlink_work+0x3f/0x140 [ceph]
[ 8641.382893]  ? __die_body.cold+0x8/0xd
[ 8641.382902]  ? die+0x2b/0x50
[ 8641.382911]  ? do_trap+0xce/0x120
[ 8641.382919]  ? __list_del_entry_valid.cold+0x1d/0x47
[ 8641.382930]  ? do_error_trap+0x65/0x80
[ 8641.382938]  ? __list_del_entry_valid.cold+0x1d/0x47
[ 8641.382948]  ? exc_invalid_op+0x4e/0x70
[ 8641.382958]  ? __list_del_entry_valid.cold+0x1d/0x47
[ 8641.382975]  ? asm_exc_invalid_op+0x16/0x20
[ 8641.382988]  ? __list_del_entry_valid.cold+0x1d/0x47
[ 8641.382998]  ceph_cap_unlink_work+0x3f/0x140 [ceph]
[ 8641.383021]  process_one_work+0x1e2/0x3b0
[ 8641.383032]  ? __pfx_worker_thread+0x10/0x10
[ 8641.383043]  worker_thread+0x50/0x3a0
[ 8641.383051]  ? __pfx_worker_thread+0x10/0x10
[ 8641.383061]  kthread+0xdd/0x100
[ 8641.383069]  ? __pfx_kthread+0x10/0x10
[ 8641.383078]  ret_from_fork+0x29/0x50
[ 8641.383090]  
[ 8641.383095] Modules linked in: tls ceph libceph dns_resolver fscache netfs 
nft_counter ipt_REJECT xt_owner xt_conntrack nft_compat nft_fib_inet 
nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 
nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 
nf_defrag_ipv4 rfkill ip_set nf_tables libcrc32c nfnetlink vfat fat 
intel_rapl_msr intel_rapl_common intel_uncore_frequency_common isst_if_common 
nfit virtio_gpu iTCO_wdt iTCO_vendor_support libnvdimm lpc_ich virtio_dma_buf 
drm_shmem_helper drm_kms_helper i2c_i801 rapl syscopyarea sysfillrect sysimgblt 
virtio_balloon fb_sys_fops i2c_smbus pcspkr joydev fuse drm ext4 mbcache jbd2 
sr_mod cdrom sd_mod ahci t10_pi sg libahci crct10dif_pclmul crc32_pclmul 
crc32c_intel libata ghash_clmulni_intel virtio_net virtio_console virtio_scsi 
net_failover failover serio_raw
```

```
[ 3538.365469] list_del corruption. next->prev should be 8d2b75997c80, but 
was 8d2afcfaae80
[ 3538.365488] [ cut here ]
[ 3538.365488] kernel BUG at lib/list_debug.c:54!
[ 3538.365493] invalid opcode:  [#1] PREEMPT SMP NOPTI
[ 3538.365553] CPU: 0 PID: 910 Comm: php-fpm Kdump: loaded Not tainted 
5.14.0-432.el9.x86_64 #1
[ 3538.365569] Hardware name: oVirt RHEL/RHEL-AV, BIOS edk2-20230524-4.el9_3 
05/24/2023
[ 3538.365582] RIP: 0010:__list_del_entry_valid.cold+0x1d/0x47
[ 3538.365612] Code: c7 c7 78 42 38 8e e8 f9 87 fe ff 0f 0b 48 89 fe 48 c7 c7 08 43 
38 8e e8 e8 87 fe ff 0f 0b 48 c7 c7 b8 43 38 8e e8 da 87 fe ff <0f> 0b 48 89 f2 
48 

[ceph-users] Re: DB/WALL and RGW index on the same NVME

2024-04-08 Thread Daniel Parkes
Hi Lukasz,

RGW uses omap objects for the index pool; omaps are stored in the RocksDB
database of each OSD, not in the actual index pool, so by putting DB/WALL
on an NVMe as you mentioned, you are already placing the index data on
a non-rotational drive; you don't need to do anything else.
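
For completeness, the layout Lukasz describes (12 HDDs per node fronted by one NVMe carrying the DB/WALL partitions) is usually expressed as a cephadm OSD service spec along these lines — a sketch, with the service id and host pattern as placeholder assumptions:

```
service_type: osd
service_id: hdd_osds_nvme_db
placement:
  host_pattern: '*'
spec:
  data_devices:
    rotational: 1      # the 12 HDDs become OSD data devices
  db_devices:
    rotational: 0      # DB/WAL (and thus the omap/index data) lands on the NVMe
```

Applied with `ceph orch apply -i osd_spec.yml`; cephadm then carves one DB LV per HDD out of the NVMe.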

You just need to size your DB/WALL partition accordingly. For RGW/object
storage, a good starting point for the DB/Wall sizing is 4%.
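
A quick arithmetic sketch of that 4% rule of thumb, using hypothetical 18 TB HDDs (the drive size isn't stated in the thread):

```python
def db_wal_size_tb(osd_capacity_tb: float, fraction: float = 0.04) -> float:
    """Rule-of-thumb DB/WAL size for one OSD: a fraction of its data device."""
    return osd_capacity_tb * fraction

hdds_per_node = 12
hdd_tb = 18.0  # assumption; adjust to the real drive size

per_osd = db_wal_size_tb(hdd_tb)        # 0.72 TB per OSD
nvme_total = per_osd * hdds_per_node    # 8.64 TB needed on the NVMe card
print(f"{per_osd:.2f} TB per OSD -> {nvme_total:.2f} TB NVMe total")
```

So with 12 HDDs behind a single card, the NVMe has to be sized for the sum of all twelve DB/WAL partitions, not just one.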

Example of Omap entries in the index pool using 0 bytes, as they are stored
in Rocksdb:

# rados -p default.rgw.buckets.index listomapkeys
.dir.7fb0a3df-9553-4a76-938d-d23711e67677.34162.1.2
file1
file2
file4
file10

rados df -p default.rgw.buckets.index
POOL_NAME                  USED  OBJECTS  CLONES  COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS   RD       WR_OPS  WR      USED COMPR  UNDER COMPR
default.rgw.buckets.index   0 B       11       0      33                   0        0         0     208  207 KiB      41  20 KiB         0 B          0 B

# rados -p default.rgw.buckets.index stat
.dir.7fb0a3df-9553-4a76-938d-d23711e67677.34162.1.2
default.rgw.buckets.index/.dir.7fb0a3df-9553-4a76-938d-d23711e67677.34162.1.2
mtime 2022-12-20T07:32:11.00-0500, size 0


On Sun, Apr 7, 2024 at 10:06 PM Lukasz Borek  wrote:

> Hi!
>
> I'm working on a POC cluster setup dedicated to backup app writing objects
> via s3 (large objects, up to 1TB transferred via multipart upload process).
>
> Initial setup is 18 storage nodes (12HDDs + 1 NVME card for DB/WALL) + EC
> pool.  Plan is to use cephadm.
>
> I'd like to follow good practice and put the RGW index pool on a
> non-rotational drive. Question is how to do it?
>
>- replace a few HDDs (1 per node) with a SSD (how many? 4-6-8?)
>- reserve space on NVME drive on each node, create lv based OSD and let
>the RGW index use the same NVME drive as DB/WALL
>
> Thoughts?
>
> --
> Lukasz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io