[ceph-users] Re: iDRAC 9 version 6.10 shows 0% for write endurance on non-dell drives, work around?
Hi,

You can use smartctl_exporter [1] for all your media, not only the SSDs.

k

[1] https://github.com/prometheus-community/smartctl_exporter

Sent from my iPhone

> On 14 Feb 2023, at 23:01, Drew Weaver wrote:
>
> Hello,
>
> After upgrading a lot of iDRAC9 modules to version 6.10 in servers that are
> involved in a Ceph cluster we noticed that the iDRAC9 shows the write
> endurance as 0% on any non-certified disk.
>
> OMSA still shows the correct remaining write endurance but I am assuming
> that they are working feverishly to eliminate that too.
>
> I opened a support ticket with Dell once this was brought to my attention
> and they basically told me that I was lucky that it ever worked at all,
> which I thought was an odd response given that the iDRAC enterprise
> licenses cost several hundred dollars each.
>
> I know that the old Intel Datacenter Tool used to be able to reach through
> a MegaRAID controller and read the remaining write endurance but that tool
> is essentially defunct now.
>
> What are you folks using to monitor your write endurance on your SSDs that
> you couldn't buy from Dell because they had a 16 week lead time while the
> MFG could deliver the drives in 3 days?
>
> Thanks,
> -Drew

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
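[Editor's note: a minimal Prometheus scrape config for smartctl_exporter might look like the following. The hostnames are placeholders, and 9633 is the port conventionally registered for this exporter — check how your deployment actually exposes it.]

```yaml
scrape_configs:
  - job_name: 'smartctl'
    scrape_interval: 5m        # SMART data changes slowly; no need to poll often
    static_configs:
      - targets:               # placeholder hosts running smartctl_exporter
          - 'ceph-osd-01:9633'
          - 'ceph-osd-02:9633'
```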
[ceph-users] Announcing go-ceph v0.20.0
We are happy to announce another release of the go-ceph API library. This is a regular release following our every-two-months release cadence.

https://github.com/ceph/go-ceph/releases/tag/v0.20.0

Changes include additions to the rbd, rgw, and cephfs packages. More details are available at the link above.

The library includes bindings that aim to play a similar role to the "pybind" Python bindings in the Ceph tree, but for the Go language. The library also includes additional APIs that can be used to administer the cephfs, rbd, and rgw subsystems. There are already a few consumers of this library in the wild, including the ceph-csi project.

Sven
[ceph-users] Re: iDRAC 9 version 6.10 shows 0% for write endurance on non-dell drives, work around? [EXT]
That is pretty awesome, I will look into doing it that way.

All of our monitoring is integrated with the very, very expensive DRAC enterprise licenses we pay for (my fault for trusting Dell). We are looking for a new hardware vendor, but this will likely work around the mistake we already made.

Thanks,
-Drew

-----Original Message-----
From: Dave Holland
Sent: Tuesday, February 14, 2023 11:39 AM
To: Drew Weaver
Cc: 'ceph-users@ceph.io'
Subject: Re: [ceph-users] iDRAC 9 version 6.10 shows 0% for write endurance on non-dell drives, work around? [EXT]

On Tue, Feb 14, 2023 at 04:00:30PM +0000, Drew Weaver wrote:
> What are you folks using to monitor your write endurance on your SSDs that
> you couldn't buy from Dell because they had a 16 week lead time while the
> MFG could deliver the drives in 3 days?

Our Ceph servers are SuperMicro, not Dell, but this approach is portable. We wrote a little shell script to parse the output of "nvme" and/or "smartctl" every hour and send the data to a Graphite server. We have a Grafana dashboard to display the all-important graphs. After ~5 years of life, our most worn NVMe (used for journal/db only -- data is on HDD) is showing 89% life remaining.

Dave

--
** Dave Holland ** Systems Support -- Informatics Systems Group **
** d...@sanger.ac.uk ** Wellcome Sanger Institute, Hinxton, UK **

--
The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
[ceph-users] Re: Missing object in bucket list
A bug was reported recently where, if a put object occurs while bucket resharding is finishing up, the write goes to the old bucket shard rather than the new one. Your logs show evidence that resharding was underway alongside the put object.

A fix for that bug has been merged to main and pacific; the quincy backport is not yet merged. See: https://tracker.ceph.com/issues/58034

Octopus was EOLed back in August, so it won't receive the fix. But it seems the next pacific and quincy releases will have the fix, as will reef.

Eric
(he/him)

> On Feb 13, 2023, at 11:41 AM, mahnoosh shahidi wrote:
>
> Hi all,
>
> We have a cluster on 15.2.12. We are experiencing an unusual scenario in
> S3. A user sends a PUT request to upload an object and RGW returns 200 as
> the response status code. The object has been uploaded and can be
> downloaded, but it does not exist in the bucket list. We also tried to get
> the bucket index entry for that object, but it does not exist. Below is the
> RGW log for the request.
>> 1 == starting new request req=0x7f246c4426b0 ==
>> 2 req 44161 0s initializing for trans_id = tx0ac81-0063e36653-17e18f0-default
>> 10 rgw api priority: s3=3 s3website=2
>> 10 host=192.168.0.201
>> 10 meta>> HTTP_X_AMZ_CONTENT_SHA256
>> 10 meta>> HTTP_X_AMZ_DATE
>> 10 x>> x-amz-content-sha256:30e14955ebf1352266dc2ff8067e68104607e750abb9d3b36582b8af909fcb58
>> 10 x>> x-amz-date:20230208T090731Z
>> 10 handler=22RGWHandler_REST_Obj_S3
>> 2 req 44161 0s getting op 1
>> 10 req 44161 0s s3:put_obj scheduling with dmclock client=2 cost=1
>> 10 op=21RGWPutObj_ObjStore_S3
>> 2 req 44161 0s s3:put_obj verifying requester
>> 10 v4 signature format = 7daee7e343e08d8121e843c6c77da3cc827bd4f4f179548e1c729c130a3e7745
>> 10 v4 credential format = 85ZYESW8HS34DC95MZBT/20230208/us-east-1/s3/aws4_request
>> 10 access key id = 85ZYESW8HS34DC95MZBT
>> 10 credential scope = 20230208/us-east-1/s3/aws4_request
>> 10 req 44161 0s canonical headers format =
>> content-md5:ttgbNgpWctgMJ0MPORU+LA==
>> host:192.168.0.201
>> x-amz-content-sha256:30e14955ebf1352266dc2ff8067e68104607e750abb9d3b36582b8af909fcb58
>> x-amz-date:20230208T090731Z
>>
>> 10 payload request hash = 30e14955ebf1352266dc2ff8067e68104607e750abb9d3b36582b8af909fcb58
>> 10 canonical request = PUT
>> /test7/file508294
>>
>> content-md5:ttgbNgpWctgMJ0MPORU+LA==
>> host:192.168.0.201
>> x-amz-content-sha256:30e14955ebf1352266dc2ff8067e68104607e750abb9d3b36582b8af909fcb58
>> x-amz-date:20230208T090731Z
>>
>> content-md5;host;x-amz-content-sha256;x-amz-date
>> 30e14955ebf1352266dc2ff8067e68104607e750abb9d3b36582b8af909fcb58
>> 10 canonical request hash = 2ab4fe4f0fa402435c3237382bdc77e86203406e90a2768a70410f58754bb6ba
>> 10 string to sign = AWS4-HMAC-SHA256
>> 20230208T090731Z
>> 20230208/us-east-1/s3/aws4_request
>> 2ab4fe4f0fa402435c3237382bdc77e86203406e90a2768a70410f58754bb6ba
>> 10 req 44161 0s delaying v4 auth
>> 10 date_k    = a9dc6afa32600995d313f1b6a4fa40be3a3cd574d25db8789ac966a8e7f43356
>> 10 region_k  = b9193e8e261f702b88549da7e81e6a4a7672725996ea8a86269fed665b39670d
>> 10 service_k = 34214c91aec1192bcc413e02044e346b31ed4f13df8c15830bdb1d7bd3565126
>> 10 signing_k = 7656d62334d92c982f8c21e0200e760054b214eebab6dbeab577fb655c00a6f4
>> 10 generated signature = 7daee7e343e08d8121e843c6c77da3cc827bd4f4f179548e1c729c130a3e7745
>> 2 req 44161 0s s3:put_obj normalizing buckets and tenants
>> 10 s->object=file508294 s->bucket=test7
>> 2 req 44161 0s s3:put_obj init permissions
>> 10 cache get: name=default.rgw.meta+root+test7 : expiry miss
>> 10 cache put: name=default.rgw.meta+root+test7 info.flags=0x16
>> 10 adding default.rgw.meta+root+test7 to cache LRU end
>> 10 updating xattr: name=ceph.objclass.version bl.length()=42
>> 10 cache get: name=default.rgw.meta+root+test7 : type miss (requested=0x11, cached=0x16)
>> 10 cache put: name=default.rgw.meta+root+test7 info.flags=0x11
>> 10 moving default.rgw.meta+root+test7 to cache LRU end
>> 10 cache get: name=default.rgw.meta+users.uid+storage : hit (requested=0x6, cached=0x17)
>> 10 cache get: name=default.rgw.meta+users.uid+storage : hit (requested=0x3, cached=0x17)
>> 2 req 44161 0.00345s s3:put_obj recalculating target
>> 2 req 44161 0.00345s s3:put_obj reading permissions
>> 2 req 44161 0.00345s s3:put_obj init op
>> 2 req 44161 0.00345s s3:put_obj verifying op mask
>> 2 req 44161 0.00345s s3:put_obj verifying op permissions
>> 5 req 44161 0.00345s s3:put_obj Searching permissions for identity=rgw::auth::SysReqApplier -> rgw::auth::LocalApplier(acct_user=storage, acct_name=storage, subuser=, perm_mask=15, is_admin=0) mask=50
>> 5 Searching permissions for uid=storage
>> 5 Found permission: 15
>> 5 Searching permissions for group=1 mask=50
>> 5 Permissions for group not found
>> 5 Searching permissions for group=2 mask=50
>> 5 Permissions for group not found
>> 5 req 44161 0.00345s s3:put_obj --
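[Editor's note: the date_k / region_k / service_k / signing_k lines in the log above are the standard AWS Signature V4 key-derivation chain, which RGW implements. A minimal sketch of that chain in Python follows; the secret key here is made up (RGW logs only the derived HMAC values, never the secret), so the outputs will not match the log values.]

```python
import hashlib
import hmac


def hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sigv4_signing_key(secret: str, date: str, region: str, service: str) -> bytes:
    """Derive the SigV4 signing key in the same order the RGW log shows:
    date_k -> region_k -> service_k -> signing_k."""
    date_k = hmac_sha256(("AWS4" + secret).encode("utf-8"), date)  # e.g. "20230208"
    region_k = hmac_sha256(date_k, region)                         # e.g. "us-east-1"
    service_k = hmac_sha256(region_k, service)                     # "s3"
    signing_k = hmac_sha256(service_k, "aws4_request")
    return signing_k


def sigv4_signature(secret: str, date: str, region: str, service: str,
                    string_to_sign: str) -> str:
    """HMAC the "string to sign" from the log with the derived signing key."""
    key = sigv4_signing_key(secret, date, region, service)
    return hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    # "not-a-real-secret" is a placeholder; the log's access key is
    # 85ZYESW8HS34DC95MZBT but its secret is (rightly) never logged.
    string_to_sign = ("AWS4-HMAC-SHA256\n20230208T090731Z\n"
                      "20230208/us-east-1/s3/aws4_request\n"
                      "2ab4fe4f0fa402435c3237382bdc77e86203406e90a2768a70410f58754bb6ba")
    print(sigv4_signature("not-a-real-secret", "20230208", "us-east-1", "s3",
                          string_to_sign))  # 64 hex chars, like "generated signature"
```

If the value computed this way disagrees with the signature the client sent, RGW rejects the request with SignatureDoesNotMatch; in the log above the two values agree, so authentication succeeded.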
[ceph-users] Re: iDRAC 9 version 6.10 shows 0% for write endurance on non-dell drives, work around? [EXT]
On Tue, Feb 14, 2023 at 04:00:30PM +0000, Drew Weaver wrote:
> What are you folks using to monitor your write endurance on your SSDs that
> you couldn't buy from Dell because they had a 16 week lead time while the
> MFG could deliver the drives in 3 days?

Our Ceph servers are SuperMicro, not Dell, but this approach is portable. We wrote a little shell script to parse the output of "nvme" and/or "smartctl" every hour and send the data to a Graphite server. We have a Grafana dashboard to display the all-important graphs. After ~5 years of life, our most worn NVMe (used for journal/db only -- data is on HDD) is showing 89% life remaining.

Dave

--
** Dave Holland ** Systems Support -- Informatics Systems Group **
** d...@sanger.ac.uk ** Wellcome Sanger Institute, Hinxton, UK **

--
The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
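[Editor's note: a minimal sketch of Dave's parse-and-ship idea in Python, not his actual script. The metric path, hostnames, and Graphite server name are illustrative assumptions; only the Graphite plaintext port 2003 is the usual default. NVMe drives report "Percentage Used" in smartctl output, while SATA SSDs expose vendor-specific attributes and would need their own parsing.]

```python
import re
import socket
import time
from typing import Optional


def parse_pct_used(smartctl_output: str) -> Optional[int]:
    """Extract the NVMe "Percentage Used" endurance figure from
    `smartctl -a /dev/nvmeX` output; returns None if it is absent
    (e.g. SATA SSDs, which use attributes like Media_Wearout_Indicator)."""
    m = re.search(r"Percentage Used:\s+(\d+)%", smartctl_output)
    return int(m.group(1)) if m else None


def graphite_line(host: str, dev: str, pct_used: int, now: float) -> str:
    # Graphite plaintext protocol: "<metric.path> <value> <unix-ts>\n"
    return "servers.%s.smart.%s.wear_pct_used %d %d\n" % (host, dev, pct_used, int(now))


def ship(lines, server="graphite.example.com", port=2003):
    # One short-lived TCP connection per batch is fine for an hourly cron job.
    with socket.create_connection((server, port), timeout=10) as s:
        s.sendall("".join(lines).encode())


if __name__ == "__main__":
    sample = ("SMART/Health Information (NVMe Log 0x02)\n"
              "Percentage Used:                    11%\n")
    print(graphite_line("ceph-osd-01", "nvme0n1",
                        parse_pct_used(sample), time.time()), end="")
```

Run hourly from cron per host, feeding one line per drive into `ship()`, and the Grafana/Graphite side is just a graph of `servers.*.smart.*.wear_pct_used`.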
[ceph-users] iDRAC 9 version 6.10 shows 0% for write endurance on non-dell drives, work around?
Hello,

After upgrading a lot of iDRAC9 modules to version 6.10 in servers that are involved in a Ceph cluster, we noticed that the iDRAC9 shows the write endurance as 0% on any non-certified disk.

OMSA still shows the correct remaining write endurance, but I am assuming that they are working feverishly to eliminate that too.

I opened a support ticket with Dell once this was brought to my attention, and they basically told me that I was lucky that it ever worked at all, which I thought was an odd response given that the iDRAC enterprise licenses cost several hundred dollars each.

I know that the old Intel Datacenter Tool used to be able to reach through a MegaRAID controller and read the remaining write endurance, but that tool is essentially defunct now.

What are you folks using to monitor your write endurance on your SSDs that you couldn't buy from Dell because they had a 16 week lead time while the MFG could deliver the drives in 3 days?

Thanks,
-Drew
[ceph-users] Re: Cephalocon 2023 Amsterdam Call For Proposals Extended to February 19!
Hi Mike,

I have two questions about Cephalocon 2023.

1. Will this event only be held on-site (no virtual platform)?
2. Will the session recordings be available on YouTube, like other Ceph events?

Thanks,
Satoru
[ceph-users] Renaming a ceph node
Hi,

yes, you can rename a node without massive rebalancing. I tested the following with Pacific, but I think it should work with older versions as well.

You need to rename the node in the crushmap between shutting down the node under the old name and starting it under the new name. You just have to keep the node's ID in the crushmap!

Regards,
Manuel

On Mon, 13 Feb 2023 22:22:35 +0000 "Rice, Christian" wrote:
> Can anyone please point me at a doc that explains the most efficient
> procedure to rename a ceph node WITHOUT causing a massive misplaced-objects
> churn?
>
> When my node came up with a new name, it properly joined the cluster and
> owned the OSDs, but the original node with no devices remained. I expect
> this affected the crush map such that a large quantity of objects got
> reshuffled. I want no object movement, if possible.
>
> BTW this old cluster is on luminous. ☹
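[Editor's note: concretely, this is a plain-text edit of the decompiled crushmap (`ceph osd getcrushmap -o map.bin`, `crushtool -d map.bin -o map.txt`, edit, `crushtool -c map.txt -o new.bin`, `ceph osd setcrushmap -i new.bin`). The host name, id, and weights below are made-up examples; only the bucket name may change, never its id. Recent releases also offer `ceph osd crush rename-bucket <old> <new>`, which performs the same rename online — check availability on your version.]

```
# before:
host ceph-node-old {
        id -5                   # keep this id!
        alg straw2
        hash 0                  # rjenkins1
        item osd.12 weight 1.819
}

# after (only the bucket name differs, so no data moves):
host ceph-node-new {
        id -5                   # unchanged
        alg straw2
        hash 0                  # rjenkins1
        item osd.12 weight 1.819
}
```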
[ceph-users] Re: Frequent calling monitor election
Hi Stefan,

thanks for that hint. We use xfs on a dedicated RAID array for the MON stores. I'm not sure if I have seen elections caused by trimming; I will keep an eye on it.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Stefan Kooman
Sent: 10 February 2023 10:03:04
To: Frank Schilder; ceph-users@ceph.io
Subject: Re: [ceph-users] Frequent calling monitor election

On 2/9/23 16:55, Frank Schilder wrote:
> We moved a switch from one rack to another, and after the switch came back
> up, the monitors frequently bitch about who is the alpha. How do I get them
> to focus more on their daily duties again?

Just checking here, do you use xfs as the monitor database filesystem? We encountered monitor elections when the monthly trim (discard unused blocks) would run. Disabling the trims solved the issue for us. If you have the "discard" option enabled, this might hurt you more often ...

Gr. Stefan