[ceph-users] Re: iDRAC 9 version 6.10 shows 0% for write endurance on non-dell drives, work around?

2023-02-14 Thread Konstantin Shalygin

Hi,

You can use smartctl_exporter [1] for all your media, not only the SSDs.


k
[1] https://github.com/prometheus-community/smartctl_exporter

Sent from my iPhone

> On 14 Feb 2023, at 23:01, Drew Weaver  wrote:
> Hello,
> 
> After upgrading a lot of iDRAC9 modules to version 6.10 in servers that are 
> involved in a Ceph cluster, we noticed that the iDRAC9 shows the write 
> endurance as 0% on any non-certified disk.
> 
> OMSA still shows the correct remaining write endurance, but I am assuming that 
> they are working feverishly to eliminate that too.
> 
> I opened a support ticket with Dell once this was brought to my attention, and 
> they basically told me that I was lucky it ever worked at all, which I 
> thought was an odd response given that the iDRAC Enterprise licenses cost 
> several hundred dollars each.
> 
> I know that the old Intel Datacenter Tool used to be able to reach through a 
> MegaRAID controller and read the remaining write endurance, but that tool is 
> essentially defunct now.
> 
> What are you folks using to monitor your write endurance on your SSDs that 
> you couldn't buy from Dell because they had a 16 week lead time while the MFG 
> could deliver the drives in 3 days?
> 
> Thanks,
> -Drew
> 


[ceph-users] Announcing go-ceph v0.20.0

2023-02-14 Thread Sven Anderson
We are happy to announce another release of the go-ceph API library. This is a
regular release following our every-two-months release cadence.

https://github.com/ceph/go-ceph/releases/tag/v0.20.0

Changes include additions to the rbd, rgw and cephfs packages. More details are
available at the link above.

The library includes bindings that aim to play a similar role to the "pybind"
python bindings in the ceph tree, but for the Go language. The library also
includes additional APIs that can be used to administer the cephfs, rbd, and
rgw subsystems.
There are already a few consumers of this library in the wild, including the
ceph-csi project.
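
For anyone who has not tried the library yet, here is a minimal, untested sketch
of what using the rados package looks like. It assumes a reachable cluster, a
default ceph.conf and keyring on the host, and an existing pool named "mypool";
adjust names to taste.

    package main

    import (
        "fmt"

        "github.com/ceph/go-ceph/rados"
    )

    func main() {
        // Connect using the default ceph.conf and keyring on this host.
        conn, err := rados.NewConn()
        if err != nil {
            panic(err)
        }
        if err := conn.ReadDefaultConfigFile(); err != nil {
            panic(err)
        }
        if err := conn.Connect(); err != nil {
            panic(err)
        }
        defer conn.Shutdown()

        // Open an I/O context on an existing pool ("mypool" is a placeholder).
        ioctx, err := conn.OpenIOContext("mypool")
        if err != nil {
            panic(err)
        }
        defer ioctx.Destroy()

        // Write a small object and read it back.
        if err := ioctx.Write("greeting", []byte("hello from go-ceph"), 0); err != nil {
            panic(err)
        }
        buf := make([]byte, 64)
        n, err := ioctx.Read("greeting", buf, 0)
        if err != nil {
            panic(err)
        }
        fmt.Printf("read %d bytes: %s\n", n, buf[:n])
    }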

Sven


[ceph-users] Re: iDRAC 9 version 6.10 shows 0% for write endurance on non-dell drives, work around? [EXT]

2023-02-14 Thread Drew Weaver
That is pretty awesome; I will look into doing it that way. All of our 
monitoring is integrated with the very, very expensive iDRAC Enterprise license 
we pay for (my fault for trusting Dell).

We are looking for a new hardware vendor, but this will likely work as a 
workaround for the mistake we already made.

Thanks,
-Drew





-Original Message-
From: Dave Holland  
Sent: Tuesday, February 14, 2023 11:39 AM
To: Drew Weaver 
Cc: 'ceph-users@ceph.io' 
Subject: Re: [ceph-users] iDRAC 9 version 6.10 shows 0% for write endurance on 
non-dell drives, work around? [EXT]

On Tue, Feb 14, 2023 at 04:00:30PM +, Drew Weaver wrote:
> What are you folks using to monitor your write endurance on your SSDs that 
> you couldn't buy from Dell because they had a 16 week lead time while the MFG 
> could deliver the drives in 3 days?

Our Ceph servers are SuperMicro, not Dell, but this approach is portable. We 
wrote a little shell script to parse the output of "nvme" and/or "smartctl" 
every hour and send the data to a Graphite server. We have a Grafana dashboard 
to display the all-important graphs. After ~5 years of life, our most worn NVMe 
(used for journal/db only -- data is on HDD) is showing 89% life remaining.

Dave
-- 
**   Dave Holland   ** Systems Support -- Informatics Systems Group **
** d...@sanger.ac.uk **Wellcome Sanger Institute, Hinxton, UK**


-- 
 The Wellcome Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


[ceph-users] Re: Missing object in bucket list

2023-02-14 Thread J. Eric Ivancich
A bug was reported recently where, if a put object occurs while bucket resharding 
is finishing up, the object is written to the old bucket shard rather than the 
new one. From your logs there is evidence that resharding was underway alongside 
the put object.

A fix for that bug has landed on main and pacific; the quincy backport is not yet 
merged. See:

https://tracker.ceph.com/issues/58034

Octopus was EOLed back in August, so it won't receive the fix. But it seems the 
next pacific and quincy releases will have the fix, as will reef.

Eric
(he/him)

> On Feb 13, 2023, at 11:41 AM, mahnoosh shahidi  
> wrote:
> 
> Hi all,
> 
> We have a cluster on 15.2.12. We are experiencing an unusual scenario in
> S3: a user sends a PUT request to upload an object and RGW returns 200 as the
> response status code. The object has been uploaded and can be downloaded,
> but it does not appear in the bucket listing. We also tried to get the
> bucket index entry for that object, but it does not exist. Below is the
> RGW log for the request.
> 
> 1 == starting new request req=0x7f246c4426b0 =
>> 2 req 44161 0s initializing for trans_id =
>> tx0ac81-0063e36653-17e18f0-default
>> 10 rgw api priority: s3=3 s3website=2
>> 10 host=192.168.0.201
>> 10 meta>> HTTP_X_AMZ_CONTENT_SHA256
>> 10 meta>> HTTP_X_AMZ_DATE
>> 10 x>>
>> x-amz-content-sha256:30e14955ebf1352266dc2ff8067e68104607e750abb9d3b36582b8af909fcb58
>> 10 x>> x-amz-date:20230208T090731Z
>> 10 handler=22RGWHandler_REST_Obj_S3
>> 2 req 44161 0s getting op 1
>> 10 req 44161 0s s3:put_obj scheduling with dmclock client=2 cost=1
>> 10 op=21RGWPutObj_ObjStore_S3
>> 2 req 44161 0s s3:put_obj verifying requester
>> 10 v4 signature format =
>> 7daee7e343e08d8121e843c6c77da3cc827bd4f4f179548e1c729c130a3e7745
>> 10 v4 credential format =
>> 85ZYESW8HS34DC95MZBT/20230208/us-east-1/s3/aws4_request
>> 10 access key id = 85ZYESW8HS34DC95MZBT
>> 10 credential scope = 20230208/us-east-1/s3/aws4_request
>> 10 req 44161 0s canonical headers format =
>> content-md5:ttgbNgpWctgMJ0MPORU+LA==
>> host:192.168.0.201
>> 
>> x-amz-content-sha256:30e14955ebf1352266dc2ff8067e68104607e750abb9d3b36582b8af909fcb58
>> x-amz-date:20230208T090731Z
>> 
>> 10 payload request hash =
>> 30e14955ebf1352266dc2ff8067e68104607e750abb9d3b36582b8af909fcb58
>> 10 canonical request = PUT
>> /test7/file508294
>> 
>> content-md5:ttgbNgpWctgMJ0MPORU+LA==
>> host:192.168.0.201
>> 
>> x-amz-content-sha256:30e14955ebf1352266dc2ff8067e68104607e750abb9d3b36582b8af909fcb58
>> x-amz-date:20230208T090731Z
>> 
>> content-md5;host;x-amz-content-sha256;x-amz-date
>> 30e14955ebf1352266dc2ff8067e68104607e750abb9d3b36582b8af909fcb58
>> 10 canonical request hash =
>> 2ab4fe4f0fa402435c3237382bdc77e86203406e90a2768a70410f58754bb6ba
>> 10 string to sign = AWS4-HMAC-SHA256
>> 20230208T090731Z
>> 20230208/us-east-1/s3/aws4_request
>> 2ab4fe4f0fa402435c3237382bdc77e86203406e90a2768a70410f58754bb6ba
>> 10 req 44161 0s delaying v4 auth
>> 10 date_k=
>> a9dc6afa32600995d313f1b6a4fa40be3a3cd574d25db8789ac966a8e7f43356
>> 10 region_k  =
>> b9193e8e261f702b88549da7e81e6a4a7672725996ea8a86269fed665b39670d
>> 10 service_k =
>> 34214c91aec1192bcc413e02044e346b31ed4f13df8c15830bdb1d7bd3565126
>> 10 signing_k =
>> 7656d62334d92c982f8c21e0200e760054b214eebab6dbeab577fb655c00a6f4
>> 10 generated signature =
>> 7daee7e343e08d8121e843c6c77da3cc827bd4f4f179548e1c729c130a3e7745
>> 2 req 44161 0s s3:put_obj normalizing buckets and tenants
>> 10 s->object=file508294 s->bucket=test7
>> 2 req 44161 0s s3:put_obj init permissions
>> 10 cache get: name=default.rgw.meta+root+test7 : expiry miss
>> 10 cache put: name=default.rgw.meta+root+test7 info.flags=0x16
>> 10 adding default.rgw.meta+root+test7 to cache LRU end
>> 10 updating xattr: name=ceph.objclass.version bl.length()=42
>> 10 cache get: name=default.rgw.meta+root+test7 : type miss
>> (requested=0x11, cached=0x16)
>> 10 cache put: name=default.rgw.meta+root+test7 info.flags=0x11
>> 10 moving default.rgw.meta+root+test7 to cache LRU end
>> 10 cache get: name=default.rgw.meta+users.uid+storage : hit
>> (requested=0x6, cached=0x17)
>> 10 cache get: name=default.rgw.meta+users.uid+storage : hit
>> (requested=0x3, cached=0x17)
>> 2 req 44161 0.00345s s3:put_obj recalculating target
>> 2 req 44161 0.00345s s3:put_obj reading permissions
>> 2 req 44161 0.00345s s3:put_obj init op
>> 2 req 44161 0.00345s s3:put_obj verifying op mask
>> 2 req 44161 0.00345s s3:put_obj verifying op permissions
>> 5 req 44161 0.00345s s3:put_obj Searching permissions for
>> identity=rgw::auth::SysReqApplier ->
>> rgw::auth::LocalApplier(acct_user=storage, acct_name=storage, subuser=,
>> perm_mask=15, is_admin=0) mask=50
>> 5 Searching permissions for uid=storage
>> 5 Found permission: 15
>> 5 Searching permissions for group=1 mask=50
>> 5 Permissions for group not found
>> 5 Searching permissions for group=2 mask=50
>> 5 Permissions for group not found
>> 5 req 44161 0.00345s s3:put_obj -- 

[ceph-users] Re: iDRAC 9 version 6.10 shows 0% for write endurance on non-dell drives, work around? [EXT]

2023-02-14 Thread Dave Holland
On Tue, Feb 14, 2023 at 04:00:30PM +, Drew Weaver wrote:
> What are you folks using to monitor your write endurance on your SSDs that 
> you couldn't buy from Dell because they had a 16 week lead time while the MFG 
> could deliver the drives in 3 days?

Our Ceph servers are SuperMicro, not Dell, but this approach is
portable. We wrote a little shell script to parse the output of "nvme"
and/or "smartctl" every hour and send the data to a Graphite server.
We have a Grafana dashboard to display the all-important graphs. After
~5 years of life, our most worn NVMe (used for journal/db only -- data is
on HDD) is showing 89% life remaining.
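
For anyone rolling their own collector along these lines, here is a rough,
untested sketch of the same idea written in Go instead of shell: it reads
smartctl's JSON output for one NVMe device and pushes the wear counter to
Graphite's plaintext port. The device path, the Graphite endpoint and the
metric name are placeholders, and it assumes a smartctl new enough (7.x) to
support --json; run it hourly from cron or a systemd timer, like the script.

    package main

    import (
        "encoding/json"
        "fmt"
        "net"
        "os/exec"
        "time"
    )

    // Just the subset of `smartctl --json -a` output we care about for NVMe wear.
    type smartctlOutput struct {
        NVMeLog struct {
            PercentageUsed int `json:"percentage_used"`
        } `json:"nvme_smart_health_information_log"`
    }

    func main() {
        // Needs root to talk to the device, just like running smartctl by hand.
        out, err := exec.Command("smartctl", "--json", "-a", "/dev/nvme0").Output()
        if err != nil {
            panic(err)
        }

        var s smartctlOutput
        if err := json.Unmarshal(out, &s); err != nil {
            panic(err)
        }

        // One sample in Graphite's plaintext protocol: "<metric> <value> <unix-ts>\n".
        conn, err := net.Dial("tcp", "graphite.example.com:2003")
        if err != nil {
            panic(err)
        }
        defer conn.Close()

        fmt.Fprintf(conn, "ceph.host1.nvme0.percentage_used %d %d\n",
            s.NVMeLog.PercentageUsed, time.Now().Unix())
    }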

Dave
-- 
**   Dave Holland   ** Systems Support -- Informatics Systems Group **
** d...@sanger.ac.uk **Wellcome Sanger Institute, Hinxton, UK**


-- 
 The Wellcome Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


[ceph-users] iDRAC 9 version 6.10 shows 0% for write endurance on non-dell drives, work around?

2023-02-14 Thread Drew Weaver
Hello,

After upgrading a lot of iDRAC9 modules to version 6.10 in servers that are 
involved in a Ceph cluster, we noticed that the iDRAC9 shows the write endurance 
as 0% on any non-certified disk.

OMSA still shows the correct remaining write endurance, but I am assuming that 
they are working feverishly to eliminate that too.

I opened a support ticket with Dell once this was brought to my attention, and 
they basically told me that I was lucky it ever worked at all, which I 
thought was an odd response given that the iDRAC Enterprise licenses cost 
several hundred dollars each.

I know that the old Intel Datacenter Tool used to be able to reach through a 
MegaRAID controller and read the remaining write endurance, but that tool is 
essentially defunct now.

What are you folks using to monitor your write endurance on your SSDs that you 
couldn't buy from Dell because they had a 16 week lead time while the MFG could 
deliver the drives in 3 days?

Thanks,
-Drew



[ceph-users] Re: Cephalocon 2023 Amsterdam Call For Proposals Extended to February 19!

2023-02-14 Thread Satoru Takeuchi
Hi Mike,

I have two questions about Cephalocon 2023.


1. Will this event be held on-site only (no virtual platform)?
2. Will the session recordings be available on YouTube, as with other Ceph events?

Thanks,
Satoru


[ceph-users] Renaming a ceph node

2023-02-14 Thread Manuel Lausch
Hi,

Yes, you can rename a node without massive rebalancing.

I tested the following with Pacific, but I think it should work with
older versions as well: rename the node in the CRUSH map between shutting
down the node under its old name and starting it under the new name.
You just have to keep the node's ID in the CRUSH map!
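
For reference, the in-place rename itself can be done with
"ceph osd crush rename-bucket <oldname> <newname>", which renames the bucket
while keeping its ID, so no data movement is triggered. Below is a rough,
untested sketch of issuing that same mon command through the go-ceph bindings;
the host names are placeholders and the default ceph.conf/keyring are assumed.

    package main

    import (
        "encoding/json"
        "fmt"

        "github.com/ceph/go-ceph/rados"
    )

    func main() {
        // Connect with the default ceph.conf and admin keyring.
        conn, err := rados.NewConn()
        if err != nil {
            panic(err)
        }
        if err := conn.ReadDefaultConfigFile(); err != nil {
            panic(err)
        }
        if err := conn.Connect(); err != nil {
            panic(err)
        }
        defer conn.Shutdown()

        // Equivalent of "ceph osd crush rename-bucket old-host new-host":
        // the bucket is renamed in place and keeps its ID, so no rebalancing.
        cmd, err := json.Marshal(map[string]string{
            "prefix":  "osd crush rename-bucket",
            "srcname": "old-host", // placeholder
            "dstname": "new-host", // placeholder
        })
        if err != nil {
            panic(err)
        }
        _, status, err := conn.MonCommand(cmd)
        if err != nil {
            panic(err)
        }
        fmt.Println("mon:", status)
    }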

Regards
Manuel


On Mon, 13 Feb 2023 22:22:35 +
"Rice, Christian"  wrote:

> Can anyone please point me at a doc that explains the most efficient 
> procedure to rename a Ceph node WITHOUT causing massive misplaced-object 
> churn?
> 
> When my node came up with a new name, it properly joined the cluster and 
> owned the OSDs, but the original node with no devices remained. I expect 
> this affected the CRUSH map such that a large quantity of objects got 
> reshuffled. I want no object movement, if possible.
> 
> BTW this old cluster is on luminous. ☹
> 


[ceph-users] Re: Frequent calling monitor election

2023-02-14 Thread Frank Schilder
Hi Stefan,

Thanks for that hint. We use xfs on a dedicated RAID array for the MON stores. 
I'm not sure whether I have seen elections caused by trimming; I will keep an 
eye on it.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Stefan Kooman 
Sent: 10 February 2023 10:03:04
To: Frank Schilder; ceph-users@ceph.io
Subject: Re: [ceph-users] Frequent calling monitor election

On 2/9/23 16:55, Frank Schilder wrote:


> We moved a switch from one rack to another and after the switch came back up, 
> the monitors frequently bitch about who is the alpha. How do I get them to 
> focus more on their daily duties again?

Just checking here: do you use xfs as the monitor database filesystem? We
encountered monitor elections when the monthly trim (discard unused blocks)
would run. Disabling the trims solved the issue for us. If you have the
"discard" mount option enabled, this might hurt you more often ...

Gr. Stefan