[ceph-users] Re: How can I use not-replicated pool (replication 1 or raid-0)

2023-04-28 Thread mhnx
Hello Janne, thank you for your response.

I understand your advice and, rest assured, I have designed plenty of EC
pools and I know the mess. EC is not an option here because I need SPEED.

Let me describe my hardware first so that we are looking at the same picture.
Server: R620
Cpu: 2 x Xeon E5-2630 v2 @ 2.60GHz
Ram: 128GB - DDR3
Disk1: 20x Samsung SSD 860 2TB
Disk2: 10x Samsung SSD 870 2TB

My SSDs do not have PLP. Because of that, every Ceph write also
waits for TRIM. I want to know how much latency we are talking about,
because I'm thinking of adding PLP NVMe devices for WAL+DB to gain some
speed.
As you can see, I'm trying to gain something from every TRIM command.
Currently I'm testing a replication 2 pool, and even that speed is not
enough for my use case.
Now I'm trying to boost the deletion speed, because I'm writing and
deleting files all the time and this never ends.
I'm writing this mail because replication 1 will decrease the deletion
speed, but I'm still trying to tune some MDS+OSD parameters to increase
delete speed.
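
For reference, these are the kind of MDS-side knobs I'm experimenting
with (the values below are just what I'm currently testing, not a
recommendation):

# let the MDS purge queue delete removed files more aggressively
ceph config set mds mds_max_purge_files 256
ceph config set mds mds_max_purge_ops 16384
ceph config set mds mds_max_purge_ops_per_pg 1.0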

Any help and idea will be great for me. Thanks.
Regards.



Janne Johansson wrote on Wed, 12 Apr 2023 at 10:10:
>
> Den mån 10 apr. 2023 kl 22:31 skrev mhnx :
> > Hello.
> > I have a 10 node cluster. I want to create a non-replicated pool
> > (replication 1) and I want to ask some questions about it:
> >
> > Let me tell you my use case:
> > - I don't care about losing data,
> > - All of my data is JUNK and these junk files are usually between 1KB to 
> > 32MB.
> > - These files will be deleted in 5 days.
> > - Writable space and I/O speed is more important.
> > - I have high Write/Read/Delete operations, minimum 200GB a day.
>
> That is "only" about 2.3MB/s averaged over a day, which should easily be
> doable even with repl=2, 3, 4 or EC. This of course depends on the speed of
> the drives, network, cpus and all that, but in itself it doesn't seem too
> hard to achieve in terms of average speeds. We have EC8+3 rgw backed by some
> 12-14 OSD hosts with hdd and nvme (for wal+db) that can ingest over 1GB/s if
> you parallelize the rgw streams, so a few MB/s seems totally doable with 10
> decent machines. Even with replication.
>
> > I'm afraid that, in any failure, I won't be able to access the whole
> > cluster. Losing data is okay but I have to ignore missing files,
>
> Even with repl=1, in case of a failure, the cluster will still aim at
> fixing itself rather than ignoring currently lost data and moving on,
> so any solution that involves "forgetting" about lost data would need
> a ceph operator telling the cluster to ignore all the missing parts
> and to recreate the broken PGs. This would not be automatic.
>
>
> --
> May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: import OSD after host OS reinstallation

2023-04-28 Thread Tony Liu
Thank you Eugen for looking into it!
In short, it works. I'm using 16.2.10.
What I did wrong was to remove the OSD, which makes no sense.
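
For the archives, the sequence that ended up working was roughly: leave
the OSDs and their LVM volumes untouched, reinstall the OS, bring the
host back under cephadm management, and then activate the existing
OSDs, e.g. for the host from my earlier error:

ceph orch host add ceph-4
ceph cephadm osd activate ceph-4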

Tony

From: Eugen Block 
Sent: April 28, 2023 06:46 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: import OSD after host OS reinstallation

I chatted with Mykola who helped me get the OSDs back up. My test
cluster was on 16.2.5 (and still mostly is), after upgrading only the
MGRs to a more recent version (16.2.10) the activate command worked
successfully and the existing OSDs got back up. Not sure if that's a
bug or something else, but which exact versions are you using?

Zitat von Eugen Block :

> I found a small two-node cluster to test this on pacific, I can
> reproduce it. After reinstalling the host (VM) most of the other
> services are redeployed (mon, mgr, mds, crash), but not the OSDs. I
> will take a closer look.
>
> Zitat von Tony Liu :
>
>> Tried [1] already, but got error.
>> Created no osd(s) on host ceph-4; already created?
>>
>> The error is from [2] in deploy_osd_daemons_for_existing_osds().
>>
>> Not sure what's missing.
>> Should OSD be removed, or removed with --replace, or untouched
>> before host reinstallation?
>>
>> [1]
>> https://docs.ceph.com/en/pacific/cephadm/services/osd/#activate-existing-osds
>> [2]
>> https://github.com/ceph/ceph/blob/0a5b3b373b8a5ba3081f1f110cec24d82299cac8/src/pybind/mgr/cephadm/services/osd.py#L196
>>
>> Thanks!
>> Tony
>> 
>> From: Tony Liu 
>> Sent: April 27, 2023 10:20 PM
>> To: ceph-users@ceph.io; d...@ceph.io
>> Subject: [ceph-users] import OSD after host OS reinstallation
>>
>> Hi,
>>
>> The cluster is with Pacific and deployed by cephadm on container.
>> The case is to import OSDs after host OS reinstallation.
>> All OSDs are SSD who has DB/WAL and data together.
>> Did some research, but not able to find a working solution.
>> Wondering if anyone has experiences in this?
>> What needs to be done before host OS reinstallation and what's after?
>>
>>
>> Thanks!
>> Tony
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephfs - max snapshot limit?

2023-04-28 Thread Milind Changire
If a dir doesn't exist at the moment of snapshot creation, then the
schedule is deactivated for that dir.
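
You can check whether that happened, and turn a schedule back on, with
something like the following (the path is just an example):

ceph fs snap-schedule status /foo
ceph fs snap-schedule activate /foo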


On Fri, Apr 28, 2023 at 8:39 PM Jakob Haufe  wrote:

> On Thu, 27 Apr 2023 11:10:07 +0200
> Tobias Hachmer  wrote:
>
> >  > Given the limitation is per directory, I'm currently trying this:
> >  >
> >  > / 1d 30d
> >  > /foo 1h 48h
> >  > /bar 1h 48h
> >  >
> >  > I forgot to activate the new schedules yesterday so I can't say
> whether
> >  > it works as expected yet.
> >
> > Please let me know if this works.
>
> It doesn't.
>
> I haven't re-visited the code yet, but for some reason the lower level
> schedules get deactivated again, seemingly each time they are supposed
> to create a snapshot.
>
> Cheers,
> sur5r
>
> --
> ceterum censeo microsoftem esse delendam.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Milind
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephfs - max snapshot limit?

2023-04-28 Thread Jakob Haufe
> FYI, PR - https://github.com/ceph/ceph/pull/51278

Thanks!

I just applied this to my cluster and will report back. Looks simple
enough, tbh.

Cheers,
sur5r

-- 
ceterum censeo microsoftem esse delendam.


pgp1U9cMc_XaM.pgp
Description: OpenPGP digital signature
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: architecture help (iscsi, rbd, backups?)

2023-04-28 Thread Maged Mokhtar

Hello Angelo

You can try PetaSAN
www.petasan.org

We support scale-out iSCSI with Ceph, and it is actively developed.

/Maged


On 27/04/2023 23:05, Angelo Höngens wrote:

Hey guys and girls,

I'm working on a project to build storage for one of our departments,
and I want to ask you guys and girls for input on the high-level
overview part. It's a long one, I hope you read along and comment.

SUMMARY

I made a plan last year to build a 'storage solution' including ceph
and some Windows VMs to expose the data over SMB to clients. A year
later I finally have the hardware, built a ceph cluster, and I'm doing
tests. Ceph itself runs great, but when I wanted to start exposing the
data using iscsi to our VMware farm, I ran into some issues. I know
the iscsi gateways will introduce some new performance bottlenecks,
but I'm seeing really slow performance; still working on that.

But then I ran into the warning on the iscsi gateway page: "The iSCSI
gateway is in maintenance as of November 2022. This means that it is
no longer in active development and will not be updated to add new
features.". Wait, what? Why!? What does this mean? Does this mean that
iSCSI is now 'feature complete' and will still be supported for the next 5
years, or will it be deprecated in the future? I tried searching, but
couldn't find any info on the decision and the roadmap.

My goal is to build a future-proof setup, and using deprecated
components should not be part of that of course.

If the iscsi gateway will still be supported for the next few years and I
can iron out the performance issues, I can still go on with my
original plan. If not, I have to go back to the drawing board. And
maybe you guys would advise me to take another route anyway.

GOALS

My goals/considerations are:

- we want >1PB of storage capacity for cheap (on a tight budget) for
research data. Most of it is 'store once, read sometimes'. <1% of the
data is 'hot'.
- focus is on capacity, but it would be nice to have > 200MB/s of
sequential write/read performance and not 'totally suck' on random
i/o. Yes, not very well quantified, but ah. Sequential writes are most
important.
- end users all run Windows computers (mostly VDIs) and a lot of
applications require SMB shares.
- security is a big thing, we want really tight ACL's, specific
monitoring agents, etc.
- our data is incredibly important to us, we still want the 3-2-1
backup rule. Primary storage solution, a second storage solution in a
different place, and some of the data that is not reproducible is also
written to tape. We also want to be protected from ransomware or user
errors (so no direct replication to the second storage).
- I like open source, reliability, no fork-lift upgrades, no vendor
lock-in, blah, well, I'm on the ceph list here, no need to convince
you guys ;)
- We're hiring a commercial company to do ceph maintenance and support
for when I'm on leave or leaving the company, but they won't support
clients, backup software, etc, so I want something as simple as
possible. We do have multiple Windows/VMware admins, but no other real
Linux gurus.

THE INITIAL PLAN

Given these considerations, I ordered two identical clusters, each
consisting of 3 monitor nodes and 8 osd nodes. Each osd node has 2
SSDs and 10 capacity disks (EC 4:2 for the data), and each node is
connected using a 2x25Gbps bond. Ceph is running like a charm. Now I
just have to think about exposing the data to end users, and I've been
testing different setups.

My original plan was to expose for example 10x100TB rbd images using
iSCSI to our VMware farm, formatting the luns with VMFS6, and run for
example 2 Windows file servers per datastore on that with a single DFS
namespace to end users. Then backup the file servers using our
existing Veeam infrastructure to RGW running on the second cluster
with an immutable bucket. This way we would have easily defined
security boundaries: the clients can only reach the file servers, the
file servers only see their local VMDK's, ESX only sees the luns on
the iSCSI target, etc. When a file server would be compromised, it
would have no access to ceph. We have easy incremental backups,
immutability for ransomware protection, etc. And the best part is that
the ceph admin can worry about ceph, the vmware admin can focus on
ESX, VMFS and all the vmware stuff, and the Windows admins can focus
on the Windows boxes, Windows-specific ACLS and tools and Veeam
backups and stuff.

CURRENT SITUATION

I'm building out this plan now, but I'm running into issues with
iSCSI. Are any of you doing something similar? What is your iscsi
performance compared to direct rbd?

In regard to performance: if I take 2 test Windows VMs, put one on
an iSCSI datastore and the other on direct rbd access using the
Windows rbd driver, create a share on those boxes and push data to
it, I see different results (of course). Copying some iso images over
SMB to the 'windows vm running direct rbd' I see around 800MB/s write,
and 200MB/s read, 

[ceph-users] Re: cephfs - max snapshot limit?

2023-04-28 Thread Jakob Haufe
On Thu, 27 Apr 2023 11:10:07 +0200
Tobias Hachmer  wrote:

>  > Given the limitation is per directory, I'm currently trying this:
>  >
>  > / 1d 30d
>  > /foo 1h 48h
>  > /bar 1h 48h
>  >
>  > I forgot to activate the new schedules yesterday so I can't say whether
>  > it works as expected yet.  
> 
> Please let me know if this works.

It doesn't.

I haven't re-visited the code yet, but for some reason the lower level
schedules get deactivated again, seemingly each time they are supposed
to create a snapshot.

Cheers,
sur5r

-- 
ceterum censeo microsoftem esse delendam.


pgpwgkP42anng.pgp
Description: OpenPGP digital signature
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: import OSD after host OS reinstallation

2023-04-28 Thread Eugen Block
I chatted with Mykola who helped me get the OSDs back up. My test  
cluster was on 16.2.5 (and still mostly is), after upgrading only the  
MGRs to a more recent version (16.2.10) the activate command worked  
successfully and the existing OSDs got back up. Not sure if that's a  
bug or something else, but which exact versions are you using?


Zitat von Eugen Block :

I found a small two-node cluster to test this on pacific, I can  
reproduce it. After reinstalling the host (VM) most of the other  
services are redeployed (mon, mgr, mds, crash), but not the OSDs. I  
will take a closer look.


Zitat von Tony Liu :


Tried [1] already, but got error.
Created no osd(s) on host ceph-4; already created?

The error is from [2] in deploy_osd_daemons_for_existing_osds().

Not sure what's missing.
Should OSD be removed, or removed with --replace, or untouched  
before host reinstallation?


[1]  
https://docs.ceph.com/en/pacific/cephadm/services/osd/#activate-existing-osds
[2]  
https://github.com/ceph/ceph/blob/0a5b3b373b8a5ba3081f1f110cec24d82299cac8/src/pybind/mgr/cephadm/services/osd.py#L196


Thanks!
Tony

From: Tony Liu 
Sent: April 27, 2023 10:20 PM
To: ceph-users@ceph.io; d...@ceph.io
Subject: [ceph-users] import OSD after host OS reinstallation

Hi,

The cluster is with Pacific and deployed by cephadm on container.
The case is to import OSDs after host OS reinstallation.
All OSDs are SSD who has DB/WAL and data together.
Did some research, but not able to find a working solution.
Wondering if anyone has experiences in this?
What needs to be done before host OS reinstallation and what's after?


Thanks!
Tony
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Lua scripting in the rados gateway

2023-04-28 Thread Thomas Bennett
Hey Yuval,

No problem. It was interesting to me to figure out how it all fits together
and works.  Thanks for opening an issue on the tracker.

Cheers,
Tom

On Thu, 27 Apr 2023 at 15:03, Yuval Lifshitz  wrote:

> Hi Thomas,
> Thanks for the detailed info!
> RGW lua scripting was never tested in a cephadm deployment :-(
> Opened a tracker: https://tracker.ceph.com/issues/59574 to make sure this
> would work out of the box.
>
> Yuval
>
>
> On Tue, Apr 25, 2023 at 10:25 PM Thomas Bennett  wrote:
>
>> Hi ceph users,
>>
>> I've been trying out the lua scripting for the rados gateway (thanks
>> Yuval).
>>
>> As I mentioned in my previous email, there is an error when trying to
>> load the luasocket module. However, I thought it was a good time to report
>> on my progress.
>>
>> My 'hello world' example below, called *test.lua*, includes the
>> following checks:
>>
>>1. Can I write to the debug log?
>>2. Can I use the lua socket package to do something stupid but
>>interesting, like connect to a webservice?
>>
>> Before you continue reading this, you might need to know that I run all
>> ceph processes in a *CentOS Stream release 8* container deployed using
>> ceph
>> orchestrator running *Ceph v17.2.5*, so please view the information below
>> in that context.
>>
>> For anyone looking for a reference, I suggest going to the ceph lua rados
>> gateway documentation at radosgw/lua-scripting.
>>
>> There are two new switches you need to know about in the radosgw-admin:
>>
>>- *script* -> loading your lua script
>>- *script-package* -> loading supporting packages for your script - i.e.
>>luasocket in this case.
>>
>> For a basic setup, you'll need to have a few dependencies in your
>> containers:
>>
>>- cephadm container: requires luarocks (I've checked the code - it runs
>>a luarocks search command)
>>- radosgw container: requires luarocks, gcc, make,  m4, wget (wget just
>>in case).
>>
>> To achieve the above, I updated the container image for our running
>> system.
>> I needed to do this because I needed to redeploy the rados gateway
>> container to inject the lua script packages into the radosgw runtime
>> process. This will start with a fresh container based on the global config
>> *container_image* setting on your running system.
>>
>> For us this is currently captured in *quay.io/tsolo/ceph:v17.2.5-3* and it
>> included the following extra steps (including installing lua-devel from an
>> rpm because there is no CentOS package in yum):
>> yum install luarocks gcc make wget m4
>> rpm -i
>> https://rpmfind.net/linux/centos/8-stream/PowerTools/x86_64/os/Packages/lua-devel-5.3.4-12.el8.x86_64.rpm
>>
>> You will notice that I've included a compiler and compiler support in the
>> image. This is because luarocks on the radosgw needs to compile luasocket
>> (the package I want to install). This happens at start time when the
>> radosgw is restarted from ceph orch.
>>
>> In the cephadm container I still need to update our cephadm shell so I
>> need
>> to install luarocks by hand:
>> yum install luarocks
>>
>> Then set the updated image to use:
>> ceph config set global container_image quay.io/tsolo/ceph:v17.2.5-3
>>
>> I now create a file called *test.lua* in the cephadm container. This
>> contains the following lines to write to the log and then do a get request
>> to google. This is not practical in production, but it serves the purpose
>> of testing the infrastructure:
>>
>> RGWDebugLog("Tsolo start lua script")
>> local LuaSocket = require("socket")
>> client = LuaSocket.connect("google.com", 80)
>> client:send("GET / HTTP/1.0\r\nHost: google.com\r\n\r\n")
>> while true do
>>   s, status, partial = client:receive('*a')
>>   RGWDebugLog(s or partial)
>>   if status == "closed" then
>>     break
>>   end
>> end
>> client:close()
>> RGWDebugLog("Tsolo stop lua")
>>
>> Next I run:
>> radosgw-admin script-package add --package=luasocket --allow-compilation
>>
>> And then list the added package to make sure it is there:
>> radosgw-admin script-package list
>>
>> Note - at this point the radosgw has not been modified, it must first be
>> restarted.
>>
>> Then I put the *test.lua *script into the pre request context:
>> radosgw-admin script put --infile=test.lua --context=preRequest
>>
>> You also need to raise the debug log level on the running rados gateway:
>> ceph daemon
>> /var/run/ceph/ceph-client.rgw.xxx.xxx-cms1.x.x.xx.asok
>> config set debug_rgw 20
>>
>> Inside the radosgw container I apply my fix (as per previous email):
>> cp -ru /tmp/luarocks/client.rgw.xx.xxx--.pcoulb/lib64/*
>> /tmp/luarocks/client.rgw.xx.xxx--.pcoulb/lib/
>>
>> Outside on the host running the radosgw-admin container I follow the
>> journalctl for the radosgw container (to get the logs):
>> journalctl -fu 

[ceph-users] Re: For suggestions and best practices on expanding Ceph cluster and removing old nodes

2023-04-28 Thread Thomas Bennett
A pleasure. Hope it helps :)

Happy to share if you need any more information Zac.

Cheers,
Tom

On Wed, 26 Apr 2023 at 18:14, Dan van der Ster 
wrote:

> Thanks Tom, this is a very useful post!
> I've added our docs guy Zac in cc: IMHO this would be useful in a
> "Tips & Tricks" section of the docs.
>
> -- dan
>
> __
> Clyso GmbH | https://www.clyso.com
>
>
>
>
> On Wed, Apr 26, 2023 at 7:46 AM Thomas Bennett  wrote:
> >
> > I would second Joachim's suggestion - this is exactly what we're in the
> > process of doing for a client, i.e. migrating from Luminous to Quincy.
> > However, the below would also work if you're moving to Nautilus.
> >
> > The only catch with this plan would be if you plan to reuse any hardware -
> > i.e. the hosts running rados gateways and mons, etc. If you have enough
> > hardware to spare this is a good plan.
> >
> > My process:
> >
> >1. Stand a new Quincy cluster and tune the cluster.
> >2. Migrate user information, secrets and access keys (using
> >radosgw-admin in a script).
> >3. Using a combination of rclone and parallel to push data across from
> >the old cluster to the new cluster.
> >
> >
> > Below is a bash script I used to capture all the user information on the
> > old cluster; its generated output is then run on the new cluster to create
> > the users and keep their secrets and keys the same.
> >
> > #
> > for i in $(radosgw-admin user list | jq -r .[]); do
> > USER_INFO=$(radosgw-admin user info --uid=$i)
> > USER_ID=$(echo $USER_INFO | jq -r '.user_id')
> > DISPLAY_NAME=$(echo $USER_INFO | jq '.display_name')
> > EMAIL=$(echo $USER_INFO | jq '.email')
> > MAX_BUCKETS=$(echo $USER_INFO | jq -r '(.max_buckets|tostring)')
> > ACCESS=$(echo $USER_INFO | jq -r '.keys[].access_key')
> > SECRET=$(echo $USER_INFO | jq -r '.keys[].secret_key')
> > echo "radosgw-admin user create --uid=$USER_ID
> > --display-name=$DISPLAY_NAME --email=$EMAIL --max-buckets=$MAX_BUCKETS
> > --access-key=$ACCESS --secret-key=$SECRET" | tee -a
> > generated.radosgw-admin-user-create.sh
> > done
> > #
> >
> > Rclone is a really powerful tool! I lazily set up backends for each user
> > by appending the below to the for loop in the above script. The script
> > below is not pretty but it does the job:
> > #
> > echo "" >> generated.rclone.conf
> > echo [old-cluster-$USER_ID] >> generated.rclone.conf
> > echo type = s3 >> generated.rclone.conf
> > echo provider = Ceph >> generated.rclone.conf
> > echo env_auth = false >> generated.rclone.conf
> > echo access_key_id = $ACCESS >> generated.rclone.conf
> > echo secret_access_key = $SECRET >> generated.rclone.conf
> > echo endpoint = http://xx.xx.xx.xx: >> generated.rclone.conf
> > echo acl = public-read >> generated.rclone.conf
> > echo "" >> generated.rclone.conf
> > echo [new-cluster-$USER_ID] >> generated.rclone.conf
> > echo type = s3 >> generated.rclone.conf
> > echo provider = Ceph >> generated.rclone.conf
> > echo env_auth = false >> generated.rclone.conf
> > echo access_key_id = $ACCESS >> generated.rclone.conf
> > echo secret_access_key = $SECRET >> generated.rclone.conf
> > echo endpoint = http://yy.yy.yy.yy: >> generated.rclone.conf
> > echo acl = public-read >> generated.rclone.conf
> > echo "" >> generated.rclone.conf
> > #
> >
> > Copy the generated.rclone.conf to the node that is going to act as the
> > transfer node (I just used the new rados gateway node) into
> > ~/.config/rclone/rclone.conf
> >
> > Now if you run rclone lsd old-cluster-{user}: (it even tab completes!)
> > you'll get a list of all the buckets for that user.
> >
> > You could even simply rclone sync old-cluster-{user}: new-cluster-{user}:
> > and it should sync all buckets for a user.
> >
> > Catches:
> >
> >- Use the scripts carefully - our buckets for this one user are set
> >public-read - you might want to check each line of the script if you
> use it.
> >- Quincy bucket naming convention is stricter than Luminous. I've had
> to
> >catch some '_' and upper cases and fix them in the command line I
> generate
> >for copying each bucket.
> >- Using rclone will take a long time. Feeding a script into parallel sped
> >things up for me:
> >   - # parallel -j 10 < sync-script
> >- Watch out for lifecycling! Not sure how to handle this to make sure
> >it's captured correctly.
> >
> > Cheers,
> > Tom
> >
> > On Tue, 25 Apr 2023 at 22:36, Marc  wrote:
> >
> > >
> > > Maybe he is limited by the supported OS
> > >
> > >
> > > >
> > > > I would create a new cluster with Quincy and would migrate the data
> from
> > > > the old to the new cluster bucket by bucket. Nautilus is out of
> support
> > > > and
> > > > I would recommend at least to use a ceph version that is receiving
> > > > Backports.
> > > >
> > > > huxia...@horebdata.cn  schrieb am Di., 25.
> Apr.
> > > > 2023, 18:30:
> > > >
> > > > > Dear Ceph 

[ceph-users] Re: How to call cephfs-top

2023-04-28 Thread Jos Collin




On 28/04/23 13:51, E Taka wrote:

I'm using a dockerized Ceph 17.2.6 under Ubuntu 22.04.

Presumably I'm missing a very basic thing, since this seems a very simple
question: how can I call cephfs-top in my environment? It is not included
in the Docker Image which is accessed by "cephadm shell".

And calling the version found in the source code always fails with "[errno
13] RADOS permission denied", even when using "--cluster" with the correct
ID, "--conffile" and "--id".


To run from the source code, you need to set PYTHONPATH to 
ceph/build/lib/cython_modules/lib.3/
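
Something along these lines should then work from a source checkout (the
paths are examples, adjust them to your build tree):

export PYTHONPATH=/path/to/ceph/build/lib/cython_modules/lib.3/
/path/to/ceph/src/tools/cephfs/top/cephfs-top --cluster ceph --id fstop \
    --conffile /etc/ceph/ceph.conf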




The auth user client.fstop exists, and "ceph fs perf stats" runs.
What am I missing?

Thanks!
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Help needed to configure erasure coding LRC plugin

2023-04-28 Thread Michel Jouvin

Hi,

I think I found a possible cause of my PG down but still don't understand why.
As explained in a previous mail, I set up a 15-chunk EC pool (k=9,
m=6) but I have only 12 OSD servers in the cluster. To work around the
problem I defined the failure domain as 'osd', with the reasoning that as
I was using the LRC plugin, I had the guarantee that I could lose a site
without impact, and thus could afford to lose 1 OSD server. Am I wrong?
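
For context, the pool's erasure-code profile was created along these
lines (profile and pool names here are only illustrative; the relevant
parts are the k/m/l values, the datacenter locality and the osd failure
domain):

ceph osd erasure-code-profile set lrc_k9m6l5 plugin=lrc k=9 m=6 l=5 \
    crush-locality=datacenter crush-failure-domain=osd
ceph osd pool create test_lrc erasure lrc_k9m6l5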


Best regards,

Michel

Le 24/04/2023 à 13:24, Michel Jouvin a écrit :

Hi,

I'm still interested in getting feedback from those using the LRC
plugin about the right way to configure it... Last week I upgraded
from Pacific to Quincy (17.2.6) with cephadm, which does the
upgrade host by host, checking if an OSD is ok to stop before actually
upgrading it. I was surprised to see 1 or 2 PGs down at some points
in the upgrade (it happened not for all OSDs but for every
site/datacenter). Looking at the details with "ceph health detail", I
saw that for these PGs there were 3 OSDs down, but I was expecting the
pool to be resilient to 6 OSDs down (5 for R/W access), so I'm
wondering if there is something wrong in our pool configuration (k=9,
m=6, l=5).


Cheers,

Michel

Le 06/04/2023 à 08:51, Michel Jouvin a écrit :

Hi,

Is somebody using the LRC plugin?

I came to the conclusion that LRC k=9, m=3, l=4 is not the same as
jerasure k=9, m=6 in terms of protection against failures, and that I
should use k=9, m=6, l=5 to get a level of resilience >= jerasure
k=9, m=6. The example in the documentation (k=4, m=2, l=3) suggests
that this LRC configuration gives something better than jerasure k=4,
m=2, as it is resilient to 3 drive failures (but not 4, if I understood
properly). So how many drives can fail in the k=9, m=6, l=5
configuration, first without losing RW access and second without
losing data?


Another thing that I don't quite understand is that a pool created 
with this configuration (and failure domain=osd, locality=datacenter) 
has a min_size=3 (max_size=18 as expected). It seems wrong to me, I'd 
expected something ~10 (depending on answer to the previous question)...


Thanks in advance if somebody could provide some sort of 
authoritative answer on these 2 questions. Best regards,


Michel

Le 04/04/2023 à 15:53, Michel Jouvin a écrit :
Answering to myself, I found the reason for 2147483647: it's 
documented as a failure to find enough OSD (missing OSDs). And it is 
normal as I selected different hosts for the 15 OSDs but I have only 
12 hosts!


I'm still interested by an "expert" to confirm that LRC  k=9, m=3, 
l=4 configuration is equivalent, in terms of redundancy, to a 
jerasure configuration with k=9, m=6.


Michel

Le 04/04/2023 à 15:26, Michel Jouvin a écrit :

Hi,

As discussed in another thread (Crushmap rule for multi-datacenter 
erasure coding), I'm trying to create an EC pool spanning 3 
datacenters (datacenters are present in the crushmap), with the 
objective to be resilient to 1 DC down, at least keeping the 
readonly access to the pool and if possible the read-write access, 
and have a storage efficiency better than 3 replica (let say a 
storage overhead <= 2).


In the discussion, somebody mentioned LRC plugin as a possible 
jerasure alternative to implement this without tweaking the 
crushmap rule to implement the 2-step OSD allocation. I looked at 
the documentation 
(https://docs.ceph.com/en/latest/rados/operations/erasure-code-lrc/) 
but I have some questions if someone has experience/expertise with 
this LRC plugin.


I tried to create a rule for using 5 OSDs per datacenter (15 in 
total), with 3 (9 in total) being data chunks and others being 
coding chunks. For this, based on my understanding of the examples, I 
used k=9, m=3, l=4. Is it right? Is this configuration equivalent, 
in terms of redundancy, to a jerasure configuration with k=9, m=6?


The resulting rule, which looks correct to me, is:



{
    "rule_id": 6,
    "rule_name": "test_lrc_2",
    "ruleset": 6,
    "type": 3,
    "min_size": 3,
    "max_size": 15,
    "steps": [
    {
    "op": "set_chooseleaf_tries",
    "num": 5
    },
    {
    "op": "set_choose_tries",
    "num": 100
    },
    {
    "op": "take",
    "item": -4,
    "item_name": "default~hdd"
    },
    {
    "op": "choose_indep",
    "num": 3,
    "type": "datacenter"
    },
    {
    "op": "chooseleaf_indep",
    "num": 5,
    "type": "host"
    },
    {
    "op": "emit"
    }
    ]
}



Unfortunately, it doesn't work as expected: a pool created with 
this rule ends up with its PGs active+undersized, which is 
unexpected to me. Looking at `ceph health detail` output, I see 
for each PG something like:


pg 52.14 is stuck undersized for 27m, current state 
active+undersized, last acting 

[ceph-users] Re: cephfs - max snapshot limit?

2023-04-28 Thread Milind Changire
FYI, PR - https://github.com/ceph/ceph/pull/51278

On Fri, Apr 28, 2023 at 8:49 AM Milind Changire  wrote:

> There's a default/hard limit of 50 snaps that's maintained for any dir via
> the definition MAX_SNAPS_PER_PATH = 50 in the source file
> src/pybind/mgr/snap_schedule/fs/schedule_client.py.
> Every time the snapshot names are read for pruning, the last thing done is
> to check the length of the list and keep only MAX_SNAPS_PER_PATH snapshots;
> the rest are pruned.
>
> Jakob Haufe has pointed it out correctly.
>
>
>
> On Thu, Apr 27, 2023 at 12:38 PM Tobias Hachmer  wrote:
>
>> Hello,
>>
>> we are running a 3-node ceph cluster with version 17.2.6.
>>
>> For CephFS snapshots we have configured the following snap schedule with
>> retention:
>>
>> /PATH 2h 72h15d6m
>>
>> But we observed that max 50 snapshot are preserved. If a new snapshot is
>> created the oldest 51st is deleted.
>>
>> Is there a limit for maximum cephfs snapshots or maybe this is a bug?
>>
>> I have found the setting "mds_max_snaps_per_dir" which is 100 by default
>> but I think this is not related to my problem?
>>
>> Thanks,
>>
>> Tobias
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
>
> --
> Milind
>
>

-- 
Milind
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: import OSD after host OS reinstallation

2023-04-28 Thread Eugen Block
I found a small two-node cluster to test this on pacific, I can  
reproduce it. After reinstalling the host (VM) most of the other  
services are redeployed (mon, mgr, mds, crash), but not the OSDs. I  
will take a closer look.


Zitat von Tony Liu :


Tried [1] already, but got error.
Created no osd(s) on host ceph-4; already created?

The error is from [2] in deploy_osd_daemons_for_existing_osds().

Not sure what's missing.
Should OSD be removed, or removed with --replace, or untouched  
before host reinstallation?


[1]  
https://docs.ceph.com/en/pacific/cephadm/services/osd/#activate-existing-osds
[2]  
https://github.com/ceph/ceph/blob/0a5b3b373b8a5ba3081f1f110cec24d82299cac8/src/pybind/mgr/cephadm/services/osd.py#L196


Thanks!
Tony

From: Tony Liu 
Sent: April 27, 2023 10:20 PM
To: ceph-users@ceph.io; d...@ceph.io
Subject: [ceph-users] import OSD after host OS reinstallation

Hi,

The cluster is with Pacific and deployed by cephadm on container.
The case is to import OSDs after host OS reinstallation.
All OSDs are SSD who has DB/WAL and data together.
Did some research, but not able to find a working solution.
Wondering if anyone has experiences in this?
What needs to be done before host OS reinstallation and what's after?


Thanks!
Tony
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] How to call cephfs-top

2023-04-28 Thread E Taka
I'm using a dockerized Ceph 17.2.6 under Ubuntu 22.04.

Presumably I'm missing a very basic thing, since this seems a very simple
question: how can I call cephfs-top in my environment? It is not included
in the Docker Image which is accessed by "cephadm shell".

And calling the version found in the source code always fails with "[errno
13] RADOS permission denied", even when using "--cluster" with the correct
ID, "--conffile" and "--id".

The auth user client.fstop exists, and "ceph fs perf stats" runs.
What am I missing?

Thanks!
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephfs - max snapshot limit?

2023-04-28 Thread MARTEL Arnaud
Hi Venky,

> Also, at one point the kclient wasn't able to handle more than 400 snapshots 
> (per file system), but we have come a long way from that and that is not a 
> constraint right now.
Does this mean that there is no longer a limit on the number of snapshots per 
filesystem? And if not, do you know what the max number of snapshots per 
filesystem is now?

Cheers,
Arnaud

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: import OSD after host OS reinstallation

2023-04-28 Thread Eugen Block

Hi,


Not sure what's missing.
Should OSD be removed, or removed with --replace, or untouched  
before host reinstallation?


If you want to reuse existing OSDs, why would you remove them? That's
the whole point of reusing them after reinstallation.



Tried [1] already, but got error.
Created no osd(s) on host ceph-4; already created?


That is expected: you don't want to create new OSDs, just activate
the existing ones. Do you see cephadm trying to activate the OSDs?
Check /var/log/ceph/cephadm.log on the reinstalled host for more
details; maybe the mgr log has some information as well.


Regards,
Eugen

Zitat von Tony Liu :


Tried [1] already, but got error.
Created no osd(s) on host ceph-4; already created?

The error is from [2] in deploy_osd_daemons_for_existing_osds().

Not sure what's missing.
Should OSD be removed, or removed with --replace, or untouched  
before host reinstallation?


[1]  
https://docs.ceph.com/en/pacific/cephadm/services/osd/#activate-existing-osds
[2]  
https://github.com/ceph/ceph/blob/0a5b3b373b8a5ba3081f1f110cec24d82299cac8/src/pybind/mgr/cephadm/services/osd.py#L196


Thanks!
Tony

From: Tony Liu 
Sent: April 27, 2023 10:20 PM
To: ceph-users@ceph.io; d...@ceph.io
Subject: [ceph-users] import OSD after host OS reinstallation

Hi,

The cluster is with Pacific and deployed by cephadm on container.
The case is to import OSDs after host OS reinstallation.
All OSDs are SSD who has DB/WAL and data together.
Did some research, but not able to find a working solution.
Wondering if anyone has experiences in this?
What needs to be done before host OS reinstallation and what's after?


Thanks!
Tony
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephfs - max snapshot limit?

2023-04-28 Thread Venky Shankar
Hi Tobias,

On Thu, Apr 27, 2023 at 2:42 PM Tobias Hachmer  wrote:
>
> Hi sur5r,
>
> Am 4/27/23 um 10:33 schrieb Jakob Haufe:
>  > On Thu, 27 Apr 2023 09:07:10 +0200
>  > Tobias Hachmer  wrote:
>  >
>  >> But we observed that max 50 snapshot are preserved. If a new snapshot is
>  >> created the oldest 51st is deleted.
>  >>
>  >> Is there a limit for maximum cephfs snapshots or maybe this is a bug?
>  >
>  > I've been wondering the same thing for about 6 months now and found the
>  > reason just yesterday.
>  >
>  > The snap-schedule mgr module has a hard limit on how many snapshots it
>  > preserves, see [1]. It's even documented at [2] in section
>  > "Limitations" near the end of the page.
>  >
>  > The commit[3] implementing this does not only not explain the reason
>  > for the number at all, it doesn't even mention the fact it implements
>  > this.
>
> Thanks. I've read the documentation, but it's not clear enough. I thought
> "the retention list will be shortened to the newest 50 snapshots" will
> just truncate the list and not delete the snapshots, effectively.
>
> So as you stated the max. number of snapshots is currently a hard limit.
>
> Can anyone clarify the reasons for this? If there's a big reason to hard
> limit this it would be great to schedule snapshots more granular e.g.
> mo-fr every two hours between 8am-6pm.

This was done so that a particular directory does not eat up all the
snapshots - there is a per directory limit on the number of snapshots
controlled by mds_max_snaps_per_dir which defaults to 100 and
therefore MAX_SNAPS_PER_PATH was chosen to be much lower than that.
Also, at one point the kclient wasn't able to handle more than 400
snapshots (per file system), but we have come a long way from that and
that is not a constraint right now.
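
If the per-directory cap is what you are hitting, it can be inspected and
raised through the usual config mechanism, e.g. (the value is only an
example):

ceph config get mds mds_max_snaps_per_dir
ceph config set mds mds_max_snaps_per_dir 150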

>
>  > Given the limitation is per directory, I'm currently trying this:
>  >
>  > / 1d 30d
>  > /foo 1h 48h
>  > /bar 1h 48h
>  >
>  > I forgot to activate the new schedules yesterday so I can't say whether
>  > it works as expected yet.
>
> Please let me know if this works.
>
> Thanks,
> Tobias
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io



-- 
Cheers,
Venky
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io