[ceph-users] Multi-datacenter filesystem

2022-05-13 Thread Daniel Persson
Hi Team

We have grown out of our current solution, and we plan to migrate to
multiple data centers.

Our setup is a mix of radosgw data and filesystem data, but we have many
legacy systems that require a filesystem at the moment, so we will probably
keep running CephFS for some of our data for at least 3-5 years.

At the moment, we have about 0.5 Petabytes of data, so it is a small
cluster. Still, we want more redundancy, so we will partner with a company
with multiple data centers within the city and have redundant fiber between
the locations.

Our current data center has multiple 10 Gbit connections, so communication
between the new locations and our existing data center will be slower than
within a single site. Still, I hope the bandwidth will suffice for a
multi-datacenter setup.

Currently, I plan to assign OSDs to different sites and racks so we can
configure a good replication rule to keep a copy of the data in each data
center.
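
Roughly, this is the CRUSH layout I have in mind, written as a sketch with
hypothetical bucket, host, rule, and pool names (dc1, dc2, host1,
replicated_multi_dc, cephfs_data) that would need to be adapted to the real
setup:

ceph osd crush add-bucket dc1 datacenter
ceph osd crush add-bucket dc2 datacenter
ceph osd crush move dc1 root=default
ceph osd crush move dc2 root=default
ceph osd crush move host1 datacenter=dc1
# spread replicas across the datacenter buckets
ceph osd crush rule create-replicated replicated_multi_dc default datacenter
ceph osd pool set cephfs_data crush_rule replicated_multi_dc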

My question is how to handle the monitor setup for good redundancy. For
example, should I set up two new monitors in each new location and keep one
in our existing data center, so I get five monitors in total? Should I keep
it at three monitors, one per data center? Or should I go for nine monitors,
three in each data center?

Should I use a stretch mode setup to define the location of each monitor?
Can you do the same for MDSes? Do I need to configure the mounting of the
filesystem differently to signal which data center the client is located in?
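
From what I read in the stretch mode documentation, the monitor locations
would be declared roughly like this (monitor names, site names, and the rule
name are placeholders, and stretch mode itself seems to target exactly two
data sites plus a tiebreaker):

ceph mon set_location a datacenter=site1
ceph mon set_location b datacenter=site1
ceph mon set_location c datacenter=site2
ceph mon set_location d datacenter=site2
ceph mon set_location e datacenter=site3
# "e" acts as the tiebreaker monitor; stretch_rule is a CRUSH rule created beforehand
ceph mon enable_stretch_mode e stretch_rule datacenter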

Does anyone know of a partner we could consult on these issues?

Best regards
Daniel
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Changing PG size of cache pool

2022-03-26 Thread Daniel Persson
Hi Eugen.

I've tried. The system says it's not recommended but that I may force it.
Forcing something with the risk of losing data is not something I'm going
to do.

Best regards
Daniel

On Sat, Mar 26, 2022 at 8:55 PM Eugen Block  wrote:

> Hi,
>
> just because the autoscaler doesn’t increase the pg_num doesn’t mean
> you can’t increase it manually. Have you tried that?
>
> Zitat von Daniel Persson :
>
> > Hi Team.
> >
> > We are currently in the process of changing the size of our cache pool.
> > Currently it's set to 32 PGs and distributed weirdly on our OSDs. The
> > system has tried automatically to scale it up to 256 PGs without
> succeeding
> > and I read that cache pools are not automatically scaled so we are in the
> > process of scaling. Our plan is to remove the old one and create a new
> one
> > with more PGs.
> >
> > I've run the pool in readproxy now for a week so most of the objects
> should
> > be available in cold storage but I want to be totally sure so we don't
> lose
> > any data.
> >
> > I read in the documentation that you could remove the overlay and that
> > would redirect clients to cold storage.
> >
> > Is a preferred strategy to remove the overlay and then run
> > cache-flush-evict-all to clear it and then replace or should I be fine
> just
> > to remove overlay and tiering and replace it with a new pool?
> >
> > Currently we have configured it to have a write caching of 0.5 hours and
> > read cache of 2 days.
> >
> > --
> > ceph osd pool set cephfs_data_cache cache_min_flush_age 1800
> > ceph osd pool set cephfs_data_cache cache_min_evict_age 172800
> > 
> >
> > The cache is still 25Tb in size and would be sad to lose if we have
> > unwritten data.
> >
> > Best regards
> > Daniel
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Changing PG size of cache pool

2022-03-26 Thread Daniel Persson
Hi Team.

We are currently in the process of changing the size of our cache pool. It's
set to 32 PGs and distributed unevenly across our OSDs. The system has tried
to automatically scale it up to 256 PGs without succeeding, and I read that
cache pools are not autoscaled, so we are handling the scaling ourselves. Our
plan is to remove the old pool and create a new one with more PGs.

I've run the pool in readproxy mode for a week now, so most of the objects
should be available in cold storage, but I want to be totally sure so we
don't lose any data.

I read in the documentation that you could remove the overlay and that
would redirect clients to cold storage.

Is the preferred strategy to remove the overlay, run cache-flush-evict-all
to clear the cache, and then replace it, or should I be fine just removing
the overlay and the tiering and replacing it with a new pool?
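
For reference, this is roughly the sequence I understand from the
cache-tiering documentation, assuming the cold pool is named cephfs_data (a
placeholder; the cache pool name is from our setup), and given that the pool
is already in readproxy mode. I have not run this yet:

rados -p cephfs_data_cache cache-flush-evict-all
ceph osd tier remove-overlay cephfs_data
ceph osd tier remove cephfs_data cephfs_data_cache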

Currently, we have configured it with a minimum flush age of 0.5 hours
(write side) and a minimum evict age of 2 days (read side):

--
ceph osd pool set cephfs_data_cache cache_min_flush_age 1800
ceph osd pool set cephfs_data_cache cache_min_evict_age 172800


The cache is still 25 TB in size, and it would be sad to lose it if it
contains unwritten data.

Best regards
Daniel
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Question about cephadm, WAL and DB devices.

2022-01-04 Thread Daniel Persson
Hi.

I'm currently trying out cephadm, and I got into a state that was a bit
unexpected for me.

I created three host machines in VirtualBox to try out cephadm. All the
drives I created for OSDs are 20 GB in size for simplicity.

I bootstrapped one host with one drive and then added the other two hosts.
Cephadm then directly added all available drives, so I had three 20 GB OSDs.

Watching the GUI, it said I could add WAL and DB devices. I never figured
out how to do that in the GUI, so I tried to do it manually:

ceph osd destroy osd.2 --force
ceph-volume lvm zap --destroy /dev/sdb
ceph auth get client.bootstrap-osd > /var/lib/ceph/bootstrap-osd/ceph.keyring
ceph-volume lvm prepare --data /dev/sdb --block.db /dev/sdc --block.wal /dev/sdd
ceph-volume lvm activate 2 d4a590eb-c0f6-47bc-a5fa-221bf8541e09

It worked, and I got the new OSD registered, but the strange thing was that
it was 40 GB and half full. Is this expected?
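
In case it helps anyone reproduce this, I looked at the reported sizes with
the standard tooling (nothing cephadm-specific, and the exact metadata field
names may differ between releases):

ceph osd df tree        # per-OSD size and utilization as the cluster sees it
ceph osd metadata 2     # look for the bluestore/bluefs size fields of osd.2
ceph-volume lvm list    # shows which LVs back block, block.db and block.wal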

Best regards
Daniel
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Cache tiers hit_set values

2021-10-06 Thread Daniel Persson
Hi everyone.

I added a lot more storage to our cluster, and we now have a lot of slower
hard drives that could hold archival data. So I thought setting up a cache
tier on the fast drives would be a good idea.

We want to retain data for about a week in the cache pool as the data could
be interesting for at least a week, and then it will probably rarely be
accessed.

You can see my config below.

Would you change any of these values?
Do you have any better suggestions?

This is my interpretation:
I keep the data for 30 minutes before writing it back to the slower drives.
We retain the data for seven days before we evict it from the cache.
All accessed objects will be promoted to the cache.

Now we only have the hit_set values left, and I'm not sure about the right
strategy here. I've read a document from SUSE where they explain it:
https://documentation.suse.com/ses/6/html/ses-all/cha-ceph-tiered.html#ses-tiered-hitset

My understanding from that document is that I should change hit_set_count to
42, so I have hit sets covering the entire duration I want to keep data in
my cache pool. Is this correct?

We have plenty of disk overhead, and the hit sets should be pretty small in
this context, so I don't see a reason not to keep 42 of them.

-
ceph osd pool set cephfs_data_cache cache_min_flush_age 1800
ceph osd pool set cephfs_data_cache cache_min_evict_age 604800

ceph osd pool set cephfs_data_cache hit_set_count 12
ceph osd pool set cephfs_data_cache hit_set_period 14400
ceph osd pool set cephfs_data_cache hit_set_fpp 0.01
ceph osd pool set cephfs_data_cache min_write_recency_for_promote 0
ceph osd pool set cephfs_data_cache min_read_recency_for_promote 0
-
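
For completeness, the arithmetic behind the 42: seven days of retention is
604800 seconds, and with hit_set_period at 14400 seconds (4 hours) that gives
604800 / 14400 = 42 hit sets. So the change I'm considering is simply this
(my reading of the docs, not tested yet):

ceph osd pool set cephfs_data_cache hit_set_count 42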

I appreciate any help you can provide.

Best regards

Daniel
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS replay questions

2021-10-06 Thread Daniel Persson
Hi Brian

I'm not sure if it applies to your case, and I'm not an expert. However, we
have been running our solution for about a year now, and we have one of our
MDS daemons in standby-replay.

Sadly, we have found a bug with excessive memory usage, and when we needed
to replay, it took up to a minute even with standby-replay. Our solution is
an online site, so a minute feels like ages.

I have also read that you are now allowed to run multiple MDS daemons on a
file system. It will, of course, add memory usage, and it could also reduce
performance in some cases, but the feature should be fully supported in the
Pacific release.

I'm curious what the correct solution is here.

Best regards
Daniel

On Wed, Oct 6, 2021 at 12:07 AM Brian Kim  wrote:

> Dear ceph-users,
>
> We have a ceph cluster with 3 MDS's and recently had to replay our cache
> which is taking an extremely long time to complete. Is there some way to
> speed up this process as well as apply some checkpoint so it doesn't have
> to start all the way from the beginning?
>
> --
> Best Wishes,
> Brian Kim
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MacOS Ceph Filesystem client

2021-09-28 Thread Daniel Persson
Hi Duncan

Great, thank you for the tip. I never open the graphical user interface, as
we use this machine as a server. When logging in over SSH, you sometimes
miss popups on the desktop.

Current status:

Mac Mini Intel Catalina - Connected and working fine.
Mac Mini M1 BigSur - Can't compile brew cask, no popups for extra
permissions in the GUI.

I really only need one of the machines connected at the moment, because the
job I want to run on Apple hardware works similarly on both.

But if there is any need or wish from the community to test the software on
M1 hardware, I could test other things.

Best regards
Daniel



On Tue, Sep 28, 2021 at 12:00 AM Duncan Bellamy 
wrote:

> Hi Daniel,
> Is it a Mac firewall or security access issue for the machine that was
> able to build?
>
> Regards,
>
> Duncan
>
> On 27 Sep 2021, at 22:43, Daniel Persson  wrote:
>
> 
> Hi Duncan.
>
> I've tried with a couple of different libraries.
>
> brew install osxfuse
> brew install macfuse
> brew install fuse
>
> But none of them helped with installation or connection for the machine
> that was able to build the client.
>
> Thank you for helping.
>
> Best regards
> Daniel
>
> On Mon, Sep 27, 2021 at 11:31 PM Duncan Bellamy 
> wrote:
>
>> Hi Daniel,
>> Have you got macFuse installed? https://osxfuse.github.io/
>>
>> Or maybe there is a brew version, if you search for fuse on home brew
>> this is the result:
>>
>> https://formulae.brew.sh/cask/fuse#default
>>
>> Regards,
>> Duncan
>>
>>
>> On 27 Sep 2021, at 22:24, Daniel Persson  wrote:
>>
>> Hi Duncan.
>>
>> Great suggestion. Thank you for the link. I've run it on both the M1
>> BigSur
>> mac and it did not compile because it didn't have a FUSE:FUSE target
>> whatever that meant.
>>
>> ===
>> Last 15 lines from
>> /Users/danielp/Library/Logs/Homebrew/ceph-client/03.cmake:
>>
>> CMake Error at src/rbd_fuse/CMakeLists.txt:1 (add_executable):
>>  Target "rbd-fuse" links to target "FUSE::FUSE" but the target was not
>>  found.  Perhaps a find_package() call is missing for an IMPORTED target,
>> or
>>  an ALIAS target is missing?
>> ===
>>
>> On the Catalina Intel-mac I was able to build it and ran ceph-fuse
>> multiple
>> times and it seems that it connects because the client is added to the
>> client list in my Cluster but the drive is never mapped to the directory I
>> try to map it too. So it seems not to work there either sadly.
>>
>> ===
>> admin$ sudo ceph-fuse -r /mydirectory -m 192.168.6.6:6789
>> /Users/admin/cephfs
>> &
>>
>> ceph-fuse[60646]: starting ceph client
>> 2021-09-27T23:16:20.498+0200 113a7ddc0 -1 init, newargv = 0x7f8ad6449d90
>> newargc=5
>> =======
>>
>> Does anyone have any ideas on how I could proceed?
>>
>> Best regards
>> Daniel
>>
>> On Mon, Sep 27, 2021 at 8:48 PM Duncan Bellamy 
>> wrote:
>>
>> Hi,
>>
>> It’s in brew:
>>
>>
>>
>>
>> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/SEEM7QGCPAWDPIIW3P4ZYU432N6BQJGT/
>>
>>
>> Regards,
>>
>> Duncan
>>
>>
>> On 27 Sep 2021, at 17:46, Daniel Persson  wrote:
>>
>>
>> Hi
>>
>>
>> I'm running some tests on a couple of Mac Mini machines. One of them is an
>>
>> M1 with BigSur, and the other one is a regular Intel Mac with Catalina.
>>
>>
>> I've tried to build Ceph Nautilus, Octopus, and Pacific multiple times
>> with
>>
>> different parameters and added many dependencies to the systems but have
>>
>> not been able to build the software.
>>
>>
>> Has anyone tried to connect a Mac to your Ceph filesystem before? Do I
>> need
>>
>> to build the packages on the machine, or is there a more straightforward
>>
>> way?
>>
>>
>> Thank you for reading this.
>>
>>
>> Best regards
>>
>> Daniel
>>
>> ___
>>
>> ceph-users mailing list -- ceph-users@ceph.io
>>
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>>
>>
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MacOS Ceph Filesystem client

2021-09-27 Thread Daniel Persson
Hi Duncan.

I've tried with a couple of different libraries.

brew install osxfuse
brew install macfuse
brew install fuse

But none of them helped with installation or connection for the machine
that was able to build the client.

Thank you for helping.

Best regards
Daniel

On Mon, Sep 27, 2021 at 11:31 PM Duncan Bellamy 
wrote:

> Hi Daniel,
> Have you got macFuse installed? https://osxfuse.github.io/
>
> Or maybe there is a brew version, if you search for fuse on home brew this
> is the result:
>
> https://formulae.brew.sh/cask/fuse#default
>
> Regards,
> Duncan
>
>
> On 27 Sep 2021, at 22:24, Daniel Persson  wrote:
>
> Hi Duncan.
>
> Great suggestion. Thank you for the link. I've run it on both the M1 BigSur
> mac and it did not compile because it didn't have a FUSE:FUSE target
> whatever that meant.
>
> ===
> Last 15 lines from
> /Users/danielp/Library/Logs/Homebrew/ceph-client/03.cmake:
>
> CMake Error at src/rbd_fuse/CMakeLists.txt:1 (add_executable):
>  Target "rbd-fuse" links to target "FUSE::FUSE" but the target was not
>  found.  Perhaps a find_package() call is missing for an IMPORTED target,
> or
>  an ALIAS target is missing?
> ===
>
> On the Catalina Intel-mac I was able to build it and ran ceph-fuse multiple
> times and it seems that it connects because the client is added to the
> client list in my Cluster but the drive is never mapped to the directory I
> try to map it too. So it seems not to work there either sadly.
>
> ===
> admin$ sudo ceph-fuse -r /mydirectory -m 192.168.6.6:6789
> /Users/admin/cephfs
> &
>
> ceph-fuse[60646]: starting ceph client
> 2021-09-27T23:16:20.498+0200 113a7ddc0 -1 init, newargv = 0x7f8ad6449d90
> newargc=5
> ===
>
> Does anyone have any ideas on how I could proceed?
>
> Best regards
> Daniel
>
> On Mon, Sep 27, 2021 at 8:48 PM Duncan Bellamy 
> wrote:
>
> Hi,
>
> It’s in brew:
>
>
>
>
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/SEEM7QGCPAWDPIIW3P4ZYU432N6BQJGT/
>
>
> Regards,
>
> Duncan
>
>
> On 27 Sep 2021, at 17:46, Daniel Persson  wrote:
>
>
> Hi
>
>
> I'm running some tests on a couple of Mac Mini machines. One of them is an
>
> M1 with BigSur, and the other one is a regular Intel Mac with Catalina.
>
>
> I've tried to build Ceph Nautilus, Octopus, and Pacific multiple times with
>
> different parameters and added many dependencies to the systems but have
>
> not been able to build the software.
>
>
> Has anyone tried to connect a Mac to your Ceph filesystem before? Do I need
>
> to build the packages on the machine, or is there a more straightforward
>
> way?
>
>
> Thank you for reading this.
>
>
> Best regards
>
> Daniel
>
> ___
>
> ceph-users mailing list -- ceph-users@ceph.io
>
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MacOS Ceph Filesystem client

2021-09-27 Thread Daniel Persson
Hi Duncan.

Great suggestion, and thank you for the link. I've tried it on both machines.
On the M1 BigSur Mac, it did not compile because a FUSE::FUSE target was
missing, whatever that means.

===
Last 15 lines from
/Users/danielp/Library/Logs/Homebrew/ceph-client/03.cmake:

CMake Error at src/rbd_fuse/CMakeLists.txt:1 (add_executable):
  Target "rbd-fuse" links to target "FUSE::FUSE" but the target was not
  found.  Perhaps a find_package() call is missing for an IMPORTED target,
or
  an ALIAS target is missing?
===

On the Catalina Intel Mac, I was able to build it and ran ceph-fuse multiple
times. It seems to connect, because the client is added to the client list
in my cluster, but the drive is never mapped to the directory I try to map
it to. So sadly, it does not seem to work there either.

===
admin$ sudo ceph-fuse -r /mydirectory -m 192.168.6.6:6789 /Users/admin/cephfs
&

ceph-fuse[60646]: starting ceph client
2021-09-27T23:16:20.498+0200 113a7ddc0 -1 init, newargv = 0x7f8ad6449d90
newargc=5
===

Does anyone have any ideas on how I could proceed?
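
One thing I plan to try next is keeping ceph-fuse in the foreground with
debug output to see why the mount point never appears, along these lines
(same path and monitor address as above; as far as I understand, -d runs it
in the foreground with FUSE debugging enabled):

sudo ceph-fuse -d -r /mydirectory -m 192.168.6.6:6789 /Users/admin/cephfs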

Best regards
Daniel

On Mon, Sep 27, 2021 at 8:48 PM Duncan Bellamy 
wrote:

> Hi,
> It’s in brew:
>
>
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/SEEM7QGCPAWDPIIW3P4ZYU432N6BQJGT/
>
> Regards,
> Duncan
>
> On 27 Sep 2021, at 17:46, Daniel Persson  wrote:
>
> Hi
>
> I'm running some tests on a couple of Mac Mini machines. One of them is an
> M1 with BigSur, and the other one is a regular Intel Mac with Catalina.
>
> I've tried to build Ceph Nautilus, Octopus, and Pacific multiple times with
> different parameters and added many dependencies to the systems but have
> not been able to build the software.
>
> Has anyone tried to connect a Mac to your Ceph filesystem before? Do I need
> to build the packages on the machine, or is there a more straightforward
> way?
>
> Thank you for reading this.
>
> Best regards
> Daniel
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] MacOS Ceph Filesystem client

2021-09-27 Thread Daniel Persson
Hi

I'm running some tests on a couple of Mac Mini machines. One of them is an
M1 with BigSur, and the other one is a regular Intel Mac with Catalina.

I've tried to build Ceph Nautilus, Octopus, and Pacific multiple times with
different parameters and added many dependencies to the systems but have
not been able to build the software.

Has anyone tried to connect a Mac to your Ceph filesystem before? Do I need
to build the packages on the machine, or is there a more straightforward
way?

Thank you for reading this.

Best regards
Daniel
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Cache tiering adding a storage tier

2021-09-18 Thread Daniel Persson
Hi Everyone.

I'm "new" to Ceph; I've only been administering a cluster for about a year
now, so there is a lot more for me to learn about the subject.

The latest concept I've been looking into is cache tiering. I added it to my
home cluster without a problem, didn't see any degradation in performance,
and it just seemed to work.

The documentation is pretty straightforward to follow. First, add a hot
pool and then enable the tier and overlay for that hot pool. Then configure
a bunch of values, and it will do its thing.

In our production environment, we are running the hot pool on a lot of NVMe
disks with low latency. But as we work with a lot of archival data, it would
be nice to move that data over to spinning disks for long-term archiving.

How do I add cold storage to a running cluster without disrupting the
current operation?
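
To make the question concrete: the plain commands from the cache-tiering
documentation look roughly like the sketch below, with hypothetical pool
names (cephfs_data_cold for the new slow pool, cephfs_data_hot for the
existing fast pool). What I'm unsure about is whether this can be applied to
a pool that is already serving clients without disruption.

ceph osd tier add cephfs_data_cold cephfs_data_hot
ceph osd tier cache-mode cephfs_data_hot writeback
ceph osd tier set-overlay cephfs_data_cold cephfs_data_hot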

Thank you for reading my email.

Best regards
Daniel
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Fwd: Module 'devicehealth' has failed

2021-09-13 Thread Daniel Persson
Hi David

It's hard to say with so little information what could be wrong, and I have
not seen any response yet, so I thought I could give you something that
might help you.

I've done a video about setting up the Ceph, Grafana, and Prometheus
triangle from scratch, the components responsible for hardware metrics and
monitoring. Maybe that could help?
https://youtu.be/c8R64LF3JjU

And I've also done a separate video about disk prediction and smart data in
a Ceph cluster that could give you some insights.
https://youtu.be/KFBuqTyxalM
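
Beyond the videos, a few generic things that might be worth checking (just
guesses without more of your mgr log):

ceph health detail        # full text of the devicehealth failure
ceph crash ls             # see if the module failure left a crash report
ceph mgr fail <active-mgr>   # fail over to a standby mgr, which often clears a failed module
ceph device ls            # verify that device monitoring itself still responds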

I hope this helps.

Best regards
Daniel

On Sun, Sep 5, 2021 at 7:26 AM David Yang  wrote:

> hi, buddy
>
> I have a ceph file system cluster, using ceph version 15.2.14.
>
> But the current status of the cluster is HEALTH_ERR.
>
> health: HEALTH_ERR
> Module 'devicehealth' has failed:
>
> The content in the mgr log is as follows:
>
> 2021-09-05T13:20:32.922+0800 7f2b8621b700 0 log_channel(audit) log [DBG]:
> from='client.2109753 -'entity='client.admin' cmd=[{"prefix": "fs status",
> "target": ["mon-mgr", ""]}]: dispatch
> 2021-09-05T13:20:32.922+0800 7f2b86a1c700 0 [status ERROR root]
> handle_command
>
>
> How to fix this error, please help, thank you
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [Ceph Dashboard] Alert configuration.

2021-08-23 Thread Daniel Persson
Hi Lokendra

There are a lot of ways to see the status of your cluster. The main one is
to watch the dashboard alerts, which show the most pressing matters to
handle. You can also follow the log that the manager keeps as notifications.
I usually use "ceph health detail" to get the information in the terminal;
it gives you full information about the cluster and sometimes even handy
tips on what to look into.
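
For example, these are the commands I keep coming back to in the terminal
(all standard CLI, no AlertManager involved):

ceph -s              # overall cluster status
ceph health detail   # expanded explanation of every active warning
ceph -w              # follow the cluster log live
ceph crash ls        # recent daemon crashes recorded by the crash module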

AlertManager is good if you want alerts sent to you so you can keep track of
the cluster on the go, but it is not necessary for daily use.

I don't know if I've missed something or misunderstood your question, but I
hope this helps.

Best regards
Daniel

On Tue, Aug 24, 2021 at 6:17 AM Lokendra Rathour 
wrote:

> Hello Everyone,
> We have deployed Ceph ceph-ansible (Pacific Release).
> Query:
> Is it possible (if yes then what is the way), to view/verify the alerts
> (health/System both) directly without AlertManager?
> Or
> Can Ceph Dashboard only Only can help us see the Alerts in the Ceph
> Cluster(Health/System)?
>
>
> Please advise.
> --
> ~ Lokendra
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-08-18 Thread Daniel Persson
Hi Everyone.

I thought I'd put in my five cents, as I believe this is an interesting
topic. I'm also a newbie, having only run a cluster for about a year. I did
some research before that and have also created a couple of videos on the
topic, one of them about upgrading a cluster using cephadm.

- ABOUT MY SETUP 
Currently, I manage a cluster with ten hosts and 29 OSDs, which is not that
large but critical for our operations, as it is the backbone of our web
application. We made the move in a hurry when we realized that the disk
drive in the machine where the application was hosted was too slow to handle
all the requests, and we also needed more compute power and distribution to
ensure fault tolerance. This led to us moving all data while buying hardware
and migrating clusters in one marathon upgrade.

After that experience, I was delighted with the stability that a Ceph
solution gave us, and it has been working quite well since then. To do more
research and prepare for the future, my company bought a couple of machines
for my home, so now I have a small cluster with four hosts / four OSDs at
home to store my backup of YouTube video material and to try new
technologies.
- ABOUT MY SETUP END 

Now back to the experience of using cephadm. I installed a test cluster
locally with nine hosts in a VirtualBox environment, running Debian. Setting
up cephadm was pretty straightforward, and doing the upgrade was also
"easy". But I was not fond of it at all, as I felt that I lost control. I
had set up a couple of machines with different hardware profiles to run
various services on each, and when I put the hosts into the cluster and
deployed services, cephadm chose to put things on machines not well suited
to handle that kind of work. Furthermore, when running the upgrade, you got
one line of text on the current progress, so I felt I was not in control of
what happened.

Currently, I run with the built packages for Debian and use the same
operating system and packages on all machines, and upgrading the cluster is
as easy as running apt update and apt upgrade. After a reboot, that machine
is done. By doing that in the correct order, you have complete control, and
if anything goes wrong along the way, you can handle it machine by machine.
I understand that this works well for a small cluster with fewer than ten
hosts, as in my case, and might not be feasible if you have a server park
with 1000 servers. But then again, controlling and managing your cluster is
part of the work, so perhaps you don't want an automatic solution there
either.
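
Roughly, my per-host routine looks like the sketch below. It is just my
habit, not an official procedure; the noout flag is there so the cluster
does not start rebalancing while a host reboots:

ceph osd set noout               # before touching the first host
apt update && apt upgrade        # pull in the new Ceph packages on this host
systemctl restart ceph.target    # or simply reboot the machine
ceph -s                          # wait for HEALTH_OK / all PGs active+clean before the next host
ceph osd unset noout             # after the last host is done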

Another minor issue is that Docker adds complexity and takes some resources
that you might want for the cluster instead. What comes to mind is a
solution running hundreds of OSD hosts on Raspberry Pi blades in a rack over
PoE+. I also saw a solution that mounts 16 Pis in 1U with M.2 ports for a
large SSD, which could be a fun setup for a cluster.

Best regards
Daniel

On Tue, Aug 17, 2021 at 8:09 PM Andrew Walker-Brown <
andrew_jbr...@hotmail.com> wrote:

> Hi,
>
> I’m coming at this from the position of a newbie to Ceph.  I had some
> experience of it as part of Proxmox, but not as a standalone solution.
>
> I really don’t care whether Ceph is contained or not, I don’t have the
> depth of knowledge or experience to argue it either way.  I can see that
> containers may well offer a more consistent deployment scenario with fewer
> dependencies on the external host OS.  Upgrades/patches to the host OS may
> not impact the container deployment etc., with the two systems not held in
> any lock-step.
>
> The challenge for me hasn’t been Ceph its self. Ceph has worked
> brilliantly, I have a fully resilient architecture split between two active
> datacentres and my storage can survive up-to 50% node/OSD hardware failure.
>
> No, the challenge has been documentation.  I’ve run off down multiple
> rabbit holes trying to find solutions to problems or just background
> information.  I’ve been tripped up by not spotting the Ceph documentation
> was “v: latest” rather than “v: octopus”...so features didn’t exist or
> commands were structured slightly differently.
>
> Also just not being obvious whether the bit of documentation I was looking
> at related to a native Ceph package deployment or a container one.  Plus
> you get the Ceph/Suse/Redhat/Proxmox/IBM etc..etc.. flavour answer
> depending on which Google link you click.  Yes I know, its part of the joy
> of working with open source... but still, not what you need when a chunk of
> infrastructure has failed and you don’t know why.
>
> I’m truly in awe of what the Ceph community has produced and is planning
> for the future, so don’t think I’m any kind of hater.
>
> My biggest request is for the documentation to take on some
> restructuring.  Keep the different deployment methods documented
> separately, yes an intro covering the various options and recommendations
> is great, but 

[ceph-users] Re: BUG #51821 - client is using insecure global_id reclaim

2021-08-17 Thread Daniel Persson
Hi again.

I've now solved my issue with help from people in this group. Thank you for
helping out. I thought the process was a bit complicated, so I created a
short video describing it.

https://youtu.be/Ds4Wvvo79-M

I hope this helps someone else, and again thank you.
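
For anyone finding this in the archives: the gist, as I understand it
(double-check against the CVE-2021-20288 notes), was to first upgrade or
rebuild every client that still uses the insecure reclaim, and only then
turn it off on the monitors:

ceph health detail   # lists the clients still using insecure global_id reclaim
# ...upgrade or rebuild those clients...
ceph config set mon auth_allow_insecure_global_id_reclaim false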

Best regards
Daniel


On Mon, Aug 9, 2021 at 5:43 PM Ilya Dryomov  wrote:

> On Mon, Aug 9, 2021 at 5:14 PM Robert W. Eckert 
> wrote:
> >
> > I have had the same issue with the windows client.
> > I had to issue
> > ceph config set mon auth_expose_insecure_global_id_reclaim false
> > Which allows the other clients to connect.
> > I think you need to restart the monitors as well, because the first few
> times I tried this, I still couldn't connect.
>
> For archive's sake, I'd like to mention that disabling
> auth_expose_insecure_global_id_reclaim isn't right and it wasn't
> intended for this.  Enabling auth_allow_insecure_global_id_reclaim
> should be enough to allow all (however old) clients to connect.
> The fact that it wasn't enough for the available Windows build
> suggests that there is some subtle breakage in it because all "expose"
> does is it forces the client to connect twice instead of just once.
> It doesn't actually refuse old unpatched clients.
>
> (The breakage isn't surprising given that the available build is
> more or less a random development snapshot with some pending at the
> time Windows-specific patches applied.  I'll try to escalate issue
> and get the linked MSI bundle updated.)
>
> Thanks,
>
> Ilya
>
> >
> > -Original Message-
> > From: Richard Bade 
> > Sent: Sunday, August 8, 2021 8:27 PM
> > To: Daniel Persson 
> > Cc: Ceph Users 
> > Subject: [ceph-users] Re: BUG #51821 - client is using insecure
> global_id reclaim
> >
> > Hi Daniel,
> > I had a similar issue last week after upgrading my test cluster from
> > 14.2.13 to 14.2.22 which included this fix for Global ID reclaim in .20.
> My issue was a rados gw that I was re-deploying on the latest version. The
> problem seemed to be related with cephx authentication.
> > It kept displaying the error message you have and the service wouldn't
> start.
> > I ended up stopping and removing the old rgw service, deleting all the
> keys in /etc/ceph/ and all data in /var/lib/ceph/radosgw/ and re-deploying
> the radosgw. This used the new rgw bootstrap keys and new key for this
> radosgw.
> > So, I would suggest you double and triple check which keys your clients
> are using and that cephx is enabled correctly on your cluster.
> > Check your admin key in /etc/ceph as well, as that's what's being used
> for ceph status.
> >
> > Regards,
> > Rich
> >
> > On Sun, 8 Aug 2021 at 05:01, Daniel Persson 
> wrote:
> > >
> > > Hi everyone.
> > >
> > > I suggested asking for help here instead of in the bug tracker so that
> > > I will try it.
> > >
> > > https://tracker.ceph.com/issues/51821?next_issue_id=51820_issue_i
> > > d=51824
> > >
> > > I have a problem that I can't seem to figure out how to resolve the
> issue.
> > >
> > > AUTH_INSECURE_GLOBAL_ID_RECLAIM: client is using insecure global_id
> > > reclaim
> > > AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED: mons are allowing insecure
> > > global_id reclaim
> > >
> > >
> > > Both of these have to do with reclaiming ID and securing that no
> > > client could steal or reuse another client's ID. I understand the
> > > reason for this and want to resolve the issue.
> > >
> > > Currently, I have three different clients.
> > >
> > > * One Windows client using the latest Ceph-Dokan build. (ceph version
> > > 15.0.0-22274-g5656003758 (5656003758614f8fd2a8c49c2e7d4f5cd637b0ea)
> > > pacific
> > > (rc))
> > > * One Linux Debian build using the built packages for that kernel. (
> > > 4.19.0-17-amd64)
> > > * And one client that I've built from source for a raspberry PI as
> > > there is no arm build for the Pacific release. (5.11.0-1015-raspi)
> > >
> > > If I switch over to not allow global id reclaim, none of these clients
> > > could connect, and using the command "ceph status" on one of my nodes
> > > will also fail.
> > >
> > > All of them giving the same error message:
> > >
> > > monclient(hunting): handle_auth_bad_method server allowed_methods [2]
> > > but i only support [2]
> > >
> > >
> > > Has anyone encountered this problem and have any 

[ceph-users] Re: BUG #51821 - client is using insecure global_id reclaim

2021-08-09 Thread Daniel Persson
Hi Tobias and Richard.

Thank you for answering my questions. I got the link suggested by Tobias on
the issue report, which led me to investigate further. It was hard to see
what client version each system was using, but looking at the output of
"ceph health detail" and ldd librados2.so gave me some information.

It turned out that one of my Linux environments used the old Buster client,
which was 12.2.* and not compatible with the new global ID reclaim.
Another issue I ran into was that the Windows client available for download
reports a strange version, 15.0.0 Pacific, which is just not correct.

After reading and searching on GitHub, I realized that the Windows
executables could be built in a Linux environment using the Ceph source
code. So I've now built new binaries for Windows that work just fine, except
for libwnbd.dll, which was never built. But after adding it from the old
installation, I got it to work.

Now ceph-dokan reports a version of 16.2.5, which was the version I built.

Building this was not straightforward and is something I think could be
interesting for the community, so I'm planning to create an instruction
video on the subject and publish it next week.

Again thank you for your help.

Best regards
Daniel

On Mon, Aug 9, 2021 at 11:46 AM Tobias Urdin 
wrote:

> Hello,
>
> Did you follow the fix/recommendation when applying patches as per
> the documentation in the CVE security post [1] ?
>
> Best regards
>
> [1] https://docs.ceph.com/en/latest/security/CVE-2021-20288/
>
> > On 9 Aug 2021, at 02:26, Richard Bade  wrote:
> >
> > Hi Daniel,
> > I had a similar issue last week after upgrading my test cluster from
> > 14.2.13 to 14.2.22 which included this fix for Global ID reclaim in
> > .20. My issue was a rados gw that I was re-deploying on the latest
> > version. The problem seemed to be related with cephx authentication.
> > It kept displaying the error message you have and the service wouldn't
> > start.
> > I ended up stopping and removing the old rgw service, deleting all the
> > keys in /etc/ceph/ and all data in /var/lib/ceph/radosgw/ and
> > re-deploying the radosgw. This used the new rgw bootstrap keys and new
> > key for this radosgw.
> > So, I would suggest you double and triple check which keys your
> > clients are using and that cephx is enabled correctly on your cluster.
> > Check your admin key in /etc/ceph as well, as that's what's being used
> > for ceph status.
> >
> > Regards,
> > Rich
> >
> > On Sun, 8 Aug 2021 at 05:01, Daniel Persson 
> wrote:
> >>
> >> Hi everyone.
> >>
> >> I suggested asking for help here instead of in the bug tracker so that I
> >> will try it.
> >>
> >>
> https://tracker.ceph.com/issues/51821?next_issue_id=51820_issue_id=51824
> >>
> >> I have a problem that I can't seem to figure out how to resolve the
> issue.
> >>
> >> AUTH_INSECURE_GLOBAL_ID_RECLAIM: client is using insecure global_id
> reclaim
> >> AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED: mons are allowing insecure
> >> global_id reclaim
> >>
> >>
> >> Both of these have to do with reclaiming ID and securing that no client
> >> could steal or reuse another client's ID. I understand the reason for
> this
> >> and want to resolve the issue.
> >>
> >> Currently, I have three different clients.
> >>
> >> * One Windows client using the latest Ceph-Dokan build. (ceph version
> >> 15.0.0-22274-g5656003758 (5656003758614f8fd2a8c49c2e7d4f5cd637b0ea)
> pacific
> >> (rc))
> >> * One Linux Debian build using the built packages for that kernel. (
> >> 4.19.0-17-amd64)
> >> * And one client that I've built from source for a raspberry PI as
> there is
> >> no arm build for the Pacific release. (5.11.0-1015-raspi)
> >>
> >> If I switch over to not allow global id reclaim, none of these clients
> >> could connect, and using the command "ceph status" on one of my nodes
> will
> >> also fail.
> >>
> >> All of them giving the same error message:
> >>
> >> monclient(hunting): handle_auth_bad_method server allowed_methods [2]
> >> but i only support [2]
> >>
> >>
> >> Has anyone encountered this problem and have any suggestions?
> >>
> >> PS. The reason I have 3 different hosts is that this is a test
> environment
> >> where I try to resolve and look at issues before we upgrade our
> production
> >> environment to pacific. DS.
> >>
> >> Best regards
> >> Daniel
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] BUG #51821 - client is using insecure global_id reclaim

2021-08-07 Thread Daniel Persson
Hi everyone.

It was suggested that I ask for help here instead of in the bug tracker, so
I will try that.

https://tracker.ceph.com/issues/51821?next_issue_id=51820_issue_id=51824

I have a problem that I can't seem to figure out how to resolve.

AUTH_INSECURE_GLOBAL_ID_RECLAIM: client is using insecure global_id reclaim
AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED: mons are allowing insecure
global_id reclaim


Both of these have to do with global_id reclaim and ensuring that no client
can steal or reuse another client's ID. I understand the reason for this and
want to resolve the issue.

Currently, I have three different clients.

* One Windows client using the latest Ceph-Dokan build. (ceph version
15.0.0-22274-g5656003758 (5656003758614f8fd2a8c49c2e7d4f5cd637b0ea) pacific
(rc))
* One Linux Debian build using the built packages for that kernel. (
4.19.0-17-amd64)
* And one client that I've built from source for a raspberry PI as there is
no arm build for the Pacific release. (5.11.0-1015-raspi)

If I switch over to disallow insecure global_id reclaim, none of these
clients can connect, and running "ceph status" on one of my nodes also
fails.

All of them giving the same error message:

monclient(hunting): handle_auth_bad_method server allowed_methods [2]
but i only support [2]


Has anyone encountered this problem and have any suggestions?

PS. The reason I have three different hosts is that this is a test
environment where I look at and try to resolve issues before we upgrade our
production environment to Pacific. DS.

Best regards
Daniel
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io