On Tuesday, March 26, 2024 10:53:29 PM EDT David Yang wrote:
> This is great; we are currently using SMB heavily to export
> kernel-mounted CephFS.
> But I have encountered a problem: when many SMB clients enumerate or
> list the same directory, the SMB server comes under high load and the
> smbd processes get stuck in D state (uninterruptible sleep).
> This problem has been going on for some time and no suitable solution
> has been found yet.
> 

Thanks for the heads up. I'll make sure concurrent dir access is part of the 
test plan.

> John Mulligan <phlogistonj...@asynchrono.us> wrote on Tue, Mar 26, 2024 at 03:43:
> >
> > On Monday, March 25, 2024 3:22:26 PM EDT Alexander E. Patrakov wrote:
> > 
> > > On Mon, Mar 25, 2024 at 11:01 PM John Mulligan <phlogistonj...@asynchrono.us> wrote:
> > > 
> > > > On Friday, March 22, 2024 2:56:22 PM EDT Alexander E. Patrakov wrote:
> > > > 
> > > > > Hi John,
> > > > >
> > > > > > A few major features we have planned include:
> > > > > > * Standalone servers (internally defined users/groups)
> > > > >
> > > > > No concerns here
> > > > >
> > > > > > * Active Directory Domain Member Servers
> > > > >
> > > > > In the second case, what is the plan regarding UID mapping? Is NFS
> > > > > coexistence planned, or a concurrent mount of the same directory
> > > > > using CephFS directly?
> > > >
> > > > In the immediate future the plan is to have a very simple, fairly
> > > > "opinionated" idmapping scheme based on the autorid backend.
> > >
> > > OK, the docs for clustered SAMBA do mention the autorid backend in
> > > examples. It's a shame that the manual page does not explicitly list
> > > it as compatible with clustered setups.
> > >
> > > However, please consider that the majority of Linux distributions
> > > (tested: CentOS, Fedora, Alt Linux, Ubuntu, OpenSUSE) use "realmd" to
> > > join AD domains by default (where "default" means a pointy-clicky way
> > > in a workstation setup), which uses SSSD, and therefore, by this
> > > opinionated choice of the autorid backend, you create mappings that
> > > disagree with the supposed majority and the default. This will create
> > > problems in the future when you do consider NFS coexistence.
> >
> > Thanks, I'll keep that in mind.
> >
> > > Well, it's a separate matter that most organizations I have seen
> > > seem to ignore this default. Maybe those that don't have any problems
> > > don't have any reason to talk to me? I think more research is needed
> > > here on whether the SSSD setup pushed by Red Hat and GNOME is not yet
> > > ready or is indeed the de-facto standard.
> >
> > I think it's a bit of a mix, but I'm not sure either.
> >
> > > Even if you don't want to use SSSD, providing an option to provision a
> > > few domains with the idmap rid backend with statically configured
> > > ranges (as an override to autorid) would be a good step forward, as
> > > this can be made compatible with the default Red Hat setup.
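
A minimal sketch of the rid arithmetic behind this suggestion, following the formula given in the idmap_rid(8) manual page (ID = RID - BASE_RID + LOW_RANGE_ID, where base_rid defaults to 0). The CORP and LAB domains and their ranges are hypothetical, and returning None is only a stand-in for winbind leaving an out-of-range ID unmapped. Because the result depends only on the static configuration, every server with the same ranges computes the same IDs, regardless of the order in which it sees users.

```python
from typing import Dict, Optional, Tuple

# Hypothetical static per-domain ranges, in the spirit of
#   idmap config CORP : backend = rid
#   idmap config CORP : range = 100000-199999
DOMAIN_RANGES: Dict[str, Tuple[int, int]] = {
    "CORP": (100000, 199999),
    "LAB": (200000, 299999),
}


def rid_to_unix_id(domain: str, rid: int, base_rid: int = 0) -> Optional[int]:
    """ID = RID - BASE_RID + LOW_RANGE_ID, or None if outside the domain's range."""
    low, high = DOMAIN_RANGES[domain]
    if rid < base_rid:
        return None
    unix_id = rid - base_rid + low
    return unix_id if unix_id <= high else None


def unix_id_to_rid(domain: str, unix_id: int, base_rid: int = 0) -> Optional[int]:
    """Inverse mapping: RID = ID + BASE_RID - LOW_RANGE_ID."""
    low, high = DOMAIN_RANGES[domain]
    if not low <= unix_id <= high:
        return None
    return unix_id + base_rid - low


if __name__ == "__main__":
    print(rid_to_unix_id("CORP", 1104))    # 101104
    print(rid_to_unix_id("LAB", 1104))     # 201104
    print(rid_to_unix_id("CORP", 150000))  # None: would exceed the CORP range
    print(unix_id_to_rid("CORP", 101104))  # 1104
```

The trade-off is that each domain's range must be sized to hold its largest RID, which is the price of a purely algorithmic, order-independent mapping.
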
> >
> > That's reasonable. Thanks for the suggestion.
> >
> > >
> > > > Sharing the same directories over both NFS and SMB at the same time,
> > > > also known as "multi-protocol", is not planned for now. However, we're
> > > > all aware that there's often demand for this feature, and we're aware
> > > > of the complexity it brings. I expect we'll work on that at some point
> > > > but not initially. Similarly, sharing the same directories over an SMB
> > > > share and directly on a CephFS mount won't be blocked, but we won't
> > > > recommend it.
> > >
> > > OK. Feature request: in case there are several CephFS filesystems,
> > > support configuring which one to serve.
> >
> > Putting it on the list.
> >
> > > > > In fact, I am quite skeptical, because, at least in my experience,
> > > > > every customer's SAMBA configuration as a domain member is a unique
> > > > > snowflake, and cephadm would need an ability to specify arbitrary
> > > > > UID mapping configuration to match what the customer uses
> > > > > elsewhere, and the match must be precise.
> > > >
> > > > I agree. Our initial use case is something along these lines:
> > > > users of a Ceph cluster that have Windows systems, Mac systems, or
> > > > appliances joined to an existing AD, but that are not currently
> > > > interoperating with the Ceph cluster.
> > > >
> > > > I expect to add some idmapping configuration and agility down the
> > > > line, especially supporting some form of rfc2307 idmapping (where
> > > > unix IDs are stored in AD).
> > >
> > > Yes, for whatever reason, people do this, even though it is cumbersome
> > > to manage.
> > >
> > > > But those who already have idmapping schemes and Samba accessing
> > > > Ceph will probably need to continue using their existing setups, as
> > > > we don't have an immediate plan for migrating those users.
> > > >
> > > > > Here is what I have seen or was told about:
> > > > >
> > > > > 1. We don't care about interoperability with NFS or CephFS, so we
> > > > > just let SAMBA invent whatever UIDs and GIDs it needs using the
> > > > > "tdb2" idmap backend. It's completely OK that workstations get
> > > > > different UIDs and GIDs, as only SIDs traverse the wire.
> > > >
> > > > This is pretty close to our initial plan, but I'm not clear why you'd
> > > > think that "workstations get different UIDs and GIDs". For all
> > > > systems accessing the (same) Ceph cluster, the ID mapping should be
> > > > consistent.
> > > > You did make me consider multi-cluster use cases with something like
> > > > CephFS volume mirroring. That's something I hadn't thought of before,
> > > > *but* using an algorithmic mapping backend like autorid (and
> > > > testing), I think we're mostly OK there.
> > >
> > > The tdb2 backend (used in my example) is not algorithmic; it is
> > > allocating. That is, it sequentially allocates IDs on a
> > > first-seen, first-allocated basis. Yet this is what this customer uses,
> > > presumably because it is the only backend whose manual page explicitly
> > > mentions clustered operation.
> > >
> > > And the "autorid" backend is also not fully algorithmic: it allocates
> > > ranges to domains on the same sequential basis (see
> > > https://github.com/samba-team/samba/blob/6fb98f70c6274e172787c8d5f73aa93920171e7c/source3/winbindd/idmap_autorid_tdb.c#L82),
> > > and therefore can create mismatching mappings if two workstations or
> > > servers have seen the users DOMA\usera and DOMB\userb in a different
> > > order. This is even mentioned in the manual page. SSSD largely avoids
> > > this problem by hashing the domain portion of the SID instead of
> > > allocating the subranges on a sequential basis.
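
To illustrate the order dependence described above, here is a simplified sketch, not Samba's or SSSD's actual code: the sequential variant hands out per-domain ranges in first-seen order (as autorid does), while the hashed variant derives each range from the domain SID. The SIDs, LOW_ID, and NUM_SLICES are hypothetical; RANGE_SIZE uses autorid's default rangesize of 100000. SHA-256 is only a stand-in for SSSD's own hash, and hash collisions, which SSSD has to handle, are ignored here.

```python
import hashlib
from typing import Dict, Iterable

# Hypothetical parameters, for illustration only.
LOW_ID = 200000      # start of the overall idmap range
RANGE_SIZE = 100000  # IDs reserved per domain (autorid's default rangesize)
NUM_SLICES = 1000    # number of per-domain ranges available

DOMA = "S-1-5-21-1000-1000-1000"  # hypothetical domain SIDs
DOMB = "S-1-5-21-2000-2000-2000"


def sequential_ranges(first_seen: Iterable[str]) -> Dict[str, int]:
    """Autorid-like: each new domain gets the next free range,
    in the order the server happens to encounter it."""
    ranges: Dict[str, int] = {}
    for sid in first_seen:
        if sid not in ranges:
            ranges[sid] = LOW_ID + len(ranges) * RANGE_SIZE
    return ranges


def hashed_ranges(first_seen: Iterable[str]) -> Dict[str, int]:
    """SSSD-like idea: derive each domain's range from a hash of its SID.
    SHA-256 is a stand-in; collisions are not handled in this sketch."""
    ranges: Dict[str, int] = {}
    for sid in first_seen:
        digest = hashlib.sha256(sid.encode("ascii")).digest()
        slice_index = int.from_bytes(digest[:8], "big") % NUM_SLICES
        ranges[sid] = LOW_ID + slice_index * RANGE_SIZE
    return ranges


def unix_id(ranges: Dict[str, int], sid: str, rid: int) -> int:
    # Simplification: assumes rid < RANGE_SIZE, so one range per domain suffices.
    return ranges[sid] + rid


# Server 1 saw DOMA\usera first; server 2 saw DOMB\userb first.
seq1 = sequential_ranges([DOMA, DOMB])
seq2 = sequential_ranges([DOMB, DOMA])
print(unix_id(seq1, DOMA, 1104), unix_id(seq2, DOMA, 1104))  # 201104 301104

hash1 = hashed_ranges([DOMA, DOMB])
hash2 = hashed_ranges([DOMB, DOMA])
print(hash1 == hash2)  # True: independent of the order domains were seen
```

The second server maps the same DOMA RID to a different UID only because it met DOMB first; the hashed variant produces the same table on both servers.
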
> >
> > Agreed. Thanks for the reminder. This will certainly need to go on the
> > test plan.
> >
> > > > > 2. [not seen in the wild: the customer did not actually implement
> > > > > it, it's a product of internal miscommunication, and I am not sure
> > > > > it is valid at all] We don't care about interoperability with
> > > > > CephFS, and, while we have NFS, the security team would not allow
> > > > > running NFS without Kerberos. Therefore, no UIDs or GIDs traverse
> > > > > the wire, only SIDs and names. So all we need is to allow both
> > > > > SAMBA and NFS to use a shared UID mapping, allocated on an as-needed
> > > > > basis using the "tdb2" idmap module, and it doesn't matter that
> > > > > these UIDs and GIDs are inconsistent with what clients choose.
> > > >
> > > > Unfortunately, I don't really understand this item. Fortunately, you
> > > > say it was only considered, not implemented. :-)
> > > >
> > > > > 3. We don't care about ACLs at all, and don't care about CephFS
> > > > > interoperability. We set ownership of all new files to root:root
> > > > > 0666 using whatever options are available [well, I would rather use
> > > > > a dedicated nobody-style uid/gid here]. All we care about is that
> > > > > only authorized workstations or authorized users can connect to
> > > > > each NFS or SMB share, and we absolutely don't want them to be able
> > > > > to set custom ownership or ACLs.
> > > >
> > > > Sometimes known as the "drop-box" use case, I think (not to be
> > > > confused with the cloud app of a similar name).
> > > > We could probably implement something like that as an option, but I
> > > > had not considered it before.
> > > >
> > > > > 4. We care about NFS and CephFS file ownership being consistent with
> > > > > what Windows clients see. We store all UIDs and GIDs in Active
> > > > > Directory using the rfc2307 schema, and it's mandatory that all
> > > > > servers (especially SAMBA, thanks to the "ad" idmap backend) respect
> > > > > that and don't try to invent anything [well, they do: BUILTIN/Users
> > > > > gets its GID through tdb2]. Oh, and by the way, we have this
> > > > > strangely low-numbered group that everybody gets wrong unless they
> > > > > set "idmap config CORP : range = 500-999999".
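
A tiny sketch of why that last setting matters, based on the idmap_ad(8) description of "range" as a filter: an ID stored in AD that falls outside the configured range is ignored and its mapping discarded. The 10000-999999 range and GID 500 are hypothetical stand-ins for a typical configuration and for the low-numbered group; this models only the filtering rule, not winbind itself.

```python
from typing import Optional, Tuple


def accept_ad_id(stored_id: int, idmap_range: Tuple[int, int]) -> Optional[int]:
    """Model 'range' acting as a filter for the ad backend: an ID stored in AD
    (rfc2307 uidNumber/gidNumber) outside [low, high] is ignored (None)."""
    low, high = idmap_range
    return stored_id if low <= stored_id <= high else None


LOW_GROUP_GID = 500  # hypothetical GID of the "strangely low-numbered group"

print(accept_ad_id(LOW_GROUP_GID, (10000, 999999)))  # None: group left unmapped
print(accept_ad_id(LOW_GROUP_GID, (500, 999999)))    # 500
```
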
> > > >
> > > > This is oh so similar to a project I worked on prior to working with
> > > > Ceph. I think we'll need to do this one eventually, but maybe not this
> > > > year. One nice side-effect of running in containers is that low ID
> > > > numbers are less of an issue, because the IDs only matter within the
> > > > container context (and only then if using the kernel file system
> > > > access methods). We have much more flexibility with IDs in a
> > > > container.
> > >
> > > So, are you going to use the kernel-based mount or the ceph vfs
> > > module? My tests indicate that, where there are frequently accessed
> > > files, allowing the kernel to cache them in RAM (which the vfs module
> > > does not do) can give a big boost in performance. Also, SUSE
> > > apparently does not recommend the ceph vfs module, for the same
> > > performance-related reason; see
> > > https://documentation.suse.com/ses/7/html/ses-all/cha-ses-cifs.html
> >
> > The prototype module only uses the vfs module due to the extreme
> > simplicity of setting it up in containers. Otherwise, we're trying to
> > keep our options open and are currently investigating multiple
> > approaches.
> >
> > > > > 5. We use a few static ranges for algorithmic ID translation using
> > > > > the idmap rid backend. Everything works.
> > > >
> > > > See above.
> > > >
> > > > > 6. We use SSSD, which provides consistent IDs everywhere, and for a
> > > > > few devices which can't use it, we configured compatible idmap rid
> > > > > ranges for use with winbindd. The only problem is that we like
> > > > > user-private groups, and only SSSD has support for them (although
> > > > > we admit it's our fault that we enabled this non-default option).
> > > > > 7. We store ID mappings in non-AD LDAP and use winbindd with the
> > > > > "ldap" idmap backend.
> > > >
> > > > For now, we're only planning to do idmapping with winbind and AD.
> > > > We'd probably only consider non-AD LDAP and/or SSSD if there were
> > > > strong and loud demand for it.
> > >
> > > See above.
> > >
> > > However, as I said, providing a way to use the "rid" backend with
> > > statically defined domains and ranges in addition to the default
> > > "autorid" backend would be, for me, a good-enough substitute for SSSD.
> >
> > Sounds reasonable. I've done it that way in a prior role too, so it's
> > somewhat familiar. Thanks!
> >
> > > > > I am sure other weird but valid setups exist; please extend the
> > > > > list if you can.
> > > > >
> > > > > Which of the above scenarios would be supportable without resorting
> > > > > to the old way of installing SAMBA manually alongside the cluster?
> > > >
> > > > I hope I covered the above with some inline replies. This was great
> > > > food for thought and at just the right level of technical detail. So
> > > > thank you very much for replying; this is exactly the kind of
> > > > discussion I want to have now, while the design is still young and
> > > > flexible.
> > > >
> > > > One other cool thing I plan on doing is supporting multiple samba
> > > > containers running on the same cluster (even the same node if I can
> > > > wrangle the network properly). So one could in fact have completely
> > > > different domain joins and/or configurations. While I wouldn't suggest
> > > > anyone run a whole lot of different configurations on the same
> > > > cluster, this idea already allows for some level of agility between
> > > > schemes. Later on we might be able to use that as a building block for
> > > > migration tools, either from an existing samba setup or between
> > > > configurations.
> > >
> > > Multiple SAMBA containers are also good for high availability (with
> > > ctdb) or scale-out (with round-robin DNS).
> > >
> > > > Also, I plan on adding `global_custom_options` and
> > > > `share_custom_options` for special overrides for development, QA, and
> > > > experimentation, but those are firmly within the "you break it, you
> > > > bought it" realm. Still, these could be used for experimenting with
> > > > idmapping schemes without having them all baked into the smb mgr
> > > > module code.
> > >
> > > Great, thanks!
> >
> > Once again, thanks for the feedback. This discussion is very welcome!
> >
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io