On Tuesday, March 26, 2024 10:53:29 PM EDT David Yang wrote:
> This is great; we are currently using the SMB protocol heavily to export kernel-mounted CephFS.
> But I have run into a problem: when many SMB clients enumerate or list the same directory, the SMB server comes under high load and the smb process gets stuck in the D (uninterruptible sleep) state.
> This problem has been going on for some time and no suitable solution has been found yet.
Thanks for the heads up. I'll make sure concurrent directory access is part of the test plan.

> John Mulligan <phlogistonj...@asynchrono.us> wrote on Tuesday, March 26, 2024 at 03:43:
> > On Monday, March 25, 2024 3:22:26 PM EDT Alexander E. Patrakov wrote:
> > > On Mon, Mar 25, 2024 at 11:01 PM John Mulligan <phlogistonj...@asynchrono.us> wrote:
> > > > On Friday, March 22, 2024 2:56:22 PM EDT Alexander E. Patrakov wrote:
> > > > > Hi John,
> > > > > > A few major features we have planned include:
> > > > > > * Standalone servers (internally defined users/groups)
> > > > > No concerns here.
> > > > > > * Active Directory Domain Member Servers
> > > > > In the second case, what is the plan regarding UID mapping? Is NFS coexistence planned, or a concurrent mount of the same directory using CephFS directly?
> > > > In the immediate future the plan is to have a very simple, fairly "opinionated" idmapping scheme based on the autorid backend.
> > > OK, the docs for clustered SAMBA do mention the autorid backend in examples. It's a shame that the manual page does not explicitly list it as compatible with clustered setups.
> > > However, please consider that the majority of Linux distributions (tested: CentOS, Fedora, Alt Linux, Ubuntu, OpenSUSE) use "realmd" to join AD domains by default (where "default" means the pointy-clicky way in a workstation setup), and realmd uses SSSD. Therefore, by this opinionated choice of the autorid backend, you create mappings that disagree with the supposed majority and the default. This will create problems in the future when you do consider NFS coexistence.
> > Thanks, I'll keep that in mind.
> > > Well, it's a different topic, but most organizations that I have seen seem to ignore this default. Maybe those that don't have any problems have no reason to talk to me? I think more research is needed here on whether Red Hat's and GNOME's push of SSSD is something not-ready or indeed the de facto standard setup.
> > I think it's a bit of a mix, but I'm not sure either.
> > > Even if you don't want to use SSSD, providing an option to provision a few domains with the idmap rid backend and statically configured ranges (as an override to autorid) would be a good step forward, as this can be made compatible with the default Red Hat setup.
> > That's reasonable. Thanks for the suggestion.
> > > > Sharing the same directories over both NFS and SMB at the same time, also known as "multi-protocol", is not planned for now; however, we're all aware that there's often demand for this feature and we're aware of the complexity it brings. I expect we'll work on that at some point, but not initially. Similarly, sharing the same directories over an SMB share and directly on a CephFS mount won't be blocked, but we won't recommend it.
> > > OK. Feature request: in case there are several CephFS filesystems, support configuring which one to serve.
> > Putting it on the list.
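
As a point of reference for the autorid-based default discussed above, here is a minimal sketch of what such an opinionated smb.conf could look like. The realm, workgroup, and ranges are placeholder assumptions for illustration, not the module's actual defaults:

    [global]
        security = ads
        workgroup = CORP
        realm = CORP.EXAMPLE.COM
        # autorid carves the range below into per-domain slices of
        # "rangesize" IDs and maps each RID into its domain's slice
        idmap config * : backend = autorid
        idmap config * : range = 100000-2099999
        idmap config * : rangesize = 100000
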
> > > > > In fact, I am quite skeptical because, at least in my experience, every customer's SAMBA configuration as a domain member is a unique snowflake, and cephadm would need the ability to specify arbitrary UID mapping configuration to match what the customer uses elsewhere - and the match must be precise.
> > > > I agree - our initial use case is something along these lines: users of a Ceph cluster who have Windows systems, Mac systems, or appliances that are joined to an existing AD but are not currently interoperating with the Ceph cluster.
> > > > I expect to add some idmapping configuration and agility down the line, especially supporting some form of rfc2307 idmapping (where unix IDs are stored in AD).
> > > Yes, for whatever reason, people do this, even though it is cumbersome to manage.
> > > > But those who already have idmapping schemes and Samba accessing Ceph will probably need to just continue using their existing setups, as we don't have an immediate plan for migrating those users.
> > > > > Here is what I have seen or was told about:
> > > > > 1. We don't care about interoperability with NFS or CephFS, so we just let SAMBA invent whatever UIDs and GIDs it needs using the "tdb2" idmap backend. It's completely OK that workstations get different UIDs and GIDs, as only SIDs traverse the wire.
> > > > This is pretty close to our initial plan, but I'm not clear why you'd think that "workstations get different UIDs and GIDs". For all systems accessing the (same) Ceph cluster the ID mapping should be consistent. You did make me consider multi-cluster use cases with something like CephFS volume mirroring - that's something I hadn't thought of before - *but* using an algorithmic mapping backend like autorid (and testing) I think we're mostly OK there.
> > > The tdb2 backend (used in my example) is not algorithmic, it is allocating. That is, it allocates IDs sequentially on a first-seen, first-allocated basis. Yet this is what this customer uses, presumably because it is the only backend whose manual page explicitly mentions clustered operation.
> > > And the "autorid" backend is also not fully algorithmic: it allocates ranges to domains on the same sequential basis (see https://github.com/samba-team/samba/blob/6fb98f70c6274e172787c8d5f73aa93920171e7c/source3/winbindd/idmap_autorid_tdb.c#L82), and therefore it can create mismatching mappings if two workstations or servers have seen the users DOMA\usera and DOMB\userb in a different order. This is even mentioned in the manual page. SSSD largely avoids this problem by hashing the domain portion of the SID instead of allocating the subranges sequentially.
> > Agreed. Thanks for the reminder. This will certainly need to go on the test plan.
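
To make the deterministic alternative concrete - the "rid" backend with statically configured ranges that Alexander suggests as an override to autorid - a hypothetical fragment might look like the following. The domain names and ranges are invented for the example:

    [global]
        # allocating backend only for BUILTIN and other non-domain SIDs
        idmap config * : backend = tdb
        idmap config * : range = 3000-99999
        # rid is purely algorithmic (UID/GID = range start + RID), so every
        # server configured with the same ranges computes the same IDs,
        # regardless of which users it happened to see first
        idmap config CORP : backend = rid
        idmap config CORP : range = 100000-1099999
        idmap config SUBDOM : backend = rid
        idmap config SUBDOM : range = 1100000-2099999

Because nothing is handed out on a first-seen basis, the same ranges can also be coordinated with NFS servers or SSSD-based clients that need to agree on the mapping, which is the compatibility point made above.
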
> > > > > 2. [Not seen in the wild; the customer did not actually implement it, it's a product of internal miscommunication, and I am not sure it is valid at all.] We don't care about interoperability with CephFS, and, while we have NFS, the security guys would not allow running NFS non-kerberized. Therefore, no UIDs or GIDs traverse the wire, only SIDs and names. Therefore, all we need is to allow both SAMBA and NFS to use a shared UID mapping allocated on an as-needed basis using the "tdb2" idmap module, and it doesn't matter that these UIDs and GIDs are inconsistent with what clients choose.
> > > > Unfortunately, I don't really understand this item. Fortunately, you say it was only considered, not implemented. :-)
> > > > > 3. We don't care about ACLs at all, and don't care about CephFS interoperability. We set ownership of all new files to root:root 0666 using whatever options are available [well, I would rather use a dedicated nobody-style uid/gid here]. All we care about is that only authorized workstations or authorized users can connect to each NFS or SMB share, and we absolutely don't want them to be able to set custom ownership or ACLs.
> > > > Sometimes known as the "drop-box" use case, I think (not to be confused with the cloud app of a similar name). We could probably implement something like that as an option, but I had not considered it before.
> > > > > 4. We care about NFS and CephFS file ownership being consistent with what Windows clients see. We store all UIDs and GIDs in Active Directory using the rfc2307 schema, and it's mandatory that all servers (especially SAMBA - thanks to the "ad" idmap backend) respect that and don't try to invent anything [well, they do - BUILTIN/Users gets its GID through tdb2]. Oh, and by the way, we have this strangely low-numbered group that everybody gets wrong unless they set "idmap config CORP : range = 500-999999".
> > > > This is oh so similar to a project I worked on prior to working with Ceph. I think we'll need to do this one eventually, but maybe not this year. One nice side effect of running in containers is that the low ID number is less of an issue, because the IDs only matter within the container context (and only then if using the kernel file system access methods). We have much more flexibility with IDs in a container.
> > > So - are you going to use the kernel-based mount or the ceph vfs module? My tests indicate that, in situations where there are frequently accessed files, allowing the kernel to cache them in RAM (which the vfs module does not do) can give a big boost in performance. Also, SUSE considers the ceph vfs module a non-recommended solution, apparently for the same performance-related reason; see https://documentation.suse.com/ses/7/html/ses-all/cha-ses-cifs.html
> > The prototype module only uses the vfs module due to the extreme simplicity of setting it up in containers. Otherwise, we're trying to keep our options open and are investigating multiple approaches currently.
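
For reference on the vfs option being discussed: a share exported through the ceph vfs module typically looks something like the sketch below. This is loosely modeled on the SUSE page linked above rather than on anything the prototype mgr module generates, and the share name, path, and cephx user are placeholders:

    [tank]
        path = /
        vfs objects = ceph
        ceph:config_file = /etc/ceph/ceph.conf
        ceph:user_id = samba.gw
        read only = no
        oplocks = no
        # vfs_ceph I/O does not go through kernel file descriptors,
        # so kernel share modes must be disabled
        kernel share modes = no

With a kernel-mounted CephFS, the share instead just points `path` at the mounted directory and drops the ceph:* options, which is what lets the page cache help with frequently accessed files, as Alexander notes.
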
> > > > > 5. We use a few static ranges for algorithmic ID translation using the idmap rid backend. Everything works.
> > > > See above.
> > > > > 6. We use SSSD, which provides consistent IDs everywhere, and for the few devices that can't use it, we configured compatible idmap rid ranges for use with winbindd. The only problem is that we like user-private groups, and only SSSD has support for them (although we admit it's our fault that we enabled this non-default option).
> > > > > 7. We store ID mappings in non-AD LDAP and use winbindd with the "ldap" idmap backend.
> > > > For now, we're only planning to do idmapping with winbind and AD. We'd probably only consider non-AD LDAP and/or SSSD if there was strong and loud demand for it.
> > > See above.
> > > However, as I said, providing a way to use the "rid" backend with statically defined domains and ranges, in addition to the default "autorid" backend, would be, for me, a good-enough substitute for SSSD.
> > Sounds reasonable. I've done it that way in a prior role too, so it's somewhat familiar. Thanks!
> > > > > I am sure other weird but valid setups exist - please extend the list if you can.
> > > > > Which of the above scenarios would be supportable without resorting to the old way of installing SAMBA manually alongside the cluster?
> > > > I hope I covered the above with some inline replies. This was great food for thought and at just the right level of technical detail. So thank you very much for replying; this is exactly the kind of discussion I want to have now, while the design is still young and flexible.
> > > > One other cool thing I plan on doing is supporting multiple samba containers running on the same cluster (even on the same node, if I can wrangle the network properly). So one could in fact have completely different domain joins and/or configurations. While I wouldn't suggest anyone run a whole lot of different configurations on the same cluster, this idea already allows for some level of agility between schemes. Later on we might be able to use that as a building block for migration tools, either from an existing samba setup or between configurations.
> > > Multiple SAMBA containers are also good for high availability (with ctdb) or scale-out (with round-robin DNS).
> > > > Also, I plan on adding `global_custom_options` and `share_custom_options` for special overrides for development, QA, and experimentation, but those are strongly within the "you break it, you bought it" realm. These could be used for experimenting with idmapping schemes without having them all baked into the smb mgr module code.
> > > Great, thanks!
> > Once again, thanks for the feedback. This discussion is very welcome!
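
As a closing illustration of the planned `global_custom_options` idea: assuming it ends up accepting raw smb.conf key/value pairs (an assumption - the interface was only a plan at the time of writing), it could be used to trial the rfc2307-style scheme from scenario 4 without dedicated support in the mgr module, for example:

    # hypothetical raw overrides injected via global_custom_options;
    # CORP and the low-numbered range echo scenario 4 above
    idmap config CORP : backend = ad
    idmap config CORP : schema_mode = rfc2307
    idmap config CORP : range = 500-999999
    idmap config CORP : unix_nss_info = yes
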