Hi Deepak,

On Wed, Sep 24, 2025 at 1:40 PM Deepak Gupta <[email protected]> wrote:
>
> On Wed, Sep 24, 2025 at 08:36:11AM -0600, Paul Walmsley wrote:
> >Hi,
> >
> >On Thu, 31 Jul 2025, Deepak Gupta wrote:
> >
> >[ ... ]
> >
> >> vDSO related Opens (in the flux)
> >> =================================
> >>
> >> I am listing these opens for laying out plan and what to expect in future
> >> patch sets. And of course for the sake of discussion.
> >>
> >
> >[ ... ]
> >
> >> How many vDSOs
> >> ---------------
> >> Shadow stack instructions are carved out of zimop (may be operations) and
> >> if CPU doesn't implement zimop, they're illegal instructions. Kernel could
> >> be running on a CPU which may or may not implement zimop. And thus kernel
> >> will have to carry 2 different vDSOs and expose the appropriate one
> >> depending on whether CPU implements zimop or not.
> >
> >If we merge this series without this, then when CFI is enabled in the
> >Kconfig, we'll wind up with a non-portable kernel that won't run on older
> >hardware. We go to great lengths to enable kernel binary portability
> >across the presence or absence of other RISC-V extensions, and I think
> >these CFI extensions should be no different.
> >
> >So before considering this for merging, I'd like to see at least an
> >attempt to implement the dual-vDSO approach (or something equivalent)
> >where the same kernel binary with CFI enabled can run on both pre-Zimop
> >and post-Zimop hardware, with the existing userspaces that are common
> >today.
>
> Added some distro folks in this email chain.
>
> After patchwork meeting today, I wanted to continue discussion here. So thanks
> Paul for looking into it and initiating a discussion here.
>
> This patch series has been in the queue for quite a long time and we have had
> deliberations on vDSO topic earlier as well and after those deliberations it
> was decided to go ahead with merge and it indeed was sent for 6.17 merge
> window.
> Unfortunately, due to other unforeseen reasons, entirety of riscv
> changes were not picked. So it's a bit disappointing to see backpedaling on
> this topic.
>
> Anyways, we are here. So I'll provide a bit of context for the list about
> deliberations and discussions we have been having for so many merge windows.
> This is so that a holistic discussion can happen on this before we make a
> decision.
>
> Issue
> ======
>
> Instructions in RISC-V shadow stack extension (zicfiss - [1]) are carved out
> of "may be ops" aka zimop extension [2]. "may be ops" are illegal on non-RVA23
> hardware. This means any existing riscv CPU or future CPU which isn't RVA23
> compliant and not implementing zimop will treat these encodings as illegal.
>
> Current kernel patches enable shadow stack and landing pad support for
> userspace using config `CONFIG_RISCV_USER_CFI`. If this config is selected
> then vDSO that will be exposed to user space will also have shadow stack
> instructions in them. Kernel compiled with `CONFIG_RISCV_USER_CFI`, for sake
> of this discussion let's call it RVA23 compiled kernel.
>
> Issue that we discussed earlier and even today is "This RVA23 compiled kernel
> won't be able to support non-RVA23 userspace on non-RVA23 hardware because".
> Please note that issue exists only on non-RVA23 hardware (which is existing
> hardware and future hardware which is not implementing zimop). RVA23 compiled
> kernel can support any sort of userspace on RVA23 hardware.
>
>
> Discussion
> ===========
>
> So the issue is not really shadow stack instructions but rather may be op
> instructions in codegen (binaries and vDSO) which aren't hidden behind any
> flag (to hide them if hardware doesn't support). And if I can narrow down
> further, primary issue we are discussing is that if cfi is enabled during
> kernel compile, it is bringing in a piece of code (vDSO) which won't work
> on existing hardware.
> But the counter point is if someone were to deploy
> RVA23 compiled kernel on non-RVA23 hardware, they must have compiled
> rest of the userspace without shadow stack instructions in them for such
> a hardware. And thus at this point they could simply choose *not* to turn on
> `CONFIG_RISCV_USER_CFI` when compiling such kernel. It's not that difficult to
> do so.
>
> Any distro who is shipping userspace (which all of them are) along with kernel
> will not be shipping two different userspaces (one with shadow stack and one
> without them). If distros are shipping two different userspaces, then they
> might as well ship two different kernels. Tagging some distro folks here to
> get their take on shipping different userspace depending on whether hardware
> is RVA23 or not. @Heinrich, @Florian, @redbeard and @Aurelien.
>
> Major distros have already drawn a distinction here that they will drop
> support for hardware which isn't RVA23 for the sake of keeping binary
> distribution simple.
>
> Only other use case that was discussed is of a powerful linux user who just
> wants to use a single kernel on all kinds of riscv hardware. I am imagining
> such a user knows enough about kernel and if it is really dear to them, they
> can develop their own patches and send it upstream to support their own
> usecase and we can discuss them out. Current patchset doesn't prevent such a
> developer from sending such patches upstream.
>
> I heard the argument in meeting today that "Zbb" enabling works similar for
> kernel today. I looked at "Zbb" enabling. It's for kernel usage and it's
> surgically placed in kernel using asm hidden behind alternatives. vDSO isn't
> compiled with Zbb. Shadow stack instructions are part of codegen for C files
> compiled into vDSO.
>
> Furthermore,
>
> Kernel control flow integrity will introduce shadow stack instructions all
> over the kernel binary. Such kernel won't be deployable on non-RVA23 hardware.
> How to deal with this problem for a savvy kernel developer who wants to run
> same cfi enabled kernel binary on multiple hardware?
>
> Coming from engineering and hacker point of view, I understand the desire here
> but I still see that it's complexity enforced on rest of the kernel from a
> user base which anyways can achieve such goals. For majority of usecases, I
> don't see a reason to increase complexity in the kernel for build, possibly
> runtime patching and thus possibly introduce more issues and errors just for
> the sake of a science project.
>
> That being said, re-iterating that currently default for
> `CONFIG_RISCV_USER_CFI` is "n" which means it won't be breaking anything
> unless a user opts "Y". So even though I really don't see a reason and
> usability to have complexity in kernel to carry multiple vDSOs, current
> patchsets are not a hindrance for such future capability (because current
> default is No) and motivated developer is welcome to build on top of it.
> Bottom line is I don't see a reason to block current patchset from merging
> in v6.18.

Sorry for reiterating; I have been gone for a while, so I may have lost a
bit of context.

In that case, should we add a comment in the Kconfig that says "it breaks
userspace on pre-RVA23 platforms"?

Perhaps a very ugly way to make an RVA23-compiled kernel compatible with
pre-RVA23 platforms is to decode maybe-ops in the illegal-instruction
exception handler...

Btw, I don't think kernel-level shadow stack should be an argument here, as
kernel-level APIs are more flexible by nature.

Thanks,
Andy
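
P.S. To make the illegal-instruction idea a bit more concrete, here is a rough
standalone sketch of the decode step. It is not kernel code: `struct fake_regs`,
`insn_is_zimop()` and `emulate_zimop()` are hypothetical names made up for
illustration, and the mask/match constants are the 32-bit MOP.R.n / MOP.RR.n
values as I understand them from the Zimop encoding. It assumes the default
maybe-op behaviour (write zero to rd) and leaves out the 16-bit Zcmop forms,
fetching the instruction from user memory, and the actual trap-handler hookup.

```c
#include <stdbool.h>
#include <stdint.h>

/* 32-bit Zimop encodings: SYSTEM opcode (0x73), funct3 = 0b100. */
#define MASK_MOP_R_N    0xb3c0707fu
#define MATCH_MOP_R_N   0x81c04073u
#define MASK_MOP_RR_N   0xb200707fu
#define MATCH_MOP_RR_N  0x82004073u

/* Hypothetical stand-in for the trapped user context (not struct pt_regs). */
struct fake_regs {
	uint64_t pc;
	uint64_t x[32];
};

/* True if insn is any MOP.R.n or MOP.RR.n encoding. */
static bool insn_is_zimop(uint32_t insn)
{
	return (insn & MASK_MOP_R_N) == MATCH_MOP_R_N ||
	       (insn & MASK_MOP_RR_N) == MATCH_MOP_RR_N;
}

/*
 * Emulate the default maybe-op semantics: rd <- 0, then skip the insn.
 * Returns false if insn is not a maybe-op, so the caller can fall back
 * to its normal illegal-instruction handling.
 */
static bool emulate_zimop(struct fake_regs *regs, uint32_t insn)
{
	unsigned int rd;

	if (!insn_is_zimop(insn))
		return false;

	rd = (insn >> 7) & 0x1f;
	if (rd != 0)		/* x0 stays hard-wired to zero */
		regs->x[rd] = 0;
	regs->pc += 4;
	return true;
}
```

With this, `sspush ra` (0xce104073) and `sspopchk ra` (0xcdc0c073) would both
be recognised and skipped as no-ops. Every such instruction would take a trap
on pre-Zimop hardware, so this trades performance for a single binary, which is
why I call it ugly.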
