Thanks everyone for the feedback. I'll address the concerns individually.


On Thu, 5 Feb 2026 12:46:19 +0000, 'unman' via qubes-devel wrote:
> How much of this is tied to the current state of shufflecake, which is
> still in the experimental stage, no?


Actually, three of the four proposed changes (No. 1, 2 and 4) are completely 
unrelated to Shufflecake itself, and No. 3 is only indirectly related. They 
would be necessary for any form of multi-layer plausible deniability. As far 
as I am aware, Shufflecake is the most mature deniable-encryption toolkit that 
satisfies modern standards, although you are right that it is experimental. I 
am also a contributor to Shufflecake and will be ready to address any issues 
on that side, should they arise.



> I think it has only been tested
> by the developers on Debian based system, and is not recommended as
> reliable as yet.



It's true that they do not guarantee stable operation. However, that's exactly 
why I am not proposing that we add Shufflecake to the Qubes package 
distribution. The features I outlined are just necessary compatibility changes. 
The project itself, I think, should remain an external codebase which advanced 
users can install for themselves. Regarding testing, I have been running 
RELIANT (Qubes + Shufflecake) on my own system for about half a year now 
without any major issues. There was a problem with garbage collection 
(fstrim/discard) at some point, but I have addressed that with an upstream PR.





On Thu, 5 Feb 2026 15:07:28 +0100, 'Marek Marczykowski-Górecki' via qubes-devel 
wrote:

> This is quite complicated topic, as you already know. In general,
> proposed features mostly align with some other planned features.
>
> Especially, it would be useful for several other reasons to store some
> of the qubes on separate partitions or even external disks - and make
> them available only when such partition/disk is visible. Not only for
> plausible deniability reasons.

Agreed, this will certainly be useful. However, going beyond the 
plausible-deniability use case actually adds new constraints to the problem.



> This one is I think the most complex change from those proposed here.
>
> This is because of dependencies between qubes - they must not fail, even
> if some of the qubes in separate qubes.xml (lets call them "external
> qubes" for now) are not available at the time.

The main point here is that under Shufflecake PD, when the external qubes are 
unavailable (a boot under duress), any interaction with the system is very 
likely to break something in the hidden parts. Fundamentally, locked hidden 
space is indistinguishable from free space, so it will be used for allocating 
new data. The exception is Maintenance Mode, where the entire Shufflecake 
device is locked and inaccessible.



> 1. You have an external qube based on debian-12-xfce template. You start
> your system with external qubes not visible, and then remove
> debian-12-xfce template (for install debian-13-xfce, migrate all visible
> qubes to it and then remove debian-12-xfce). What should happen next
> time you boot with external qubes visible? Same applies to other types
> of dependencies (netvm, default_dispvm, audiovm, guivm, etc).

I believe a reasonable solution here would be to store a journal of such 
changes and determine the most preferable replacement candidate, relying 
either on the user's explicit choice or on heuristics. Another way would be 
to select a family of templates (Debian/Fedora/etc.) as a target instead of 
an exact template. However, neither is readily implementable from my 
understanding. Still, we need some kind of approach here: No. 1 is both the 
most complex and the most necessary feature.
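
To make the journal idea concrete, here is a minimal sketch (all names and 
paths are hypothetical; nothing like this exists in qubesd today):

    # Hypothetical dependency-change journal; not an existing qubesd API.
    import json, time

    JOURNAL = "/var/lib/qubes/dependency-journal.json"  # assumed path

    def record_change(kind, old, new):
        """Append one change, e.g. kind='template',
        old='debian-12-xfce', new='debian-13-xfce'."""
        try:
            with open(JOURNAL) as f:
                entries = json.load(f)
        except FileNotFoundError:
            entries = []
        entries.append({"time": time.time(), "kind": kind,
                        "old": old, "new": new})
        with open(JOURNAL, "w") as f:
            json.dump(entries, f, indent=2)

    def resolve(kind, name):
        """Follow recorded changes chronologically to find the current
        replacement candidate for a missing dependency."""
        try:
            with open(JOURNAL) as f:
                entries = json.load(f)
        except FileNotFoundError:
            return name
        for e in entries:                     # chronological order
            if e["kind"] == kind and e["old"] == name:
                name = e["new"]               # follow rename/replace chains
        return name

When the external shard comes back, resolve("template", "debian-12-xfce") 
would return "debian-13-xfce" and the qube could be repointed, ideally after 
user confirmation.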



> 2. You create a template as an external qube, or vpn qube as an external
> qube, and then make one of the standard qubes (non-external) use it (as
> template or netvm respectively). What should happen next time you boot
> with external qubes not visible?

Agreed that this should be forbidden. For USB or external storage, we cannot 
reliably guarantee that an external qube which other qubes depend on will be 
present at any given point. And in the case of plausible deniability, we 
cannot have dependencies beyond the local XML, as you have correctly 
mentioned.
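
For illustration, the guard could be as simple as the following sketch (the 
is_external marker is an assumption, not a real qubesd property):

    # Hypothetical dependency guard; 'is_external' is an assumed marker.
    def check_dependency(dependent, dependency):
        """Refuse template/netvm/etc. links from standard qubes to
        external qubes, which may be absent at boot."""
        if getattr(dependency, "is_external", False) \
                and not getattr(dependent, "is_external", False):
            raise ValueError(
                f"{dependent.name} cannot depend on external qube "
                f"{dependency.name}")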



> 1. Allowing to load such broken qube, but prevent starting it, until
> user fixes the dependencies (in the above example, like change the
> template to debian-13-xfce). The problem with this approach is, we
> generally assume no broken objects all over the code - for example if
> you have vm.template, then you can access vm.template.netvm. When
> allowing loading broken objects, that wouldn't be the case anymore, and
> would require carefully reviewing all the places where dependencies are
> accessed and choosing a fallback in all of them. This also applies to
> presenting such situation in CLI and GUI tools. TBH, I'm not optimistic
> about feasibility of such change.

I think it might work if the fixing process occurs inside the initramfs, like 
what RELIANT does currently. We could analyze the shards, find problematic 
dependencies, and then prompt the user for fixes. The advantage of this is 
that qubesd is not running yet, so we do not need to modify the codebase to 
look for fallbacks. It would only require a self-contained system-healing 
script, which sounds much more practical to me. Again, this is not applicable 
to hot-plug qubes, but No. 1 only requires static sharding, where the shards 
are already known during the early boot stages.
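
A rough sketch of such a healing script, under the assumption that each shard 
stores properties as <property name="..."> elements the way qubes.xml does 
(the prompting/patching step is left out):

    # Sketch of an initramfs-stage shard scanner; the qubes.xml layout
    # above is an assumption, not a verified schema.
    import sys
    import xml.etree.ElementTree as ET

    def prop(domain, name):
        el = domain.find(f"./properties/property[@name='{name}']")
        return el.text if el is not None else None

    def scan(shard_paths):
        """Return (shard, qube, dependency, missing target) tuples."""
        trees = [(p, ET.parse(p)) for p in shard_paths]
        names = {prop(dom, "name")
                 for _, tree in trees
                 for dom in tree.findall(".//domain")}
        broken = []
        for path, tree in trees:
            for dom in tree.findall(".//domain"):
                for dep in ("template", "netvm", "default_dispvm"):
                    target = prop(dom, dep)
                    if target and target not in names:
                        broken.append((path, prop(dom, "name"),
                                       dep, target))
        return broken

    if __name__ == "__main__":
        for path, qube, dep, target in scan(sys.argv[1:]):
            print(f"{path}: {qube}.{dep} -> missing '{target}'")
            # here the script would prompt the user and patch the XML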



However, a better option would be to perform checks within qubesd's load() 
and skip loading shards with a broken configuration until resolution. This 
would avoid any need to interact with the initramfs or rewrite many 
components, but it brings the additional complexity of a dedicated qubes.xml 
validator, which must be thorough. While substantial, I believe this is the 
most straightforward and practical approach.
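
The skip-until-resolved behaviour could look roughly like this (Shard is a 
stand-in for however qubesd would represent one qubes.xml file; all types 
here are hypothetical):

    # Minimal sketch of loading with frozen shards.
    from dataclasses import dataclass, field

    @dataclass
    class Shard:
        path: str
        # qube name -> {dependency kind: target name or None}
        qubes: dict = field(default_factory=dict)

    def load_shards(shards):
        visible = set()
        for s in shards:
            visible.update(s.qubes.keys())
        loaded, frozen = [], []
        for s in shards:
            broken = [(q, dep, tgt)
                      for q, deps in s.qubes.items()
                      for dep, tgt in deps.items()
                      if tgt is not None and tgt not in visible]
            # a shard with dangling references stays on disk, unloaded
            (frozen if broken else loaded).append(s)
        return loaded, frozen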



> 2. Automatically change such broken dependencies to some other value
> (the obvious choice would be the default one - based on global
> properties). While I think in most situations it would just work, there
> are surely some corner cases. For example resetting, say, netvm from
> sys-whonix to sys-firewall (just because you renamed sys-whonix) might
> be very undesirable. Some workaround might be preventing starting such
> "recovered" external qube until user review the change (this could use
> 'prohibit-start' feature added in R4.3), but it would still be clunky
> and prone to errors IMHO...

There are several solutions I see here:

1. The journaling feature I mentioned before, which would work reliably for
renames since the mappings are unambiguous.

2. A heuristic-based fixing algorithm (e.g. debian-12 can become debian-13);
for missing network qubes, just set netvm to None, circumventing security
problems (see the sketch below).

3. Prompting the user manually for each fix, either during the initramfs,
where we can alter the XMLs, or live, using the prohibit-start feature or
changes to load().

None of these is ideal, but I'd like to hear which one you're leaning 
towards. Perhaps a combination could work as well.
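
For the heuristic approach (item 2 above), a toy version could look like 
this; the family-matching rule is purely illustrative, not the exact 
algorithm I would propose:

    # Heuristic repair sketch; the matching rule is illustrative only.
    import re

    def pick_replacement(missing, available, dep_kind):
        if dep_kind == "netvm":
            return None      # fail closed: no network beats wrong network
        # 'debian-12-xfce' and 'debian-13-xfce' share the family key
        family = re.sub(r"\d+", "", missing)
        candidates = [a for a in available
                      if re.sub(r"\d+", "", a) == family]
        # prefer the lexicographically highest name, which is usually
        # the newest version for same-width version numbers
        return max(candidates, default=None)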





> The QID range is just 16 bits, and it's IMO too little to avoid
> conflicts just by hoping CSPRNG will not hit one.

Agreed. That's why I mentioned explicit collision prevention.



> Note, you still need
> to be able to create qubes while external qubes are not visible.

Not quite. In the case of RELIANT, as I mentioned before, doing anything with 
the system while it is only partially unlocked is guaranteed to break things. 
If we consider hot-plug qubes or Maintenance Mode, however, this does become 
a problem.



> 1. Use dedicated QID ranges per qubes.xml "shard". You still need to
> avoid conflicts between those ranges, but requiring all shards to be
> visible when allocating new range is IMHO an acceptable limitation.

Sounds like a great solution to me. The ranges themselves could be randomly 
sampled from the 16-bit space, and we would have to decide on the range size 
(perhaps 128?). This does not sacrifice any security and allows shards to be 
more self-contained. Varlibqubes could have the fixed range 1-127, with 0 
reserved for dom0.
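
A sketch of what the allocation could look like under these assumptions 
(range size 128, dom0/varlibqubes reserved at the bottom; all names are 
hypothetical):

    # Random, collision-checked QID range allocation over 16 bits.
    import secrets

    QID_BITS = 16
    RANGE_SIZE = 128
    RESERVED = {(0, 127)}     # qid 0 for dom0, 1-127 for varlibqubes

    def allocate_range(used_ranges):
        """used_ranges: (start, end) pairs collected from every shard,
        which is why all shards must be visible at this point."""
        taken = RESERVED | set(used_ranges)
        slots = (1 << QID_BITS) // RANGE_SIZE   # 512 aligned slots
        for _ in range(1000):                   # give up eventually
            start = secrets.randbelow(slots) * RANGE_SIZE
            end = start + RANGE_SIZE - 1
            if all(end < s or start > e for s, e in taken):
                return start, end
        raise RuntimeError("QID space exhausted or too fragmented")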



> 2. Don't store QIDs for external qubes at all - allocate them at load
> time (possibly from a single range dedicated for all external qubes). QID is
> used mostly at runtime (in various structures), but on-disk metadata
> operate on names (most cases) and UUID (very few cases). The only
> downside I can think of is dynamic IP allocation (internal IP addresses
> are built based on QID) - this would break some custom firewall rules.
> But if you need static IP address, you can simply set "ip" property to a
> static value (and avoid IP conflicts on your own...).

Also possible, though it seems more difficult to implement. It would be more 
robust, since there is no requirement for external qubes to be visible when 
allocating new ranges.
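
For completeness, this option could be sketched as deterministic load-time 
assignment from a dedicated range (the base value is an arbitrary 
assumption):

    # Sketch of load-time QID assignment for external qubes.
    EXTERNAL_BASE = 0x8000    # assumed: upper half for external qubes

    def assign_runtime_qids(external_names, in_use):
        """Deterministic order keeps QIDs (and QID-derived IPs) stable
        across boots as long as the set of visible qubes is stable."""
        mapping, qid = {}, EXTERNAL_BASE
        for name in sorted(external_names):
            while qid in in_use:
                qid += 1
            mapping[name] = qid
            in_use.add(qid)
            qid += 1
        return mapping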



From my perspective, Option 1 is preferable here due to minimal 
infrastructure changes. When creating a new qube, allow tagging it as 
external; in that case, 1) store the relevant qubes in a shard and 2) 
allocate a QID range and record it in the shard XML.





> But this also brings up another problem: how to avoid qube name
> conflicts? What should happen if you create a qube with the same name as
> one of external ones (while external qubes are not visible)?

I suggest establishing precedence rules where, e.g., varlibqubes takes 
priority over any shards, and static shards take priority over hot-plug 
qubes. This brings us back to the same problem of an issue preventing a qube 
from starting or even being loaded, similar to what was discussed about 
templates/renaming. The new qubesd load() could be designed to automatically 
perform collision and existence checks on every shard and 'freeze' broken 
shards following the precedence rules. The conflicts could then be resolved 
by any of the three methods (manual/heuristic/journal), after which the 
shards would be unfrozen and loaded asynchronously. This would also be a 
significant improvement in terms of user experience, since currently any 
issues must be resolved with manual XML editing and a mandatory reboot.
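
A compact sketch of the precedence pass (tier names are just labels for the 
ordering proposed above; nothing here is existing qubesd code):

    # Precedence-based name-conflict detection.
    PRECEDENCE = {"varlibqubes": 0, "static": 1, "hotplug": 2}

    def detect_conflicts(shards):
        """shards: list of (tier, shard_id, set_of_qube_names)."""
        owner = {}            # qube name -> winning shard_id
        frozen = set()
        for tier, shard_id, names in sorted(
                shards, key=lambda s: PRECEDENCE[s[0]]):
            if names & set(owner):
                # the lower-precedence shard waits for resolution by
                # one of the three methods, then gets unfrozen
                frozen.add(shard_id)
            else:
                owner.update(dict.fromkeys(names, shard_id))
        return owner, frozen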



> 1. Such block device (with rw=False) is connected as read-only block
> device to the VM anyway (see templates/libvirt/xen.xml). So, setting
> loop device read-only is not strictly necessary.

Yes, but apparently there are some issues with how OverlayFS and loopback 
devices interact; the issue is on the dom0 side. Without the --readonly flag, 
create-snapshot simply fails for any of the template images. It works for 
small images (~100 MB), so I suspect it might be checking for free space 
under the OverlayFS, of which there is only 1-2 GB, and failing the 
allocation if the image is writable and exceeds that constraint.



> 2. The "file" storage driver should be avoided, and will eventually be

> removed (in Qubes OS 5.0, whenever that happens). The use of dm-snapshot

> is very inefficient (especially when reverting to an earlier snapshot),

> and is incompatible with many features (like making a backup while the

> qube is running).

I think varlibqubes used the file driver by default when I installed the 
system. As for the sflc_X_X pools, I chose the file driver for them due to 
1) simplicity, 2) a conservative filesystem preference for ext4, and 
3) Shufflecake being optimized for ext4. None of these is a blocking issue, 
so when the time comes I'd be ready to migrate to either LVM storage pools or 
file-reflink.



> If you need plain files, I'd recommend using file-reflink driver, on a
> CoW-enabled filesystem (like xfs or btrfs).

We need some form of metadata attached to the volume for the shards, firewall 
rules, and other things. This could be achieved either by using btrfs for the 
whole volume with file-reflink, or by using an LVM storage pool subdivided 
into data and metadata. This mostly depends on the hot-plug capabilities of 
these drivers.



> Indeed a policy is a way to go. And you can quite easily make an
> extension that adds tags based on the storage pool.

Understood, thanks.





On Thu, 5 Feb 2026 15:51:22 +0000, 'Rusty Bird' via qubes-devel wrote:
> Can you switch to the 'file-reflink' storage driver? The legacy 'file'
> storage driver is deprecated and due to be ripped out:

Thanks, I'll look into that. See above for the rationale behind choosing the 
file driver.



Kind regards,

Anderson Rosenberg
