> The idea of aligning cpio metadata is very interesting. I can see how it'd 
> help initramfs building speed tremendously.
> 
> As I understand it, RPM is pretty different: the main difference is that 
> we're trying (fairly hard) not to change the normal format of rpm as found on 
> mirrors for now. There are some very interesting ideas on how to change the 
> upstream format, but in doing so, we'd render all existing servers unable to 
> read the format.

To clarify, aligning cpio data segments for *newly built* rpms shouldn't 
necessarily require any change in format. They'd continue to function the same 
as earlier cpio payload rpms, albeit with some extra zero-padding.
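
As a rough illustration of the padding arithmetic, here is a minimal Python 
sketch. It assumes a "newc"-style cpio entry (110-byte header followed by the 
NUL-terminated name) and that a reader tolerates extra NUL bytes after the 
name; whether rpm's own cpio reader does is an assumption I haven't verified. 
`padded_namesize` and the 4 KiB alignment are hypothetical, for illustration 
only.

```python
CPIO_NEWC_HEADER_LEN = 110  # 6-byte magic + 13 eight-digit hex fields


def padded_namesize(entry_offset: int, name: bytes, align: int = 4096) -> int:
    """Return a namesize (name + NULs) that makes the file data start on an
    `align`-byte boundary within the archive.

    `align` should be a multiple of 4 so that the format's own 4-byte
    padding after the name adds nothing further.
    """
    min_namesize = len(name) + 1  # the name needs at least one NUL terminator
    data_start = entry_offset + CPIO_NEWC_HEADER_LEN + min_namesize
    extra_nuls = -data_start % align  # zero-padding needed to reach the boundary
    return min_namesize + extra_nuls


# Example: an entry at archive offset 0 with an 11-byte name.
namesize = padded_namesize(0, b"usr/bin/foo")
print(namesize, (CPIO_NEWC_HEADER_LEN + namesize) % 4096)
```

The cost is up to `align - 1` bytes of zeros per file, which should compress 
away almost entirely in a compressed payload.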

> If we could tolerate the breakage: I'd love to experiment with 
> `BTRFS_IOC_ENCODED_WRITE` which would reduce writes down and eliminate 
> explicit decompression. For clients or filesystems without CoW support: RPM 
> could decompress and write the normal file. I was hoping encoded writes would 
> eliminate the complex path with curl -> librepo -> rpm2extents. I'm not sure 
> you could get data from the network and write encoded data to disk in one 
> pass like we're doing now. Do you have any ideas on how to resolve that 
> challenge?

I'm not too familiar with the rpm on-disk format, but I'd hoped that 
`BTRFS_IOC_ENCODED_WRITE` could be used without a change to the format, by 
having the rpm header parsed during download to determine whether the 
compressed payload could be written as-is. With a cpio payload it'd then be a 
matter of copy_file_range()ing the (optimally aligned) compressed file data 
segments into the destination during installation.
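
The copy step could look something like the following Python sketch. It 
assumes the payload is already on disk and that the offset and length of each 
file's data segment are known from the cpio metadata. `copy_segment` is a 
hypothetical helper, not anything in rpm. Whether the kernel shares extents 
rather than copying bytes depends on the filesystem and on alignment, so the 
plain read/write fallback is only there to keep the sketch usable elsewhere.

```python
import os


def copy_segment(src_fd: int, dst_fd: int, src_off: int, length: int,
                 dst_off: int = 0) -> int:
    """Copy `length` bytes from src_fd at src_off to dst_fd at dst_off.

    Tries copy_file_range() first, so a CoW filesystem may reflink aligned
    extents; falls back to pread/pwrite if the call is unavailable or fails.
    """
    copied = 0
    while copied < length:
        remaining = length - copied
        try:
            n = os.copy_file_range(src_fd, dst_fd, remaining,
                                   src_off + copied, dst_off + copied)
        except (AttributeError, OSError):
            buf = os.pread(src_fd, min(remaining, 1 << 20), src_off + copied)
            n = os.pwrite(dst_fd, buf, dst_off + copied) if buf else 0
        if n == 0:  # hit end of the source file early
            break
        copied += n
    return copied
```

An installer would call this once per file, with `src_off` pointing at the 
(ideally block-aligned) start of that file's data within the payload.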

`BTRFS_IOC_ENCODED_WRITE` appears very restrictive at this stage though:
- it requires `CAP_SYS_ADMIN`, so probably isn't a viable option for
  containers, etc.
- ioctl calls need to specify both unencoded and encoded offset+length, meaning
  that we'd still need to parse rpm payload compression metadata
- the ioctl unencoded length can't exceed 128 KiB
- for zstd encoded I/Os, the ioctl data must be represented "as a single zstd
  frame with the windowLog compression parameter set to no more than 17"
  - On openSUSE Tumbleweed I see some rpms currently using zstd compression
    level 19, which may well select a larger window. IIUC, Fedora uses the
    same zstd level
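
To check the window restriction for a given payload, the frame header can be 
inspected directly. Below is a minimal Python sketch based on my reading of 
the zstd frame format (magic number, frame header descriptor, optional window 
descriptor). The helper names are hypothetical, and comparing the window size 
against 128 KiB (2^17) is a conservative stand-in for the windowLog check.

```python
import struct

ZSTD_MAGIC = 0xFD2FB528
MAX_WINDOW = 1 << 17  # windowLog 17, i.e. 128 KiB


def zstd_window_size(frame: bytes) -> int:
    """Return the window size declared in a zstd frame header."""
    if len(frame) < 6 or struct.unpack_from("<I", frame)[0] != ZSTD_MAGIC:
        raise ValueError("not a zstd frame")
    desc = frame[4]                      # Frame_Header_Descriptor
    single_segment = bool(desc & 0x20)
    if not single_segment:
        wd = frame[5]                    # Window_Descriptor
        base = 1 << (10 + (wd >> 3))     # exponent lives in the top 5 bits
        return base + (base >> 3) * (wd & 0x07)
    # Single-segment frames have no window descriptor: the window equals the
    # frame content size, stored after the optional dictionary ID.
    pos = 5 + (0, 1, 2, 4)[desc & 0x03]
    fcs_len = (1, 2, 4, 8)[desc >> 6]
    if len(frame) < pos + fcs_len:
        raise ValueError("truncated frame header")
    fcs = int.from_bytes(frame[pos:pos + fcs_len], "little")
    return fcs + 256 if fcs_len == 2 else fcs


def fits_encoded_write(frame: bytes) -> bool:
    """True if the declared window is within the 128 KiB limit."""
    return zstd_window_size(frame) <= MAX_WINDOW
```

This only looks at the first frame's header. The other restrictions 
(privileges, per-call length, a single frame per write) would need separate 
handling.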

> Adding cpio metadata, along with a "null" compression type could help 
> eliminate the change in `fsm.c` on how the payload is iterated. Note that 
> `rpm2extents` does not (and cannot) touch headers without invalidating 
> signatures, so the change in compression type is inferred and handled in the 
> plugin.
> 
> Lastly, there's another optimization that would be lost in adopting cpio 
> formatting: content de-duplication. I'm not sure how important this is tho in 
> the big picture, so it might be a worthwhile tradeoff.

Indeed. FWIW, I think your extent based approach offers a lot of worthwhile 
benefits, but just wanted to point out that something similarly CoW friendly 
(although less efficient) is possible without necessarily requiring invasive 
changes :-)

> Thanks for the feedback! Matthew.

Thanks for the response!


-- 
Reply to this email directly or view it on GitHub:
https://github.com/rpm-software-management/rpm/discussions/2057#discussioncomment-4337009