On 3 Aug 2015, at 15:26, Dave Scott <[email protected]> wrote:
> 
> Hi,
> 
> At the moment the mirage websites are deployed automatically roughly like 
> this:
> 
> - developer makes a pull request against code repo (e.g. 
> https://github.com/mirage/mirage-www)
> - travis builds and performs sanity checks
> - reviewer reviews and merges the change
> - travis builds a single Xen unikernel image and checks it into a deployment 
> repo (e.g. https://github.com/mirage/mirage-www-deployment)
> - the host pulls from the deployment repo and restarts the VM
> 
> The Xen unikernel is standalone: it contains all the code and data linked 
> together, consistent with the Mirage philosophy. However as the Mirage 
> websites gain new content, the amount of static data increases. Since this is 
> all ‘crunch’ed into the kernel binary it ends up being loaded into RAM and 
> sitting in the OCaml heap. Therefore the memory footprint of the unikernels 
> is slowly increasing over time. It’s obviously a bit of a killer if you want 
> to serve something genuinely big (say a video) from a low-memory device (a 
> little cubieboard2 perhaps).
> 
> I was wondering if we should move away from crunch, and use some other method 
> for static data. Mirage already supports static data
> 
> - from Irmin
> - from BLOCK devices formatted with FAT
> - from BLOCK devices containing tar-format data (new in Mirage 2.6.0)
> 
> I can think of 2 general approaches:
> 
> 1. during the existing build process, build both a kernel and a second binary 
> blob containing data which will become a BLOCK device. The deployment scripts 
> would simply have to attach the BLOCK devices in the VM configuration.
> 
> 2. check in the data files into a subdirectory in the deployment tree, and 
> make the deployment scripts perform the final conversion (to Irmin, FAT or 
> tar). This has the disadvantage that it leaves some of the final ‘linking’ to 
> the deployment scripts (which are currently outside the scope of the ‘mirage’ 
> tool) but it has the advantage that the individual data files should be 
> de-duped by git/Irmin, since their sha1 hashes should match. If this final 
> assembly stage gets more complicated, should the ‘mirage’ tool gain some 
> extra support for it (mirage configure; mirage build; … later on a different 
> host …; mirage deploy)?

I agree with Justin that option 2 is better, both from a dedup perspective and 
to maintain link-time flexibility.
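
For concreteness, the config.ml side of option 2 would look roughly like the 
following, today versus with a block-backed store. The combinator names 
(crunch, fat_of_files, kv_ro_of_fs, archive_of_files) are from memory, so 
treat the exact names and types as assumptions:

    (* Sketch of a config.ml using the existing kv_ro combinators. *)
    open Mirage

    (* Today: everything under ./files is crunched into the kernel image. *)
    let data_crunch = crunch "files"

    (* Block-backed alternatives: the data is assembled into (or attached
       as) a separate BLOCK device at deploy time, so the kernel stays
       small. *)
    let data_fat = kv_ro_of_fs (fat_of_files ~dir:"files" ())
    let data_tar = archive_of_files ~dir:"files" ()

    let main = foreign "Dispatch.Main" (console @-> kv_ro @-> job)

    let () = register "www" [ main $ default_console $ data_tar ]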

One thought that occurs to me is that crunch would be far more efficient if it 
didn't link the data in twice.  Right now it stores each file's contents as 
OCaml string values.  
I wonder if it would be better for them to be linked into a separate ELF 
section, and then exposed directly as zero-copy Cstructs from that area of 
memory that's already mapped in.
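
Purely as a sketch of the idea -- nothing like this exists in crunch today, 
and the stub and linker symbols below are hypothetical:

    type buffer =
      (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

    (* Assumed C stub wrapping linker-provided symbols for the data section
       (e.g. __crunch_data_start / __crunch_data_end), returned as a Bigarray
       over memory that is already mapped -- no copy into the OCaml heap. *)
    external crunch_section : unit -> buffer = "mirage_crunch_section"

    (* crunch would still generate a small (name, offset, length) index at
       build time; a read is then just a zero-copy view of the section. *)
    let read ~off ~len =
      Cstruct.of_bigarray ~off ~len (crunch_section ())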

This would play well with the scheme for dynamic data as well -- a dynamic 
attach could do the equivalent of a Dynlink and make the same filesystem 
variables available.
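
Something like this minimal registry, say (all names hypothetical): the boot 
code registers the built-in section under a name, and a dynamically attached 
blob does the same after its Dynlink-style load, so consumers see the same 
variables either way.

    (* Mount-point name -> read function returning a (zero-copy) view. *)
    let filesystems : (string, string -> Cstruct.t option) Hashtbl.t =
      Hashtbl.create 7

    (* Called by boot code for the linked-in data, and by a dynamically
       attached blob once it has been loaded. *)
    let attach ~name ~read = Hashtbl.replace filesystems name read

    let read ~fs ~path =
      try (Hashtbl.find filesystems fs) path
      with Not_found -> None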

> There’s also the issue of how best to handle secret volumes such as those 
> containing keys.

I think this definitely has to be handled in the deployment scripts and not at 
build time.

-anil