The wiki-version of this document can be found in http://wiki.laptop.org/go/XO_updater
enjoy
/X

On Tuesday 26 June 2007 14:55, Ivan Krstić wrote:
IK> Software updates on the One Laptop per Child's XO laptop
IK> ========================================================
IK>
IK>
IK> 0. Problem statement and scope
IK> ==============================
IK>
IK> This document aims to specify the mechanism for updating software on
IK> the XO-1 laptop. When we talk about updating software, we are
IK> referring both to system software such as the OS and the core
IK> services controlled by OLPC that are required for the laptop's basic
IK> operation, and to any installed user-facing applications
IK> ("activities"), both those provided by OLPC and those provided by
IK> third parties.
IK>
IK>
IK> 1. System updater
IK> =================
IK>
IK> 1.1. Core goals
IK> ---------------
IK>
IK> The three core goals of a software update tool (hereafter "updater")
IK> for the XO are as follows:
IK>
IK> * Security
IK>     Given the initial age group of our users, it is the only
IK>     reasonable solution to default to automatic detection and
IK>     installation of updates, both to be able to apply security
IK>     patches in a timely fashion, and to enable users to benefit from
IK>     rapid development and improvements in the software they're
IK>     using. Automatic updates, however, are a security issue unto
IK>     themselves: compromising the update system in any way can
IK>     provide an attacker with the ability to wreak havoc across
IK>     entire installed bases of laptops while bypassing -- by design --
IK>     all the security measures on the machine. Therefore, the
IK>     security of the updater is paramount and must be its first
IK>     design goal.
IK>
IK> * Uncompromising emphasis on fault-tolerance
IK>     Given the scale of our deployment, the relatively high
IK>     complexity of our network stack when compared to
IK>     currently-common deployments, the unreliability of Internet
IK>     connectivity even when available, and perhaps most importantly
IK>     our desire for participating countries to soon begin customizing
IK>     the official OLPC OS images to best suit them, it is clear that
IK>     our updater must be fault-tolerant. This is both in the simple
IK>     sense -- cryptographic checksums need to be used to ensure
IK>     updates were received correctly -- and in the more complex sense
IK>     that the likelihood of a human error with regard to update
IK>     preparation goes up proportionally to the number of different
IK>     base OS images at play. A fault-tolerant updater will therefore
IK>     allow _unconditional_ rollback of the most recently applied
IK>     update. "Unconditional" here means that, barring the failure of
IK>     other parts of the system which are dependencies of the updater
IK>     (e.g. the filesystem), the updater must always know how to
IK>     correctly unapply an applied update, even if the update was
IK>     malformed.
IK>
IK> * Low bandwidth
IK>     For much the same reasons (project scale, Internet access
IK>     scarcity and unreliability) that require fault-tolerance from
IK>     the updater, the tool must take maximum care to minimize data
IK>     transfer requirements. This means, concretely, that a
IK>     delta-based approach must be utilized by the updater, with a
IK>     "keyframe" or "heavy" update being strictly a fallback in the
IK>     unlikely case an update path cannot be constructed from the
IK>     available or reachable delta sets.
IK>
IK>
IK> 1.2. Design
IK> -----------
IK>
IK> It is given, due to requirements imposed by the Bitfrost security
IK> platform, that a laptop will attempt to make daily contact with the
IK> OLPC anti-theft servers.
IK> During that interaction, the laptop will post its system software
IK> version, and the response provided by the anti-theft service will
IK> optionally contain a relative URL of a more recent OS image.
IK>
IK> If such a pointer has been received and the laptop is behind a known
IK> school server, it will probe the school server via rsync at the
IK> provided relative URL to determine whether the server has cached the
IK> update locally. If the update is not available locally, the laptop
IK> will wait up to 24 hours, checking approximately hourly whether the
IK> school server has obtained the update. If at the end of this wait
IK> period the school server still does not have a local copy of the
IK> update, it is assumed to be malfunctioning, and the laptop will
IK> contact an upstream master server directly by using the URL provided
IK> originally by the anti-theft service.
IK>
IK> In any of these three cases (school server has update immediately,
IK> school server has update after delay, upstream master has update),
IK> we say the laptop has 'found an update source'.
IK>
IK> Once an update source has been found, the laptop will invoke the
IK> standard rsync tool over a plaintext (unsecured) connection via the
IK> rsync protocol -- not piped through a shell of any kind -- to bring
IK> its own files up to date with the more recent version of the
IK> system. rsync uses a network-efficient binary diff algorithm which
IK> satisfies goal 3 (low bandwidth).
IK>
IK>
IK> 1.3. Design note: peer-to-peer updates
IK> --------------------------------------
IK>
IK> It is desirable to provide "viral update" functionality at a later
IK> date, such that two laptops with different software versions (and
IK> without any notion of trust) can engage in an update to bring the
IK> laptop with the older software fully up to date.
IK>
IK> However, determining how to provide this functionality securely,
IK> efficiently and elegantly is not feasible on the Gen1 FRS timeline.
IK> Therefore, laptop-to-laptop updates will NOT be a part of the
IK> updater that ships with the FRS image, and are a candidate for
IK> release 2-3 months after FRS.
IK>
IK>
IK> 1.4. Design note: rsync scalability
IK> -----------------------------------
IK>
IK> rsync is a known CPU hog on the server side. It would be absolutely
IK> infeasible to support a very large number of users from a single
IK> rsync server. This is far less of a problem in our scenario for
IK> three reasons:
IK>
IK> * High branching factor
IK>     In all normal circumstances, the vast majority of the rsync
IK>     traffic to our upstream servers will come from school servers,
IK>     not individual laptops. If school servers are unavailable or
IK>     malfunctioning, it is not the case that there will be a flood of
IK>     requests from individual laptops, because it's likely that the
IK>     school servers are those laptops' only gateway to the Internet.
IK>
IK> * Element of randomness in anti-theft requests
IK>     Instead of hitting the update servers every hour on the hour,
IK>     the laptops are already including an element of randomness in
IK>     choosing when to contact the anti-theft service. This random
IK>     delay propagates to the rsync requests as well.
IK>
IK> * In-depth stagger abilities on the server side
IK>     Because notification of new updates is performed by the
IK>     anti-theft service, which is aware of a laptop's locale, updates
IK>     can be staggered over several days by country, region, or any
IK>     other metric such as server load.
IK>
IK> Additionally, some optimizations can be added to rsync proper to aid
IK> with our use case, but such engineering will need to wait until
IK> after FRS.
IK>
IK>
IK> 1.5. Implementation
IK> -------------------
IK>
IK> In order to implement runtime file protection, Bitfrost relies on
IK> the copy-on-write (COW) functionality of the Linux-VServer patchset.
IK> The functionality imbues immutable hardlinks within a designated
IK> context with special meaning: when broken by some destructive file
IK> operation, VServer will replace these hardlinks with the content of
IK> the file they were pointing to and apply the desired operation on
IK> the resulting copy.
IK>
IK> The XO updater will run in a special context to which the security
IK> service has exposed the entire underlying filesystem as a COW copy.
IK> The updater will update this COW copy in-place with rsync. This COW
IK> mechanism simply ensures no excess authority lies with the updater;
IK> any failures or vulnerabilities in it do not propagate to the rest
IK> of the system.
IK>
IK> One file contained within each OS image will be its
IK> cryptographically signed manifest; at the end of the rsync
IK> operation, the laptop will have obtained that file. At this point,
IK> the updater will request that the security service apply the
IK> update. Note that due to the nature of rsync, we can stop and
IK> restart the network phase of a single update several times as
IK> connectivity becomes available, until we've received the complete
IK> update.
IK>
IK> The security service will terminate the updater and then analyze the
IK> manifest and confirm the modified files in the updater's context
IK> exactly match the expected OS image end-state. If any discrepancy is
IK> discovered, the updater context will be discarded and the update
IK> operation aborted.
IK>
IK> If the update is verified to be complete and correct, the security
IK> service will mark it as such, and designate the files within it to
IK> be the files exported into all newly-created containers. System
IK> service containers will be restarted gracefully. If the image
IK> manifest did not contain a header identifying that image as a
IK> high-priority update, the update process ends here. Restartable
IK> services have been restarted, and the rest of the system will be
IK> initialized from the update on reboot.
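
A minimal sketch of the "exactly match the expected end-state" check
described in 1.5, under stated assumptions: the manifest's signature has
already been verified, and the manifest has been parsed into a dict
mapping relative paths to hex SHA-256 digests. The spec does not define
the manifest format or the hash function, so both are placeholders, and
the function names are hypothetical. File modes, ownership and symlinks
are ignored here.

```python
import hashlib
import os


def sha256_of(path):
    """Return the hex SHA-256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_discrepancies(root, manifest):
    """Compare the files under root against manifest.

    manifest is an assumed dict of {relative path with "/" separators:
    expected hex SHA-256 digest}. Returns a sorted list of
    (relative_path, reason) tuples; an empty list means an exact match.
    """
    problems = []
    seen = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            seen.add(rel)
            expected = manifest.get(rel)
            if expected is None:
                # Present on disk but not listed in the manifest.
                problems.append((rel, "unexpected file"))
            elif sha256_of(full) != expected:
                problems.append((rel, "hash mismatch"))
    for rel in manifest:
        if rel not in seen:
            # Listed in the manifest but absent on disk.
            problems.append((rel, "missing file"))
    return sorted(problems)
```

In this sketch, any non-empty result would correspond to the "discard
the updater context and abort" branch; an empty list corresponds to
marking the update as verified.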
IK>
IK> If the update has been marked as high-priority, the user will be
IK> asked to close applications and reboot his machine immediately. A
IK> timer will run that will reboot the machine in 60 minutes if the
IK> user does not do so. The high-priority timer can be disabled in the
IK> security center; its purpose is merely to provide some extra
IK> protection to the youngest users who cannot necessarily be expected
IK> to understand or comply with the reboot request.
IK>
IK> On boot, the first initialization script to run will perform a
IK> pivot_root operation to the directory that currently holds the OS
IK> image marked bootable by the security service. With the example
IK> above, it would be the directory that belonged to the updater's
IK> context. If a key is depressed during boot, however, the pivot_root
IK> is performed to the _old_ bootable context, and the user is
IK> presented with a dialog asking whether she would like to make the
IK> rollback permanent.
IK>
IK> The kernel is the only special case to this handling: in the event
IK> that a verified update contains an updated kernel, that kernel will
IK> be placed into a predetermined place in the underlying filesystem by
IK> the security service. OpenFirmware will preferentially boot this
IK> newer kernel unless the rollback key combination is depressed during
IK> boot.
IK>
IK> Notice that the update operation has been reduced to a simple state
IK> toggle between (any) two OS images. In so doing, we have satisfied
IK> goals 1 and 2 (security and fault-tolerance).
IK>
IK>
IK> 2. Application updater
IK> ======================
IK>
IK> 2.1. Design
IK> -----------
IK>
IK> The XO eschews traditional dependency-based approaches to package
IK> management, making application upgrades somewhat difficult. The
IK> problem is compounded by the fact that Bitfrost does not permit
IK> applications to update themselves in-place, which is a common update
IK> method on platforms such as Mac OS X and Windows.
IK>
IK> When it comes to application updates, we wish to stay true to our
IK> goals of security and low-bandwidth updates, but are willing to
IK> settle for less fault tolerance as necessitated by the fact that
IK> most activities won't be OLPC-written or maintained.
IK>
IK> The design should make it possible to have a single tool that can
IK> ascertain the existence of updated versions of any currently
IK> installed activities, and then fetch and install those updates. It
IK> should do so bandwidth-efficiently, such that files that are
IK> unchanged between activity versions aren't downloaded as part of the
IK> update, and also such that identical resource files packaged by
IK> multiple activities are never downloaded more than once, or not at
IK> all if they already exist on the system.
IK>
IK>
IK> 2.2. Implementation
IK> -------------------
IK>
IK> A manifest file is added to the bundle format specification. The
IK> manifest consists of the filename and strong cryptographic hash of
IK> every file in the bundle. Another file is added, called 'origin',
IK> that specifies a URL where updated activity bundles may be found,
IK> and a public key which will be used to sign such updated bundles.
IK>
IK> When a global activity update is initiated, the updater enumerates
IK> the origins for all installed activities, then probes each one in
IK> turn to determine which activities have available updates. The
IK> resulting activity list is the 'available update set'.
IK>
IK> The most up-to-date bundle for each activity in the set is accessed,
IK> and the first several kilobytes downloaded. Since bundles are simple
IK> ZIP files, the downloaded data will contain the ZIP file index which
IK> stores byte offsets for the constituent compressed files. The
IK> updater then locates the bundle manifest in each index and makes an
IK> HTTP request with the respective byte range to each bundle origin.
IK> At the end of this process, the updater has cheaply obtained a set
IK> of manifests of the files in all available activity updates.
IK>
IK> A local database of manifests of all installed activities is kept,
IK> pruned only to records for files larger than a set size, e.g. 50
IK> KB. The updater cross-references each manifest from the available
IK> update set with the installed database, and then with other
IK> manifests in the set. Files which exist locally and are also present
IK> in the available update set aren't downloaded; the updater simply
IK> "plants" the files in the right places. The same happens for
IK> identical files present in multiple bundles in the available update
IK> set; they are only downloaded once.
IK>
IK> After a bundle (minus any redundant files) has been downloaded, it
IK> is unpacked and reassembled (if it needs any of the files that
IK> haven't been downloaded because they already exist). Cryptographic
IK> signature verification is performed. If remaining disk space is
IK> larger than a particular margin, e.g. 20%, then the context
IK> containing the older version of the activity bundle is kept around,
IK> and the user given the ability to perform rollback on the activity
IK> update. Otherwise, the old version bundle is destroyed.
IK>
IK>
IK> :Author
IK>     Ivan Krstić
IK>     ivan AT laptop.org
IK>     One Laptop per Child
IK>     http://laptop.org
IK>
IK> :Metadata
IK>     Revision: Draft-14
IK>     Timestamp: Tue Jun 26 17:51:45 UTC 2007
IK>
IK>
IK> END
IK>
IK> --
IK> Ivan Krstić <[EMAIL PROTECTED]> | GPG: 0x147C722D
IK>
IK> _______________________________________________
IK> Devel mailing list
IK> Devel@lists.laptop.org
IK> http://lists.laptop.org/listinfo/devel
IK>

--
XA
=========
Don't Panic! The Answer is 42