OLPC Developers,

Greg asked me to write a report on the solutions to the NAND fillup problem. We need to present a set of solutions to LATU as soon as possible so that they can establish which solution(s) are tenable for their deployment.
Several provisional solutions are in the works. I describe and evaluate them here, and I welcome comments, suggestions, and clarifications.

Description:

-= 1. Boot on a read-only filesystem =-

For the 8.2.0 release our developers are working on getting the system to boot and run without any writes to the underlying filesystem. This lets us reach a state in which the user has access to the journal and can start deleting items.

-= 2. Automatically delete files from the datastore =-

Chris Ball has produced a patch which deletes items from the datastore when we encounter a NAND-full situation at boot. It builds a list of files in the datastore, sorts them by size, and deletes them from largest to smallest until the system's free space rises above some threshold.

-= 3. Boot on a union-mounted writeable filesystem =-

A union mount (http://en.wikipedia.org/wiki/Union_mount) can be used to unify a read/write filesystem (typically a ram-backed tmpfs) and a read-only filesystem (such as a CD-R or a full jffs2 partition) into a single writeable filesystem. This arrangement allows us to boot Sugar and run applications without any code-level modifications.

-= 4. Store a large file on jffs2 and delete when space is low =-

This solution is roughly equivalent to the union-mount (aufs) solution in #3, except that boot is guaranteed when NAND is full by removing a large buffer file stored in the jffs2 root partition.

Discussion:

1. Booting Sugar even on a read-only filesystem is a development goal for 8.2.0 (and, as I understand it, has been mostly achieved for that purpose), but for Uruguay it may not be possible to push such a complex set of changes from our development branch into the builds which they have deployed.

2. This solution is by far the simplest and the most certain to resolve the problem immediately. However, automatically deleting files seems to me to be, at the least, confusing for users. We are teaching children what to expect from computers.
Absolute breakage due to storage media exhaustion is intelligible, but apparently random patterns of file deletion may confuse users. More problematically, the only metric which we can establish for automatic deletion is size, and this may bias deletions toward specific activities, perhaps ones needed or specifically desired by users. Despite these concerns, we must acknowledge that such a change is certainly within the realm of feasibility for Uruguay's deployment and will at least resolve a mounting support problem caused by NAND fillup. In my opinion, this should be considered a viable failsafe solution.

3. This solution has been tested and verified to boot Sugar and launch activities on an otherwise unmodified 656 system with a full NAND. To boot on a union mount, all that is required is the addition of the aufs (Another UnionFS) module to the initramfs and a patch to the initscripts to check whether the system has passed the NAND fill threshold. A small amount of work is still required to update the Journal to delete items from the jffs2 partition when the system is running on a union mount. Further work could be done to force the user to delete items, but it may be sufficient to simply alert the user that the system will not save any data between reboots until they delete enough items from their journal. We will also have to convey some information to the user about how close they are to the fillup threshold.

4. This solution has not yet been tested, but it seems likely to work, to similar effect as #3. It presents us with a slightly different set of issues; namely, we must manage the episodic creation and deletion of a large file. We must also forbid the user from creating more data while the buffer file does not exist, lest we decrease the amount of buffer available or end up with an unbootable system. This requires a much more stringent recovery console than #3, such as an X session running only an instance of the journal activity.
Furthermore, depending on the size of the file, some percentage of deployed systems may fall into NAND-full territory during upgrade.

Erik
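
P.S. To make the deletion policy in #2 concrete, here is a minimal sketch of largest-first selection. It is not Chris Ball's patch, which I am not reproducing here; the function names (list_files_by_size, plan_deletions, free_space) and the idea of planning deletions before performing them are my own assumptions for illustration. It also assumes that deleting a file frees roughly its apparent size, which jffs2 compression makes only approximate.

```python
import os


def list_files_by_size(root):
    """Return (size, path) pairs for regular files under root, largest first."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.isfile(path) and not os.path.islink(path):
                entries.append((os.path.getsize(path), path))
    entries.sort(reverse=True)
    return entries


def free_space(path):
    """Bytes available to unprivileged users on the filesystem holding path."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize


def plan_deletions(entries, free_bytes, target_free_bytes):
    """Choose files, largest first, until projected free space reaches the target.

    entries must already be sorted largest first. Nothing is deleted here;
    the caller decides what to do with the returned paths.
    """
    doomed = []
    for size, path in entries:
        if free_bytes >= target_free_bytes:
            break
        doomed.append(path)
        free_bytes += size  # assumption: freed space is about the file size
    return doomed
```

A caller would combine these as plan_deletions(list_files_by_size(datastore_dir), free_space(datastore_dir), threshold), where datastore_dir and threshold are whatever the deployment chooses. Keeping the plan separate from the deletion also means the same list could be shown to the user instead of being acted on automatically.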