Hey, I'm an undergraduate CS major currently participating in Google's SoC for Inkscape. There was a project I almost submitted to Gnome, but I ultimately decided it was too large to be doable in Google's timeframe. It's still something I'm interested in doing, though, and I was hoping to find out whether it's something Gnome could support in a similar manner (minus the Google funds, naturally).
Briefly, I'd like to add something similar to the labeling functionality in Gmail to the Linux desktop; I've attached a (much!) more detailed proposal. I'm sure that the Gnome community would be happy to see me work on this, but I'm actually looking for something more along the lines of a straight yes/no as to whether it's something the developers could provisionally endorse and provide mentoring for. If so, I'd work on it for credit as independent study through my university (Gnome would have no interaction with my school; I just wouldn't want to take something like this on without community assistance). Regardless, any feedback about the proposal is welcome.

Thanks,
Greg
Gnome Labeling Interface

OVERVIEW

Since the rise of Unix, operating systems have presented the filesystem as a tree, and obliged users and developers to either organize their information in trees or create userspace data structures that are inaccessible to the operating system. Many alternatives have been proposed and developed over the years, often under the name of "database filesystems", but none has caught on. The author suspects this is because the conceptual difference between the relational database model and the tree model is too large, and no system that did not offer backward compatibility could be accepted in the real world.

Very recently, though, a new metaphor has started to gain traction. Although it appears under different names, such as "labels", "virtual folders", or simple boolean keyword searching, all of these fundamentally work by organizing information in sets, not trees. I propose to add a powerful set metaphor for organizing files in Linux to Gnome, while taking advantage of the closer conceptual distance between trees and sets to implement its backend in a way that preserves intuitive access to files through the traditional filesystem interface.

PROBLEM

The limitations of the hierarchical filesystem model can be seen in a simple example. Suppose that you're downloading pictures of your cousin Alice's wedding. You currently store your pictures in one folder, but you're starting to have enough pictures for this to become unwieldy. Should you create an "Alice" folder, a "Wedding" folder, an "Alice's Wedding" folder, an "Alice" folder containing a "Wedding" folder, or a "Weddings" folder containing an "Alice" folder? Or suppose that you've organized your music by genre, with Willie Nelson filed under "Country", and then you buy Nelson's new reggae album.

This category of organizational problem is currently solved in userspace by music and photo management applications.
Neurotic users can even develop partial solutions in the filesystem by using links. But in the year 2005, enough application domains have encountered these issues that the creation of a general solution has become worthwhile.

PROPOSED SOLUTION

I propose to add "labeling" functionality to the Gnome desktop. My implementation would place high importance on making it usable by as many existing applications as possible, by preserving backwards compatibility with the regular filesystem. I'll discuss the data model it would provide first, then the way it would (initially) be implemented.

Data Model

In brief, this project would implement something often called a "document store." It would be expected to be used to store the kinds of files usually considered to be "documents", though it's not limited to that. The document store is intended to implement, from the user's perspective, labeling functionality similar to that found in Gmail.

At its simplest, the store would contain two kinds of objects: labels and files. Labels are unique strings representing sets of files. Files are bitstreams, essentially identical to traditional files (including traditional metadata like Unix permissions, modification times, etc.), with one exception: they don't have filenames. In the context of the document store, files are instead accessed purely by querying for some label or combination of labels, and selecting one of the search results. Processes themselves would access file contents either through the system's own API or, more likely, by being given a working traditional filename for the file contents.

The system described above, while useful, doesn't really add functionality compelling enough to make it preferable to one of the already existing solutions. However, this proposal asserts that much more power becomes available when you add a third kind of object to the store: namespaces.
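
As a reference point, the two-object model described above (labels as named sets of nameless files, queried by combining labels) could be sketched roughly as follows. This is only an illustration in Python, chosen for brevity; the class name LabelStore, its methods, and the use of integer file ids are all assumptions made for the example, not part of the proposal.

```python
from typing import Dict, FrozenSet, Set


class LabelStore:
    """Hypothetical in-memory model: labels are named sets of file ids."""

    def __init__(self) -> None:
        # label name -> ids of the files that carry that label
        self._labels: Dict[str, Set[int]] = {}
        self._next_id = 0

    def add_file(self) -> int:
        """Register a new file; it gets an opaque id, not a filename."""
        file_id = self._next_id
        self._next_id += 1
        return file_id

    def apply(self, label: str, file_id: int) -> None:
        """Apply a label to a file, creating the label if needed."""
        self._labels.setdefault(label, set()).add(file_id)

    def remove(self, label: str, file_id: int) -> None:
        """Take a label off a file; a no-op if it was not applied."""
        self._labels.get(label, set()).discard(file_id)

    def files(self, label: str) -> FrozenSet[int]:
        """Return the set of files carrying the label (empty if unknown)."""
        return frozenset(self._labels.get(label, set()))


# The wedding-photo example from the PROBLEM section: one picture can
# carry both labels, so no folder hierarchy has to be chosen.
store = LabelStore()
photo = store.add_file()
store.apply("Alice", photo)
store.apply("Wedding", photo)
alices_wedding = store.files("Alice") & store.files("Wedding")
```

Because labels are plain sets, "and", "or", and "not" in a query map directly onto set intersection, union, and difference.
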
The addition of namespaces preserves the simple Gmail-like labeling and organizing functionality described above while providing a powerful hook for Gnome and other applications to add commonly understandable metadata to files, in a manner that is both easy to develop for and powerful enough to solve many of the data structuring problems traditionally dealt with through databases.

Namespaces are ways to group collections of labels. They are NOT sets in the sense that labels are, because a label must belong to one and only one namespace, while files can belong to any number of labels (including labels from different namespaces). Every user would be given their own personal namespace, roughly analogous to their home directory, in which they could create and delete whatever labels they liked. This is the namespace that would be used for the Gmail-like functionality described above.

However, by creating additional standardized namespaces (say, for freedb categories), it becomes possible, using only the label metaphor, to let users perform arbitrarily interesting queries, such as "find all jazz songs that I've marked as 'favorites'", in an application-independent manner. And, by using the XML technique of allowing mnemonics to stand in for full namespaces, it even becomes possible to do this in a reasonably intuitive command-line format, such as "freedb:jazz and my:favorites". Other examples of possible standardized namespaces are mime-types, programming languages, content language (e.g. English, Spanish...), etc.

Implementation

This functionality, while cool, would still probably not be successful if it required programs that wanted to access information in the store to be recoded to use the store API. But it's possible to implement this so that all files in the store remain accessible as regular filesystem files too.
In particular, it's possible to view the traditional filesystem itself as simply a collection of fundamental objects (ultimately represented by inodes) that can be referred to by one or more names (hard links). Since directories are ultimately sets of files, the above data model can be implemented by creating a directory for every label, and putting a hard link to a file in the directory of every label that has been applied to it. If label directories, in turn, are contained in directories representing namespaces, the whole model becomes accessible through the regular filesystem interface relatively intuitively. The whole store can be represented as a single root directory that contains a subdirectory for every namespace, each of which contains a subdirectory for every one of its labels, each of which contains a hard link for every file that has that label applied to it.

By using this implementation, you get the Unix permission scheme for free, and get reasonable file access performance (as opposed to backing the store with a relational database). Searching will have to be done through a separate index, probably initially backed by SQLite, and the index will have to be kept in sync by preventing direct file creation and deletion within the store (i.e. non-root users must not have write access to the label directories). The store API is accessed through a service that is normally started with the Gnome desktop (and that is run WITH write permission to the store directories).

The real filesystem names of the files should be essentially random; this is a feature, not a bug. Assuming the use of the 26 lowercase letters and the 10 digits (36^4 = 1,679,616 possible names), no filename should need to be more than 4 characters long.
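
The directory layout described above could be prototyped along these lines. This is a minimal sketch under stated assumptions, written in Python for brevity rather than in the implementation language the proposal has in mind: the function names are hypothetical, as is the choice of "system" as the directory name for a reserved namespace holding an "all" label with one link per stored file.

```python
import os
import secrets
import string

ALPHABET = string.ascii_lowercase + string.digits  # 26 letters + 10 digits
NAME_LENGTH = 4                                    # 36**4 = 1,679,616 names


def random_name() -> str:
    """Pick a random backend filename from the 36-symbol alphabet."""
    return "".join(secrets.choice(ALPHABET) for _ in range(NAME_LENGTH))


def label_dir(root: str, namespace: str, label: str) -> str:
    """Layout: <root>/<namespace>/<label>/<hard links to files>."""
    return os.path.join(root, namespace, label)


def add_file(root: str, source_path: str) -> str:
    """Hard-link an existing file into system/all under a fresh name.

    The source path is left in place; a caller could unlink it afterwards.
    """
    all_dir = label_dir(root, "system", "all")  # assumed reserved names
    os.makedirs(all_dir, exist_ok=True)
    while True:
        name = random_name()
        try:
            os.link(source_path, os.path.join(all_dir, name))
        except FileExistsError:
            continue  # name already taken, draw another one
        return name


def apply_label(root: str, name: str, namespace: str, label: str) -> None:
    """Applying a label adds one more hard link to the same inode."""
    directory = label_dir(root, namespace, label)
    os.makedirs(directory, exist_ok=True)
    os.link(os.path.join(root, "system", "all", name),
            os.path.join(directory, name))


def remove_label(root: str, name: str, namespace: str, label: str) -> None:
    """Removing a label drops only that link; the file itself survives."""
    os.unlink(os.path.join(label_dir(root, namespace, label), name))
```

Hard links cannot cross filesystem boundaries, so a sketch like this only works when the whole store (and any file being added) lives on a single filesystem. In the design described above, only the store service would have the write permission needed to call functions like these.
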
The details can be worked through later, but different links referring to the same inode would be given the same name in all label directories, and the canonical filename would be under, say, an "all" label in a reserved system namespace that contains links to every file object held by the store.

I'd like to implement this in Mono, but could be convinced otherwise. I also intend to put significant syntax restrictions on label names, such as banning whitespace, but could be convinced otherwise.

USAGE

To use this system in the real world, users would start some small program, maybe even a panel applet, that provides a search interface and allows users to see the backend filename of a file, or copy it to the clipboard. The details of this interface are important; we want to wean users from thinking in terms of file paths, but still need to make that info available (perhaps search results will only be identified by a Nautilus-like preview image, but the currently selected result icon's path is visible in a separate pane). In real use, users will likely perform boolean label searches (e.g. A and B or C not D), with the ability to sort results by size, last accessed, etc. In the distant future, if something like this were successful, the Gnome filechooser widget could be modified to include a store interface.

NAMESPACE VERIFICATION

This probably belongs under "Data Model", but one feature I'm considering including, eventually, is something analogous to XML schema validation. While brainstorming this, I thought it might be useful to create two kinds of labels: normal ones, and ones that could only be applied to one object at a time, and thus could be used more like filenames. I then worried that this was an arbitrary decision; what if some developers wanted labels that could only be applied to two objects at once, or labels that are mutually exclusive (for example, an object should only have one mime-type label at a time)?
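
Each of the restrictions mentioned so far can be written as a small check that looks only at one namespace's label assignments. The following Python sketch is purely illustrative: the dict-of-sets representation and both function names are assumptions made for the example, not something the proposal specifies.

```python
from typing import Dict, Set

# Hypothetical representation of one namespace's label assignments:
# label name -> ids of the files that currently carry that label.
Assignments = Dict[str, Set[int]]


def at_most_n_objects(assignments: Assignments, label: str, n: int) -> bool:
    """True if the label is applied to no more than n objects.

    n=1 gives a filename-like label; n=2 allows two objects at once.
    """
    return len(assignments.get(label, set())) <= n


def labels_mutually_exclusive(assignments: Assignments) -> bool:
    """True if no object carries more than one label from the namespace.

    This is the mime-type case: one such label per object at a time.
    """
    seen: Set[int] = set()
    for files in assignments.values():
        if seen & files:
            return False  # some file already carries another label
        seen |= files
    return True


mime_types: Assignments = {"text/plain": {1, 2}, "image/png": {3}}
mime_ok = labels_mutually_exclusive(mime_types)
```

Both checks take label assignments as their only input and never touch file contents.
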
Ultimately, it seems that the needs of developers would be too diverse, and the only truly general solution would be to allow namespace creators to define these limitations in code. As such, I'm considering forcing namespaces to be real URLs that point to actual code of some sort that can be used to verify the validity of a namespace; if such code were provided, it could be run whenever a change to a namespace's label assignments occurred (i.e. whenever a label under a given namespace is removed from or applied to an object). The performance implications would have to be studied, of course, but if it were possible to guarantee that the code runs in a sandbox with no access to libraries, and that the only input it operates on is the label assignments (as opposed to the file contents themselves), performance could probably be made acceptable, at least if the store were used primarily as a store for documents, and not as a general-purpose filesystem. Because of these controlled execution requirements, as well as validator code portability needs, I'd like to implement this system in Mono.

COMPARISONS TO OTHER PROJECTS

The most exciting OSS product currently targeting a similar problem domain appears to be Beagle. Because this system would be backed by the normal filesystem, it and Beagle would be able to exist side by side. This system attacks a similar problem from a different angle; while Beagle requires basically zero changes in how users use the filesystem, this system attempts to provide greater searching and organizational power by requiring a little bit more learning from users and developers (though still as little as possible).

The product that initially convinced me that something like this was needed was Gnome Storage, whose development, sadly, appears to be on hiatus. However, I feel that Storage was probably too ambitious.
As I understand it, it requires large changes in how applications and users work, and I think it would have a hard time becoming accepted even if development were active. That said, the brainstorming of this system began with the author asking himself, "Storage rocks. How could I try to solve the problems it does, but in a less ambitious way?" This system also took significant conceptual inspiration from the KDE Database Filesystem project.

Apple's Spotlight and Microsoft's WinFS also target this problem domain, of course, and regardless of whether this proposal has a future, I hope that the OSS community is able to rally around some comparable solution over the next 2 years.
