Hello, I am interested in contributing code for persistent binary search trees to the GLib project. This would actually count as credit towards a grad level data structures class for me, but I have a lot of experience working with GLib in the past, and developing C-based open source projects. First some background on what persistence means and why it would be nice to have available in an open source data structures library.
A persistent binary search tree is simply a BST that is versioned over changes, such that you can do a lookup in any previous version of the tree. An efficient implementation of this data structure requires O(1) additional space per insertion into/deletion from the tree. That is, after n insertions, creating n different versions of the tree, the total space required would be roughly 2*(space for a single version), while allowing you to search in any version of the tree. Searching in any version takes the same number of comparisons and pointer redirections as a tree with only a single version. Persistent BSTs are used for operations such as a Plane Sweep [1] in Computational Geometry. At the same time, the code behind the persistence can be generalized to provide an additional data structure that performs Fractional Cascading [2] for no extra cost to the BST implementation. There are currently no readily available open source implementations of a persistent BST. This would be a great tool for students, academic researchers, and other developers. I have noticed that GLib has a Binary Search Tree data structure, which currently uses an AVL tree. AVL trees can be made persistent, but the space requirement becomes O(log n) per insert/delete instead of O(1). There are 2 other BSTs that would work effectively to provide a persistent BST, while also maintaining a very efficient implementation for when persistence is not used. They are: - Treaps - Red-Black Trees Treaps can be expected to outperform Red-Black Trees for any sized tree. The constant in the search cost for a Treap is Approximately equal to AVL trees (and slightly less for large trees), and is always smaller than Red-Black Trees. (It's a small constant regardless.) Treaps are also much simpler to implement, requiring less complicated code. However Treaps are a randomized data structure, which some people try to stay away from. My proposed changes to the GTree API are the addition of the following functions: // returns the latest version of the tree available (starts from 0) guint g_tree_current_version (GTree *tree); // creates a new version of the tree, changes to the tree will no longer // affect the previous versions. returns the new current version guint g_tree_next_version (GTree *tree); For each lookup/traversal function, a new function which takes arguments (GTree *tree, guint version, ...) would be added, with existing functions becoming macros pointing to the new ones. This API allows a version to contain more than a single insert/delete, if desired, making them more of a logical entity. So my questions are this: 1) Is there interest in having a persistent binary search tree in the GLib library ? 2) If so, would there be resistance to using a random data strucuture, i.e. a Treap to implement them ? Thanks very much, Dana Jansens p.s. I sent this before subscribing, but its been stuck in the moderator queue for more than a week now, so I'm sending again. Apologies if you see this twice. [1] http://en.wikipedia.org/wiki/Sweep_line_algorithm [2] http://en.wikipedia.org/wiki/Fractional_cascading _______________________________________________ gtk-devel-list mailing list gtk-devel-list@gnome.org http://mail.gnome.org/mailman/listinfo/gtk-devel-list