[eeps] EEP XXX: B-trees: balanced search trees of order n. (new EEP)

Walter Weinmann Mon, 02 Jan 2017 00:29:02 -0800

Please find attached my first proposal.

Regards


Walter Weinmann

    Author: Walter Weinmann <[email protected]>
    Status: Draft
    Type: Standards Track
    Created: 06-Dec-2016
    Erlang-Version: OTP 20.0
    Post-History:
****
EEP XXX: B-trees: balanced search trees of order n.
----



Abstract
========

This EEP proposes the creation of a new module named b_trees for the 
administration of b-trees.  Both the optional persistence and the sort 
order should be implemented by pluggable functionality.



Copyright
=========

This document has been placed in the public domain.



Specification
=============


## Data Structure ##

    {MinimumSubtrees, 
     MaximumKeys, 
     SizeKeyValues, 
     SortFunction/2, 
     State, 
     Tree}

`Tree` is composed of nodes of the form 

    {KeyNumber, 
     SubtreeNumber, 
     [{Key, Value}], 
     [Tree]} 

and the "empty b-tree" node 

    nil

`State` is a tuple composed of the following parameters: 

    {StateTarget, 
     DeleteFunction/3, 
     InsertFunction/3, 
     LookupFunction/3} 

Since the b-trees are always balanced, there is no need for a balance 
operation.


## DATA TYPES ##

    b_tree() = {pos_integer(), 
                pos_integer(), 
                non_neg_integer(), 
                sort_function(), 
                state(), 
                tree()}

A general balanced tree.

    iterator() = [{key_values(), subtrees()}]

A general balanced tree iterator.


## EXPORTS ##

### delete(Key, B-Tree1) -> B-Tree2 ###

Types:

    Key = any()
    B-Tree1 = B-Tree2 = b_tree()

Removes the node with key Key from b-tree B-Tree1 and returns the new 
b-tree B-Tree2.  Assumes that key Key is present in b-tree B-Tree1, 
crashes otherwise.

### delete_any (Key, B-Tree1) -> B-Tree2 ###

Types:

    Key = any()
    B-Tree1 = B-Tree2 = b_tree()

Removes the node with key Key from b-tree B-Tree1 if key Key is present 
in b-tree B-Tree1, otherwise does nothing.  Returns the new b-tree B-Tree2.

### empty (Order) -> B-Tree ###

Types:

    Order = pos_integer()
    B-Tree = b_tree()

Returns a new empty b-tree.  The order is defined as the maximum number 
of children nodes a non-leaf node may hold.

### enter (Key, Value, B-Tree1) -> B-Tree2 ###

Types:

    Key = any()
    Value = any()
    B-Tree1 = B-Tree2 = b_tree()

Inserts key Key with value Value into b-tree B-Tree1 if key Key is not 
present in b-tree B-Tree1, otherwise updates the current value of key Key 
to value Value in b-tree B-Tree1.  Returns a new b-tree B-Tree2.

### from_dict (B-Tree1, List) -> B-Tree2 ###

Types:

    B-Tree1 = B-Tree2 = b_tree()
    List = [{Key, Value}]


Turns an ordered list List of key value tuples into a b-tree.  The given 
b-tree B-Tree1 must be empty.  The list must not contain duplicate keys.

### get (Key, B-Tree) -> Value ###

Types:

    Key = any()
    B-Tree = b_tree()
    Value = any()
    
Retrieves the value stored with key Key in b-tree B-Tree.  Assumes that 
key Key is present in b-tree B-Tree, crashes otherwise.

### height (B-Tree) -> integer() >= 0 ###

Types:

    B-Tree = b_tree()
    
Returns the height of b-tree B-Tree as an integer.  Assumes that b-tree 
B-Tree is non-empty.

### insert (Key, Value, B-Tree1) -> B-Tree2 ###

Types:

    Key = any()
    Value = any()
    B-Tree1 = B-Tree2 = b_tree()
    
Inserts key Key with value Value into b-tree  B-Tree1 and returns the new 
b-tree B-Tree2.  Assumes that key Key is **not** present in b-tree B-Tree1, 
crashes otherwise.

### is_defined (Key, B-Tree) -> boolean() ###

Types:

    Key = any()
    B-Tree = b_tree()

Returns `true` if key Key is present in b-tree B-Tree, otherwise `false`.

### is_empty (B-Tree) -> boolean() ###

Types:

    B-Tree = b_tree()
    
Returns `true` if b-tree B-Tree is an empty b-tree, otherwise `false`.

### iterator (B-Tree) -> Iterator ###

Types:

    B-Tree = b_tree()
    Iterator = iterator()
    
Returns iterator Iterator that can be used for traversing the entries 
of b-tree B-Tree; see `next/1`.  The implementation of this iterator is 
very efficient; traversing the whole b-tree using `next/1` is only slightly
slower than getting the list of all key-value pairs using `to_list/1` 
and traversing that.  The main advantage of the iterator approach is that 
it does not require the complete list of all key-value pairs to be built 
in memory at one time.

### iterator_from (Key, B-Tree) -> Iterator ###

Types:

    Key = any(9
    B-Tree = b_tree()
    Iterator = iterator()
    
Returns iterator Iterator that can be used for traversing the entries 
of b-tree B-Tree; see `next/1`.  The difference, as compared to the 
iterator returned by iterator/1, is that the first key greater than 
or equal to key Key is returned.

### keys (B-Tree) -> [Key] ###

Types:

    B-Tree = b_tree()
    Key = any()
    
Returns the keys in b-tree B-Tree as an ordered list.

### largest (B-Tree) -> {Key, Value} ###

Types:

    B-Tree = b_tree()
    Key = any()
    Value = any()
    
Returns a tuple {Key, Value}, where Key is the largest key in b-tree 
B-Tree, and Value is the value associated with this key.  Assumes that 
b-tree B-Tree is not empty.

### lookup (Key, B-Tree) -> none | {value, Value} ###

Types:

    Key = any()
    B-Tree = b_tree()
    Value = any()
    
Looks up key Key in b-tree B-Tree. Returns {value, Value}, or none if 
key Key is not present.

### map (Function, B-Tree1) -> B-Tree2 ###

Types:

    Function = fun((Key, Value1) -> Value2)
    B-Tree1 = B-Tree2 = b_tree()
    Key = any()
    Value1 = Value2 = any()
    
Maps function Function(Key, Value1) -> Value2 to all key value pairs of 
b-tree B-Tree1.  Returns the new b-tree B-Tree2 with the same set of 
keys as b-tree B-Tree1 and the new set of values.

### next (Iterator1) -> 'none' | {Key, Value, Iterator2} ###

Types:

    Iterator1 = Iterator2 = iterator()
    Key = any()
    Value = any()
    

Returns the tuple {Key, Value, Iterator2}, where Key is the smallest 
key referred to by iterator Iterator1, and iterator Iterator2 is the 
new iterator to be used for traversing the remaining nodes, or the 
atom '**none**' if no nodes remain.

### set_parameter (B-Tree1, Name, Value) -> B-Tree2 ###

Types:

    B-Tree1 = B-Tree2 = b_tree()
    Name : Value = sort  : Function = fun((Key1, Key2) -> equal | 
                                                          greater | 
                                                          less)
                 | state : {StateTarget, 
                            Function = fun(StateTarget, delete, Key) 
                                     -> true,
                            Function = fun(StateTarget, insert, Subtrees) 
                                     -> Key,
                            Function = fun(StateTarget, lookup, Key) 
                                     -> Subtrees}
    
Sets the parameter Name to value Value in the empty b-tree B-Tree1 and 
returns the new b-tree B-Tree2.  This function can only be used in 
conjunction with an empty b-tree.

### size_key_values (B-Tree) -> integer() >= 0 ###

Types:

    B-Tree = b_tree()

Returns the number of key value pairs in b-tree B-Tree as an integer.  
Returns 0 (zero) if b-tree B-Tree is empty.

### size_nodes (B-Tree) -> {integer() >= 0, integer() >= 0} ###

Types:

    B-Tree = b_tree()

Returns the number of total nodes and the number of leaf nodes in b-tree 
B-Tree as a tuple of two integers.  Returns {0, 0} (zero) if b-tree B-Tree 
is empty.

### smallest (B-Tree) -> {Key, Value} ###

Types:

    B-Tree = b_tree()
    Key = any()
    Value = any()
    
Returns tuple {Key, Value}, where Key is the smallest key in b-tree 
B-Tree, and Value is the value associated with this key.  Assumes that 
b-tree B-Tree is not empty.

### sort_ascending (Key1, Key2) -> 'equal' | 'greater' | 'less' ###

Types:

    Key1 = Key2  = any()
    equal = greater = less = atom()
    
Returns the atom '**greater**' if Key1 > Key2, the atom '**less**' 
if Key1 < Key2 and otherwise the atom '**equal**'.

### sort_descending (Key1, Key2) -> 'equal' | 'greater' | 'less' ###

Types:

    Key1 = Key2  = any()
    equal = greater = less = atom()
    
Returns the atom '**less**' if Key1 > Key2, the atom '**greater**' 
if Key1 < Key2 and otherwise the atom '**equal**'.

### take(Key, B-Tree1) -> B-Tree2 ###

Types:

    Key = any()
    B-Tree1 = B-Tree2 = b_tree()

Removes the node with key Key from b-tree B-Tree1 and returns the new 
b-tree B-Tree2.  Assumes that key Key is present in b-tree B-Tree1, 
crashes otherwise.

### delete_any (Key, B-Tree1) -> B-Tree2 ###

Types:

    Key = any()
    B-Tree1 = B-Tree2 = b_tree()

Removes the node with key Key from b-tree B-Tree1 if key Key is present 
in b-tree B-Tree1, otherwise does nothing. Returns the new b-tree B-Tree2.

### take_largest (B-Tree1) -> {Key, Value, B-Tree2} ###

Types:

    B-Tree1 = B-Tree2 = b_tree()
    Key = any()
    Value = any()
    
Returns tuple {Key, Value, B-Tree2}, where Key is the largest key in 
b-tree B-Tree1, Value is the value associated with this key, and b-tree 
B-Tree2 is this b-tree with the corresponding key value pair deleted.  
Assumes that b-tree B-Tree1 is not empty.

### take_smallest (B-Tree1) -> {Key, Value, B-Tree2} ###

Types:

    B-Tree1 = B-Tree2 = b_tree()
    Key = any()
    Value = any()
    
Returns tuple {Key, Value, B-Tree2}, where Key is the smallest key in 
b-tree B-Tree1, Value is the value associated with this key, and b-tree 
B-Tree2 is this b-tree with the corresponding key value pair deleted.  
Assumes that b-tree B-Tree1 is not empty.

### to_list (B-Tree) -> [{Key, Value}] ###

Types:

    B-Tree = b_tree()
    Key = any()
    Value = any()

Converts b-tree B-Tree into an ordered list of key value tuples.

### update (Key, Value, B-Tree1) -> B-Tree2 ###

Types:

    Key = any()
    Value = any()
    B-Tree1 = B-Tree2 = b_tree()


Updates key Key to value Value in b-tree B-Tree1 and returns the new 
b-tree B-Tree2.  Assumes that key Key is present in b-tree B-Tree1.

### values (B-Tree) -> [Value] ###

Types:

    B-Tree = b_tree()
    Value = any()

Returns the values in b-tree B-Tree as an ordered list, sorted by their 
corresponding keys.  Duplicates are not removed.


## Pluggable Persistence Functionality ##

### Format: ###

    {StateTarget, DeleteFunction, InsertFunction, LookupFunction}
    
    StateTarget = any()
    
    DeleteFunction(StateTarget, delete, Key) -> true
    
    InsertFunction(StateTarget, insert, Subtrees) -> Key
    
    LookupFunction(StateTarget, lookup, Key) -> Subtrees

Examples for state targets are a Dets table or a Mnesia table.  The delete 
function takes a state target, the atom `delete` and a key as arguments 
and returns the atom `true` if successful.  The insert function takes a 
state target, the atom `insert` and a subtrees data structure as arguments 
and returns a key if successful.  The lookup function takes a state target, 
the atom `lookup` and a key as arguments and returns a subtrees data 
structure if successful.

### Example functions: ###

The following examples are based on Mnesia.

    persistence_by_mnesia(_, delete, SubtreesKey) 
                         when is_list(SubtreesKey) ->
        true;
    persistence_by_mnesia(StateTarget, delete, SubtreesKey) ->
        F = fun() ->
            ok = mnesia:delete({StateTarget, SubtreesKey}),
            true
        end,
        mnesia:activity(transaction, F);
        
    persistence_by_mnesia(_, insert, []) ->
        [];
    persistence_by_mnesia(StateTarget, insert, 
                          [{_, _, [{Key, _} | _], _} | _] = Subtrees) ->
        SubtreesKey = list_to_binary(Key),
        F = fun() ->
            ok = mnesia:write(StateTarget, 
                              #subtrees{subtreesKey = SubtreesKey, 
                              subtrees = Subtrees}, write),
            SubtreesKey
        end,
        mnesia:activity(transaction, F);
        
    persistence_by_mnesia(_, lookup, SubtreesKey) 
                         when is_list(SubtreesKey) ->
        SubtreesKey;
    persistence_by_mnesia(StateTarget, lookup, SubtreesKey) ->
        F = fun() ->
            [{subtrees, SubtreesKey, Subtrees}] = mnesia:read(StateTarget, 
                                                              SubtreesKey),
            Subtrees
        end,
    mnesia:activity(transaction, F).

### Example usage: ###

Creating the Mnesia table:

    -record(subtrees, {subtreesKey, subtrees}).
        
    {atomic, ok} = mnesia:create_table(StateTargetName, [{record_name, 
                                                          subtrees}]),

Creating the b-tree:

    BTree1 = b_trees:empty(500),
    BTree2 = b_trees:set_parameter(BTree1, state, 
                                   {StateTargetName, 
                                    fun persistence_by_mnesia/3, 
                                    fun persistence_by_mnesia/3, 
                                    fun persistence_by_mnesia/3}),


## Pluggable Sort Functionality ##

### Format: ###

    FunctionName(Key1, Key2) -> equal | greater | less
    
    Key1 = Key2 = any()

The sort function takes two keys as arguments and returns the atom `less` 
if Key1 < Key2, the atom `greater` if Key1 > Key2 and otherwise the 
atom `equal`.

### Example function: ###

    -spec sort_descending(key(), key()) -> sort_result().
    
    sort_descending(Key_1, Key_2) ->
    if
        Key_1 < Key_2 -> greater;
        Key_1 > Key_2 -> less;
        true -> equal
    end.
    
### Example usage: ###

    BTree1 = b_trees:empty(500),
    BTree2 = b_trees:set_parameter(BTree1, sort, fun sort_descending/2),



Motivation
==========

B-trees are self-balancing tree data structures that keep data sorted 
and allow searches, sequential access, insertions, and deletions in 
logarithmic time.  B-trees are a generalization of a binary search 
trees in that a node can have more than two children.  Unlike self-balancing 
binary search trees, the b-tree is optimized for systems that read and 
write large blocks of data.  B-trees are a good example of a data structure 
for external memory.



Rationale
=========

The functional design of the module b_trees is based on the module gb_trees:  

  b_trees          | gb_trees
 ------------------|---------
  n/a              | balance/1
  delete/2         | delete/2
  delete_any/2     | delete_any/2
  empty/1          | empty/0
  enter/3          | enter/3
  from_dict/2      | from_orddict/1
  get/2            | get/2
  height/1         | n/a
  insert/3         | insert/3
  is_defined/2     | is_defined/2
  is_empty/1       | is_empty/1
  iterator/1       | iterator/1
  iterator_from/2  | iterator_from/2
  keys/1           | keys/1
  largest/1        | largest/1
  lookup/2         | lookup/2
  map/2            | map/2
  next/1           | next/1
  set_parameter/3  | n/a
  size_key_values/1| size/1
  size_nodes/1     | n/a
  smallest/1       | smallest/1
  sort_ascending/2 | n/a
  sort_descending/2| n/a
  take/2           | take/2
  take_any/2       | take_any/2
  take_largest/1   | take_largest/1
  take_smallest/1  | take_smallest/1
  to_list/1        | to_list/1
  update/3         | update/3
  values/1         | values/1

The functions `delete/2` and `insert/3` are implementations of the algorithms 
of Cormen, Thomas; Leiserson, Charles; Rivest, Ronald; Stein, Clifford (2009), 
Introduction to Algorithms (Third ed.), MIT Press and McGraw-Hill, pp. 484-504, 
ISBN 0-262-03384-4. Chapter 18: B-Trees.



Backwards Compatibility
=======================

No issues - except module name collisions.



Reference Implementation
========================

The reference implementation can be fetched from Github:

    https://github.com/walter-weinmann/b_trees



[EmacsVar]: <> "Local Variables:"
[EmacsVar]: <> "mode: indented-text"
[EmacsVar]: <> "indent-tabs-mode: nil"
[EmacsVar]: <> "sentence-end-double-space: t"
[EmacsVar]: <> "fill-column: 70"
[EmacsVar]: <> "coding: utf-8"
[EmacsVar]: <> "End:"
[VimVar]: <> " vim: set fileencoding=utf-8 expandtab shiftwidth=4 softtabstop=4: "

_______________________________________________
eeps mailing list
[email protected]
http://erlang.org/mailman/listinfo/eeps

[eeps] EEP XXX: B-trees: balanced search trees of order n. (new EEP)

Reply via email to