Re: OT: A Good Data Structure for a Large Binary Tree

Steve Nelson Mon, 01 Mar 2004 20:23:44 -0800

(excuse my super long email, but you might like this)

I've been fascinated with binary trees for years. So much so that I've
designed my own. I have used three different methods of binary trees.
I'm sure there are other techniques, but I haven't tried them. Here are
the advantages and disadvantages of the three methods that I know of.

Method #1: Parent/Child Tree
-----------------------------------------
ID              ParentID              NodeData

This is the most commonly known, used and understood tree. It requires 3
fields: ID, ParentID and NodeData. The ID and ParentID are joined
together to create the tree. The root node will have a predefined value
for the ParentID. Usually it's just left NULL. To obtain the children
you just look for records where parentID=#ID#, one at a time. uggh.

Advantages:
-Inserts and updates are very fast. They do not require any
restructuring of parent/child data
-Pretty simple to understand
-multiple parents can be easily defined by breaking it into two tables.
This is the only real advantage to this method.

Disadvantages:
-Linear math, requires recursion
-Selects and Deletes are recursive. This becomes slower the larger the
set of data is
-Calculating the current level of a node is recursive. This becomes
slower the larger the set of data is

My advice on method #1. In most applications, selecting data is done far
more than inserts and updates, so I wouldn't recommend using this
method.   While this is a good technique to understand, I would
recommend learning the two below.

Method #2: Nested Set Model
-----------------------------------------
ID         startRow            endRow          NodeData

This type of tree is gaining popularity, but it is rather difficult to
understand. I think it was invented by Joe Celko, but I might be wrong.
It requires 4 fields: ID, startRow, endRow and NodeData. The ID and the
NodeData should be obvious, the startRow and endRow confuse most people.
Think of it like XML. XML has an open tag and a close tag. Everything
between those tags are children. Same thing.

Advantages:
-Based on set mathematics. You remember those Venn diagrams from high
school? That's set math. DBMSs are designed around set Math
-Selects are lightning fast and pretty simple. They do not require any
recursion. If you want to define a tree of children, you do this:
"select ID where startRow < #startrow# and endRow > #endrow#
-deletes are much faster than Parent/child trees
-Calculating the current level of a node is fairly easy, but it requires
a subquery. uggh

Disadvantages:
-it is much more difficult to understand than the parent/child model
-inserts, updates and deletes are more difficult to understand because
it requires rebuilding a portion of the tree.
-inserts, updates are a little slower than parent/child trees, because
all child data has to be reset. Of course, DBMSs are designed to deal
with large sets, so a good database can do this with no sweat.
-Some DBMSs can't handle sub queries, so calculating current level may
not be possible depending on the DBMS.
-Because calculating the current level uses sub-queries, it seems to me
that it is recursive.
-as far as i know you can't have multiple parents with this method

My advice on method #2. This method is not for the faint of heart, but
worth the effort. Selecting data is super fast with this method (same as
method 3, below). I would highly recommend learning it. Buy a copy of
"SQL for Smarties". He explains the theories and practice. If your
project requires multi-parent trees, you might need to go back to method
#1

Method #3: Steve's spinoff of the nested set Model (I'd love help with a
correct name for this)

-----------------------------------------
ID          indent             order                NodeData

As far as I know, I invented this. But if i'm a lunatic, someone correct
me. This model is another implementation of a set math tree inspired by
Joe Celko's model. I was helping one of my clients build a tree and was
having difficulty remembering how to use the nested set model. Afraid of
losing face, I invented this new method and never looked back. It turns
out this is my favorite of the three. It's a little strange, but it's
really fast (defining current levels is faster than method 2)

Advantages:
-Based on set mathematics just like the nested set model. Instead of
using startRow and endRow to find the boundaries, it uses indent and
order.
-Selects are lightning fast (same relative speed as method 2). They do
not require any recursion. If you want all children you do this: "select
ID where order > #currentOrder# and indent < #currentIndent#
-deletes are much faster than Parent/child trees, same speed as method 2

-Calculating the current level of a node is super easy because there is
no calculating! Just select the "indent" field. MUCH faster than method
1 and 2
-doesn't require sub-Queries. yeah!

Disadvantages:
-it is more difficult to understand than the parent/child model
-inserts, updates and deletes are more difficult to understand because
it requires rebuilding a portion of the tree.
-inserts, updates are a little slower than parent/child trees, because
all child data has to be reset. Of course, DBMSs are designed to deal
with sets so a good database can do this with no sweat. It requires the
same amount of processing as method 2
-as far as i know you can't have multiple parents with this method

My advice on method #3: Like method 2, this is not for the faint of
heart. But, try it! You'll like it.

If you want to see this third method in action, download the free
version of my Prototype Tool Kit, and look at the code in the
/tools/devnotes/ folder.

http://www.secretagents.com/index.cfm?fuseaction=tools.prototypetoolkit

Hope that helps.

Steve Nelson

Joe Eugene wrote:

> I am trying to solve a Binary Tree data structure problem, i think
> this
> can be done from a DataBase Perspective with relations but then that
> might involve doing something like a matrix to develop the relations
> between nodes.
>
> The other thought i have is solve the problem by using some
> native(Java/C++)
> data Structure (Binary Tree /TreeMap) and store keys of the database
> structure
> as keys of the Binary Tree... that might relate to a simple select.
>
> Any ideas are much appreciated.
>
> Thanks,
> Joe Eugene
>

[Todays Threads] [This Message] [Subscription] [Fast Unsubscribe] [User Settings]

Re: OT: A Good Data Structure for a Large Binary Tree

Reply via email to