Kevin wrote:
> Thanks.
>
> That's what I am thinking: a friend told me that a linear lookup can
> sometimes be faster than a hash table, especially when the size is
> "small". But I am not sure how small that is (in my case, I estimate
> the average number of children per node will be 100-500). So I am
> trying to explore all the possible methods. (In the program, the tree
> update will be done millions of times, and I found that it takes the
> majority of the program's running time.)

Hash tables have Theta(1) expected time per lookup.  That means that no
matter how many children a node has, performance stays roughly the
same.  So what you are after is a "constant factor" speedup.  The only
way to find one is to run experiments and see what works.
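
For example, a rough timing harness along these lines answers the
question directly.  It's just a sketch: lookup_fn and ctx are stand-ins
for whichever child-table variant you are testing, and a real test
should randomize the ids it probes.

    #include <time.h>

    /* Sketch of a timing harness: run 'reps' lookups through whichever
       child-table variant is under test and report seconds elapsed.
       'lookup' and 'ctx' are placeholders for your implementation. */
    typedef void *(*lookup_fn)(void *ctx, int id);

    static double time_lookups(lookup_fn lookup, void *ctx,
                               int nids, long reps)
    {
        clock_t t0 = clock();
        long i;
        for (i = 0; i < reps; i++)
            (void)lookup(ctx, (int)(i % nids));
        return (double)(clock() - t0) / CLOCKS_PER_SEC;
    }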

In the big picture, going after constant factor speedups is frequently
a dead end for lots of reasons. But I'll talk about it anyway.

Unordered lists of children have Theta(n) time per lookup, which means
that as your trees get wider, your program will slow down
proportionally.  If your IDs are integers or short strings, you'll
probably find that lists match hash tables at some value of n between 5
and 20, and with 100-500 children per node you are well past that.  If
your program will be used for a while, I'd never use them: even if
today's data produce a speedup, someone tomorrow will try to run
something much bigger, and the program will die.  The one thing this
approach has going for it is lower memory use, which can itself buy
speed.
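
To make the comparison concrete, here is a minimal sketch of a
linear-scan child table.  The struct layout, field names, and the
MAX_CHILDREN bound are all made up for illustration:

    #include <stddef.h>

    #define MAX_CHILDREN 512        /* illustrative bound */

    struct node;                    /* defined below */

    struct child { int id; struct node *ptr; };

    struct node {
        int nchildren;
        struct child children[MAX_CHILDREN];
    };

    /* Theta(n) lookup: scan the unsorted array until the id matches. */
    struct node *find_child_linear(const struct node *p, int id)
    {
        int i;
        for (i = 0; i < p->nchildren; i++)
            if (p->children[i].id == id)
                return p->children[i].ptr;
        return NULL;                /* not found */
    }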

If you can keep the child IDs in _sorted_ lists, then you can look them
up with binary search, which is Theta(log n) per lookup.  Performance
then degrades only very slowly with subtree width: if the width grows
by a factor of 32, each lookup needs only about 5 more comparisons,
since log2(32n) = log2(n) + 5.  Also, when data structures get smaller,
cache and/or paging behavior can improve, and that can have a big
impact.  The only way you'll find out is to try it.
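
Using the same made-up struct child / struct node layout as in the
sketch above, the sorted-array lookup falls out of bsearch() in
<stdlib.h>:

    #include <stdlib.h>

    /* Order children by id; usable with both qsort() and bsearch(). */
    static int cmp_child(const void *a, const void *b)
    {
        const struct child *x = a, *y = b;
        return (x->id > y->id) - (x->id < y->id);
    }

    /* Theta(log n) lookup, assuming children[] is kept sorted by id. */
    struct node *find_child_sorted(const struct node *p, int id)
    {
        struct child key, *hit;
        key.id = id;
        hit = bsearch(&key, p->children, p->nchildren,
                      sizeof key, cmp_child);
        return hit ? hit->ptr : NULL;
    }

One caveat: keeping the array sorted makes each insert Theta(n), since
elements have to be shifted over, so if your millions of updates are
mostly insertions rather than lookups, this can still lose to the hash
table.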

Finally, you can embed your tree in a single binary search tree.  In
this case you would define each key to be a level number concatenated
with a child ID.
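
One way to build such a key, assuming the level number and child ID
are integers that each fit in 32 bits (purely an illustrative
encoding):

    #include <stdint.h>

    /* Pack (level, child id) into one 64-bit key.  Ordering the keys
       numerically sorts first by level, then by child id, so a single
       balanced BST keyed this way can hold the whole tree. */
    static uint64_t make_key(uint32_t level, uint32_t child_id)
    {
        return ((uint64_t)level << 32) | child_id;
    }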
