How to manage around 1000 entries which will constantly be changing

Wagner, David --- Senior Programmer Analyst --- WGO Fri, 31 Mar 2006 15:12:58 -0800

        I need to handle a set of numbers where I want only the lowest 1000. 
There will be no rhyme or reason as the data comes in. I will do some 
calculations and take this total against the array  or hash or ?????  The 
number of calculations will be tremendous and either I come up with a way to 
handle the lowest 1000 efficiently or I have to write each line which will come 
 severl 100 million plus and pare it down from there.


        I tried just a simple search and with the 1000 entries, it slowed 
everything down to a crawl over time( just started at the front of the array 
and worked way through).  What the data will be is a total which is the key ( 
if same total comes through don't need, throw away) plus the supporting data 
which will between 600 to 800 bytes per key. Not large, but when you start 
processing the key and either do the move or ??? then starts to add up.

        I have looked at some of the modules on CPAN:

Tree::Binary::Search 

        A Binary Search Tree for perl 


Tree::RedBlack 

        This is an implementation of a red-black tree which is a type of 
balanced binary tree (to the best of my knowledge that is, I am sure I am 
simplifying it). Tree::Binary::Search does not attempt to balance the tree, so 
if you are looking for a balanced tree, you might try this.

Tree::BPTree

        This module implements a B+ tree, rather than a binary search tree. In 
the authors own words, "B+ trees are balanced trees which provide an ordered 
map from keys to values. They are useful for indexing large bodies of data. 
They are similar to 2-3-4 Trees and Red-Black Trees. This implementation 
supports B+ trees using an arbitrary n value." I am not quite sure exactly how 
B+ Tree's work, but I am intrigued but this module. It seems to me to be well 
tested module as well. If you are looking for a B+ Tree, I suggest giving it a 
look.

Tree::M

        In its own words, this module "implement M-trees for efficient 
'metric/multimedia-searches'". From what I can tell, this module is not a 
b-tree (binary search tree), but an m-tree, which is a tree optimized to handle 
multi-dimensional (spatial) data, such as latitude and longitude. It is a 
wrapper around a C++ library.

Tree::FP

        In the authors own words, "Tree:FP is a Perl implmentation of the 
FP-Tree based association rule mining algorithm (association rules == market 
basket analysis). For a detailed explanation, see "Mining Frequent Patterns 
without Candidate Generation" by Jiawei Han, Jian Pei, and Yiwen Yin, 2000. 
Contrarywise, most books on data mining will have information on this 
algorithm." While it sounds like a very cool thing, it is not a binary search 
tree.

Tree::Ternary
This is a ternary search trees, as opposed to a binary search tree. Similar, 
but different. If two nodes isn't enough for you, I suggest taking a look at 
this. These is also an XS based implementation Tree::Ternary_XS. 

        I read and looked at the docs, but nothing jumped out at me as that 
this might solve what I am trying to do. As part of the processing, it must 
remove obviously the highest value when it goes over 1000 also.

        Any ideas or thoughts would be greatly appreciated.

        Thanks!

Wags ;)



*******************************************************
This message contains information that is confidential
and proprietary to FedEx Freight or its affiliates.
It is intended only for the recipient named and for
the express purpose(s) described therein.
Any other use is prohibited.
*******************************************************


--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
<http://learn.perl.org/> <http://learn.perl.org/first-response>

How to manage around 1000 entries which will constantly be changing

Reply via email to