Re: [Jprogramming] burstsort - using Burst Tries: fastest sort?

Oleg Kobchenko Sat, 04 Aug 2007 10:25:29 -0700

How can there be 45Gb worth of words?

Wouldn't it make sense to partition the
word index?


--- Elmer Fittery <[EMAIL PROTECTED]> wrote:

> My thoughts on this are:
> 
> It isn't so much the fastest sort that I was interested in, but rather
> the "trie" structure itself.  The sort is fast because of the "trie"
> structure.
> 
> I found this description which might be of interest:
> 
> The term "trie" comes from the word retrieval, but is generally
> pronounced like "try". A trie is a tree structure (or directed acyclic
> graph), the nodes of which represent letters in a word. For example, the
> final lookup for the word 'bob' would look something like
> $ref->{'b'}{'o'}{'b'}{'00'} (the 00 being an end marker). Only nodes
> which would represent words in the trie exist, making the structure
> slightly smaller than a hash of the same data set.
> 
> The advantages of the trie over other data storage methods is that
> lookup times are O(1) WRT the size of the index. For sparse data sets,
> it is probably not as efficient as performing a binary search on a
> sorted list, and for small files, it has a lot of overhead. The main
> advantage (at least from my perspective) is that it provides a
> relatively cheap method for finding a list of words in a large (45 GB),
> dense data set which begin with a certain string.
> 
> The term "word" in this documentation can refer to one of two things:
> either a reference to an array of strings, or a scalar which is not a
> reference. In the case of the former, each element of the array is
> treated as a "letter" of the "word". In the case of the latter, the
> scalar is evaluated in string context and it is split into its component
> letters. Return values of methods match the values of what is passed in
> -- that is, if you call lookup() with an array reference, the return
> value will be an array reference (if appropriate).
> 
> 
> 
> > ------------------------------
> > 
> > Message: 2
> > Date: Thu, 2 Aug 2007 21:56:57 -0400
> > From: "Devon McCormick" <[EMAIL PROTECTED]>
> > Subject: Re: [Jprogramming] burstsort - using Burst Tries: fastest
> >     sort?
> > To: "Programming forum" <[email protected]>
> > Message-ID:
> >     <[EMAIL PROTECTED]>
> > Content-Type: text/plain; charset=ISO-8859-1
> > 
> > I would be happy to let Roger implement this for J, just as he has already
> > implemented very good random number generators and factoring algorithms.
> > 
> > However, for my purposes, "grade" is usually more useful than "sort".
> > 
> > I don't mean this as a put-down.  Any real breakthrough in sorting is
> > potentially
> > of great benefit to us all, but not necessarily at the level of a J
> > application.
> > The power of the language is in encapsulating high-level concepts, not
> > in the low-level
> > details of the implementation of a well-known concept like sorting.
> 
> 
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
> 



       
____________________________________________________________________________________
Be a better Globetrotter. Get better travel answers from someone who knows. 
Yahoo! Answers - Check it out.
http://answers.yahoo.com/dir/?link=list&sid=396545469
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Re: [Jprogramming] burstsort - using Burst Tries: fastest sort?

Reply via email to