I think I need to clarify.

--- Oleg Kobchenko <[EMAIL PROTECTED]> wrote:

> How can there be 45Gb worth of words?
unique words.

> Wouldn't it make sense to partition the
> word index?
from the rest of the data (make "inverted").

> --- Elmer Fittery <[EMAIL PROTECTED]> wrote:
> 
> > My thoughts on this are:
> > 
> > It isn't so much the fastest sort that I was interested in, but rather
> > the "trie" structure itself.  The sort is fast because of the "trie"
> > structure.
> > 
> > I found this description which might be of interest:
> > 
> > The term "trie" comes from the word retrieval, but is generally
> > pronounced like "try". A trie is a tree structure (or directed acyclic
> > graph), the nodes of which represent letters in a word. For example, the
> > final lookup for the word 'bob' would look something like
> > $ref->{'b'}{'o'}{'b'}{'00'} (the 00 being an end marker). Only nodes
> > which would represent words in the trie exist, making the structure
> > slightly smaller than a hash of the same data set.
> > 
> > The advantages of the trie over other data storage methods is that
> > lookup times are O(1) WRT the size of the index. For sparse data sets,
> > it is probably not as efficient as performing a binary search on a
> > sorted list, and for small files, it has a lot of overhead. The main
> > advantage (at least from my perspective) is that it provides a
> > relatively cheap method for finding a list of words in a large (45 GB),
> > dense data set which begin with a certain string.
> > 
> > The term "word" in this documentation can refer to one of two things:
> > either a reference to an array of strings, or a scalar which is not a
> > reference. In the case of the former, each element of the array is
> > treated as a "letter" of the "word". In the case of the latter, the
> > scalar is evaluated in string context and it is split into its component
> > letters. Return values of methods match the values of what is passed in
> > -- that is, if you call lookup() with an array reference, the return
> > value will be an array reference (if appropriate).
> > 
> > 
> > 
> > > ------------------------------
> > > 
> > > Message: 2
> > > Date: Thu, 2 Aug 2007 21:56:57 -0400
> > > From: "Devon McCormick" <[EMAIL PROTECTED]>
> > > Subject: Re: [Jprogramming] burstsort - using Burst Tries: fastest
> > >   sort?
> > > To: "Programming forum" <[email protected]>
> > > Message-ID:
> > >   <[EMAIL PROTECTED]>
> > > Content-Type: text/plain; charset=ISO-8859-1
> > > 
> > > I would be happy to let Roger implement this for J, just as he has already
> > > implemented very good random number generators and factoring algorithms.
> > > 
> > > However, for my purposes, "grade" is usually more useful than "sort".
> > > 
> > > I don't mean this as a put-down.  Any real breakthrough in sorting is
> > > potentially
> > > of great benefit to us all, but not necessarily at the level of a J
> > > application.
> > > The power of the language is in encapsulating high-level concepts, not
> > > in the low-level
> > > details of the implementation of a well-known concept like sorting.
> > 
> > 
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
> > 
> 
> 
> 
>        
> ____________________________________________________________________________________
> Be a better Globetrotter. Get better travel answers from someone who knows. 
> Yahoo! Answers -
> Check it out.
> http://answers.yahoo.com/dir/?link=list&sid=396545469
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
> 



       
____________________________________________________________________________________
Take the Internet to Go: Yahoo!Go puts the Internet in your pocket: mail, news, 
photos & more. 
http://mobile.yahoo.com/go?refer=1GNXIC
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Reply via email to