Ryan Thompson wrote to Matt Dillon:
> Matt Dillon wrote to Ryan Thompson:
>
> > :> :> storage is rather inefficient for our table of about 2,850,000 members
> > :> :> (~2.1 GB total storage). There are 64M possible hash values in our
> > :> :> current implementation, and our record size is variable, but could be
> > :> :> safely fixed at about 1.5KB... So, total storage if all values were used
> > :> :> would be about 96GB. (See where I'm going with this?)
> > :...
> > :
> > :Remember, though, I'm not proposing a hash table. I have been lucky
> > :enough to design a "hash" function that will crunch down all possible
> > :values into about 64M unique keys--so, no collisions. I propose to use
> > :this function with address calculation. So, O(1) everything, for single
> > :random lookups, which is (virtually) all this data structure must deal
> > :with. Sorting isn't necessary, and that could be accomplished in O(n)
> > :anyway.
> >
> > Are you still talking about creating a 96GB file to manage 2.1GB worth
> > of data? I gotta say, that sounds kinda nuts to me! CPU cycles are
> > cheap; I/O and memory are not.
>
> Hi, Matt! Thanks for the replies. I'll try and keep you interested ;-)
>
> Hmm... Perhaps you're still missing my original point? I'm talking about
> a file with 96GB in addressable bytes (well, probably a bunch of files,
> given logical filesize limitations, but let's say for simplicity's sake
> that we have a single file). Its actual "size" (in terms of allocated
> blocks) will be only a bit larger than 2.1GB, directly proportional to
> the used size of the dataset. (Discrepancies only come into play when
> record size != block size, but that can be worked around somewhat.)
>
> In other words, ls -ls will report the "size" as some ridiculously large
> number, but will show a much smaller block count. So, assuming four records
> are added to the file on block boundaries, the file will actually only use
> four blocks... nowhere near 96GB!
>
> In the UNIX filesystem (ya, I know.. just pick one :-), size of file !=
> space allocated for file. Thus, my original questions were centered
> around filesystem holes. I.e., non-allocated chunks in the middle of a
> file. When trying to READ from within a hole, the kernel just sends back
> a buffer of zeros... which is enough to show that the record is not
> initialized.
If you prefer to read system documentation instead of me, see lseek(2) :-)
The lseek() function allows the file offset to be set beyond the end of
the existing end-of-file of the file. If data is later written at this
point, subsequent reads of the data in the gap return bytes of zeros
(until data is actually written into the gap).
I suppose gap == hole. Silly semantics. :-)
> Actually, something like an "exists" function for a record
> wouldn't touch the disk at all! When writing to a hole, the kernel simply
> allocates the necessary block(s). Creation is really fast, too, as the
> empty set can be written to disk with touch(1), and this uses far less
> memory than virtual initialization or in-memory structures ;-)
>
> As an example, try
>
> fseek(f, 2147483646, SEEK_SET);
> fputc('\n', f);
>
> And analyze the reported filesize, as well as the reported block count of
> the file. You should see a 2GB "size", and maybe 8K in allocated blocks
> (depends on how you ran newfs ;-). This is behaviour that has been around
> since the original UFS.
>
>
> > Take the BTree example again -- if you think about it, all internal nodes
> > in the tree will fit into memory trivially. It is only the leaf nodes
> > that do not. So in regards to actual disk accesses you still wind up
> > only doing *ONE* for the btree, even if it winds up being four or five
> > levels deep.
>
> Right. And, actually, (without looking a bit more closely), I wouldn't be
> surprised if you could replace the four-line address calculation I have
> with your B+Tree structure and come up with the same result. Only
> difference would be a few hundred lines of code, much more testing, and
> quite a few megs of RAM... ;-)
>
> What you referred to as "nuts", above, is just a logical way to provide a
> huge address space for a set of data, without actually allocating blocks
> in the filesystem for the entire address space until they are used.
>
>
> > The result is that you will be able to create an index to your data,
> > which is only 2.8 million records, in around 32 bytes per record or
> > 89 Megabytes. Not 96GB. Since you can cache all the internal nodes
> > of the btree you are done. The machine will do a much better job
> > caching an 89MB index than a 96GB index.
> >
> > Do not make the mistake of equating cpu to I/O. The cpu required to
> > iterate through 4 levels in the btree (assuming 64 entries per node,
> > it only takes 4 to cover 16 million records) is *ZERO* ... as in,
> > probably less than a microsecond, whereas accessing the disk is going
> > to be milliseconds.
>
> CPU time for what I'm proposing is even closer to zero than for a tree...
> But, you're right, it doesn't make any real difference when compared to
> disk I/O... B-Trees are good for a lot of things. Address calculation
> can be really good, too, given a finite key set, and a way to represent
> that finite key set without wasting space.
>
>
> > > > A B+Tree will also scale with the size of the dataset being managed,
> > > > so you do not have to preallocate or prereserve file space.
> >
> > > So will address calculation + filesystem holes, and
> > > sufficiently large filesizes :-)
> >
> > Uh, not with a 50:1 file size factor it won't.
>
> See my above discussion on file size != allocated blocks.
>
> And, address calculation implies that no index need be created or
> maintained (in memory, or on disk). Memory utilization is practically
> nil, disk utilization is proportional to the actual data size, and all
> record operations are done with one seek (O(1)).
>
> Perhaps performance could be improved by keeping a small memory cache, but
> that is tangential to the actual data structure, and will increase
> performance by percentages, not orders of magnitude.
>
> >
> > -Matt
> >
>
>
> - Ryan
>
>
--
Ryan Thompson <[EMAIL PROTECTED]>
Network Administrator, Accounts
Phone: +1 (306) 664-1161
SaskNow Technologies http://www.sasknow.com
#106-380 3120 8th St E Saskatoon, SK S7H 0W2
To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-hackers" in the body of the message