Jay Savage wrote:
> Not sure what kind of hardware you have, but it takes considerably
> less than a second for me. Not counting the sleep(), of course, which
> was just to guarantee unique timestamps for the example. Assuming the
> data being used for the keys (the "elements") is of a reasonable size
> (for most values of "a reasonable size") my example should easily
> handle 10's of thousands of "elements" (without the sleep. ;) ).
> 
> Don't believe me? Try this:

No thank you; you have changed the script from using seconds to fractions
of a second. If you had done that the first time around, my comments
would have been different (see below).

8< snip

> Yes the problem here is that you're pretty much guaranteed to overflow
> $id at some point, given a sufficienly long-running process, even
> discounting the size of the hash. That's why I went for time() or
> Time::HiRes::time() instead of an autoincrement; no muss, no fuss.
> Unless you're still running your script on the same hardware in 2038.
> 

$id is no more likely to overflow than Time::HiRes::time(). Perl is not C.
In Perl, if an integer overflows, it is automatically converted to a
float; no muss, no fuss. The problem you would have with both is the
silent dropping of insignificant digits. Consider the following:

#!/usr/bin/perl

use strict;
use warnings;

my $n    = 1e138;
my $m    = $n + 1;        # the 1 is far below the float's precision
my $diff = $m - $n;

print "diff = $diff\n";   # prints "diff = 0", not "diff = 1"

__END__

Note that the output is zero. This is because 1 is too insignificant to
affect a number as big as 1e138. There is a whole field of mathematics,
called Numerical Analysis, devoted to dealing with such problems.

But you are talking about really big numbers here; you are more likely
to run out of memory than to hit this.
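
To see where the digit-dropping would actually start with an
autoincrementing $id, here is roughly what the silent integer-to-float
promotion looks like on a 64-bit perl (the exact numbers depend on how
your perl was built, so treat them as illustrative):

#!/usr/bin/perl

use strict;
use warnings;

# On a 64-bit perl this literal (2**64 - 1) is the largest value
# that still fits in a native unsigned integer.
my $id = 18_446_744_073_709_551_615;

# One more increment overflows the integer range, so Perl quietly
# promotes the value to a float -- no error, no warning.
my $next = $id + 1;
print "$next\n";    # prints something like 1.84467440737096e+19

# Once it is a float, adding 1 is too insignificant to register --
# the same problem as the 1e138 example above.
print "lost precision\n" if $next + 1 == $next;

__END__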

> 
> Of course since OP didn't give any implementation details, it's
> possible that the list could grow even larger between calls to
> get_list(). Maybe the user will only want the list of 200 recent items
> after an average of 1,000,000 elements are added. Then, sure, you'll
> want to take a different approach. If that's the case, I'd recommend
> getting a copy of _Mastering Algorithms With Perl_, and reading up on
> linked lists and b-trees.
> 

The solution I proposed was a doubly-linked list plus a hash pointing to
every element; the hash is what lets you remove elements from the middle
of the list without searching for them.
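
In rough Perl, the shape of it is something like this (just a sketch;
the sub names touch() and recent() are mine, not necessarily what the
OP needs):

#!/usr/bin/perl

use strict;
use warnings;

# Sketch of the structure: a doubly-linked list kept in
# most-recent-first order, plus a hash (%node_for) so any element
# can be unlinked from the middle of the list without a search.
my %node_for;
my ( $head, $tail );

sub touch {    # add an element, or move an existing one to the front
    my ($elem) = @_;
    if ( my $old = delete $node_for{$elem} ) {    # unlink the old node
        if   ( $old->{prev} ) { $old->{prev}{next} = $old->{next} }
        else                  { $head = $old->{next} }
        if   ( $old->{next} ) { $old->{next}{prev} = $old->{prev} }
        else                  { $tail = $old->{prev} }
    }
    my $node = { elem => $elem, prev => undef, next => $head };
    $head->{prev} = $node if $head;
    $head = $node;
    $tail = $node unless $tail;
    $node_for{$elem} = $node;
}

sub recent {    # the $limit most recently touched elements
    my ($limit) = @_;
    my @list;
    for ( my $n = $head; $n and @list < $limit; $n = $n->{next} ) {
        push @list, $n->{elem};
    }
    return @list;
}

touch($_) for qw(a b c b d);
print join( ' ', recent(3) ), "\n";    # prints "d b c"

__END__

A real version would want Scalar::Util::weaken() on one direction of
the links (or explicit cleanup) so the circular references don't leak,
but the idea is the same.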

B-trees are only really effective when the data lives on disk. Compared
to other algorithms you could use, B-trees are slow when everything is
in RAM. Even binary B-trees are slower than other algorithms for
keeping a binary tree balanced.


> In my experience, though, it's standard to size the list slightly
> larger than the number elements you expect to add between accesses.
> For most UI situations, the working definition of "recent items" is
> "the most items we think a normal user will access between views of
> 'recent items.'" The value of a recent items list decreases rapidly
> when users use more items in a session than the list can hold, and
> demonstrably recent items get shifted off the bottom.
> 

In Perl, you don't have to pre-allocate memory; again you are thinking
too C-ish.

I think that keeping a recent-items list of 200 is silly. A normal
user will not look beyond the first five (but will complain unless the
list holds at least ten items).


-- 
__END__

Just my 0.00000002 million dollars worth,
   --- Shawn

"For the things we have to learn before we can do them, we learn by
doing them."
  Aristotle

* Perl tutorials at http://perlmonks.org/?node=Tutorials
* A searchable perldoc is at http://perldoc.perl.org/
