Re: PerMsgStatus

Justin Mason Thu, 28 Jul 2005 19:14:11 -0700

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Loren Wilton writes:
> I just spent 45 minutes or so staring at the PerMsgStatus code and figuring
> out a bit more about how it works.  Baroque!  Still, there is the basis of a
> concept underlying the implementation, and it doesn't *look* like it would
> be all that hard to flop things around to work more the way I think they
> should.
> 
> It looks like the main things that aren't obvious and I'll need to figure
> out something about are:
> 
> a) what the heck are priorities, who sets them, and do they really have any
> justifiable purpose?  Ie: can they just quietly vanish into the night with
> nobody being any the wiser?

They order the rules -- or more correctly, sets of rules.

Most rules are priority 500 (iirc), but some need to run earlier and some
need to run later (e.g. AWL needs to run after all other rules).  Running
rules earlier is how we propose to implement early-exit -- certain rules
can run before all others, and cause an early-exit if they fire.

They cannot just vanish. ;)
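
As a rough illustration of the ordering (a sketch only: the rule names, the
%rules layout and every priority value other than 500 are made up for the
example, not taken from the actual code):

```perl
use strict;
use warnings;

# Hypothetical rule table: name => { priority => N, code => sub }.
# Lower priority numbers run first.
my %rules = (
    EARLY_WHITELIST => { priority => 100,  code => sub { 0 } },
    BODY_TEST       => { priority => 500,  code => sub { 1 } },
    HEADER_TEST     => { priority => 500,  code => sub { 0 } },
    AWL_LIKE        => { priority => 1000, code => sub { 1 } },
);

# Group rule names into sets, one set per priority level.
my %by_priority;
push @{ $by_priority{ $rules{$_}{priority} } }, $_ for sort keys %rules;

# Run the sets in ascending priority order, recording the order used.
my @run_order;
foreach my $pri (sort { $a <=> $b } keys %by_priority) {
    foreach my $name (@{ $by_priority{$pri} }) {
        push @run_order, $name;
        $rules{$name}{code}->();
    }
}

print join(' ', @run_order), "\n";
```

Within one priority level the order is arbitrary as far as correctness goes;
only the ordering between levels matters.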

> b) why were tests broken out into groups by test type and all the tests of a
> given type run at once?  My best guess was an attempt at efficiency based on
> assumptions about data set size and cache thrashing.  Is there a known
> reason that it has to be this way, or would it work just as well to just run
> tests in 'whatever' order?

Two reasons:

1. Reducing the number of items in a hash is good for efficiency, as it
reduces hash collisions.

2. Running all tests of a certain type in one block allows some
optimisations; e.g. for the body rules, we can iterate through all lines
in the body, and for each line, call all of the active body rules of that
priority level one after another.  (I'm not sure whether we still do this,
btw.)

It may work better to run in "whatever" order -- benchmarks are the one
true authority here ;)
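
To make the second point concrete, here is a minimal sketch of the
"one pass over the body, all rules per line" idea.  The rule names, the
regexps and the %body_rules layout are invented for the example; the real
code may be organised quite differently.

```perl
use strict;
use warnings;

# Hypothetical body rules of a single priority level: name => regexp.
my %body_rules = (
    HAS_VIAGRA => qr/viagra/i,
    HAS_FREE   => qr/\bfree\b/i,
    HAS_URL    => qr{https?://},
);

my @body = (
    "Get your FREE sample today",
    "Visit http://example.com/ now",
);

# One pass over the body; every active rule is tried against each line.
my %hits;
foreach my $line (@body) {
    foreach my $name (sort keys %body_rules) {
        next if $hits{$name};    # this rule already fired, skip it
        $hits{$name} = 1 if $line =~ $body_rules{$name};
    }
}

print join(' ', sort keys %hits), "\n";
```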

> c) are there known ways in Perl to actually dispose of memory items and have
> them really return memory to the available pool, or do you just hope that
> exiting scope and garbage collection may eventually do the job for you?

  {
    my $obj = [ ...something...];
  } # $obj has gone out of scope.  GC happens now and $obj is deleted

In other words, once it goes out of scope, it is immediately GC'd.  It's
not like Java, where it may be GC'd if you're lucky, the moon is full,
and you call System.gc() three times in a row.  (Java programmers will
know I'm not joking about that ;)

If it's a member of a hash like $self, "delete $self->{variable}" is
how you force it to be deleted.  If something else has a ref to it,
it won't be deleted, of course -- everything's ref counted.
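
A small demonstration of the timing, using a DESTROY method to log when each
object is actually freed.  The Tracked class and the @log array are just
scaffolding for the example.

```perl
use strict;
use warnings;

my @log;

package Tracked;
sub new     { my ($class, $name) = @_; return bless { name => $name }, $class }
sub DESTROY { my ($self) = @_; push @log, "destroyed $self->{name}" }

package main;

{
    my $obj = Tracked->new('scoped');
    push @log, 'inside block';
}   # last reference gone: DESTROY runs here, before the next statement
push @log, 'after block';

my $self = { variable => Tracked->new('member') };
my $other_ref = $self->{variable};   # a second reference to the same object
delete $self->{variable};            # not freed yet, $other_ref still holds it
push @log, 'after delete';
undef $other_ref;                    # refcount hits zero, DESTROY runs now
push @log, 'after undef';

print "$_\n" for @log;
```

The log shows "destroyed scoped" between "inside block" and "after block",
and "destroyed member" only after the second reference is dropped.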

> d) can you build an array/list/hash/whatever of procedure names/pointers and
> efficiently iterate over the structure calling the procedures in sequence?
> Will this be slower than generating an eval containing a bunch of lines
> calling the same procedures in sequence?

You can indeed --

    foreach my $fn_ref (@array) {
      $fn_ref->(...arguments...);
    }

or even

    map { $_->(....arguments...); } @array;

But -- the bad news -- it will almost certainly be slower.
The only way to be sure is with benchmarks.  The 'Benchmark'
module (it ships with Perl) can be very useful for measuring that stuff.
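
Something along these lines would compare the two approaches.  It's only a
sketch: the three trivial subs stand in for real rule code, and run_loop,
$run_unrolled and the generated source are names made up for the example.
With rules this cheap the call overhead dominates, so treat the numbers as
indicative at best.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Three trivial stand-in "rules"; real rule code would be much heavier.
my @array = (sub { $_[0] + 1 }, sub { $_[0] * 2 }, sub { $_[0] - 3 });

# Variant 1: iterate over the array of code refs.
sub run_loop {
    my ($arg) = @_;
    my $sum = 0;
    $sum += $_->($arg) foreach @array;
    return $sum;
}

# Variant 2: generate one sub via eval that calls each ref on its own line.
my $src = "sub { my \$sum = 0;\n"
        . join('', map { "  \$sum += \$array[$_]->(\$_[0]);\n" } 0 .. $#array)
        . "  return \$sum; }";
my $run_unrolled = eval $src or die "eval failed: $@";

# A negative count means: run each variant for at least that many CPU seconds.
cmpthese(-1, {
    loop     => sub { run_loop(10) },
    unrolled => sub { $run_unrolled->(10) },
});
```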

> Do you have any insightful (or alternately: quick) answers to the above
> questions?  I have a feeling that while I could make some deductions on the
> first two questions from tracking stuff into other modules, the *real*
> reasons are probably lost or stored in the group arcana of the dev's minds.
> 
> It seems to me as a first whack, there isn't any huge reason that the rules
> couldn't be looked at en masse and a quick dependency tree built, then the
> results sorted in some convenient score-and-whatnot-based order, and then,
> instead of half a dozen essentially identical rule building procedures that
> exist now, just have one procedure that will make the test and calling
> procedures come into existence.

One thing -- watch out for $score == 0.  If a score is set to 0, the
evaluation of the rule's code (be that an eval test, a regexp or whatever)
should not happen; and rules can have their scores set to 0 in user prefs,
so assuming that because a rule is 0 in the system-wide config, it'll
never be run from then on, is not a safe assumption.
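
A sketch of what that check might look like, assuming (purely for
illustration) that user prefs simply override the system-wide scores.  The
hash names and effective_score() are hypothetical, not the real API.

```perl
use strict;
use warnings;

# Hypothetical score tables; in this sketch user prefs override system config.
my %system_scores = (RULE_A => 2.5, RULE_B => 0, RULE_C => 1.0);
my %user_scores   = (RULE_B => 3.0, RULE_C => 0);

# Stand-in rule code; each "test" just fires.
my %rule_code = (
    RULE_A => sub { 1 },
    RULE_B => sub { 1 },
    RULE_C => sub { 1 },
);

# Resolve the effective score at run time, not once from the system config.
sub effective_score {
    my ($name) = @_;
    return exists $user_scores{$name} ? $user_scores{$name}
                                      : $system_scores{$name};
}

my @evaluated;
foreach my $name (sort keys %rule_code) {
    next if effective_score($name) == 0;    # zero score: never run the test
    push @evaluated, $name if $rule_code{$name}->();
}

print "@evaluated\n";
```

Here RULE_B is 0 system-wide but still runs for this user, and RULE_C is
skipped even though the system config gives it a score.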

> I'm not even sure that you really need to pass much more than @self to the
> procedures, and let them find the data they want to play with as member
> variables on @self with known names.

That's entirely true.

> (Although maybe Perl requires more
> parameters, I still don't understand things like @_ and the like.)

@_ is the parameter list.  Btw, accessing parameters passed to a function
directly as elements of @_ is faster than copying them into named
variables; in other words,

  sub myfunction { 
    return $_[0] + $_[1];
  }

is faster than

  sub myfunction { 
    my ($foo,$bar) = @_;
    return $foo + $bar;
  }
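
If you want to check that on your own machine, a minimal sketch with the
Benchmark module could look like this (add_direct, add_named and bump are
names made up for the example).  It also shows one caveat worth knowing:
the elements of @_ are aliases for the caller's arguments, so modifying
$_[0] modifies the caller's variable.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Reads its arguments straight out of @_.
sub add_direct { return $_[0] + $_[1] }

# Copies its arguments into named lexicals first.
sub add_named {
    my ($foo, $bar) = @_;
    return $foo + $bar;
}

# A negative count means: run each variant for at least that many CPU seconds.
cmpthese(-1, {
    direct => sub { add_direct(2, 3) },
    named  => sub { add_named(2, 3) },
});

# Caveat: @_ aliases the caller's variables.
sub bump { $_[0]++; return }
my $n = 1;
bump($n);
print "n is now $n\n";
```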

- --j.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFC6ZBoMJF5cimLx9ARAqpFAJ9GG2CF7XFmVGlJLZ4teS+67bRbTACdESI5
8ZqOA7bn9Cv3yH/c59QqTLY=
=rhJf
-----END PGP SIGNATURE-----
