(moved off of -meta)

"David L. Nicol" wrote:
> 
> Steve Fink wrote (and I edited slightly):
> 
> > <groan>  I can't figure out why so many people misinterpret my RFC12
> > as requiring a solution to the halting problem.
> 
> a large class of incompletely expressed
> suggestions appear to get grouped into
> 
> "This requires solving the halting problem!"
> 
> without providing further explanation.
> 
> Looking at your RFC12, I imagine the problem is, that it is impossible
> to know at compile time if a variable is going to get used or not.  When
> exactly is the warning supposed to be generated?

Compile time. It's pretty much the same as your taint checking. An
assignment to a variable means that it is initialized immediately after
that statement, as well as at every other location in the code that will
only get executed if that assignment will have been executed first. So
certainly everything following that assignment in the same control
scope. I avoid the halting problem by never requiring to know whether a
variable _will_ be used, only whether it _can_ be. Under pessimistic
assumptions like "any boolean expression can be either true or false in
a valid run of a program". Which had better be true! In the case of
taintedness, it's those conservative assumptions that require some of
the work to be done at runtime.

It's standard semantic analysis. Both your taintedness analysis and my
reachability analyses can be fully described by specifying what things
generate the characteristic you're analyzing, what things block (in the
literature, "kill") it, and the transfer rules. It's often not the best
way of implementing it, since the fully general framework can be pretty
slow, but it's a concise way of describing things that you'll find in
many textbooks. They talk in terms of maintaining GEN and KILL sets and
describing when to add, subtract, union, and intersect the incoming &
outgoing GENs and KILLs.

Yours: taint-vulnerable, tainted
Mine: initialized, undefined, maybe-undefined, reachable,
maybe-reachable

The implementations of both our RFCs would probably share a lot of code.
Some sort of framework would be useful. And these sorts of things are
absolutely necessary for nontrivial optimizations, too.

Fortunately, both our RFCs are also ones that don't really _change_ the
language; they have to be either be implemented in the core or just have
deep access to the op/parse tree, but they can always be added later.
Well, barring really wacky ideas. (use weirdness
qw(all-lexicals-are-now-global-ha-ha))

Reply via email to