Re: breaking encapsulation on HTML::TreeBuilder|Element objects

Matt Sisk Wed, 29 Mar 2000 22:39:38 -0800
Hi Sean,

I finally got the time to check out the recent enhancements to
HTML::Element. (this post is a bit long, so if you want to skip to the
real nitty-gritty, scroll down to the traverse() discussion where I talk
about why I have to break encapsulation, and a possible solution).

As we've discussed in the past, much of what I do in
HTML-ElementExtended required me to break encapsulation due to the
original HTML::Element having limited methods. Most of these cases have
disappeared with your enhancements, but not all.

(For the sake of the list, I sub-classed HTML::Element to perform
various functions I found necessary for the HTML::ElementTable
class...these sub classes are in the HTML-Element-Extended bundle)

In one case, that of HTML::ElementGlob, breaking encapsulation is no
longer necessary due to your enhancements. I do believe it qualifies as
"sub-class" of HTML::Element since it behaves like an HTML::Element
through a HAS-A relationship.

In another case, that of HTML::ElementSuper, it is not as clear cut.
Some of the extended methods I added are now unnecessary -- in
particular, positional coordinates, cloning, and some
content-manipulation methods (such as the replace* methods).

BTW, you suggest in your comments that you don't expect the positional
coordinate methods to be used for much other than debugging. This is not
so! This is the technique by which HTML::ElementTable cells report their
grid coordinates for nearly all of the manipulator methods in the table
class! In the particular case of tables, these coordinates make a lot of
intuitive sense to us humans. I can't say the same for the more common
case of Nth-dimensional coordinates. Anyway, I will be migrating to your
native methods for positional stuff since they appear to be more
efficient than my implementation.

Where I have had the most trouble with the new additions (and breaking
encapsulation, etc) is in the combination of traverse() based methods
and my maskable elements. Masking is an ability that seems sort of
obscure for most any use other than table manipulation. In my table
trees, when a cell is extended via the colspan or rowspan attr, I
decided to simply *hide* the cells that this affected rather than
destroy them -- that way, if the colspan or rowspan is subsequently
changed I just reveal the element that is already there rather than
having to remember what it's configuration was before being hidden.

So how does masking take place? In the old HTML::Element, I simply
overrode traverse() since it was invoked on every element in the
structure. If an element was masked, I automatically pruned. Otherwise I
invoked SUPER::traverse(). This was simple, and elegant.

With your efficiency-minded changes, the new traverse() cannot be
effectively overridden in this case, because it is not called on a
per-element basis anymore. This is not a problem for a homogenous tree
structure. One of my design requirements for HTML::ElementTable,
however, is that it should be able to freely commingle with regular
HTML::Element based structures. The top-level traverse() in such a
structure has no idea about special requirements such as masking, and as
such, can no longer be overridden with guaranteed results. The same is
true of any traverse() based method such as as_HTML() -- the only
as_HTML() that gets invoked is the one in the first element.

I did come up with a way to get around this, but it required jumping
through bunches of hoops and I have to break encapsulation --
essentially, what I now have to do is make a masked node "play dead".
This is accomplished by overriding start_tag() and end_tag() to return
null values when masked. Since traverse() navigates internal content
structures directly (i.e., traverse() itself breaks encapsulation) I
have to *hide* the content of a masked node so that non-native
traverse() will not see it. There lies the encapsulation break -- in
order to do this efficiently I'm swapping _content array refs around.

So what to do? I've been thinking about it. I took a long look at your
traverse code, and in general I think it's good goal to make it
efficient, but not at the expense of breaking "expected" behavior when
you override the method. I think both can be accommodated.

If you want to keep traverse() in it's current form, I can think of a
couple of ways that would probably work, but at least one would add
complexity.

Solution a): Don't break encapsulation in traverse() -- have it use
content_list(), or even content(), rather than digging around in the
_content field of each element. This way, those content methods could be
overridden rather than traverse(). This does not solve the general
problem with traverse(), but it does solve the specific problem of
masking elements.

Solution b): You need to poke around at the object type of each element
you're visiting -- if it is different than the current element's type,
you need to invoke traverse() on the new element and integrate the
results into the original traverse() call. That seems like a nice
compromise between the efficiency and the override issues, but this adds
complexity.

You'll notice that I'm not submitting a patch for that second option.
;-)

Seriously, if you don't want to attack this, I'll give it a go and
submit a patch, but I'll have to find a little time first. Solution "a"
would be a quick workaround, but I'm still not entirely satisfied with
it because traverse() is still behaving like a class method rather than
an object method.

Thanks for picking up the ball on this module and making the recent
enhancements -- the original module certainly needed it.

Matt Sisk
[EMAIL PROTECTED]

P.S. Thanks for the hummus recipe! I actually *did* go and grab some
after I read your code -- your recipe made me start salivating.
Re: breaking encapsulation on HTML::TreeBuilder|Element objects

Reply via email to