Re: breaking encapsulation on HTML::TreeBuilder|Element objects

Matt Sisk Sun, 02 Apr 2000 13:48:44 -0700
"Sean M. Burke" wrote:
> I've been thinking a great deal about this lately, because it's such a
> problem case and therefore worthy of consideration.  And what I've come up
> with is this:
> I consider the semantics of traversal quite straightforward, and so related
> to the return value of $element->content_list (and internally,
> @{$element->{'_content'} || []}) that there is no room for traversal not
> seeing some objects that address or content_list can see, or vice versa.
> To do otherwise would be simply inconsistent.  So if a node is to be really
> masked (i.e., unvisitable by traversal), then it shouldn't be in the tree
> -- that is, it shouldn't be in the content list of any element that's in
> the tree.
> But there's an unending amount of potential space for storing things off
> the side of the tree, by having them be in attributes of elements in the
> tree (i.e., besides _content, _parent, and _tag).

I agree with this in principle. I've been trying to remember what
problems I encountered when I originally wrote the code, because at the
time it really bugged me that I had to mess with traverse(). As far as I
can remember, the basic problems were a) removing cells would trash the
positional coordinate scheme (since each element reports its own
coord), and b) in an admittedly obscure circumstance, I wanted
operations on globs, such as an entire column, to still affect the cells
in question, whether they were masked or not.

The first problem would be solved with a "null" element, like you
mentioned in your note. I.e., a placeholder that would not show up in the
output but would "count" as far as coordinates are concerned. (I would
think that null text entries would work in the content array for this
purpose, but I have not analyzed whether any of your optimization code
would undermine this).

The second problem was only a problem with some of my earlier versions
of the element glob. I believe with the newer incarnation, there is no
conflict if the "removed" elements remain in their respective globs, and
any new globs that get formed check the hidden cache as well.

> Now, I'll not advise you on rewriting your code, because you know better
> than I really what your code is for and how it does it.  But for the sake
> of others on the list, I'll illustrate my point:
> One can manage this problem of table cell addressing by storing a map to
> the correct elements in an attribute of the table.  I.e., you'd give each
> 'table' element a list-of-lists that maps from grid coordinates to what
> element occupies/occults that coordinate.  It'd start out:
>   $t = HTML::Element->new('table');
>   { #make and install a blank map
>    my @map = map [(undef) x $width], 1 .. $height;
>    $t->attr('_coordmap', \@map);
>   };
> then you're free to populate the _coordmap as needed.  If (2,3) were
> occulted by having some other cell's colspan or rowspan be >1, this could
> be signalled either by having $t->attr('_coordmap')->[2][3] be undef (as
> opposed to pointing to the element, which also occurs as a descendant of
> $t), or by having it point to the element that occults that coordinate
> (an element that occurs only once in $t->descendants, of course, since no
> node can recur in the same tree, but which is free to occur many times in
> the _coordmap).
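
A minimal sketch of the coordinate-map idea quoted above, for anyone following along. It uses plain hashrefs as stand-ins for the cell elements so that it runs without HTML::Element; the function name build_coordmap and the 'rowspan'/'colspan' hash keys are assumptions made for illustration, not part of either module.

```perl
use strict;
use warnings;

# Build a grid mapping [row][col] to the cell that occupies or occults it.
# $rows is a list of rows; each row is a list of cell hashrefs that may
# carry 'rowspan' and 'colspan' keys (both default to 1).
sub build_coordmap {
    my ($rows, $width, $height) = @_;

    # Note the parentheses: (undef) x $width repeats a list element.
    my @map = map [ (undef) x $width ], 1 .. $height;

    for my $r (0 .. $#$rows) {
        my $c = 0;
        for my $cell (@{ $rows->[$r] }) {
            # Skip slots already claimed by a span from an earlier row.
            $c++ while $c < $width && defined $map[$r][$c];
            last if $c >= $width;

            my $rs = $cell->{rowspan} || 1;
            my $cs = $cell->{colspan} || 1;
            for my $dr (0 .. $rs - 1) {
                for my $dc (0 .. $cs - 1) {
                    next if $r + $dr >= $height || $c + $dc >= $width;
                    $map[ $r + $dr ][ $c + $dc ] = $cell;
                }
            }
            $c += $cs;
        }
    }
    return \@map;
}

# A 3-wide, 2-high table whose first cell spans two columns and two rows.
my $big   = { name => 'big', rowspan => 2, colspan => 2 };
my $right = { name => 'right' };
my $below = { name => 'below' };
my $map   = build_coordmap([ [ $big, $right ], [ $below ] ], 3, 2);
```

Here $map->[1][1] holds the same reference as $map->[0][0], which is the "point to the occulting element" variant; storing undef in the occulted slots instead would be the other variant described in the quote.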

This was exactly what I did in my earliest versions of the module. Then
I decided to add the coordinate reporting to the ElementSupers, and then
it occurred to me "hey! why track these grid coordinates when the
elements themselves can tell me where they live!"...and then I ended up
with today's version. With, of course, the added complication of dealing
with rowspan and colspan issues correctly. I still feel that the basic
approach of letting the cells track their own coordinates is the
"elegant" way to do it.

> If code exists outside of an object's class/suite, and that code directly
> reads or writes attributes of the object instead of going thru accessors,
> that's breaking encapsulation.  If the documentation to that class says
> "but don't use this feature to try to modify object data" (as, for example,
> I say in the docs to HTML::Element's content() method), and code outside of
> that class/suite does, then that code is breaking encapsulation.  But
> overriding a base method in a subclass (which is what you're doing) isn't
> breaking encapsulation at all.

Understood -- but it's exactly with the content issues where I am
currently breaking encapsulation, in order to hide the content from the
traverse() method, which accesses the data structures directly rather
than via the content* methods.

While we're on the topic, here's a counter example...let's say you go
through with the changes you've been considering (i.e., the element
objects use arrays rather than hashes). Now, I'm not sure why you'd want
to, but let's say you try to put one of these new HTML::Element objects
into an existing structure -- it'll break, because traverse() (in this
case, the "old" traverse) is accessing the data structures directly. 
Granted, that scenario is pretty darn unlikely, it just bugs my
theoretical imp...it's not "wrong", it just seems inconsistent with (as
you state below) "a node in a tree is a node in a tree".

> So, these are the lessons of this story:
> 
> * A node in a tree is a node in a tree, and should be consistently
> addressable via the normal structure-related methods.  But if its as_HTML
> realization is '', that's fine; it's just that kind of node.
> And to that end, I think I'll add a new pseudo-element named "~null" or
> something, so that any element whose tag name is "~null" will be
> special-cased so starttag and endtag will both return ''.  The element
> would still be in the tree, tho, and so would still be visible to
> traversal, and in $parent->content_list, etc.  But if you want to make an
> existing HTML node invisible, then you'd do something like:
>  $it->attr('_real_tag', $it->attr('_tag', '~null'));
> and to make it visible again:
>  $it->attr('_tag', $it->attr('_real_tag', undef));
> (recall that the return value of a PUT call to attr() is the old value; a
> handy feature at times!)
> Does this ~null idea sound good to everyone?

This sounds good to me in general. In cases where the ~null element has
content, would you prune by default, or expect the whole sub tree to
have ~null tags? (that is, assuming you wanted to hide the whole sub
tree, as I tend to do)
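
To make the question concrete, here is a rough sketch using a hypothetical stand-in class (Sketch::Node, not HTML::Element itself). The attr() below only mimics the behaviour described in the quote (a PUT returns the old value, and storing undef deletes the attribute), and the $prune flag is invented here purely to contrast the two possible behaviours.

```perl
use strict;
use warnings;

package Sketch::Node;   # hypothetical stand-in, not HTML::Element

sub new {
    my ($class, $tag, @content) = @_;
    return bless { _tag => $tag, _content => [@content] }, $class;
}

# A PUT returns the old value; storing undef deletes the attribute.
sub attr {
    my ($self, $name, @val) = @_;
    my $old = $self->{$name};
    if (@val) {
        if (defined $val[0]) { $self->{$name} = $val[0] }
        else                 { delete $self->{$name} }
    }
    return $old;
}

# A '~null' tag renders as an empty string on both sides.
sub starttag { my $t = $_[0]{_tag}; $t eq '~null' ? '' : "<$t>"  }
sub endtag   { my $t = $_[0]{_tag}; $t eq '~null' ? '' : "</$t>" }

# Render the node; $prune decides whether a ~null node also hides its content.
sub as_HTML {
    my ($self, $prune) = @_;
    return '' if $prune && $self->{_tag} eq '~null';
    my $inner = join '',
        map { ref($_) ? $_->as_HTML($prune) : $_ } @{ $self->{_content} };
    return $self->starttag . $inner . $self->endtag;
}

package main;

my $cell = Sketch::Node->new('td', 'x');
my $row  = Sketch::Node->new('tr', $cell);

$cell->attr('_real_tag', $cell->attr('_tag', '~null'));   # hide
my $unpruned = $row->as_HTML(0);   # tags gone, text still rendered
my $pruned   = $row->as_HTML(1);   # whole subtree gone

$cell->attr('_tag', $cell->attr('_real_tag', undef));     # show again
my $restored = $row->as_HTML(0);
```

Without pruning, the hidden cell's text still leaks into the output, so hiding a whole subtree would require either retagging every descendant or a prune-style shortcut like the one assumed here.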

> * Unless your last name is Aas or Burke, you don't get to write code that
> says $element->{'stuff'} = $whatever.

Not a problem...believe me, the times that I have done this are always a
last resort (until I come up with a better way, anyway). And I'm
assuming that you're not referring to additional fields added in a sub
class ;-). I'm enough of a theory-bug to loathe breaking my idea of
encapsulation.

> * But overriding methods is generally fine.  Of course, sanity would
> require that you not override a method with, say:
>   sub content_list { randomly_shuffle shift->SUPER::content_list() }
> or something that deliberately contradicts the superclass's method's
> documented behavior.  That would be basically nutty, but not a violation of
> encapsulation.  If you worried that your overriding of starttag et al. is
> nutty (and nuttiness /is/ a bit subjective), don't worry, it's quite alright.
> 
> In fact, a careful reading of the new (1.53) HTML::Element's source will
> reveal my experimental and as-yet undocumented attempts to allow for
> comments, directives, and other fun in a tree.  The way I'm doing this (or
> was in the middle of doing this when I emitted the new Element) is to
> modify starttag and endtag (not yet there!) so that a comment doesn't look
> like <~comment>...</~comment> but instead like <!-- ... -->.  This is
> still a work in progress; since it's undocumented (like the ~null idea I
> just outlined above), it's not gospel yet; but the basic idea (modding
> starttag and endtag -- and not traverse!) is solid.

Yeah, I've overridden these methods for other purposes as
well...probably in some cases pushing even your idea of "nutty". The
first time I did it was when I was trying to get '&nbsp;' into the
content of an element and couldn't figure out a way to do it without
HTML::Entities encoding it into literal text...there's probably quite an
easy way to do it, but in the end I just put it into the starttag() when
needed. In another, related example, I wanted a quick way to graft raw
HTML text onto an element tree without having to encode it first, so I
ended up embedding the raw HTML into the starttag. A quick fix that made
me cringe.

Matt Sisk