Sean M. Burke wrote:
> I've been puzzling around with the idea of changing the underlying type of
> HTML::Element objects to be arrays.  True, hashes seem the obvious choice
> for representing elements (since an element's attributes are a key=value
> mapping), and hashes are nice and fast.  But since the average element has
> no external attributes at all, and it's almost unheard-of for an element to
> have more than three external attributes, the additional memory overhead of
> having a hash (instead of an array) for every element in a potentially very
> large parse tree is pretty significant.
> 
> The most radical version of this idea would be to change from this
> representation:
>   {'_tag' => 'foo',
>    '_parent' => some_node,
>    '_content' => [node2, node3],
>    'id' => 'stuff',
>   }
> to something like:
>   [
>    'foo',      # 0: always for the tag name
>    some_node,  # 1: always for the parent node
>    5,          # 2: index of the start of contents
>    'id',       # 3 to $self->[2]-1: attribute keys and values
>      'stuff',
>    node2,      # $self->[2] - $#$self : contents (children)
>    node3,
>   ]

  I must admit that I have broken encapsulation to look at
  particular elements, though I'd have to revisit the code to
  determine whether it was required or just lazy.
  
  With regard to the above, it seems to go from simple and
  understandable to unnecessarily complex. Eschewing hashes means
  locating specific tags becomes an order-n process instead of
  order-1. What would the runtime effect be on those
  routines that find every image referenced in a page?
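
  To make the lookup question concrete, here is a minimal sketch of
  both layouts for one element. It assumes the array layout exactly
  as quoted above (tag, parent, index of first child, then key/value
  pairs, then children). The helper attr_from_array is a hypothetical
  name for illustration, not part of HTML::Element. Note that in this
  layout the tag name stays a constant-time read at index 0; it is
  the attribute lookup that turns into a scan over the key/value
  pairs, so the cost grows with the number of attributes on that
  element.

```perl
use strict;
use warnings;

# Hash layout: attributes are keys alongside _tag, _parent, _content.
my %hash_elem = (
    _tag     => 'img',
    _parent  => undef,
    _content => ['child'],
    src      => 'a.png',
    alt      => 'A',
);

# Array layout, following the quoted proposal:
#   0: tag, 1: parent, 2: index where contents start,
#   3 .. $elem->[2]-1: attribute keys and values, then children.
my @array_elem = ('img', undef, 7, 'src', 'a.png', 'alt', 'A', 'child');

# Hypothetical accessor: linear scan over the key/value pairs.
sub attr_from_array {
    my ($elem, $name) = @_;
    my $end = $elem->[2];    # first index that holds a child
    for (my $i = 3; $i + 1 < $end; $i += 2) {
        return $elem->[$i + 1] if $elem->[$i] eq $name;
    }
    return;                  # attribute not present
}

# Tag lookup is a direct read in either layout.
my $tag_from_hash  = $hash_elem{_tag};
my $tag_from_array = $array_elem[0];

# Attribute lookup: one hash probe versus a scan of the pairs.
my $src_from_hash  = $hash_elem{src};
my $src_from_array = attr_from_array(\@array_elem, 'src');
```

  For a routine that collects every image in a page, the per-element
  work in the array layout would be one read of index 0 plus a scan
  of that element's attribute pairs, which is short if elements
  really do carry only a handful of attributes.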
  
  What exactly is the memory tradeoff?

   -Norton Allen
