Re: XML::Reader

Klaus Tue, 11 May 2010 16:13:36 -0700

On Tue, May 11, 2010 at 4:14 PM, Dana Hudes <dhu...@hudes.org> wrote:
> Klaus,
> Thanks for contributing to CPAN


Dana,
Thanks for your message

> From: Klaus <klau...@gmail.com>
> Date: Tue, 11 May 2010 10:10:33 +0200
> > I would position XML::Reader in the same space as XML::Twig
> > and XML::TokeParser.
> >
> > I have taken a look at XML::Twig, which has an established
> > user base, and I agree that XML::Reader duplicates many of
> > the features already provided by XML::Twig.
> >
> > However, unlike XML::Twig, XML::Reader does not rely on
> > callback functions to parse the XML. With XML::Reader you
> > loop over the XML-document yourself and the resulting XML-
> > elements (and/or XML-subtrees) are represented in text format.
> > This style of processing XML is similar to the classic pattern:
> >
> > "open my $fh, '<', 'file.txt'; while (<$fh>) { do_sth($_); } close $fh;"
> >
> > This pattern is also implemented by XML::TokeParser. However,
> > unlike XML::TokeParser, XML::Reader records the full XML path
> > as it processes the XML-document, therefore it can target not
> > only specific tags, but it can also target a full path of nested
> > element tags (a simplified XPath like expression).
> >
> > I would say that XML::Reader fills an ecological niche that is
> > neither filled by XML::Twig, nor by XML::TokeParser.
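
To make the path idea concrete, here is a minimal sketch in core Perl
only. It is not how XML::Reader is implemented: the regex tokenizer is a
toy that only understands plain <tag>, </tag> and text, and the names
$xml, $target, @stack and @hits are made up for this illustration. It
just shows how keeping a stack of open tags lets a pull loop select text
by its full path rather than by tag name alone.

```perl
use strict;
use warnings;

# Toy input: two <name> elements at different depths (made-up data).
my $xml    = '<data><item><name>foo</name></item><name>bar</name></data>';
my $target = '/data/item/name';    # full path we want to select

my @stack;    # tags that are currently open, outermost first
my @hits;     # text values found at the target path

# Toy tokenizer: matches <tag>, </tag> or a run of text. Not a real parser.
while ($xml =~ m{<(/?)(\w+)>|([^<]+)}g) {
    my ($close, $tag, $text) = ($1, $2, $3);
    if (defined $tag) {
        # Closing tag leaves the current element, opening tag enters one.
        if ($close) { pop @stack } else { push @stack, $tag }
    }
    else {
        # Text token: compare the current path against the target.
        my $path = '/' . join('/', @stack);
        push @hits, $text if $path eq $target;
    }
}

print "@hits\n";
```

Only the <name> below <item> is selected; the <name> directly below
<data> has the same tag but a different path, so it is skipped.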

> On the question of callbacks: This is Perl, there's more than one way
> to do it (whatever 'it' is). That said, the use of callbacks is very
> Perlish. Indeed, Perl itself uses callbacks in its builtins: look at
> sort(), where the comparison function is a callback.

I agree, callbacks are Perlish. However, the main goal of my module
is not to be Perlish, but to provide an alternative way of
processing XML (as you said, "there's more than one way to do
it").
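
To illustrate the two styles side by side, here is a small sketch in
core Perl. It does not use XML::Reader or XML::Twig at all: @tokens is a
made-up stand-in for parser events, and parse_with_callback and
make_iterator are hypothetical helpers written only for this example.

```perl
use strict;
use warnings;

# Made-up token list standing in for the events an XML parser would emit.
my @tokens = ('<a>', 'text', '</a>');

# Callback (push) style: the driver owns the loop and calls our handler.
sub parse_with_callback {
    my ($tokens, $handler) = @_;
    $handler->($_) for @$tokens;
    return;
}

# Iterator (pull) style: returns a closure that hands out one token per
# call and undef when the input is exhausted.
sub make_iterator {
    my ($tokens) = @_;
    my $i = 0;
    return sub { return $i < @$tokens ? $tokens->[$i++] : undef };
}

# Push: we pass a handler in and collect what it receives.
my @pushed;
parse_with_callback(\@tokens, sub { push @pushed, $_[0] });

# Pull: we write the loop ourselves, as with while (<$fh>).
my @pulled;
my $next = make_iterator(\@tokens);
while (defined(my $tok = $next->())) {
    push @pulled, $tok;
}

print "callback: @pushed\n";
print "iterator: @pulled\n";
```

Both variants see the same tokens in the same order; the difference is
only in who owns the loop.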

> It is also true that while callbacks provide a clean interface
> (instead of overloading a generic object method, for example) you
> are adding another function call to processing. But instead of
> having a clean function call you have substituted calling an
> iterator from your object. This is actually more costly.

Also agreed.

In XML::Reader, I have added layers of abstraction to provide the
alternative interface. This is costly in terms of CPU. I see
this cost as a consequence of the design. Having said this, I am
always looking for improvements to the performance of XML::Reader.

While we are talking about performance, we should also mention
memory consumption. XML::Reader represents XML subtrees as plain
text, which is very memory efficient. Compared with similar
modules that use the DOM approach to represent XML subtrees,
XML::Reader consumes less memory.

> Look at XML::Simple. You get back a data structure and deal with
> it yourself using native Perl iteration (foreach). You can then do
> whatever you want: write to a file, find the piece you want or do
> some transformation and turn it back into XML.

Coincidentally, XML::Simple is a good example of how XML::Reader
cooperates nicely with existing modules.

The following example assumes that we are dealing with a huge
XML file that does not fit entirely into memory for XML::Simple.
Therefore, the strategy is to extract smaller sub-trees from
the XML file, each of which fits into memory and can be
processed by XML::Simple.

As I already posted on comp.lang.perl.misc:

subject "Get XML content using XML::Twig"
http://groups.google.com/group/comp.lang.perl.misc/msg/8ec3a393e37ae8f4

> [...]
> As I said before, take the advice of Tad McClellan and John
> Bokma first. If, for whatever reason, you can't follow their
> advice, (and, for whatever reason, you can't use XML::Twig
> either) there is always my "shameless plug" XML::Reader.
> [...]

later post, same subject "Get XML content using XML::Twig"
http://groups.google.com/group/comp.lang.perl.misc/msg/390696dd67c3939d

> This new version makes it possible to write the same program
> (...the program that uses XML::Reader to capture sub-trees from
> a potentially very big XML file into a buffer and pass that
> buffer to XML::Simple...) in even fewer lines:
>
> use strict;
> use warnings;
> use XML::Reader 0.34;
>
> use XML::Simple;
> use Data::Dumper;
>
> my $rdr = XML::Reader->newhd(\*DATA, {filter => 5},
>    { root => '/Data/ConnectionList/Connection/FileItemList/FileItem/FileType',
>      branch => '*' },
>  );
> while ($rdr->iterate) {
>    my $buffer = $rdr->rval;
>    my $ref = XMLin($buffer);
>    print Dumper($ref), "\n\n";
> }
