tree frobbing facilities in Perl6?

2002-12-24 Thread Rich Morin
I find myself frobbing trees a lot these days: read in some XML,
wander around in tree-land for a while, then output either more XML
or somesuch.  And, quite frankly, it's a bit of a pain.

The issue, as I see it, is that Perl has no power tools for dealing
with trees.  I will admit that I don't know what these should look
like, but if Perl has them, it's news to me.  Here's an example:

Let's say that I've got a daemon which is running ps(1) on a regular
basis and logging the results.  A brute force approach would be to
save the raw ASCII output, but these days I'm trying to use XML.  So,
I write out the output as (informal) XML:

  <log>
    <ps time=123456789>
      <process>
        <pid>123</>
        <pcpu>4.6</>
        <stat>SN+</>
        ...
      </process>
    </ps>
    ...
  </log>

A bit bulky, but nicely tagged and serialized.  Now, I want to do
something with it.  OK, the first thing I do is read it in as a tree.
I use my own SAX handler, because I want a pure Perl way to load in
a tree, preserving order.  It loads in something like this:

  [ 'log', {},
    [ 'ps', { time => 123456789 },
      [ 'process', {},
        [ 'pid',  {}, '123' ],
        [ 'pcpu', {}, '4.6' ],
        [ 'stat', {}, 'SN+' ],
        ...
      ],
    ],
    ...
  ]

The problem is that, although the data structure I've loaded in is a
tree, I generally want to use it as something else.  For example, let's
say that I want to boil down these log files a bit.  This means I
have to pick up the static values (e.g., pid), tally the distribution
of the flag values (e.g., stat), and average the numeric snapshots, as:

  foreach $time (sort(keys(%ps))) {
    $pid  =  $ps{$time}{pid} unless defined ($pid);
    $pcpu += $ps{$time}{pcpu};
    $stat{$ps{$time}{stat}}++;
    ...
  }

My approach to this, currently, is to walk the tree, creating the data
structure I'd _like_ to have, before I try to do the actual work.  This
isn't TOO painful, but it isn't the sort of DWIMitude I'd like to see.
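
A minimal Perl 5 sketch of that tree-walking step, assuming the
[ name, { attributes }, children... ] layout shown above and one process
per ps sample (which is what the $ps{$time}{pid} loop implies).  The
helper names children_named and tree_to_ps are hypothetical, and the
second sample is made-up data:

```perl
use strict;
use warnings;

# Sample tree in the [ name, { attributes }, children... ] layout.
# The second <ps> sample is invented for illustration.
my $tree =
  [ 'log', {},
    [ 'ps', { time => 123456789 },
      [ 'process', {},
        [ 'pid',  {}, '123' ],
        [ 'pcpu', {}, '4.6' ],
        [ 'stat', {}, 'SN+' ],
      ],
    ],
    [ 'ps', { time => 123456799 },
      [ 'process', {},
        [ 'pid',  {}, '123' ],
        [ 'pcpu', {}, '2.4' ],
        [ 'stat', {}, 'R' ],
      ],
    ],
  ];

# Return the child nodes (array refs) of $node whose tag is $name.
sub children_named {
    my ($node, $name) = @_;
    my (undef, undef, @kids) = @$node;
    return grep { ref $_ eq 'ARRAY' && $_->[0] eq $name } @kids;
}

# Build $ps{$time}{$field} from the tree, keeping the text of each field.
# With several processes per sample, later ones would overwrite earlier ones.
sub tree_to_ps {
    my ($log) = @_;
    my %ps;
    for my $sample (children_named($log, 'ps')) {
        my $time = $sample->[1]{time};
        for my $proc (children_named($sample, 'process')) {
            for my $field (@$proc[2 .. $#$proc]) {
                next unless ref $field eq 'ARRAY';
                $ps{$time}{ $field->[0] } = $field->[2];
            }
        }
    }
    return %ps;
}

my %ps = tree_to_ps($tree);
```

The resulting %ps can be fed straight to a summarizing loop keyed on
$time; the cost is that every new report shape needs its own walker.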

More to the point, let's say that I simply want to transform the data
into a different order.  In a multiply subscripted array, this is just
a matter of swapping subscripts on the output loop(s).  Turning the tree
above into something like:

  <process pid=123>
    <time>123456789,...</>
    <pcpu>4.6,...</>
    <stat>SN+,...</>
  </process>

is not something I want to try in XSLT.  I can do it in Perl, of course,
but I end up writing a lot of code.  Am I missing something?  And, to
bring the posting back on topic, will Perl6 bring anything new to the
campfire?

-r
--
email: [EMAIL PROTECTED]; phone: +1 650-873-7841
http://www.cfcl.com/rdm    - my home page, resume, etc.
http://www.cfcl.com/Meta   - The FreeBSD Browser, Meta Project, etc.
http://www.ptf.com/dossier - Prime Time Freeware's DOSSIER series
http://www.ptf.com/tdc     - Prime Time Freeware's Darwin Collection



Re: tree frobbing facilities in Perl6?

2002-12-24 Thread Michael G Schwern
I'm going to take a left turn in replying and say that your approach to the
problem is causing the problem.  This is diverging from the question of tree
manipulation, but I don't think that's what you really need.

Anyhow, on with the show...


On Tue, Dec 24, 2002 at 12:02:09AM -0800, Rich Morin wrote:
> Let's say that I've got a daemon which is running ps(1) on a regular
> basis and logging the results.  A brute force approach would be to
> save the raw ASCII output, but these days I'm trying to use XML.  So,
> I write out the output as (informal) XML:
>
>   <log>
>     <ps time=123456789>
>       <process>
>         <pid>123</>
>         <pcpu>4.6</>
>         <stat>SN+</>
>         ...
>       </process>
>     </ps>
>     ...
>   </log>

So with simple data like this, I'd just use YAML.  This isn't really
important, just a YAML plug. :)  But it does have a better resulting data
structure as we'll see below.

  - time: 123456789
    processes:
      - pid:  123
        pcpu: 4.6
        stat: SN+
      - pid:  234
        pcpu: 2.3
        stat: R
  - time: 234567890
    processes:
      - pid:  123
        pcpu: 2.4
        stat: R
      - pid:  456
        pcpu: 3.4
        stat: SN

(I've eliminated the redundant <log> and <ps> parts)


> A bit bulky, but nicely tagged and serialized.  Now, I want to do
> something with it.  OK, the first thing I do is read it in as a tree.
> I use my own SAX handler, because I want a pure Perl way to load in
> a tree, preserving order.  It loads in something like this:
>
>   [ 'log', {},
>     [ 'ps', { time => 123456789 },
>       [ 'process', {},
>         [ 'pid',  {}, '123' ],
>         [ 'pcpu', {}, '4.6' ],
>         [ 'stat', {}, 'SN+' ],
>         ...
>       ],
>     ],
>     ...
>   ]
>
> The problem is that, although the data structure I've loaded in is a
> tree, I generally want to use it as something else.

And there's your problem.  The data structure you've created above is not
really a comfortable one in Perl.  You're trying to create a tree-like
structure using array references as nodes.  This is awkward.  Instead, use
hashes.  Here's how YAML dumps the structure:

my @ps_snapshots = [
  {
    'processes' => [
      {
        'stat' => 'SN+',
        'pcpu' => '4.6',
        'pid' => '123'
      },
      {
        'stat' => 'R',
        'pcpu' => '2.3',
        'pid' => '234'
      }
    ],
    'time' => '123456789'
  },
  {
    'processes' => [
      {
        'stat' => 'R',
        'pcpu' => '2.4',
        'pid' => '123'
      },
      {
        'stat' => 'SN',
        'pcpu' => '3.4',
        'pid' => '456'
      }
    ],
    'time' => '234567890'
  }
]

Since YAML itself is made up of hashes and arrays, it maps very well into
Perl.  The XML tree structure comes off awkward because Perl has no native
tree handling.

At this point you've got a fairly straightforward list of hashes of lists
rather than the oddly put together set of array refs as tree nodes.


> For example, let's
> say that I want to boil down these log files a bit.  This means I
> have to pick up the static values (e.g., pid), tally the distribution
> of the flag values (e.g., stat), and average the numeric snapshots, as:
>
>   foreach $time (sort(keys(%ps))) {
>     $pid  =  $ps{$time}{pid} unless defined ($pid);
>     $pcpu += $ps{$time}{pcpu};
>     $stat{$ps{$time}{stat}}++;
>     ...
>   }

I'm not sure I follow the code above, but I'll do something similar.  I'll
tally up all the flag values.

for @ps_snapshots -> $snap {
    for @$snap{processes} -> $proc {
        %stats{$proc{stat}}++;
    }
}

> My approach to this, currently, is to walk the tree, creating the data
> structure I'd _like_ to have, before I try to do the actual work.  This
> isn't TOO painful, but it isn't the sort of DWIMitude I'd like to see.

Basically, we're just manipulating a straightforward list of hashes of
lists.  Because YAML already loads the data in that natural shape, there's
no need to create an intermediate structure.  Despite my use of Perl 6,
you can do the same in Perl 5.
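
For comparison, a minimal Perl 5 sketch of the same tally.  It assumes
the structure above has already been loaded into @ps_snapshots (written
out literally here rather than read from a YAML file):

```perl
use strict;
use warnings;

# The list of hashes of lists from the YAML example, written out literally.
my @ps_snapshots = (
    { time      => 123456789,
      processes => [
          { pid => 123, pcpu => 4.6, stat => 'SN+' },
          { pid => 234, pcpu => 2.3, stat => 'R'   },
      ] },
    { time      => 234567890,
      processes => [
          { pid => 123, pcpu => 2.4, stat => 'R'  },
          { pid => 456, pcpu => 3.4, stat => 'SN' },
      ] },
);

# Count how often each stat flag value appears across all snapshots.
my %stats;
for my $snap (@ps_snapshots) {
    for my $proc (@{ $snap->{processes} }) {
        $stats{ $proc->{stat} }++;
    }
}

# Report the distribution, one flag value per line.
for my $flag (sort keys %stats) {
    print "$flag: $stats{$flag}\n";
}
```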

That sort of loop I've written above can probably better be done using
hyper-operators, but I'll let someone else take a stab at that.  I'm also
not sure what the slicing syntax is, so I made something up.


> More to the point, let's say that I simply want to transform the data
> into a different order.  In a multiply subscripted array, this is just
> a matter of swapping subscripts on the output loop(s).  Turning the tree
> above into something like:
>
>   <process pid=123>
>     <time>123456789,...</>
>     <pcpu>4.6,...</>
>     <stat>SN+,...</>
>   </process>

Sort of an odd structure, but ok.  Here's how I'd flip around the YAML
structure (again with the caveat about hyperoperators).

for @ps_snapshots -> $snapshot {
    my $time = $snapshot{time};

    for @$snapshot{processes} -> $proc {
        my $pid = $proc{pid};
        push @%procs{$pid}{time}, $time;

        for qw(stat pcpu pid) -> $key {
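
A minimal Perl 5 sketch of this kind of regrouping by pid, not a
reconstruction of the code above.  It assumes the @ps_snapshots
structure from the YAML example; the %procs layout (one array per field
under each pid) is my own guess at a convenient target shape:

```perl
use strict;
use warnings;

# The list of hashes of lists from the YAML example, written out literally.
my @ps_snapshots = (
    { time      => 123456789,
      processes => [
          { pid => 123, pcpu => 4.6, stat => 'SN+' },
          { pid => 234, pcpu => 2.3, stat => 'R'   },
      ] },
    { time      => 234567890,
      processes => [
          { pid => 123, pcpu => 2.4, stat => 'R'  },
          { pid => 456, pcpu => 3.4, stat => 'SN' },
      ] },
);

# Regroup by pid: $procs{$pid}{time}, {pcpu} and {stat} each hold one
# value per snapshot in which that pid appeared, in snapshot order.
my %procs;
for my $snapshot (@ps_snapshots) {
    my $time = $snapshot->{time};
    for my $proc (@{ $snapshot->{processes} }) {
        my $series = $procs{ $proc->{pid} } ||= {};
        push @{ $series->{time} }, $time;
        push @{ $series->{$_} }, $proc->{$_} for qw(pcpu stat);
    }
}
```

Swapping the grouping key is then a matter of changing which field
indexes %procs, much like swapping subscripts in a multiply subscripted
array.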
 

This week's Perl 6 summary

2002-12-24 Thread Piers Cawley
The Perl Summary for the week ending 20021222
Hello, good morning and welcome to the Christmas edition of the Perl 6
summary. For some reason I have convinced myself to sit here on
Christmas Eve writing a summary for all you crazy kids out there who
hang on my every word. Plus, it beats wrapping all the presents and last
minute panic shopping.

So, let's get perl6-internals out of the way first.

  The Road to 0.0.9
The first half of the week saw a feature freeze in the run up to the
release of Parrot 0.0.9, so people spent their time trying to track down
and fix various tinderbox issues and other bugs.

Steve Fink worked on trying to get the NCI (Native Call Interface) tests
to work properly.

Simon Glover and Leo Tötsch worked on tracking down a GC bug that was
causing problems for the scratchpad tests.

Andy Dougherty is having problems getting languages/perl6 to pass its
tests. Apparently part of the problem is that the undef function isn't
fully defined.

Andy also found problems with sprintf and 64 bit INTVALs (fixed by Brent
Dax), PMCs and 64 bit INTVALs (fixed by Leo Tötsch), PerlHashes and
gcc-2.95.3 and 2.8.1 on Solaris (confirmed as a problem with other
versions of gcc on Solaris by Joshua Hoblitt), dependency issues between
Jako and IMCC from a clean directory and problems with the Jako life
implementation.

Bruce Gray sent a pile of fixes for Win32 systems, covering GC and build
problems.

  Compiling to ParrotVM
Klaas-Jan Stol is thinking of writing a compiler that targets Parrot for
his Bachelor's in Computer Science, probably a TCL compiler, and he
asked for suggestions and tips.

David Robins made a few suggestions and pointed out that parrot is a
moving target. Dan protested that it wasn't moving that much (If I
'adn't nailed it to the perch, it'd've muscled up to them bars and...
VOOM!) and said that he thought a TCL to Parrot compiler would be
great. Will Coleda put up a URL for his first pass at such a beast and
asked that we be gentle with him (he put up a URL for his second pass
later, which is the link below). Gopal V pointed out that IMCC may be a
better target than Parrot assembly as that took care of register
allocation and generally helped programmers retain their hair and also
suggested that, if the compiler was written in C then DotGNU's TreeCC
would be worth looking at. Tanton Gibbs, who is working on a C++
compiler agreed that TreeCC is 'an extremely nice system' that he
recommended highly.

http://makeashorterlink.com/?T27042FD2

http://www.coleda.com/users/coke/parrot/

http://makeashorterlink.com/?H2CF62ED2

  Register scanning
Apologizing for reopening the register scanning can of worms, Steve Fink
wondered about the requirement that all Parrot GC implementations scan
all hardware registers for live pointers. Apparently this is a real
problem with, for example, the IA64 architecture. He proposed that
configure probe for systems that would support register scanning GC, but
that the default implementation should use a 'registration' system. He
followed this up with a 'naive' implementation of such a system. Jason
Gloudon suggested another scheme that I'm afraid I didn't understand to
implement 'accurate' GC.

http://makeashorterlink.com/?L2DF41ED2

http://makeashorterlink.com/?T1EF22ED2

  Returning a new PMC from ops
David Robins wondered about the cleanest way to return a new PMC from an
op. He and Leo Tötsch thrashed it out.

http://makeashorterlink.com/?K1FF62ED2

  Parrot v0.0.9 Nazgul released
Steve Fink announced the release of Parrot version 0.0.9, aka Nazgul,
complete with a long list of new features, and the usual call for
further assistance. Well done everyone. As Steve says, Parrot is getting
dangerously close to being really usable...

http://makeashorterlink.com/?X10022FD2

http://makeashorterlink.com/?C51024FD2

Meanwhile, in perl6-language
It was quiet... too quiet. Only 48 messages in perl6-language, maybe
we're all keeping quiet so as not to distract Larry from writing the
next Apocalypse.

  Comparing Object Identity
This thread (along with every other thread in the language list this
week) continued from last week. Dan pointed out that using long lived
object IDs (ie. unique for all time) would be expensive, and reckoned
that the basic approach should be fast and good enough for the common
case. Piers Cawley wondered if doing object 'identity' comparison with a
method (eg: $obj.is($other_obj);) wasn't actually the best way
forward. (Piers had been applying his OO rule of thumb -- if you're not
sure of how to do something, take a look at a Smalltalk image). Dave
Whipp proposed an adverb syntax ($a eq : ID $b) which would be
generalizable:

   $a eq:i 

Re: tree frobbing facilities in Perl6?

2002-12-24 Thread Simon Cozens
[EMAIL PROTECTED] (Rich Morin) writes:
> I find myself frobbing trees a lot these days

So that's where the ents came from.

-- 
Within a computer, natural language is unnatural.



Re: This week's Perl 6 summary

2002-12-24 Thread David Wheeler
On Tuesday, December 24, 2002, at 02:55  AM, Piers Cawley wrote:


> Apparently part of the problem is that the undef function isn't
> fully defined.


Well, isn't that sort-of the point?

:-)

David

--
David Wheeler AIM: dwTheory
[EMAIL PROTECTED] ICQ: 15726394
http://david.wheeler.net/  Yahoo!: dew7e
   Jabber: [EMAIL PROTECTED]




Re: tree frobbing facilities in Perl6?

2002-12-24 Thread Dave Whipp
Rich Morin wrote:

> is not something I want to try in XSLT.  I can do it in Perl, of course,
> but I end up writing a lot of code.  Am I missing something?  And, to
> bring the posting back on topic, will Perl6 bring anything new to the
> campfire?


I think that one of the things that Perl6 will bring is continuations. 
This will enable you to treat a tree traversal in the same way as any 
other list.

For example:

  for $tree.depth_first_traversal(process) -> $node
  {
    ...
  }

There would be no need to obscure the client-code with the details of
hierarchical navigation. (Question: can I use C<yield> inside a
recursive implementation of the iterator?)
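
Until then, something similar can be approximated in Perl 5 with a
closure.  The sketch below is not a continuation or a yield-based
iterator: it keeps an explicit stack instead of recursing.  It assumes
the [ name, { attributes }, children... ] node layout from earlier in
the thread; depth_first_iterator is a hypothetical name and the sample
tree is made up:

```perl
use strict;
use warnings;

# Return a closure that yields, in depth-first (pre-order) order, each node
# whose tag is $name, and returns undef once the tree is exhausted.
sub depth_first_iterator {
    my ($tree, $name) = @_;
    my @stack = ($tree);
    return sub {
        while (@stack) {
            my $node = pop @stack;
            my (undef, undef, @kids) = @$node;
            # Push children in reverse so the leftmost child is visited first.
            push @stack, reverse grep { ref $_ eq 'ARRAY' } @kids;
            return $node if $node->[0] eq $name;
        }
        return;
    };
}

# Made-up sample tree with three <process> nodes.
my $tree =
  [ 'log', {},
    [ 'ps', { time => 1 },
      [ 'process', {}, [ 'pid', {}, '123' ] ],
      [ 'process', {}, [ 'pid', {}, '234' ] ],
    ],
    [ 'ps', { time => 2 },
      [ 'process', {}, [ 'pid', {}, '123' ] ],
    ],
  ];

# Client code sees a flat sequence of matching nodes.
my $next = depth_first_iterator($tree, 'process');
my @pids;
while (my $node = $next->()) {
    push @pids, $node->[2][2];    # text of the first child, here <pid>
}
```

The explicit stack is exactly the bookkeeping that a recursive
implementation with yield would let the iterator's author leave out.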


Dave.