tree frobbing facilities in Perl6?
I find myself frobbing trees a lot these days: read in some XML, wander
around in tree-land for a while, then output either more XML or somesuch.
And, quite frankly, it's a bit of a pain.

The issue, as I see it, is that Perl has no power tools for dealing with
trees. I will admit that I don't know what these should look like, but
if Perl has them, it's news to me.

Here's an example: Let's say that I've got a daemon which is running
ps(1) on a regular basis and logging the results. A brute force approach
would be to save the raw ASCII output, but these days I'm trying to use
XML. So, I write out the output as (informal) XML:

  <log>
    <ps time=123456789>
      <process>
        <pid>123</>
        <pcpu>4.6</>
        <stat>SN+</>
        ...
      </process>
    </ps>
    ...
  </log>

A bit bulky, but nicely tagged and serialized. Now, I want to do
something with it. OK, the first thing I do is read it in as a tree.
I use my own SAX handler, because I want a pure Perl way to load in a
tree, preserving order. It loads in something like this:

  [ 'log', {},
    [ 'ps', { time => 123456789 },
      [ 'process', {},
        [ 'pid',  {}, '123' ],
        [ 'pcpu', {}, '4.6' ],
        [ 'stat', {}, 'SN+' ],
        ...
      ],
    ],
    ...
  ]

The problem is that, although the data structure I've loaded in is a
tree, I generally want to use it as something else. For example, let's
say that I want to boil down these log files a bit. This means I have
to pick up the static values (e.g., pid), tally the distribution of the
flag values (e.g., stat), and average the numeric snapshots, as:

  foreach $time (sort(keys(%ps))) {
    $pid = $ps{$time}{pid} unless defined ($pid);
    $pcpu += $ps{$time}{pcpu};
    $stat{$ps{$time}{stat}}++;
    ...
  }

My approach to this, currently, is to walk the tree, creating the data
structure I'd _like_ to have, before I try to do the actual work. This
isn't TOO painful, but it isn't the sort of DWIMitude I'd like to see.

More to the point, let's say that I simply want to transform the data
into a different order. In a multiply subscripted array, this is just a
matter of swapping subscripts on the output loop(s). Turning the tree
above into something like:

  <process pid=123>
    <time>123456789,...</>
    <pcpu>4.6,...</>
    <stat>SN+,...</>
  </process>

is not something I want to try in XSLT. I can do it in Perl, of course,
but I end up writing a lot of code. Am I missing something? And, to
bring the posting back on topic, will Perl6 bring anything new to the
campfire?

-r
--
email: [EMAIL PROTECTED]; phone: +1 650-873-7841
http://www.cfcl.com/rdm     - my home page, resume, etc.
http://www.cfcl.com/Meta    - The FreeBSD Browser, Meta Project, etc.
http://www.ptf.com/dossier  - Prime Time Freeware's DOSSIER series
http://www.ptf.com/tdc      - Prime Time Freeware's Darwin Collection
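For readers following along, the "walk the tree, then do the real work" step described in this post can be sketched roughly as below. This is an illustration in Python rather than Perl, and it is only a sketch: the sample data, the helper names (children, leaf_fields, by_time, summarize) and the assumption of exactly one process per snapshot are all invented for the example, not taken from the poster's code.

```python
from collections import Counter

# Hypothetical sample in the [name, attrs, child, ...] shape shown in the post.
tree = ['log', {},
        ['ps', {'time': 123456789},
         ['process', {},
          ['pid', {}, '123'], ['pcpu', {}, '4.6'], ['stat', {}, 'SN+']]],
        ['ps', {'time': 123456799},
         ['process', {},
          ['pid', {}, '123'], ['pcpu', {}, '2.4'], ['stat', {}, 'R']]]]


def children(node, name):
    """Yield the child elements of node whose tag equals name."""
    for child in node[2:]:
        if isinstance(child, list) and child[0] == name:
            yield child


def leaf_fields(node):
    """Collapse leaf children such as ['pid', {}, '123'] into {'pid': '123'}."""
    return {c[0]: c[2] for c in node[2:]
            if isinstance(c, list) and len(c) == 3 and isinstance(c[2], str)}


def by_time(log):
    """Build {time: {field: value}}; assumes one process per snapshot."""
    table = {}
    for ps in children(log, 'ps'):
        for process in children(ps, 'process'):
            table[ps[1]['time']] = leaf_fields(process)
    return table


def summarize(table):
    """Mirror the foreach loop: first pid, mean pcpu, tally of stat flags."""
    pid = None
    pcpu_total = 0.0
    stat = Counter()
    for time in sorted(table):
        rec = table[time]
        if pid is None:
            pid = rec['pid']
        pcpu_total += float(rec['pcpu'])
        stat[rec['stat']] += 1
    return pid, pcpu_total / len(table), dict(stat)


ps = by_time(tree)
pid, mean_pcpu, stat = summarize(ps)
```

The point of the sketch is the split the post complains about: by_time exists only to reshape the tree into the table that summarize actually wants.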
Re: tree frobbing facilities in Perl6?
I'm going to take a left turn in replying and say that your approach to
the problem is causing the problem. This is diverging from the question
of tree manipulation, but I don't think that's what you really need.
Anyhow, on with the show...

On Tue, Dec 24, 2002 at 12:02:09AM -0800, Rich Morin wrote:
> Let's say that I've got a daemon which is running ps(1) on a regular
> basis and logging the results. A brute force approach would be to
> save the raw ASCII output, but these days I'm trying to use XML. So,
> I write out the output as (informal) XML:
>
>   <log>
>     <ps time=123456789>
>       <process>
>         <pid>123</>
>         <pcpu>4.6</>
>         <stat>SN+</>
>         ...
>       </process>
>     </ps>
>     ...
>   </log>

So with simple data like this, I'd just use YAML. This isn't really
important, just a YAML plug. :) But it does have a better resulting
data structure as we'll see below.

  - time: 123456789
    processes:
      - pid: 123
        pcpu: 4.6
        stat: SN+
      - pid: 234
        pcpu: 2.3
        stat: R
  - time: 234567890
    processes:
      - pid: 123
        pcpu: 2.4
        stat: R
      - pid: 456
        pcpu: 3.4
        stat: SN

(I've eliminated the redundant log and ps parts)

> A bit bulky, but nicely tagged and serialized. Now, I want to do
> something with it. OK, the first thing I do is read it in as a tree.
> I use my own SAX handler, because I want a pure Perl way to load in a
> tree, preserving order. It loads in something like this:
>
>   [ 'log', {},
>     [ 'ps', { time => 123456789 },
>       [ 'process', {},
>         [ 'pid',  {}, '123' ],
>         [ 'pcpu', {}, '4.6' ],
>         [ 'stat', {}, 'SN+' ],
>         ...
>       ],
>     ],
>     ...
>   ]
>
> The problem is that, although the data structure I've loaded in is a
> tree, I generally want to use it as something else.

And there's your problem. The data structure you've created above is
not really a comfortable one in Perl. You're trying to create a
tree-like structure using array references as nodes. This is awkward.
Instead, use hashes. Here's how YAML dumps the structure:

  my @ps_snapshots = [
    {
      'processes' => [
        { 'stat' => 'SN+', 'pcpu' => '4.6', 'pid' => '123' },
        { 'stat' => 'R',   'pcpu' => '2.3', 'pid' => '234' }
      ],
      'time' => '123456789'
    },
    {
      'processes' => [
        { 'stat' => 'R',  'pcpu' => '2.4', 'pid' => '123' },
        { 'stat' => 'SN', 'pcpu' => '3.4', 'pid' => '456' }
      ],
      'time' => '234567890'
    }
  ]

Since YAML itself is made up of hashes and arrays, it maps very well
into Perl. The XML tree structure comes off awkward because Perl has no
native tree handling. At this point you've got a fairly straightforward
hash-of-lists style structure rather than the oddly put together set of
array refs as tree nodes.

> For example, let's say that I want to boil down these log files a
> bit. This means I have to pick up the static values (e.g., pid),
> tally the distribution of the flag values (e.g., stat), and average
> the numeric snapshots, as:
>
>   foreach $time (sort(keys(%ps))) {
>     $pid = $ps{$time}{pid} unless defined ($pid);
>     $pcpu += $ps{$time}{pcpu};
>     $stat{$ps{$time}{stat}}++;
>     ...
>   }

I'm not sure I follow the code above, but I'll do something similar.
I'll tally up all the flag values.

  for @ps_snapshots -> $snap {
    for @$snap{processes} -> $proc {
      %stats{$proc{stat}}++;
    }
  }

> My approach to this, currently, is to walk the tree, creating the
> data structure I'd _like_ to have, before I try to do the actual
> work. This isn't TOO painful, but it isn't the sort of DWIMitude I'd
> like to see.

Basically, we're just manipulating a straightforward list of hashes of
lists. The structure YAML produces is already naturally formatted, so
there is no need to create the intermediate structure.

Despite my use of Perl 6, you can do the same in Perl 5. That sort of
loop I've written above can probably better be done using
hyper-operators, but I'll let someone else take a stab at that. I'm
also not sure what the slicing syntax is, so I made something up.

> More to the point, let's say that I simply want to transform the data
> into a different order. In a multiply subscripted array, this is just
> a matter of swapping subscripts on the output loop(s). Turning the
> tree above into something like:
>
>   <process pid=123>
>     <time>123456789,...</>
>     <pcpu>4.6,...</>
>     <stat>SN+,...</>
>   </process>

Sort of an odd structure, but ok. Here's how I'd flip around the YAML
structure (again with the caveat about hyperoperators).

  for @ps_snapshots -> $snapshot {
    my $time = $snapshot{time};
    for @$snapshot{processes} -> $proc {
      my $pid = $proc{pid};
      push @%procs{$pid}{time}, $time;
      for qw(stat pcpu pid) -> $key {
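The two operations this reply describes, tallying the stat flags and regrouping the snapshots by pid, can be illustrated roughly as follows. This is a sketch in Python rather than the Perl 6 of the post, and it does not claim to reproduce the rest of the reply's code: the function names (tally_stats, pivot_by_pid) and the choice of which fields to collect per pid are assumptions made for the example.

```python
from collections import Counter, defaultdict

# Same shape as the YAML dump in the reply: a list of snapshots, each with
# a time and a list of per-process records.
ps_snapshots = [
    {'time': '123456789', 'processes': [
        {'pid': '123', 'pcpu': '4.6', 'stat': 'SN+'},
        {'pid': '234', 'pcpu': '2.3', 'stat': 'R'}]},
    {'time': '234567890', 'processes': [
        {'pid': '123', 'pcpu': '2.4', 'stat': 'R'},
        {'pid': '456', 'pcpu': '3.4', 'stat': 'SN'}]},
]


def tally_stats(snapshots):
    """Count how often each stat flag appears across all snapshots."""
    return Counter(proc['stat']
                   for snap in snapshots
                   for proc in snap['processes'])


def pivot_by_pid(snapshots, keys=('stat', 'pcpu')):
    """Regroup by pid: {pid: {'time': [...], 'stat': [...], 'pcpu': [...]}}.

    The keys default is an assumption about which fields to collect.
    """
    procs = defaultdict(lambda: defaultdict(list))
    for snap in snapshots:
        for proc in snap['processes']:
            series = procs[proc['pid']]
            series['time'].append(snap['time'])
            for key in keys:
                series[key].append(proc[key])
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {pid: dict(series) for pid, series in procs.items()}


stats = tally_stats(ps_snapshots)
procs = pivot_by_pid(ps_snapshots)
```

Because the input is already a list of hashes of lists, neither function needs an intermediate structure, which is the argument the reply makes for this layout.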
This week's Perl 6 summary
The Perl Summary for the week ending 20021222

Hello, good morning and welcome to the Christmas edition of the Perl 6
summary. For some reason I have convinced myself to sit here on
Christmas Eve writing a summary for all you crazy kids out there who
hang on my every word. Plus, it beats wrapping all the presents and
last minute panic shopping.

So, let's get perl6-internals out of the way first.

The Road to 0.0.9

The first half of the week saw a feature freeze in the run up to the
release of Parrot 0.0.9, so people spent their time trying to track
down and fix various tinderbox issues and other bugs.

Steve Fink worked on trying to get the NCI (Native Call Interface)
tests to work properly.

Simon Glover and Leo Tötsch worked on tracking down a GC bug that was
causing problems for the scratchpad tests.

Andy Dougherty is having problems getting languages/perl6 to pass its
tests. Apparently part of the problem is that the undef function isn't
fully defined. Andy also found problems with sprintf and 64 bit INTVALs
(fixed by Brent Dax), PMCs and 64 bit INTVALs (fixed by Leo Tötsch),
PerlHashes and gcc-2.95.3 and 2.8.1 on Solaris (confirmed as a problem
with other versions of gcc on Solaris by Joshua Hoblitt), dependency
issues between Jako and IMCC from a clean directory and problems with
the Jako life implementation.

Bruce Gray sent a pile of fixes for Win32 systems, covering GC and
build problems.

Compiling to ParrotVM

Klaas-Jan Stol is thinking of writing a compiler that targets Parrot
for his Bachelor's in Computer Science, probably a TCL compiler, and he
asked for suggestions and tips. David Robins made a few suggestions and
pointed out that parrot is a moving target. Dan protested that it
wasn't moving that much (If I 'adn't nailed it to the perch, it'd've
muscled up to them bars and... VOOM!) and said that he thought a TCL to
Parrot compiler would be great. Will Coleda put up a URL for his first
pass at such a beast and asked that we be gentle with him (he put up a
URL for his second pass later, which is the link below). Gopal V
pointed out that IMCC may be a better target than Parrot assembly as
that took care of register allocation and generally helped programmers
retain their hair and also suggested that, if the compiler was written
in C then DotGNU's TreeCC would be worth looking at. Tanton Gibbs, who
is working on a C++ compiler agreed that TreeCC is 'an extremely nice
system' that he recommended highly.

http://makeashorterlink.com/?T27042FD2
http://www.coleda.com/users/coke/parrot/
http://makeashorterlink.com/?H2CF62ED2

Register scanning

Apologizing for reopening the register scanning can of worms, Steve
Fink wondered about the requirement that all Parrot GC implementations
scan all hardware registers for live pointers. Apparently this is a
real problem with, for example, the IA64 architecture. He proposed that
configure probe for systems that would support register scanning GC,
but that the default implementation should use a 'registration' system.
He followed this up with a 'naive' implementation of such a system.
Jason Gloudon suggested another scheme that I'm afraid I didn't
understand to implement 'accurate' GC.

http://makeashorterlink.com/?L2DF41ED2
http://makeashorterlink.com/?T1EF22ED2

Returning a new PMC from ops

David Robins wondered about the cleanest way to return a new PMC from
an op. He and Leo Tötsch thrashed it out.

http://makeashorterlink.com/?K1FF62ED2

Parrot v0.0.9 Nazgul released

Steve Fink announced the release of Parrot version 0.0.9, aka Nazgul
complete with a long list of new features, and the usual call for
further assistance. Well done everyone. As Steve says, Parrot is
getting dangerously close to being really usable...

http://makeashorterlink.com/?X10022FD2
http://makeashorterlink.com/?C51024FD2

Meanwhile, in perl6-language

It was quiet... too quiet. Only 48 messages in perl6-language, maybe
we're all keeping quiet so as not to distract Larry from writing the
next Apocalypse.

Comparing Object Identity

This thread (along with every other thread in the language list this
week) continued from last week. Dan pointed out that using long lived
object IDs (ie. unique for all time) would be expensive, and reckoned
that the basic approach should be fast and good enough for the common
case. Piers Cawley wondered if doing object 'identity' comparison with
a method (eg: $obj.is($other_obj);) wasn't actually the best way
forward. (Piers had been applying his OO rule of thumb -- if you're not
sure of how to do something, take a look at a Smalltalk image). Dave
Whipp proposed an adverb syntax ($a eq : ID $b) which would be
generalizable: $a eq:i
Re: tree frobbing facilities in Perl6?
[EMAIL PROTECTED] (Rich Morin) writes:
> I find myself frobbing trees a lot these days

So that's where the ents came from.

--
Within a computer, natural language is unnatural.
Re: This week's Perl 6 summary
On Tuesday, December 24, 2002, at 02:55 AM, Piers Cawley wrote:
> Apparently part of the problem is that the undef function isn't fully
> defined.

Well, isn't that sort-of the point? :-)

David

--
David Wheeler                AIM: dwTheory
[EMAIL PROTECTED]            ICQ: 15726394
http://david.wheeler.net/    Yahoo!: dew7e
                             Jabber: [EMAIL PROTECTED]
Re: tree frobbing facilities in Perl6?
Rich Morin wrote:
> is not something I want to try in XSLT. I can do it in Perl, of
> course, but I end up writing a lot of code. Am I missing something?
> And, to bring the posting back on topic, will Perl6 bring anything
> new to the campfire?

I think that one of the things that Perl6 will bring is continuations.
This will enable you to treat a tree traversal in the same way as any
other list. For example:

  for $tree.depth_first_traversal(process) -> $node {
    ...
  }

There would be no need to obscure the client-code with the details of
hierarchical navigation.

(Question: can I use C<yield> inside a recursive implementation of the
iterator?)

Dave.
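The idea of iterating over a traversal like any other list can be sketched with a recursive generator. This is an illustration in Python, not Perl 6, and Python generators are only a restricted relative of the continuations mentioned in the post; whether Perl 6 will allow the same thing is the open question raised there. The depth_first function, its name filter and the sample tree are assumptions made for the example, using the [name, attrs, child, ...] node shape from earlier in the thread.

```python
# Hypothetical sample in the [name, attrs, child, ...] shape from the thread.
tree = ['log', {},
        ['ps', {'time': 123456789},
         ['process', {},
          ['pid', {}, '123'], ['pcpu', {}, '4.6'], ['stat', {}, 'SN+']]],
        ['ps', {'time': 123456799},
         ['process', {},
          ['pid', {}, '123'], ['pcpu', {}, '2.4'], ['stat', {}, 'R']]]]


def depth_first(node, name=None):
    """Yield nodes in pre-order; if name is given, only nodes with that tag."""
    if name is None or node[0] == name:
        yield node
    for child in node[2:]:
        if isinstance(child, list):
            # The recursive call is itself a generator; delegate to it.
            yield from depth_first(child, name)


# Client code sees a flat sequence and never navigates the hierarchy itself.
names = [node[0] for node in depth_first(tree)]
pids = [field[2]
        for process in depth_first(tree, 'process')
        for field in process[2:]
        if field[0] == 'pid']
```

In this sketch the yield does sit inside a recursive implementation of the iterator, with the inner traversal delegated through yield from.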