> > I couldn't figure out a Perl-ish way of parsing it,
> > resorting to putting it all in a big 2D array and
> > scurrying around it. With lots of consistency
> > checking it takes about 5s to run on my 1.6GHz box
> > (yikes).
>
> _Damn_. I tried the same, briefly, when it was first
> mentioned on london.pm; and concluded that it was
> entirely impossible to do properly. Any chance of a
> copy of the source to your solution?
The problem is very similar to PCB tracks in electronics. The tracks are the
connections between the people, and the nodes are the people. The orthogonal nature
of the grid makes it fairly easy to parse... honest :P
My solution shouldn't require (that) much memory or CPU ticks. I reckon that it could
run in under a second, easily.
Here is how I'd parse the graph:
.--------- turin -------------------------------------.
Store the name, with an associated number. You then have:
.--------- 00001 -------------------------------------.
Next, each "track" we can encounter has exactly two ends. At any one time you know
any of 0/1/2 endpoints (the people these links join). You need a new datastructure
for every new track you encounter. It should minimally be:
$working_track{track_id} = [person_id_1, person_id_2];
As soon as you find two people you can place this track aside - as we've seen all of
it. Keep it though, it's our parsed data.
Remember to keep information on those .'s that are linked to the node appearing
immediately below. You'll have to be able to be able to deal with the other change
direction symbols too, but not until you've got two lines. Get an above symbol on the
first line? Die with an error.
Take the first line, and number the tracks:
2 2 00001 3 3
you'll notice that we can start filling our working track hash, from this we put in:
$working_track{2} = [1, undef];
$working_track{3} = [3, undef];
Moving onto the second line:
| .----' | ||`---------------------------. |
becomes:
4 5 5 6 789 9 a
And you fill the working track hash with:
$working_track{4} = [undef, undef];
$working_track{5} = [undef, undef];
$working_track{6} = [undef, undef];
$working_track{7} = [undef, undef];
$working_track{8} = [undef, undef];
$working_track{9} = [undef, undef];
$working_track{a} = [undef, undef];
However, remember our previous line (which we should have in parsed form), and place
it on top of what we've just found:
2 2 00001 3 3
4 5 5 6 789 9 a
And we can now modify what we've just found, since now know some of the details:
$working_track{4} = 2; # Array = track, scalar = track alias.
$working_track{5} = [1, undef];
$working_track{6} = [1, undef];
$working_track{7} = [1, undef];
$working_track{8} = [1, undef];
$working_track{9} = [1, undef];
$working_track{a} = 3;
You probably want to remove alias as soon as possible, as on a large graph you might
get hundreds of alias.
If you get two identical person_id's, or working tracks at the end then you've got a
baddly formed sexchart. Good luck in coding this up for real :) If you want further
explaination, or even working code (please don't... I'll be at it for days ;)
Take care,
Jonathan Paton