> > I couldn't figure out a Perl-ish way of parsing it,
> > resorting to putting it all in a big 2D array and
> > scurrying around it.  With lots of consistency
> > checking it takes about 5s to run on my 1.6GHz box
> > (yikes).
> 
> _Damn_.  I tried the same, briefly, when it was first
> mentioned on london.pm; and concluded that it was
> entirely impossible to do properly.  Any chance of a
> copy of the source to your solution?

The problem is very similar to PCB tracks in electronics.  The tracks are the 
connections between the people, and the nodes are the people.  The orthogonal nature 
of the grid makes it fairly easy to parse... honest :P

My solution shouldn't require (that) much memory or CPU ticks.  I reckon that it could 
run in under a second, easily.

Here is how I'd parse the graph:


     .--------- turin -------------------------------------.

Store the name, with an associated number.  You then have:

     .--------- 00001 -------------------------------------.

Next, each "track" we can encounter has exactly two ends.  At any one time you know 
any of 0/1/2 endpoints (the people these links join).  You need a new datastructure 
for every new track you encounter.  It should minimally be:

$working_track{track_id} = [person_id_1, person_id_2];

As soon as you find two people you can place this track aside - as we've seen all of 
it.  Keep it though, it's our parsed data.

Remember to keep information on those .'s that are linked to the node appearing 
immediately below.  You'll have to be able to be able to deal with the other change 
direction symbols too, but not until you've got two lines.  Get an above symbol on the 
first line?  Die with an error.

Take the first line, and number the tracks:

     2        2 00001 3                                    3

you'll notice that we can start filling our working track hash, from this we put in:

$working_track{2} = [1, undef];
$working_track{3} = [3, undef];

Moving onto the second line:

     |    .----' | ||`---------------------------.         |         

becomes:

     4    5    5 6 789                           9         a

And you fill the working track hash with:

$working_track{4} = [undef, undef];
$working_track{5} = [undef, undef];
$working_track{6} = [undef, undef];
$working_track{7} = [undef, undef];
$working_track{8} = [undef, undef];
$working_track{9} = [undef, undef];
$working_track{a} = [undef, undef];

However, remember our previous line (which we should have in parsed form), and place 
it on top of what we've just found:


     2        2 00001 3                                    3
     4    5    5 6 789                           9         a

And we can now modify what we've just found, since now know some of the details:

$working_track{4} = 2;  # Array = track, scalar = track alias.
$working_track{5} = [1, undef];
$working_track{6} = [1, undef];
$working_track{7} = [1, undef];
$working_track{8} = [1, undef];
$working_track{9} = [1, undef];
$working_track{a} = 3;

You probably want to remove alias as soon as possible, as on a large graph you might 
get hundreds of alias.

If you get two identical person_id's, or working tracks at the end then you've got a 
baddly formed sexchart.  Good luck in coding this up for real :)  If you want further 
explaination, or even working code (please don't... I'll be at it for days ;)

Take care,

Jonathan Paton  

Reply via email to