Attached is a Perl script that scans a fossil repo's wiki pages
and categorizes them as:

  * "orphans"  -- pages to which no other page links
  * "terminals" -- pages which contain no links at all
  * "nointernals" -- pages which have links, but none to other wiki
    pages (only http:, https: or mailto: links)
  * "empties" -- pages which have no content

It will also print out the (internal) links it found for those pages
which do have links.
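
To make the categories concrete, here is a minimal sketch of the same
classification run on a few made-up pages held in memory instead of a
real repo. The page names, the sample text, and the helper names
extract_links and classify are invented for illustration, and the
external-link test is anchored at the start of the link, which is an
assumption about how such links are written:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data: page name => page text.
my %text = (
    Home  => 'See [Setup] and [FAQ|the FAQ].',
    Setup => 'Download from [http://example.org/].',
    FAQ   => 'Nothing linked here.',
    Draft => '',
);

# Return every [bracketed] link target found in a page's text.
sub extract_links {
    my ($body) = @_;
    return $body =~ m/\[([^][]+)\]/g;
}

# Sort pages into the four categories; returns a hash of array refs.
sub classify {
    my (%pages) = @_;
    my (%refs, %cat);
    $cat{$_} = [] for qw(orphans terminals nointernals empties);
    for my $page (sort keys %pages) {
        my @links     = extract_links($pages{$page});
        my @internals = grep { !m/^(?:https?|mailto):/ } @links;
        if    (!@links)     { push @{ $cat{terminals} },   $page }
        elsif (!@internals) { push @{ $cat{nointernals} }, $page }
        for my $link (@internals) {
            # Links may be written [target|display text].
            my ($target) = split /\|/, $link;
            $refs{$target}++;
        }
        push @{ $cat{empties} }, $page if $pages{$page} eq '';
    }
    # Orphans can only be known once every page has been scanned.
    $cat{orphans} = [ grep { !$refs{$_} } sort keys %pages ];
    return %cat;
}

my %cat = classify(%text);
print "$_: @{ $cat{$_} }\n" for sort keys %cat;
```

With this sample data, Draft is both empty and a terminal, Setup is a
nointernal, and Home and Draft are orphans because nothing links to
them.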

This is a first edition, so there are probably bugs in it, but it's
already useful to me.

It takes one parameter, which is the fossil repo file to analyze.  If
no parameter is given (or the file does not exist), it will use the
current fossil repo (i.e. the one for the checkout in the current
directory).
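
As a rough sketch of how the optional parameter could be turned into a
fossil command line: the helper names fossil_wiki_args and run_fossil
are made up for this example, and the only fossil usage assumed is the
"wiki list" / "wiki export" subcommands with the -R option, as used in
the script itself. Passing the command as a list avoids the shell, so
page names with spaces or quotes need no escaping:

```perl
use strict;
use warnings;

# Build the argument list for a "fossil wiki" subcommand. -R is added
# only when a repository file was named and actually exists.
sub fossil_wiki_args {
    my ($repofile, @subcmd) = @_;
    my @args = ('fossil', 'wiki', @subcmd);
    push @args, '-R', $repofile if defined $repofile && -f $repofile;
    return @args;
}

# Run a command through a list-form pipe open (no shell involved) and
# return its output lines. Not called below, so the sketch runs even
# without fossil installed.
sub run_fossil {
    my @args = @_;
    open(my $fh, '-|', @args) or die "cannot run @args: $!";
    my @out = <$fh>;
    close $fh;
    return @out;
}

my $repofile = shift @ARGV;
print join(' ', fossil_wiki_args($repofile, 'list')), "\n";
```

Something like run_fossil(fossil_wiki_args($repofile, 'export', $page))
would then fetch the text of a single page.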

Any comments are appreciated.

Best regards,
Ron

#!/usr/bin/perl
use strict;
use warnings;

# Optional argument: the fossil repo file to analyze.
my $repofile = shift;
my $repocmd  = '';
$repocmd = "-R \"$repofile\"" if defined $repofile && -f $repofile;

print "Gathering list of pages...\n";
my @pages = `fossil wiki list $repocmd`;
chomp @pages;

print "Gathering page data...\n";
my %pagetext;   # page name => page text
foreach my $page ( @pages )
{
        $pagetext{$page} = `fossil wiki export "$page" $repocmd`;
}

print "Scanning for links...\n";
my %links;      # page name => ref to list of internal links on that page
my %refs;       # page name => number of internal links pointing at it
my ( @orphans, @nointernals, @terminals, @empties );
foreach my $page ( sort keys %pagetext )
{
        my $text  = $pagetext{$page};
        # Collect everything written as [link] or [link|display text].
        my @links = $text =~ m/\[([^][]+)\]/g;

        if (!@links)
        {
                push @terminals, $page;
        }
        else
        {
                # Anything without an external URL scheme counts as internal.
                my @internals = grep { $_ !~ /(?:https?|mailto):/ } @links;
                if (!@internals)
                {
                        push @nointernals, $page;
                }
                else
                {
                        $links{$page} = \@internals;
                        foreach my $internal ( @internals )
                        {
                                my ($int_link) = split /\|/, $internal;
                                $refs{$int_link}++;
                        }
                }
        }

        # A page is empty if it has no text or only the placeholder text.
        if ($text eq '' || $text =~ m/^<i>Empty Page<\/i>[ \n\r]*\z/)
        {
                push @empties, $page;
        }
}
# Orphans can only be determined once all pages have been scanned.
foreach my $page ( sort keys %pagetext )
{
        push @orphans, $page unless $refs{$page};
}
foreach my $empty ( @empties )
{
        print "empty: '$empty'\n";
}
foreach my $nointernal ( @nointernals )
{
        print "nointernals: '$nointernal'\n";
}
foreach my $terminal ( @terminals )
{
        print "terminal: '$terminal'\n";
}
foreach my $orphan ( @orphans )
{
        print "orphan: '$orphan'\n";
}
# %links only holds pages that have at least one internal link.
foreach my $page ( sort keys %links )
{
        print "links: '$page' -> ", join (", ", @{$links{$page}}), "\n";
}
_______________________________________________
fossil-users mailing list
[email protected]
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users