I've implemented jwz's mail threading algorithm, as described here

        http://www.jwz.org/doc/threading.html 

which was mildly non-trivial because he describes certain steps really
badly. I couldn't find anything on CPAN that did it (there *is* a thread
function in Mail::Cclient, though I'm not sure how decoupled that is
from the rest of the API and, of course, it's a bitch to install).

There are two problems with it.

1) It's slow. It currently takes seconds to sort 35 fairly threaded
mails, about 2 minutes to thread a heavily threaded mailbox with 661
mails in it, and anything much bigger makes it run out of memory. This
could be fixed later, either by optimising it or by rewriting it in C,
but I wanted a proof of concept first, and also to work out my second
problem, which is ...


2) I honestly can't think of a decent API.  Currently it returns a list
of Mail::Thread::Containers, which is the root set (the starts of
threads). Each Container holds a Mail::Thread::Message and a list of
children (all of which are Containers).

A Mail::Thread::Message contains a message_id, a list of references, and
a subject.
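
To make that structure concrete, here is a stripped-down sketch of the two
classes. The constructor arguments, accessor names and add_child are made up
for illustration; this is not the actual code, just roughly the shape of it.

```perl
use strict;
use warnings;

# Hypothetical minimal versions of the two classes described above.
# Field and method names are assumptions, not the real module.
package Mail::Thread::Message;

sub new {
    my ($class, %args) = @_;
    return bless {
        message_id => $args{message_id},
        references => $args{references} || [],
        subject    => $args{subject},
    }, $class;
}
sub message_id { $_[0]{message_id} }
sub references { @{ $_[0]{references} } }
sub subject    { $_[0]{subject} }

package Mail::Thread::Container;

# A container wraps one message (which may be undef for a message
# that is referenced but not present) plus its child containers.
sub new {
    my ($class, $message) = @_;
    return bless { message => $message, children => [] }, $class;
}
sub message  { $_[0]{message} }
sub children { @{ $_[0]{children} } }
sub add_child {
    my ($self, $child) = @_;
    push @{ $self->{children} }, $child;
    return $self;
}

package main;

my $root = Mail::Thread::Container->new(
    Mail::Thread::Message->new(
        message_id => '<1@example.com>',
        subject    => 'Threading',
    ));
my $reply = Mail::Thread::Container->new(
    Mail::Thread::Message->new(
        message_id => '<2@example.com>',
        references => ['<1@example.com>'],
        subject    => 'Re: Threading',
    ));
$root->add_child($reply);

# The root set is just the list of top-level containers.
my @root_set = ($root);
```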

My idea for an API was to do the sorting as it is at the moment and then
convert everything so that instead of a Container with a message and
children in it you have a Message which @ISA Mail::Internet but also has
a list of children and possibly a depth and maybe a pointer to a parent.
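
The depth and parent pointer could be filled in with one walk over the
finished tree. A rough sketch, using plain hashes instead of real message
objects (the id/children/depth/parent keys and the annotate function are
made up for illustration). The parent reference is weakened here because a
parent <-> child cycle would otherwise never be freed, which seems worth
avoiding given the memory problems above:

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# Hypothetical node layout: { id => ..., children => [ ... ] }.
# Walks the tree and records a depth and a weak parent reference
# on every node.
sub annotate {
    my ($node, $parent, $depth) = @_;
    $node->{depth} = $depth;
    if ($parent) {
        $node->{parent} = $parent;
        weaken($node->{parent});    # avoid a parent <-> child cycle
    }
    annotate($_, $node, $depth + 1) for @{ $node->{children} || [] };
    return $node;
}

my $reply = { id => '<2@example.com>' };
my $root  = { id => '<1@example.com>', children => [$reply] };
annotate($root, undef, 0);
```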

Then you'd do something like ...


my $thread = Mail::Thread->new(mbox=>'/home/simon/mail/inbox');
# could also do maildir and poss. imap and pop3

$thread->sort(\&by_date); # $a and $b will be Mail::Internet messages
$thread->calculate();

my @root_set =  $thread->root_set(); # returns it all at once

# alternatively
while (my ($msg, $depth) = $thread->next_message())  {
        # do whatever
}
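
Something like next_message() could be a plain depth-first walk over the
root set that hands back each message along with its depth. A minimal
sketch, again with plain hashes standing in for messages (make_iterator and
the id/children keys are made up for illustration, not part of any existing
module):

```perl
use strict;
use warnings;

# Hypothetical node layout: { id => ..., children => [ ... ] }.
# Returns a closure that yields ($node, $depth) pairs in depth-first
# order, and an empty list once everything has been visited.
sub make_iterator {
    my @roots = @_;
    # Explicit stack instead of recursion; reversed so that the
    # first root is popped first.
    my @stack = map { [ $_, 0 ] } reverse @roots;
    return sub {
        my $entry = pop @stack or return;
        my ($node, $depth) = @$entry;
        push @stack,
            map { [ $_, $depth + 1 ] } reverse @{ $node->{children} || [] };
        return ($node, $depth);
    };
}

my @root_set = (
    { id => 'a', children => [
        { id => 'b', children => [ { id => 'c' } ] },
        { id => 'd' },
    ] },
    { id => 'e' },
);

my $next = make_iterator(@root_set);
my @seen;
while (my ($msg, $depth) = $next->()) {
    # Indent each message id by its depth in the thread.
    push @seen, ('  ' x $depth) . $msg->{id};
}
print "$_\n" for @seen;
```

The explicit stack keeps the walk from recursing on very deep threads, and
the caller only ever holds one message at a time.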



Comments?


-- 
                       an atmosphere of PhD student 
  with a touch of alternative elitist radical
