thread ordering based on references and/or in-reply-to

2011-11-04 Thread Dirk-Jan C. Binnema


On Wed 02 Nov 2011 04:37:05 PM EET, Austin Clements wrote:

 > On Mon, Oct 31, 2011 at 7:07 PM, Florian Friesdorf wrote:
 > >
 > > Hi,
 > >
 > > I'm looking into taking the References header into account for thread
 > > ordering. So far only In-Reply-To is used. My C/C++ is rusty at best, so
 > > I'd need some help to get this done.



 > I know this came up on IRC, but have you looked at jwz's threading
 > algorithm (http://www.jwz.org/doc/threading.html)?  Carl mentioned
 > that notmuch already implements it (except for subject matching), but
 > notmuch only implements the subset of it necessary to group messages
 > into threads without structure.  Much of the algorithm is devoted to
 > exactly this problem of piecing together the thread structure based on
 > all of the information in both In-Reply-To and References.  The
 > algorithm as described combines the issues of grouping and structuring
 > since it's expecting a giant pile of mail as input, but there's no
 > reason these can't be teased apart.

I've implemented it for mu[1]; maybe some of it can be reused for notmuch.
See mu-threader.[ch] and mu-container.[ch] in

   http://gitorious.org/mu/mu/blobs/master/src/

(starting point is mu_threader_calculate).

I haven't implemented subject matching yet, but it does build the hierarchy
as per JWZ and "References:".

Best wishes,
Dirk.

-- 
Dirk-Jan C. Binnema  Helsinki, Finland
e:djcb at djcbsoftware.nl   w:www.djcbsoftware.nl
pgp: D09C E664 897D 7D39 5047 A178 E96A C7A1 017D DA3C


___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


thread ordering based on references and/or in-reply-to

2011-11-02 Thread Austin Clements

On Mon, Oct 31, 2011 at 7:07 PM, Florian Friesdorf wrote:
>
> Hi,
>
> I'm looking into taking the References header into account for thread
> ordering. So far only In-Reply-To is used. My C/C++ is rusty at best, so
> I'd need some help to get this done.
>
> Carl already tried to clear things up for me on IRC; after reading into
> it, I have more questions:
>
> lib/thread.cc/_resolve_thread_relationships adds messages as replies to
> a parent.
>
> Currently, we seem to treat In-Reply-To as empty or single msgid. If I
> understand rfc822 it can be a list of msgids and/or phrases. Do/shall we
> support that?
>
> References is a list of msgids, with the last one being the direct
> parent. I don't know how multiple direct parents are handled here.
>
> DJB recommends "... readers look for identifiers in In-Reply-To and
> append them to References if they are not already included in
> References." [1]
>
> In that case, if there are two msgids in In-Reply-To and they are
> appended to the References list, then only the last one will be a parent
> and the one that used to be the last is not a parent anymore.
>
> And Carl recommends treating references and in-reply-to as two separate
> sources of information, first using in-reply-to and then references in
> order "to attach to the deepest referenced parent".
>
> I fail to understand that. Am I complicating things?
> How do we want to treat the combination of References/In-Reply-To?
>
> Do we have code that returns the last msgid listed in references?
> database.cc/parse_references seems not to care about order, just
> existence - or is GHashTable ordered?
>
> [1] http://cr.yp.to/immhf/thread.html
>
>
> florian

I know this came up on IRC, but have you looked at jwz's threading
algorithm (http://www.jwz.org/doc/threading.html)?  Carl mentioned
that notmuch already implements it (except for subject matching), but
notmuch only implements the subset of it necessary to group messages
into threads without structure.  Much of the algorithm is devoted to
exactly this problem of piecing together the thread structure based on
all of the information in both In-Reply-To and References.  The
algorithm as described combines the issues of grouping and structuring
since it's expecting a giant pile of mail as input, but there's no
reason these can't be teased apart.
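
To make the structuring half concrete, here is a minimal Python sketch of the linking step in the spirit of jwz's description. It is not notmuch's or mu's code: `Container`, `link` and `build_tree` are names invented for this illustration, and each message is assumed to arrive with its References already parsed into a list ordered oldest first.

```python
class Container:
    """Hypothetical node: one per Message-ID seen, even if the mail is missing."""

    def __init__(self, msgid):
        self.msgid = msgid
        self.parent = None
        self.children = []


def is_ancestor(candidate, node):
    # True if candidate is node itself or sits somewhere above it.
    while node is not None:
        if node is candidate:
            return True
        node = node.parent
    return False


def link(parent, child):
    # Keep an existing parent, and refuse links that would create a loop.
    if child.parent is not None or is_ancestor(child, parent):
        return
    child.parent = parent
    parent.children.append(child)


def build_tree(messages):
    """messages: iterable of (msgid, references) pairs, references oldest first."""
    table = {}

    def get(msgid):
        return table.setdefault(msgid, Container(msgid))

    for msgid, refs in messages:
        node = get(msgid)
        # Each reference is treated as the parent of the one that follows it.
        for older, newer in zip(refs, refs[1:]):
            link(get(older), get(newer))
        if refs:
            new_parent = get(refs[-1])
            if not is_ancestor(node, new_parent):
                # A message's own headers override a parent guessed earlier
                # from some other message's References list.
                if node.parent is not None:
                    node.parent.children.remove(node)
                    node.parent = None
                link(new_parent, node)
    return table


tree = build_tree([("a", []), ("c", ["a", "b"]), ("b", ["a"])])
print(tree["c"].parent.msgid)  # b
```

Grouping falls out of the same table: two messages belong to the same thread when following `parent` from each of them ends at the same root.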



thread ordering based on references and/or in-reply-to

2011-10-31 Thread Florian Friesdorf

Hi,

I'm looking into taking the References header into account for thread
ordering. So far only In-Reply-To is used. My C/C++ is rusty at best, so
I'd need some help to get this done.

Carl already tried to clear things up for me on IRC; after reading into
it, I have more questions:

lib/thread.cc/_resolve_thread_relationships adds messages as replies to
a parent.

Currently, we seem to treat In-Reply-To as empty or a single msgid. If I
understand RFC 822, it can be a list of msgids and/or phrases. Should we
support that?

References is a list of msgids, with the last one being the direct
parent. I don't know how multiple direct parents are handled here.

DJB recommends "... readers look for identifiers in In-Reply-To and
append them to References if they are not already included in
References." [1]

In that case, if there are two msgids in In-Reply-To and they are
appended to the References list, then only the last one will be a parent
and the one that used to be the last is not a parent anymore.
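
A small Python sketch of the merge rule quoted above may make the consequence easier to see. The function names are invented for this illustration, and both headers are assumed to be already parsed into lists of msgids in header order.

```python
def merge_references(references, in_reply_to):
    """Append In-Reply-To ids to References unless they are already listed."""
    merged = list(references)
    for msgid in in_reply_to:
        if msgid not in merged:
            merged.append(msgid)
    return merged


def direct_parent(merged):
    # Under the "last entry is the parent" reading of References.
    return merged[-1] if merged else None


merged = merge_references(["<a@x>", "<b@x>"], ["<c@x>", "<d@x>"])
print(merged)                 # ['<a@x>', '<b@x>', '<c@x>', '<d@x>']
print(direct_parent(merged))  # <d@x>
```

With two ids in In-Reply-To, `<b@x>` stops being the last entry and `<d@x>` becomes the only candidate parent, which is the situation described above. When In-Reply-To only repeats the last reference, the merged list and the parent are unchanged.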

And Carl recommends treating references and in-reply-to as two separate
sources of information, first using in-reply-to and then references in
order "to attach to the deepest referenced parent". 

I fail to understand that. Am I complicating things?
How do we want to treat the combination of References/In-Reply-To?

Do we have code that returns the last msgid listed in references?
database.cc/parse_references seems not to care about order, just
existence - or is GHashTable ordered?
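
For reference, GLib does not define an iteration order for GHashTable, so a hash table of msgids can answer "was this id mentioned?" but not "which id came last?". Here is a rough Python sketch of an order-preserving alternative. The regular expression is a simplification of the RFC 822 msg-id grammar, and the function names are made up for this example; they are not notmuch functions.

```python
import re

# Simplified: anything between angle brackets without whitespace counts as a msgid.
MSGID_RE = re.compile(r"<[^<>\s]+>")


def parse_references_ordered(header):
    """Return the msgids in header order, dropping repeated ids."""
    ids = MSGID_RE.findall(header)
    # dict keys keep insertion order, so each id stays at its first position.
    return list(dict.fromkeys(ids))


def last_reference(header):
    """Return the final msgid in the header, or None if there is none."""
    ids = MSGID_RE.findall(header)
    return ids[-1] if ids else None


header = "<a@x>\n <b@x> <a@x> <c@x>"
print(parse_references_ordered(header))  # ['<a@x>', '<b@x>', '<c@x>']
print(last_reference(header))            # <c@x>
```

Because the pattern only picks out bracketed ids, the same approach skips any free-text phrases that RFC 822 allows in In-Reply-To.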

[1] http://cr.yp.to/immhf/thread.html


florian
-- 
Florian Friesdorf 
  GPG FPR: 7A13 5EEE 1421 9FC2 108D  BAAF 38F8 99A3 0C45 F083
Jabber/XMPP: f...@chaoflow.net
IRC: chaoflow on freenode,ircnet,blafasel,OFTC

