Gregory Stark wrote:

> I would be curious to see the average lifespan of threads over time.

I happen to have the mail archives stored in a database, so I've
expressed this in SQL; below are some results for hackers and
general, 2007-2008. "count" is the number of distinct threads whose
oldest message falls in the specified month. A thread starts as
soon as a message has an In-Reply-To field pointing to an
existing Message-Id.
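
For readers without the database, the grouping described above can be
sketched in Python. This is not the SQL query used for the results
below; it is a minimal illustration that assumes a thread's span is the
time between its oldest and newest message. The sample messages and the
names thread_spans and monthly_stats are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import median

# Hypothetical sample data: (message_id, in_reply_to, date).
messages = [
    ("a1", None, datetime(2007, 1, 3, 9, 0)),
    ("a2", "a1", datetime(2007, 1, 3, 15, 0)),
    ("a3", "a2", datetime(2007, 1, 5, 9, 0)),
    ("b1", None, datetime(2007, 1, 10, 12, 0)),
    ("b2", "b1", datetime(2007, 1, 10, 13, 0)),
    ("c1", None, datetime(2007, 2, 1, 8, 0)),   # never answered: no thread
    ("d1", None, datetime(2007, 2, 20, 8, 0)),
    ("d2", "d1", datetime(2007, 3, 2, 8, 0)),
]


def thread_spans(messages):
    """Map each thread root to (date of oldest message, span)."""
    parent = {mid: in_reply_to for mid, in_reply_to, _ in messages}
    known = set(parent)

    def root(mid):
        # Follow In-Reply-To links while they point to a known Message-Id.
        seen = set()
        while parent.get(mid) in known and mid not in seen:
            seen.add(mid)
            mid = parent[mid]
        return mid

    dates = defaultdict(list)
    for mid, _, date in messages:
        dates[root(mid)].append(date)

    # A thread exists only once a reply points to an existing message.
    return {
        r: (min(ds), max(ds) - min(ds))
        for r, ds in dates.items()
        if len(ds) > 1
    }


def monthly_stats(messages):
    """Return {month: (avg span, median span, count)} keyed by 'YYYY-MM'."""
    by_month = defaultdict(list)
    for start, span in thread_spans(messages).values():
        by_month[start.strftime("%Y-%m")].append(span.total_seconds())
    return {
        month: (
            timedelta(seconds=sum(secs) / len(secs)),
            timedelta(seconds=median(secs)),
            len(secs),
        )
        for month, secs in sorted(by_month.items())
    }


for month, (avg, med, count) in monthly_stats(messages).items():
    print(month, avg, med, count)
```

Each thread is attributed to the month of its oldest message, so a
thread that begins in February and ends in March counts for February
only.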

Results for pgsql-hackers:

  month  |     avg span     |   median span   | count
---------+------------------+-----------------+-------
 2007-01 | 7 days 10:00:00  | 1 day 04:18:00  |   211
 2007-02 | 7 days 10:00:00  | 1 day 00:23:48  |   186
 2007-03 | 16 days 30:00:00 | 1 day 05:45:37  |   171
 2007-04 | 13 days 26:00:00 | 19:07:00        |   142
 2007-05 | 19 days 30:00:00 | 1 day 04:46:36  |   122
 2007-06 | 15 days 19:00:00 | 23:38:13        |   111
 2007-07 | 19 days 25:00:00 | 21:04:04        |   106
 2007-08 | 13 days 30:00:00 | 20:26:39        |   133
 2007-09 | 21 days 32:00:00 | 1 day 16:43:10  |   121
 2007-10 | 13 days 19:00:00 | 17:23:24        |   148
 2007-11 | 16 days 15:00:00 | 16:23:00        |   140
 2007-12 | 17 days 16:00:00 | 1 day 07:28:05  |    81
 2008-01 | 13 days 12:00:00 | 23:02:33        |   127
 2008-02 | 9 days 11:00:00  | 12:44:28        |   130
 2008-03 | 10 days 14:00:00 | 22:57:18        |   140
 2008-04 | 10 days 14:00:00 | 1 day 00:32:34  |   132
 2008-05 | 13 days 09:00:00 | 1 day 20:57:57  |   113
 2008-06 | 7 days 27:00:00  | 1 day 05:42:46  |   102
 2008-07 | 13 days 26:00:00 | 2 days 07:43:34 |   133
 2008-08 | 9 days 33:00:00  | 1 day 07:47:09  |   121
 2008-09 | 7 days 25:00:00  | 1 day 19:00:50  |   125
 2008-10 | 6 days 14:00:00  | 1 day 10:31:01  |   178

Results for pgsql-general:

  month  |    avg span     | median span | count
---------+-----------------+-------------+-------
 2007-01 | 1 day 25:00:00  | 10:57:11    |   329
 2007-02 | 2 days 28:00:00 | 10:50:38    |   295
 2007-03 | 3 days 08:00:00 | 14:54:08    |   310
 2007-04 | 6 days 18:00:00 | 17:40:55    |   244
 2007-05 | 3 days 22:00:00 | 16:43:54    |   287
 2007-06 | 2 days 13:00:00 | 11:26:46    |   297
 2007-07 | 2 days 19:00:00 | 11:59:40    |   263
 2007-08 | 3 days 14:00:00 | 16:35:16    |   335
 2007-09 | 3 days 14:00:00 | 13:23:09    |   245
 2007-10 | 2 days 16:00:00 | 08:46:09    |   302
 2007-11 | 3 days 07:00:00 | 08:28:06    |   294
 2007-12 | 2 days 31:00:00 | 10:25:14    |   255
 2008-01 | 2 days 14:00:00 | 13:23:12    |   248
 2008-02 | 2 days 14:00:00 | 10:02:16    |   257
 2008-03 | 1 day 25:00:00  | 13:20:06    |   245
 2008-04 | 1 day 30:00:00  | 08:26:06    |   238
 2008-05 | 3 days 22:00:00 | 18:58:27    |   211
 2008-06 | 2 days 24:00:00 | 14:46:02    |   191
 2008-07 | 1 day 29:00:00  | 10:37:17    |   221
 2008-08 | 1 day 22:00:00  | 14:14:45    |   205
 2008-09 | 1 day 24:00:00  | 14:26:26    |   202
 2008-10 | 1 day 19:00:00  | 12:32:56    |   219

"median span" is the median computed with the PL/R median function
applied to the intervals expressed as a number of seconds, then cast
back to intervals for display. I believe the median helps mitigate
the contribution of messages with wrong dates and of posters who
reply to very old messages. And the median span appears to differ a
lot from the average span.
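
To illustrate why the two columns diverge, here is a small sketch with
made-up spans, one of which stands for a reply to a very old message.
It uses Python's statistics module rather than PL/R, but follows the
same approach: convert the intervals to seconds, aggregate, and convert
back for display.

```python
from datetime import timedelta
from statistics import mean, median

# Hypothetical thread spans; the last one mimics a late reply
# (or a wrong Date header) that stretches a thread to 400 days.
spans = [
    timedelta(hours=2),
    timedelta(hours=10),
    timedelta(hours=20),
    timedelta(days=1, hours=4),
    timedelta(days=400),
]

# Intervals as a number of seconds, then back to intervals for display.
seconds = [s.total_seconds() for s in spans]
avg_span = timedelta(seconds=mean(seconds))
median_span = timedelta(seconds=median(seconds))

print("avg span:   ", avg_span)
print("median span:", median_span)
```

A single outlier pulls the average up to about 80 days, while the
median stays at 20 hours.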

If people feel like playing with the database to build other queries,
feel free to bug me off-list about it. I can arrange to make a dump
available or share the scripts to build it yourself from the
mailbox archives.

 Best regards,
--
 Daniel
 PostgreSQL-powered mail user agent and storage:
http://www.manitou-mail.org



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
