Re: Why does mutt sees Mailman mailboxes as empty?

2009-06-13 Thread Cameron Simpson
On 10May2009 19:40, I wrote:
| On 10May2009 11:13, M. Fioretti  wrote:
| | > http://www.cskk.ezoshosting.com/cs/css/bin/get-mailman-archive which
[...]
| | >   http://www.cskk.ezoshosting.com/cs/css/bin/un-at-
| | the regular expression in the second script above should be updated.
| | As it is now, it won't recognize .edu, .int and .mil addresses. I know
| | how to change it myself:
| | 
| |   s/(\S) at (([-\w]+\.)(net|com|biz|org|info|edu|mil|int|[a-z][a-z]))\b/$...@$2/ig;
[...]
| Change applied, thanks; it may be a while before it's visible on the web site.

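George Davidovich reported a bug in this regexp: the group
  ([-\w]+\.)
is supposed to be followed by a "+". Without it, only domains with a
single label before the TLD are matched, so an address such as
"user at mail.example.org" is left munged.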

A new script with the bugfix is up at:
  http://www.cskk.ezoshosting.com/cs/css/bin/un-at-
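
For anyone who would rather not read Perl, here is a rough Python sketch
of the corrected substitution next to the old one. It is not the un-at-
script itself: the names OLD_MUNGED, MUNGED and un_at are made up for
illustration, and the TLD list is simply the one quoted in this thread.

```python
import re

TLDS = r"(?:net|com|biz|org|info|edu|mil|int|[a-z][a-z])"

# Old pattern: exactly one "label." is allowed before the TLD.
OLD_MUNGED = re.compile(r"(\S) at ((?:[-\w]+\.)" + TLDS + r")\b", re.IGNORECASE)

# Fixed pattern: the "+" after the group accepts one or more "label." parts.
MUNGED = re.compile(r"(\S) at ((?:[-\w]+\.)+" + TLDS + r")\b", re.IGNORECASE)


def un_at(text: str) -> str:
    """Turn 'user at host.example.org' back into 'user@host.example.org'."""
    return MUNGED.sub(r"\1@\2", text)


if __name__ == "__main__":
    sample = "From: user at mail.example.org (Some User)"
    print(OLD_MUNGED.sub(r"\1@\2", sample))  # left untouched by the old pattern
    print(un_at(sample))                      # rewritten by the fixed pattern
```

Note that a pattern like this will also rewrite ordinary prose such as
"look at example.com", so it is worth checking the result by eye.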

Thanks George!
-- 
Cameron Simpson  DoD#743
http://www.cskk.ezoshosting.com/cs/

| I'm the female partner of a climber (I don't climb) and until now, I was
| under the impression that climbers are cool people, but alas, you had to
| ruin it for me.
*REAL* climbers are crude, impolite, solitary, abrupt, arrogant.  Sport
climbers are cool.
- Rene Tio  in rec.climbing


Re: Why does mutt sees Mailman mailboxes as empty?

2009-05-12 Thread Christian Brabandt
Hi Rocco!

On Di, 12 Mai 2009, Rocco Rutte wrote:

> Hi,
> 
> * Christian Brabandt wrote:
> > http://blog.256bit.org/archives/345-Mutt-als-Mailbox-Konvertierer.html
> 
> Regarding that entry: compressed folders support is not in the mainline.

Thanks, I'll add a note. BTW: What's the reason it is not included? I
find it incredibly useful.

> Two comments: First, mutt does more than is necessary when saving
> messages or opening folders, so a formail+procmail combination may be
> more suitable.

Yeah, but I usually do not fiddle with procmail/formail since I use 
Sieve and thus it was a lot easier for me to figure that task out 
using mutt than to install and configure procmail. But YMMV of course.

> Second, try the -n switch (it makes mutt skip the system-wide
> Muttrc) and see if that speeds things up a little.

Thanks, I'll add a note.

But anyway, speed wasn't an issue for me and it just works™.

regards,
Christian
-- 
hundred-and-one symptoms of being an internet addict:
236. You start saving URL's in your digital watch.


Re: Why does mutt sees Mailman mailboxes as empty?

2009-05-12 Thread Rocco Rutte
Hi,

* Christian Brabandt wrote:

> I have been doing something like this using mutt. For those who
> understand German, I once documented this approach here:
> http://blog.256bit.org/archives/345-Mutt-als-Mailbox-Konvertierer.html

Regarding that entry: compressed folders support is not in the mainline.

> in short:
> you create a simple muttrc file like this:
[...]
> and then start mutt like this:
> 
> mutt -F muttrc_convert -f mbox_file
> 
> This won't touch your original mbox file; it only reads it and then
> stores your mails in the desired format at ~/mutt_archive as either
> Maildir, MH, mbox or MMDF, depending on your output format.

Two comments: First, mutt does more than is necessary when saving
messages or opening folders, so a formail+procmail combination may be
more suitable. Second, try the -n switch (it makes mutt skip the
system-wide Muttrc) and see if that speeds things up a little.

Rocco


Re: Why does mutt sees Mailman mailboxes as empty?

2009-05-12 Thread Christian Brabandt
Hi M.!

On Sa, 09 Mai 2009, M. Fioretti wrote:

> My real interest was testing automated mbox-to-maildir conversion
> via mutt on some sample email that I needed to analyze anyway; this
> issue was really unexpected.
> 

I have been doing something like this using mutt. For those who
understand German, I once documented this approach here:
http://blog.256bit.org/archives/345-Mutt-als-Mailbox-Konvertierer.html

in short:
you create a simple muttrc file like this:
,[ muttrc_convert ]-
| # Specify your target type
| # use one of
| # Maildir, MH, mbox or MMDF
| set mbox_type=Maildir
| 
| # Where to store the mails
| # the path needs to exist!
| set my_archivedir="~/mutt_archive/$mbox_type"
| 
| # Create new mailboxes without confirmation
| set confirmcreate=no
| 
| # append mails without confirmation
| set confirmappend=no
| 
| # quit without confirmation
| set quit=yes
| 
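| # NOTE: the <function> names were lost in the archived copy of this
| # mail; the ones below are a best-guess reconstruction.
| folder-hook . 'push <tag-pattern>~A<enter>\
| <tag-prefix><copy-message>\
| $my_archivedir<enter><quit>'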
`

and then start mutt like this:

mutt -F muttrc_convert -f mbox_file

This won't touch your original mbox file; it only reads it and then
stores your mails in the desired format at ~/mutt_archive as either
Maildir, MH, mbox or MMDF, depending on your output format.
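
If you would rather not drive mutt for this, the same read-only copy can
be sketched with Python's standard mailbox module. This is a different
technique from the muttrc above, not a translation of it; the function
name mbox_to_maildir and the paths are only placeholders. Python's mbox
reader may also accept 'From ' lines that mutt rejects, so the two tools
can disagree on a broken archive.

```python
import mailbox


def mbox_to_maildir(mbox_path: str, maildir_path: str) -> int:
    """Copy every message from an mbox file into a Maildir; return the count."""
    # create=False: fail loudly if the source mbox does not exist.
    source = mailbox.mbox(mbox_path, create=False)
    # create=True: build the Maildir (cur/new/tmp) if it is not there yet.
    target = mailbox.Maildir(maildir_path, create=True)
    copied = 0
    try:
        for message in source:
            target.add(message)  # the source message is only read, never changed
            copied += 1
    finally:
        source.close()
        target.close()
    return copied


if __name__ == "__main__":
    import sys

    print(mbox_to_maildir(sys.argv[1], sys.argv[2]), "messages copied")
```

The returned count is handy for comparing against the number of messages
you expect to find in the archive.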

HTH,
Christian
-- 
hundred-and-one symptoms of being an internet addict:
234. You started college as a chemistry major, and walk out four years
 later as an Internet provider.


Re: Why does mutt sees Mailman mailboxes as empty?

2009-05-10 Thread Cameron Simpson
On 10May2009 11:13, M. Fioretti  wrote:
| Cameron,
| thanks a lot for these scripts! Just a couple of notes:
| 
| On Sun, May 10, 2009 18:41:44 PM +1000, Cameron Simpson wrote:
| 
| > http://www.cskk.ezoshosting.com/cs/css/bin/get-mailman-archive which
| > fetches all the archives
| 
| looking at the code, this downloads all and only the gzipped mbox
| files, right?

Yes. Conveniently you can concatenate gzip files and ungzip the result.
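
A small Python illustration of that gzip property, in case it is
unfamiliar. The two byte strings merely stand in for two monthly archive
files, and the function name join_gzip_members is made up for this
example; it is not part of get-mailman-archive.

```python
import gzip


def join_gzip_members(*compressed: bytes) -> bytes:
    """Concatenate gzip members byte-wise and decompress them in one go."""
    # A sequence of gzip members is itself a valid gzip stream, and
    # gzip.decompress() handles multi-member data.
    return gzip.decompress(b"".join(compressed))


if __name__ == "__main__":
    may = gzip.compress(b"first month\n")
    june = gzip.compress(b"second month\n")
    print(join_gzip_members(may, june).decode(), end="")
```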

| >   http://www.cskk.ezoshosting.com/cs/css/bin/fix-mail-dates
| >   http://www.cskk.ezoshosting.com/cs/css/bin/un-at-
| 
| the regular expression in the second script above should be updated.
| As it is now, it won't recognize .edu, .int and .mil addresses. I know
| how to change it myself:
| 
|   s/(\S) at (([-\w]+\.)(net|com|biz|org|info|edu|mil|int|[a-z][a-z]))\b/$...@$2/ig;
| 
| it's just to let others know. Am I forgetting any other relevant TLD?

Change applied, thanks; it may be a while before it's visible on the web site.

| Thanks again for this and all the other cool scripts at your site,

A pleasure.

Cheers,
-- 
Cameron Simpson  DoD#743
http://www.cskk.ezoshosting.com/cs/

A cookie store is a bad idea. Besides, the market research reports say
America likes crispy cookies, not soft and chewy cookies like you make.
  --Response to Debbi Fields' idea of starting Mrs. Fields' Cookies.


Re: Why does mutt sees Mailman mailboxes as empty?

2009-05-10 Thread M. Fioretti
Cameron,
thanks a lot for these scripts! Just a couple of notes:

On Sun, May 10, 2009 18:41:44 PM +1000, Cameron Simpson wrote:

> http://www.cskk.ezoshosting.com/cs/css/bin/get-mailman-archive which
> fetches all the archives

looking at the code, this downloads all and only the gzipped mbox
files, right?
 
>   http://www.cskk.ezoshosting.com/cs/css/bin/fix-mail-dates
>   http://www.cskk.ezoshosting.com/cs/css/bin/un-at-

the regular expression in the second script above should be updated.
As it is now, it won't recognize .edu, .int and .mil addresses. I know
how to change it myself:

  s/(\S) at (([-\w]+\.)(net|com|biz|org|info|edu|mil|int|[a-z][a-z]))\b/$...@$2/ig;

it's just to let others know. Am I forgetting any other relevant TLD?
 
Thanks again for this and all the other cool scripts at your site,

Marco Fioretti
Digital Rights writings -> http://mfioretti.com
-- 
Your own civil rights and the quality of your life heavily depend on how
software is used *around* you:http://digifreedom.net/node/84


Re: Why does mutt sees Mailman mailboxes as empty?

2009-05-10 Thread Cameron Simpson
On 09May2009 19:08, M. Fioretti  wrote:
| On Sat, May 09, 2009 09:47:06 AM -0700, George Davidovich wrote:
| > On Sat, May 09, 2009 at 04:49:50PM +0200, M. Fioretti wrote:
| >
| > > From mfioretti at nexaima.net  Sun May 25 07:56:41 2008
| > > From: mfioretti at nexaima.net (M. Fioretti)
| > 
| > Note that 'From ' and 'From:' are distinct; the space in the former is
| > not a typo.
| 
| Yes, I know. I also agree with your other advice, thanks. In this
| particular case what I'd need to do is simply to open each mailbox
| file and save it in maildir format. I don't need to reply, etc...

I routinely download mailman archives when I join such a mailing list
so I can search them locally in the future (mairix etc). They do indeed
need munging before use. I wrote this script:

  http://www.cskk.ezoshosting.com/cs/css/bin/get-mailman-archive

which fetches all the archives, concatenates them and then filters the
almost-mbox result through these two scripts:

  http://www.cskk.ezoshosting.com/cs/css/bin/fix-mail-dates
  http://www.cskk.ezoshosting.com/cs/css/bin/un-at-

and the result is a clean mbox file. I then move them into maildirs with
mutt's copy-message facility. However, the above scripts will let you
convert mailman archives into useable mbox files.

| My real interest was testing automated mbox-to-maildir conversion
| via mutt on some sample email that I needed to analyze anyway; this
| issue was really unexpected.

Enjoy,
-- 
Cameron Simpson  DoD#743
http://www.cskk.ezoshosting.com/cs/

Out on the road, feeling the breeze, passing the cars.  - Bob Seger


Re: Why does mutt sees Mailman mailboxes as empty?

2009-05-09 Thread M. Fioretti
On Sat, May 09, 2009 09:47:06 AM -0700, George Davidovich wrote:
> On Sat, May 09, 2009 at 04:49:50PM +0200, M. Fioretti wrote:
>
> > From mfioretti at nexaima.net  Sun May 25 07:56:41 2008
> > From: mfioretti at nexaima.net (M. Fioretti)
> 
> Note that 'From ' and 'From:' are distinct; the space in the former is
> not a typo.

Yes, I know. I also agree with your other advice, thanks. In this
particular case what I'd need to do is simply to open each mailbox
file and save it in maildir format. I don't need to reply, etc...

I only need to put each message in a separate file, because I would
like to figure out how many list subscribers use each mail client, who
are the most active posters each month, how many use HTML email and
other similar statistics, and it would be easier to script all this if
each message were in a separate file.
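
To make the goal concrete, the kind of tally I have in mind could look
roughly like this in Python. It is only a sketch: the function name
summarize is made up, it counts posters overall rather than per month,
and it assumes the archive still carries User-Agent or X-Mailer headers,
which a Mailman archive may well have stripped.

```python
import email
from collections import Counter


def summarize(raw_messages):
    """Tally mail clients, posters and HTML usage over raw message strings."""
    clients, posters, html = Counter(), Counter(), 0
    for raw in raw_messages:
        msg = email.message_from_string(raw)
        # Fall back to "unknown" when neither client header is present.
        clients[msg.get("User-Agent") or msg.get("X-Mailer") or "unknown"] += 1
        posters[msg.get("From", "unknown")] += 1
        # Count a message as HTML if any MIME part is text/html.
        if any(part.get_content_type() == "text/html" for part in msg.walk()):
            html += 1
    return clients, posters, html
```

With one message per file, raw_messages would simply be the contents of
each file read as text.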

Why do the mbox->maildir conversion with mutt? Because I've found
several ways online to do it with mutt from a shell script, and
I wanted to test them on a pet project before running them on my own
archive, which is much bigger and more valuable to me than a public
mailing list.

But of course, to make those mutt tricks work, the initial mbox file
must be such that mutt can recognize and parse it... at least
well enough to find message boundaries without errors. Hence my
initial question.

My real interest was testing automated mbox-to-maildir conversion
via mutt on some sample email that I needed to analyze anyway; this
issue was really unexpected.

Thanks,
Marco
-- 
Your own civil rights and the quality of your life heavily depend on how
software is used *around* you:http://digifreedom.net/node/84


Re: Why does mutt sees Mailman mailboxes as empty?

2009-05-09 Thread George Davidovich
On Sat, May 09, 2009 at 04:49:50PM +0200, M. Fioretti wrote:
> On Sat, May 09, 2009 07:02:14 AM -0700, George Davidovich wrote:
> 
> > Email address munging where 'u...@example.org' is replaced with some
> > variation of 'user at example.org'
> 
> Holy cow, you're right.  I know about this practice, but had forgotten
> the downloaded mailbox would have this problem. This is what I get
> from grepping my own address from that file:
> 
> From mfioretti at nexaima.net  Sat May 17 08:23:09 2008
> From: mfioretti at nexaima.net (M. Fioretti)
> From mfioretti at nexaima.net  Sun May 25 07:23:42 2008
> From: mfioretti at nexaima.net (M. Fioretti)
> From mfioretti at nexaima.net  Sun May 25 07:56:41 2008
> From: mfioretti at nexaima.net (M. Fioretti)

Note that 'From ' and 'From:' are distinct; the space in the former is
not a typo.

> So is this (munged email addresses) what would prevent mutt from
> detecting message boundaries?

My guess is that mutt doesn't see any valid messages, let alone any
boundaries.  If you de-munged the addresses, however, each 'From '
line would be in proper format, so messages and message boundaries would
be interpreted correctly by mutt.

That's not to say you should expect that a given archive contains 
'From ' lines, or that if it does, they're in a valid format with or
without address munging.  Like I said, don't be surprised by what you
download.

> I also have no problem putting together a script to fix any text file,
> but in this case I had no idea what to look for, that's why I asked.

Personally, I'd use a text editor and visually confirm each
substitution.  You could probably get away with just fixing the 'From '
as far as mutt is concerned, but that won't help if you ever need to
reply to a given message, or anything else that one normally does with
email.

As a side note, if you've got NNTP support compiled into mutt and the
mailing list you're interested in is carried by Gmane, you can skip a
lot of this nonsense by letting mutt handle everything.

-- 
George


Re: Why does mutt sees Mailman mailboxes as empty?

2009-05-09 Thread M. Fioretti
On Sat, May 09, 2009 07:02:14 AM -0700, George Davidovich wrote:

> Email address munging where 'u...@example.org' is replaced with some
> variation of 'user at example.org'

Holy cow, you're right.  I know about this practice, but had forgotten
the downloaded mailbox would have this problem. This is what I get
from grepping my own address from that file:

From mfioretti at nexaima.net  Sat May 17 08:23:09 2008
From: mfioretti at nexaima.net (M. Fioretti)
From mfioretti at nexaima.net  Sun May 25 07:23:42 2008
From: mfioretti at nexaima.net (M. Fioretti)
From mfioretti at nexaima.net  Sun May 25 07:56:41 2008
From: mfioretti at nexaima.net (M. Fioretti)

So is this (munged email addresses) what would prevent mutt from
detecting message boundaries?

I also have no problem putting together a script to fix any text file,
but in this case I had no idea what to look for, that's why I asked.

Thanks George,
   Marco
   Digital Rights writing at http://mfioretti.com
-- 
Your own civil rights and the quality of your life heavily depend on how
software is used *around* you:http://digifreedom.net/node/84


Re: Why does mutt sees Mailman mailboxes as empty?

2009-05-09 Thread George Davidovich
On Sat, May 09, 2009 at 10:56:30AM +0200, M. Fioretti wrote:
> I've downloaded the gzipped monthly archive of a mailing list (with
> closed archives, that's why I can't give the direct link) run with
> mailman/pipermail. When I unzip the compressed mailbox, it looks fine
> with cat, more and similar pagers. If I run commands like 'grep ^To'
> or 'grep ^Subject' on it, I see all the senders and subjects
> properly.

Can't speak to what you've actually downloaded, but "looks fine" isn't
necessarily synonymous with "in valid mbox format".

http://en.wikipedia.org/wiki/Mbox

> But if I open it with mutt -f mailbox_file, mutt says it's empty.
> What can the reason be, and how should I proceed to see the mailbox?
> Different mutt settings? Removing special characters from the mailbox
> (but which ones)?

Try running 

formail -ds < downloaded_archive > new_archive

and see if that corrects the problem.

If it doesn't, I'd suggest there are additional issues that will require
manual correction.  Email address munging where 'u...@example.org' is
replaced with some variation of 'user at example.org' (to frustrate
automated address harvesting by spammers), is common enough, but there
could be other changes.  The archives from one source may be very
different from those provided by another, so don't be surprised by what
you get.

A quick test here on my own system reveals that a missing or borked
'From ' line will cause mutt to exit with a "... is not a mailbox"
error, but an otherwise valid mbox with munged email addresses will
cause mutt to report "There are no messages".  If the latter is your
problem, then you're looking at de-munging addresses.
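
To illustrate why a munged 'From ' line gets in the way, here is a rough
Python check for the usual "From <address> <date>" shape. It is a
simplified approximation written for this mail, not mutt's actual parser,
and the names FROM_LINE and looks_like_separator are made up.

```python
import re

# Simplified separator shape: "From <address> <weekday> <month> <day> <time> <year>".
# The address must be a single whitespace-free token, which is exactly
# what "user at example.org" is not.
FROM_LINE = re.compile(
    r"^From \S+ +"
    r"(Mon|Tue|Wed|Thu|Fri|Sat|Sun) "
    r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) +\d{1,2} "
    r"\d{2}:\d{2}:\d{2} \d{4}$"
)


def looks_like_separator(line: str) -> bool:
    """Return True if the line has the simplified 'From ' separator shape."""
    return FROM_LINE.match(line.rstrip("\n")) is not None
```

A munged line fails this check because the word "at" sits where the
weekday is expected; the same line with the address restored passes.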

Whatever changes you do make, it's probably a good idea to grep the
archive using a unique header name to get the total message count, and
then compare that against the count as reported by mutt.
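
If grep is not at hand, the counting half of that check can be sketched
in Python as below. The function name count_header and the choice of
Message-ID as the "unique" header are assumptions for illustration; the
count from mutt still has to be read off mutt itself.

```python
def count_header(path: str, header: str = "Message-ID") -> int:
    """Count lines starting with '<header>:' (case-insensitive) in a file."""
    prefix = header.lower() + ":"
    # errors="replace" keeps odd bytes in old archives from aborting the scan.
    with open(path, encoding="utf-8", errors="replace") as handle:
        return sum(1 for line in handle if line.lower().startswith(prefix))


if __name__ == "__main__":
    import sys

    print(count_header(sys.argv[1]))
```

Like the grep, this also counts any body line that happens to start with
the header name, so treat the number as an estimate.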
 
-- 
George  



Why does mutt sees Mailman mailboxes as empty?

2009-05-09 Thread M. Fioretti
Greetings,

I've downloaded the gzipped monthly archive of a mailing list (with
closed archives, that's why I can't give the direct link) run with
mailman/pipermail. When I unzip the compressed mailbox, it looks fine
with cat, more and similar pagers. If I run commands like 'grep ^To'
or 'grep ^Subject' on it, I see all the senders and subjects
properly.

But if I open it with mutt -f mailbox_file, mutt says it's empty.
What can the reason be, and how should I proceed to see the mailbox?
Different mutt settings? Removing special characters from the mailbox
(but which ones)?

TIA,
Marco
Digital rights writings -> http://mfioretti.com 
-- 
Your own civil rights and the quality of your life heavily depend on how
software is used *around* you:http://digifreedom.net/node/84