Re: [Nmh-workers] mime-aware filtering?

2012-06-26 Thread Paul Fox

paul vixie wrote:
 > On 2012-06-26 3:19 AM, Jerrad Pierce wrote:
 > > Sorry for the premature reply.
 > >
 > > I see now that Paul did understand my idea.
 > > I can understand that some might not want duplicate
 > > content, but that's why I proposed it be optional.
 > > A temporary cache does not allow for indexing.
 > 
 > i'm ok with that. disk space is cheap. the index can keep copies of the
 > content. the mh hook system can keep them in synch. unless you have
 > multiple terabytes of stored e-mail you'll never feel the cost of the
 > second copy.
 > 
 > > Keeping it in Mail means you have whichever
 > > decoded messages you want greppable/indexable;
 > > be it done to all on inc, or manually for a select
 > > few. Then, when you remove the message, the parts
 > > get automagically wiped out by rmm.
 > 
 > i don't see how to support indexing on a read-only mail store if we're
 > interleaving the files. while bboards may be long gone usenet is still
 > out there, and imap too.

why couldn't an indexer know the difference between the message file
and the content cache?

anyway:  i think i still prefer the idea that the content cache
directories be kept in the message tree.  but i also understand why
one might want them separate.  if the idea is that the message tree
and the cache tree are roughly isomorphic, i'll bet that could be made
a per-user choice, as long as the content directories were really
named "53.mime/" and not simply "53/" -- i.e., the messages and the
mime-dirs could either live in the same tree or not, since they use
different parts of the namespace.  (but clients certainly would need
to be careful not to assume one model or the other.)
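To make the "53.mime/" versus "53/" point concrete, here is a minimal C sketch of how a client might compute a message's content-cache directory when the cache root is a per-user setting. The function name and the separate cache-root parameter are assumptions for illustration, not existing nmh code. Because the cache name always carries the ".mime" suffix, pointing the cache root at the mail root puts "inbox/53.mime" next to the message file "inbox/53" without a clash.

```c
#include <stdio.h>

/* Hypothetical helper: build "<cache_root>/<folder>/<msgnum>.mime".
 * If cache_root is the mail root (e.g. "Mail"), the cache directory
 * sits beside the message file "<msgnum>" in the same folder;
 * otherwise it lives in a parallel, roughly isomorphic tree.
 * Returns the path length on success, or -1 if it does not fit. */
int mime_cache_path(char *buf, size_t bufsz, const char *cache_root,
                    const char *folder, unsigned long msgnum)
{
    int n = snprintf(buf, bufsz, "%s/%s/%lu.mime",
                     cache_root, folder, msgnum);

    if (n < 0 || (size_t) n >= bufsz)
        return -1;              /* encoding error or truncated */
    return n;
}
```

The same call serves both layouts; only the cache root differs, so clients never need to guess which model a user picked as long as they ask for the path rather than assembling it themselves.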

paul
=-
 paul fox, p...@foxharp.boston.ma.us (arlington, ma, where it's 56.7 degrees)

___
Nmh-workers mailing list
Nmh-workers@nongnu.org
https://lists.nongnu.org/mailman/listinfo/nmh-workers


Re: [Nmh-workers] mime-aware filtering?

2012-06-26 Thread Christian Neukirchen
Paul Vixie  writes:

> i consider MH's basic mailbox format to be flawed in a MIME world for
> which MH was never designed or redesigned. every attachment should be in
> its own file, even if that meant that messages were directories no
> longer files themselves.
>
> note, i know we can't do that and i know why. i am not proposing it.

Some Plan 9 guys have explored this approach:
http://plan9.lsub.org/magic/man2html/1/mails

It should not pose a problem to modern filesystems to have that many
directories.

-- 
Christian Neukirchen    http://chneukirchen.org




Re: [Nmh-workers] mime-aware filtering?

2012-06-26 Thread Paul Vixie
On 2012-06-26 11:45 AM, Paul Fox wrote:
> anyway:  i think i still prefer the idea that the content cache
> directories be kept in the message tree.  but i also understand why
> one might want them separate.  if the idea is that the message tree
> and the cache tree are roughly isomorphic, i'll bet that could be made
> a per-user choice, as long as the content directories were really
> named "53.mime/" and not simply "53/" -- i.e., the messages and the
> mime-dirs could either live in the same tree or not, since they use
> different parts of the namespace.  (but clients certainly would need
> to be careful not to assume one model or the other.)

lots of code (here i'm thinking of uw-imap) makes the assumption that if
there's a directory then it's a folder. such names need not be
all-numeric or semi-numeric. you'd have to preface the name with a dot
('.') to prevent it from opendir()'ing or even chdir()'ing. i see this
as an unfortunate and unnecessary burden on code whose assumptions have
been valid for a long time.




Re: [Nmh-workers] mime-aware filtering?

2012-06-26 Thread Jerrad Pierce
>as an unfortunate and unnecessary burden on code whose assumptions have
>been valid for a long time.
But it's still an assumption, and we know what those mean.
More seriously though, is there an actual spec for MH declaring
what valid folder and filenames are?

What's the worst-case for those using older software with assumptions?
They see sub-folders, all ending in .mime, with no valid messages within them.
Annoying perhaps, but not fatal. It could actually be useful, because parts
that are valid messages could be linked to a valid filename for them to access.

Making MIME directories dotted is a work-around, but that's a bit of an
annoyance for things/users wishing to access them, depending upon the
languages available to you, e.g., having to be sure to exclude . and ..
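If the cache directories were instead hidden as ".N.mime", a client that wants to find them needs a more careful filter than "starts with a dot". A minimal sketch, assuming that hypothetical naming; the function name is illustrative only:

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if name looks like a hidden MIME cache directory of the
 * hypothetical form ".<msgnum>.mime", e.g. ".53.mime".  A naive
 * "starts with a dot" test would also match ".", ".." and
 * ".mh_sequences"; those are rejected explicitly or by the pattern. */
bool is_hidden_mime_dir(const char *name)
{
    static const char suffix[] = ".mime";
    size_t slen = sizeof suffix - 1;
    size_t len = strlen(name);
    size_t i;

    if (strcmp(name, ".") == 0 || strcmp(name, "..") == 0)
        return false;
    /* need at least ".N.mime": leading dot, one digit, suffix */
    if (len < 2 + slen || name[0] != '.')
        return false;
    if (strcmp(name + len - slen, suffix) != 0)
        return false;
    /* message number between the dot and the suffix: [1-9][0-9]* */
    if (name[1] < '1' || name[1] > '9')
        return false;
    for (i = 2; i < len - slen; i++)
        if (!isdigit((unsigned char) name[i]))
            return false;
    return true;
}
```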



Re: [Nmh-workers] mime-aware filtering?

2012-06-26 Thread Paul Fox

paul wrote:
 > On 2012-06-26 11:45 AM, Paul Fox wrote:
 > > anyway:  i think i still prefer the idea that the content cache
 > > directories be kept in the message tree.  but i also understand why
 > > one might want them separate.  if the idea is that the message tree
 > > and the cache tree are roughly isomorphic, i'll bet that could be made
 > > a per-user choice, as long as the content directories were really
 > > named "53.mime/" and not simply "53/" -- i.e., the messages and the
 > > mime-dirs could either live in the same tree or not, since they use
 > > different parts of the namespace.  (but clients certainly would need
 > > to be careful not to assume one model or the other.)
 > 
 > lots of code (here i'm thinking of uw-imap) makes the assumption that if
 > there's a directory then it's a folder. such names need not be
 > all-numeric or semi-numeric. you'd have to preface the name with a dot
 > ('.') to prevent it from opendir()'ing or even chdir()'ing. i see this
 > as an unfortunate and unnecessary burden on code whose assumptions have
 > been valid for a long time.

ah, good point.  i never ever use nested folders, and so didn't even
consider that issue.  and i also wasn't considering non-nmh clients
of the tree.

paul
=-
 paul fox, p...@foxharp.boston.ma.us (arlington, ma, where it's 64.6 degrees)



Re: [Nmh-workers] mime-aware filtering?

2012-06-26 Thread Paul Vixie
On 2012-06-26 3:02 AM, Ken Hornstein wrote:
>> int m_getfld (int state, unsigned char *name, unsigned char *buf,
>>               int bufsz, FILE *iob)
> Okay ... just shooting from the hip, and based on our discussion back
> in January ... here's something (I'm ignoring how this would be
> implemented for now, and I'm not defining any of the structures). I
> hope these functions would be obvious in operation.

this is a good start, assuming that the places which currently use
m_getfld() could be mollified by it.

> int nmh_openmsg(struct message, messagehandle *, char **error);
> int nmh_getheader(messagehandle, const char *, char **header,
>                   int *numheaders, char **error);
> int nmh_getmime(messagehandle, mimehandle_ret *, char **error);
> int nmh_openmime(mimehandle, char **type, char **subtype, int *nested,
>                  mimehandle_ret *, char **error);
> int nmh_nextmime(mimehandle, char **type, char **subtype,
>                  int *iterator, char **error);
> int nmh_closemime(mimehandle);
> int nmh_closemsg(message);
>
> I'm sure there are problems with this, just wanted to get the ball
> rolling.

i'm ignoring stylistic quirks, for example, i'd return a "struct
message *" from the open function, and it would contain function
pointers to the "methods" of the "object".

i'm ignoring correctness concerns, like how do the objects inside "char
**x; int *y" get freed.

i'm ignoring naming concerns, whereby i think that "nmh_" is the wrong
prefix for these, since they could be used for any message that's in a
disk file, even if its repository was Maildir.

focusing just on the problem statement and solution shape:

a message has a header, zero or more child parts, and may have a body.

a part has a header, zero or more child parts, and may have a body.

therefore a message is really just a special case of a part, having no
parent object.
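A hypothetical C rendering of that model, just to show the shape: one type serves for both messages and parts, and "is this the message?" reduces to "has no parent". The field names are assumptions, and header storage is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: a message and a part share one type.  A part
 * has a header (omitted here), zero or more child parts, and may have
 * a body; a message is just a part with no parent. */
struct mime_part {
    struct mime_part *parent;  /* NULL for the top-level message */
    size_t nchildren;          /* zero or more child parts */
    bool has_body;             /* a part may or may not have a body */
};

bool mime_is_message(const struct mime_part *p)
{
    return p->parent == NULL;
}

/* Nesting depth: 0 for the message itself, 1 for its direct parts. */
size_t mime_depth(const struct mime_part *p)
{
    size_t depth = 0;

    while (p->parent != NULL) {
        p = p->parent;
        depth++;
    }
    return depth;
}
```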

a header may specify a mime type, mime version, and/or encoding. as well
as subject:, et al.

if we want object recursion to be done by the caller and not by some
function that uses callbacks, we're in hell since most interesting mime
messages are deep.

we'd like to be able to parse in one pass, put all content (decoded) in
the file system not on the heap, and never have to remember more than
where we are in terms of object depth. that is, our stack or heap would
only know what object we were looking at, and who its ancestors are. we
would not try to represent the full message in RAM or even the full
message structure in RAM.
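One way to get by with only the current object and its ancestors during a single pass is to keep a small stack of the enclosing multipart boundaries and test each body line against it. This is a sketch under stated assumptions: a fixed maximum depth, line terminators already stripped, and no handling of the trailing whitespace RFC 2046 permits after a boundary. All names are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_DEPTH 32   /* assumed nesting limit for this sketch */

/* Hypothetical one-pass state: only the boundaries of the current
 * multipart and its ancestors are remembered, never the whole tree. */
struct ancestry {
    const char *boundary[MAX_DEPTH];
    int depth;         /* number of open multiparts */
};

bool ancestry_push(struct ancestry *a, const char *boundary)
{
    if (a->depth >= MAX_DEPTH)
        return false;
    a->boundary[a->depth++] = boundary;
    return true;
}

/* Test a body line against the open boundaries, innermost first.
 * Returns the level (1 = outermost) whose boundary the line matches,
 * or 0 for ordinary content.  *is_close is set for a closing
 * delimiter ("--boundary--"), which pops that level and any deeper
 * ones; a plain delimiter at an ancestor level pops only the deeper
 * levels, implicitly ending any inner part that was never closed. */
int ancestry_match(struct ancestry *a, const char *line, bool *is_close)
{
    int i;

    *is_close = false;
    if (strncmp(line, "--", 2) != 0)
        return 0;
    for (i = a->depth - 1; i >= 0; i--) {
        size_t blen = strlen(a->boundary[i]);

        if (strncmp(line + 2, a->boundary[i], blen) != 0)
            continue;
        if (strcmp(line + 2 + blen, "--") == 0) {
            *is_close = true;
            a->depth = i;          /* this level and deeper are done */
            return i + 1;
        }
        if (line[2 + blen] == '\0') {
            a->depth = i + 1;      /* next sibling part at this level */
            return i + 1;
        }
    }
    return 0;
}
```

Memory use here grows with nesting depth, not with message size, which is the property the paragraph above asks for.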

---

typedef struct mime_part *mime_part_t;

mime_part_t mime_fopen(const char *filename, const char *filemode);
mime_part_t mime_fdopen(int fileno, int mode);
void        mime_rewind(mime_part_t);
bool        mime_hasbody(mime_part_t);
size_t      mime_bodyread(mime_part_t, u_char *, size_t);  /* 0 means EOF */
char *      mime_bodygets(mime_part_t, char *, size_t);    /* NULL means EOF */
bool        mime_hasparts(mime_part_t);
mime_part_t mime_nextpart(mime_part_t);
void        mime_dispose(mime_part_t);

---

this assumes that every iterator will keep a linked list of ancestors
while tree walking -- something that used callbacks would be just as
difficult but in a different way. it makes no provision for writing MIME
objects, and does not show how one retrieves the content type or mime
version or any other header.

it's otherwise patterned after "MIME::Parser(3) -- User Contributed Perl
Documentation".

paul



Re: [Nmh-workers] mime-aware filtering?

2012-06-26 Thread Jeffrey Honig
A few points on this discussion:

1) The person who promised to re-write the API was an Internet Elder.
 Google it.

2) Callbacks vs data structures

   One reason you might want to have callbacks is that the content might be
GPG or otherwise encrypted and you may want to prompt the user.  You could
of course put methods/callbacks in the data structure to handle this.

3) Expanding MIME messages into dirs

   a) Don't forget about encrypted content when using a cache; you
probably don't want to cache it.
   b) If you use .msgnum.mime would most clients ignore the dirs (i.e.
.55.mime)?

Jeffrey C. Honig 
http://www.honig.net/jch
GnuPG ID:14E29E13 


Re: [Nmh-workers] mime-aware filtering?

2012-06-26 Thread Paul Vixie
On 2012-06-26 11:18 PM, Jeffrey Honig wrote:
> A few points on this discussion:
>
> 1) The person who promised to re-write the API was an Internet Elder.
>  Google it.

and after that... bite me.

> 2) Callbacks vs data structures
>
>One reason you might want to have callbacks is that the content
> might be GPG or otherwise encrypted and you may want to prompt the
> user.  You could of course put methods/callbacks in the data structure
> to handle this.

i think a part handler could read/write from a mime_part_t into a gnupg
pipe either way. we may want to offer a recursive iterator that does the
callback thing, for callers who prefer working that way. but such
callers would have to maintain their own ancestor-state to know which
leg of an alternative-multipart they were in, and so on. so it's not
obviously easier, just different.
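For comparison, a callback-style walker might look like the following hypothetical sketch. It uses a small in-memory tree only so that the example is self-contained (the proposal above would stream parts instead), and it shows the cost described: the walker reports entry and exit, and any ancestor state, here just depth and a leaf count, is kept by the caller in its own context.

```c
#include <stddef.h>

/* Hypothetical toy tree standing in for parts a real parser would
 * produce one at a time. */
struct toy_part {
    const char *type;              /* e.g. "multipart/alternative" */
    struct toy_part *children;     /* array of child parts */
    size_t nchildren;
};

/* The visitor is told about each part on entry and on exit, and
 * nothing else; "which leg of the alternative am I in" must be
 * tracked by the caller through ctx. */
typedef void (*part_visitor)(const struct toy_part *p, int entering,
                             void *ctx);

void mime_walk(const struct toy_part *p, part_visitor visit, void *ctx)
{
    size_t i;

    visit(p, 1, ctx);
    for (i = 0; i < p->nchildren; i++)
        mime_walk(&p->children[i], visit, ctx);
    visit(p, 0, ctx);
}

/* Example caller state, maintained entirely by the caller. */
struct walk_state {
    int depth;
    int max_depth;
    size_t leaves;
};

void count_visitor(const struct toy_part *p, int entering, void *ctx)
{
    struct walk_state *st = ctx;

    if (entering) {
        st->depth++;
        if (st->depth > st->max_depth)
            st->max_depth = st->depth;
        if (p->nchildren == 0)
            st->leaves++;
    } else {
        st->depth--;
    }
}
```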

> 3) Expanding MIME messages into dirs
>
>a) Don't forget about encrypted content when using a cache,  you
> probably don't want to cache it.

i agree that you certainly would not want to cache the cleartext. but
caching a second copy of the crypted text, where the part it was in got
copied to a file somewhere and all the base64 got decoded, is no big deal?

>b) If you use .msgnum.mime would most clients ignore the dirs (i.e.
> .55.mime)?

all the mh directory processors i've written (or in the case of uw imap,
that i've patched) ignore any dirent whose name begins with a dot, before
they bother to stat() it to see if it's a directory or not. i think we
could ignore those who don't. but i still prefer not to permanently
unpack mimeballs. the authoritative source of a message is what came in
over SMTP, with a received: header added. in fact i'd've been willing to
keep the \r\n line terminations, though that ship has already sailed.
anything that's non-canonical should be in a separate storage container,
such as nmh-cache.
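The scanning order described above, checking the name for a leading dot before paying for a stat(), can be sketched roughly as follows. It is an illustration under POSIX assumptions, not code taken from nmh or uw-imap; the function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>

/* Count subfolders of an MH folder.  Dot-names are skipped by looking
 * at d_name alone, before any stat() call, so ".", "..",
 * ".mh_sequences" and a hidden ".55.mime" cache directory are never
 * mistaken for subfolders.  Returns -1 if the folder can't be opened. */
int count_subfolders(const char *folder)
{
    DIR *dir = opendir(folder);
    struct dirent *de;
    struct stat st;
    char path[4096];
    int count = 0;

    if (dir == NULL)
        return -1;
    while ((de = readdir(dir)) != NULL) {
        int n;

        if (de->d_name[0] == '.')
            continue;                   /* skipped without a stat() */
        n = snprintf(path, sizeof path, "%s/%s", folder, de->d_name);
        if (n < 0 || (size_t) n >= sizeof path)
            continue;                   /* path too long; skip it */
        if (stat(path, &st) == 0 && S_ISDIR(st.st_mode))
            count++;
    }
    closedir(dir);
    return count;
}
```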

paul





Re: [Nmh-workers] mime-aware filtering?

2012-06-26 Thread David Levine
Paul Fox wrote:

> why couldn't an indexer know the difference between the message file
> and the content cache?
> 
> anyway:  i think i still prefer the idea that the content cache
> directories be kept in the message tree.  but i also understand why
> one might want them separate.  if the idea is that the message tree
> and the cache tree are roughly isomorphic, i'll bet that could be made
> a per-user choice, as long as the content directories were really
> named "53.mime/" and not simply "53/" -- i.e., the messages and the
> mime-dirs could either live in the same tree or not, since they use
> different parts of the namespace.  (but clients certainly would need
> to be careful not to assume one model or the other.)

If we follow and enforce these rules:

1) Files in the message tree can only be named [1-9][0-9]*
   or `mhparam mh-sequences` (defaults to .mh_sequences).
   I think that's what an MH folder is.  The old
   documentation mentions "standard entries", but I can only
   find mh-sequences now.

2) Subfolders in the message tree cannot match the form
   specified in 1).  nmh doesn't currently enforce this:
   some nmh programs (scan) complain about a subfolder named
   inbox/2000, but folder happily creates it (though it should not).

   It's OK for a top-level message folder to be named
   [1-9][0-9]* (or even .mh_sequences, but I wouldn't recommend
   that).

3) Files and directories in the cache tree cannot match the
   form specified in 1).

Then you could do, e.g.,

  Path: Mail
  nmh-private-cache: Mail

to have them in the same directory.
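Rules 1 through 3 can be expressed as a couple of small name predicates. This is a sketch of the proposed rules, not of what nmh enforces today; the sequences file name is passed in rather than obtained from `mhparam mh-sequences`, the exception for top-level folders in rule 2 is not modeled, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Rule 1: a message file name is [1-9][0-9]* */
bool mh_is_msg_name(const char *name)
{
    if (*name < '1' || *name > '9')
        return false;
    while (*++name != '\0')
        if (*name < '0' || *name > '9')
            return false;
    return true;
}

/* Names reserved inside a folder: message numbers and the sequences
 * file (".mh_sequences" by default). */
bool mh_is_reserved_name(const char *name, const char *seq_file)
{
    return mh_is_msg_name(name) || strcmp(name, seq_file) == 0;
}

/* Rules 2 and 3: a subfolder, or a cache file or directory placed in
 * the same tree, must not use a reserved name.  "inbox/2000" would be
 * refused as a subfolder, while "53.mime" is acceptable. */
bool mh_name_ok_for_subfolder_or_cache(const char *name,
                                       const char *seq_file)
{
    return !mh_is_reserved_name(name, seq_file);
}
```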

David
