Re: [Nmh-workers] mime-aware filtering?

2012-06-25 Thread Paul Vixie
On 2012-06-26 3:19 AM, Jerrad Pierce wrote:
> Sorry for the premature reply.
>
> I see now that Paul did understand my idea.
> I can understand that some might not want duplicate
> content, but that's why I proposed it be optional.
> A temporary cache does not allow for indexing.

i'm ok with that. disk space is cheap. the index can keep copies of the
content. the mh hook system can keep them in synch. unless you have
multiple terabytes of stored e-mail you'll never feel the cost of the
second copy.

> Keeping it in Mail means you have whichever
> decoded messages you want greppable/indexable;
> be it done to all on inc, or manually for a select
> few. Then, when you remove the message, the parts
> get automagically wiped out by rmm.

i don't see how to support indexing on a read-only mail store if we're
interleaving the files. while bboards may be long gone usenet is still
out there, and imap too.

paul

___
Nmh-workers mailing list
Nmh-workers@nongnu.org
https://lists.nongnu.org/mailman/listinfo/nmh-workers


Re: [Nmh-workers] mime-aware filtering?

2012-06-25 Thread Ken Hornstein
>at a high level, how do people feel about callbacks vs. state blobs?
>that is, would we like the replacement for m_getfld() to continue to
>return each time it finds something, maintaining its state in a
>caller-supplied opaque state blob, or would we like it to call the
>caller's "work function" every time it discovers a new object?
>that's the level we have to plan at, if we're going to get MH out of the
>1980's. (where it totally ruled, btw.)

So I've been inside all of that code a lot more than when we first had
that discussion in January.

In my experience, callers of m_getfld() want one of two things:

- They want everything of a particular "thing".  They want all headers (to
  iterate over all of them) or the complete body (to search/display it).
  Example: show.
- They want ONE particular thing; they just have to look through the whole
  parts to do it.  Example: anything that uses mh-format; right now the
  current design of m_getfld() means you have to look over all of the headers
  to get the ones you care about.  It would make things a lot cleaner if
  the API just let us pick out the one(s) we care about.

As for callbacks versus state blobs, I think callbacks are fine for
threaded or event-driven programming where you tend to do things
asynchronously.  But since we're going to be pretty synchronous (I
think) I'd rather have state blobs.
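
A minimal sketch of the state-blob style, for illustration only.  The
names hdr_iter and hdr_next() are hypothetical (nothing like them exists
in nmh), and the sketch only handles unfolded "Name: value" lines in an
in-memory buffer:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical caller-supplied state blob; not an nmh type. */
struct hdr_iter {
    const char *pos;    /* next unread byte of the message text */
};

/*
 * Copy the next "Name: value" header into name/value.
 * Returns 1 if a header was produced, 0 at the blank line that ends
 * the headers (or at end of input), and -1 on a malformed line or a
 * zero-sized buffer.  Folded (continuation) lines are not handled.
 */
int hdr_next(struct hdr_iter *it, char *name, size_t namesz,
             char *value, size_t valuesz)
{
    const char *line = it->pos;
    const char *eol, *colon, *v;
    size_t n;

    if (namesz == 0 || valuesz == 0)
        return -1;
    if (*line == '\0' || *line == '\n')
        return 0;

    eol = strchr(line, '\n');
    if (eol == NULL)
        eol = line + strlen(line);

    colon = memchr(line, ':', (size_t)(eol - line));
    if (colon == NULL)
        return -1;

    /* Field name, truncated to fit the caller's buffer. */
    n = (size_t)(colon - line);
    if (n >= namesz)
        n = namesz - 1;
    memcpy(name, line, n);
    name[n] = '\0';

    /* Field value, with leading blanks skipped. */
    v = colon + 1;
    while (v < eol && (*v == ' ' || *v == '\t'))
        v++;
    n = (size_t)(eol - v);
    if (n >= valuesz)
        n = valuesz - 1;
    memcpy(value, v, n);
    value[n] = '\0';

    /* All iteration state lives in the blob, not in the library. */
    it->pos = (*eol == '\n') ? eol + 1 : eol;
    return 1;
}
```

A caller that wants every header loops until hdr_next() returns 0; a
caller that wants one particular header does the same loop and compares
names, which is the second case described above.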

--Ken


Re: [Nmh-workers] mime-aware filtering?

2012-06-25 Thread Jerrad Pierce
Sorry for the premature reply.

I see now that Paul did understand my idea.
I can understand that some might not want duplicate
content, but that's why I proposed it be optional.
A temporary cache does not allow for indexing.
Keeping it in Mail means you have whichever
decoded messages you want greppable/indexable;
be it done to all on inc, or manually for a select
few. Then, when you remove the message, the parts
get automagically wiped out by rmm.

refile & rmm are the only things that need to be
aware of this AFAICT. It could probably all be done
with scripts if there was a refileproc profile
component (presumably passed source folder,
dest folder, msgnum[s]), although it may take
some effort to bend mhstore to these ends.


Re: [Nmh-workers] mime-aware filtering?

2012-06-25 Thread Paul Vixie
On 2012-06-26 3:01 AM, Jerrad Pierce wrote:
> You seem to have misunderstood my proposal.
>
> Paul, Message 76 would still be what came over the wire,
> however something like mhstore could optionally make 76.*
> as the split-out components.
>
> Tet, nothing in what I wrote implied you couldn't have
> 76.1.1.4; grep's not going to care.

i think i understood; i just don't want these other files in the MH
store. but read on for a friendly closure.

On 2012-06-26 3:05 AM, David Levine wrote:
> Paul Vixie wrote:
>
>> mhpart (or whatever) would need a -clean option to get rid
>> of the /var/tmp files it has made for you in this session.
>>
>> but i do not think we should pollute the Mail subdirectory
>> hierarchy with permanent copies of parts.
> nmh already has nmh-cache, how about putting parts there?
> They could go into a hierarchy that shadows the MH
> hierarchy, but with one root directory, e.g., 53.mime (or
> just 53) corresponding to each message.  That way scripts
> that troll the two hierarchies would look similar.  And
> a script that trolls the MH hierarchy would know where
> to look for the parts.
>
> The cache would be populated on demand.  And cleaned up manually.
> Temporal locality of reference suggests not cleaning after
> each use, but rather periodically.

sounds right to me. (i did not know about nmh-cache until now.)

paul


Re: [Nmh-workers] mime-aware filtering?

2012-06-25 Thread Paul Vixie

On 2012-06-26 2:50 AM, Jon Steinhart wrote:
> Paul Vixie writes:
>> ...
>>
>> int
>> m_getfld (int state, unsigned char *name, unsigned char *buf,
>>   int bufsz, FILE *iob)
>>
>> your move.
> OK, well, I understand your point of view here but I really don't think
> that my point of view is really different.  As far as I can tell (once
> I get past the dire warnings), the m_getfld looks for stuff in a mail
> message and stops once it gets what it needs.  It was designed in the
> age before MIME, so its notion of what constituted headers was limited.

not just that. its idea of what content is, is limited. its callers
generally expect fully decoded text, but it's perfectly capable of
returning quoted-printable or base64. the caller has currently got the
responsibility to know that it's encoded and to know how to decode it.
unsurprisingly, most parts of MH don't do this. so, to decode something
you use a special version of the command (mhshow vs. show, for example).
this is wrong, and it isn't working.
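
to make the decoding burden concrete, here is a minimal sketch of a
quoted-printable decoder of the kind every caller would otherwise have
to carry around. qp_decode() and hexval() are hypothetical names, not
nmh functions, and the sketch covers only "=XX" escapes and soft line
breaks:

```c
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/*
 * Decode quoted-printable text from in (NUL-terminated) into out.
 * Handles "=XX" escapes and "=\n" or "=\r\n" soft line breaks;
 * anything else is copied through unchanged.
 * out must have room for strlen(in) + 1 bytes.
 * Returns the number of bytes written, not counting the final NUL.
 */
size_t qp_decode(const char *in, char *out)
{
    size_t n = 0;

    while (*in != '\0') {
        if (in[0] == '=' && in[1] == '\n') {
            in += 2;                        /* soft line break */
        } else if (in[0] == '=' && in[1] == '\r' && in[2] == '\n') {
            in += 3;                        /* soft break, CRLF form */
        } else if (in[0] == '='
                   && hexval((unsigned char)in[1]) >= 0
                   && hexval((unsigned char)in[2]) >= 0) {
            out[n++] = (char)(hexval((unsigned char)in[1]) * 16
                              + hexval((unsigned char)in[2]));
            in += 3;
        } else {
            out[n++] = *in++;               /* literal byte */
        }
    }
    out[n] = '\0';
    return n;
}
```

a caller that forgets this step ends up showing or searching text like
"a=3Db" instead of "a=b", which is the failure described above.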

to revise the API we have to figure out what the callers need, yes, but
also what the callers should be forced to do differently. i think we're
going to have to start from an idealized environment and work backward
to the practical.

> Now, MIME did many things that maybe should have been kept separate in
> hindsight, but one of them was to extend the definition of headers.  So,
> I'm proposing that m_getfld be extended so that it finds these "extended"
> headers.  I'm not presently suggesting that it be extended to be able to
> decode the multiple body parts that MIME squeezes into the old definition
> of body.

i don't think you can have A and !A at the same time. either callers of
m_getfld() will continue to believe that there is just one set of
headers and that iteration through a message consists of repeated calls
to m_getfld(), or else (and this is what i think has to happen) these
callers are going to have to become MIME tolerant (note: this is not the
same as MIME aware) and that iteration consists of repeated calls to...
something... that gives it a header/body object, which might require
recursion back through itself if the object in question contains other
header/body objects rather than just a body.

at a high level, how do people feel about callbacks vs. state blobs?
that is, would we like the replacement for m_getfld() to continue to
return each time it finds something, maintaining its state in a
caller-supplied opaque state blob, or would we like it to call the
caller's "work function" every time it discovers a new object?
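
for comparison, a minimal sketch of the callback ("work function")
style. hdr_cb, hdr_walk() and count_received() are hypothetical names
made up for this example, and only unfolded "Name: value" header lines
in an in-memory buffer are handled:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical work-function type: return nonzero to stop the walk. */
typedef int (*hdr_cb)(const char *name, size_t namelen,
                      const char *value, size_t valuelen, void *arg);

/*
 * Call cb once per unfolded "Name: value" header line in msg.
 * Returns the number of headers delivered, or -1 on a malformed line.
 */
int hdr_walk(const char *msg, hdr_cb cb, void *arg)
{
    int count = 0;

    while (*msg != '\0' && *msg != '\n') {
        const char *eol = strchr(msg, '\n');
        const char *colon, *v;

        if (eol == NULL)
            eol = msg + strlen(msg);
        colon = memchr(msg, ':', (size_t)(eol - msg));
        if (colon == NULL)
            return -1;
        v = colon + 1;
        while (v < eol && (*v == ' ' || *v == '\t'))
            v++;
        count++;
        /* The library drives the loop; the caller only sees objects. */
        if (cb(msg, (size_t)(colon - msg), v, (size_t)(eol - v), arg))
            break;
        msg = (*eol == '\n') ? eol + 1 : eol;
    }
    return count;
}

/* Example work function: count headers whose name is "Received". */
int count_received(const char *name, size_t namelen,
                   const char *value, size_t valuelen, void *arg)
{
    (void)value;
    (void)valuelen;
    if (namelen == 8 && strncmp(name, "Received", 8) == 0)
        ++*(int *)arg;
    return 0;
}
```

here the parser keeps its position in local variables and the caller's
own state travels through the void *arg pointer; in the state-blob
style it is the other way around.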

that's the level we have to plan at, if we're going to get MH out of the
1980's. (where it totally ruled, btw.)

> As I said in an email years ago, I'd be happy to be able to have scan
> optionally do something like this:
>
> 1695+ 06/26 Paul Vixie Re: [Nmh-workers] mime-aware filtering?<
> 1695.1 image/png name="foo"
> 1695.2 application/pdf

i agree with this vision.

> It would be nice to be able to decode the body parts to flesh out the part
> subject lines but even without that it would be a huge improvement.

i can't think of a non-fundamental but still on-the-right-track rewrite
that would give *only* the above.

> I realize that this could all be done by hacking a script around mhlist
> but that is really ugly.

yeah.

> Biggest internal structural change that I can think of is that we might
> want some array of fields indexed by part number, or a tree of fields.

nothing in MH currently requires the entire message, or a map of it, to
fit in memory. i think we should preserve that.

let's talk iteration and access of messages and parts and subparts.
we'll assume for now that folders are still just what they are and that
there's no need to change how we access or iterate through them. (though
there is such a need, i think we can disconnect it from this discussion
and proceed independently for now.)

paul


Re: [Nmh-workers] mime-aware filtering?

2012-06-25 Thread David Levine
Paul Vixie wrote:

> mhpart (or whatever) would need a -clean option to get rid
> of the /var/tmp files it has made for you in this session.
> 
> but i do not think we should pollute the Mail subdirectory
> hierarchy with permanent copies of parts.

nmh already has nmh-cache, how about putting parts there?
They could go into a hierarchy that shadows the MH
hierarchy, but with one root directory, e.g., 53.mime (or
just 53) corresponding to each message.  That way scripts
that troll the two hierarchies would look similar.  And
a script that trolls the MH hierarchy would know where
to look for the parts.

The cache would be populated on demand.  And cleaned up manually.
Temporal locality of reference suggests not cleaning after
each use, but rather periodically.
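
A minimal sketch of the shadow-hierarchy naming, assuming the cache
root and folder name are already known to the caller.  parts_dir() and
its parameters are hypothetical, and the ".mime" suffix is just the
example used above:

```c
#include <stdio.h>

/*
 * Build the path of the per-message parts directory in a cache tree
 * that shadows the MH folder tree, e.g. <cache_root>/inbox/53.mime.
 * Returns 0 on success, -1 if the result does not fit in buf.
 */
int parts_dir(char *buf, size_t bufsz, const char *cache_root,
              const char *folder, long msgnum)
{
    int n = snprintf(buf, bufsz, "%s/%s/%ld.mime",
                     cache_root, folder, msgnum);

    return (n < 0 || (size_t)n >= bufsz) ? -1 : 0;
}
```

Because the relative path is the same in both trees, a script walking
the MH hierarchy can find the parts for a message by swapping the root
and appending the suffix.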

David


Re: [Nmh-workers] mime-aware filtering?

2012-06-25 Thread Ken Hornstein
>m_getfld() is the heart of MH.

Truer words have never been spoken; it's used by so much (including the
profile parser).

>so when i say "let's talk about what m_getfld should look like" i really
>mean "let's talk about what MH's storage and access model should be."
>
>int
>m_getfld (int state, unsigned char *name, unsigned char *buf,
>  int bufsz, FILE *iob)

Okay ... just shooting from the hip, and based on our discussion back in
January ... here's something (I'm ignoring how this would be implemented
for now, and I'm not defining any of the structures).  I hope these
functions would be obvious in operation.

int nmh_openmsg(struct message, messagehandle *, char **error);

int nmh_getheader(messagehandle, const char *, char **header, int *numheaders,
  char **error);

int nmh_getmime(messagehandle, mimehandle_ret *, char **error);

int nmh_openmime(mimehandle, char **type, char **subtype,
 int *nested, mimehandle_ret *, char **error);

int nmh_nextmime(mimehandle, char **type, char **subtype, int *iterator,
 char **error);

int nmh_closemime(mimehandle);

int nmh_closemsg(message);

I'm sure there are problems with this, just wanted to get the ball rolling.

--Ken


Re: [Nmh-workers] mime-aware filtering?

2012-06-25 Thread Jerrad Pierce
You seem to have misunderstood my proposal.

Paul, Message 76 would still be what came over the wire,
however something like mhstore could optionally make 76.*
as the split-out components.

Tet, nothing in what I wrote implied you couldn't have
76.1.1.4; grep's not going to care.


Re: [Nmh-workers] mime-aware filtering?

2012-06-25 Thread Ken Hornstein
>hindsight, but one of them was to extend the definition of headers.  So,
>I'm proposing that m_getfld be extended so that it finds these "extended"
>headers.

I think Paul made his convincing case here that m_getfld() needs to die:

http://lists.gnu.org/archive/html/nmh-workers/2012-01/msg00248.html

--Ken


Re: [Nmh-workers] mime-aware filtering?

2012-06-25 Thread Jon Steinhart
Paul Vixie writes:
> 
> On 2012-06-26 2:28 AM, Jon Steinhart wrote:
> > Paul Vixie writes:
> >
> >> let's start talking about what it should look like?
> > Well, for starters, it shouldn't include any threatening commentary!
> > Big thing that I think that it needs other than cleanup is the ability
> > to scan for attachment part headers instead of stopping at the end of
> > regular headers.
> 
> well that didn't take long. :-).
> 
> m_getfld() is the heart of MH. everything about the storage and access
> model is contained in it, either by its signature or its logic. i'm
> opposed to grafting MIME onto it with a couple more arguments that if
> non-NULL will trigger additional behaviour.
> 
> so when i say "let's talk about what m_getfld should look like" i really
> mean "let's talk about what MH's storage and access model should be."
> 
> int
> m_getfld (int state, unsigned char *name, unsigned char *buf,
>   int bufsz, FILE *iob)
> 
> your move.
> 
> paul

OK, well, I understand your point of view here but I really don't think
that my point of view is really different.  As far as I can tell (once
I get past the dire warnings), the m_getfld looks for stuff in a mail
message and stops once it gets what it needs.  It was designed in the
age before MIME, so its notion of what constituted headers was limited.
Now, MIME did many things that maybe should have been kept separate in
hindsight, but one of them was to extend the definition of headers.  So,
I'm proposing that m_getfld be extended so that it finds these "extended"
headers.  I'm not presently suggesting that it be extended to be able to
decode the multiple body parts that MIME squeezes into the old definition
of body.

As I said in an email years ago, I'd be happy to be able to have scan
optionally do something like this:

1695+ 06/26 Paul Vixie Re: [Nmh-workers] mime-aware filtering?

Re: [Nmh-workers] mime-aware filtering?

2012-06-25 Thread Paul Vixie

On 2012-06-26 2:28 AM, Jon Steinhart wrote:
> Paul Vixie writes:
>
>> let's start talking about what it should look like?
> Well, for starters, it shouldn't include any threatening commentary!
> Big thing that I think that it needs other than cleanup is the ability
> to scan for attachment part headers instead of stopping at the end of
> regular headers.

well that didn't take long. :-).

m_getfld() is the heart of MH. everything about the storage and access
model is contained in it, either by its signature or its logic. i'm
opposed to grafting MIME onto it with a couple more arguments that if
non-NULL will trigger additional behaviour.

so when i say "let's talk about what m_getfld should look like" i really
mean "let's talk about what MH's storage and access model should be."

int
m_getfld (int state, unsigned char *name, unsigned char *buf,
  int bufsz, FILE *iob)

your move.

paul


Re: [Nmh-workers] mime-aware filtering?

2012-06-25 Thread Jon Steinhart
Paul Vixie writes:
> On 2012-06-25 11:56 PM, Ken Hornstein wrote:
> > I also note that thread included someone (who shall remain nameless)
> > offering to design a new API to replace m_getfld() :-)
> 
> let's start talking about what it should look like?

Well, for starters, it shouldn't include any threatening commentary!
Big thing that I think that it needs other than cleanup is the ability
to scan for attachment part headers instead of stopping at the end of
regular headers.

Jon


Re: [Nmh-workers] mime-aware filtering?

2012-06-25 Thread Paul Vixie
On 2012-06-25 11:56 PM, Ken Hornstein wrote:
> I also note that thread included someone (who shall remain nameless)
> offering to design a new API to replace m_getfld() :-)

let's start talking about what it should look like?



Re: [Nmh-workers] mime-aware filtering?

2012-06-25 Thread Ken Hornstein
>>> You know ... given that & Norm's comments, that actually might work.
>>> Thoughts? 
>
>i'm opposed. what should be in the file system is what SMTP received and
>handed to /var/mail or whatever.

My personal thought in terms of implementation was that "53" would be
the original message.  "53.mime" would contain the decoded MIME parts.

But it's important to note that while I'm not AGAINST this, I'm not
going to work on it myself.  AFAICT it only helps out people who want
to use Unix tools on MIME messages; it doesn't help nmh to be more MIME
aware.  I'm not against that, but I would personally rather focus my
own energy and time on better native MIME support.

--Ken


Re: [Nmh-workers] mime-aware filtering?

2012-06-25 Thread Ken Hornstein
>My only thought is that MIME is more than just the linear list of
>attachments that many seem to believe, and we need to come up with
>a naming convention capable of representing that. And even then,
>deciding what to store as content for a given part isn't necessarily
>straightforward. For example, if you have a multipart/alternative
>part, how do you represent that in the filesystem? We've briefly
>touched on some of this before:

I think that it's solvable; seems like the multipart "container" objects
wouldn't be represented in the filesystem.
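
A minimal sketch of that idea, for illustration only.  struct part and
list_leaves() are hypothetical, and the in-memory tree is just a
convenient way to show the naming; it is not a claim about how nmh
would hold a message:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory part tree, for illustration only. */
struct part {
    const char *type;           /* e.g. "multipart/mixed", "text/plain" */
    const struct part *kids;    /* children, or NULL for a leaf */
    int nkids;
};

/*
 * Append one "name type" line to out for every leaf part under p.
 * Multipart containers contribute a component to the dotted name but
 * get no entry of their own.  out must start as a valid C string.
 */
void list_leaves(const struct part *p, const char *prefix,
                 char *out, size_t outsz)
{
    int i;

    if (p->nkids == 0) {
        size_t used = strlen(out);

        snprintf(out + used, outsz - used, "%s %s\n", prefix, p->type);
        return;
    }
    for (i = 0; i < p->nkids; i++) {
        char name[128];

        /* Part numbers are 1-based, outermost first. */
        snprintf(name, sizeof name, "%s.%d", prefix, i + 1);
        list_leaves(&p->kids[i], name, out, outsz);
    }
}
```

For a multipart/mixed message 76 holding a multipart/alternative
(text/plain, text/html) and an image/png, this yields 76.1.1, 76.1.2
and 76.2, with no entry for either container.  It does not settle which
alternative is worth storing.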

>   http://lists.nongnu.org/archive/html/nmh-workers/2012-02/msg00088.html

I also note that thread included someone (who shall remain nameless) offering
to design a new API to replace m_getfld() :-)

--Ken


Re: [Nmh-workers] mime-aware filtering?

2012-06-25 Thread Jon Steinhart
Paul Vixie writes:
> On 6/25/2012 10:43 PM, Tethys wrote:
> > Ken Hornstein writes:
> >
> >>> A possible way to solve the access to MIME parts problem
> >>> might be to store the parts as messageNumber.partNumber*
> >>> Creation of these parts would be optional, and eat space,
> >>> but it would make indexing/grepping easy.
> >> You know ... given that & Norm's comments, that actually might work.
> >> Thoughts? 
> 
> i'm opposed. what should be in the file system is what SMTP received and
> handed to /var/mail or whatever.
> 
> > My only thought is that MIME is more than just the linear list of
> > attachments that many seem to believe, and we need to come up with
> > a naming convention capable of representing that. And even then,
> > deciding what to store as content for a given part isn't necessarily
> > straightforward. For example, if you have a multipart/alternative
> > part, how do you represent that in the filesystem? We've briefly
> > touched on some of this before:
> >
> > http://lists.nongnu.org/archive/html/nmh-workers/2012-02/msg00088.html
> >
> > But whatever we do, it needs careful thought to cover the edge cases
> > that are increasingly becoming the common case in mail I'm being sent.
> 
> thus my proposal which is to provide shell level commands that can
> expose the message structure (as "msg.part{.subpart ...}") and something
> like mhpath that will make you a /var/tmp file from the specified
> part/subpart without any encoding, and then update the rest of the
> command set to be able to accept a msg.part{.subpart ...} specifier
> wherever it makes sense. as in, rmm would not make sense, but show would
> make sense.
> 
> mhparts as the structure-exposer and mhpart as the tmp-file-maker would
> be fine. or someone else will have a better idea. mhpart (or whatever)
> would need a -clean option to get rid of the /var/tmp files it has made
> for you in this session.
> 
> but i do not think we should pollute the Mail subdirectory hierarchy
> with permanent copies of parts.
> 
> paul

This goes back to the project that I want to do when time permits and
someone makes m_scan into something that I can hack without fear of
breaking old VAX stuff.

I'd like a "show parts" option to scan, and the ability to do show msg.part,
next -part, and prev -part.  In other words, the option to work on messages
as a whole or on the parts individually.

I don't support polluting the Mail subdirectory.  I use a separate program
for indexing which I'm presently slowly rewriting as real work calls but
will re-release when done.

Jon


Re: [Nmh-workers] mime-aware filtering?

2012-06-25 Thread Paul Vixie
On 6/25/2012 10:43 PM, Tethys wrote:
> Ken Hornstein writes:
>
>>> A possible way to solve the access to MIME parts problem
>>> might be to store the parts as messageNumber.partNumber*
>>> Creation of these parts would be optional, and eat space,
>>> but it would make indexing/grepping easy.
>> You know ... given that & Norm's comments, that actually might work.
>> Thoughts? 

i'm opposed. what should be in the file system is what SMTP received and
handed to /var/mail or whatever.

> My only thought is that MIME is more than just the linear list of
> attachments that many seem to believe, and we need to come up with
> a naming convention capable of representing that. And even then,
> deciding what to store as content for a given part isn't necessarily
> straightforward. For example, if you have a multipart/alternative
> part, how do you represent that in the filesystem? We've briefly
> touched on some of this before:
>
>   http://lists.nongnu.org/archive/html/nmh-workers/2012-02/msg00088.html
>
> But whatever we do, it needs careful thought to cover the edge cases
> that are increasingly becoming the common case in mail I'm being sent.

thus my proposal which is to provide shell level commands that can
expose the message structure (as "msg.part{.subpart ...}") and something
like mhpath that will make you a /var/tmp file from the specified
part/subpart without any encoding, and then update the rest of the
command set to be able to accept a msg.part{.subpart ...} specifier
wherever it makes sense. as in, rmm would not make sense, but show would
make sense.
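
a minimal sketch of parsing such a specifier, assuming purely numeric
components like "76.1.1.4". struct part_spec, MAX_DEPTH and
parse_part_spec() are hypothetical, and message names such as "cur" or
"last" are deliberately not handled here:

```c
#include <errno.h>
#include <stdlib.h>

#define MAX_DEPTH 16    /* arbitrary nesting limit for this sketch */

/* Hypothetical parsed form of "msg.part{.subpart ...}". */
struct part_spec {
    long msgnum;
    int  depth;             /* number of part components */
    long path[MAX_DEPTH];   /* 1-based part numbers, outermost first */
};

/* Parse spec into out.  Returns 0 on success, -1 if malformed. */
int parse_part_spec(const char *spec, struct part_spec *out)
{
    char *end;
    long v;

    if (*spec < '0' || *spec > '9')
        return -1;
    errno = 0;
    v = strtol(spec, &end, 10);
    if (errno != 0 || v <= 0)
        return -1;
    out->msgnum = v;
    out->depth = 0;

    /* Each ".N" that follows names one level of nesting. */
    while (*end == '.') {
        const char *p = end + 1;

        if (*p < '0' || *p > '9' || out->depth == MAX_DEPTH)
            return -1;
        errno = 0;
        v = strtol(p, &end, 10);
        if (errno != 0 || v <= 0)
            return -1;
        out->path[out->depth++] = v;
    }
    return (*end == '\0') ? 0 : -1;
}
```

a command like rmm could then reject any specifier with depth > 0,
while show could accept it.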

mhparts as the structure-exposer and mhpart as the tmp-file-maker would
be fine. or someone else will have a better idea. mhpart (or whatever)
would need a -clean option to get rid of the /var/tmp files it has made
for you in this session.

but i do not think we should pollute the Mail subdirectory hierarchy
with permanent copies of parts.

paul


Re: [Nmh-workers] mime-aware filtering?

2012-06-25 Thread Tethys

Ken Hornstein writes:

>>A possible way to solve the access to MIME parts problem
>>might be to store the parts as messageNumber.partNumber*
>>Creation of these parts would be optional, and eat space,
>>but it would make indexing/grepping easy.
>
>You know ... given that & Norm's comments, that actually might work.
>Thoughts? 

My only thought is that MIME is more than just the linear list of
attachments that many seem to believe, and we need to come up with
a naming convention capable of representing that. And even then,
deciding what to store as content for a given part isn't necessarily
straightforward. For example, if you have a multipart/alternative
part, how do you represent that in the filesystem? We've briefly
touched on some of this before:

http://lists.nongnu.org/archive/html/nmh-workers/2012-02/msg00088.html

But whatever we do, it needs careful thought to cover the edge cases
that are increasingly becoming the common case in mail I'm being sent.

Tet


Re: [Nmh-workers] mime-aware filtering?

2012-06-25 Thread Yoshi Rokuko
--- Ralph Corderoy on Sun, 24 Jun 2012 14:13:45 +0100 ---
> Hi,
> 
> Paul Fox wrote:
> > i'm not convinced that introducing a directory level might not be a
> > good idea:  i.e., a message might have the message file itself ("53")
> > and a directory which mh would currently ignore ("53.mime").  the
> > directory could then contain lots of stuff that would clutter the
> > upper-level MH folder otherwise.
> 
> I'd also prefer a directory for the email's contents.
> 
> Plan 9's upasfs(4) shows the kind of thing that can be done.
> http://plan9.bell-labs.com/magic/man2html/4/upasfs

+1
