[PATCH] Output unmodified Content-Type header value for JSON format.

2012-01-16 Thread Tomi Ollila
On Mon, 16 Jan 2012 08:49:03 +, David Edmondson wrote:
> On Sun, 15 Jan 2012 12:58:40 -0500, Austin Clements wrote:
> 
> This, I suspect, brings us back to what may have been Dmitry's original
> concern. If we codify the current behaviour then we're actually
> _forcing_ clients to use inline content if it's present, because
> otherwise they have no way to discover the charset/encoding used for the
> raw part.
> 
> That seems as though it could be a problem for some clients.
> 
> > OTOH, I don't understand the encoding story for HTML, since the encoding
> > can come from either a header or from the body of the HTML.  Does this
> > make it strictly necessary for the client to handle the encoding?

Either from a header or from the body of the *HTTP* message. The HTML meta
tag is part of the head of the HTML document, which is the body of the HTTP
message.

> If it can be specified by the content of a part rather than the part
> headers, then I think that the client will have to be prepared to handle
> it.
> 
> Even if not, it might still be more effective to choose that approach -
> it would remove the need to add arbitrary encoding support to the CLI
> application.

With the w3m interface, if no charset is given, it defaults
to iso-8859-1 as both input and output encoding.

That is no problem in w3m -- it just doesn't do any conversion.

But emacs thinks that it gets input in iso-8859-1 format and will
attempt to convert that to whatever charset the user's window
is using.

If input was utf8 but emacs thinks input was latin1 then we get
this infamous 'double-utf8'ied output (a subset of wtf-8 charset ;)
(in case window charset is utf8)
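
The effect can be reproduced outside emacs. A minimal Python sketch,
assuming a UTF-8 part that a consumer wrongly treats as Latin-1 and then
converts for a UTF-8 display (the sample word is only an illustration):

```python
# Sketch of the "double-UTF-8" effect: UTF-8 bytes are misread as
# Latin-1 and then encoded to UTF-8 a second time.
original = "päivää"

utf8_bytes = original.encode("utf-8")        # what the part really contains
misread = utf8_bytes.decode("iso-8859-1")    # consumer wrongly assumes Latin-1
double_encoded = misread.encode("utf-8")     # conversion for a UTF-8 window

# Each "ä" (2 bytes in UTF-8) has become two characters, and each of
# those takes 2 bytes again when re-encoded.
print(len(utf8_bytes), len(double_encoded))

# Telling the consumer the right charset avoids the damage.
correct = utf8_bytes.decode("utf-8")
```

The bytes themselves are never wrong here; only the charset label the
consumer assumes is.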

In the w3m case, the content fed to w3m could be pre-encoded to utf-8
and the w3m interface told that the charset is utf-8 -- w3m will then be
called using utf-8 as input and output encoding -- and at the end
emacs does the conversion from utf-8 to the window encoding (if needed).

As mentioned in IRC: 2012-01-10 11:46 (UTC)  xxxXX  indeed, the headers
should take precedence to meta tag, as defined in HTML4

So compliant clients give precedence to command-line options
(which, for command-line clients, makes perfect sense).
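
The precedence order described above (explicit option, then header, then
meta tag, then the iso-8859-1 default) could be sketched as follows. The
function names, the `forced` parameter, and the simplified meta-tag regex
are assumptions made for illustration, not code from notmuch or w3m:

```python
import re
from email.message import Message

DEFAULT_CHARSET = "iso-8859-1"  # the fallback mentioned for the w3m interface

# Deliberately simple pattern; a real HTML parser would be more robust.
META_RE = re.compile(rb"<meta[^>]+charset\s*=\s*[\"']?([A-Za-z0-9._-]+)",
                     re.IGNORECASE)


def charset_from_header(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lower-cased, None if absent


def charset_from_meta(html_bytes):
    """Return the charset named in an HTML meta tag, or None."""
    match = META_RE.search(html_bytes[:1024])
    return match.group(1).decode("ascii").lower() if match else None


def pick_charset(content_type, html_bytes, forced=None):
    """An explicit option wins, then the header, then the meta tag."""
    return (forced
            or charset_from_header(content_type)
            or charset_from_meta(html_bytes)
            or DEFAULT_CHARSET)
```

For example, `pick_charset("text/html; charset=UTF-8", b'<meta charset="latin1">')`
picks the header value and ignores the meta tag.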

Tomi


The overloading of show (was Re: [PATCH] Output unmodified Content-Type header value for JSON format.)

2012-01-16 Thread David Edmondson
On Sat, 14 Jan 2012 19:36:17 -0500, Austin Clements  wrote:
> ...there are several levels of structure here:
> 
> 1. Threads (query results)
> 2. Thread structure
> 3. Message structure (MIME)
> 4. Part content
> 
> Currently, search returns 1; show --format=json returns 2, 3, and
> sometimes 4 (but sometimes not); and show --format=raw returns 4.
> Notably, 1 does not require opening message files (neither does 2),
> which I consider an important distinction between search and show.
> 
> Some of the discussion has been about putting 4 squarely in the realm
> of show --format=raw.  One counterargument (which has grown on me
> since this discussion) is that the part content included in
> --format=json can be thought of as pre-fetching content that clients
> are likely to need in order to avoid re-parsing the message in most
> circumstances.  I believe this is not the *intent* of the current
> code, though without a specification of the JSON format it's hard to
> tell.

The JSON output included what was considered useful (mostly for the
Emacs UI), but how much deep thought went into 'useful' I couldn't say.

> Other discussion (more interesting, in my mind) has been about
> separating retrieving thread structure, 2, from retrieving message
> structure, 3.  To me, splitting these feels much more natural than
> what we do now, which seems to be inflexibly bound to the specific way
> the Emacs show mode currently works.  The thread structure is readily
> available from the database, so I think separating these would open up
> some new UI opportunities, particularly easy and fast thread outlining
> and navigation.

Given that the current output already includes both 2 and 3, anything
that could be done with 2 can be done with the current output, so
there's no block on the kind of innovation that you describe other than
possibly some performance problems.

notmuch-lkml.el[1] was a quick prototype of an alternative way to find
messages to read based on suggestions from Aneesh. It could have used
the proposed 'thread structure only' output.

The changes you allude to make sense. My only concern would be any
potential impact on the current Emacs UI's use of JSON output. Switching
to a model where a typical 'notmuch-show' buffer requires many calls to
notmuch (and commensurate JSON parsing) may perform significantly worse
than the current approach.

> I believe it would also simplify the code and address some irritating
> asymmetries in the way notmuch show handles the --part argument.

Footnotes: 
[1]  http://dme.org/data/emacs/notmuch-lkml.el




[PATCH] Output unmodified Content-Type header value for JSON format.

2012-01-16 Thread David Edmondson
On Sun, 15 Jan 2012 12:58:40 -0500, Austin Clements  wrote:
> Yes.  I was mostly reiterating the IRC discussion for Pieter.  Since
> this discussion, I've stabilized on the pre-fetching notion I described
> in id:"20120115003617.GH1801 at mit.edu",

Will read when I get there.

> though I do think we should make this clear in the code: that the rule
> for whether the JSON includes a "content" key for a leaf part is
> internal to the CLI and that consumers should be prepared to use it if
> it's there and to retrieve the content separately if it's not.  This
> is exactly how the Emacs code happens to work, it just hasn't been
> codified anywhere.

It's a bit more than 'happens to work' :-)

> Looking at it this way gives us more flexibility than the current code
> takes advantage of; for example we could omit content from the JSON if
> it's over some size threshold since the cost of sending that to a
> client that doesn't need it is high while the cost of having the
> client retrieve it for itself is relatively low.

This, I suspect, brings us back to what may have been Dmitry's original
concern. If we codify the current behaviour then we're actually
_forcing_ clients to use inline content if it's present, because
otherwise they have no way to discover the charset/encoding used for the
raw part.

That seems as though it could be a problem for some clients.

> OTOH, I don't understand the encoding story for HTML, since the encoding
> can come from either a header or from the body of the HTML.  Does this
> make it strictly necessary for the client to handle the encoding?

If it can be specified by the content of a part rather than the part
headers, then I think that the client will have to be prepared to handle
it.

Even if not, it might still be more effective to choose that approach -
it would remove the need to add arbitrary encoding support to the CLI
application.



[PATCH] Output unmodified Content-Type header value for JSON format.

2012-01-15 Thread Austin Clements
On Sun, 15 Jan 2012 11:52:40 +, David Edmondson  wrote:
> > Technically the IRC discussion was about not including *any* part
> > content in the JSON output, and always using show --format=raw or
> > similar to retrieve desired parts.  Currently, notmuch includes part
> > content in the JSON only for text/*, *except* when it's text/html.  I
> > assume non-text parts are omitted because binary data is hard to
> > represent in JSON and text/html is omitted because some people don't
> > need it.  However, this leads to some peculiar asymmetry in the Emacs
> > code where sometimes it pulls part content out of the JSON and
> > sometimes it retrieves it using show --format=raw.  This in turn leads
> > to asymmetry in content encoding handling, since notmuch handles
> > content encoding for parts included in the JSON (and there's no good
> > way around that since JSON is Unicode), but not for parts retrieved as
> > raw.
> 
> Including the text output in the JSON results in significantly fewer
> calls to 'notmuch' during the building of a typical `notmuch-show-mode'
> buffer. Someone with one of those older, crankier computers could easily
> test how much effect this has by changing
> `notmuch-show-get-bodypart-content' slightly.

Yes.  I was mostly reiterating the IRC discussion for Pieter.  Since
this discussion, I've stabilized on the pre-fetching notion I described
in id:"20120115003617.GH1801 at mit.edu", though I do think we should make
this clear in the code: that the rule for whether the JSON includes a
"content" key for a leaf part is internal to the CLI and that consumers
should be prepared to use it if it's there and to retrieve the content
separately if it's not.  This is exactly how the Emacs code happens to
work, it just hasn't been codified anywhere.  Looking at it this way
gives us more flexibility than the current code takes advantage of; for
example we could omit content from the JSON if it's over some size
threshold since the cost of sending that to a client that doesn't need
it is high while the cost of having the client retrieve it for itself is
relatively low.
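
The consumer rule described above (use "content" if it is there, fetch the
part separately if it is not) might look like this in a client. This is a
sketch under assumptions: the helper names are invented, and the part is
assumed to carry "id" and optionally "content" keys as in the JSON output
discussed here.

```python
import subprocess


def fetch_raw_part(message_id, part_id):
    """Fetch one part's bytes with `notmuch show --format=raw --part=N`."""
    result = subprocess.run(
        ["notmuch", "show", "--format=raw",
         f"--part={part_id}", f"id:{message_id}"],
        check=True, capture_output=True,
    )
    return result.stdout


def part_content(message_id, part, fetch=fetch_raw_part):
    """Use inline content when the CLI chose to include it, else fetch it.

    Note the asymmetry under discussion: inline content is already a
    Unicode string, while fetched content is raw bytes that the caller
    still has to decode.
    """
    if "content" in part:
        return part["content"]
    return fetch(message_id, part["id"])
```

Because the client never depends on which parts are inlined, the CLI stays
free to change that rule, for example by adding a size threshold.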

> > The idea discussed on IRC was to remove all part content from the JSON
> > output and to always use show to retrieve it, possibly beefing up
> > show's support for content decoding (and possibly introducing a way to
> > retrieve multiple raw parts at once to avoid re-parsing).  This would
> > get the JSON format out of the business of guessing what consumers
> > need, simplify the Emacs code, and normalize content encoding
> > handling.
> 
> Is there a real problem being solved here? Having a clean structure is
> nice, except when it's not.

The "real" problem is the asymmetry in encoding handling that started
this discussion.  Content included in the JSON is re-encoded by the CLI,
while content retrieved via raw needs to be re-encoded by the client.
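
To make the client side of that asymmetry concrete: given the unmodified
Content-Type value and the raw bytes of a part, a client could decode the
text itself. A minimal sketch, assuming any transfer encoding has already
been undone; `decode_raw_part` and its fallback policy are illustrative
choices, not notmuch behaviour.

```python
from email.message import Message


def decode_raw_part(raw, content_type, default="us-ascii"):
    """Decode raw part bytes using the charset from a Content-Type value."""
    charset = None
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()
    try:
        return raw.decode(charset or default)
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong charset label: degrade gracefully, do not fail.
        return raw.decode("utf-8", errors="replace")
```

This only works if the charset parameter reaches the client, which is what
outputting the unmodified header value would provide.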

OTOH, I don't understand the encoding story for HTML, since the encoding
can come from either a header or from the body of the HTML.  Does this
make it strictly necessary for the client to handle the encoding?


[PATCH] Output unmodified Content-Type header value for JSON format.

2012-01-15 Thread David Edmondson
> Technically the IRC discussion was about not including *any* part
> content in the JSON output, and always using show --format=raw or
> similar to retrieve desired parts.  Currently, notmuch includes part
> content in the JSON only for text/*, *except* when it's text/html.  I
> assume non-text parts are omitted because binary data is hard to
> represent in JSON and text/html is omitted because some people don't
> need it.  However, this leads to some peculiar asymmetry in the Emacs
> code where sometimes it pulls part content out of the JSON and
> sometimes it retrieves it using show --format=raw.  This in turn leads
> to asymmetry in content encoding handling, since notmuch handles
> content encoding for parts included in the JSON (and there's no good
> way around that since JSON is Unicode), but not for parts retrieved as
> raw.

Including the text output in the JSON results in significantly fewer
calls to 'notmuch' during the building of a typical `notmuch-show-mode'
buffer. Someone with one of those older, crankier computers could easily
test how much effect this has by changing
`notmuch-show-get-bodypart-content' slightly.

> The idea discussed on IRC was to remove all part content from the JSON
> output and to always use show to retrieve it, possibly beefing up
> show's support for content decoding (and possibly introducing a way to
> retrieve multiple raw parts at once to avoid re-parsing).  This would
> get the JSON format out of the business of guessing what consumers
> need, simplify the Emacs code, and normalize content encoding
> handling.

Is there a real problem being solved here? Having a clean structure is
nice, except when it's not.



The overloading of show (was Re: [PATCH] Output unmodified Content-Type header value for JSON format.)

2012-01-14 Thread Austin Clements
(was in reply to id:87ehv2proa.fsf at praet.org, but I wanted to start a
new top-level thread)

Quoth Pieter Praet on Jan 14 at 10:19 am:
> On Thu, 12 Jan 2012 12:28:40 -0500, Austin Clements wrote:
> > Quoth Pieter Praet on Jan 12 at  6:07 pm:
> > > On Tue, 22 Nov 2011 22:40:21 -0500, Austin Clements wrote:
> > > > Quoth Jameson Graef Rollins on Nov 20 at 12:10 pm:
> > > > > The open question seems to be how we handle the content encoding
> > > > > parameters.  My argument is that those should either be used by notmuch
> > > > > to properly encode the content for the consumer.  If that's not
> > > > > possible, then just those parameters needed by the consumer to decode
> > > > > the content should be output.
> > > > 
> > > > If notmuch is going to include part content in the JSON output (which
> > > > perhaps it shouldn't, as per recent IRC discussions), then it must
> > > > handle content encodings because JSON must be Unicode and therefore
> > > > the content strings in the JSON must be Unicode.
> > > 
> > > Having missed the IRC discussions: what is the rationale for not
> > > including (specific types of?) part content in the JSON output ?
> > > Eg. how about inline attached text/x-patch ?
> > 
> > Technically the IRC discussion was about not including *any* part
> > content in the JSON output, and always using show --format=raw or
> > similar to retrieve desired parts.  Currently, notmuch includes part
> > content in the JSON only for text/*, *except* when it's text/html.  I
> > assume non-text parts are omitted because binary data is hard to
> > represent in JSON and text/html is omitted because some people don't
> > need it.  However, this leads to some peculiar asymmetry in the Emacs
> > code where sometimes it pulls part content out of the JSON and
> > sometimes it retrieves it using show --format=raw.  This in turn leads
> > to asymmetry in content encoding handling, since notmuch handles
> > content encoding for parts included in the JSON (and there's no good
> > way around that since JSON is Unicode), but not for parts retrieved as
> > raw.
> > 
> > The idea discussed on IRC was to remove all part content from the JSON
> > output and to always use show to retrieve it, possibly beefing up
> > show's support for content decoding (and possibly introducing a way to
> > retrieve multiple raw parts at once to avoid re-parsing).  This would
> > get the JSON format out of the business of guessing what consumers
> > need, simplify the Emacs code, and normalize content encoding
> > handling.
> 
> Full ACK.
> 
> One concern though (IIUC): Due to the prevalence of retarded MUA's, not
> outputting 'text/plain' and/or 'text/html' parts is unfortunately all
> too often equivalent to not outputting anything at all, so wouldn't we,
> in essence, be reducing `show --format=json' to an ever-so-slightly
> augmented `search --format=json' ?

I'm not sure I fully understand what you're saying, but there are
several levels of structure here:

1. Threads (query results)
2. Thread structure
3. Message structure (MIME)
4. Part content

Currently, search returns 1; show --format=json returns 2, 3, and
sometimes 4 (but sometimes not); and show --format=raw returns 4.
Notably, 1 does not require opening message files (neither does 2),
which I consider an important distinction between search and show.

Some of the discussion has been about putting 4 squarely in the realm
of show --format=raw.  One counterargument (which has grown on me
since this discussion) is that the part content included in
--format=json can be thought of as pre-fetching content that clients
are likely to need in order to avoid re-parsing the message in most
circumstances.  I believe this is not the *intent* of the current
code, though without a specification of the JSON format it's hard to
tell.

Other discussion (more interesting, in my mind) has been about
separating retrieving thread structure, 2, from retrieving message
structure, 3.  To me, splitting these feels much more natural than
what we do now, which seems to be inflexibly bound to the specific way
the Emacs show mode currently works.  The thread structure is readily
available from the database, so I think separating these would open up
some new UI opportunities, particularly easy and fast thread outlining
and navigation.  I believe it would also simplify the code and address
some irritating asymmetries in the way notmuch show handles the --part
argument.
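
As an illustration of the outlining idea, here is a sketch that renders an
indented outline from thread structure alone (level 2), without touching
message bodies. The input shape, a list of [message, replies] pairs with a
"subject" key, is an assumption for the example, as is the `outline` name;
no such structure-only output exists yet.

```python
def outline(nodes, depth=0):
    """Return indented subject lines for a nested thread structure."""
    lines = []
    for message, replies in nodes:
        lines.append("  " * depth + message["subject"])
        lines.extend(outline(replies, depth + 1))
    return lines


# Hypothetical structure-only data for one thread.
thread = [
    [{"subject": "[PATCH] Output unmodified Content-Type"}, [
        [{"subject": "Re: [PATCH] Output unmodified Content-Type"}, []],
        [{"subject": "The overloading of show"}, [
            [{"subject": "Re: The overloading of show"}, []],
        ]],
    ]],
]

print("\n".join(outline(thread)))
```

Since nothing here needs MIME parsing, a structure-only query could answer
it from the database without opening message files.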


[PATCH] Output unmodified Content-Type header value for JSON format.

2012-01-14 Thread Pieter Praet
On Thu, 12 Jan 2012 12:28:40 -0500, Austin Clements  wrote:
> Quoth Pieter Praet on Jan 12 at  6:07 pm:
> > On Tue, 22 Nov 2011 22:40:21 -0500, Austin Clements wrote:
> > > Quoth Jameson Graef Rollins on Nov 20 at 12:10 pm:
> > > > The open question seems to be how we handle the content encoding
> > > > parameters.  My argument is that those should either be used by notmuch
> > > > to properly encode the content for the consumer.  If that's not
> > > > possible, then just those parameters needed by the consumer to decode
> > > > the content should be output.
> > > 
> > > If notmuch is going to include part content in the JSON output (which
> > > perhaps it shouldn't, as per recent IRC discussions), then it must
> > > handle content encodings because JSON must be Unicode and therefore
> > > the content strings in the JSON must be Unicode.
> > 
> > Having missed the IRC discussions: what is the rationale for not
> > including (specific types of?) part content in the JSON output ?
> > Eg. how about inline attached text/x-patch ?
> 
> Technically the IRC discussion was about not including *any* part
> content in the JSON output, and always using show --format=raw or
> similar to retrieve desired parts.  Currently, notmuch includes part
> content in the JSON only for text/*, *except* when it's text/html.  I
> assume non-text parts are omitted because binary data is hard to
> represent in JSON and text/html is omitted because some people don't
> need it.  However, this leads to some peculiar asymmetry in the Emacs
> code where sometimes it pulls part content out of the JSON and
> sometimes it retrieves it using show --format=raw.  This in turn leads
> to asymmetry in content encoding handling, since notmuch handles
> content encoding for parts included in the JSON (and there's no good
> way around that since JSON is Unicode), but not for parts retrieved as
> raw.
> 
> The idea discussed on IRC was to remove all part content from the JSON
> output and to always use show to retrieve it, possibly beefing up
> show's support for content decoding (and possibly introducing a way to
> retrieve multiple raw parts at once to avoid re-parsing).  This would
> get the JSON format out of the business of guessing what consumers
> need, simplify the Emacs code, and normalize content encoding
> handling.

Full ACK.

One concern though (IIUC): Due to the prevalence of broken MUAs, not
outputting 'text/plain' and/or 'text/html' parts is unfortunately all
too often equivalent to not outputting anything at all, so wouldn't we,
in essence, be reducing `show --format=json' to an ever-so-slightly
augmented `search --format=json' ?


Peace

-- 
Pieter


___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


The overloading of show (was Re: [PATCH] Output unmodified Content-Type header value for JSON format.)

2012-01-14 Thread Austin Clements
(was in reply to id:87ehv2proa@praet.org, but I wanted to start a
new top-level thread)

Quoth Pieter Praet on Jan 14 at 10:19 am:
> On Thu, 12 Jan 2012 12:28:40 -0500, Austin Clements amdra...@mit.edu wrote:
> > Quoth Pieter Praet on Jan 12 at  6:07 pm:
> > > On Tue, 22 Nov 2011 22:40:21 -0500, Austin Clements amdra...@mit.edu
> > > wrote:
> > > > Quoth Jameson Graef Rollins on Nov 20 at 12:10 pm:
> > > > > The open question seems to be how we handle the content encoding
> > > > > parameters.  My argument is that those should either be used by
> > > > > notmuch
> > > > > to properly encode the content for the consumer.  If that's not
> > > > > possible, then just those parameters needed by the consumer to decode
> > > > > the content should be output.
> > > >
> > > > If notmuch is going to include part content in the JSON output (which
> > > > perhaps it shouldn't, as per recent IRC discussions), then it must
> > > > handle content encodings because JSON must be Unicode and therefore
> > > > the content strings in the JSON must be Unicode.
> > >
> > > Having missed the IRC discussions: what is the rationale for not
> > > including (specific types of?) part content in the JSON output ?
> > > Eg. how about inline attached text/x-patch ?
> >
> > Technically the IRC discussion was about not including *any* part
> > content in the JSON output, and always using show --format=raw or
> > similar to retrieve desired parts.  Currently, notmuch includes part
> > content in the JSON only for text/*, *except* when it's text/html.  I
> > assume non-text parts are omitted because binary data is hard to
> > represent in JSON and text/html is omitted because some people don't
> > need it.  However, this leads to some peculiar asymmetry in the Emacs
> > code where sometimes it pulls part content out of the JSON and
> > sometimes it retrieves it using show --format=raw.  This in turn leads
> > to asymmetry in content encoding handling, since notmuch handles
> > content encoding for parts included in the JSON (and there's no good
> > way around that since JSON is Unicode), but not for parts retrieved as
> > raw.
> >
> > The idea discussed on IRC was to remove all part content from the JSON
> > output and to always use show to retrieve it, possibly beefing up
> > show's support for content decoding (and possibly introducing a way to
> > retrieve multiple raw parts at once to avoid re-parsing).  This would
> > get the JSON format out of the business of guessing what consumers
> > need, simplify the Emacs code, and normalize content encoding
> > handling.
>
> Full ACK.
>
> One concern though (IIUC): Due to the prevalence of broken MUAs, not
> outputting 'text/plain' and/or 'text/html' parts is unfortunately all
> too often equivalent to not outputting anything at all, so wouldn't we,
> in essence, be reducing `show --format=json' to an ever-so-slightly
> augmented `search --format=json' ?

I'm not sure I fully understand what you're saying, but there are
several levels of structure here:

1. Threads (query results)
2. Thread structure
3. Message structure (MIME)
4. Part content

Currently, search returns 1; show --format=json returns 2, 3, and
sometimes 4 (but sometimes not); and show --format=raw returns 4.
Notably, 1 does not require opening message files (neither does 2),
which I consider an important distinction between search and show.

Some of the discussion has been about putting 4 squarely in the realm
of show --format=raw.  One counterargument (which has grown on me
since this discussion) is that the part content included in
--format=json can be thought of as pre-fetching content that clients
are likely to need in order to avoid re-parsing the message in most
circumstances.  I believe this is not the *intent* of the current
code, though without a specification of the JSON format it's hard to
tell.

Other discussion (more interesting, in my mind) has been about
separating retrieving thread structure, 2, from retrieving message
structure, 3.  To me, splitting these feels much more natural than
what we do now, which seems to be inflexibly bound to the specific way
the Emacs show mode currently works.  The thread structure is readily
available from the database, so I think separating these would open up
some new UI opportunities, particularly easy and fast thread outlining
and navigation.  I believe it would also simplify the code and address
some irritating asymmetries in the way notmuch show handles the --part
argument.
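
The split described above, with structure coming from show --format=json and part content from show --format=raw, can be sketched from a client's point of view. This is only a rough illustration: it builds the command lines without running them, the helper names show_structure_argv and raw_part_argv are made up for this example, and the message id is a placeholder.

```python
from typing import List


def show_structure_argv(query: str) -> List[str]:
    """Command line for thread and message structure (levels 2 and 3)."""
    return ["notmuch", "show", "--format=json", query]


def raw_part_argv(message_id: str, part: int) -> List[str]:
    """Command line for the raw content of a single part (level 4)."""
    return ["notmuch", "show", "--format=raw", f"--part={part}", f"id:{message_id}"]


if __name__ == "__main__":
    # Placeholder message id, for illustration only.
    print(show_structure_argv("id:example@example.org"))
    print(raw_part_argv("example@example.org", 2))
```

A client following this split would call the first once per thread and the second once per part it actually wants to render.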


[PATCH] Output unmodified Content-Type header value for JSON format.

2012-01-12 Thread Pieter Praet
On Tue, 22 Nov 2011 22:40:21 -0500, Austin Clements  wrote:
> Quoth Jameson Graef Rollins on Nov 20 at 12:10 pm:
> > The open question seems to be how we handle the content encoding
> > parameters.  My argument is that those should either be used by notmuch
> > to properly encode the content for the consumer.  If that's not
> > possible, then just those parameters needed by the consumer to decode
> > the content should be output.
> 
> If notmuch is going to include part content in the JSON output (which
> perhaps it shouldn't, as per recent IRC discussions), then it must
> handle content encodings because JSON must be Unicode and therefore
> the content strings in the JSON must be Unicode.

Having missed the IRC discussions: what is the rationale for not
including (specific types of?) part content in the JSON output ?
Eg. how about inline attached text/x-patch ?


Peace

-- 
Pieter


Re: [PATCH] Output unmodified Content-Type header value for JSON format.

2012-01-12 Thread Austin Clements
Quoth Pieter Praet on Jan 12 at  6:07 pm:
> On Tue, 22 Nov 2011 22:40:21 -0500, Austin Clements amdra...@mit.edu wrote:
> > Quoth Jameson Graef Rollins on Nov 20 at 12:10 pm:
> > > The open question seems to be how we handle the content encoding
> > > parameters.  My argument is that those should either be used by notmuch
> > > to properly encode the content for the consumer.  If that's not
> > > possible, then just those parameters needed by the consumer to decode
> > > the content should be output.
> >
> > If notmuch is going to include part content in the JSON output (which
> > perhaps it shouldn't, as per recent IRC discussions), then it must
> > handle content encodings because JSON must be Unicode and therefore
> > the content strings in the JSON must be Unicode.
>
> Having missed the IRC discussions: what is the rationale for not
> including (specific types of?) part content in the JSON output ?
> Eg. how about inline attached text/x-patch ?

Technically the IRC discussion was about not including *any* part
content in the JSON output, and always using show --format=raw or
similar to retrieve desired parts.  Currently, notmuch includes part
content in the JSON only for text/*, *except* when it's text/html.  I
assume non-text parts are omitted because binary data is hard to
represent in JSON and text/html is omitted because some people don't
need it.  However, this leads to some peculiar asymmetry in the Emacs
code where sometimes it pulls part content out of the JSON and
sometimes it retrieves it using show --format=raw.  This in turn leads
to asymmetry in content encoding handling, since notmuch handles
content encoding for parts included in the JSON (and there's no good
way around that since JSON is Unicode), but not for parts retrieved as
raw.

The idea discussed on IRC was to remove all part content from the JSON
output and to always use show to retrieve it, possibly beefing up
show's support for content decoding (and possibly introducing a way to
retrieve multiple raw parts at once to avoid re-parsing).  This would
get the JSON format out of the business of guessing what consumers
need, simplify the Emacs code, and normalize content encoding
handling.


[PATCH] Output unmodified Content-Type header value for JSON format.

2011-11-22 Thread Austin Clements
Quoth Jameson Graef Rollins on Nov 20 at 12:10 pm:
> The open question seems to be how we handle the content encoding
> parameters.  My argument is that those should either be used by notmuch
> to properly encode the content for the consumer.  If that's not
> possible, then just those parameters needed by the consumer to decode
> the content should be output.

If notmuch is going to include part content in the JSON output (which
perhaps it shouldn't, as per recent IRC discussions), then it must
handle content encodings because JSON must be Unicode and therefore
the content strings in the JSON must be Unicode.
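
A minimal sketch of that constraint, using Python's standard library rather than notmuch's own code: the raw bytes of a part have to be decoded with the declared charset before they can become a JSON string. The function name part_to_json and the sample text are made up for illustration.

```python
import json


def part_to_json(raw: bytes, charset: str) -> str:
    """Decode raw part bytes with the declared charset, then serialize."""
    text = raw.decode(charset)
    return json.dumps({"content": text})


# A Latin-1 encoded body, as it might sit in a message file.
latin1_bytes = "café\n".encode("iso-8859-1")

encoded = part_to_json(latin1_bytes, "iso-8859-1")
round_trip = json.loads(encoded)["content"]
```

Decoding the same bytes as UTF-8 would raise an error instead, which is why the charset parameter cannot simply be ignored by whoever produces the JSON.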


[PATCH] Output unmodified Content-Type header value for JSON format.

2011-11-20 Thread Dmitry Kurochkin
On Sat, 19 Nov 2011 13:58:18 -0500, Austin Clements  wrote:
> Quoth Dmitry Kurochkin on Nov 19 at  9:26 am:
> > On Fri, 18 Nov 2011 23:59:57 -0500, Austin Clements wrote:
> > > Quoth Dmitry Kurochkin on Nov 19 at  6:42 am:
> > > > Hi Jamie.
> > > > 
> > > > On Fri, 18 Nov 2011 17:58:52 -0800, Jameson Graef Rollins wrote:
> > > > > On Sat, 19 Nov 2011 03:45:05 +0400, Dmitry Kurochkin wrote:
> > > > > > Before the change, notmuch used g_mime_content_type_to_string(3)
> > > > > > function to output Content-Type header value.  Turns out it outputs
> > > > > > only "type/subtype" part and ignores all parameters.  Also, if there
> > > > > > is no Content-Type header, default "text/plain" value is used.
> > > > > 
> > > > > Hi, Dmitry.  Can you explain under what circumstances you would need 
> > > > > the
> > > > > extra content-type parameters?
> > > > 
> > > > Charset is an example of a parameter which you need to render a part
> > > > correctly.
> > > 
> > > Can notmuch convert to a common charset, given that, otherwise, every
> > > client is going to have to implement this conversion anyway?
> > > 
> > 
> > Notmuch can handle charset (and any other) parameters but only for known
> > media types (i.e. text/*).  I think it would be useful (especially for
> > human-readable output formats).  But it is a separate issue.
> > 
> > Notmuch can not convert other types it does not know how to handle.
> > E.g. HTML charset conversion is not as simple as for plain text.
> > 
> > AFAIK standard defines charset parameter just for few types.  So in
> > general, charset parameter can have any meaning for some custom media
> > type.
> 
> Interesting.  I hadn't realized the content-type specification was so
> open-ended.  However, there are many things that *could* be included
> in the JSON format but aren't; what's included is primarily driven by
> what consumers actually need and it seems like the actual need here is
> charset handling.  Maybe the JSON format *shouldn't* evolve this way,
> but I think it should either be driven by its needs like it is now, or
> we should be taking bigger steps like providing *all* of the headers
> (essentially, a JSON-ification of the MIME structure), which would
> subsume more specific generalizations like exposing just the full
> content-type header.
> 

I think it is a good idea to provide all headers in JSON output.  Still,
I believe this patch is valid.  It is a simple change that makes the
JSON format simpler, and we have consumers that need it.  Providing
all headers would be a bigger change (and I expect it to be much more
difficult to get accepted).

What I definitely do not like is adding an exception for the charset
parameter and inventing complex rules for the JSON format instead of
keeping it simple.

> Regarding charset, specifically, though, the JSON format only includes
> part bodies for text/* types and, according to RFC 2045,
> 
>   For example, the "charset" parameter is applicable to any subtype of
>   "text", ...
> 
> Section 4.1.2 (Charset Parameter) of RFC 2046 beats around the bush,
> but I think it's saying essentially the same thing in a lot more
> detail.  Given that, I think it does make sense for notmuch to handle
> the charset parameter and re-coding.
> 

I think it may be a good idea but it is not trivial to do right.  We
should not just convert all text parts unconditionally to locale or
UTF-8.

> > > (And are there other examples of useful things in the content type?)
> > 
> > What is meant by useful?  All parameters do have some use.  The fact
> > that notmuch does not handle them does not mean they are useless.  And
> > notmuch can not handle all parameters just because the list of
> > parameters is not defined.  So there is no choice but to let notmuch
> > users see and use these parameters.
> 
> Yes, I now agree with this, modulo my statements about generality above.
> 

Thanks.

Regards,
  Dmitry

> > Regards,
> >   Dmitry
> > 
> 
> -- 
> Austin Clements  MIT/'06/PhD/CSAIL
> amdragon at mit.edu   http://web.mit.edu/amdragon
>Somewhere in the dream we call reality you will find me,
>   searching for the reality we call dreams.
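
The difference between a "type/subtype" string and the unmodified header value can be illustrated with Python's standard email package. This is an analogy only, not GMime or notmuch code; the two sample messages are made up for the example.

```python
from email import message_from_string

with_params = message_from_string(
    'Content-Type: text/html; charset="koi8-r"\n\n<p>hi</p>\n'
)
no_header = message_from_string("\nplain body\n")

# Type/subtype only: parameters are dropped and a default is filled in.
short_type = with_params.get_content_type()
defaulted_type = no_header.get_content_type()

# Unmodified header value: parameters survive and absence stays visible.
full_header = with_params["Content-Type"]
missing_header = no_header["Content-Type"]
charset = with_params.get_content_charset()
```

Here short_type loses the charset and defaulted_type reports text/plain for a message that never declared a type, which are the two effects described at the start of the thread.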


[PATCH] Output unmodified Content-Type header value for JSON format.

2011-11-20 Thread Dmitry Kurochkin
Hi Jameson.

On Sat, 19 Nov 2011 02:49:43 -0800, Jameson Graef Rollins wrote:
> On Sat, 19 Nov 2011 06:42:00 +0400, Dmitry Kurochkin wrote:
> > The parameters are there for a reason.  They are part of the
> > content-type and are needed to handle the body properly.  If you say
> > that the parameters are not needed by notmuch users, that implies that
> > they are handled by notmuch.  Which is just not possible.
> 
> Hey, Dmitry.  At least some of the parameters in the content-type are
> actually related to the mime structure itself.  A good example is:
> 
> boundary=\"=-=-=\"
> 
> This parameter is there to tell the MIME parser how to parse the content
> that follows.  It is meant for the first level parser and no more.  Once
> the MIME has been separated into its constituent parts, there's no need
> for any further clients to know anything about boundary string.
> 

Yes, at least in most cases.  On the other hand, if you can make notmuch
show a raw multipart part (you can, right?), then it seems natural that
notmuch should provide enough information to parse it.

> I would argue that notmuch is acting as the first level parser.  As far
> as I can tell, most of the rest of the parameters I've seen should only
> be useful to the those first-level parsers.
> 

I do not think first-level parser is a standard term.  As I understand,
you mean that notmuch parses MIME recursively until the leaf non-MIME
parts.  Correct?

I do not know what parameters you have seen.  The most common example of
a useful parameter for second-level parsers is charset.

I do not understand why we try to come up with excuses for not
providing useful information to users.  The current assumption that all
parameters that notmuch does not handle are useless is plainly invalid.

> As Austin mentioned, is it not possible for notmuch itself to act on the
> parameter to give a properly formatted output to its clients?
> 

Please see my answer to Austin.  I explained why this is not an option
in the general case.

As for parameters that notmuch already handles, like "boundary", I just
do not understand why we should invent some artificial exceptions and
decide for our users what parameters are useful or useless for them
instead of implementing a simple and kind of expected approach:
the content-type in JSON is the original Content-Type header value.  It
makes both the code and the format simpler.

> > The fact that this change happens to fix an issue with HTML charsets for
> > me is just a side effect.
> 
> But isn't that actually a large part of the issue?  If this patch fixes
> something that you think notmuch is doing improperly, could there not be
> a test for it?
> 

No.  It just happens to be how I found the problem.  The issue is:
the notmuch JSON format mangles the Content-Type header value by throwing
away useful information in some cases and adding info that was not there
in others.  Note that I do not mention any single parameter name here.  It
is a general issue, not a "charset" or "boundary" parameter issue.

> However, based on your patches and as far as I can tell, this change
> adds more than a boundary parameter to only crypto parts
> (application/signed and application/encrypted).  However, I don't think
> any of the crypto functionality needs any of the extra information
> provided in the extended output.  If there was a test for the
> functionality you think is missing, it would help bolster the case for
> the additional output.
> 

Again, the patch is not about "add boundary and other useless crypto
parameters".  The patch is about no longer throwing away useful
information.

> > > >   "content": [{"id": 2,
> > > > - "content-type": "text/plain",
> > > >   "content": "This is a test signed message.\n"},
> > > 
> > > Without figuring out what's going on, I notice that some of the tests
> > > have been modified to remove the content-type fields on a bunch of
> > > parts.  I think that is probably not right.
> > > 
> > 
> > I tried to explain this in the preamble.  These parts do not have
> > Content-Type in the original message.  So I think it is wrong for
> > notmuch JSON format to add it.
> 
> Ah, ok, I think I understand this point.  I think this is actually a
> separate issue than the one the rest of the patch set is for, though.
> One part of the patch is about also outputting content-type parameters,
> and another part is that parts without content-type shouldn't be
> assigned one automatically.  I personally think those should be separate
> patches.
> 

The implementation makes it impractical to separate these changes.
They come as a result of the same code change.

Regards,
  Dmitry

> jamie.
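
The point about "boundary" and about parts that carry no Content-Type header can be tried out with Python's standard email package. This is a sketch under the assumption that Python's parser is a fair stand-in for a first-level MIME parser; it is not notmuch or GMime code, and the sample message is made up.

```python
from email import message_from_string

RAW = (
    'Content-Type: multipart/mixed; boundary="=-=-="\n'
    "\n"
    "--=-=-=\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "\n"
    "first part\n"
    "--=-=-=\n"
    "\n"
    "second part, no Content-Type header\n"
    "--=-=-=--\n"
)

message = message_from_string(RAW)

# The boundary is only needed to split the raw multipart into its parts.
boundary = message.get_boundary()
parts = message.get_payload()

# Per-part header values, unmodified; None where the header is absent.
part_headers = [part["Content-Type"] for part in parts]
```

Once the parts are split the boundary is of no further use to a consumer, but a consumer handed the raw multipart bytes would need it, and part_headers shows that the second part never declared a type at all.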


Re: [PATCH] Output unmodified Content-Type header value for JSON format.

2011-11-20 Thread Austin Clements
Quoth Dmitry Kurochkin on Nov 19 at  9:26 am:
> On Fri, 18 Nov 2011 23:59:57 -0500, Austin Clements amdra...@mit.edu wrote:
> > Quoth Dmitry Kurochkin on Nov 19 at  6:42 am:
> > > Hi Jamie.
> > >
> > > On Fri, 18 Nov 2011 17:58:52 -0800, Jameson Graef Rollins
> > > jroll...@finestructure.net wrote:
> > > > On Sat, 19 Nov 2011 03:45:05 +0400, Dmitry Kurochkin
> > > > dmitry.kuroch...@gmail.com wrote:
> > > > > Before the change, notmuch used g_mime_content_type_to_string(3)
> > > > > function to output Content-Type header value.  Turns out it outputs
> > > > > only "type/subtype" part and ignores all parameters.  Also, if there
> > > > > is no Content-Type header, default "text/plain" value is used.
> > > >
> > > > Hi, Dmitry.  Can you explain under what circumstances you would need the
> > > > extra content-type parameters?
> > >
> > > Charset is an example of a parameter which you need to render a part
> > > correctly.
> >
> > Can notmuch convert to a common charset, given that, otherwise, every
> > client is going to have to implement this conversion anyway?
> >
>
> Notmuch can handle charset (and any other) parameters but only for known
> media types (i.e. text/*).  I think it would be useful (especially for
> human-readable output formats).  But it is a separate issue.
>
> Notmuch can not convert other types it does not know how to handle.
> E.g. HTML charset conversion is not as simple as for plain text.
>
> AFAIK standard defines charset parameter just for few types.  So in
> general, charset parameter can have any meaning for some custom media
> type.

Interesting.  I hadn't realized the content-type specification was so
open-ended.  However, there are many things that *could* be included
in the JSON format but aren't; what's included is primarily driven by
what consumers actually need and it seems like the actual need here is
charset handling.  Maybe the JSON format *shouldn't* evolve this way,
but I think it should either be driven by its needs like it is now, or
we should be taking bigger steps like providing *all* of the headers
(essentially, a JSON-ification of the MIME structure), which would
subsume more specific generalizations like exposing just the full
content-type header.

Regarding charset, specifically, though, the JSON format only includes
part bodies for text/* types and, according to RFC 2045,

  For example, the "charset" parameter is applicable to any subtype of
  "text", ...

Section 4.1.2 (Charset Parameter) of RFC 2046 beats around the bush,
but I think it's saying essentially the same thing in a lot more
detail.  Given that, I think it does make sense for notmuch to handle
the charset parameter and re-coding.

> > (And are there other examples of useful things in the content type?)
>
> What is meant by useful?  All parameters do have some use.  The fact
> that notmuch does not handle them does not mean they are useless.  And
> notmuch can not handle all parameters just because the list of
> parameters is not defined.  So there is no choice but to let notmuch
> users see and use these parameters.

Yes, I now agree with this, modulo my statements about generality above.

> Regards,
>   Dmitry
>

-- 
Austin Clements  MIT/'06/PhD/CSAIL
amdra...@mit.edu   http://web.mit.edu/amdragon
   Somewhere in the dream we call reality you will find me,
  searching for the reality we call dreams.


Re: [PATCH] Output unmodified Content-Type header value for JSON format.

2011-11-20 Thread Dmitry Kurochkin
On Sat, 19 Nov 2011 13:58:18 -0500, Austin Clements amdra...@mit.edu wrote:
> Quoth Dmitry Kurochkin on Nov 19 at  9:26 am:
> > On Fri, 18 Nov 2011 23:59:57 -0500, Austin Clements amdra...@mit.edu
> > wrote:
> > > Quoth Dmitry Kurochkin on Nov 19 at  6:42 am:
> > > > Hi Jamie.
> > > >
> > > > On Fri, 18 Nov 2011 17:58:52 -0800, Jameson Graef Rollins
> > > > jroll...@finestructure.net wrote:
> > > > > On Sat, 19 Nov 2011 03:45:05 +0400, Dmitry Kurochkin
> > > > > dmitry.kuroch...@gmail.com wrote:
> > > > > > Before the change, notmuch used g_mime_content_type_to_string(3)
> > > > > > function to output Content-Type header value.  Turns out it outputs
> > > > > > only "type/subtype" part and ignores all parameters.  Also, if there
> > > > > > is no Content-Type header, default "text/plain" value is used.
> > > > >
> > > > > Hi, Dmitry.  Can you explain under what circumstances you would need the
> > > > > extra content-type parameters?
> > > >
> > > > Charset is an example of a parameter which you need to render a part
> > > > correctly.
> > >
> > > Can notmuch convert to a common charset, given that, otherwise, every
> > > client is going to have to implement this conversion anyway?
> > >
> >
> > Notmuch can handle charset (and any other) parameters but only for known
> > media types (i.e. text/*).  I think it would be useful (especially for
> > human-readable output formats).  But it is a separate issue.
> >
> > Notmuch can not convert other types it does not know how to handle.
> > E.g. HTML charset conversion is not as simple as for plain text.
> >
> > AFAIK standard defines charset parameter just for few types.  So in
> > general, charset parameter can have any meaning for some custom media
> > type.
>
> Interesting.  I hadn't realized the content-type specification was so
> open-ended.  However, there are many things that *could* be included
> in the JSON format but aren't; what's included is primarily driven by
> what consumers actually need and it seems like the actual need here is
> charset handling.  Maybe the JSON format *shouldn't* evolve this way,
> but I think it should either be driven by its needs like it is now, or
> we should be taking bigger steps like providing *all* of the headers
> (essentially, a JSON-ification of the MIME structure), which would
> subsume more specific generalizations like exposing just the full
> content-type header.
>

I think it is a good idea to provide all headers in JSON output.  Still
I believe this patch is still valid.  It is a simple change, which makes
the JSON format simpler and we have consumers that need it.  Providing
all headers would be a bigger change (and I expect it to be much more
difficult to get accepted).

What I definately do not like, is adding an exception for charset
parameter and inventing complex rules for JSON format instead of keeping
it simple.

 Regarding charset, specifically, though, the JSON format only includes
 part bodies for text/* types and, according to RFC 2045,
 
   For example, the charset parameter is applicable to any subtype of
   text, ...
 
 Section 4.1.2 (Charset Parameter) of RFC 2046 beats around the bush,
 but I think it's saying essentially the same thing in a lot more
 detail.  Given that, I think it does make sense for notmuch to handle
 the charset parameter and re-coding.
 

I think it may be a good idea but it is not trivial to do right.  We
should not just convert all text parts unconditionally to locale or
UTF-8.

   (And are there other examples of useful things in the content type?)
  
  What is meant by useful?  All parameters do have some use.  The fact
  that notmuch does not handle them does not mean they are useless.  And
  notmuch can not handle all parameters just because the list of
  parameters is not defined.  So there is no choice but to let notmuch
  users see and use these parameters.
 
 Yes, I now agree with this, modulo my statements about generality above.
 

Thanks.

Regards,
  Dmitry

  Regards,
Dmitry
  
 
 -- 
 Austin Clements  MIT/'06/PhD/CSAIL
 amdra...@mit.edu   http://web.mit.edu/amdragon
Somewhere in the dream we call reality you will find me,
   searching for the reality we call dreams.


Re: [PATCH] Output unmodified Content-Type header value for JSON format.

2011-11-20 Thread Jameson Graef Rollins
On Sun, 20 Nov 2011 22:32:53 +0400, Dmitry Kurochkin 
dmitry.kuroch...@gmail.com wrote:
> Yes, at least in most cases.  On the other hand, if you can make notmuch
> show raw multipart part (you can, right?), then it seems natural that
> notmuch provides enough information to parse it.

This is kind of an unresolved issue, actually.  Currently headers are
only included in the raw format output of an entire message.
Otherwise, when raw output is requested of an individual part only the
body is output.  For multipart parts, the raw format output includes all
body parts concatenated together, still without any headers.

This raw multipart output clearly doesn't really make much sense and
we need to figure that out.  dkg wrote a good breakdown of the issue
here:

id:4e09072a.7040...@fifthhorseman.net

However, this is only for raw output.  It's definitely not the same as
the json output.  For json the parts are all parsed by notmuch and
placed into separate json elements.  The receiver is not going to do any
further parsing since all the mime structure parsing has been done.

We need to keep clear the distinction between parsing the mime
structure, and encoding the content of the part.  Confusion seems to
be coming from the fact that the Content-Type header includes
information needed for both mime parsing and content encoding.  However,
I don't think that means that we need to just include everything in the
output.  Parameters that have to do with mime parsing should be dropped,
since that information has already been used in the mime parsing and
is no longer useful to the consumer.  It's just noise, and I don't
think notmuch should be outputting useless noise.

The open question seems to be how we handle the content encoding
parameters.  My argument is that those should be used by notmuch
to properly encode the content for the consumer.  If that's not
possible, then just those parameters needed by the consumer to decode
the content should be output.
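
The split proposed above can be sketched roughly as follows.  This is
an illustration in Python using the standard email package, not notmuch
or GMime code; the helper name content_params and the STRUCTURAL_PARAMS
set are hypothetical, and treating "boundary" as the only structural
parameter is an assumption made for the example.

```python
from email.message import Message

# Assumption for this sketch: "boundary" is the only parameter treated as
# purely structural.  A real list would need to be agreed on.
STRUCTURAL_PARAMS = {"boundary"}


def content_params(content_type_value):
    """Return (type/subtype, params) with structural parameters removed."""
    msg = Message()
    msg["Content-Type"] = content_type_value
    # get_params() yields [(type/subtype, ''), (name, value), ...]
    params = msg.get_params()
    kept = {name: value for name, value in params[1:]
            if name.lower() not in STRUCTURAL_PARAMS}
    return msg.get_content_type(), kept


print(content_params('text/html; charset="iso-8859-1"'))
print(content_params('multipart/mixed; boundary="=-=-="'))
```

With this kind of filter the charset survives for the consumer while
the boundary string is dropped, which is the distinction being argued
for here.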

> > But isn't that actually a large part of the issue?  If this patch fixes
> > something that you think notmuch is doing improperly, could there not be
> > a test for it?
>
> No.  It just happens to be how I found the problem.  The issue is:
> notmuch JSON format mangles Content-Type header value by throwing away
> useful information in some cases and adding info that was not there in
> others.  Note that I do not mention any single parameter name here.  It
> is a general issue, not a charset or boundary parameter issue.

I'm sorry, but I still don't believe it's not possible to test for this
issue.  If there's a problem that you're seeing, then you must have
identified it somehow, and therefore there must be a way to test for it.

jamie.




[PATCH] Output unmodified Content-Type header value for JSON format.

2011-11-19 Thread Austin Clements
Quoth Dmitry Kurochkin on Nov 19 at  9:26 am:
> On Fri, 18 Nov 2011 23:59:57 -0500, Austin Clements wrote:
> > Quoth Dmitry Kurochkin on Nov 19 at  6:42 am:
> > > Hi Jamie.
> > > 
> > > On Fri, 18 Nov 2011 17:58:52 -0800, Jameson Graef Rollins wrote:
> > > > On Sat, 19 Nov 2011 03:45:05 +0400, Dmitry Kurochkin wrote:
> > > > > Before the change, notmuch used g_mime_content_type_to_string(3)
> > > > > function to output Content-Type header value.  Turns out it outputs
> > > > > only "type/subtype" part and ignores all parameters.  Also, if there
> > > > > is no Content-Type header, default "text/plain" value is used.
> > > > 
> > > > Hi, Dmitry.  Can you explain under what circumstances you would need the
> > > > extra content-type parameters?
> > > 
> > > Charset is an example of a parameter which you need to render a part
> > > correctly.
> > 
> > Can notmuch convert to a common charset, given that, otherwise, every
> > client is going to have to implement this conversion anyway?
> > 
> 
> Notmuch can handle charset (and any other) parameters but only for known
> media types (i.e. text/*).  I think it would be useful (especially for
> human-readable output formats).  But it is a separate issue.
> 
> Notmuch can not convert other types it does not know how to handle.
> E.g. HTML charset conversion is not as simple as for plain text.
> 
> AFAIK standard defines charset parameter just for few types.  So in
> general, charset parameter can have any meaning for some custom media
> type.

Interesting.  I hadn't realized the content-type specification was so
open-ended.  However, there are many things that *could* be included
in the JSON format but aren't; what's included is primarily driven by
what consumers actually need and it seems like the actual need here is
charset handling.  Maybe the JSON format *shouldn't* evolve this way,
but I think it should either be driven by its needs like it is now, or
we should be taking bigger steps like providing *all* of the headers
(essentially, a JSON-ification of the MIME structure), which would
subsume more specific generalizations like exposing just the full
content-type header.
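
As a rough picture of what "a JSON-ification of the MIME structure"
could look like, here is a small Python sketch built on the standard
email package.  The layout (a "headers" list plus a "content" field per
part) and the helper name part_to_dict are invented for this example
and are not the notmuch JSON format.

```python
import json
from email import message_from_string


def part_to_dict(part):
    """Recursively turn a MIME part into a JSON-friendly dict."""
    # Keep every header, in order, exactly as it appears on the part.
    node = {"headers": [[name, value] for name, value in part.items()]}
    if part.is_multipart():
        node["content"] = [part_to_dict(child) for child in part.get_payload()]
    else:
        node["content"] = part.get_payload()
    return node


raw = (
    'Content-Type: multipart/mixed; boundary="XX"\n'
    "\n"
    "--XX\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "\n"
    "hello\n"
    "--XX--\n"
)
tree = part_to_dict(message_from_string(raw))
print(json.dumps(tree, indent=2))
```

In such a layout the full Content-Type value is simply one header among
the others, so no special case for it is needed.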

Regarding charset, specifically, though, the JSON format only includes
part bodies for text/* types and, according to RFC 2045,

  For example, the "charset" parameter is applicable to any subtype of
  "text", ...

Section 4.1.2 (Charset Parameter) of RFC 2046 beats around the bush,
but I think it's saying essentially the same thing in a lot more
detail.  Given that, I think it does make sense for notmuch to handle
the charset parameter and re-coding.
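
A minimal sketch of what such re-coding might look like for a plain
text part, written in Python rather than against GMime.  The function
name recode_text_part and the fallback policy are assumptions for the
example; it deliberately ignores HTML and other types where the charset
can also be declared inside the body.

```python
def recode_text_part(raw_bytes, charset=None):
    """Decode a text/plain body with its declared charset, return UTF-8 bytes."""
    # RFC 2046 gives us-ascii as the default charset for text/plain.
    name = charset or "us-ascii"
    try:
        text = raw_bytes.decode(name)
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong charset: keep going, but mark undecodable bytes.
        text = raw_bytes.decode("utf-8", errors="replace")
    return text.encode("utf-8")


koi8_body = "привет".encode("koi8-r")
print(recode_text_part(koi8_body, "koi8-r").decode("utf-8"))
print(recode_text_part(b"plain ascii"))
```

Once the body has been recoded, the original charset value no longer
describes the bytes that are output, so it would have to be dropped or
rewritten alongside the conversion.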

> > (And are there other examples of useful things in the content type?)
> 
> What is meant by useful?  All parameters do have some use.  The fact
> that notmuch does not handle them does not mean they are useless.  And
> notmuch can not handle all parameters just because the list of
> parameters is not defined.  So there is no choice but to let notmuch
> users see and use these parameters.

Yes, I now agree with this, modulo my statements about generality above.

> Regards,
>   Dmitry
> 

-- 
Austin Clements  MIT/'06/PhD/CSAIL
amdragon at mit.edu   http://web.mit.edu/amdragon
   Somewhere in the dream we call reality you will find me,
  searching for the reality we call dreams.


[PATCH] Output unmodified Content-Type header value for JSON format.

2011-11-19 Thread Dmitry Kurochkin
On Fri, 18 Nov 2011 23:59:57 -0500, Austin Clements  wrote:
> Quoth Dmitry Kurochkin on Nov 19 at  6:42 am:
> > Hi Jamie.
> > 
> > On Fri, 18 Nov 2011 17:58:52 -0800, Jameson Graef Rollins wrote:
> > > On Sat, 19 Nov 2011 03:45:05 +0400, Dmitry Kurochkin wrote:
> > > > Before the change, notmuch used g_mime_content_type_to_string(3)
> > > > function to output Content-Type header value.  Turns out it outputs
> > > > only "type/subtype" part and ignores all parameters.  Also, if there
> > > > is no Content-Type header, default "text/plain" value is used.
> > > 
> > > Hi, Dmitry.  Can you explain under what circumstances you would need the
> > > extra content-type parameters?
> > 
> > Charset is an example of a parameter which you need to render a part
> > correctly.
> 
> Can notmuch convert to a common charset, given that, otherwise, every
> client is going to have to implement this conversion anyway?
> 

Notmuch can handle charset (and any other) parameters but only for known
media types (i.e. text/*).  I think it would be useful (especially for
human-readable output formats).  But it is a separate issue.

Notmuch can not convert other types it does not know how to handle.
E.g. HTML charset conversion is not as simple as for plain text.
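
To make the HTML case concrete, here is a small Python sketch (not
notmuch code) of what goes wrong when only the bytes are recoded.  The
sample document and the rough regex in declared_charset are made up for
the illustration.

```python
import re


def declared_charset(html_bytes):
    """Return the charset named in a <meta> tag, or None (very rough regex)."""
    match = re.search(rb'charset=["\']?([A-Za-z0-9_-]+)', html_bytes)
    return match.group(1).decode("ascii") if match else None


original = '<meta charset="iso-8859-1"><p>caf\u00e9</p>'.encode("iso-8859-1")

# Naive conversion: recode the bytes, leave the markup untouched.
recoded = original.decode("iso-8859-1").encode("utf-8")

# The document still claims iso-8859-1, so a renderer that trusts the
# <meta> tag will misread the UTF-8 bytes.
stale = declared_charset(recoded)
print(stale)
print(recoded.decode(stale))
```

A correct conversion would also have to rewrite or remove the
declaration inside the document, which is what makes HTML harder than
plain text.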

AFAIK standard defines charset parameter just for few types.  So in
general, charset parameter can have any meaning for some custom media
type.

> (And are there other examples of useful things in the content type?)

What is meant by useful?  All parameters do have some use.  The fact
that notmuch does not handle them does not mean they are useless.  And
notmuch can not handle all parameters just because the list of
parameters is not defined.  So there is no choice but to let notmuch
users see and use these parameters.

Regards,
  Dmitry


[PATCH] Output unmodified Content-Type header value for JSON format.

2011-11-19 Thread Dmitry Kurochkin
Hi Jamie.

On Fri, 18 Nov 2011 17:58:52 -0800, Jameson Graef Rollins  wrote:
> On Sat, 19 Nov 2011 03:45:05 +0400, Dmitry Kurochkin wrote:
> > Before the change, notmuch used g_mime_content_type_to_string(3)
> > function to output Content-Type header value.  Turns out it outputs
> > only "type/subtype" part and ignores all parameters.  Also, if there
> > is no Content-Type header, default "text/plain" value is used.
> 
> Hi, Dmitry.  Can you explain under what circumstances you would need the
> extra content-type parameters?

Charset is an example of a parameter which you need to render a part
correctly.
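
A tiny Python illustration of the point, independent of notmuch: the
same bytes render correctly or as garbage depending on which charset
the client assumes.  The helper name render and the sample text are
made up for the example.

```python
def render(raw_bytes, charset):
    """Decode a part body the way a client would before displaying it."""
    return raw_bytes.decode(charset)


body = "na\u00efve caf\u00e9".encode("utf-8")

print(render(body, "utf-8"))        # charset known: text is intact
print(render(body, "iso-8859-1"))   # charset guessed wrong: mojibake
```

Without the charset parameter the client can only guess, and a wrong
guess silently produces the second result.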

>  It just seems like a lot of extra noise
> in the output to me, but that's partially because I can't think of any
> reason why something that is receiving pre-parsed mime content would
> need it.  Maybe there's a better way to handle what you're trying to get
> to.
> 

Why is extra output in JSON an issue?

The parameters are there for a reason.  They are part of the
content-type and are needed to handle the body properly.  If you say
that the parameters are not needed by notmuch users, that implies that
they are handled by notmuch.  Which is just not possible.

> I think it would help a lot if you could submit some sort of test
> modification that demonstrates the issue.  This is one of the reasons we
> keep emphasizing that it's good to first have tests in hand that
> demonstrate issues before patches that address them.
> 

The fact that this change happens to fix an issue with HTML charsets for
me is just a side effect.

The real issue is that JSON format throws away information which is
required to properly render a part.  I do not think we need to add a
dedicated test to check that JSON outputs charsets with parameters,
considering that it is already present in many other tests.

I do not think it was intended that notmuch outputs stripped
Content-Type values.  It was just a side effect of using
g_mime_content_type_to_string(3) which went unnoticed.

> >   "content": [{"id": 2,
> > - "content-type": "text/plain",
> >   "content": "This is a test signed message.\n"},
> 
> Without figuring out what's going on, I notice that some of the tests
> have been modified to remove the content-type fields on a bunch of
> parts.  I think that is probably not right.
> 

I tried to explain this in the preamble.  These parts do not have
Content-Type in the original message.  So I think it is wrong for
notmuch JSON format to add it.

Regards,
  Dmitry

> jamie.


[PATCH] Output unmodified Content-Type header value for JSON format.

2011-11-19 Thread Dmitry Kurochkin
Before the change, notmuch used g_mime_content_type_to_string(3)
function to output Content-Type header value.  Turns out it outputs
only "type/subtype" part and ignores all parameters.  Also, if there
is no Content-Type header, default "text/plain" value is used.

JSON is supposed to be a "low-level" structured format and should not
add missing values or throw away information.  The patch changes
notmuch show to use unmodified Content-Type value for JSON format.
Also, no default value is added if the header is missing.
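
The difference between the two behaviours can be sketched with
Python's standard email package, used here only as a stand-in for
GMime.  The sample messages and the helper name describe are made up;
get_content_type() happens to behave like the old output (type/subtype
only, text/plain when the header is missing), while reading the header
directly keeps the parameters and reports a missing header as missing.

```python
from email import message_from_string


def describe(raw_message):
    """Return (lossy, unmodified) views of a part's Content-Type."""
    msg = message_from_string(raw_message)
    lossy = msg.get_content_type()    # type/subtype only, defaults to text/plain
    unmodified = msg["Content-Type"]  # full header value, or None if absent
    return lossy, unmodified


with_params = 'Content-Type: text/html; charset="iso-8859-1"\n\n<p>hi</p>\n'
without_header = "Subject: no content type here\n\nplain body\n"

print(describe(with_params))
print(describe(without_header))
```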

Corresponding changes to Emacs UI are made to handle full Content-Type
header values.  The header is parsed using MIME
`mail-header-parse-content-type' function.  All message part rendering
functions have access to full Content-Type value.  In particular, this
is important for `notmuch-show-mm-display-part-inline' which uses
`mm-display-part' to display parts that notmuch-show does not handle.

Expected results for the tests are updated accordingly.
---
 emacs/notmuch-show.el |   28 ++--
 notmuch-show.c        |   14 --
 test/crypto           |   23 ---
 test/json             |    6 +++---
 test/maildir-sync     |    1 -
 test/multipart        |   36 ++--
 6 files changed, 59 insertions(+), 49 deletions(-)

diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index d5c95d8..d2c2fa3 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -261,120 +261,120 @@ message at DEPTH in the current thread."
               (if (and header-value
                        (not (string-equal "" header-value)))
                   (notmuch-show-insert-header header header-value))))
          notmuch-message-headers)
     (save-excursion
       (save-restriction
         (narrow-to-region start (point-max))
         (run-hooks 'notmuch-show-markup-headers-hook)))))

 (define-button-type 'notmuch-show-part-button-type
   'action 'notmuch-show-part-button-action
   'follow-link t
   'face 'message-mml)

 (defun notmuch-show-insert-part-header (nth content-type declared-type &optional name comment)
   (let ((button))
     (setq button
           (insert-button
            (concat "[ "
                    (if name (concat name ": ") "")
-                   declared-type
-                   (if (not (string-equal declared-type content-type))
-                       (concat " (as " content-type ")")
+                   (car declared-type)
+                   (if (not (string-equal (car declared-type) (car content-type)))
+                       (concat " (as " (car content-type) ")")
                      "")
                    (or comment "")
                    " ]")
            :type 'notmuch-show-part-button-type
            :notmuch-part nth
            :notmuch-filename name))
     (insert "\n")
     ;; return button
     button))

 ;; Functions handling particular MIME parts.

 (defun notmuch-show-save-part (message-id nth &optional filename)
   (let ((process-crypto notmuch-show-process-crypto))
     (with-temp-buffer
       (setq notmuch-show-process-crypto process-crypto)
       ;; Always acquires the part via `notmuch part', even if it is
       ;; available in the JSON output.
       (insert (notmuch-show-get-bodypart-internal message-id nth))
       (let ((file (read-file-name
                    "Filename to save as: "
                    (or mailcap-download-directory "~/")
                    nil nil
                    filename)))
         ;; Don't re-compress .gz & al.  Arguably we should make
         ;; `file-name-handler-alist' nil, but that would chop
         ;; ange-ftp, which is reasonable to use here.
         (mm-write-region (point-min) (point-max) file nil nil nil 'no-conversion t)))))

 (defun notmuch-show-mm-display-part-inline (msg part content-type content)
   "Use the mm-decode/mm-view functions to display a part in the
 current buffer, if possible."
   (let ((display-buffer (current-buffer)))
     (with-temp-buffer
       (insert content)
-      (let ((handle (mm-make-handle (current-buffer) (list content-type))))
+      (let ((handle (mm-make-handle (current-buffer) content-type)))
         (set-buffer display-buffer)
         (if (and (mm-inlinable-p handle)
                  (mm-inlined-p handle))
             (progn
               (mm-display-part handle)
               t)
           nil)))))

 (defvar notmuch-show-multipart/alternative-discouraged
   '(
     ;; Avoid HTML parts.
     "text/html"
     ;; multipart/related usually contain a text/html part and some associated graphics.
     "multipart/related"
     ))

 (defun notmuch-show-multipart/*-to-list (part)
-  (mapcar '(lambda (inner-part) (plist-get inner-part :content-type))
+  (mapcar '(lambda (inner-part) (car (notmuch-show-get-content-type inner-part)))
           (plist-get part :content)))

 (defun notmuch-show-multipart/alternative-choose (types)
   ;; Based on `mm-preferred-alternative-precedence'.
   (let ((seq types))
 (dolist (pref (reverse 

Re: [PATCH] Output unmodified Content-Type header value for JSON format.

2011-11-19 Thread Jameson Graef Rollins
On Sat, 19 Nov 2011 06:42:00 +0400, Dmitry Kurochkin 
dmitry.kuroch...@gmail.com wrote:
> The parameters are there for a reason.  They are part of the
> content-type and are needed to handle the body properly.  If you say
> that the parameters are not needed by notmuch users, that implies that
> they are handled by notmuch.  Which is just not possible.

Hey, Dmitry.  At least some of the parameters in the content-type are
actually related to the mime structure itself.  A good example is:

boundary=\"=-=-=\"

This parameter is there to tell the MIME parser how to parse the content
that follows.  It is meant for the first level parser and no more.  Once
the MIME has been separated into its constituent parts, there's no need
for any further clients to know anything about the boundary string.

I would argue that notmuch is acting as the first level parser.  As far
as I can tell, most of the rest of the parameters I've seen should only
be useful to those first-level parsers.

As Austin mentioned, is it not possible for notmuch itself to act on the
parameter to give a properly formatted output to its clients?

> The fact that this change happens to fix an issue with HTML charsets for
> me is just a side effect.

But isn't that actually a large part of the issue?  If this patch fixes
something that you think notmuch is doing improperly, could there not be
a test for it?

However, based on your patches and as far as I can tell, this change
adds more than a boundary parameter to only crypto parts
(application/signed and application/encrypted).  However, I don't think
any of the crypto functionality needs any of the extra information
provided in the extended output.  If there was a test for the
functionality you think is missing, it would help bolster the case for
the additional output.

> > >   "content": [{"id": 2,
> > > - "content-type": "text/plain",
> > >   "content": "This is a test signed message.\n"},
> >
> > Without figuring out what's going on, I notice that some of the tests
> > have been modified to remove the content-type fields on a bunch of
> > parts.  I think that is probably not right.
> >
>
> I tried to explain this in the preamble.  These parts do not have
> Content-Type in the original message.  So I think it is wrong for
> notmuch JSON format to add it.

Ah, ok, I think I understand this point.  I think this is actually a
separate issue than the one the rest of the patch set is for, though.
One part of the patch is that content-type parameters are also output,
and another part is that parts without content-type shouldn't be
assigned one automatically.  I personally think those should be separate
patches.

jamie.




[PATCH] Output unmodified Content-Type header value for JSON format.

2011-11-18 Thread Austin Clements
Quoth Dmitry Kurochkin on Nov 19 at  6:42 am:
> Hi Jamie.
> 
> On Fri, 18 Nov 2011 17:58:52 -0800, Jameson Graef Rollins wrote:
> > On Sat, 19 Nov 2011 03:45:05 +0400, Dmitry Kurochkin wrote:
> > > Before the change, notmuch used g_mime_content_type_to_string(3)
> > > function to output Content-Type header value.  Turns out it outputs
> > > only "type/subtype" part and ignores all parameters.  Also, if there
> > > is no Content-Type header, default "text/plain" value is used.
> > 
> > Hi, Dmitry.  Can you explain under what circumstances you would need the
> > extra content-type parameters?
> 
> Charset is an example of a parameter which you need to render a part
> correctly.

Can notmuch convert to a common charset, given that, otherwise, every
client is going to have to implement this conversion anyway?

(And are there other examples of useful things in the content type?)


[PATCH] Output unmodified Content-Type header value for JSON format.

2011-11-18 Thread Jameson Graef Rollins
On Sat, 19 Nov 2011 03:45:05 +0400, Dmitry Kurochkin  wrote:
> Before the change, notmuch used g_mime_content_type_to_string(3)
> function to output Content-Type header value.  Turns out it outputs
> only "type/subtype" part and ignores all parameters.  Also, if there
> is no Content-Type header, default "text/plain" value is used.

Hi, Dmitry.  Can you explain under what circumstances you would need the
extra content-type parameters?  It just seems like a lot of extra noise
in the output to me, but that's partially because I can't think of any
reason why something that is receiving pre-parsed mime content would
need it.  Maybe there's a better way to handle what you're trying to get
to.

I think it would help a lot if you could submit some sort of test
modification that demonstrates the issue.  This is one of the reasons we
keep emphasizing that it's good to first have tests in hand that
demonstrate issues before patches that address them.

>   "content": [{"id": 2,
> - "content-type": "text/plain",
>   "content": "This is a test signed message.\n"},

Without figuring out what's going on, I notice that some of the tests
have been modified to remove the content-type fields on a bunch of
parts.  I think that is probably not right.

jamie.


