Re: emacs complains about encoding?

2012-06-02 Thread Adam Wolfe Gordon
On Wed, May 23, 2012 at 4:15 AM, Michal Sojka sojk...@fel.cvut.cz wrote:
 I think the current plan is to use the same decoding lookup table that
 notmuch-show is using in reply too.

 Which table do you refer to? notmuch-show-handlers-for?

Yep, that looks like the right thing.

I've been particularly busy with work and other things lately, so I
haven't had time to get this change done. If someone else wants to
work on this I'll be happy to review it; otherwise I'll try to get it
done in the next few weeks. Tomi's gnus-decoded fix went into 0.13.1
anyway, so I don't think the rest is particularly urgent.

-- Adam
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Re: emacs complains about encoding?

2012-05-23 Thread Michal Sojka
Tomi Ollila tomi.oll...@iki.fi writes:
 Michal Sojka sojk...@fel.cvut.cz writes:

 Hello Adam,

 Adam Wolfe Gordon awg+notm...@xvx.ca writes:
 It turns out it's actually not the emacs side, but an interaction
 between our JSON reply format and emacs.

 The JSON reply (and show) code includes part content for all text/*
 parts except text/html. Because all JSON is required to be UTF-8, it
 handles the encoding itself, puts UTF-8 text in, and omits a
 content-charset field from the output. Emacs passes on the
 content-charset field to mm-display-part-inline if it's available, but
 for text/plain parts it's not, leaving mm-display-part-inline to its
 own devices for figuring out what the charset is. It seems
 mm-display-part-inline correctly figures out that it's UTF-8, and puts
 in the series of ugly \nnn characters because that's what emacs does
 with UTF-8 sometimes.

 In the original reply stuff (pre-JSON reply format) emacs used the
 output of notmuch reply verbatim, so all the charset stuff was handled
 in notmuch. Before f6c170fabca8f39e74705e3813504137811bf162, emacs was
 using the JSON reply format, but was inserting the text itself instead
 of using mm-display-part-inline, so emacs still wasn't trying to do
 any charset manipulation. Using mm-display-part-inline is desirable
 because it lets us handle non-text/plain (e.g. text/html) parts
 correctly in reply, and makes the display more consistent (since we
 use it for show). But, it leads to this problem.

 So, there are a couple of solutions I can see:

 1) Have the JSON formats include the original content-charset even
 though they're actually outputting UTF-8. Of the solutions I tried,
 this is the best, even though it doesn't sound like a good thing to
 do.

 2) Have the JSON formats include content only if it's actually UTF-8.
 This means that for non-UTF-8 parts (including ASCII parts), the emacs
 interface has to do more work to display the part content, since it
 must fetch it from outside first. When I tried this, it worked but
 caused the \nnn to show up when viewing messages in emacs. I suspect
 this is because it sets a charset for the whole buffer, and can't
 accommodate messages with different charsets in the same buffer
 properly. Reply works correctly, though.

 3) Have the JSON formats include the charset for all parts, but make
 it UTF-8 for all parts they include content for (since we're actually
 outputting UTF-8). This doesn't seem to fix the problem, even though
 it seems like it should.

 If no one has a better idea or a strong reason not to, I'll send a
 patch for solution (1).

 Thank you very much for your analysis. It encouraged me to dig into the
 problem and I've found another solution, which might be better than
 those you suggested.

 I traced what Emacs does with the text inside
 notmuch-mm-display-part-inline, and the wrong charset conversion happens
 deep in the elisp code, in mm-with-part, which is called by mm-get-part,
 which is in turn called by mm-inline-text. There is a way to make
 mm-inline-text not call mm-get-part, which is to set the charset to
 'gnus-decoded. This sounds like something that applies to our situation,
 where the part is already decoded.

 You've dug deeper than I did... :)


 The following patch (apply it with git am -c) solves the problem for me.
 However, I'm not sure it is a universal solution. It sets the charset
 only if it is not defined in the notmuch json output, and I'm not sure
 that this is correct. text/html parts seem to have a charset defined,
 but as you wrote, json is always utf-8, so it might be that we need
 'gnus-decoded always, independently of the json output. What do you
 think?

 No -- when non-inlined content is fetched by executing the command
 notmuch show --format=raw --part=n --decrypt id:message-id, the content
 is received in its original charset -- and then the mm-* components need
 to have the correct charset set (well, I think; I have not tested ;).

 Also, we cannot rely on the json output not containing content-charset
 information in the future...

 I'm currently applying this to my build tree whenever I rebuild notmuch for
 my own use: id:1337533094-5467-1-git-send-email-tomi.oll...@iki.fi

Great, this is more or less the same solution :-)

 I think the current plan is to use the same decoding lookup table that
 notmuch-show is using in reply too. 

Which table do you refer to? notmuch-show-handlers-for?

 That is a good plan from a consistency point of view. It just requires
 some code to be moved from notmuch-show.el to some other file (maybe a
 new one).

Sounds good.

Cheers,
-Michal


Re: emacs complains about encoding?

2012-05-22 Thread Michal Sojka
Hello Adam,

Adam Wolfe Gordon awg+notm...@xvx.ca writes:
 It turns out it's actually not the emacs side, but an interaction
 between our JSON reply format and emacs.

 The JSON reply (and show) code includes part content for all text/*
 parts except text/html. Because all JSON is required to be UTF-8, it
 handles the encoding itself, puts UTF-8 text in, and omits a
 content-charset field from the output. Emacs passes on the
 content-charset field to mm-display-part-inline if it's available, but
 for text/plain parts it's not, leaving mm-display-part-inline to its
 own devices for figuring out what the charset is. It seems
 mm-display-part-inline correctly figures out that it's UTF-8, and puts
 in the series of ugly \nnn characters because that's what emacs does
 with UTF-8 sometimes.

 In the original reply stuff (pre-JSON reply format) emacs used the
 output of notmuch reply verbatim, so all the charset stuff was handled
 in notmuch. Before f6c170fabca8f39e74705e3813504137811bf162, emacs was
 using the JSON reply format, but was inserting the text itself instead
 of using mm-display-part-inline, so emacs still wasn't trying to do
 any charset manipulation. Using mm-display-part-inline is desirable
 because it lets us handle non-text/plain (e.g. text/html) parts
 correctly in reply, and makes the display more consistent (since we
 use it for show). But, it leads to this problem.

 So, there are a couple of solutions I can see:

 1) Have the JSON formats include the original content-charset even
 though they're actually outputting UTF-8. Of the solutions I tried,
 this is the best, even though it doesn't sound like a good thing to
 do.

 2) Have the JSON formats include content only if it's actually UTF-8.
 This means that for non-UTF-8 parts (including ASCII parts), the emacs
 interface has to do more work to display the part content, since it
 must fetch it from outside first. When I tried this, it worked but
 caused the \nnn to show up when viewing messages in emacs. I suspect
 this is because it sets a charset for the whole buffer, and can't
 accommodate messages with different charsets in the same buffer
 properly. Reply works correctly, though.

 3) Have the JSON formats include the charset for all parts, but make
 it UTF-8 for all parts they include content for (since we're actually
 outputting UTF-8). This doesn't seem to fix the problem, even though
 it seems like it should.

 If no one has a better idea or a strong reason not to, I'll send a
 patch for solution (1).

Thank you very much for your analysis. It encouraged me to dig into the
problem and I've found another solution, which might be better than
those you suggested.

I traced what Emacs does with the text inside
notmuch-mm-display-part-inline, and the wrong charset conversion happens
deep in the elisp code, in mm-with-part, which is called by mm-get-part,
which is in turn called by mm-inline-text. There is a way to make
mm-inline-text not call mm-get-part, which is to set the charset to
'gnus-decoded. This sounds like something that applies to our situation,
where the part is already decoded.

The following patch (apply it with git am -c) solves the problem for me.
However, I'm not sure it is a universal solution. It sets the charset
only if it is not defined in the notmuch json output, and I'm not sure
that this is correct. text/html parts seem to have a charset defined,
but as you wrote, json is always utf-8, so it might be that we need
'gnus-decoded always, independently of the json output. What do you
think?

-Michal

--8<---------------
diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
index 7fa441a..8070f05 100644
--- a/emacs/notmuch-lib.el
+++ b/emacs/notmuch-lib.el
@@ -244,7 +244,7 @@ the given type.
 current buffer, if possible.
   (let ((display-buffer (current-buffer)))
 (with-temp-buffer
-  (let* ((charset (plist-get part :content-charset))
+  (let* ((charset (or (plist-get part :content-charset) 'gnus-decoded))
 	 (handle (mm-make-handle (current-buffer) `(,content-type (charset . ,charset)))))
;; If the user wants the part inlined, insert the content and
;; test whether we are able to inline it (which includes both


Re: emacs complains about encoding?

2012-05-22 Thread Tomi Ollila
Michal Sojka sojk...@fel.cvut.cz writes:

 Hello Adam,

 Adam Wolfe Gordon awg+notm...@xvx.ca writes:
 It turns out it's actually not the emacs side, but an interaction
 between our JSON reply format and emacs.

 The JSON reply (and show) code includes part content for all text/*
 parts except text/html. Because all JSON is required to be UTF-8, it
 handles the encoding itself, puts UTF-8 text in, and omits a
 content-charset field from the output. Emacs passes on the
 content-charset field to mm-display-part-inline if it's available, but
 for text/plain parts it's not, leaving mm-display-part-inline to its
 own devices for figuring out what the charset is. It seems
 mm-display-part-inline correctly figures out that it's UTF-8, and puts
 in the series of ugly \nnn characters because that's what emacs does
 with UTF-8 sometimes.

 In the original reply stuff (pre-JSON reply format) emacs used the
 output of notmuch reply verbatim, so all the charset stuff was handled
 in notmuch. Before f6c170fabca8f39e74705e3813504137811bf162, emacs was
 using the JSON reply format, but was inserting the text itself instead
 of using mm-display-part-inline, so emacs still wasn't trying to do
 any charset manipulation. Using mm-display-part-inline is desirable
 because it lets us handle non-text/plain (e.g. text/html) parts
 correctly in reply, and makes the display more consistent (since we
 use it for show). But, it leads to this problem.

 So, there are a couple of solutions I can see:

 1) Have the JSON formats include the original content-charset even
 though they're actually outputting UTF-8. Of the solutions I tried,
 this is the best, even though it doesn't sound like a good thing to
 do.

 2) Have the JSON formats include content only if it's actually UTF-8.
 This means that for non-UTF-8 parts (including ASCII parts), the emacs
 interface has to do more work to display the part content, since it
 must fetch it from outside first. When I tried this, it worked but
 caused the \nnn to show up when viewing messages in emacs. I suspect
 this is because it sets a charset for the whole buffer, and can't
 accommodate messages with different charsets in the same buffer
 properly. Reply works correctly, though.

 3) Have the JSON formats include the charset for all parts, but make
 it UTF-8 for all parts they include content for (since we're actually
 outputting UTF-8). This doesn't seem to fix the problem, even though
 it seems like it should.

 If no one has a better idea or a strong reason not to, I'll send a
 patch for solution (1).

 Thank you very much for your analysis. It encouraged me to dig into the
 problem and I've found another solution, which might be better than
 those you suggested.

 I traced what Emacs does with the text inside
 notmuch-mm-display-part-inline, and the wrong charset conversion happens
 deep in the elisp code, in mm-with-part, which is called by mm-get-part,
 which is in turn called by mm-inline-text. There is a way to make
 mm-inline-text not call mm-get-part, which is to set the charset to
 'gnus-decoded. This sounds like something that applies to our situation,
 where the part is already decoded.

You've dug deeper than I did... :)


 The following patch (apply it with git am -c) solves the problem for me.
 However, I'm not sure it is a universal solution. It sets the charset
 only if it is not defined in the notmuch json output, and I'm not sure
 that this is correct. text/html parts seem to have a charset defined,
 but as you wrote, json is always utf-8, so it might be that we need
 'gnus-decoded always, independently of the json output. What do you
 think?

No -- when non-inlined content is fetched by executing the command
notmuch show --format=raw --part=n --decrypt id:message-id, the content
is received in its original charset -- and then the mm-* components need
to have the correct charset set (well, I think; I have not tested ;).

Also, we cannot rely on the json output not containing content-charset
information in the future...

I'm currently applying this to my build tree whenever I rebuild notmuch for
my own use: id:1337533094-5467-1-git-send-email-tomi.oll...@iki.fi


I think the current plan is to use the same decoding lookup table that
notmuch-show is using in reply too. That is a good plan from a
consistency point of view. It just requires some code to be moved from
notmuch-show.el to some other file (maybe a new one).

 -Michal

Tomi



 --8<---------------
 diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
 index 7fa441a..8070f05 100644
 --- a/emacs/notmuch-lib.el
 +++ b/emacs/notmuch-lib.el
 @@ -244,7 +244,7 @@ the given type.
  current buffer, if possible.
(let ((display-buffer (current-buffer)))
  (with-temp-buffer
 -  (let* ((charset (plist-get part :content-charset))
 +  (let* ((charset (or (plist-get part :content-charset) 'gnus-decoded))
  	 (handle (mm-make-handle (current-buffer) `(,content-type (charset . ,charset)))))
  ;; If the user wants the part inlined, insert the content and

Re: emacs complains about encoding?

2012-05-20 Thread Adam Wolfe Gordon
On Wed, May 16, 2012 at 3:24 AM, Tomi Ollila tomi.oll...@iki.fi wrote:
 Haa, it doesn't matter what the original encoding of the message is;

 notmuch reply id:20120515194455.b7ad5100...@guru.guru-group.fi

 where  notmuch show --format=raw ^^^  outputs (among other lines):

  Content-Type: text/plain; charset=iso-8859-1
  Content-Transfer-Encoding: quoted-printable

 and

 notmuch reply id:878vgsbprq@nikula.org

 where  notmuch show --format=raw ^^^  outputs (among other lines):

  Content-Type: text/plain; charset=utf-8
  Content-Transfer-Encoding: base64

 produce correct reply content, both in utf-8.

 So it is the emacs side which breaks replies.

It turns out it's actually not the emacs side, but an interaction
between our JSON reply format and emacs.

The JSON reply (and show) code includes part content for all text/*
parts except text/html. Because all JSON is required to be UTF-8, it
handles the encoding itself, puts UTF-8 text in, and omits a
content-charset field from the output. Emacs passes on the
content-charset field to mm-display-part-inline if it's available, but
for text/plain parts it's not, leaving mm-display-part-inline to its
own devices for figuring out what the charset is. It seems
mm-display-part-inline correctly figures out that it's UTF-8, and puts
in the series of ugly \nnn characters because that's what emacs does
with UTF-8 sometimes.
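To make the behavior above concrete, here is a minimal Python sketch of what the JSON formatter effectively does (an illustration only -- the real formatter is C code using GMime, and the function name here is made up): it decodes the part body using the declared charset and emits UTF-8 JSON, dropping the original charset information on the floor.

```python
import json

def part_to_json(raw_body: bytes, declared_charset: str) -> str:
    # Decode the body using the charset declared in the MIME header...
    text = raw_body.decode(declared_charset)
    # ...and emit JSON, which is UTF-8 by definition. Note that no
    # content-charset field is recorded in the output.
    return json.dumps({"content-type": "text/plain", "content": text})

# A latin-1 body round-trips into UTF-8 JSON, with the charset info lost:
print(part_to_json("IT häppens.".encode("iso-8859-1"), "iso-8859-1"))
```

A consumer of this output (such as the emacs interface) then has no charset field to pass along, which is what leaves mm-display-part-inline guessing.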

In the original reply stuff (pre-JSON reply format) emacs used the
output of notmuch reply verbatim, so all the charset stuff was handled
in notmuch. Before f6c170fabca8f39e74705e3813504137811bf162, emacs was
using the JSON reply format, but was inserting the text itself instead
of using mm-display-part-inline, so emacs still wasn't trying to do
any charset manipulation. Using mm-display-part-inline is desirable
because it lets us handle non-text/plain (e.g. text/html) parts
correctly in reply, and makes the display more consistent (since we
use it for show). But, it leads to this problem.

So, there are a couple of solutions I can see:

1) Have the JSON formats include the original content-charset even
though they're actually outputting UTF-8. Of the solutions I tried,
this is the best, even though it doesn't sound like a good thing to
do.

2) Have the JSON formats include content only if it's actually UTF-8.
This means that for non-UTF-8 parts (including ASCII parts), the emacs
interface has to do more work to display the part content, since it
must fetch it from outside first. When I tried this, it worked but
caused the \nnn to show up when viewing messages in emacs. I suspect
this is because it sets a charset for the whole buffer, and can't
accommodate messages with different charsets in the same buffer
properly. Reply works correctly, though.

3) Have the JSON formats include the charset for all parts, but make
it UTF-8 for all parts they include content for (since we're actually
outputting UTF-8). This doesn't seem to fix the problem, even though
it seems like it should.

If no one has a better idea or a strong reason not to, I'll send a
patch for solution (1).

-- Adam


Re: emacs complains about encoding?

2012-05-16 Thread Jani Nikula
Tomi Ollila tomi.oll...@iki.fi writes:
 IT häppens.

 Attempting to reply to this email should expose the problem.

 Ääliö älä lyö, ööliä läikkyy.

The problem bisected to f6c170fabca8f39e74705e3813504137811bf162
(emacs: Correctly quote non-text/plain parts in reply). The commit
reverts cleanly, and replying works without it now.

Tomi, it would be great if you could post your message as a patch to the
test suite.

BR,
Jani.


Re: emacs complains about encoding?

2012-05-16 Thread Tomi Ollila
On Wed, May 16 2012, Jani Nikula wrote:

 Tomi Ollila tomi.oll...@iki.fi writes:
 IT h\344ppens.

 Attempting to reply to this email should expose the problem.

 \304\344li\366 \344l\344 ly\366, \366\366li\344 l\344ikkyy.

(the \nnn text above was replaced by hand with query-replace)


 The problem bisected to f6c170fabca8f39e74705e3813504137811bf162
 (emacs: Correctly quote non-text/plain parts in reply). The commit
 reverts cleanly, replying without it now.

 Tomi, it would be great if you could post your message as a patch to the
 test suite.

Haa, it doesn't matter what the original encoding of the message is;

notmuch reply id:20120515194455.b7ad5100...@guru.guru-group.fi

where  notmuch show --format=raw ^^^  outputs (among other lines):

  Content-Type: text/plain; charset=iso-8859-1
  Content-Transfer-Encoding: quoted-printable

and

notmuch reply id:878vgsbprq@nikula.org

where  notmuch show --format=raw ^^^  outputs (among other lines):

  Content-Type: text/plain; charset=utf-8
  Content-Transfer-Encoding: base64

produce correct reply content, both in utf-8.

So it is the emacs side which breaks replies.
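Both cases above reduce to the same normalization that notmuch performs: undo the transfer encoding, then decode with the declared charset. A small Python sketch with hypothetical sample bodies (not the actual message contents):

```python
import base64
import quopri

# quoted-printable body declared as charset=iso-8859-1:
qp_text = quopri.decodestring(b"IT h=E4ppens.").decode("iso-8859-1")

# base64 body declared as charset=utf-8, carrying the same text:
b64_raw = base64.b64encode("IT häppens.".encode("utf-8"))
b64_text = base64.b64decode(b64_raw).decode("utf-8")

# Both normalize to the identical UTF-8 string:
print(qp_text == b64_text)  # True
```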

I'll see what kind of test cases I can create...


 BR,
 Jani.

Tomi


Re: emacs complains about encoding?

2012-05-15 Thread Tomi Ollila
IT häppens.

Attempting to reply to this email should expose the problem.

Ääliö älä lyö, ööliä läikkyy.

--
Tomi


Re: emacs complains about encoding?

2012-05-15 Thread Tomi Ollila
On Tue, May 15 2012, Tomi Ollila tomi.oll...@iki.fi wrote:

 IT häppens.

 Attempting to reply to this email should expose the problem.

 Ääliö älä lyö, ööliä läikkyy.

This email: id:20120515194455.b7ad5100...@guru.guru-group.fi

was supposed to be a reply to email id:878vgzrvik@beesknees.cern.ch

I (once again) failed to copy the Message-ID correctly into the
email headers ;/ :

In-Reply-To: id:878vgzrvik@beesknees.cern.ch
References: id:878vgzrvik@beesknees.cern.ch

Tomi