Re: Unicode problem with export of literal contents

2023-02-21 Thread Jens Lechtenboerger
On 2023-02-20, Bruno Barbier wrote:

> Jens Lechtenboerger writes:
>
>> On 2023-02-20, Bruno Barbier wrote:
>>
>> However, if I use insert-file-contents-literally with a unicode
>> file, I do *not* have to set the coding-system-for-write.  This just
>> works:
>>
>>    (with-temp-buffer
>>      (insert-file-contents-literally "~/unicode.org")
>>      (secure-hash 'md5 (current-buffer)))
>
> Humm. Emacs is amazing: it managed to guess the right encoding, from the
> buffer context, probably...
>
> But, what you are giving to 'org-export-string-as' is not the buffer,
> it's a string. So, let's try the same without using an org function:
>
>    (with-temp-buffer
>      (insert (with-temp-buffer
>                (insert-file-contents-literally "~/unicode.org")
>                (buffer-string)))
>      (secure-hash 'md5 (current-buffer)))
>
> And, that fails, requesting an encoding.

Thank you for this example.

>> In the context of Org export, secure-hash seems to require a coding
>> system.  Why?
>
> I'm not an expert, so, you'll need to confirm with other sources.  But
> secure-hash requires an encoding in all cases, to compute the hash of
> some text, because it needs the array of bytes that represents that text
> to compute its hash.
>
> I don't see any bug in org, and, I don't see any bug in secure-hash either.
>
> You literally should stop using "literally" ;-)

Indeed.  

> And, you might want to read:
>    (info "(elisp) Non-ASCII Characters")

The first section was already helpful, thanks!  (I still need to
read more of this...)

Best wishes
Jens




Re: Unicode problem with export of literal contents

2023-02-20 Thread Bruno Barbier


Jens Lechtenboerger writes:

> On 2023-02-20, Bruno Barbier wrote:
>
> However, if I use insert-file-contents-literally with a unicode
> file, I do *not* have to set the coding-system-for-write.  This just
> works:
>
>    (with-temp-buffer
>      (insert-file-contents-literally "~/unicode.org")
>      (secure-hash 'md5 (current-buffer)))

Humm. Emacs is amazing: it managed to guess the right encoding, from the
buffer context, probably...

But, what you are giving to 'org-export-string-as' is not the buffer,
it's a string. So, let's try the same without using an org function:

   (with-temp-buffer
     (insert (with-temp-buffer
               (insert-file-contents-literally "~/unicode.org")
               (buffer-string)))
     (secure-hash 'md5 (current-buffer)))

And, that fails, requesting an encoding.


> In the context of Org export, secure-hash seems to require a coding
> system.  Why?

I'm not an expert, so, you'll need to confirm with other sources.  But
secure-hash requires an encoding in all cases, to compute the hash of
some text, because it needs the array of bytes that represents that text
to compute its hash.
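
A minimal sketch of that point, using only strings so that no
coding-system prompt is involved: text has to be encoded before it can
be hashed, while a unibyte string already is the byte array.  The
`my/' names are made up for illustration, and UTF-8 is an assumption
about the encoding in use.

```elisp
;; Hypothetical helper (not part of Emacs or Org): hash TEXT after
;; encoding it explicitly, so the bytes being hashed are unambiguous.
(defun my/md5-of-utf8 (text)
  "Return the MD5 hex digest of TEXT encoded as UTF-8."
  (secure-hash 'md5 (encode-coding-string text 'utf-8)))

;; "\u00f6" is the character o-umlaut: a multibyte string of text.
(defvar my/name-as-text "Lechtenb\u00f6rger")

;; "\303\266" are the two raw UTF-8 bytes of that character: a
;; unibyte string in which nothing has been decoded yet.
(defvar my/name-as-bytes "Lechtenb\303\266rger")

;; Encoding the text yields exactly those bytes, so the digests agree.
(string= (my/md5-of-utf8 my/name-as-text)
         (secure-hash 'md5 my/name-as-bytes))
```

Passing the coding system explicitly like this avoids depending on
whatever Emacs would otherwise pick for the buffer.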

I don't see any bug in org, and, I don't see any bug in secure-hash either.

You literally should stop using "literally" ;-)

And, you might want to read:
   (info "(elisp) Non-ASCII Characters")



Bruno


>
> Best wishes
> Jens



Re: Unicode problem with export of literal contents

2023-02-20 Thread Jens Lechtenboerger
On 2023-02-20, Bruno Barbier wrote:

> If you're always using utf-8, here is a way to force it so that
> secure-hash can compute the hash. This should work:
>
>    (with-temp-buffer
>      (let ((coding-system-for-write 'utf-8))
>        (insert "Lechtenb\303\266rger")
>        (secure-hash 'md5 (current-buffer))))

Yes, that works.

However, if I use insert-file-contents-literally with a unicode
file, I do *not* have to set the coding-system-for-write.  This just
works:

   (with-temp-buffer
     (insert-file-contents-literally "~/unicode.org")
     (secure-hash 'md5 (current-buffer)))

In the context of Org export, secure-hash seems to require a coding
system.  Why?

Best wishes
Jens




Re: Unicode problem with export of literal contents

2023-02-20 Thread Bruno Barbier


Jens Lechtenboerger writes:

> On 2023-02-17, Bruno Barbier wrote:
>
>> Here is a way to reproduce that doesn't use org, in case it might help
>> to manually fix your encoding issue:
>>
>>    (with-temp-buffer
>>      (insert "Lechtenb\303\266rger")
>>      (let ((buffer-file-name (make-temp-file "mailtest")))
>>        (save-buffer)))
>>
>> Does it work with your old config (with your old org) ?
>
> This also asks for an encoding.

If you're always using utf-8, here is a way to force it so that
secure-hash can compute the hash. This should work:

   (with-temp-buffer
     (let ((coding-system-for-write 'utf-8))
       (insert "Lechtenb\303\266rger")
       (secure-hash 'md5 (current-buffer))))

Without setting coding-system-for-write to utf-8, it asks for an
encoding:

   (with-temp-buffer
     (insert "Lechtenb\303\266rger")
     (secure-hash 'md5 (current-buffer)))


I'm still not getting your use case, but, let's hope that this naive hack
is enough for you :-)


Bruno


> Best wishes
> Jens



Re: Unicode problem with export of literal contents

2023-02-20 Thread Jens Lechtenboerger
On 2023-02-17, Ihor Radchenko wrote:

> Jens Lechtenboerger writes:

>> Also, when I call secure-hash on the literal buffer-string, no
>> problem arises.
>
> Org is calling secure-hash on buffer. Calling on buffer-string would
> require unnecessary memory allocation to create the string.

I can call secure-hash on the buffer with literally inserted
contents without problems.

>> It is not obvious that Org tries to write something here and why
>> that fails now
>
> Org is not trying to write something. In your example, Org is just trying
> to calculate buffer string hash - nothing wrong on Org side. "Something
> wrong with encoding" was my guess. If you think that your case should be
> perfectly fine, I recommend asking Emacs devs by filing a bug report to
> them.

Thank you for the clarifications.  Probably I have to do that.

For the record, if I insert "Lechtenb\303\266rger" as a string into a
buffer, secure-hash asks for a coding system.  If I insert the same
bytes literally from a UTF-8 encoded file, secure-hash works.

Best wishes
Jens




Re: Unicode problem with export of literal contents

2023-02-20 Thread Jens Lechtenboerger
On 2023-02-17, Bruno Barbier wrote:

> Here is a way to reproduce that doesn't use org, in case it might help
> to manually fix your encoding issue:
>
>    (with-temp-buffer
>      (insert "Lechtenb\303\266rger")
>      (let ((buffer-file-name (make-temp-file "mailtest")))
>        (save-buffer)))
>
> Does it work with your old config (with your old org) ?

This also asks for an encoding.

> What kind of failure do you get elsewhere if you let Emacs use the
> correct encoding (i.e. if you use `insert-file-contents') ?

I want to be sure to use the file contents in unchanged form, as
promised by insert-file-contents-literally.  For now, I copied part
of the code from insert-file-contents-literally to avoid
after-insert processing and file handlers.  I still do not
understand what is happening differently in my case, though...

Best wishes
Jens




Re: Unicode problem with export of literal contents

2023-02-17 Thread Ihor Radchenko
Jens Lechtenboerger writes:

> I was afraid you would say so.  To me, this is a breaking change.

It is not a breaking change. It is Org's change revealing issues with
your files. If you need to edit or act upon that part of the file, you
could see the same problem.

> Also, when I call secure-hash on the literal buffer-string, no
> problem arises.

Org is calling secure-hash on buffer. Calling on buffer-string would
require unnecessary memory allocation to create the string.

> It is not obvious that Org tries to write something here and why
> that fails now

Org is not trying to write something. In your example, Org is just trying
to calculate buffer string hash - nothing wrong on Org side. "Something
wrong with encoding" was my guess. If you think that your case should be
perfectly fine, I recommend asking Emacs devs by filing a bug report to
them.

-- 
Ihor Radchenko // yantar92,
Org mode contributor



Re: Unicode problem with export of literal contents

2023-02-17 Thread Bruno Barbier


Jens Lechtenboerger writes:

> So, maybe my question is: Must text be decoded for Org mode from now on?

Yes. Since forever.  Emacs must know how to read/write from/to files and
what text to display to you. Org is just relying on Emacs for that part.

Bruno




Re: Unicode problem with export of literal contents

2023-02-17 Thread Bruno Barbier
Jens Lechtenboerger writes:

> On 2023-02-17, Ihor Radchenko wrote:
>
>> Jens Lechtenboerger writes:
>>
>
>> Not a bug. You need to fix your files with improper encoding.
>
> The file has the proper encoding.  I insert literally on purpose as
> stated above.

IIUC, the file has the proper encoding. But, when loading it with
`insert-file-contents-literally', it doesn't: that's part of the
"literally" feature I guess.

When loading it with `insert-file-contents', it should work (it does in
my case).
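
If the goal is to skip hooks and uncompression but still end up with
text, one option is to read the bytes literally and decode them
explicitly afterwards.  A sketch, assuming the file is UTF-8;
`my/file-text-utf8' is a hypothetical name, not an Emacs or Org
function:

```elisp
(defun my/file-text-utf8 (file)
  "Return the contents of FILE as text, decoding its raw bytes as UTF-8."
  (decode-coding-string
   (with-temp-buffer
     ;; A unibyte buffer holds plain bytes, so `buffer-string' below
     ;; returns a unibyte string.
     (set-buffer-multibyte nil)
     ;; Read the file without format decoding, character code
     ;; conversion, `find-file-hook' or automatic uncompression.
     (insert-file-contents-literally file)
     (buffer-string))
   'utf-8))
```

The result is ordinary decoded text, so it could be handed to
something like (org-export-string-as (my/file-text-utf8 FILE) 'html t),
which should then not need to ask for a coding system as long as the
default one can encode the text.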


Here is a way to reproduce that doesn't use org, in case it might help
to manually fix your encoding issue:

   (with-temp-buffer
     (insert "Lechtenb\303\266rger")
     (let ((buffer-file-name (make-temp-file "mailtest")))
       (save-buffer)))

Does it work with your old config (with your old org) ?

What kind of failure do you get elsewhere if you let Emacs use the
correct encoding (i.e. if you use `insert-file-contents') ?



Bruno



Re: Unicode problem with export of literal contents

2023-02-17 Thread Jens Lechtenboerger
On 2023-02-17, Ihor Radchenko wrote:

> Jens Lechtenboerger writes:
>
>> With Org 9.6.1 from Emacs master, I get the following warning, and I
>> am asked to select a coding system:
>>
>>> These default coding systems were tried to encode the following
>>> problematic characters in the buffer ‘ *temp*’:
>>> ...
>>
>> With previous Org versions, this did not happen, export would just
>> work.  Note that I insert contents literally because I do not want
>> ‘find-file-hook’, automatic uncompression, etc. (which are avoided
>> according to the doc string of insert-file-contents-literally).
>
> This warning appears upon Org calling `secure-hash'.
> Org is doing nothing wrong here - your file does not have proper encoding.
> You did not see this error in the past by chance.

I was afraid you would say so.  To me, this is a breaking change.

Also, when I call secure-hash on the literal buffer-string, no
problem arises.

> Not a bug. You need to fix your files with improper encoding.

The file has the proper encoding.  I insert literally on purpose as
stated above.

It is not obvious that Org tries to write something here and why
that fails now (I could use the results in exporters writing to
files just fine previously).

Best wishes
Jens




Re: Unicode problem with export of literal contents

2023-02-17 Thread Ihor Radchenko
Jens Lechtenboerger writes:

> With Org 9.6.1 from Emacs master, I get the following warning, and I
> am asked to select a coding system:
>
>> These default coding systems were tried to encode the following
>> problematic characters in the buffer ‘ *temp*’:
>> ...
>
> With previous Org versions, this did not happen, export would just
> work.  Note that I insert contents literally because I do not want
> ‘find-file-hook’, automatic uncompression, etc. (which are avoided
> according to the doc string of insert-file-contents-literally).

This warning appears upon Org calling `secure-hash'.
Org is doing nothing wrong here - your file does not have proper encoding.
You did not see this error in the past by chance.

Not a bug. You need to fix your files with improper encoding.

-- 
Ihor Radchenko // yantar92,
Org mode contributor



Re: Unicode problem with export of literal contents

2023-02-16 Thread Jens Lechtenboerger
Hi Bruno,

On 2023-02-17, Bruno Barbier wrote:

> Hi Jens,
>
> Jens Lechtenboerger writes:
>
>> ...
>> Note that I insert contents literally because I do not want
>> ‘find-file-hook’, automatic uncompression, etc. (which are avoided
>> according to the doc string of insert-file-contents-literally).
>>
>> Could the old behavior be restored?
>
> By using `insert-file-contents-literally' (as opposed to
> `insert-file-contents'), you're also forbidding Emacs to decode the
> binary content of your file into text.
>
> My guess is that it was working by chance in previous versions.

In any case, this will introduce failures elsewhere.

> In case somebody might help you, here is a simple way to trigger the
> encoding question with a recent version of org (mine is Org mode
> version 9.6.1).
>
>    (with-temp-buffer
>      (insert "Lechtenb\303\266rger")
>      (org-mode))

Thank you for the simpler recipe.  This indeed fails now.

So, maybe my question is: Must text be decoded for Org mode from now on?

Best wishes
Jens




Re: Unicode problem with export of literal contents

2023-02-16 Thread Bruno Barbier


Hi Jens,

Jens Lechtenboerger writes:

> ...
> Note that I insert contents literally because I do not want
> ‘find-file-hook’, automatic uncompression, etc. (which are avoided
> according to the doc string of insert-file-contents-literally).
>
> Could the old behavior be restored?

By using `insert-file-contents-literally' (as opposed to
`insert-file-contents'), you're also forbidding Emacs to decode the
binary content of your file into text.

My guess is that it was working by chance in previous versions.

In case somebody might help you, here is a simple way to trigger the
encoding question with a recent version of org (mine is Org mode version 9.6.1).

   (with-temp-buffer
     (insert "Lechtenb\303\266rger")
     (org-mode))



Bruno




Unicode problem with export of literal contents

2023-02-16 Thread Jens Lechtenboerger
Hi there,

consider this piece of code, where unicode-file.org contains umlauts
(say, just the word “Lechtenbörger”):

(org-export-string-as
 (with-temp-buffer
   (insert-file-contents-literally "unicode-file.org")
   (buffer-string))
 'html t)

With Org 9.6.1 from Emacs master, I get the following warning, and I
am asked to select a coding system:

> These default coding systems were tried to encode the following
> problematic characters in the buffer ‘ *temp*’:
> ...

With previous Org versions, this did not happen, export would just
work.  Note that I insert contents literally because I do not want
‘find-file-hook’, automatic uncompression, etc. (which are avoided
according to the doc string of insert-file-contents-literally).

Could the old behavior be restored?

Best wishes
Jens