Re: Non-ASCII characters in a node body make edit operations produce unintended results

2022-05-06 Thread Edward K. Ream
On Fri, May 6, 2022 at 12:02 PM tbp1...@gmail.com 
wrote:

> the symbols involved do not change with the font, so it's not a matter of
> missing or wrong glyphs getting drawn.  it might be something like utf8
> decoding getting shifted by a bit or a nibble, or getting an extra byte
> inserted spuriously into the decoded stream.  if so, it would probably be
> an issue with the decoding library that qt uses.


I agree with this analysis. The only workaround I can think of is not to
paste the offending glyphs/characters.

Edward

-- 
You received this message because you are subscribed to the Google Groups 
"leo-editor" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to leo-editor+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/leo-editor/CAMF8tS3nz2ZYHB%3D6teQWhUTspb8Tb45qv808npgrO4ia4o%3DH9w%40mail.gmail.com.


Re: Non-ASCII characters in a node body make edit operations produce unintended results

2022-05-06 Thread tbp1...@gmail.com
I'm in the dark too, but when i encountered the same problem several leo 
versions ago, i raised the issue and then after a few more merges the 
problem was gone.  actually, that time it was more serious because every 
use of the ctrl key (iirc) inserted those symbols.  it's always been those 
exact strange symbols, so there must be some very specific thing going on.  
the symbols involved do not change with the font, so it's not a matter of 
missing or wrong glyphs getting drawn.  it might be something like utf8 
decoding getting shifted by a bit or a nibble, or getting an extra byte 
inserted spuriously into the decoded stream.  if so, it would probably be 
an issue with the decoding library that qt uses.
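
To make the hypothesis above concrete, here is a minimal Python sketch of what a damaged UTF-8 stream looks like after decoding. It is only an illustration: the function names are made up for this example, and nothing here shows that Qt or Leo actually does this.

```python
def decode_with_stray_byte(text: str, stray: bytes = b"\xc3", at: int = 0) -> str:
    """Encode text as UTF-8, insert one stray byte at byte offset `at`,
    then decode again, replacing undecodable bytes with U+FFFD."""
    raw = text.encode("utf-8")
    damaged = raw[:at] + stray + raw[at:]
    return damaged.decode("utf-8", errors="replace")


def decode_as_latin1(text: str) -> str:
    """Decode UTF-8 bytes with the wrong codec (Latin-1), a common
    source of the same 'strange symbols' showing up every time."""
    return text.encode("utf-8").decode("latin-1")


if __name__ == "__main__":
    print(repr(decode_with_stray_byte("é")))  # stray lead byte becomes U+FFFD
    print(repr(decode_as_latin1("é")))        # two Latin-1 characters instead of one
```

Either kind of damage yields the same wrong characters regardless of font, which would be consistent with the observation that the symbols never change.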

i never knew what had changed - i assumed that @edward had done something 
to fix it - but if someone can find my report - maybe 2 years ago by now?  
- and has some skill with tracking changes through git - something might 
come to light.  sorry, i'm not in a position to do it myself right now.

On Friday, May 6, 2022 at 10:15:40 AM UTC-4 Edward K. Ream wrote:

> On Thu, May 5, 2022 at 9:27 AM Arjan  wrote:
>
>> It's not a new problem: 
>> https://github.com/leo-editor/leo-editor/issues/1368
>
>
> Yes. I have no idea how it could be fixed.  Anyone have any suggestions?
>
> Edward
>



Re: Non-ASCII characters in a node body make edit operations produce unintended results

2022-05-06 Thread Edward K. Ream
On Thu, May 5, 2022 at 9:27 AM Arjan  wrote:

> It's not a new problem:
> https://github.com/leo-editor/leo-editor/issues/1368


Yes. I have no idea how it could be fixed.  Anyone have any suggestions?

Edward



Re: Non-ASCII characters in a node body make edit operations produce unintended results

2022-05-05 Thread Arjan
It's not a new problem: https://github.com/leo-editor/leo-editor/issues/1368



Re: Non-ASCII characters in a node body make edit operations produce unintended results

2022-04-13 Thread tbp1...@gmail.com

There could also be a problem with a specific version of Qt, so if you can 
try a later version (or possibly an earlier one) it might behave differently. 
Supposedly, all Qt widgets and strings work correctly with unicode and/or 
utf-8 encoding.


Re: Non-ASCII characters in a node body make edit operations produce unintended results

2022-04-13 Thread tbp1...@gmail.com
It looks like, on that particular page, the non-ascii characters are 
emojis.  I copied part of that page with two of the emojis into a Leo node 
and didn't see any unusual behavior.  ,  and copying with 
 worked as expected.  Do you have an example that didn't work right 
for you?

Here's an online checker for non-ascii characters:  Non-Ascii Checker.  
You can paste suspect text in or point it to a file.

Since Python by default uses utf-8 and unicode, text that isn't encoded in 
utf-8, or that is wrongly encoded, could cause problems.  Some text editors 
can detect the encoding, and you can tell them to save a file in a different 
encoding.  EditPlus is the one I use for this.  Not free but worth the $35.  
Notepad++ also can do it, though I haven't used it.

Characters that your font does not have a glyph for might be troublesome 
too, but I'm not sure.  Again, emojis probably would be the most likely if 
we're not getting into cjk characters, since so many new emojis are getting 
introduced.

If we see the kind of behavior you experienced in properly encoded strings, 
then for sure we'd have a problem.  Unfortunately there is a lot of 
incorrectly encoded material out there.  Hmm, I wonder if Leo should have 
an encoding checker built in?
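
As a rough idea of what such an encoding checker could look like, here is a minimal Python sketch using only the standard library. The function name is hypothetical and this is not an existing Leo feature; it only reports whether a byte string is valid utf-8 and, if not, where the first bad byte is.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first sequence that is not valid
    UTF-8, or None if the whole input decodes cleanly."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


if __name__ == "__main__":
    good = "naïve".encode("utf-8")
    bad = "naïve".encode("latin-1")  # same text, saved with another encoding
    print(first_invalid_utf8_offset(good))
    print(first_invalid_utf8_offset(bad))
```

Note that passing this check only means the bytes are well-formed utf-8. Text that was wrongly decoded and re-encoded earlier would still pass.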



Non-ASCII characters in a node body make edit operations produce unintended results

2022-04-13 Thread SegundoBob
I don't know if this is a bug or just the way PyQt works, but this is a 
very annoying problem.  Sometimes HOME takes you to the end of line instead 
of the start.  Sometimes select and Ctrl+C copies unselected characters.  
The "mistakes" are endless because the displayed cursor position is not 
"correct".

I first noticed this problem in 2022-02 because more and more articles 
posted on the Internet contain non-ASCII characters, and every day I copy 
many articles to node bodies and then edit them slightly.
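
One possible explanation for the cursor symptoms, not confirmed anywhere in this thread: Python indexes a string by code points, while Qt's QString stores text as UTF-16, where characters outside the Basic Multilingual Plane (most emojis) take two code units. If positions are exchanged between the two without conversion, every such character earlier in the text would shift the cursor by one. The sketch below only demonstrates the mismatch in plain Python; the helper names are made up for this example.

```python
def utf16_units(text: str) -> int:
    """Number of UTF-16 code units needed to store text."""
    return len(text.encode("utf-16-le")) // 2


def utf16_offset(text: str, index: int) -> int:
    """Convert a Python (code point) index into a UTF-16 code unit offset."""
    return utf16_units(text[:index])


if __name__ == "__main__":
    sample = "ok \U0001F600 done"  # contains one emoji outside the BMP
    print(len(sample))              # length in code points
    print(utf16_units(sample))      # length in UTF-16 code units
    print(utf16_offset(sample, 4))  # offset of the character after the emoji
```

For pure ASCII text the two counts are identical, which would fit the observation that only bodies with non-ASCII characters misbehave.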

2022-04-13 Wed I definitely identified the problem with the help of this 
command:

grep --color='auto' -P -n "[^\x00-\x7F]" x.txt

which I obtained from

https://stackoverflow.com/questions/3001177/how-do-i-grep-for-all-non-ascii-characters
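
For anyone without grep, roughly the same check can be sketched in Python with the same character class. The function name is made up for this example, and it takes the text as a string rather than reading x.txt.

```python
import re
from typing import Iterator, Tuple

# Same character class as the grep command: anything outside 0x00-0x7F.
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def non_ascii_lines(text: str) -> Iterator[Tuple[int, str]]:
    """Yield (line number, line) for every line containing a
    non-ASCII character, numbering lines from 1 like grep -n."""
    for lineno, line in enumerate(text.split("\n"), start=1):
        if NON_ASCII.search(line):
            yield lineno, line


if __name__ == "__main__":
    body = "plain line\ncaf\u00e9 au lait\nanother plain line"
    for lineno, line in non_ascii_lines(body):
        print(f"{lineno}:{line}")
```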

Here is an example article containing many non-ASCII characters:

https://newsletter.pragmaticengineer.com/p/scoop-atlassian

There are many suggestions on the Internet for removing non-ASCII 
characters using Python.  So far this is the best workaround that I've come 
up with.  If we don't come up with a fix or a better workaround, I'll 
eventually figure out how to replace non-ASCII characters that have similar 
ASCII characters with the appropriate ASCII characters.  Someone has 
probably implemented this, but so far I have not found it.

Unfortunately, I have higher priority problems right now that prevent me 
from devoting much time to this problem.

Versions tested:

Leo 6.6b2-devel, devel branch, build 0ce2fa9ad5
2022-02-24 09:55:29 -0600
Python 3.8.10, PyQt version 5.12.8
linux
---
Leo 6.6.1-devel, devel branch, build 90bad4f475
2022-04-13 09:33:47 -0500
Python 3.8.10, PyQt version 5.12.8
