[issue46935] import of submodule polutes global namespace

2022-03-06 Thread Max Bachmann


Max Bachmann  added the comment:

Thanks Dennis. This helped me track down the issue in rapidfuzz.

--
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue46935>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46935] import of submodule polutes global namespace

2022-03-05 Thread Max Bachmann


Max Bachmann  added the comment:

It appears this only occurs when a C extension is involved. When the .so is 
imported first, it is preferred over the .py file that the user would like to 
import. I could not find any documentation on this behavior, so I assume that 
it is not intended.

My current workaround is to give the C extension a unique name and to import 
everything from it in a Python file with the corresponding name.
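
One mechanism that can produce this kind of symptom is that the import 
statement consults `sys.modules` before it searches `sys.path`. The sketch 
below only illustrates that lookup order; the module name `levenshtein_demo` 
is made up, and whether this is what actually happened in rapidfuzz is an 
assumption, not something established here.

```python
import sys
import types

# Hypothetical stand-in for a module that some package registers under a
# top-level name (the name "levenshtein_demo" is invented for this sketch).
fake = types.ModuleType("levenshtein_demo")
fake.origin_note = "registered by another package"
sys.modules["levenshtein_demo"] = fake

# The import statement checks sys.modules first, so this returns the
# pre-registered object instead of searching sys.path for a file.
import levenshtein_demo

print(levenshtein_demo is fake)
print(levenshtein_demo.origin_note)
```

If an extension module ends up registered under a bare top-level name, any 
later `import` of that name would see it, regardless of what is installed on 
disk.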

--




[issue46935] import of submodule polutes global namespace

2022-03-05 Thread Max Bachmann


New submission from Max Bachmann :

In my environment I installed the following two libraries:
```
pip install rapidfuzz
pip install python-Levenshtein
```
Those two libraries have the following structures:
rapidfuzz
|-distance
  |- __init__.py (from . import Levenshtein)
  |- Levenshtein.*.so
|-__init__.py (from rapidfuzz import distance)


Levenshtein
|-__init__.py

When importing Levenshtein first everything behaves as expected:
```
>>> import Levenshtein
>>> Levenshtein.
Levenshtein.apply_edit(   Levenshtein.jaro_winkler( Levenshtein.ratio(
Levenshtein.distance( Levenshtein.matching_blocks(  
Levenshtein.seqratio(
Levenshtein.editops(  Levenshtein.median(   
Levenshtein.setmedian(
Levenshtein.hamming(  Levenshtein.median_improve(   
Levenshtein.setratio(
Levenshtein.inverse(  Levenshtein.opcodes(  
Levenshtein.subtract_edit(
Levenshtein.jaro( Levenshtein.quickmedian(   
>>> import rapidfuzz
>>> Levenshtein.
Levenshtein.apply_edit(   Levenshtein.jaro_winkler( Levenshtein.ratio(
Levenshtein.distance( Levenshtein.matching_blocks(  
Levenshtein.seqratio(
Levenshtein.editops(  Levenshtein.median(   
Levenshtein.setmedian(
Levenshtein.hamming(  Levenshtein.median_improve(   
Levenshtein.setratio(
Levenshtein.inverse(  Levenshtein.opcodes(  
Levenshtein.subtract_edit(
Levenshtein.jaro( Levenshtein.quickmedian( 
```

However, when rapidfuzz is imported first, running `import Levenshtein` imports 
`rapidfuzz.distance.Levenshtein` instead:
```
>>> import rapidfuzz
>>> Levenshtein
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'Levenshtein' is not defined
>>> import Levenshtein
>>> Levenshtein.
Levenshtein.array(  Levenshtein.normalized_distance(
Levenshtein.similarity(
Levenshtein.distance(   Levenshtein.normalized_similarity(  
Levenshtein.editops(Levenshtein.opcodes( 
```

My expectation was that in both cases `import Levenshtein` would import the 
top-level `Levenshtein` package. I could reproduce this behavior on all Python 
versions I had available (Python 3.8 through 3.10) on Ubuntu and Fedora.
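
A small diagnostic helps to see which file a module name resolves to in each 
import order. This is a minimal sketch; the helper name `module_origin` is 
hypothetical and not part of either library.

```python
import importlib.util
import sys


def module_origin(name):
    """Return the file (or origin label) a module name resolves to, or None."""
    loaded = sys.modules.get(name)
    if loaded is not None:
        # Already imported: report where the cached module came from.
        spec = getattr(loaded, "__spec__", None)
        return getattr(spec, "origin", None) or getattr(loaded, "__file__", None)
    # Not imported yet: ask the import machinery what it would load.
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


# Prints None when no module of that name is installed or loaded.
print(module_origin("Levenshtein"))
```

Calling it once before and once after `import rapidfuzz` shows whether the 
name `Levenshtein` starts pointing at a different file.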

--
components: Interpreter Core
messages: 414599
nosy: maxbachmann
priority: normal
severity: normal
status: open
title: import of submodule polutes global namespace
type: behavior
versions: Python 3.10, Python 3.8, Python 3.9




[issue45105] Incorrect handling of unicode character \U00010900

2021-09-05 Thread Max Bachmann

Max Bachmann  added the comment:

As far as I understand, this has the same cause:

```
>>> s = '123\U00010900456'
>>> s
'123ऀ456'
>>> list(s)
['1', '2', '3', 'ऀ', '4', '5', '6']
# note that everything including the commas is mirrored until ] is reached
>>> s[3]
'ऀ'
>>> list(s)[3]
'ऀ'
>>> ls = list(s)
>>> ls[3] += 'a'
>>> ls
['1', '2', '3', 'ऀa', '4', '5', '6']
```

As far as I understand, this is the expected display behavior when a 
right-to-left character is encountered.
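
The right-to-left classification can be checked with the standard 
`unicodedata` module. This is only a quick sanity check of the character's 
properties, not an explanation of how any particular terminal renders it.

```python
import unicodedata

ch = "\U00010900"

# Name and bidirectional class of the character from the report.
print(unicodedata.name(ch))             # PHOENICIAN LETTER ALF
print(unicodedata.bidirectional(ch))    # R  (strong right-to-left)

# ASCII digits are classified as European numbers, a weak bidi type.
print(unicodedata.bidirectional("0"))   # EN
```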

--

___
Python tracker 
<https://bugs.python.org/issue45105>
___



[issue45105] Incorrect handling of unicode character \U00010900

2021-09-05 Thread Max Bachmann

Max Bachmann  added the comment:

> That is using Python 3.9 in the xfce4-terminal. Which xterm are you using?

This was in the default GNOME Terminal that is pre-installed on Fedora 34; on 
Windows I opened the Python console directly. I just installed xfce4-terminal 
on my Fedora 34 machine, and it shows exactly the same behavior that I saw in 
GNOME Terminal.

> But regardless, I cannot replicate the behavior you show where list(s) is 
> different from indexing the characters one by one.

That is what surprised me the most. I only ran into this because the string 
was generated while fuzz testing my code with hypothesis (which uncovered an 
unrelated bug in my application). However, I was quite confused by the 
character order when debugging it.

My original case was:
```
s1='00'
s2='9010ऀ000\x8dÀĀĀĀ222Ā'
# Slide a window of len(s1) characters across s2, clamped at both ends.
parts = [s2[max(0, i) : min(len(s2), i+len(s1))] for i in range(-len(s1), len(s2))]
for part in parts:
    print(list(part))
```
which produced
```
[]
['9']
['9', '0']
['9', '0', '1']
['9', '0', '1', '0']
['9', '0', '1', '0', 'ऀ']
['9', '0', '1', '0', 'ऀ', '0']
['0', '1', '0', 'ऀ', '0', '0']
['1', '0', 'ऀ', '0', '0', '0']
['0', 'ऀ', '0', '0', '0', '\x8d']
['ऀ', '0', '0', '0', '\x8d', 'À']
['0', '0', '0', '\x8d', 'À', 'Ā']
['0', '0', '\x8d', 'À', 'Ā', 'Ā']
['0', '\x8d', 'À', 'Ā', 'Ā', 'Ā']
['\x8d', 'À', 'Ā', 'Ā', 'Ā', '2']
['À', 'Ā', 'Ā', 'Ā', '2', '2']
['Ā', 'Ā', 'Ā', '2', '2', '2']
['Ā', 'Ā', '2', '2', '2', 'Ā']
['Ā', '2', '2', '2', 'Ā']
['2', '2', '2', 'Ā']
['2', '2', 'Ā']
['2', 'Ā']
['ĀÀ]
```
which has a missing single quote:
  - ['ĀÀ]
changing direction of characters including commas:
  - ['1', '0', 'ऀ', '0', '0', '0']
changing direction back:
  - ['ऀ', '0', '0', '0', '\x8d', 'À']

> AFAICT, there is no bug here. It's just confusing how Unicode right-to-left 
> characters in the repr() can modify how it's displayed in the 
> console/terminal.

Yes, it appears the same confusion occurs in other applications such as 
Firefox and VS Code.
Thanks to @eryksun and @steven.daprano for testing and for telling me about 
bidirectional text in Unicode. (The more I know about Unicode, the more it 
scares me.)
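
When debugging strings like this, printing an escaped form avoids the 
terminal's bidirectional reordering altogether. A minimal sketch using the 
built-in `ascii()`, with the example string from the original report written 
as an escape:

```python
# '0', U+10900, '0', '0' (the \U escape consumes exactly 8 hex digits).
s = "0\U0001090000"

# ascii() escapes every non-ASCII character, so the output is shown in
# logical (storage) order no matter how the terminal handles RTL text.
print(ascii(s))
print([ascii(c) for c in s])

# The character sits at index 1, matching s[1] in the report.
print(s.index("\U00010900"))
```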

--
status: pending -> open




[issue45105] Incorrect handling of unicode character \U00010900

2021-09-05 Thread Max Bachmann

Max Bachmann  added the comment:

This is the result of copy-pasting the example posted above on Windows using 
```
Python 3.7.8 (tags/v3.7.8:4b47a5b6ba, Jun 28 2020, 08:53:46) [MSC v.1916 64 bit 
(AMD64)] on win32
```
which appears to run into similar problems:
```
>>> s = '0��00'
>>> s
'0ऀ00'
>>> ls = list(s)
>>> ls
['0', 'ऀ', '0', '0']
>>> s[0]
'0'
>>> s[1]
'ऀ'
```

--




[issue45105] Incorrect handling of unicode character \U00010900

2021-09-05 Thread Max Bachmann

New submission from Max Bachmann :

I noticed unexpected behavior with the Unicode character \U00010900 when it is 
inserted into a string as a literal character.
Here is the result on the Python console for both 3.6 and 3.9:
```
>>> s = '0ऀ00'
>>> s
'0ऀ00'
>>> ls = list(s)
>>> ls
['0', 'ऀ', '0', '0']
>>> s[0]
'0'
>>> s[1]
'ऀ'
>>> s[2]
'0'
>>> s[3]
'0'
>>> ls[0]
'0'
>>> ls[1]
'ऀ'
>>> ls[2]
'0'
>>> ls[3]
'0'
```

It appears that in this specific case the character is actually stored at a 
different position than the one shown when printing the complete string. 
Note that the string already behaves strangely when selecting it in the 
console: selecting the special character immediately highlights the last 3 
characters (probably because the console already treats this character as 
being in the second position).

The same behavior does not occur when using the Unicode escape sequence directly:
```
>>> s='000\U00010900'
>>> s
'000ऀ'
>>> s[0]
'0'
>>> s[1]
'0'
>>> s[2]
'0'
>>> s[3]
'ऀ'
```

This was tested using the following Python versions:
```
Python 3.6.0 (default, Dec 29 2020, 02:18:14) 
[GCC 10.2.1 20201125 (Red Hat 10.2.1-9)] on linux

Python 3.9.6 (default, Jul 16 2021, 00:00:00) 
[GCC 11.1.1 20210531 (Red Hat 11.1.1-3)] on linux
```
on Fedora 34

--
components: Unicode
messages: 401078
nosy: ezio.melotti, maxbachmann, vstinner
priority: normal
severity: normal
status: open
title: Incorrect handling of unicode character \U00010900
type: behavior
versions: Python 3.6, Python 3.9




[issue43565] PyUnicode_KIND macro does not has specified return type

2021-03-19 Thread Max Bachmann


New submission from Max Bachmann :

The documentation states that the PyUnicode_KIND macro has the following 
interface:
- int PyUnicode_KIND(PyObject *o)
However, it actually returns a value of the underlying type of the 
PyUnicode_Kind enum, which could just as well be an unsigned int, for example.

--
components: C API
messages: 389133
nosy: maxbachmann
priority: normal
severity: normal
status: open
title: PyUnicode_KIND macro does not has specified return type
type: behavior

___
Python tracker 
<https://bugs.python.org/issue43565>
___



[issue42629] PyObject_Call not behaving as documented

2020-12-12 Thread Max Bachmann


New submission from Max Bachmann :

The documentation of PyObject_Call here: 
https://docs.python.org/3/c-api/call.html#c.PyObject_Call
states that it is the equivalent of the Python expression callable(*args, 
**kwargs).

so I would expect:
PyObject* args = PyTuple_New(0);
PyObject* kwargs = PyDict_New();
PyObject_Call(funcObj, args, kwargs)

to behave similarly to
args = []
kwargs = {}
func(*args, **kwargs)

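However, this is not the case: when I modify kwargs inside
PyObject* func(PyObject* /*self*/, PyObject* /*args*/, PyObject* keywds)
{
  /* Insert a new entry into the keyword dictionary received by the callee. */
  PyObject* str = PyUnicode_FromString("test_str");
  PyDict_SetItemString(keywds, "test", str);
  Py_DECREF(str);  /* PyDict_SetItemString does not steal the reference */
  Py_RETURN_NONE;
}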

it changes the original dictionary passed into PyObject_Call. I was wondering 
whether this means that:
a) it is not allowed to modify the keywds argument passed to a 
PyCFunctionWithKeywords, or
b) when calling PyObject_Call, the caller is required to copy kwargs for the 
call using PyDict_Copy

Neither the documentation of PyObject_Call nor the documentation of 
PyCFunctionWithKeywords 
(https://docs.python.org/3/c-api/structures.html#c.PyCFunctionWithKeywords) 
made this clear to me.
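
For comparison, this is the pure-Python behavior that the expectation above is 
based on: a call written as `func(*args, **kwargs)` hands the callee a fresh 
dict, so mutations inside the callee do not reach the caller's dict. The 
sketch shows the Python side only; it makes no claim about what the C API 
guarantees.

```python
def func(*args, **kwargs):
    # Mutating kwargs here only touches the callee's own dict.
    kwargs["test"] = "test_str"
    return kwargs


args = []
kwargs = {}
seen = func(*args, **kwargs)

print(kwargs)  # {}
print(seen)    # {'test': 'test_str'}
```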

--
components: C API
messages: 382927
nosy: maxbachmann
priority: normal
severity: normal
status: open
title: PyObject_Call not behaving as documented
type: behavior
versions: Python 3.9

___
Python tracker 
<https://bugs.python.org/issue42629>
___