[Python-ideas] Re: C API for converting Python integers to/from bytes sequences

2021-08-11 Thread Ronald Oussoren via Python-ideas


> On 8 Aug 2021, at 18:53, Serhiy Storchaka  wrote:
> 
> 08.08.21 07:08, Stephen J. Turnbull writes:
>> Serhiy Storchaka writes:
>> 
>>> Python integers have arbitrary precision. For serialization and
> >>> interoperation with other programs and libraries we need to
>>> represent them [...].  [In the case of non-standard precisions,]
>>> [t]here are private C API functions _PyLong_AsByteArray and
>>> _PyLong_FromByteArray, but they are for internal use only.
>>> 
>>> I am planning to add public analogs of these private functions, but more
>>> powerful and convenient.
>>> 
>>> PyObject *PyLong_FromBytes(const void *buf, Py_ssize_t size,
>>>   int byteorder, int signed)
>>> 
>>> Py_ssize_t PyLong_AsBytes(PyObject *o, void *buf, Py_ssize_t n,
>>>  int byteorder, int signed, int *overflow)
>> 
>> I don't understand why such a complex API is useful as a public facility.
> 
> There are several goals:
> 
> 1. Support conversion to/from all C integer types (char, short, int,
> long, long long, intN_t, intptr_t, intmax_t, wchar_t, wint_t and
> corresponding unsigned types), POSIX integer types (pid_t, uid_t, off_t,
> etc.) and other platform- or library-specific integer types (like
> Tcl_WideInt in libtcl). Currently the only supported types are long,
> unsigned long, long long, unsigned long long, ssize_t and size_t. For other
> types you have to choose the most appropriate supertype (long or long
> long, sometimes providing several variants) and handle overflow manually.
> 
> There are requests for PyLong_AsShort(), PyLong_AsInt32(),
> PyLong_AsMaxInt(), etc. It is better to provide a single universal
> function than to extend the API with several dozen functions.

But how would you convert from the buffer to the actual type you want? IMHO 
“pid_t a_pid; PyLong_AsBytes(val, &a_pid, sizeof(pid_t), …)” would be worse 
than having a number of aliases. The API is more cumbersome to use, and you 
lose type checking from the C compiler. 

Other than that, the variants you mention could in general just be aliases for 
conversion functions to/from the basic C types. 
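Ronald's concern can be seen in a small sketch. Here `int_to_bytes_native` is a hypothetical stand-in for a size-based converter like the proposed one (it is not a CPython function, and it takes an `int64_t` rather than a Python object), and `as_pid_t` is a hypothetical typed alias layered on top of it. The generic call accepts any pointer with any size; the typed wrapper lets the compiler reject a wrong pointer type and derives the size from the destination. The sketch assumes a plain little- or big-endian host and signed target types only.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>  /* pid_t (POSIX, a signed integer type) */

/* Returns 1 if the host stores the least significant byte first. */
static int native_is_little(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Hypothetical size-based converter (not a CPython function): stores
   `value` as an n-byte two's-complement integer in native byte order.
   Returns 0 on success, -1 if n is unsupported or the value does not fit
   (in which case the buffer is left untouched). */
static int int_to_bytes_native(int64_t value, void *buf, size_t n)
{
    unsigned char *out = buf;
    if (n == 0 || n > 8)
        return -1;
    if (n < 8) {
        int64_t max = ((int64_t)1 << (8 * n - 1)) - 1;
        int64_t min = -max - 1;
        if (value < min || value > max)
            return -1;
    }
    uint64_t u = (uint64_t)value;
    int little = native_is_little();
    for (size_t i = 0; i < n; i++) {
        unsigned char byte = (unsigned char)(u >> (8 * i)); /* i-th least significant byte */
        out[little ? i : n - 1 - i] = byte;
    }
    return 0;
}

/* Hypothetical typed alias: the compiler now rejects a pointer of the
   wrong type, and the size can no longer disagree with the destination. */
static inline int as_pid_t(int64_t value, pid_t *out)
{
    return int_to_bytes_native(value, out, sizeof *out);
}
```

Written this way a typed alias could be a one-line wrapper, so a single generic function and a set of typed aliases need not be mutually exclusive.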

Ronald

—

Twitter / micro.blog: @ronaldoussoren
Blog: https://blog.ronaldoussoren.net/

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/5QME5PGFUAXNDIVAD2PFC25WXV4MNERZ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: C API for converting Python integers to/from bytes sequences

2021-08-09 Thread Christopher Barker
On Sun, Aug 8, 2021 at 9:54 AM Serhiy Storchaka  wrote:

> 1. Support conversion to/from all C integer types (char, short, int,
> long, long long, intN_t, intptr_t, intmax_t, wchar_t, wint_t and
> corresponding unsigned types),


I suggest support for the "new" C sized types available in 

Why anyone would want to use `long`, which could be 32 or 64 bits depending on
platform/compiler, rather than `int32_t` or `int64_t` is still
confusing to me.

granted, I only write a small amount of C extension code, but I always used
the sized types, as otherwise I have no idea what might happen on different
platforms.
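Without a generic function, converting to a fixed-width type such as `int32_t` means going through a wider type and range-checking by hand, as the original proposal describes for short and int. A minimal sketch of that pattern, assuming the value was first obtained as a `long long` (for example with `PyLong_AsLongLong()`); `narrow_to_int32` and `narrow_to_uint16` are hypothetical helper names:

```c
#include <stdint.h>

/* Hypothetical helper: narrows a long long to int32_t.
   Returns 0 and stores the result on success; returns -1 and leaves
   *out untouched if the value is outside [INT32_MIN, INT32_MAX]. */
static int narrow_to_int32(long long value, int32_t *out)
{
    if (value < INT32_MIN || value > INT32_MAX)
        return -1;
    *out = (int32_t)value;
    return 0;
}

/* Same idea for an unsigned fixed-width target: negative inputs are
   rejected explicitly instead of being wrapped around. */
static int narrow_to_uint16(long long value, uint16_t *out)
{
    if (value < 0 || value > UINT16_MAX)
        return -1;
    *out = (uint16_t)value;
    return 0;
}
```

Each additional target type needs another helper of this shape, which is the multiplication of functions the proposal aims to avoid.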

But in any case, thanks for doing this, it's a great idea.

-CHB

-- 
Christopher Barker, PhD (Chris)

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/S46QXIA4BQJKKN3OAQWSQZCQAL4SYEX2/


[Python-ideas] Re: C API for converting Python integers to/from bytes sequences

2021-08-08 Thread Kyle Stanley
I lack the relevant experience to have an opinion on most of this, but FWIW
"PyLong_FromBytes/PyLong_ToBytes" seems clearest to me out of the options
proposed.

On Sat, Aug 7, 2021 at 2:23 PM Serhiy Storchaka  wrote:

> Python integers have arbitrary precision. For serialization and
> interoperation with other programs and libraries we need to represent
> them as fixed-width integers (little- and big-endian, signed and
> unsigned). In Python, we can use struct, array, memoryview and ctypes
> for some standard sizes and the int methods int.to_bytes and
> int.from_bytes for non-standard sizes. In C, there is the C API for
> converting to/from C types long, unsigned long, long long and unsigned
> long long. For other C types (signed and unsigned char, short, int) we
> need to use the C API for converting to long, and then truncate to the
> destination type with checking for overflow. For integer type aliases
> like pid_t we need to determine their size and signedness and use the
> corresponding C API or wrapper. For non-standard integers (e.g. 24-bit),
> integers wider than long long, and arbitrary precision integers everything
> is much more complicated. There are private C API functions
> _PyLong_AsByteArray and _PyLong_FromByteArray, but they are for internal
> use only.
>
> I am planning to add public analogs of these private functions, but more
> powerful and convenient.
>
> PyObject *PyLong_FromBytes(const void *buf, Py_ssize_t size,
>int byteorder, int signed)
>
> Py_ssize_t PyLong_AsBytes(PyObject *o, void *buf, Py_ssize_t n,
>   int byteorder, int signed, int *overflow)
>
> PyLong_FromBytes() returns the int object. It only fails in case of
> memory error or incorrect arguments (e.g. buf is NULL).
>
> PyLong_AsBytes() writes bytes to the specified buffer, it does not
> allocate memory. If buf is NULL it returns the minimum size of the
> buffer for representing the integer. -1 is returned on error. If
> overflow is NULL, then OverflowError is raised, otherwise *overflow is
> set to +1 for overflowing the upper limit, -1 for overflowing the lower
> limit, and 0 for no overflow.
>
> Now I have some design questions.
>
> 1. How to encode the byte order?
>
> a) 1 -- little endian, 0 -- big endian
> b) 0 -- little endian, 1 -- big endian
> c) -1 -- little endian, +1 -- big endian, 0 -- native endian.
>
> Do we need to reserve some values for mixed endians?
>
> 2. How to specify the reduction modulo 2**(8*size) (like in
> PyLong_AsUnsignedLongMask)?
>
> Add one more flag in PyLong_AsBytes()? Use a special value for the signed
> argument? 0 -- unsigned, 1 -- signed, 2 (or -1) -- modulo. Or use some
> combination of signed and overflow?
>
> 3. How to specify saturation (like in PyNumber_AsSsize_t())? I.e. values
> less than the lower limit are replaced with the lower limit, values
> greater than the upper limit are replaced with the upper limit.
>
> Same options as for (2): separate flag, encode in signed (but we need
> two values here) or combination of other parameters.
>
> 4. What exact names to use?
>
> PyLong_FromByteArray/PyLong_AsByteArray,
> PyLong_FromBytes/PyLong_AsBytes, PyLong_FromBytes/PyLong_ToBytes?
>
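The size query described in the quoted proposal (PyLong_AsBytes() returning the minimum buffer size when buf is NULL) reduces, for values that fit in 64 bits, to counting significant bytes. A minimal sketch, assuming two's complement for the signed case; `min_bytes_unsigned` and `min_bytes_signed` are hypothetical helpers, not part of any CPython API:

```c
#include <stddef.h>
#include <stdint.h>

/* Minimum number of bytes needed to hold v as an unsigned integer.
   Zero still needs one byte. */
static size_t min_bytes_unsigned(uint64_t v)
{
    size_t n = 1;
    while (v > 0xFF) {
        v >>= 8;
        n++;
    }
    return n;
}

/* Minimum number of bytes needed to hold v in two's complement.
   For negative v, ~v (= -v - 1) is non-negative and needs the same number
   of magnitude bits, so both signs reduce to the same loop. One bit per
   byte count is kept for the sign, hence the 0x7F threshold. */
static size_t min_bytes_signed(int64_t v)
{
    uint64_t m = v < 0 ? (uint64_t)(~v) : (uint64_t)v;
    size_t n = 1;
    while (m > 0x7F) {
        m >>= 8;
        n++;
    }
    return n;
}
```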
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/VVEKHEHP27I453YRW46T3HPJMTZFXLQT/


[Python-ideas] Re: C API for converting Python integers to/from bytes sequences

2021-08-08 Thread Serhiy Storchaka
08.08.21 11:41, Barry Scott writes:
>> On 7 Aug 2021, at 19:22, Serhiy Storchaka  wrote:
>> 1. How to encode the byte order?
>>
>> a) 1 -- little endian, 0 -- big endian
>> b) 0 -- little endian, 1 -- big endian
>> c) -1 -- little endian, +1 -- big endian, 0 -- native endian.
> 
> Use an enum and do not use 0 as a valid value to make mistakes easier to 
> detect.
> I think you are right to have big endian, little endian and native endian.
> I do not think the numeric values of the enum matter (apart from avoiding 0).

There is a precedent for using +1/-1/0 for big/little/native in the
UTF16 and UTF32 codecs. I think that using the same convention will be
more error-proof.
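A sketch of how an encoder might interpret the +1/-1/0 convention: -1 selects little endian, +1 big endian, and 0 is resolved to the host's order at run time. `put_uint` and `host_is_little_endian` are hypothetical helpers; the function truncates to the low `n` bytes without any overflow check and rejects any other byteorder value.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the host stores the least significant byte first. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Writes the low n bytes of v into buf (no overflow check).
   byteorder: -1 little endian, +1 big endian, 0 native.
   Returns 0 on success, -1 for an invalid byteorder or n. */
static int put_uint(uint64_t v, unsigned char *buf, size_t n, int byteorder)
{
    if (n == 0 || n > 8)
        return -1;
    if (byteorder == 0)
        byteorder = host_is_little_endian() ? -1 : +1;
    if (byteorder != -1 && byteorder != +1)
        return -1;
    for (size_t i = 0; i < n; i++) {
        unsigned char b = (unsigned char)(v >> (8 * i)); /* i-th least significant byte */
        buf[byteorder < 0 ? i : n - 1 - i] = b;
    }
    return 0;
}
```

With this encoding a zero-initialized argument silently means native order, which is the kind of mistake Barry's suggestion of avoiding 0 is meant to catch.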

> Maybe a single enum that has:
> signed (modulo)
> signed saturate
> unsigned (modulo)
> unsigned saturate

There is a problem with enum -- the size of the type is not specified.
It can be int, it can be 8 bits, it can be less than 8 bits in
structure. Adding new members can change the size of the type. Therefore
it is not stable for ABI.

But combining the options for signedness and overflow handling (or
providing a set of functions for different overflow handling, because
the output overflow parameter is not needed in all cases) may be the best option.
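One way to keep named options without an enum-typed parameter is to pass bit flags in a plain `int`, whose size is fixed by the platform ABI no matter how many options are added. The sketch below combines signedness with an overflow policy, covering Barry's four modes plus raising on overflow. Every `INTCONV_*` name and helper is hypothetical, and the validity check shows how unknown bits can be rejected so that options can be added later.

```c
#include <stdbool.h>

/* Hypothetical flag layout: bit 0 selects signedness,
   bits 1-2 select the overflow policy. */
#define INTCONV_UNSIGNED           0x0
#define INTCONV_SIGNED             0x1
#define INTCONV_OVERFLOW_ERROR     (0x0 << 1)  /* fail on overflow */
#define INTCONV_OVERFLOW_MASK      (0x1 << 1)  /* reduce modulo 2**(8*size) */
#define INTCONV_OVERFLOW_SATURATE  (0x2 << 1)  /* clamp to the nearest limit */
#define INTCONV_OVERFLOW_RESERVED  (0x3 << 1)  /* not assigned yet */
#define INTCONV_OVERFLOW_BITS      (0x3 << 1)
#define INTCONV_KNOWN_BITS         (INTCONV_SIGNED | INTCONV_OVERFLOW_BITS)

/* Rejects unknown bits and the unassigned policy value, so that old
   callers cannot accidentally pass something a later version defines. */
static bool intconv_flags_valid(int flags)
{
    if (flags & ~INTCONV_KNOWN_BITS)
        return false;
    return (flags & INTCONV_OVERFLOW_BITS) != INTCONV_OVERFLOW_RESERVED;
}

static bool intconv_is_signed(int flags)
{
    return (flags & INTCONV_SIGNED) != 0;
}

static int intconv_overflow_policy(int flags)
{
    return flags & INTCONV_OVERFLOW_BITS;
}
```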

___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/QETPJWGWERSQYY2VE25HDJBIFUEZUF25/


[Python-ideas] Re: C API for converting Python integers to/from bytes sequences

2021-08-08 Thread Serhiy Storchaka
08.08.21 07:08, Stephen J. Turnbull writes:
> Serhiy Storchaka writes:
> 
>  > Python integers have arbitrary precision. For serialization and
>  > interoperation with other programs and libraries we need to
>  > represent them [...].  [In the case of non-standard precisions,]
>  > [t]here are private C API functions _PyLong_AsByteArray and
>  > _PyLong_FromByteArray, but they are for internal use only.
>  > 
>  > I am planning to add public analogs of these private functions, but more
>  > powerful and convenient.
>  > 
>  > PyObject *PyLong_FromBytes(const void *buf, Py_ssize_t size,
>  >int byteorder, int signed)
>  > 
>  > Py_ssize_t PyLong_AsBytes(PyObject *o, void *buf, Py_ssize_t n,
>  >   int byteorder, int signed, int *overflow)
> 
> I don't understand why such a complex API is useful as a public facility.

There are several goals:

1. Support conversion to/from all C integer types (char, short, int,
long, long long, intN_t, intptr_t, intmax_t, wchar_t, wint_t and
corresponding unsigned types), POSIX integer types (pid_t, uid_t, off_t,
etc.) and other platform- or library-specific integer types (like
Tcl_WideInt in libtcl). Currently the only supported types are long,
unsigned long, long long, unsigned long long, ssize_t and size_t. For other
types you have to choose the most appropriate supertype (long or long
long, sometimes providing several variants) and handle overflow manually.

There are requests for PyLong_AsShort(), PyLong_AsInt32(),
PyLong_AsMaxInt(), etc. It is better to provide a single universal
function than to extend the API with several dozen functions.

2. Support different options for overflow handling. Different options
are present in PyLong_AsLong(), PyLong_AsLongAndOverflow(),
PyLong_AsUnsignedLongMask() and PyNumber_AsSsize_t(). But not all
options are available for all types. There is no *AndOverflow() variant
for unsigned types, size_t, ssize_t, and saturation is only available
for ssize_t.

3. Support serialization of arbitrary precision integers. It is used in
pickle and random, and can be used to support other binary data formats.

All these goals can be achieved by a few universal functions.
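The overflow-handling options listed in goal 2 differ only in what happens when the value is out of range. A minimal sketch for narrowing an `int64_t` to an n-byte signed integer, assuming two's complement; `narrow_signed` and the `POLICY_*` constants are hypothetical. Where the proposal would raise OverflowError (when overflow is NULL), the sketch just returns -1, and it always reports the overflow direction when an overflow pointer is supplied.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical policies mirroring existing C API behaviours:
   ERROR    - fail, like PyLong_AsLong() / PyLong_AsLongAndOverflow()
   MASK     - reduce modulo 2**(8*n), like PyLong_AsUnsignedLongMask()
   SATURATE - clamp to the nearest limit, like PyNumber_AsSsize_t() */
enum { POLICY_ERROR, POLICY_MASK, POLICY_SATURATE };

/* Narrows v to an n-byte (1..8) signed integer. If overflow is not NULL,
   *overflow is set to +1 (above the upper limit), -1 (below the lower
   limit) or 0. Returns 0 on success, -1 on failure. */
static int narrow_signed(int64_t v, size_t n, int policy,
                         int64_t *out, int *overflow)
{
    if (n == 0 || n > 8)
        return -1;
    int64_t max = n == 8 ? INT64_MAX : ((int64_t)1 << (8 * n - 1)) - 1;
    int64_t min = -max - 1;
    int over = v > max ? 1 : (v < min ? -1 : 0);
    if (overflow != NULL)
        *overflow = over;
    if (over == 0) {
        *out = v;
        return 0;
    }
    switch (policy) {
    case POLICY_SATURATE:
        *out = over > 0 ? max : min;
        return 0;
    case POLICY_MASK: {
        /* Only reachable for n < 8, so these shifts are well defined. */
        uint64_t mask = ((uint64_t)1 << (8 * n)) - 1;
        uint64_t sign = (uint64_t)1 << (8 * n - 1);
        uint64_t u = (uint64_t)v & mask;
        /* Reinterpret the low n bytes as a two's-complement value. */
        *out = (u & sign) ? (int64_t)u - (int64_t)mask - 1 : (int64_t)u;
        return 0;
    }
    default: /* POLICY_ERROR */
        return -1;
    }
}
```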

> So I might want
> PyLong_AsGMPInt and PyLong_AsGMPRatio as well as the corresponding
> functions for MP, and maybe even PyLong_AsGMPFloat.  The obvious way
> to write those is (str(python_integer)), I think.

PyLong_AsGMPInt() cannot be added unless GMP is included in the Python
interpreter, and that is very unlikely. Converting via the decimal
representation is very inefficient, especially for very long
integers (it has quadratic complexity in the size of the integer). I think
GMP supports more efficient conversions.
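Conversion to bytes, by contrast, is a single linear pass, because 256 and the internal digit base are both powers of two. The sketch assumes a magnitude stored as 30-bit digits, least significant first; this matches CPython's usual internal layout but is a private detail, so treat it as an assumption (callers of the proposed API would not need to know it). `digits_to_le_bytes` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

#define DIGIT_BITS 30  /* assumption: 30-bit digits, least significant first */

/* Packs an unsigned magnitude stored as `ndigits` DIGIT_BITS-bit digits
   into little-endian bytes. Every input bit is handled once, so the cost
   is linear in the size of the integer. Returns the number of bytes
   written (high zero bytes dropped, at least one byte), or 0 if `cap`
   is too small. */
static size_t digits_to_le_bytes(const uint32_t *digits, size_t ndigits,
                                 unsigned char *out, size_t cap)
{
    uint64_t acc = 0;    /* bit accumulator, always fewer than 8 + 30 bits */
    unsigned nbits = 0;  /* number of valid bits in acc */
    size_t len = 0;
    for (size_t i = 0; i < ndigits; i++) {
        acc |= (uint64_t)digits[i] << nbits;
        nbits += DIGIT_BITS;
        while (nbits >= 8) {
            if (len == cap)
                return 0;
            out[len++] = (unsigned char)acc;
            acc >>= 8;
            nbits -= 8;
        }
    }
    if (nbits > 0) {
        if (len == cap)
            return 0;
        out[len++] = (unsigned char)acc;
    }
    if (len == 0) {  /* zero has no digits: emit a single 0 byte */
        if (cap == 0)
            return 0;
        out[len++] = 0;
    }
    while (len > 1 && out[len - 1] == 0)
        len--;
    return len;
}
```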

> In the unlikely event that an
> application needs to squeeze out that tiny bit of performance, I guess
> the library constructors all accept buffers of bytes, too, probably
> with a similarly complex API that can handle whatever the Python ABI
> throws at them.

To use the library constructors that accept buffers of bytes, we need
buffers of bytes. And the proposed functions provide the only interface
for converting Python integers to/from buffers of bytes.

> In which case why not just expose the internal
> functions?

If you mean _PyLong_FromByteArray/_PyLong_AsByteArray, it is because we
should polish them before exposing them. They currently do not provide
different options for overflow handling, and I think there should be a more
convenient way to handle the common case of native byte order. The names of
the functions and the number and order of parameters can be discussed. I
opened this thread for such a discussion. If you have alternative proposals,
please show them.

> Is it at all likely that that representation would ever
> change?

They do not rely on the internal representation. They are for an
implementation-independent representation.

___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/HZTNOP63WRSHITFPTWJ526UCLNAVE2NS/


[Python-ideas] Re: C API for converting Python integers to/from bytes sequences

2021-08-08 Thread 2QdxY4RzWzUUiLuE
On 2021-08-08 at 09:41:34 +0100,
Barry Scott  wrote:

> What is mixed endian? I would guess that its use would be application
> specific - so I assume you would not need to support it.

Not AFAIK application specific, but hardware specific:

https://en.wikipedia.org/wiki/Endianness#Mixed
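As a concrete case, the PDP-11 layout on that page stores a 32-bit value as two 16-bit words, more significant word first, with each word little endian, so 0x0A0B0C0D becomes the bytes 0B 0A 0D 0C. A small sketch producing that layout; `put_u32_pdp` is a hypothetical name:

```c
#include <stdint.h>

/* PDP-11 style "middle-endian" 32-bit layout: the high 16-bit word is
   stored first, and each 16-bit word is stored little endian. */
static void put_u32_pdp(uint32_t v, unsigned char out[4])
{
    uint16_t hi = (uint16_t)(v >> 16);
    uint16_t lo = (uint16_t)v;
    out[0] = (unsigned char)hi;         /* low byte of the high word */
    out[1] = (unsigned char)(hi >> 8);  /* high byte of the high word */
    out[2] = (unsigned char)lo;         /* low byte of the low word */
    out[3] = (unsigned char)(lo >> 8);  /* high byte of the low word */
}
```

Neither plain little nor plain big endian describes this order, so supporting it would need a further byteorder value, which is what the question about reserving values for mixed endians is about.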
___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/HTOGYMYROL5UGS5YXZAUA6HMRXE54B7G/


[Python-ideas] Re: C API for converting Python integers to/from bytes sequences

2021-08-08 Thread Barry Scott



> On 7 Aug 2021, at 19:22, Serhiy Storchaka  wrote:
> 
> Python integers have arbitrary precision. For serialization and
> interoperation with other programs and libraries we need to represent
> them as fixed-width integers (little- and big-endian, signed and
> unsigned). In Python, we can use struct, array, memoryview and ctypes
> for some standard sizes and the int methods int.to_bytes and
> int.from_bytes for non-standard sizes. In C, there is the C API for
> converting to/from C types long, unsigned long, long long and unsigned
> long long. For other C types (signed and unsigned char, short, int) we
> need to use the C API for converting to long, and then truncate to the
> destination type with checking for overflow. For integer type aliases
> like pid_t we need to determine their size and signedness and use the
> corresponding C API or wrapper. For non-standard integers (e.g. 24-bit),
> integers wider than long long, and arbitrary precision integers everything
> is much more complicated. There are private C API functions
> _PyLong_AsByteArray and _PyLong_FromByteArray, but they are for internal
> use only.
> 
> I am planning to add public analogs of these private functions, but more
> powerful and convenient.
> 
> PyObject *PyLong_FromBytes(const void *buf, Py_ssize_t size,
>   int byteorder, int signed)
> 
> Py_ssize_t PyLong_AsBytes(PyObject *o, void *buf, Py_ssize_t n,
>  int byteorder, int signed, int *overflow)
> 
> PyLong_FromBytes() returns the int object. It only fails in case of
> memory error or incorrect arguments (e.g. buf is NULL).
> 
> PyLong_AsBytes() writes bytes to the specified buffer, it does not
> allocate memory. If buf is NULL it returns the minimum size of the
> buffer for representing the integer. -1 is returned on error. If
> overflow is NULL, then OverflowError is raised, otherwise *overflow is
> set to +1 for overflowing the upper limit, -1 for overflowing the lower
> limit, and 0 for no overflow.
> 
> Now I have some design questions.
> 
> 1. How to encode the byte order?
> 
> a) 1 -- little endian, 0 -- big endian
> b) 0 -- little endian, 1 -- big endian
> c) -1 -- little endian, +1 -- big endian, 0 -- native endian.

Use an enum and do not use 0 as a valid value to make mistakes easier to detect.
I think you are right to have big endian, little endian and native endian.
I do not think the numeric values of the enum matter (apart from avoiding 0).

> Do we need to reserve some values for mixed endians?

What is mixed endian? I would guess that its use would be application
specific - so I assume you would not need to support it.

> 
> 2. How to specify the reduction modulo 2**(8*size) (like in
> PyLong_AsUnsignedLongMask)?
> 
> Add one more flag in PyLong_AsBytes()? Use a special value for the signed
> argument? 0 -- unsigned, 1 -- signed, 2 (or -1) -- modulo. Or use some
> combination of signed and overflow?
> 
> 3. How to specify saturation (like in PyNumber_AsSsize_t())? I.e. values
> less than the lower limit are replaced with the lower limit, values
> greater than the upper limit are replaced with the upper limit.
> 
> Same options as for (2): separate flag, encode in signed (but we need
> two values here) or combination of other parameters.

Maybe a single enum that has:
signed (modulo)
signed saturate
unsigned (modulo)
unsigned saturate


> 
> 4. What exact names to use?
> 
> PyLong_FromByteArray/PyLong_AsByteArray,
> PyLong_FromBytes/PyLong_AsBytes, PyLong_FromBytes/PyLong_ToBytes?

Barry


___
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/4TSDAQG3BOACRUEH35MD3ME3WQGCZSUA/