This is an automated email from the ASF dual-hosted git repository.
chaokunyang pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/fury.git
The following commit(s) were added to refs/heads/main by this push:
new 3865dcd0 perf(python): Directly access the key-value pairs of a dict
(#1970)
3865dcd0 is described below
commit 3865dcd0982c0ca9de04a8ba9635892b76288769
Author: penguin_wwy <[email protected]>
AuthorDate: Sun Dec 8 00:09:32 2024 +0800
perf(python): Directly access the key-value pairs of a dict (#1970)
## What does this PR do?
In CPython, a dict stores its key-value pairs in a linear memory
structure, so they can be traversed in insertion order much like an
array. However, Cython does not provide an interface for accessing that
structure directly, and the relevant interfaces are internal to CPython,
so using them correctly would require compatibility work. Nevertheless,
we can still use the `PyDict_Next` interface to replace the `items`
method. Essentially, `items` uses `PyDict_Next` to append to a list, so
calling `PyDict_Next` directly can reduce the copying overhead.
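
As a rough illustration of the iteration pattern described above, here is a
minimal pure-Python sketch that calls `PyDict_Next` through `ctypes`. It
assumes a standard CPython interpreter (where `ctypes.pythonapi` is
available); the helper name `dict_pairs_via_next` is hypothetical and is not
part of pyfury. It shows only the calling convention, not the performance
characteristics of the Cython change.

```python
import ctypes

# Bind the CPython C-API function:
#   int PyDict_Next(PyObject *p, Py_ssize_t *ppos,
#                   PyObject **pkey, PyObject **pvalue)
PyDict_Next = ctypes.pythonapi.PyDict_Next
PyDict_Next.argtypes = [
    ctypes.py_object,
    ctypes.POINTER(ctypes.c_ssize_t),
    ctypes.POINTER(ctypes.py_object),
    ctypes.POINTER(ctypes.py_object),
]
PyDict_Next.restype = ctypes.c_int


def dict_pairs_via_next(d):
    """Hypothetical helper: collect (key, value) pairs using PyDict_Next."""
    pos = ctypes.c_ssize_t(0)   # iteration cursor; must start at 0
    key = ctypes.py_object()    # receives a borrowed reference to the key
    value = ctypes.py_object()  # receives a borrowed reference to the value
    pairs = []
    # PyDict_Next returns 0 once every entry has been visited.
    while PyDict_Next(d, ctypes.byref(pos), ctypes.byref(key), ctypes.byref(value)):
        # Reading .value takes a new reference, which plays the same role
        # as the explicit Py_INCREF calls in the Cython code.
        pairs.append((key.value, value.value))
    return pairs
```

The dict must not be resized while it is being iterated this way, which is
the same constraint that applies to the `while PyDict_Next(...)` loops in
the diff.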
## Related issues
## Does this PR introduce any user-facing change?
- [ ] Does this PR introduce any public API change?
- [ ] Does this PR introduce any binary protocol compatibility change?
## Benchmark
For a large dict:
```
[dict_item] 541 us +- 39 us -> [dict_next] 535 us +- 35 us: 1.00x faster
[dict_item] 119.8 MiB +- 1344.0 KiB -> [dict_next] 118.8 MiB +- 1338.4 KiB: 1.01x faster
```
---
python/pyfury/_serialization.pyx | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)
diff --git a/python/pyfury/_serialization.pyx b/python/pyfury/_serialization.pyx
index ce1443c6..74bd755b 100644
--- a/python/pyfury/_serialization.pyx
+++ b/python/pyfury/_serialization.pyx
@@ -44,6 +44,7 @@ from pyfury.util import is_little_endian
from libc.stdint cimport *
from libcpp.vector cimport vector
from cpython cimport PyObject
+from cpython.dict cimport PyDict_Next
from cpython.ref cimport *
from cpython.list cimport PyList_New, PyList_SET_ITEM
from cpython.tuple cimport PyTuple_New, PyTuple_SET_ITEM
@@ -2049,7 +2050,13 @@ cdef class MapSerializer(Serializer):
buffer.write_varint32(len(value))
cdef ClassInfo key_classinfo
cdef ClassInfo value_classinfo
- for k, v in value.items():
+ cdef int64_t key_addr, value_addr
+ cdef Py_ssize_t pos = 0
+ while PyDict_Next(value, &pos, <PyObject **>&key_addr, <PyObject **>&value_addr) != 0:
+ k = int2obj(key_addr)
+ Py_INCREF(k)
+ v = int2obj(value_addr)
+ Py_INCREF(v)
key_cls = type(k)
if key_cls is str:
buffer.write_int16(NOT_NULL_STRING_FLAG)
@@ -2122,7 +2129,13 @@ cdef class MapSerializer(Serializer):
cpdef inline xwrite(self, Buffer buffer, o):
cdef dict value = o
buffer.write_varint32(len(value))
- for k, v in value.items():
+ cdef int64_t key_addr, value_addr
+ cdef Py_ssize_t pos = 0
+ while PyDict_Next(value, &pos, <PyObject **>&key_addr, <PyObject **>&value_addr) != 0:
+ k = int2obj(key_addr)
+ Py_INCREF(k)
+ v = int2obj(value_addr)
+ Py_INCREF(v)
self.fury.xserialize_ref(
buffer, k, serializer=self.key_serializer
)