WillAyd opened a new issue, #597:
URL: https://github.com/apache/arrow-nanoarrow/issues/597
Wanted to open this as a topic for discussion! I've used nanobind a lot
lately, and find it to be a very good library. Cython, of course, has been
widely used in scientific Python projects for some time, but I think it suffers
from a few usability issues:
1. The Cython debugger is broken, and has been for a long time
2. IDE support is limited
3. The syntax for mixing in raw C code is rather awkward (ex: getting the
pointer to the start of an array in Cython)
4. Performance benefits are not always clear, and subtle differences in
declarations can have drastic impacts (Cython annotations can help you inspect
this, if you use them)
5. Specific to nanoarrow, the build system has a rather complicated setup
whereby it generates Cython header files from nanoarrow sources
Nanobind rather natively solves issues 1-3 above. Point 4 is partially
solved by nanobind; a missing declaration wouldn't be the culprit for poor
performance, but you still would need to understand the impacts of interaction
with Python objects. For point 5, nanobind was also recently added to the Meson
wrapdb, so if nanoarrow decided to use Meson + meson-python it could
drastically simplify the process of building Python extensions.
Of course, the downside to nanobind is that you are trading Python syntax
for C++, which may be off-putting to some developers. Taking
`CArrayBuilder.append_bytes` as an example, the current code looks like (N.B.
the current code probably should declare `code` to be of type `ArrowErrorCode`
but does not):
```python
def append_bytes(self, obj: Iterable[Union[str, None]]) -> CArrayBuilder:
    cdef Py_buffer buffer
    cdef ArrowBufferView item

    for py_item in obj:
        if py_item is None:
            code = ArrowArrayAppendNull(self._ptr, 1)
        else:
            PyObject_GetBuffer(py_item, &buffer, PyBUF_ANY_CONTIGUOUS | PyBUF_FORMAT)

            if buffer.ndim != 1:
                raise ValueError("Can't append buffer with dimensions != 1 to binary array")

            if buffer.itemsize != 1:
                PyBuffer_Release(&buffer)
                raise ValueError("Can't append buffer with itemsize != 1 to binary array")

            item.data.data = buffer.buf
            item.size_bytes = buffer.len
            code = ArrowArrayAppendBytes(self._ptr, item)
            PyBuffer_Release(&buffer)

        if code != NANOARROW_OK:
            Error.raise_error(f"append bytes item {py_item}", code)
```
In nanobind, an equivalent implementation probably looks like this (N.B.
this is untested):
```c++
auto AppendBytes(const CArrayBuilder &builder, nb::iterable obj) {
  // N.B. ptr_ stands in for the builder's underlying ArrowArray pointer
  ArrowErrorCode code;
  ArrowBufferView item;
  for (const auto &py_item : obj) {
    if (py_item.is_none()) {
      code = ArrowArrayAppendNull(ptr_, 1);
    } else {
      Py_buffer buffer;
      // PyObject_GetBuffer returns -1 with a Python error set on failure
      if (PyObject_GetBuffer(py_item.ptr(), &buffer,
                             PyBUF_ANY_CONTIGUOUS | PyBUF_FORMAT) != 0)
        throw nb::python_error();
      if (buffer.ndim != 1)
        throw nb::value_error(
            "Can't append buffer with dimensions != 1 to binary array");
      if (buffer.itemsize != 1) {
        PyBuffer_Release(&buffer);
        throw nb::value_error(
            "Can't append buffer with itemsize != 1 to binary array");
      }
      item.data.data = buffer.buf;
      item.size_bytes = buffer.len;
      code = ArrowArrayAppendBytes(ptr_, item);
      PyBuffer_Release(&buffer);
    }
    if (code != NANOARROW_OK) {
      // TODO: not sure yet how to throw custom error, but should be possible
    }
  }
}
```
This is a relatively literal translation of the code, but in C++ you could
alternatively use RAII wrappers like `nb::buffer` and macros like
`NANOARROW_THROW_NOT_OK` for safer code.
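To make the RAII and throw-on-error idea a bit more concrete, here is a rough
standalone sketch of the pattern. Everything in it is a hypothetical stand-in
so that it compiles without Python or nanoarrow headers: `FakeBuffer` plays the
role of `Py_buffer`, `BufferGuard` plays the role of an RAII buffer wrapper,
`THROW_NOT_OK` imitates a throw-on-error macro, and `AppendStub` imitates a
C call that returns an error code. None of these names come from nanobind or
nanoarrow.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for Py_buffer: tracks only what the sketch needs.
struct FakeBuffer {
  int ndim = 1;
  int itemsize = 1;
  bool held = false;  // true while the buffer is "acquired"
};

// Hypothetical RAII guard: acquires in the constructor and releases in the
// destructor, so every exit path (return or throw) releases the buffer.
class BufferGuard {
 public:
  explicit BufferGuard(FakeBuffer &buffer) : buffer_(buffer) {
    buffer_.held = true;
  }
  ~BufferGuard() {
    assert(buffer_.held);
    buffer_.held = false;
  }
  BufferGuard(const BufferGuard &) = delete;
  BufferGuard &operator=(const BufferGuard &) = delete;

 private:
  FakeBuffer &buffer_;
};

// Hypothetical throw-on-error macro: converts a nonzero status code into a
// C++ exception that names the failing expression.
#define THROW_NOT_OK(EXPR)                                                  \
  do {                                                                      \
    const int status_ = (EXPR);                                             \
    if (status_ != 0) {                                                     \
      throw std::runtime_error(std::string(#EXPR) + " failed with code " +  \
                               std::to_string(status_));                    \
    }                                                                       \
  } while (0)

// Hypothetical stand-in for a C append call: 0 on success, nonzero on error.
inline int AppendStub(const FakeBuffer &buffer) {
  return buffer.itemsize == 1 ? 0 : 22;
}

// One loop iteration from the example above, rewritten with the guard and
// the macro. Note there are no manual release calls on the error paths.
inline void AppendOne(FakeBuffer &buffer) {
  BufferGuard guard(buffer);
  if (buffer.ndim != 1) {
    throw std::invalid_argument(
        "Can't append buffer with dimensions != 1 to binary array");
  }
  THROW_NOT_OK(AppendStub(buffer));
}
```

With this shape, the early `throw` for `ndim != 1` no longer needs its own
release call, since the guard's destructor runs during stack unwinding.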