[I] DISC: Prefer nanobind to Cython [arrow-nanoarrow]

via GitHub Sun, 25 Aug 2024 08:57:40 -0700


WillAyd opened a new issue, #597:
URL: https://github.com/apache/arrow-nanoarrow/issues/597


   Wanted to open this as a topic for discussion! I've used nanobind a lot lately and find it to be a very good library. Cython, of course, has been widely used in scientific Python projects for some time, but I think it suffers from a few usability issues:
    
   1. The Cython debugger is broken, and has been for a long time
   2. IDE support is limited
   3. The syntax for mixing in raw C code is rather awkward (e.g., getting the pointer to the start of an array in Cython)
   4. Performance benefits are not always clear, and subtle differences in declarations can have drastic impacts (Cython annotations can help you inspect this, if you use them)
   5. Specific to nanoarrow, the build system has a rather complicated setup whereby it generates Cython header files from nanoarrow sources
   
   nanobind largely solves issues 1-3 above natively: extension code is ordinary C++, so standard debuggers and C++ IDE tooling work as usual, and nanoarrow's C API can be called directly. Point 4 is partially solved by nanobind. A missing declaration wouldn't be the culprit for poor performance, but you would still need to understand the cost of interacting with Python objects. For point 5, nanobind was recently added to the Meson WrapDB, so if nanoarrow adopted Meson + meson-python, the process of building Python extensions could be drastically simplified.
   
   Of course, the downside to nanobind is that you are trading Python syntax for C++, which may be off-putting to some developers. Taking `CArrayBuilder.append_bytes` as an example, the current code looks like this (N.B. the current code probably should declare `code` to be of type `ArrowErrorCode` but does not):
   
   ```cython
       def append_bytes(self, obj: Iterable[Union[str, None]]) -> CArrayBuilder:
           cdef Py_buffer buffer
           cdef ArrowBufferView item
   
           for py_item in obj:
               if py_item is None:
                   code = ArrowArrayAppendNull(self._ptr, 1)
               else:
                   PyObject_GetBuffer(py_item, &buffer, PyBUF_ANY_CONTIGUOUS | PyBUF_FORMAT)

                   if buffer.ndim != 1:
                       raise ValueError("Can't append buffer with dimensions != 1 to binary array")

                   if buffer.itemsize != 1:
                       PyBuffer_Release(&buffer)
                       raise ValueError("Can't append buffer with itemsize != 1 to binary array")
   
                   item.data.data = buffer.buf
                   item.size_bytes = buffer.len
                   code = ArrowArrayAppendBytes(self._ptr, item)
                   PyBuffer_Release(&buffer)
   
               if code != NANOARROW_OK:
                   Error.raise_error(f"append bytes item {py_item}", code)
   ```
   
   In nanobind, an equivalent implementation would probably look like this (N.B. this is untested):
   
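   ```c++
   // Untested sketch; assumes a CArrayBuilder class holding an ArrowArray* ptr_
   #include <nanobind/nanobind.h>  // includes Python.h, so keep it first
   #include <nanoarrow/nanoarrow.h>

   #include <stdexcept>
   #include <string>

   namespace nb = nanobind;

   void CArrayBuilder::AppendBytes(nb::iterable obj) {
     ArrowErrorCode code;
     ArrowBufferView item;

     for (nb::handle py_item : obj) {
       if (py_item.is_none()) {
         code = ArrowArrayAppendNull(ptr_, 1);
       } else {
         Py_buffer buffer;
         if (PyObject_GetBuffer(py_item.ptr(), &buffer,
                                PyBUF_ANY_CONTIGUOUS | PyBUF_FORMAT) != 0) {
           throw nb::python_error();  // re-raise the Python error that was set
         }

         if (buffer.ndim != 1) {
           PyBuffer_Release(&buffer);
           throw nb::value_error(
               "Can't append buffer with dimensions != 1 to binary array");
         }

         if (buffer.itemsize != 1) {
           PyBuffer_Release(&buffer);
           throw nb::value_error(
               "Can't append buffer with itemsize != 1 to binary array");
         }

         // Point a non-owning view at the buffer's contiguous bytes
         item.data.data = buffer.buf;
         item.size_bytes = buffer.len;
         code = ArrowArrayAppendBytes(ptr_, item);
         PyBuffer_Release(&buffer);
       }

       if (code != NANOARROW_OK) {
         // TODO: not sure yet how to throw custom error, but should be possible
         throw std::runtime_error("append bytes item failed with code " +
                                  std::to_string(code));
       }
     }
   }
   ```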
   
   This is a relatively literal translation of the code, but in C++ you could instead use RAII wrappers like `nb::buffer` and macros like `THROW_NANOARROW_NOT_OK` for safer code, so that buffers are released and error codes are checked without manual bookkeeping on every exit path.
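   The RAII-plus-macro approach mentioned above can be sketched as a small standalone C++ example. It does not use nanobind or nanoarrow. `ScopeGuard`, `THROW_NOT_OK`, `NanoarrowError`, `FakeBuffer` and `FakeAppend` are all hypothetical stand-ins, not the real `nb::buffer` or `THROW_NANOARROW_NOT_OK`, and were chosen only so that the pattern compiles on its own. The sketch shows two things:

   - a "buffer" being released on every exit path, including after a throw
   - a non-OK error code being turned into an exception that still carries the code

   ```cpp
   #include <cassert>
   #include <cerrno>
   #include <cstdint>
   #include <functional>
   #include <stdexcept>
   #include <string>
   #include <utility>

   // Stand-ins for the nanoarrow definitions (ArrowErrorCode is an int and
   // NANOARROW_OK is 0) so this sketch compiles without nanoarrow headers.
   typedef int ArrowErrorCode;
   #define NANOARROW_OK 0

   // Hypothetical exception type that keeps the failing error code.
   class NanoarrowError : public std::runtime_error {
    public:
     NanoarrowError(const std::string& context, ArrowErrorCode code)
         : std::runtime_error(context + " failed with error code " +
                              std::to_string(code)),
           code_(code) {}
     ArrowErrorCode code() const noexcept { return code_; }

    private:
     ArrowErrorCode code_;
   };

   // Hypothetical macro in the spirit of THROW_NANOARROW_NOT_OK: evaluates the
   // expression once and throws NanoarrowError unless it returned NANOARROW_OK.
   #define THROW_NOT_OK(expr)                       \
     do {                                           \
       const ArrowErrorCode sketch_code_ = (expr);  \
       if (sketch_code_ != NANOARROW_OK) {          \
         throw NanoarrowError(#expr, sketch_code_); \
       }                                            \
     } while (0)

   // Minimal RAII guard: runs its release callback when it leaves scope,
   // whether the scope exits normally or because an exception was thrown.
   class ScopeGuard {
    public:
     explicit ScopeGuard(std::function<void()> release)
         : release_(std::move(release)) {}
     ~ScopeGuard() { release_(); }
     ScopeGuard(const ScopeGuard&) = delete;
     ScopeGuard& operator=(const ScopeGuard&) = delete;

    private:
     std::function<void()> release_;
   };

   // Hypothetical stand-in for an acquired Py_buffer (only the fields used here).
   struct FakeBuffer {
     int ndim;
     int itemsize;
     std::int64_t len;
   };

   int g_acquired = 0;  // number of buffers currently "held"

   // Hypothetical stand-in for ArrowArrayAppendBytes: rejects empty items.
   ArrowErrorCode FakeAppend(std::int64_t size_bytes) {
     return size_bytes > 0 ? NANOARROW_OK : EINVAL;
   }

   void AppendItem(const FakeBuffer& buffer) {
     ++g_acquired;                              // "acquire" the buffer
     ScopeGuard release([] { --g_acquired; });  // released on every exit path

     if (buffer.ndim != 1) {
       throw std::invalid_argument("Can't append buffer with dimensions != 1");
     }
     if (buffer.itemsize != 1) {
       throw std::invalid_argument("Can't append buffer with itemsize != 1");
     }
     THROW_NOT_OK(FakeAppend(buffer.len));
   }

   // Usage: the guard leaves g_acquired at 0 on success and on failure.
   void Demo() {
     AppendItem(FakeBuffer{1, 1, 4});
     assert(g_acquired == 0);
     try {
       AppendItem(FakeBuffer{1, 1, 0});  // FakeAppend returns EINVAL
       assert(false && "expected NanoarrowError");
     } catch (const NanoarrowError& e) {
       assert(e.code() == EINVAL);
     }
     assert(g_acquired == 0);
   }
   ```

   The explicit `PyBuffer_Release` calls on each error branch of a literal translation become unnecessary with a guard like this. A real nanobind version would still need to decide how a C++ exception such as `NanoarrowError` maps to a Python exception type. That step is not shown here.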

