This is an automated email from the ASF dual-hosted git repository.

alenka pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/main by this push:
     new 161510e413 GH-37145: [Python] support boolean columns with bitsize 1 
in from_dataframe (#37975)
161510e413 is described below

commit 161510e4131976712ea1588c7649b4ccdebdb5e0
Author: Alenka Frim <[email protected]>
AuthorDate: Thu Oct 5 17:16:09 2023 +0200

    GH-37145: [Python] support boolean columns with bitsize 1 in from_dataframe 
(#37975)
    
    ### Rationale for this change
    
    Bit-packed booleans are currently not supported in the `from_dataframe` of 
the Dataframe Interchange Protocol.
    
    Note: We currently represent booleans in the pyarrow implementation as 
`uint8` which will also need to be changed in a follow-up PR (see 
https://github.com/data-apis/dataframe-api/issues/227).
    
    ### What changes are included in this PR?
    
    This PR adds the support for bit-packed booleans when consuming a dataframe 
interchange object.
    
    ### Are these changes tested?
    
    Only locally, currently!
    * Closes: #37145
    
    Lead-authored-by: AlenkaF <[email protected]>
    Co-authored-by: Alenka Frim <[email protected]>
    Signed-off-by: AlenkaF <[email protected]>
---
 python/pyarrow/interchange/from_dataframe.py | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/python/pyarrow/interchange/from_dataframe.py 
b/python/pyarrow/interchange/from_dataframe.py
index d653054e91..e97e91e44f 100644
--- a/python/pyarrow/interchange/from_dataframe.py
+++ b/python/pyarrow/interchange/from_dataframe.py
@@ -54,7 +54,8 @@ _PYARROW_DTYPES: dict[DtypeKind, dict[int, Any]] = {
     DtypeKind.FLOAT: {16: pa.float16(),
                       32: pa.float32(),
                       64: pa.float64()},
-    DtypeKind.BOOL: {8: pa.uint8()},
+    DtypeKind.BOOL: {1: pa.bool_(),
+                     8: pa.uint8()},
     DtypeKind.STRING: {8: pa.string()},
 }
 
@@ -232,19 +233,23 @@ def bool_column_to_array(
     -------
     pa.Array
     """
-    if not allow_copy:
+    buffers = col.get_buffers()
+    size = buffers["data"][1][1]
+
+    # If booleans are byte-packed a copy to bit-packed will be made
+    if size == 8 and not allow_copy:
         raise RuntimeError(
             "Boolean column will be casted from uint8 and a copy "
             "is required which is forbidden by allow_copy=False"
         )
 
-    buffers = col.get_buffers()
     data_type = col.dtype
     data = buffers_to_array(buffers, data_type,
                             col.size(),
                             col.describe_null,
                             col.offset)
-    data = pc.cast(data, pa.bool_())
+    if size == 8:
+        data = pc.cast(data, pa.bool_())
 
     return data
 

Reply via email to