[ https://issues.apache.org/jira/browse/ARROW-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wes McKinney updated ARROW-1674:
--------------------------------
    Description:
Some libraries (e.g. NumPy) represent boolean values using an array of int8 or uint8 values of 1's and 0's. This can present a challenge at times to receive such memory without copying.

Now that we have ExtensionType capabilities, we could define an extension type to distinguish UInt8/Int8-annotated-as-boolean, so that such data can flow through applications.

A discussion about introducing a new logical type didn't go anywhere, so having a custom container that can be used for these specialized applications is one way to unblock the use case.

If we develop some endogenous use of such data in C++, we would need to be mindful to sanitize it to bitpacked boolean before sending to another Arrow application.

  was:
Some libraries represent boolean data as a single byte per value as a vector of int8/uint8 1's and 0's. It would be useful to be able to retain this metadata as an optional field on the {{Bool}} table in {{Schema.fbs}}

> [C++] Add ExtensionType implementation for 8-bit boolean values
> ---------------------------------------------------------------
>
>                 Key: ARROW-1674
>                 URL: https://issues.apache.org/jira/browse/ARROW-1674
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Format
>            Reporter: Wes McKinney
>            Priority: Major
>              Labels: pull-request-available
>
> Some libraries (e.g. NumPy) represent boolean values using an array of int8
> or uint8 values of 1's and 0's. This can present a challenge at times to
> receive such memory without copying.
> Now that we have ExtensionType capabilities, we could define an extension
> type to distinguish UInt8/Int8-annotated-as-boolean, so that such data can
> flow through applications.
> A discussion about introducing a new logical type didn't go anywhere, so
> having a custom container that can be used for these specialized applications
> is one way to unblock the use case.
> If we develop some endogenous use of such data in C++, we would need to be
> mindful to sanitize it to bitpacked boolean before sending to another Arrow
> application.
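
To make the "sanitize it to bitpacked boolean" step concrete, here is a minimal sketch in plain Python of converting one-byte-per-value booleans into a bit-packed bitmap using least-significant-bit ordering, which is the bit numbering Arrow uses for its bitmaps. The function names pack_bools and unpack_bools are hypothetical, chosen for illustration; they are not part of the Arrow API, and this is not the implementation proposed in the issue. The sketch also assumes that any nonzero byte should be treated as true.

```python
def pack_bools(byte_values):
    """Pack one-byte-per-value booleans into an LSB-first bitmap.

    Hypothetical helper for illustration only, not an Arrow function.
    Assumption: any nonzero input value counts as true.
    """
    out = bytearray((len(byte_values) + 7) // 8)  # one bit per value, rounded up
    for i, value in enumerate(byte_values):
        if value:
            # Value i lives in byte i // 8, at bit position i % 8.
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def unpack_bools(bitmap, length):
    """Expand an LSB-first bitmap back into a list of 0/1 integers."""
    return [(bitmap[i // 8] >> (i % 8)) & 1 for i in range(length)]


if __name__ == "__main__":
    raw = [1, 0, 1, 1, 0, 0, 0, 0, 1]  # nine byte-wide booleans
    packed = pack_bools(raw)           # occupies two bytes
    assert unpack_bools(packed, len(raw)) == raw
```

Packing is where the copy happens: the byte-wide representation can be received as-is, and the conversion is only needed at the boundary where bit-packed boolean data is expected.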