[ 
https://issues.apache.org/jira/browse/ARROW-11263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qingyou Meng updated ARROW-11263:
---------------------------------
    Description: 
Quoting from section *Schema message*

 
[https://github.com/apache/arrow/blob/master/docs/source/format/Columnar.rst#schema-message]
{noformat}
Whether the field is semantically nullable. While this has no bearing on the 
array's physical layout, many systems distinguish nullable and non-nullable 
fields and we want to allow them to preserve this metadata to enable faithful 
schema round trips.{noformat}
This can be read as: for a field with nullable set to true, a data processor 
that encounters null values in the field's array data CAN either continue or 
refuse to process.

In the current Rust implementation, apart from reading Fields from a schema, we 
also construct `Field` in datafusion and via `Field::new` in 
arrow::array::*StructArray*.
 * in datafusion, nullable is determined by the DF schema
 * in arrow::array::StructArray::try_from(values: Vec<(&str, ArrayRef)>), 
nullable is determined by the array data. This is error-prone if the builder 
left the ArrayRef with a null buffer in which all bits are set.
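
To make the second bullet concrete, here is a minimal std-only sketch of the pitfall. It does not use the arrow crate: `ArrayData`, `nullable_from_buffer_presence` and `nullable_from_null_count` are hypothetical names invented for illustration, and the assumption that nullable is derived from the mere presence of a null buffer is inferred from the description above, not checked against the crate's source.

```rust
/// Simplified stand-in for an array's validity information (hypothetical type,
/// not arrow's own). Bit i of the bitmap, LSB-first within each byte, is set
/// when slot i is valid. The bitmap is assumed to cover at least `len` bits.
struct ArrayData {
    len: usize,
    validity: Option<Vec<u8>>,
}

impl ArrayData {
    /// Number of null slots: unset bits among the first `len` bits.
    fn null_count(&self) -> usize {
        match &self.validity {
            None => 0,
            Some(bits) => (0..self.len)
                .filter(|&i| bits[i / 8] & (1 << (i % 8)) == 0)
                .count(),
        }
    }
}

/// Assumed rule under discussion: nullable whenever a bitmap is present.
fn nullable_from_buffer_presence(a: &ArrayData) -> bool {
    a.validity.is_some()
}

/// Alternative data-driven rule: nullable only if some slot is actually null.
fn nullable_from_null_count(a: &ArrayData) -> bool {
    a.null_count() > 0
}
```

For an array of three valid values whose builder still allocated a bitmap (`validity: Some(vec![0b0000_0111])`), the first rule reports nullable while the second does not, which is the mismatch described above.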

Conclusions:
 * It's questionable to determine a Field's nullable flag from array data.
 * Perhaps builders should set the null buffer back to None when the buffer has 
all bits set.
 * Enhance StructArray::try_from(values: Vec<(&str, ArrayRef)>): don't set a 
wrong nullable flag when the null buffer has all bits set.
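
The second conclusion could look roughly like the following std-only sketch. `all_valid` and `normalize_validity` are hypothetical helpers, not functions of the arrow crate, and the bitmap is modelled as a plain `Vec<u8>` with LSB-first bit order that is assumed to cover at least `len` bits.

```rust
/// Returns true when the first `len` bits of `bits` are all set
/// (LSB-first within each byte). Assumes `bits` holds at least `len` bits.
fn all_valid(bits: &[u8], len: usize) -> bool {
    (0..len).all(|i| bits[i / 8] & (1 << (i % 8)) != 0)
}

/// Hypothetical finishing step for a builder: drop the bitmap when no slot
/// is null, so "no nulls" is always represented as `None`.
fn normalize_validity(validity: Option<Vec<u8>>, len: usize) -> Option<Vec<u8>> {
    match validity {
        Some(bits) if all_valid(&bits, len) => None,
        other => other,
    }
}
```

With such a step in place, a consumer that checks only whether a null buffer exists would no longer mistake an all-valid array for one that contains nulls. Whether nullable should be inferred from data at all remains the open question raised in the first conclusion.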

 

  was:
Quoting from section *Schema message*

 
[https://github.com/apache/arrow/blob/master/docs/source/format/Columnar.rst#schema-message]
{noformat}
Whether the field is semantically nullable. While this has no bearing on the 
array's physical layout, many systems distinguish nullable and non-nullable 
fields and we want to allow them to preserve this metadata to enable faithful 
schema round trips.{noformat}
This can be read as: for a field with nullable set as true, when encounters 
null array data from the field, data processor CAN continue or refuse to 
process.

In current rust implementation, apart from read Fields from schema, we also 
construct `Field` with datafusion and `Field::new` in arrow::array::*StructArray*.
 * in datafusion, the nullable is determined by DF schema
 * in arrow::array::StructArray::try_from(values: Vec<(&str, ArrayRef)>) , the 
nullable is determined actual data. This is error-prone if ArrayRef's null 
buffer has all set by builder. 

Conclusions:
 * It's questionable to set Field's nullable according to data.
 * Perhaps builders should set null buffer back to None when the buffer has all 
bits set.
 * Enhance StructArray::try_from(values: Vec<(&str, ArrayRef)>) don't set wrong 
nullable when null buffer has all bits set.

 


> [Rust] problem of Field nullable
> --------------------------------
>
>                 Key: ARROW-11263
>                 URL: https://issues.apache.org/jira/browse/ARROW-11263
>             Project: Apache Arrow
>          Issue Type: Improvement
>            Reporter: Qingyou Meng
>            Priority: Minor



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
