alamb opened a new pull request #942:
URL: https://github.com/apache/arrow-rs/pull/942


   # Which issue does this PR close?
   
   Resolves https://github.com/apache/arrow-rs/issues/940
   
   # Rationale for this change
    
   
   It appears the arrow-rs slice logic for structs introduced in 
https://github.com/apache/arrow-rs/pull/389 by @nevi-me effectively slices the 
child data when a struct array is sliced.
   
   The validation code from validate.cc (and perhaps the C++ code) assumes that 
the children are not sliced, so when I ported that logic over it is not correct 
for sliced structs. You can see by the comment I was somewhat confused about 
the need for offset even when it was originally introduced
   
   # What changes are included in this PR?
   1. Update the struct array validation logic to follow the Rust semantics 
where offsets are applied to both parent and child
   2. Test case from @bjchambers  in 
https://github.com/bjchambers/arrow-rs/blob/reproduce-validation-error/arrow/src/array/data.rs#L1648
   
   # In pictures
   
   
   In pictures, here is what the testcase looks like:
   
   ```
   
        0 ┌─────┐                     0 ┌─────┐  ┌─────┐        0 ┌─────┐  
┌─────┐
          │     │                       │     │  │     │          │     │  │    
 │
          ├─────┤                       ├─────┤  ├─────┤          ├─────┤  
├─────┤
          │     │                       │     │  │     │          │     │  │    
 │
          ├─────┤                       ├─────┤  ├─────┤          ├─────┤  
├─────┤
          │     │                       │     │  │     │          │     │  │    
 │
          ├─────┤                       ├─────┤  ├─────┤          ├─────┤  
├─────┤
          │     │                       │     │  │     │          │     │  │    
 │
          ├─────┤                       ├─────┤  ├─────┤          ├─────┤  
├─────┤
       5  │     │                    5  │     │  │     │       5  │     │  │    
 │
          └─────┘                       └─────┘  └─────┘          └─────┘  
└─────┘
   
           Valid                         Valid    i32[]            Valid    
bool[]
   
   
          StructArray                     Child Array #1           Child Array 
#2
                                           (Int32Array)            
(BooleanArray)
   
   ```
   
   In rust, when we do `slice(1,3)` the offset is applied to both the ArrayData 
and its children:
   
   
   ```
   
   
          0 ┌─────┐                     0 ┌─────┐  ┌─────┐        0 ┌─────┐  
┌─────┐
            │     │                       │     │  │     │          │     │  │  
   │
   ─ ─ ─ ─ ─├─────┤─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─├─────┤─ ┼─────┼ ─ ─ ─ ─ ─├─────┤─ 
┼─────┼ ─ ─ ─ ─ ─ ─ ─ ─
            │     │                       │     │  │     │          │     │  │  
   │
            ├─────┤                       ├─────┤  ├─────┤          ├─────┤  
├─────┤
            │     │                       │     │  │     │          │     │  │  
   │
            ├─────┤                       ├─────┤  ├─────┤          ├─────┤  
├─────┤
            │     │                       │     │  │     │          │     │  │  
   │
   ─ ─ ─ ─ ─├─────┤─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─├─────┤─ ┼─────┼ ─ ─ ─ ─ ─├─────┤─ 
┼─────┼ ─ ─ ─ ─ ─ ─ ─ ─
         5  │     │                    5  │     │  │     │       5  │     │  │  
   │
            └─────┘                       └─────┘  └─────┘          └─────┘  
└─────┘
   
             Valid                         Valid    i32[]            Valid    
bool[]
   
   
            StructArray                     Child Array #1           Child 
Array #2
                                             (Int32Array)            
(BooleanArray)
   
   ```
   
   However, in the C++ validation logic, the assumption is that the children 
have no offsets (and the offset of the parent is applied):
   
   ```
   
          0 ┌─────┐                     0 ┌─────┐  ┌─────┐        0 ┌─────┐  
┌─────┐
            │     │                       │     │  │     │          │     │  │  
   │
   ─ ─ ─ ─ ─├─────┤─ ─ ─ ─ ─ ─            ├─────┤  ├─────┤          ├─────┤  
├─────┤
            │     │                       │     │  │     │          │     │  │  
   │
            ├─────┤                       ├─────┤  ├─────┤          ├─────┤  
├─────┤
            │     │                       │     │  │     │          │     │  │  
   │
            ├─────┤                       ├─────┤  ├─────┤          ├─────┤  
├─────┤
            │     │                       │     │  │     │          │     │  │  
   │
   ─ ─ ─ ─ ─├─────┤─ ─ ─ ─ ─ ─            ├─────┤  ├─────┤          ├─────┤  
├─────┤
         5  │     │                    5  │     │  │     │       5  │     │  │  
   │
            └─────┘                       └─────┘  └─────┘          └─────┘  
└─────┘
   
             Valid                         Valid    i32[]            Valid    
bool[]
   
   
            StructArray                     Child Array #1           Child 
Array #2
                                             (Int32Array)            
(BooleanArray)
   
   ```
   
   
   # Are there any user-facing changes?
   Sliced `StructArrays` can be created without validation errors (or using 
unsafe)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to