Dandandan commented on issue #9083:
URL: https://github.com/apache/arrow-rs/issues/9083#issuecomment-3765363076

   Can you maybe share a use case or benchmark showing that a row format is faster for the "wide table" case?
   I think I see your point that producing the data might be faster (as you can colocate the row values), but at some point wouldn't you also need to decode them into multiple `RecordBatch`es, paying the same price?
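To make the cost question concrete, here is a minimal std-only Rust sketch. It assumes a hypothetical fixed-width two-column row layout, which is not the encoding of the `arrow-row` crate. Encoding puts each row's values next to each other. Decoding back into one vector per column is then a second full pass that reads and copies every value again.

```rust
use std::convert::TryInto;

/// Hypothetical fixed-width row layout for a two-column table (i64, i32).
/// Illustration only; this is not the encoding used by the `arrow-row` crate.
const ROW_WIDTH: usize = 8 + 4;

/// Encode two columns into one row-major buffer: each row's values sit
/// next to each other ("colocated row values").
fn encode_rows(a: &[i64], b: &[i32]) -> Vec<u8> {
    assert_eq!(a.len(), b.len(), "columns must have the same length");
    let mut buf = Vec::with_capacity(a.len() * ROW_WIDTH);
    for (x, y) in a.iter().zip(b) {
        buf.extend_from_slice(&x.to_le_bytes());
        buf.extend_from_slice(&y.to_le_bytes());
    }
    buf
}

/// Decode the rows back into one Vec per column. This is a second full pass
/// that reads and copies every value once more, i.e. the decode cost being
/// asked about.
fn decode_rows(buf: &[u8]) -> (Vec<i64>, Vec<i32>) {
    let n = buf.len() / ROW_WIDTH;
    let mut a = Vec::with_capacity(n);
    let mut b = Vec::with_capacity(n);
    for row in buf.chunks_exact(ROW_WIDTH) {
        a.push(i64::from_le_bytes(row[0..8].try_into().unwrap()));
        b.push(i32::from_le_bytes(row[8..12].try_into().unwrap()));
    }
    (a, b)
}
```

In this sketch, any consumer that wants columnar data back still goes through something like `decode_rows`. The row layout therefore only avoids work if the rows are consumed row-wise.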
   
   I have also wondered before whether, in some cases, we want to support more control over how multiple record batches are allocated, e.g. colocating arrays from different batches or from different columns? That could perhaps remove the need to create a new format altogether and avoid many new code paths (as all code already understands `RecordBatch` / `Array`).
   
   Colocate different batches:
   ```
         [allocated region col1                                                                       ]
   col1: [1, 2, 3, 4, 5, 6] [1, 2, 3, 4, 5, 6] [1, 2, 3, 4, 5, 6] [1, 2, 3, 4, 5, 6] [1, 2, 3, 4, 5, 6]

         [allocated region col2                                                                        ]
   col2: ["a", "b", "c"] ["a", "b", "c"] ["a", "b", "c"] ["a", "b", "c"] ["a", "b", "c"] ["a", "b", "c"]
   ```
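A minimal std-only Rust sketch of this first layout follows. It assumes a hypothetical `ColumnRegion` type, not an arrow-rs API. The values of one column for several batches are appended into a single up-front allocation, and each batch is just a range into it. Handing out a batch is therefore a borrow rather than a copy, conceptually like several Arrow arrays that are zero-copy slices of one shared buffer.

```rust
use std::ops::Range;

/// Hypothetical type (not an arrow-rs API): one allocation holding the values
/// of a single column for several batches, back to back.
struct ColumnRegion {
    /// Shared storage for every batch of this column.
    values: Vec<i32>,
    /// Position of each batch's values inside `values`.
    batches: Vec<Range<usize>>,
}

impl ColumnRegion {
    /// Reserve room up front; if `total_rows` is large enough, appending
    /// batches never reallocates the region.
    fn with_capacity(total_rows: usize) -> Self {
        ColumnRegion {
            values: Vec::with_capacity(total_rows),
            batches: Vec::new(),
        }
    }

    /// Copy one batch into the region and return its index.
    fn push_batch(&mut self, batch: &[i32]) -> usize {
        let start = self.values.len();
        self.values.extend_from_slice(batch);
        self.batches.push(start..self.values.len());
        self.batches.len() - 1
    }

    /// View batch `i` as a slice of the shared region (no copy).
    fn batch(&self, i: usize) -> &[i32] {
        &self.values[self.batches[i].clone()]
    }
}
```

Storing ranges instead of pointers keeps the sketch safe even if the region ever has to grow.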
   
   Or, colocating different columns (of the same batch):
   ```
              [allocated region col1/col2                ]
   col1/col2: [1, 2, 3, 4, 5, 6] ["a", "b", "c", "d", "e"]
   ```
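The second layout can be sketched the same way. This assumes a hypothetical `ColocatedBatch` type that stores an `Int32` column and a `Utf8` column back to back in one byte allocation. The `Utf8` column is held as offsets plus bytes, loosely following Arrow's variable-length layout. Values are read with `from_le_bytes`, so the sketch sidesteps alignment; a real Arrow implementation would also have to keep each buffer suitably aligned.

```rust
use std::convert::TryInto;
use std::ops::Range;

/// Hypothetical type (not an arrow-rs API): one batch whose columns share a
/// single byte allocation. Ranges are byte ranges into `arena`.
struct ColocatedBatch {
    arena: Vec<u8>,
    col1: Range<usize>,         // Int32 values, little-endian
    col2_offsets: Range<usize>, // len + 1 i32 offsets, little-endian
    col2_data: Range<usize>,    // concatenated UTF-8 bytes
}

impl ColocatedBatch {
    fn new(col1: &[i32], col2: &[&str]) -> Self {
        let data_len: usize = col2.iter().map(|s| s.len()).sum();
        // One allocation sized for all three buffers.
        let mut arena = Vec::with_capacity(4 * col1.len() + 4 * (col2.len() + 1) + data_len);

        let start = arena.len();
        for v in col1 {
            arena.extend_from_slice(&v.to_le_bytes());
        }
        let col1_range = start..arena.len();

        // Offsets: 0, then the running end position of each string.
        let start = arena.len();
        let mut end = 0i32;
        arena.extend_from_slice(&end.to_le_bytes());
        for s in col2 {
            end += s.len() as i32;
            arena.extend_from_slice(&end.to_le_bytes());
        }
        let offsets_range = start..arena.len();

        let start = arena.len();
        for s in col2 {
            arena.extend_from_slice(s.as_bytes());
        }
        let data_range = start..arena.len();

        ColocatedBatch { arena, col1: col1_range, col2_offsets: offsets_range, col2_data: data_range }
    }

    /// Read an i32 stored little-endian at byte position `at`.
    fn read_i32(&self, at: usize) -> i32 {
        i32::from_le_bytes(self.arena[at..at + 4].try_into().unwrap())
    }

    fn col1_value(&self, i: usize) -> i32 {
        self.read_i32(self.col1.start + 4 * i)
    }

    fn col2_value(&self, i: usize) -> &str {
        let lo = self.read_i32(self.col2_offsets.start + 4 * i) as usize;
        let hi = self.read_i32(self.col2_offsets.start + 4 * (i + 1)) as usize;
        let data = &self.arena[self.col2_data.clone()];
        std::str::from_utf8(&data[lo..hi]).unwrap()
    }
}
```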
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.