tyrelr removed a comment on pull request #9440:
URL: https://github.com/apache/arrow/pull/9440#issuecomment-774786663


   Some rough performance numbers, taken before I squashed a couple of simple 
commits, look good.  The 15% difference seems to be beneficial or neutral for 
the comparison kernels.
   
   I can't explain the buffer collection regressions, as they should be 
completely unrelated (but they reproduced in both of my test runs).
   The buffer_bit_ops and cast results look like noise.
   
   ```
   group                             2master-a321cded                       2value_iteration-7c21fa33              master-a321cded                        value_iteration-7c21fa33
   -----                             ----------------                       -------------------------              ---------------                        ------------------------
   Buffer::from_iter bool            1.00      6.1±0.01ms        ? B/sec    1.73     10.6±0.04ms        ? B/sec    1.03      6.3±0.01ms        ? B/sec    1.47      9.0±0.02ms        ? B/sec
   MutableBuffer::from_iter bool     1.00      6.1±0.01ms        ? B/sec    1.42      8.7±0.01ms        ? B/sec    1.03      6.3±0.01ms        ? B/sec    1.46      8.9±0.02ms        ? B/sec
   array_from_vec 128                1.06    419.7±2.63ns        ? B/sec    1.04    414.6±0.76ns        ? B/sec    1.00    397.0±1.55ns        ? B/sec    1.16    459.0±1.41ns        ? B/sec
   buffer_bit_ops and                1.31    320.2±0.33ns        ? B/sec    1.00    243.7±0.27ns        ? B/sec    1.32    321.0±0.61ns        ? B/sec    1.31    318.3±0.70ns        ? B/sec
   buffer_bit_ops or                 1.00    277.1±0.31ns        ? B/sec    1.06    292.1±0.54ns        ? B/sec    1.37    378.3±0.41ns        ? B/sec    1.00    276.6±0.94ns        ? B/sec
   cast date32 to date64 512         1.00    522.8±1.51ns        ? B/sec    1.00    523.2±0.65ns        ? B/sec    1.17    610.1±0.61ns        ? B/sec    1.18    615.5±0.83ns        ? B/sec
   cast time32s to time32ms 512      1.00    343.6±0.74ns        ? B/sec    1.25    428.8±0.41ns        ? B/sec    1.24    426.6±0.42ns        ? B/sec    1.01    346.8±1.42ns        ? B/sec
   eq Float32                        1.50     90.5±0.14µs        ? B/sec    1.00     60.2±0.09µs        ? B/sec    1.50     90.5±0.13µs        ? B/sec    1.00     60.2±0.07µs        ? B/sec
   eq scalar Float32                 1.35     79.6±0.18µs        ? B/sec    1.00     59.2±0.13µs        ? B/sec    1.35     79.8±0.09µs        ? B/sec    1.00     59.1±0.09µs        ? B/sec
   from_slice                        1.81    900.0±1.74µs        ? B/sec    1.00    497.0±0.98µs        ? B/sec    1.76    875.2±1.17µs        ? B/sec    1.02    508.3±0.88µs        ? B/sec
   gt Float32                        1.56     86.0±0.10µs        ? B/sec    1.00     55.0±0.07µs        ? B/sec    1.56     86.0±0.10µs        ? B/sec    1.00     55.1±0.24µs        ? B/sec
   gt scalar Float32                 1.38     72.3±0.14µs        ? B/sec    1.00     52.3±0.05µs        ? B/sec    1.38     72.3±0.19µs        ? B/sec    1.00     52.2±0.04µs        ? B/sec
   gt_eq Float32                     1.55     75.4±0.15µs        ? B/sec    1.00     48.6±0.09µs        ? B/sec    1.55     75.5±0.09µs        ? B/sec    1.00     48.7±0.08µs        ? B/sec
   gt_eq scalar Float32              1.32     62.8±0.07µs        ? B/sec    1.00     47.5±0.07µs        ? B/sec    1.33     63.0±0.07µs        ? B/sec    1.00     47.5±0.06µs        ? B/sec
   limit 512, 512                    1.00    116.3±0.18ns        ? B/sec    1.15    133.9±0.26ns        ? B/sec    1.00    116.3±0.22ns        ? B/sec    1.08    126.0±0.23ns        ? B/sec
   lt Float32                        1.56     85.8±0.08µs        ? B/sec    1.00     55.0±0.09µs        ? B/sec    1.56     86.1±0.19µs        ? B/sec    1.00     55.1±0.16µs        ? B/sec
   lt scalar Float32                 1.35     71.7±0.11µs        ? B/sec    1.00     53.1±0.06µs        ? B/sec    1.35     71.7±0.18µs        ? B/sec    1.00     53.0±0.05µs        ? B/sec
   lt_eq Float32                     1.55     75.7±0.09µs        ? B/sec    1.00     48.8±0.09µs        ? B/sec    1.55     75.7±0.13µs        ? B/sec    1.00     48.7±0.06µs        ? B/sec
   lt_eq scalar Float32              1.35     62.0±0.06µs        ? B/sec    1.00     46.1±0.04µs        ? B/sec    1.35     62.0±0.09µs        ? B/sec    1.01     46.4±0.08µs        ? B/sec
   mutable                           1.00    419.1±0.99µs        ? B/sec    4.01   1682.2±2.66µs        ? B/sec    1.05    441.9±0.95µs        ? B/sec    3.83   1606.8±2.63µs        ? B/sec
   mutable extend                    1.00    833.1±1.97µs        ? B/sec    2.48      2.1±0.00ms        ? B/sec    1.00    836.2±1.22µs        ? B/sec    2.45      2.0±0.00ms        ? B/sec
   mutable iter extend_from_slice    1.00   1004.0±1.32µs        ? B/sec    2.19      2.2±0.00ms        ? B/sec    1.00   1003.5±1.56µs        ? B/sec    2.19      2.2±0.00ms        ? B/sec
   neq Float32                       1.50     90.4±0.12µs        ? B/sec    1.00     60.1±0.09µs        ? B/sec    1.51     90.6±0.12µs        ? B/sec    1.00     60.2±0.06µs        ? B/sec
   neq scalar Float32                1.36     79.8±0.10µs        ? B/sec    1.00     58.6±0.14µs        ? B/sec    1.36     79.9±0.13µs        ? B/sec    1.00     58.6±0.06µs        ? B/sec
   nlike_utf8 scalar ends with       1.16    580.9±0.50µs        ? B/sec    1.00    500.9±0.73µs        ? B/sec    1.16    579.4±0.67µs        ? B/sec    1.00    499.6±0.58µs        ? B/sec
   or                                1.00   1007.5±1.33ns        ? B/sec    1.07   1078.1±3.76ns        ? B/sec    1.11   1116.4±1.79ns        ? B/sec    1.17   1174.4±2.85ns        ? B/sec
   ```
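
   For reading the table: the leading number in each column looks like that run's time divided by the fastest time in the same row, so 1.00 marks the fastest run. That is inferred from the numbers rather than stated in the comment. A minimal sketch of the arithmetic, using a hypothetical helper `relative_to_fastest`:

```rust
/// Hypothetical helper (not part of any benchmarking tool): given the mean
/// times of one benchmark across several runs, return each time divided by
/// the fastest one. Assumes the times share a unit and the slice is non-empty.
pub fn relative_to_fastest(times: &[f64]) -> Vec<f64> {
    // Smallest time in the row; this run gets a ratio of exactly 1.0.
    let fastest = times.iter().copied().fold(f64::INFINITY, f64::min);
    times.iter().map(|t| t / fastest).collect()
}
```

   Applied to the `eq Float32` row (90.5, 60.2, 90.5 and 60.2 µs), this gives roughly 1.50, 1.00, 1.50 and 1.00, which matches the ratios printed in the table.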
   Since the comparison kernels could achieve similar iteration speedup WITHOUT 
any kind of public TypedArray trait, that on its own isn't enough to justify 
creation of the API.  I plan to experiment further to see if other places 
reliant on cross-type array iteration could benefit.
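
   To make the idea concrete, here is a minimal sketch of what a public typed-array trait with value iteration could look like. Everything in it (`TypedArray`, `ToyArray`, `values`, `eq_kernel`) is a hypothetical stand-in written against the standard library only; it is not the arrow crate's API and not the code in this PR.

```rust
/// Hypothetical trait: an array that exposes its element type, so generic
/// code can iterate over values without knowing the concrete array type.
pub trait TypedArray {
    type Value: Copy;
    fn len(&self) -> usize;
    fn is_valid(&self, i: usize) -> bool;
    fn value(&self, i: usize) -> Self::Value;
}

/// Toy array for illustration: values plus a per-slot validity flag.
/// (Real Arrow arrays use a packed validity bitmap instead of Vec<bool>.)
pub struct ToyArray<T> {
    pub values: Vec<T>,
    pub validity: Vec<bool>,
}

impl<T: Copy> TypedArray for ToyArray<T> {
    type Value = T;
    fn len(&self) -> usize {
        self.values.len()
    }
    fn is_valid(&self, i: usize) -> bool {
        self.validity[i]
    }
    fn value(&self, i: usize) -> T {
        self.values[i]
    }
}

/// Iterate over the slots of any TypedArray, yielding None for nulls.
pub fn values<'a, A: TypedArray>(
    array: &'a A,
) -> impl Iterator<Item = Option<A::Value>> + 'a {
    (0..array.len()).map(move |i| {
        if array.is_valid(i) {
            Some(array.value(i))
        } else {
            None
        }
    })
}

/// A comparison kernel written once against the trait. The result is null
/// wherever either input is null.
pub fn eq_kernel<A>(left: &A, right: &A) -> Vec<Option<bool>>
where
    A: TypedArray,
    A::Value: PartialEq,
{
    assert_eq!(left.len(), right.len(), "arrays must have equal length");
    values(left)
        .zip(values(right))
        .map(|(l, r)| match (l, r) {
            (Some(l), Some(r)) => Some(l == r),
            _ => None,
        })
        .collect()
}
```

   With two `ToyArray<f32>` inputs, `eq_kernel(&a, &b)` returns one `Option<bool>` per slot. The sketch only shows the shape of the API; it says nothing about whether it would reproduce the speedups measured above.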
   
   The main questions in my mind are:
   * struct, union, list, and dictionary arrays don't encode which type of value 
they contain into their Rust type, so they don't fit into the API very well... 
is it generally useful without them?  Is there a natural way to make it work 
for them?
   * does it enable simpler code elsewhere, or is it targeting too narrow a 
use case (string formatting? a reusable array.map style utility function?  
chaining a series of functions/operators together before storing an item back 
into an arrow array?)
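
   As an illustration of the second question, the sketch below shows an array.map style helper and a chain of operators that materializes its result only once. It assumes a plain slice of `Option<T>` as a stand-in for a nullable Arrow array, and the names `map_values` and `format_doubled` are hypothetical.

```rust
/// Hypothetical `array.map` style helper: apply `f` to every non-null value
/// and keep nulls as nulls. A slice of Option<T> stands in for an array.
pub fn map_values<T: Copy, U>(
    input: &[Option<T>],
    f: impl Fn(T) -> U,
) -> Vec<Option<U>> {
    input.iter().copied().map(|v| v.map(&f)).collect()
}

/// Chain two per-value steps (double, then format) lazily, and store the
/// result only once at the end instead of building an intermediate array.
pub fn format_doubled(input: &[Option<f32>]) -> Vec<Option<String>> {
    input
        .iter()
        .copied()
        .map(|v| v.map(|x| x * 2.0))
        .map(|v| v.map(|x| format!("{:.1}", x)))
        .collect()
}
```

   For example, `format_doubled(&[Some(1.5), None, Some(0.25)])` yields `Some("3.0")`, `None`, `Some("0.5")`. Whether a real TypedArray API could offer this for every array type is exactly the open question above.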



