dragosmg commented on PR #13196:
URL: https://github.com/apache/arrow/pull/13196#issuecomment-1161550157

   It looks like combining the separator and non-separator formats into a single 
vector (my original implementation) is faster than trying them separately, depending 
on whether the data contains a separator: the median run takes about 3.37s for 
`combined` versus 6.03s for `separate`.
   
   
![image](https://user-images.githubusercontent.com/13176361/174775702-66456012-86be-41a9-9803-80fdc7504d0f.png)
   
   <details>
   <summary>Results table</summary>
   
   ```r
   > results
   # A tibble: 2 × 13
     expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result   memory     time            gc      
     <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>   <list>     <list>          <list>  
   1 separate         6s    6.03s     0.165    15.5MB   0.0292    17     3      1.72m <tibble> <Rprofmem> <bench_tm [20]> <tibble>
   2 combined      3.36s    3.37s     0.297    15.5MB   0.0330    18     2      1.01m <tibble> <Rprofmem> <bench_tm [20]> <tibble>
   ```
   
   </details>
   
   <details>
   <summary> Code </summary>
   
   ```r
   library(dplyr)
   library(lubridate)
   library(ggplot2)
   library(hrbrthemes)
   # Load the development version of arrow from the local checkout
   devtools::load_all()
   
   test_df <- tibble::tibble(
     a = rep(c("20220614", "2022-06-14"), 1e6)
   )
   
   results <- bench::mark(
     separate = test_df %>% 
       arrow_table() %>% 
       mutate(b = parse_date_time(a, orders = "ymd")) %>% 
       collect(),
     combined = test_df %>% 
       arrow_table() %>% 
       mutate(b = parse_date_time_combined(a, orders = "ymd")) %>% 
       collect(), 
     min_iterations = 20
   )
   
   results
   
   ggplot2::autoplot(results) +
     theme_ipsum_rc(grid = "XxY") +
     labs(
       title = "Comparison of format parsing",
       # paste0() keeps the subtitle free of stray line breaks and indentation
       subtitle = paste0(
         "separate = formats with or without separator are tried separately\n",
         "combined = formats are combined in a single vector and all are passed to `coalesce()`"
       )
     )
   ```
   
   </details>
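
As a rough illustration of the two strategies being compared, here is a minimal base-R sketch. It is not the code in this PR: `parse_first_match()` is a hypothetical helper, the `strptime`-style format strings are assumptions, and the loop is only a stand-in for passing every candidate format to `coalesce()`.

```r
# Hypothetical helper (not part of arrow or lubridate): try each format in turn
# and keep the first non-NA result per element, mimicking coalesce().
parse_first_match <- function(x, formats) {
  out <- rep(as.Date(NA), length(x))
  for (fmt in formats) {
    todo <- is.na(out)
    if (!any(todo)) break
    out[todo] <- as.Date(x[todo], format = fmt)
  }
  out
}

x <- c("20220614", "2022-06-14")

# Assumed format strings for the "ymd" order, with and without a separator
with_sep    <- "%Y-%m-%d"
without_sep <- "%Y%m%d"

# "combined": one vector holding every candidate format
combined <- parse_first_match(x, c(with_sep, without_sep))

# "separate": choose the format set depending on whether a separator is present
has_sep <- grepl("-", x, fixed = TRUE)
separate <- rep(as.Date(NA), length(x))
separate[has_sep]  <- parse_first_match(x[has_sep], with_sep)
separate[!has_sep] <- parse_first_match(x[!has_sep], without_sep)

print(combined)
print(separate)
```

Both approaches should yield the same dates here; the benchmark above only concerns which one is faster inside an Arrow query.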
   


