dragosmg commented on PR #13196:
URL: https://github.com/apache/arrow/pull/13196#issuecomment-1160827641

   Results of benchmarking `parse_date_time()` implemented with combined 
formats (formats with and without separator passed together to `coalesce()`) 
vs separate formats (formats with or without separator tried separately):
   ```r
   library(dplyr)
   library(lubridate)
   library(ggplot2)
   library(hrbrthemes)
   # load the in-development arrow package (run from the package source directory)
   devtools::load_all()
   
   test_df <- tibble::tibble(
     a = rep(c("20220614", "2022-06-14"), 1e6)
   )
   
   results <- bench::mark(
     separate = test_df %>% 
       arrow_table() %>% 
       mutate(b = parse_date_time(a, orders = "ymd")) %>% 
       collect(),
     combined = test_df %>% 
       arrow_table() %>% 
       mutate(b = parse_date_time_combined(a, orders = "ymd")) %>% 
       collect(), 
     min_iterations = 20
   )
   
   results
   
   #> # A tibble: 2 × 13
   #>   expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result   memory     time       gc      
   #>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>   <list>     <list>     <list>  
   #> 1 separate      5.93s    5.94s    0.168     15.8MB   0.0720    14     6      1.39m <tibble> <Rprofmem> <bench_tm> <tibble>
   #> 2 combined     12.22s   12.25s    0.0815    16.2MB   0.0439    13     7      2.66m <tibble> <Rprofmem> <bench_tm> <tibble>
   
   ggplot2::autoplot(results) +
     theme_ipsum_rc(grid = "XxY") +
     labs(title = "Comparison of format parsing",
          subtitle = paste0(
            "separate = formats with or without separator are tried separately\n",
            "combined = formats are combined in a single vector and all are passed to `coalesce()`"
          ))
   ```
   
   
![image](https://user-images.githubusercontent.com/13176361/174673234-99592af2-43ed-4646-8890-2c794adf70f2.png)
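
   As a quick reading of the table above, the relative slowdown of the 
combined approach can be worked out from the reported medians. This is only a 
minimal sketch: the two values are copied by hand from the `bench::mark()` 
output, and the variable names are illustrative, not part of the PR.

   ```r
   # Median timings in seconds, copied from the benchmark output above
   median_separate <- 5.94
   median_combined <- 12.25

   # Ratio of medians: how many times slower the combined approach is
   slowdown <- median_combined / median_separate
   round(slowdown, 2)
   ```

   On these numbers the combined approach takes roughly twice as long as 
the separate one.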
   

