dragosmg commented on PR #13196: URL: https://github.com/apache/arrow/pull/13196#issuecomment-1160827641
Results of benchmarking `parse_date_time()` implemented with combined formats (with and without separator) vs. separate formats (either with or without separator):

```r
library(dplyr)
library(lubridate)
library(ggplot2)
library(hrbrthemes)

# Load the development build of the arrow R package (run from the package directory)
devtools::load_all()

# 2 million strings, alternating between the two date layouts
test_df <- tibble::tibble(
  a = rep(c("20220614", "2022-06-14"), 1e6)
)

results <- bench::mark(
  separate = test_df %>%
    arrow_table() %>%
    mutate(b = parse_date_time(a, orders = "ymd")) %>%
    collect(),
  combined = test_df %>%
    arrow_table() %>%
    mutate(b = parse_date_time_combined(a, orders = "ymd")) %>%
    collect(),
  min_iterations = 20
)

results
#> # A tibble: 2 × 13
#>   expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result   memory     time       gc
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>   <list>     <list>     <list>
#> 1 separate      5.93s    5.94s    0.168     15.8MB   0.0720    14     6      1.39m <tibble> <Rprofmem> <bench_tm> <tibble>
#> 2 combined     12.22s   12.25s    0.0815    16.2MB   0.0439    13     7      2.66m <tibble> <Rprofmem> <bench_tm> <tibble>

ggplot2::autoplot(results) +
  theme_ipsum_rc(grid = "XxY") +
  labs(
    title = "Comparison of format parsing",
    subtitle = "separate = formats with or without separator are tried separately\n combined = formats are combined in a single vector and all are passed to `coalesce()`"
  )
```
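For readers unfamiliar with the "combined" strategy, here is a minimal base-R sketch of the general idea: parse the input once per candidate format, then take the first non-missing result per element. This is only an illustration and not the arrow implementation from this PR; `formats` and `coalesce_dates()` are hypothetical names, and the `strptime`-style format strings are an assumption standing in for what `orders = "ymd"` expands to.

```r
# Illustration only: base R, no arrow or lubridate required.
x <- c("20220614", "2022-06-14")

# Hypothetical candidate formats: with and without a separator.
formats <- c("%Y-%m-%d", "%Y%m%d")

# Parse the whole vector once per format; elements that do not match become NA.
parsed <- lapply(formats, function(f) as.Date(x, format = f))

# Hypothetical helper: keep `a` where it is present, otherwise fall back to `b`.
coalesce_dates <- function(a, b) {
  missing <- is.na(a)
  a[missing] <- b[missing]
  a
}

# Fold over all parse attempts, taking the first non-NA value per element.
result <- Reduce(coalesce_dates, parsed)
print(result)
```

Every format is attempted on every element here, so the work grows with the number of candidate formats, which is one plausible reason a combined approach could be slower on large inputs.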