The benchmark results I'm getting are indeed not as dramatic as the fprof results, but the difference is still well above the 5% mentioned in the PR that introduced the check: https://github.com/elixir-lang/elixir/pull/9040

```elixir
regex = ~r/^([a-z][a-z0-9\+\-\.]*):/i
# Reuse the already compiled pattern so both cases run the same regex
re_pattern = regex.re_pattern

Benchee.run(%{
  "Regex.run/2" => fn -> Regex.run(regex, "foo") end,
  ":re.run/3" => fn -> :re.run("foo", re_pattern, [{:capture, :all, :binary}]) end
})
```

```
Name                  ips        average  deviation         median         99th %
:re.run/3          2.88 M      346.90 ns  ±3623.51%         333 ns         417 ns
Regex.run/2        1.98 M      504.74 ns  ±5851.21%         416 ns         542 ns

Comparison:
:re.run/3          2.88 M
Regex.run/2        1.98 M - 1.46x slower +157.84 ns
```

On Friday 15 March 2024 at 07:20:11 UTC+1 jan.k...@gmail.com wrote:

> The difference was definitely measurable just in pure running time of the
> code, setting aside fprof. I'll post what I have after work today.
>
> On Thursday, March 14, 2024 at 10:21:25 PM UTC+1 José Valim wrote:
>
>> Do you have benchmarks or only the fprof results? fprof is not a
>> benchmarking tool: comparing fprof results from different code may be
>> misleading. Proper benchmarking is preferable. I am benchmarking locally
>> and I cannot measure any relevant difference even with the whole version
>> checking removed.
>>
>> On Thu, Mar 14, 2024 at 6:01 PM Jan Krüger <jan.k...@gmail.com> wrote:
>>
>>> Thanks a lot. I'm also happy to share our case, and my fprof results, if
>>> that helps. I am very sure that my Erlang and Elixir versions match on
>>> the machine where I've tested this. Replacing Regex.run with an identical
>>> call to :re.run should show the performance improvement I've mentioned. The
>>> regex we've tested this on is:
>>>
>>> ~r/^([a-z][a-z0-9\+\-\.]*):/i
>>>
>>> On Thursday, March 14, 2024 at 5:55:47 PM UTC+1 marcel...@googlemail.com
>>> wrote:
>>>
>>>> I'm the maintainer of the RDF.ex library with the RDF.IRI module mentioned
>>>> in the OP. I can confirm that this fix doesn't affect the problem, since
>>>> we're actually not using `URI.parse/1` most of the time (we use it only
>>>> when dealing with relative URIs).
>>>> Even in this case the `Regex.version/0`
>>>> call in `Regex.safe_run/3` (
>>>> https://github.com/elixir-lang/elixir/blob/b8fca42e58850b56f65d0fb8a2086f2636141f61/lib/elixir/lib/regex.ex#L533)
>>>> still performs the `:erlang.system_info/1` call.
>>>>
>>>> On Thursday 14 March 2024 at 17:15:40 UTC+1 jan.k...@gmail.com wrote:
>>>>
>>>>> I read the commit, and I don't think it fixes what our actual problem was.
>>>>> See my comment above. The problem is the actual call to :re.version, not
>>>>> the recompilation of the regex.
>>>>>
>>>>> On Thursday, March 14, 2024 at 4:37:43 PM UTC+1 José Valim wrote:
>>>>>
>>>>>> I have pushed a fix to main. But also note we provide precompiled
>>>>>> Elixir versions per OTP version. Using a matching version will always
>>>>>> give you the best results and that's not only about regexes. :)
>>>>>>
>>>>>> On Thu, Mar 14, 2024 at 2:20 PM Jan Krüger <jan.k...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> I've recently had to work on a code base that parses largish RDF XML
>>>>>>> files. Part of the code base does relatively simple regular expression
>>>>>>> matches, but since the files are large, it makes quite a lot of
>>>>>>> Regex.run calls. While profiling I've noticed that there are callouts to
>>>>>>> :erlang.system_info, which fetches the PCRE version BEAM was compiled
>>>>>>> against.
>>>>>>>
>>>>>>> An example regular expression from the code base in question matches
>>>>>>> the scheme part of a URL. I've replaced Regex.run with Erlang's :re.run
>>>>>>> for testing purposes, and at least for this case, the performance gain
>>>>>>> is quite dramatic.
>>>>>>>
>>>>>>> Comparing fprof results:
>>>>>>>
>>>>>>> ```
>>>>>>> RDF.IRI.scheme/1
>>>>>>> 1176473 30615.618 2354.355
>>>>>>> ---
>>>>>>> RDF.IRI.scheme/1
>>>>>>> 1176473 3531.955 2353.905
>>>>>>> ```
>>>>>>>
>>>>>>> I found this thread in the Google group, which actually talks about
>>>>>>> the reasoning for fetching the version and proposes an alternative.
>>>>>>>
>>>>>>> https://groups.google.com/g/elixir-lang-core/c/CgFdxIONvGg/m/HN9ryeVXAwAJ?pli=1
>>>>>>>
>>>>>>> Especially
>>>>>>>
>>>>>>> ```
>>>>>>> Taking a further look at the code, the issue with recompiling
>>>>>>> regexes on the fly is that it makes executing the regexes more
>>>>>>> expensive, as we need to compute the version on every execution.
>>>>>>> We could store the version in ETS but that would have performance
>>>>>>> issues. Storing in a persistent_term would be great, but at the
>>>>>>> moment we support Erlang/OTP 20+. Thoughts?
>>>>>>> ```
>>>>>>>
>>>>>>> Since this has a fairly noticeable impact, at least on all tests
>>>>>>> I've run, I wanted to start a discussion on whether this could be
>>>>>>> implemented/improved now.
>>>>>>>
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "elixir-lang-core" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>> send an email to elixir-lang-co...@googlegroups.com.
>>>>>>> To view this discussion on the web visit
>>>>>>> https://groups.google.com/d/msgid/elixir-lang-core/44d498c7-82a4-46d2-89be-7919400e0297n%40googlegroups.com
>>>>>>> .
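
To make the persistent_term idea from the quoted thread a bit more concrete, here is a minimal sketch of what caching the version could look like. It is not what was pushed to main and not how `Regex` is implemented; `CachedRegexVersion` is a hypothetical module name made up for illustration, and the sketch assumes an OTP release that ships `:persistent_term` with `get/2`. The timing at the end uses `:timer.tc/1` only as a rough sanity check, not as a proper benchmark.

```elixir
defmodule CachedRegexVersion do
  # Hypothetical helper, not part of Elixir. It caches the result of
  # Regex.version/0 in :persistent_term so later lookups skip recomputing it.
  @key {__MODULE__, :version}

  # Returns the cached version, computing and storing it on first use.
  def get do
    case :persistent_term.get(@key, :undefined) do
      :undefined ->
        version = Regex.version()
        :persistent_term.put(@key, version)
        version

      version ->
        version
    end
  end
end

# Rough timing only; numbers will vary by machine and OTP release.
iterations = 100_000

{uncached_us, :ok} =
  :timer.tc(fn -> Enum.each(1..iterations, fn _ -> Regex.version() end) end)

{cached_us, :ok} =
  :timer.tc(fn -> Enum.each(1..iterations, fn _ -> CachedRegexVersion.get() end) end)

IO.puts("Regex.version/0:          #{uncached_us} microseconds for #{iterations} calls")
IO.puts("CachedRegexVersion.get/0: #{cached_us} microseconds for #{iterations} calls")
```

The value is written once and only read afterwards, which is the access pattern persistent_term is meant for. Whether this is actually faster than calling `Regex.version/0` directly would need to be measured with Benchee on the affected setup.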

--
You received this message because you are subscribed to the Google Groups "elixir-lang-core" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elixir-lang-core+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elixir-lang-core/fc14260c-67cb-4ee2-801d-6260794b24afn%40googlegroups.com.