The benchmark results I'm getting are indeed less dramatic than the fprof 
results, but the difference is still well above the 5% mentioned in the PR 
that introduced the check: https://github.com/elixir-lang/elixir/pull/9040

```elixir
regex = ~r/^([a-z][a-z0-9\+\-\.]*):/i
re_pattern = regex.re_pattern

Benchee.run(%{
  "Regex.run/2" => fn -> Regex.run(regex, "foo") end,
  ":re.run/3" => fn -> :re.run("foo", re_pattern, [{:capture, :all, :binary}]) end
})
```

```
Name                  ips        average  deviation         median         99th %
:re.run/3          2.88 M      346.90 ns  ±3623.51%         333 ns         417 ns
Regex.run/2        1.98 M      504.74 ns  ±5851.21%         416 ns         542 ns

Comparison:
:re.run/3          2.88 M
Regex.run/2        1.98 M - 1.46x slower +157.84 ns
```
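
For reference, the persistent_term idea from the older thread quoted further down could look roughly like the following. This is only a minimal sketch, assuming an OTP release that ships `:persistent_term`; the module name `CachedRegexVersion` and its key are hypothetical and not part of Elixir, and this is not how `Regex` is actually implemented.

```elixir
defmodule CachedRegexVersion do
  # Hypothetical module, for illustration only; not part of Elixir.
  @key {__MODULE__, :version}

  # Returns the same term as Regex.version/0, but only calls it when
  # nothing has been stored under @key yet.
  def version do
    case :persistent_term.get(@key, :unset) do
      :unset ->
        version = Regex.version()
        :persistent_term.put(@key, version)
        version

      version ->
        version
    end
  end
end
```

After the first call, every later call is a single `:persistent_term.get/2` lookup instead of a fresh version computation.
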
On Friday 15 March 2024 at 07:20:11 UTC+1 jan.k...@gmail.com wrote:

> The difference was definitely measurable just in pure running time of the 
> code, setting aside fprof. I'll post what I have after work today.
>
> On Thursday, March 14, 2024 at 10:21:25 PM UTC+1 José Valim wrote:
>
>> Do you have benchmarks or only the fprof results? fprof is not a 
>> benchmarking tool: comparing fprof results from different code may be 
>> misleading. Proper benchmarking is preferable. I am benchmarking locally 
>> and I cannot measure any relevant difference even with the whole version 
>> checking removed.
>>
>> On Thu, Mar 14, 2024 at 6:01 PM Jan Krüger <jan.k...@gmail.com> wrote:
>>
>>> Thanks a lot. I'm also happy to share our case and my fprof results, if 
>>> that helps. I am very sure that my Erlang and Elixir versions match on 
>>> the machine where I've tested this. Replacing Regex.run with an identical 
>>> call to :re.run should show the performance improvement I've mentioned. The 
>>> regex we've tested this on is:
>>>
>>> ~r/^([a-z][a-z0-9\+\-\.]*):/i
>>>
>>> On Thursday, March 14, 2024 at 5:55:47 PM UTC+1 marcel...@googlemail.com 
>>> wrote:
>>>
>>>> I'm the maintainer of the RDF.ex library with the RDF.IRI module mentioned 
>>>> in the OP. I can confirm that this fix doesn't affect the problem, since 
>>>> we're actually not using `URI.parse/1` most of the time (we use it only 
>>>> when dealing with relative URIs). Even in this case the `Regex.version/0` 
>>>> call in `Regex.safe_run/3` (
>>>> https://github.com/elixir-lang/elixir/blob/b8fca42e58850b56f65d0fb8a2086f2636141f61/lib/elixir/lib/regex.ex#L533)
>>>> still performs the `:erlang.system_info/1` call.
>>>>
>>>> On Thursday 14 March 2024 at 17:15:40 UTC+1 jan.k...@gmail.com wrote:
>>>>
>>>>> I read the commit, and I don't think it fixes what our actual problem 
>>>>> was. See my comment above. The problem is the actual call to 
>>>>> :re.version, not the recompilation of the regex.
>>>>>
>>>>> On Thursday, March 14, 2024 at 4:37:43 PM UTC+1 José Valim wrote:
>>>>>
>>>>>> I have pushed a fix to main. But also note we provide precompiled 
>>>>>> Elixir versions per OTP version. Using a matching version will always 
>>>>>> give you the best results and that's not only about regexes. :)
>>>>>>
>>>>>> On Thu, Mar 14, 2024 at 2:20 PM Jan Krüger <jan.k...@gmail.com> 
>>>>>> wrote:
>>>>>>
>>>>>>> I've recently had to work on a code base that parses largish RDF XML 
>>>>>>> files. Part of the code base does relatively simple regular expression 
>>>>>>> matches, but since the files are large, it makes quite a lot of 
>>>>>>> Regex.run calls. While profiling, I noticed that there are callouts to 
>>>>>>> :erlang.system_info, which fetches the PCRE version BEAM was compiled 
>>>>>>> against.
>>>>>>>
>>>>>>> An example regular expression from the code base in question matches 
>>>>>>> the scheme part of a URL. I've replaced Regex.run with Erlang's :re.run 
>>>>>>> for testing purposes, and at least for this case, the performance gain 
>>>>>>> is quite dramatic.
>>>>>>>
>>>>>>> Comparing fprof results:
>>>>>>>
>>>>>>> ```
>>>>>>> RDF.IRI.scheme/1                                               1176473   30615.618    2354.355
>>>>>>> ---
>>>>>>> RDF.IRI.scheme/1                                               1176473    3531.955    2353.905
>>>>>>> ```
>>>>>>>
>>>>>>> I found this thread in the Google group, which actually talks about 
>>>>>>> the reasoning for fetching the version and proposes an alternative.
>>>>>>>
>>>>>>>
>>>>>>> https://groups.google.com/g/elixir-lang-core/c/CgFdxIONvGg/m/HN9ryeVXAwAJ?pli=1
>>>>>>>
>>>>>>> Especially
>>>>>>>
>>>>>>> ```
>>>>>>> Taking a further look at the code, the issue with recompiling 
>>>>>>> regexes on the fly is that it makes executing the regexes more 
>>>>>>> expensive, as we need to compute the version on every execution. We 
>>>>>>> could store the version in ETS but that would have performance issues. 
>>>>>>> Storing in a persistent_term would be great, but at the moment we 
>>>>>>> support Erlang/OTP 20+. Thoughts?
>>>>>>> ```
>>>>>>>
>>>>>>> Since this has a fairly noticeable impact, at least on all tests 
>>>>>>> I've run, I wanted to start a discussion on whether this could be 
>>>>>>> implemented/improved now.
>>>>>>>
>>>>>>> -- 
>>>>>>> You received this message because you are subscribed to the Google 
>>>>>>> Groups "elixir-lang-core" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>>> send an email to elixir-lang-co...@googlegroups.com.
>>>>>>> To view this discussion on the web visit 
>>>>>>> https://groups.google.com/d/msgid/elixir-lang-core/44d498c7-82a4-46d2-89be-7919400e0297n%40googlegroups.com
>>>>>>>  
>>>>>>> <https://groups.google.com/d/msgid/elixir-lang-core/44d498c7-82a4-46d2-89be-7919400e0297n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>>> .
>>>>>>>