Benjamin-Philip opened a new issue, #5801:
URL: https://github.com/apache/couchdb/issues/5801

   Presently, url-safe base64 encoding is handled by the `b64url` NIF at 
[src/b64url](https://github.com/apache/couchdb/tree/main/src/b64url). However, 
support for RFC 4648 compliant url-safe encoding was 
[added](https://github.com/erlang/otp/commit/05e61dc7eb568cc5a5db965dcc3534fb6c9aa66d)
 to Erlang stdlib's `base64` in [Erlang/OTP 
26.0](https://github.com/erlang/otp/releases/tag/OTP-26.0). Additionally, 
encoding was made up to [4 times 
faster](https://www.erlang.org/blog/otp-26-highlights/#improvements-in-the-erlang-compiler-and-jit)
 thanks to compiler and JIT improvements that shipped in the same release.
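   
   For reference, a minimal sketch of the stdlib API in question, assuming 
   Erlang/OTP 26 or later. The sample bytes are arbitrary, picked only because 
   they map to `+` and `/` in the standard alphabet:
   
   ```elixir
   # Assumes Erlang/OTP 26+, where :base64 accepts an options map.
   # The input bytes are illustrative only.
   data = <<251, 255, 190>>
   
   # Standard alphabet: uses "+" and "/".
   standard = :base64.encode(data)
   
   # URL-safe alphabet (RFC 4648, section 5): uses "-" and "_" instead.
   urlsafe = :base64.encode(data, %{mode: :urlsafe})
   
   IO.inspect({standard, urlsafe})
   ```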
   
   Benchmarking `base64` against `b64url` with 
[benchee](https://github.com/bencheeorg/benchee)[^1], using the script below, 
shows that the built-in `base64` is faster:
   
   ```elixir
   Mix.install([:benchee, {:b64url, github: "apache/couchdb", sparse: "src/b64url/"}])
   
   defmodule B64Bench do
     def main do
       # Arguments: workers min_size max_size duration(s) entries
       [workers, min_size, max_size, duration, entries] =
         Enum.map(System.argv(), &String.to_integer/1)
   
       # Generate `entries` random binaries of min_size + 1 to max_size bytes each
       bytes =
         1..entries
         |> Enum.to_list()
         |> Enum.map(fn _ ->
           :crypto.strong_rand_bytes(min_size + :rand.uniform(max_size - min_size))
         end)
   
       Benchee.run(
         %{
           # The existing NIF
           "b64url" => fn input -> process(input, &:b64url.encode/1, &:b64url.decode/1) end,
           # Standard base64, converted to and from the url-safe form with regexes
           "base64 (standard) + re" => fn input ->
             process(
               input,
               fn url ->
                 # Strip padding, then swap "/" and "+" for "_" and "-"
                 url = :erlang.iolist_to_binary(:re.replace(:base64.encode(url), "=+$", ""))
                 url = :erlang.iolist_to_binary(:re.replace(url, "/", "_", [:global]))
                 :erlang.iolist_to_binary(:re.replace(url, "\\+", "-", [:global]))
               end,
               fn url64 ->
                 # Reverse the substitutions before decoding
                 url64 = :erlang.iolist_to_binary(url64)
                 url64 = :erlang.iolist_to_binary(:re.replace(url64, "-", "+", [:global]))
                 url64 = :erlang.iolist_to_binary(:re.replace(url64, "_", "/", [:global]))
   
                 # Restore the "=" padding (character code 61)
                 padding =
                   :erlang.list_to_binary(
                     :lists.duplicate(rem(4 - rem(:erlang.size(url64), 4), 4), 61)
                   )
   
                 :base64.decode(<<url64::binary, padding::binary>>)
               end
             )
           end,
           # Stdlib url-safe mode (OTP 26+); padding is kept, as it is on by default
           "base64 (urlsafe)" => fn input ->
             process(
               input,
               &:base64.encode(&1, %{mode: :urlsafe}),
               &:base64.decode(&1, %{mode: :urlsafe})
             )
           end
         },
         parallel: workers,
         time: duration,
         inputs: %{"generated" => bytes}
       )
   
       IO.inspect(:erlang.byte_size(Enum.join(bytes)), label: "Total size (B)")
     end
   
     # One iteration round-trips (encodes, then decodes) every binary in the input
     def process(bytes, encode, decode) do
       Enum.each(bytes, fn bin -> decode.(encode.(bin)) end)
     end
   end
   
   B64Bench.main()
   ```
   
   ```
   $ elixir b64_bench.exs 4 10 100 60 100
   Operating System: Linux
   CPU Information: 12th Gen Intel(R) Core(TM) i7-1255U
   Number of Available Cores: 12
   Available memory: 15.31 GB
   Elixir 1.18.4
   Erlang 28.0.2
   JIT enabled: true
   
   Benchmark suite executing with the following configuration:
   warmup: 2 s
   time: 1 min
   memory time: 0 ns
   reduction time: 0 ns
   parallel: 4
   inputs: generated
   Estimated total run time: 3 min 6 s
   Excluding outliers: false
   
   Benchmarking b64url with input generated ...
   Benchmarking base64 (standard) + re with input generated ...
   Benchmarking base64 (urlsafe) with input generated ...
   Calculating statistics...
   Formatting results...
   
   ##### With input generated #####
   Name                             ips        average  deviation         median         99th %
   base64 (urlsafe)              8.18 K      122.31 μs    ±44.53%      110.85 μs      296.40 μs
   b64url                        6.18 K      161.88 μs    ±56.17%      139.65 μs      609.83 μs
   base64 (standard) + re        0.74 K     1345.47 μs    ±32.12%     1076.60 μs     2221.58 μs
   
   Comparison: 
   base64 (urlsafe)              8.18 K
   b64url                        6.18 K - 1.32x slower +39.57 μs
   base64 (standard) + re        0.74 K - 11.00x slower +1223.16 μs
   Total size (B): 5491
   ```
   
   Therefore I propose we drop `b64url` in favour of the stdlib functions. This 
has the following benefits:
   
   - Less code to maintain
   - (marginally) better performance
   - Enhanced safety (by way of eliminating an NIF)
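   
   As a rough illustration of what a replacement could look like, here is a 
   minimal sketch of a wrapper around the stdlib functions. The module name 
   `B64UrlCompat` is hypothetical and not part of CouchDB. It assumes 
   Erlang/OTP 26 or later, and it assumes callers expect unpadded output (as the 
   regex variant in the benchmark above produces), hence `padding: false`. 
   Error handling and any other behavioural differences from the NIF are not 
   covered:
   
   ```elixir
   defmodule B64UrlCompat do
     # Hypothetical wrapper for illustration only; requires Erlang/OTP 26+.
     # `padding: false` omits "=" on encode and accepts unpadded input on decode.
     @opts %{mode: :urlsafe, padding: false}
   
     def encode(data), do: :base64.encode(data, @opts)
     def decode(data), do: :base64.decode(data, @opts)
   end
   
   encoded = B64UrlCompat.encode("hi")
   decoded = B64UrlCompat.decode(encoded)
   IO.inspect({encoded, decoded})
   ```
   
   With the default `padding: true`, the same input would encode with a 
   trailing `=`, so the option matters wherever existing encoded values are 
   compared or stored.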
   
   [^1]: I found updating the existing benchmarks to compare 3 or more 
implementations tedious. The new benchmark's arguments are similar to those of 
the previous benchmark, with the exception of an extra parameter `entries`, 
which is the number of random binaries to encode. The `ips` results are 
directly proportional to the previous `bps` and can be converted to `bps` by 
multiplying by the total size.
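   
   The conversion described in the footnote can be sketched as follows, using 
   the `base64 (urlsafe)` figures from the run above. The variable names are 
   only illustrative. Since one iteration both encodes and decodes the whole 
   input, the result is a round-trip throughput:
   
   ```elixir
   # Figures copied from the benchmark output above.
   ips = 8.18e3             # iterations per second for "base64 (urlsafe)"
   total_size_bytes = 5491  # "Total size (B)" reported by the script
   
   # Each iteration round-trips the full input once.
   bps = ips * total_size_bytes
   
   IO.puts("#{Float.round(bps / 1.0e6, 1)} MB/s")
   ```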

