Benjamin-Philip opened a new issue, #5801:
URL: https://github.com/apache/couchdb/issues/5801
Presently, URL-safe base64 encoding is handled by the `b64url` NIF at [src/b64url](https://github.com/apache/couchdb/tree/main/src/b64url). However, support for RFC 4648-compliant URL-safe encoding was [added](https://github.com/erlang/otp/commit/05e61dc7eb568cc5a5db965dcc3534fb6c9aa66d) to the Erlang stdlib's `base64` module in [Erlang/OTP 26.0](https://github.com/erlang/otp/releases/tag/OTP-26.0). The same release also made encoding up to [4 times faster](https://www.erlang.org/blog/otp-26-highlights/#improvements-in-the-erlang-compiler-and-jit) through improvements to the compiler and JIT.

I benchmarked `base64` and `b64url` with [benchee](https://github.com/bencheeorg/benchee)[^1] using the following script. The built-in `base64` is faster:

```elixir
Mix.install([:benchee, {:b64url, github: "apache/couchdb", sparse: "src/b64url/"}])

defmodule B64Bench do
  # Usage: elixir b64_bench.exs WORKERS MIN_SIZE MAX_SIZE DURATION_S ENTRIES
  def main do
    [workers, min_size, max_size, duration, entries] =
      Enum.map(System.argv(), &String.to_integer/1)

    # `entries` random binaries, each between min_size + 1 and max_size bytes long.
    bytes =
      1..entries
      |> Enum.to_list()
      |> Enum.map(fn _ ->
        :crypto.strong_rand_bytes(min_size + :rand.uniform(max_size - min_size))
      end)

    Benchee.run(
      %{
        "b64url" => fn input -> process(input, &:b64url.encode/1, &:b64url.decode/1) end,
        "base64 (standard) + re" => fn input ->
          process(
            input,
            # Standard base64, then strip '=' padding and rewrite '/' -> '_' and '+' -> '-'.
            fn url ->
              url = :erlang.iolist_to_binary(:re.replace(:base64.encode(url), "=+$", ""))
              url = :erlang.iolist_to_binary(:re.replace(url, "/", "_", [:global]))
              :erlang.iolist_to_binary(:re.replace(url, "\\+", "-", [:global]))
            end,
            # Undo the alphabet rewrite and restore '=' (61) padding before decoding.
            fn url64 ->
              url64 = :erlang.iolist_to_binary(url64)
              url64 = :erlang.iolist_to_binary(:re.replace(url64, "-", "+", [:global]))
              url64 = :erlang.iolist_to_binary(:re.replace(url64, "_", "/", [:global]))

              padding =
                :erlang.list_to_binary(
                  :lists.duplicate(rem(4 - rem(:erlang.size(url64), 4), 4), 61)
                )

              :base64.decode(<<url64::binary, padding::binary>>)
            end
          )
        end,
        # Built-in URL-safe alphabet, available since OTP 26.
        "base64 (urlsafe)" => fn input ->
          process(
            input,
            &:base64.encode(&1, %{mode: :urlsafe}),
            &:base64.decode(&1, %{mode: :urlsafe})
          )
        end
      },
      parallel: workers,
      time: duration,
      inputs: %{"generated" => bytes}
    )

    IO.inspect(:erlang.byte_size(Enum.join(bytes)), label: "Total size (B)")
  end

  # One iteration round-trips every binary through encode and decode.
  def process(bytes, encode, decode) do
    Enum.each(bytes, fn bin -> decode.(encode.(bin)) end)
  end
end

B64Bench.main()
```

```
$ elixir b64_bench.exs 4 10 100 60 100
Operating System: Linux
CPU Information: 12th Gen Intel(R) Core(TM) i7-1255U
Number of Available Cores: 12
Available memory: 15.31 GB
Elixir 1.18.4
Erlang 28.0.2
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 1 min
memory time: 0 ns
reduction time: 0 ns
parallel: 4
inputs: generated
Estimated total run time: 3 min 6 s

Excluding outliers: false

Benchmarking b64url with input generated ...
Benchmarking base64 (standard) + re with input generated ...
Benchmarking base64 (urlsafe) with input generated ...
Calculating statistics...
Formatting results...

##### With input generated #####
Name                             ips        average  deviation         median         99th %
base64 (urlsafe)              8.18 K      122.31 μs    ±44.53%      110.85 μs      296.40 μs
b64url                        6.18 K      161.88 μs    ±56.17%      139.65 μs      609.83 μs
base64 (standard) + re        0.74 K     1345.47 μs    ±32.12%     1076.60 μs     2221.58 μs

Comparison:
base64 (urlsafe)              8.18 K
b64url                        6.18 K - 1.32x slower +39.57 μs
base64 (standard) + re        0.74 K - 11.00x slower +1223.16 μs

Total size (B): 5491
```

Therefore I propose we drop `b64url` in favour of the stdlib functions. This has the following benefits:

- Less code to maintain
- (Marginally) better performance: `b64url` is 1.32x slower than `base64 (urlsafe)` in the run above
- Enhanced safety (by way of eliminating an NIF)

[^1]: I found it tedious to update the existing benchmarks to compare 3 or more implementations. The new benchmark takes the same arguments as the previous one, plus an extra parameter, `entries`, which is the number of random binaries to encode. The `ips` (iterations per second) results are directly proportional to the previous `bps` results: multiply `ips` by the total size to get `bps`.
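If the proposal is adopted, call sites would need a stdlib-backed equivalent of `:b64url.encode/1` and `:b64url.decode/1`. Below is a minimal sketch, written in Elixir to match the benchmark. It assumes OTP 26 or later. It also assumes `b64url` emits unpadded URL-safe output, which is what the `base64 (standard) + re` variant reproduces by stripping `=`. The module name `StdB64url` is hypothetical. By default `:base64.encode/2` appends `=` padding, so the sketch passes `padding: false` to get the unpadded form.

```elixir
# Hypothetical stdlib-backed replacement for :b64url (requires OTP 26+).
# Assumption: b64url output is unpadded URL-safe base64 (RFC 4648, section 5).
defmodule StdB64url do
  # URL-safe alphabet ('-' and '_' instead of '+' and '/'), no '=' padding.
  @opts %{mode: :urlsafe, padding: false}

  # Encode a binary to unpadded URL-safe base64.
  def encode(data), do: :base64.encode(data, @opts)

  # Decode unpadded URL-safe base64. Accepts iodata, like the re-based variant
  # in the benchmark, by flattening it to a binary first.
  def decode(encoded) do
    encoded
    |> :erlang.iolist_to_binary()
    |> :base64.decode(@opts)
  end
end

IO.inspect(StdB64url.encode(<<251, 255>>), label: "encoded")
IO.inspect(StdB64url.decode("aGVsbG8"), label: "decoded")
```

In CouchDB's Erlang code, the equivalent calls would be `base64:encode(Data, #{mode => urlsafe, padding => false})` and `base64:decode(Data, #{mode => urlsafe, padding => false})`. This sketch does not check whether the stdlib decoder handles malformed input the same way `b64url` does.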
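The footnote's conversion from `ips` to bytes per second can be worked through with the reported numbers. Each iteration round-trips every generated binary once, so bytes per second is `ips` multiplied by the total size (5491 B in this run). The sketch uses the rounded `ips` values Benchee printed (for example, `8.18 K` becomes 8180), so the results are approximate. `IpsToBps` is a hypothetical helper.

```elixir
defmodule IpsToBps do
  # Each Benchee iteration encodes and decodes every generated binary once,
  # so bytes per second = iterations per second * total input size in bytes.
  def bps(ips, total_size), do: ips * total_size
end

total_size = 5_491

# Rounded ips values from the results table (e.g. "8.18 K" -> 8_180).
results = [
  {"base64 (urlsafe)", 8_180},
  {"b64url", 6_180},
  {"base64 (standard) + re", 740}
]

for {name, ips} <- results do
  bps = IpsToBps.bps(ips, total_size)
  IO.puts("#{name}: ~#{Float.round(bps / 1_000_000, 1)} MB/s")
end
```

With these inputs it prints roughly 44.9, 33.9 and 4.1 MB/s for the three implementations.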
