Benjamin-Philip commented on issue #5801:
URL: https://github.com/apache/couchdb/issues/5801#issuecomment-3656996512

   I looked into this, and yes, the `b64url` NIF is faster for inputs above roughly 100 bytes.
   
   # Benchmark
   
   I made some improvements in this [patch](https://github.com/user-attachments/files/24171160/0001-Remove-generation-overhead-from-b64url-benchmark.patch) (the changes are explained in the commit message).
   
   Applying the patch:
   
   ```sh
   git checkout -b b64url-bench
   git am ~/Downloads/0001-Remove-generation-overhead-from-b64url-benchmark.patch
   ```
   
   and then benchmarking for different sizes:
   
   ```sh
   cd src/b64url
   alias bench="ERL_LIBS=_build/default/lib/b64url/ ./test/benchmark.escript"
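   # bench args: Workers MinSize MaxSize Duration SampleSize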
   
   for power in $(seq 1 3); do
       bench 1 $((10 ** power)) $((10 ** ($power + 1))) 60 100             
   done
   ```
   
   I finally get:
   
   ```
   Workers: 1, MinSize: 10, MaxSize: 100, Duration: 60, SampleSize: 100
   erl :     4752923375 bytes /  60 seconds =    79215389.58 bps
   nif :     3055668280 bytes /  60 seconds =    50927804.67 bps
   1.5554448125501372 times slower
   Workers: 1, MinSize: 100, MaxSize: 1000, Duration: 60, SampleSize: 100
   nif :    19462006451 bytes /  60 seconds =   324366774.18 bps
   erl :     8901825341 bytes /  60 seconds =   148363755.68 bps
   2.186293901022968 times slower
   Workers: 1, MinSize: 1000, MaxSize: 10000, Duration: 60, SampleSize: 100
   nif :    29760976505 bytes /  60 seconds =   496016275.08 bps
   erl :    11500940039 bytes /  60 seconds =   191682333.98 bps
   2.5876994753541642 times slower
   ```
   
   As you can see, the Erlang version becomes progressively slower relative to the NIF as the input size increases (its advantage peaks at about 50 bytes), and the difference is significant (up to ~2.6x).
   
   If you calculate the per-operation difference at 10,000 bytes, you get a sub-millisecond value: about 0.032 ms.
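
   For example, using the single-worker throughput numbers above on a 10,000-byte input:
   
   ```
   erl : 10000 bytes / 191682333.98 bps ≈ 0.0522 ms
   nif : 10000 bytes / 496016275.08 bps ≈ 0.0202 ms
   diff:                                ≈ 0.0320 ms
   ```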
   
   Additionally, if you rerun that last size range while increasing the number of parallel workers, the gap narrows to just 1.87x:
   
   ```sh
   for power in $(seq 1 3); do
       bench $((10 ** power)) 1000 10000 60 100
   done
   ```
   
   ```
   Workers: 10, MinSize: 1000, MaxSize: 10000, Duration: 60, SampleSize: 100
   nif :   114522454433 bytes /  60 seconds =  1908707573.88 bps
   erl :    51406532861 bytes /  60 seconds =   856775547.68 bps
   2.2277801683817393 times slower
   Workers: 100, MinSize: 1000, MaxSize: 10000, Duration: 60, SampleSize: 100
   nif :   100150195411 bytes /  60 seconds =  1669169923.52 bps
   erl :    46881336505 bytes /  60 seconds =   781355608.42 bps
   2.136248726618934 times slower
   Workers: 1000, MinSize: 1000, MaxSize: 10000, Duration: 60, SampleSize: 100
   nif :    83513632230 bytes /  60 seconds =  1391893870.50 bps
   erl :    44569748335 bytes /  60 seconds =   742829138.92 bps
   1.873773924014238 times slower
   ```
   
   For reference, this is my environment:
   
   ```
   $ inxi -MSC      
   System:
     Host: rivendell Kernel: 6.17.11-200.fc42.x86_64 arch: x86_64 bits: 64
     Desktop: GNOME v: 48.7 Distro: Fedora Linux 42 (Workstation Edition)
   Machine:
     Type: Laptop System: LENOVO product: 21C1S0SM00 v: ThinkPad L14 Gen 3
       serial: <superuser required>
     Mobo: LENOVO model: 21C1S0SM00 serial: <superuser required> UEFI: LENOVO
       v: R1XET54W (1.36 ) date: 07/01/2024
   CPU:
     Info: 10-core (2-mt/8-st) model: 12th Gen Intel Core i7-1255U bits: 64
       type: MST AMCP cache: L2: 6.5 MiB
     Speed (MHz): avg: 4481 min/max: 400/4700:3500 cores: 1: 4481 2: 4481
       3: 4481 4: 4481 5: 4481 6: 4481 7: 4481 8: 4481 9: 4481 10: 4481 11: 4481
       12: 4481
   $ erl -s erlang halt
   
   Erlang/OTP 28 [erts-16.0.2] [source] [64-bit] [smp:12:12] [ds:12:12:10] [async-threads:1] [jit:ns]
   ```
   
   ## Conclusion
   
   The point I'm trying to make is that, in my mind, a 0.032 ms overhead on the worst-case input is still competitive. In the real world, the difference might be even smaller.
   
   Ultimately, it comes down to whether the sub-millisecond overhead is an acceptable tradeoff for eliminating an entire submodule (and complying better with standard Erlang practice). I'm not familiar with which components are affected by b64url's performance, and I don't know how performance-sensitive CouchDB's users are.
   
   You might also conclude that removing b64url wouldn't meaningfully reduce the maintenance workload, since b64url is so rarely updated.
   
   # Moving Forward
   
   I see three options:
   
   ## Option 1 - Completely replace b64url
   
   We completely replace b64url with the stdlib `base64` module if we're not too performance-sensitive and the tradeoff is worthwhile.
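
   For reference, a rough sketch of what the replacement call sites could look like, assuming OTP 26+ (`base64:encode/2` / `base64:decode/2`) and that b64url's output is unpadded URL-safe base64:
   
   ```erlang
   %% Hypothetical stdlib equivalents of b64url:encode/1 and b64url:decode/1,
   %% assuming OTP 26+ and unpadded URL-safe output.
   Encoded = base64:encode(Bin, #{mode => urlsafe, padding => false}),
   Decoded = base64:decode(Encoded, #{mode => urlsafe, padding => false}).
   ```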
   
   ## Option 2 - Keep everything as is
   
   We make no changes if we are performance-sensitive, and revisit this later when stdlib performance improves.
   
   ## Option 3 - Replace b64url for data less than 100 bytes
   
   If we're extremely performance-sensitive, we handle binaries smaller than 100 bytes with `base64` to take advantage of the stdlib's speedup on small inputs.
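
   A minimal sketch of that dispatch (the module name is hypothetical; the 100-byte threshold comes from the crossover measured above, and OTP 26+ is assumed):
   
   ```erlang
   -module(b64url_hybrid).
   -export([encode/1]).
   
   %% Hypothetical hybrid encoder: stdlib base64 for small binaries,
   %% the b64url NIF for large ones. The 100-byte threshold is taken
   %% from the benchmark crossover above; assumes OTP 26+.
   encode(Bin) when byte_size(Bin) < 100 ->
       base64:encode(Bin, #{mode => urlsafe, padding => false});
   encode(Bin) ->
       b64url:encode(Bin).
   ```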

