This is an automated email from the ASF dual-hosted git repository. nickva pushed a commit to tag 2.0.0 in repository https://gitbox.apache.org/repos/asf/couchdb-jiffy.git
commit 83a470a574c543812a11b6dd92dd2698eb00c615
Author: Nick Vatamaniuc <[email protected]>
AuthorDate: Sat Apr 25 12:07:30 2026 -0400

Version 2.0.0

Performance
===========

- **SIMD vectorization** for ASCII scan-ahead loops in both decoder string parsing and encoder string emission. This replaces the byte-at-a-time scans with 16/32-byte chunked compares. The most interesting part is that it was done without writing a single line of assembly, relying only on compiler auto-vectorization. This showed a **15x** performance improvement on encoding large strings, as in the "Issue 90" benchmark. To get even better auto-vectorizer behavior, it's advisable to set `-march=native` or `-march=x86-64-v3`. That can make the auto-vectorizers in recent compilers switch to 256-bit AVX2 registers and instructions.
- **UTF-8 skip-ahead in encoder** and faster UTF-8 validation. This is like the ASCII scan-ahead loop, but applied to UTF-8 validation. It helps quite a bit on non-ASCII, Unicode-heavy inputs: the "UTF-8 unescaped" benchmark got a **5.8x** speedup from it.
- Use **Ryu** for number encoding. This is the exact Ryu version from the latest Erlang/OTP release with all the updates and tweaks they added. This makes the float output the same as Erlang's. However, it also means the output is not exactly the same as before for Jiffy (we used to emit more fractional digits; now it switches to scientific notation a bit earlier). Number-heavy benchmarks like "Canada" showed a **2x** speedup.
- **ffc.h** for number parsing in the decoder. This is the fastest C number parser around at this time. I worked with the upstream author to add a new API that parses a JSON number in a single call and returns either an integer or a double, as opposed to pre-parsing to figure out which is which first (https://github.com/kolemannix/ffc.h/pull/22). Using this library yielded a **4x** speedup in the number-heavy "Canada" benchmark on decoding.
- **Faster array and map creation** for building the result term in fewer steps. (In the process, we discovered that maps with duplicates created from NIFs were subtly broken in Erlang: https://github.com/erlang/otp/pull/10976. The fix is now merged and should be in the recent Erlang patch releases.) This bulk creation improved decoding across the board. Some examples are **2.5x** for "JSON Generator", **2.6x** for "Github" and **3.3x** for "Blockchain". Most of those are mixed inputs, so number parsing and scan-ahead played a role there as well.
- **Branch hints** (`JIFFY_LIKELY` / `JIFFY_UNLIKELY`) on encoder hot paths. I saw the QuickJS library doing this, so I experimented and saw a few percent speedup from it.
- **Unity build**. Having handled a few issues over the years related to enabling, disabling and detecting LTO (link-time optimization) compiler features, I decided to side-step it and go with a unity build. This is where we include all the source files into one `jiffy.c` file and compile that. We get all the benefits of LTO without having to juggle linker flags.

Yielding & scheduler behavior
=============================

- **Reduction count bumped to 4000** to match current Erlang VM defaults.
- **Bytes per reduction lowered** so cooperative yields fire more often on long input. This results in better latency under contention without a measurable throughput hit.

Since Jiffy is a NIF, it's crucial for it to never block schedulers and to always yield appropriately. As concurrency increases, it should degrade gracefully in proportion to the applied load. In general, this is not a trivial task to accomplish in a NIF. Some JSON library NIFs use dirty schedulers. However, in the cases where Jiffy is used that wouldn't work: dirty schedulers are still a limited resource, and under high concurrency they would become a bottleneck. A separate benchmark, `bench_scheduling.sh` in https://github.com/nickva/bench, runs concurrent JSON encoding and decoding scaled by the number of schedulers.
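
As an aside, the reduction budgeting described in the yielding section can be sketched in plain C. This is only an illustration of the idea and not Jiffy's code: it does not use `erl_nif.h`, the names `work_state`, `work_step` and `run_to_completion` are hypothetical, and `BYTES_PER_REDUCTION` is a made-up value (only the 4000 figure comes from the notes above).

```c
#include <assert.h>
#include <stddef.h>

/* 4000 is the reduction count mentioned above; the bytes-per-reduction
 * value is a hypothetical placeholder chosen for this sketch. */
#define REDUCTION_BUDGET 4000
#define BYTES_PER_REDUCTION 8

typedef enum { WORK_DONE, WORK_YIELD } work_status;

/* Everything needed to resume after a yield lives in this struct. */
typedef struct {
    const unsigned char *data;
    size_t len;
    size_t pos;        /* resume point in the input */
    unsigned long sum; /* stand-in for real encode/decode work */
} work_state;

/* Process input until it is finished or until this slice's budget,
 * expressed in bytes, is spent. */
static work_status work_step(work_state *st)
{
    assert(st->pos <= st->len);
    size_t byte_budget = (size_t)REDUCTION_BUDGET * BYTES_PER_REDUCTION;
    size_t remaining = st->len - st->pos;
    size_t take = remaining < byte_budget ? remaining : byte_budget;
    size_t end = st->pos + take;

    while (st->pos < end) {
        st->sum += st->data[st->pos++];
    }
    return st->pos == st->len ? WORK_DONE : WORK_YIELD;
}

/* Drive work_step to completion and count the slices. Each WORK_YIELD
 * marks a point where a NIF would hand control back to the scheduler. */
static size_t run_to_completion(work_state *st)
{
    size_t slices = 1;
    while (work_step(st) == WORK_YIELD) {
        slices++;
    }
    return slices;
}
```

Lowering `BYTES_PER_REDUCTION` in this sketch shrinks the byte budget per slice, so the same input is split into more, shorter slices. A real NIF would additionally report its progress to the VM, for example with `enif_consume_timeslice`, and resume from saved state on the next call.
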
Testing with a few Erlang JSON libraries shows something like this:

```
./bench_scheduling.sh
...
scheduler responsiveness check
  input: citm-catalog.json
  duration: 2000
  schedulers: 12 online
  impls: json, jiffy, simdjsone, jsone, jsx

[json]
   1x encdec n=84  p50=135.0ms  p95=182.9ms  p99=191.9ms  max=196.7ms
  12x encdec n=86  p50=129.7ms  p95=189.9ms  p99=203.0ms  max=206.2ms
  24x encdec n=87  p50=263.0ms  p95=461.2ms  p99=506.1ms  max=527.1ms
[jiffy]
   1x encdec n=309 p50=38.3ms   p95=51.9ms   p99=57.4ms   max=66.5ms
  12x encdec n=300 p50=41.2ms   p95=52.5ms   p99=59.7ms   max=66.2ms
  24x encdec n=306 p50=80.2ms   p95=111.8ms  p99=118.8ms  max=140.1ms
[simdjsone]
   1x encdec n=20  p50=690.1ms  p95=784.6ms  p99=784.6ms  max=784.8ms
  12x encdec n=16  p50=790.9ms  p95=887.5ms  p99=887.5ms  max=899.9ms
  24x encdec n=24  p50=1448.4ms p95=1876.7ms p99=1879.5ms max=1882.7ms
[jsone]
   1x encdec n=60  p50=213.1ms  p95=261.8ms  p99=263.9ms  max=264.8ms
  12x encdec n=60  p50=204.9ms  p95=329.8ms  p99=345.0ms  max=350.9ms
  24x encdec n=52  p50=440.1ms  p95=700.3ms  p99=773.3ms  max=817.3ms
[jsx]
   1x encdec n=24  p50=398.8ms  p95=539.0ms  p99=544.1ms  max=548.3ms
  12x encdec n=24  p50=391.5ms  p95=684.9ms  p99=687.0ms  max=689.6ms
  24x encdec n=24  p50=1181.3ms p95=1479.0ms p99=1558.1ms max=1654.7ms
```

Here we measure both the latency of sending a term back and forth between two encoder/decoder processes and the throughput (`n` is how many times we managed to do that).

Features
========

- **Pre-encoded JSON**: embed already-encoded JSON fragments directly in a value being encoded, saving a round-trip through the decoder. Use `{json, IoData}` terms and they will be embedded in the emitted stream as is. This was a surprisingly popular request over the years. Paul J. Davis (Jiffy's original author) suggested a nice and quick patch to make it work, so I went with that.
- **Encode UTF-8 atoms** (OTP 26+ only): atoms with non-ASCII bytes now encode as their UTF-8 source.
- **Number-as-key encoding**: integer/float map keys are encoded as string keys instead of erroring. Both Python and Erlang/OTP's built-in json already do this.

Correctness & compliance
========================

- **RFC 8259 100% compliance.** A new test suite based on `nst/JSONTestSuite` is wired in and all conformance tests pass.
- **Big List of Naughty Strings (BLNS)** added to the test mix.

Build & CI
==========

- **OTP 21** is the new minimum.
- **C coverage checks** added so the test suite reports per-file C line coverage; several uncovered paths were closed during this work.
---
 src/jiffy.app.src | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/jiffy.app.src b/src/jiffy.app.src
index 1908c8c..1dce860 100644
--- a/src/jiffy.app.src
+++ b/src/jiffy.app.src
@@ -1,9 +1,9 @@
 {application, jiffy, [
     {description, "JSON Decoder/Encoder."},
-    {vsn, "1.1.2"},
+    {vsn, "2.0.0"},
     {registered, []},
     {applications, [kernel, stdlib, xmerl]},
-    {maintainers, ["Paul J. Davis"]},
+    {maintainers, ["Paul J. Davis", "Nick Vatamaniuc"]},
     {licenses, ["MIT", "BSD"]},
     {links, [{"GitHub", "https://github.com/davisp/jiffy"}]},
     {files, [
