Re: [PR] Add hooks to json encoder to override default encoding or add support for unsupported types [arrow-rs]

via GitHub Wed, 12 Mar 2025 14:42:38 -0700


adriangb commented on code in PR #7015:
URL: https://github.com/apache/arrow-rs/pull/7015#discussion_r1992335818



##########
arrow-json/src/writer/encoder.rs:
##########
@@ -196,9 +236,18 @@ impl Encoder for StructArrayEncoder<'_> {
         let mut is_first = true;
         // Nulls can only be dropped in explicit mode
         let drop_nulls = (self.struct_mode == StructMode::ObjectOnly) && 
!self.explicit_nulls;
-        for field_encoder in &mut self.encoders {
-            let is_null = is_some_and(field_encoder.nulls.as_ref(), |n| 
n.is_null(idx));
-            if drop_nulls && is_null {
+
+        // Collect all the field nulls buffers up front to avoid dynamic 
dispatch in the loop
+        // This creates a temporary Vec, but avoids repeated virtual calls 
which should be a net win

Review Comment:
   ```
   ❯ git checkout main && cargo bench --bench json_writer && git checkout 
arrow-union && cargo bench --bench json_writer
   Switched to branch 'main'
   Your branch is ahead of 'origin/main' by 6 commits.
     (use "git push" to publish your local commits)
      Compiling arrow-json v54.2.1 (/Users/adriangb/GitHub/arrow-rs/arrow-json)
      Compiling arrow v54.2.1 (/Users/adriangb/GitHub/arrow-rs/arrow)
       Finished `bench` profile [optimized] target(s) in 3.03s
        Running benches/json_writer.rs 
(target/release/deps/json_writer-8d99cf5d0d08efba)
   Gnuplot not found, using plotters backend
   bench_integer           time:   [2.9239 ms 2.9488 ms 2.9810 ms]
                           change: [-1.4155% -0.3843% +0.7943%] (p = 0.52 > 
0.05)
                           No change in performance detected.
   Found 2 outliers among 100 measurements (2.00%)
     1 (1.00%) high mild
     1 (1.00%) high severe
   
   bench_float             time:   [3.4054 ms 3.4216 ms 3.4394 ms]
                           change: [-3.7264% -2.9504% -2.1609%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 15 outliers among 100 measurements (15.00%)
     8 (8.00%) high mild
     7 (7.00%) high severe
   
   bench_string            time:   [8.8852 ms 8.9466 ms 9.0106 ms]
                           change: [-1.3959% -0.5660% +0.3531%] (p = 0.21 > 
0.05)
                           No change in performance detected.
   Found 2 outliers among 100 measurements (2.00%)
     2 (2.00%) high mild
   
   bench_mixed             time:   [11.975 ms 12.030 ms 12.087 ms]
                           change: [-3.7701% -3.0819% -2.4763%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 5 outliers among 100 measurements (5.00%)
     5 (5.00%) high mild
   
   bench_dict_array        time:   [4.6221 ms 4.6771 ms 4.7345 ms]
                           change: [-2.4806% -0.5873% +1.2096%] (p = 0.53 > 
0.05)
                           No change in performance detected.
   Found 4 outliers among 100 measurements (4.00%)
     3 (3.00%) high mild
     1 (1.00%) high severe
   
   bench_struct            time:   [19.521 ms 19.623 ms 19.731 ms]
                           change: [-0.9078% +0.1000% +0.9733%] (p = 0.84 > 
0.05)
                           No change in performance detected.
   Found 2 outliers among 100 measurements (2.00%)
     2 (2.00%) high mild
   
   bench_nullable_struct   time:   [9.4059 ms 9.4565 ms 9.5092 ms]
                           change: [-3.5737% -2.7898% -1.9920%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 2 outliers among 100 measurements (2.00%)
     2 (2.00%) high mild
   
   bench_list              time:   [27.854 ms 27.976 ms 28.118 ms]
                           change: [-0.8182% -0.3309% +0.1897%] (p = 0.22 > 
0.05)
                           No change in performance detected.
   Found 10 outliers among 100 measurements (10.00%)
     7 (7.00%) high mild
     3 (3.00%) high severe
   
   bench_nullable_list     time:   [5.9283 ms 5.9849 ms 6.0429 ms]
                           change: [-8.7315% -6.8046% -5.0511%] (p = 0.00 < 
0.05)
                           Performance has improved.
   
   bench_struct_list       time:   [2.9264 ms 2.9694 ms 3.0130 ms]
                           change: [-5.8590% -3.8633% -1.8735%] (p = 0.00 < 
0.05)
                           Performance has improved.
   
   Switched to branch 'arrow-union'
   Your branch is up to date with 'origin/arrow-union'.
      Compiling arrow-json v54.2.1 (/Users/adriangb/GitHub/arrow-rs/arrow-json)
      Compiling arrow v54.2.1 (/Users/adriangb/GitHub/arrow-rs/arrow)
       Finished `bench` profile [optimized] target(s) in 3.03s
        Running benches/json_writer.rs 
(target/release/deps/json_writer-8d99cf5d0d08efba)
   Gnuplot not found, using plotters backend
   bench_integer           time:   [2.9105 ms 2.9313 ms 2.9542 ms]
                           change: [-1.8789% -0.5924% +0.5939%] (p = 0.36 > 
0.05)
                           No change in performance detected.
   Found 3 outliers among 100 measurements (3.00%)
     1 (1.00%) high mild
     2 (2.00%) high severe
   
   bench_float             time:   [3.3695 ms 3.3854 ms 3.4041 ms]
                           change: [-1.7458% -1.0557% -0.3581%] (p = 0.00 < 
0.05)
                           Change within noise threshold.
   Found 6 outliers among 100 measurements (6.00%)
     5 (5.00%) high mild
     1 (1.00%) high severe
   
   bench_string            time:   [8.7954 ms 8.8473 ms 8.9044 ms]
                           change: [-2.0603% -1.1094% -0.2183%] (p = 0.02 < 
0.05)
                           Change within noise threshold.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high severe
   
   bench_mixed             time:   [12.034 ms 12.099 ms 12.170 ms]
                           change: [-0.1335% +0.5726% +1.3337%] (p = 0.13 > 
0.05)
                           No change in performance detected.
   Found 4 outliers among 100 measurements (4.00%)
     3 (3.00%) high mild
     1 (1.00%) high severe
   
   bench_dict_array        time:   [4.4275 ms 4.4639 ms 4.5020 ms]
                           change: [-5.9236% -4.5580% -3.1699%] (p = 0.00 < 
0.05)
                           Performance has improved.
   Found 2 outliers among 100 measurements (2.00%)
     2 (2.00%) high mild
   
   bench_struct            time:   [19.450 ms 19.538 ms 19.640 ms]
                           change: [-1.1211% -0.4330% +0.3004%] (p = 0.25 > 
0.05)
                           No change in performance detected.
   Found 6 outliers among 100 measurements (6.00%)
     3 (3.00%) high mild
     3 (3.00%) high severe
   
   bench_nullable_struct   time:   [9.4349 ms 9.4862 ms 9.5417 ms]
                           change: [-0.4570% +0.3147% +1.0725%] (p = 0.43 > 
0.05)
                           No change in performance detected.
   Found 2 outliers among 100 measurements (2.00%)
     1 (1.00%) high mild
     1 (1.00%) high severe
   
   bench_list              time:   [27.799 ms 27.901 ms 28.015 ms]
                           change: [-0.8950% -0.2687% +0.2907%] (p = 0.40 > 
0.05)
                           No change in performance detected.
   Found 4 outliers among 100 measurements (4.00%)
     3 (3.00%) high mild
     1 (1.00%) high severe
   
   bench_nullable_list     time:   [5.9567 ms 6.0222 ms 6.0886 ms]
                           change: [-0.9060% +0.6229% +2.1762%] (p = 0.41 > 
0.05)
                           No change in performance detected.
   
   bench_struct_list       time:   [2.8755 ms 2.9589 ms 3.0925 ms]
                           change: [-3.6152% -0.3533% +4.3395%] (p = 0.89 > 
0.05)
                           No change in performance detected.
   Found 4 outliers among 100 measurements (4.00%)
     3 (3.00%) high mild
     1 (1.00%) high severe
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] Add hooks to json encoder to override default encoding or add support for unsupported types [arrow-rs]

Reply via email to