ding-young commented on PR #7917: URL: https://github.com/apache/arrow-rs/pull/7917#issuecomment-3077375154
- cargo bench result

| Case (str_len, null prob) | main      | issue-6057 |
|---------------------------|-----------|------------|
| string view(10, 0)        | 51.23 µs  | 52.18 µs   |
| string view(30, 0)        | 45.47 µs  | 46.63 µs   |
| string view(100, 0)       | 64.18 µs  | 68.54 µs   |
| string view(100, 0.5)     | 70.11 µs  | 74.06 µs   |
| string view(1..100, 0)    | 100.72 µs | 103.80 µs  |
| string view(1..100, 0.5)  | 80.48 µs  | 86.02 µs   |

- manual memory profiling result (unit: B)

I added code to collect jemalloc stats (allocated, resident, active) before and after decoding the binary view. Memory usage improved, especially when short strings are mixed with large strings. When the given rows consist only of large strings, memory usage was the same.

```rust
let before = jemalloc_stat();
let view = if !validate_utf8 {
    decode_binary_view_inner_utf8_unchecked(rows, options)
} else {
    decode_binary_view_inner(rows, options, validate_utf8)
};
let after = jemalloc_stat();
// print ( after - before )
```

(To reproduce, see https://github.com/ding-young/arrow-rs/tree/issue-6057-bench-mem )

| Case                      | main (alloc / active) | issue-6057 (alloc / active) |
|---------------------------|-----------------------|-----------------------------|
| string view(10, 0)        | **102656 / 114688**   | **65536 / 69632**           |
| string view(30, 0)        | 196608 / 204800       | 196608 / 204800             |
| string view(100, 0)       | 524288 / 532480       | 524288 / 532480             |
| string view(100, 0.5)     | 294912 / 303104       | 294912 / 303104             |
| string view(1..100, 0)    | 294912 / 303104       | 294912 / 303104             |
| string view(1..100, 0.5)  | **180224 / 188416**   | **163840 / 172032**         |
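For anyone who wants to try the same before/after measurement without jemalloc, here is a minimal std-only sketch. It is not the code used for the numbers above: it swaps jemalloc's stats for a counting wrapper around the system allocator, so it only approximates the "allocated" figure and has no notion of "active" or "resident". The names `CountingAlloc`, `live_bytes`, and `measure` are hypothetical, and the `Vec` allocation is a stand-in for the decode call.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for jemalloc's "allocated" statistic:
/// wraps the system allocator and tracks live heap bytes.
struct CountingAlloc;

static LIVE_BYTES: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            // Count the requested size, not the allocator's internal size class.
            LIVE_BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        LIVE_BYTES.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Bytes currently requested from the allocator and not yet freed.
fn live_bytes() -> usize {
    LIVE_BYTES.load(Ordering::Relaxed)
}

/// Runs `f` and returns its result together with the change in live heap bytes.
fn measure<T>(f: impl FnOnce() -> T) -> (T, isize) {
    let before = live_bytes();
    let out = f();
    let after = live_bytes();
    (out, after as isize - before as isize)
}

fn main() {
    // Stand-in workload; in the PR this would be the binary view decode call.
    let (buf, delta) = measure(|| Vec::<u8>::with_capacity(65536));
    println!("allocated delta: {delta} B (capacity {})", buf.capacity());
}
```

Because the result of `f` is returned to the caller, its buffers are still alive when the second reading is taken, which is the same situation as holding on to `view` in the snippet above.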