Re: [PATCH] Add a gzip fastpath for the xmalloc readers, v3
On Sun, Dec 7, 2014, at 01:51, Denys Vlasenko wrote: It misses the plus one zero byte behavior of the non-compressed path (xmalloc_read). I bet some callers depend on it, using str* functions on the returned buffer. Fixed, and committed to git. Please try it. Results are correct and speed is equal to my patch. All good, thanks. - Lauri -- http://www.fastmail.com - Send your email first class ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: [PATCH] Add a gzip fastpath for the xmalloc readers, v3
Please see attached patch (only compile-tested so far). It builds on your work, implementing in-memory compression for all unpackers we have. Unfortunately, bloatcheck is not inspiring: function old new delta transformer_write - 133+133 setup_transformer_on_fd- 133+133 complete_cmd_dir_file773 895+122 open_transformer_state - 92 +92 xmalloc_open_zipped_read_close63 112 +49 xtransformer_write - 24 +24 unpack_lzma_stream 27172737 +20 unpack_xz_stream23932407 +14 unpack_Z_stream 11731183 +10 unpack_gz_stream 693 696 +3 decode_one_format726 729 +3 refresh 665 667 +2 inflate_unzip111 113 +2 unpack_bz2_stream359 360 +1 swap_on_off_main 420 418 -2 scan_recursive 380 378 -2 inflate_unzip_internal 19451924 -21 open_zipped 89 47 -42 setup_unzip_on_fd142 53 -89 -- (add/remove: 4/0 grow/shrink: 10/5 up/down: 608/-156) Total: 452 bytes textdata bss dec hex filename 923218 928 17684 941830 e5f06 busybox_old 923686 928 17684 942298 e60da busybox_unstripped ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: [PATCH] Add a gzip fastpath for the xmalloc readers, v3
On Sat, Dec 6, 2014 at 5:11 PM, Denys Vlasenko vda.li...@googlemail.com wrote: Please see attached patch (only compile-tested so far). It builds on your work, implementing in-memory compression for all unpackers we have. 1.patch Description: Binary data ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: [PATCH] Add a gzip fastpath for the xmalloc readers, v3
On Sat, Dec 6, 2014, at 18:11, Denys Vlasenko wrote: Please see attached patch (only compile-tested so far). It builds on your work, implementing in-memory compression for all unpackers we have. It misses the plus one zero byte behavior of the non-compressed path (xmalloc_read). I bet some callers depend on it, using str* functions on the returned buffer. - Lauri -- http://www.fastmail.com - Same, same, but different... ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
Re: [PATCH] Add a gzip fastpath for the xmalloc readers, v2
On Fri, Nov 28, 2014 at 10:44 AM, Lauri Kasanen cur...@operamail.com wrote: v2: Add missing check on open The performance and number of processes for a depmod -a with gzipped modules was abysmal. This patch adds a fast path without fork for well- behaved gzip files, benefiting all users of xmalloc_open_zipped_read_close. modinfo radeon.ko.gz, a single-file reader, got 30% faster. depmod -a, which used to fork over 800 times, got 20% faster. And of course a whole lot less processes - much saved RAM. function old new delta inflate_get_next_window-1877 +1877 xmalloc_unpack_gz - 356+356 check_header_gzip - 298+298 xmalloc_inflate_unzip_internal - 223+223 xmalloc_open_zipped_read_close73 176+103 inflate_init - 97 +97 inflate_store_unused - 35 +35 unpack_gz_stream 567 299-268 inflate_unzip_internal 2304 172 -2132 -- (add/remove: 6/0 grow/shrink: 1/2 up/down: 2989/-2400)Total: 589 bytes This feels somewhat big. Looking at the code, you have significant code duplication. I think a following approach can work: * Extend transformer_aux_data_t so that it can specify I want decompressed data to go into a mem buffer. Say, size_t aux-mem_output_size. If 0, it's the maximum amount of bytes you allow to decompress into it (0 means decompress into dst_fd). The result is char *aux-mem_output_buf. * Pass aux pointer into inflate_unzip_internal(). * There, modify a decompression loop so that it stores result in aux-mem_output_buf if aux-mem_output_size 0 (xreallocing it so that it grows with decompression) * In xmalloc_open_zipped_read, you only need to construct a suitable aux, call unpack_gz_stream(), and get aux-mem_output_buf as your result buffer. This way, you don't need xmalloc_unpack_gz() and xmalloc_inflate_unzip_internal(). The special-casing of GZ in xmalloc_open_zipped_read looks ugly, but demanding a generic mechanism for in-memory unpacking is too much for one patch... ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox
[PATCH] Add a gzip fastpath for the xmalloc readers, v3
v3: Use the aux struct as requested by Denys v2: Add missing check on open The performance and number of processes for a depmod -a with gzipped modules was abysmal. This patch adds a fast path without fork for well- behaved gzip files, benefiting all users of xmalloc_open_zipped_read_close. modinfo radeon.ko.gz, a single-file reader, got 30% faster. depmod -a, which used to fork over 800 times, got 20% faster. And of course a whole lot less processes - much saved RAM. function old new delta inflate_unzip_internal 23042521+217 xmalloc_open_zipped_read_close73 201+128 unpack_gz_stream 567 570 +3 -- (add/remove: 0/0 grow/shrink: 3/0 up/down: 348/0) Total: 348 bytes -- http://www.fastmail.com - Does exactly what it says on the tin From 5ad6804ed4485eae176da45524ea848a00b11929 Mon Sep 17 00:00:00 2001 From: Lauri Kasanen cur...@operamail.com Date: Sun, 30 Nov 2014 21:37:10 +0200 Subject: [PATCH] Add a gzip fastpath for the xmalloc readers, v3 v3: Use the aux struct as requested by Denys v2: Add missing check on open The performance and number of processes for a depmod -a with gzipped modules was abysmal. This patch adds a fast path without fork for well- behaved gzip files, benefiting all users of xmalloc_open_zipped_read_close. modinfo radeon.ko.gz, a single-file reader, got 30% faster. depmod -a, which used to fork over 800 times, got 20% faster. And of course a whole lot less processes - much saved RAM. function old new delta inflate_unzip_internal 23042521+217 xmalloc_open_zipped_read_close73 201+128 unpack_gz_stream 567 570 +3 -- (add/remove: 0/0 grow/shrink: 3/0 up/down: 348/0) Total: 348 bytes Signed-off-by: Lauri Kasanen cur...@operamail.com --- archival/libarchive/decompress_gunzip.c | 38 - archival/libarchive/open_transformer.c | 31 ++- include/bb_archive.h| 2 ++ 3 files changed, 65 insertions(+), 6 deletions(-) diff --git a/archival/libarchive/decompress_gunzip.c b/archival/libarchive/decompress_gunzip.c index 7c6f38e..938d21f 100644 --- a/archival/libarchive/decompress_gunzip.c +++ b/archival/libarchive/decompress_gunzip.c @@ -971,10 +971,11 @@ static int inflate_get_next_window(STATE_PARAM_ONLY) /* Called from unpack_gz_stream() and inflate_unzip() */ static IF_DESKTOP(long long) int -inflate_unzip_internal(STATE_PARAM int in, int out) +inflate_unzip_internal(STATE_PARAM transformer_aux_data_t *aux, int in, int out) { IF_DESKTOP(long long) int n = 0; ssize_t nwrote; + size_t bufsize = 0; /* Allocate all global buffers (for DYN_ALLOC option) */ gunzip_window = xmalloc(GUNZIP_WSIZE); @@ -1002,16 +1003,43 @@ inflate_unzip_internal(STATE_PARAM int in, int out) while (1) { int r = inflate_get_next_window(PASS_STATE_ONLY); - nwrote = full_write(out, gunzip_window, gunzip_outbuf_count); + + if (aux-mem_output_size) { + nwrote = gunzip_outbuf_count; + if (gunzip_outbuf_count + n bufsize) { + // increase by four blocks each time + const size_t newsize = bufsize + 4 * gunzip_outbuf_count + 1; + aux-mem_output_buf = xrealloc(aux-mem_output_buf, newsize); + bufsize = newsize; + } + + if (bufsize aux-mem_output_size) { + free(aux-mem_output_buf); + aux-mem_output_buf = NULL; + n = -1; + goto ret; + } + + memcpy(aux-mem_output_buf + n, gunzip_window, gunzip_outbuf_count); + } else { + nwrote = full_write(out, gunzip_window, gunzip_outbuf_count); + } if (nwrote != (ssize_t)gunzip_outbuf_count) { bb_perror_msg(write); n = -1; goto ret; } - IF_DESKTOP(n += nwrote;) + n += nwrote; if (r == 0) break; } + /* Final realloc, plus zero byte */ + if (aux-mem_output_size) { + aux-mem_output_buf = xrealloc(aux-mem_output_buf, n + 1); + aux-mem_output_size = n; + aux-mem_output_buf[n] = '\0
[PATCH] Add a gzip fastpath for the xmalloc readers, v2
v2: Add missing check on open The performance and number of processes for a depmod -a with gzipped modules was abysmal. This patch adds a fast path without fork for well- behaved gzip files, benefiting all users of xmalloc_open_zipped_read_close. modinfo radeon.ko.gz, a single-file reader, got 30% faster. depmod -a, which used to fork over 800 times, got 20% faster. And of course a whole lot less processes - much saved RAM. function old new delta inflate_get_next_window-1877 +1877 xmalloc_unpack_gz - 356+356 check_header_gzip - 298+298 xmalloc_inflate_unzip_internal - 223+223 xmalloc_open_zipped_read_close73 176+103 inflate_init - 97 +97 inflate_store_unused - 35 +35 unpack_gz_stream 567 299-268 inflate_unzip_internal 2304 172 -2132 -- (add/remove: 6/0 grow/shrink: 1/2 up/down: 2989/-2400)Total: 589 bytes -- http://www.fastmail.com - The way an email service should be From e5a58da5d54b8a8e2eef657ac6be60231683a662 Mon Sep 17 00:00:00 2001 From: Lauri Kasanen cur...@operamail.com Date: Thu, 27 Nov 2014 14:48:17 +0200 Subject: [PATCH] Add a gzip fastpath for the xmalloc readers v2: Add missing check on open The performance and number of processes for a depmod -a with gzipped modules was abysmal. This patch adds a fast path without fork for well- behaved gzip files, benefiting all users of xmalloc_open_zipped_read_close. modinfo radeon.ko.gz, a single-file reader, got 30% faster. depmod -a, which used to fork over 800 times, got 20% faster. And of course a whole lot less processes - much saved RAM. function old new delta inflate_get_next_window-1877 +1877 xmalloc_unpack_gz - 356+356 check_header_gzip - 298+298 xmalloc_inflate_unzip_internal - 223+223 xmalloc_open_zipped_read_close73 176+103 inflate_init - 97 +97 inflate_store_unused - 35 +35 unpack_gz_stream 567 299-268 inflate_unzip_internal 2304 172 -2132 -- (add/remove: 6/0 grow/shrink: 1/2 up/down: 2989/-2400)Total: 589 bytes Signed-off-by: Lauri Kasanen cur...@operamail.com --- archival/libarchive/decompress_gunzip.c | 155 archival/libarchive/open_transformer.c | 20 + include/bb_archive.h| 1 + 3 files changed, 160 insertions(+), 16 deletions(-) diff --git a/archival/libarchive/decompress_gunzip.c b/archival/libarchive/decompress_gunzip.c index 7c6f38e..1f68ebd 100644 --- a/archival/libarchive/decompress_gunzip.c +++ b/archival/libarchive/decompress_gunzip.c @@ -968,19 +968,12 @@ static int inflate_get_next_window(STATE_PARAM_ONLY) /* Doesnt get here */ } - -/* Called from unpack_gz_stream() and inflate_unzip() */ -static IF_DESKTOP(long long) int -inflate_unzip_internal(STATE_PARAM int in, int out) +static void inflate_init(STATE_PARAM_ONLY) { - IF_DESKTOP(long long) int n = 0; - ssize_t nwrote; - /* Allocate all global buffers (for DYN_ALLOC option) */ gunzip_window = xmalloc(GUNZIP_WSIZE); gunzip_outbuf_count = 0; gunzip_bytes_out = 0; - gunzip_src_fd = in; /* (re) initialize state */ method = -1; @@ -994,6 +987,31 @@ inflate_unzip_internal(STATE_PARAM int in, int out) gunzip_crc = ~0; error_msg = corrupted data; +} + +static void inflate_store_unused(STATE_PARAM_ONLY) +{ + /* Store unused bytes in a global buffer so calling applets can access it */ + if (gunzip_bk = 8) { + /* Undo too much lookahead. The next read will be byte aligned +* so we can discard unused bits in the last meaningful byte. */ + bytebuffer_offset--; + bytebuffer[bytebuffer_offset] = gunzip_bb 0xff; + gunzip_bb = 8; + gunzip_bk -= 8; + } +} + +/* Called from unpack_gz_stream() and inflate_unzip() */ +static IF_DESKTOP(long long) int +inflate_unzip_internal(STATE_PARAM int in, int out) +{ + IF_DESKTOP(long long) int n = 0; + ssize_t nwrote; + + gunzip_src_fd = in; + inflate_init(PASS_STATE_ONLY
[PATCH] Add a gzip fastpath for the xmalloc readers
Hi, The performance and number of processes for a depmod -a with gzipped modules was abysmal. This patch adds a fast path without fork for well- behaved gzip files, benefiting all users of xmalloc_open_zipped_read_close. modinfo radeon.ko.gz, a single-file reader, got 30% faster. depmod -a, which used to fork over 800 times, got 20% faster. And of course a whole lot less processes - much saved RAM. function old new delta inflate_get_next_window-1877 +1877 xmalloc_unpack_gz - 356+356 check_header_gzip - 298+298 xmalloc_inflate_unzip_internal - 223+223 inflate_init - 97 +97 xmalloc_open_zipped_read_close73 159 +86 inflate_store_unused - 35 +35 unpack_gz_stream 567 299-268 inflate_unzip_internal 2304 172 -2132 -- (add/remove: 6/0 grow/shrink: 1/2 up/down: 2972/-2400)Total: 572 bytes -- It's currently guarded by CONFIG_DESKTOP. If you'd like a new config option instead, please say so. - Lauri -- http://www.fastmail.com - Choose from over 50 domains or use your own From 4eaee06a748ce07b004b40aa10fa1bcc3e5e8928 Mon Sep 17 00:00:00 2001 From: Lauri Kasanen cur...@operamail.com Date: Thu, 27 Nov 2014 14:48:17 +0200 Subject: [PATCH] Add a gzip fastpath for the xmalloc readers The performance and number of processes for a depmod -a with gzipped modules was abysmal. This patch adds a fast path without fork for well- behaved gzip files, benefiting all users of xmalloc_open_zipped_read_close. modinfo radeon.ko.gz, a single-file reader, got 30% faster. depmod -a, which used to fork over 800 times, got 20% faster. And of course a whole lot less processes - much saved RAM. function old new delta inflate_get_next_window-1877 +1877 xmalloc_unpack_gz - 356+356 check_header_gzip - 298+298 xmalloc_inflate_unzip_internal - 223+223 inflate_init - 97 +97 xmalloc_open_zipped_read_close73 159 +86 inflate_store_unused - 35 +35 unpack_gz_stream 567 299-268 inflate_unzip_internal 2304 172 -2132 -- (add/remove: 6/0 grow/shrink: 1/2 up/down: 2972/-2400)Total: 572 bytes Signed-off-by: Lauri Kasanen cur...@operamail.com --- archival/libarchive/decompress_gunzip.c | 155 archival/libarchive/open_transformer.c | 16 include/bb_archive.h| 1 + 3 files changed, 156 insertions(+), 16 deletions(-) diff --git a/archival/libarchive/decompress_gunzip.c b/archival/libarchive/decompress_gunzip.c index 7c6f38e..1f68ebd 100644 --- a/archival/libarchive/decompress_gunzip.c +++ b/archival/libarchive/decompress_gunzip.c @@ -968,19 +968,12 @@ static int inflate_get_next_window(STATE_PARAM_ONLY) /* Doesnt get here */ } - -/* Called from unpack_gz_stream() and inflate_unzip() */ -static IF_DESKTOP(long long) int -inflate_unzip_internal(STATE_PARAM int in, int out) +static void inflate_init(STATE_PARAM_ONLY) { - IF_DESKTOP(long long) int n = 0; - ssize_t nwrote; - /* Allocate all global buffers (for DYN_ALLOC option) */ gunzip_window = xmalloc(GUNZIP_WSIZE); gunzip_outbuf_count = 0; gunzip_bytes_out = 0; - gunzip_src_fd = in; /* (re) initialize state */ method = -1; @@ -994,6 +987,31 @@ inflate_unzip_internal(STATE_PARAM int in, int out) gunzip_crc = ~0; error_msg = corrupted data; +} + +static void inflate_store_unused(STATE_PARAM_ONLY) +{ + /* Store unused bytes in a global buffer so calling applets can access it */ + if (gunzip_bk = 8) { + /* Undo too much lookahead. The next read will be byte aligned +* so we can discard unused bits in the last meaningful byte. */ + bytebuffer_offset--; + bytebuffer[bytebuffer_offset] = gunzip_bb 0xff; + gunzip_bb = 8; + gunzip_bk -= 8; + } +} + +/* Called from unpack_gz_stream() and inflate_unzip() */ +static IF_DESKTOP(long long) int +inflate_unzip_internal(STATE_PARAM int in, int out) +{ + IF_DESKTOP(long long) int n = 0; + ssize_t nwrote
Re: [PATCH] Add a gzip fastpath for the xmalloc readers
--- a/archival/libarchive/open_transformer.c +++ b/archival/libarchive/open_transformer.c @@ -213,6 +213,22 @@ void* FAST_FUNC xmalloc_open_zipped_read_close(const char *fname, size_t *maxsz_ int fd; char *image; + /* Fast path for well-behaved gzip files, avoiding forks. */ + if (ENABLE_FEATURE_SEAMLESS_GZ ENABLE_DESKTOP BB_MMU) { + uint16_t magic; + fd = open(fname, O_RDONLY); + xread(fd, magic, 2); + are you sure that open will succeed at this place ? re. wh Am 27.11.2014 15:27, schrieb Lauri Kasanen: Hi, The performance and number of processes for a depmod -a with gzipped modules was abysmal. This patch adds a fast path without fork for well- behaved gzip files, benefiting all users of xmalloc_open_zipped_read_close. modinfo radeon.ko.gz, a single-file reader, got 30% faster. depmod -a, which used to fork over 800 times, got 20% faster. And of course a whole lot less processes - much saved RAM. function old new delta inflate_get_next_window-1877 +1877 xmalloc_unpack_gz - 356+356 check_header_gzip - 298+298 xmalloc_inflate_unzip_internal - 223+223 inflate_init - 97 +97 xmalloc_open_zipped_read_close73 159 +86 inflate_store_unused - 35 +35 unpack_gz_stream 567 299-268 inflate_unzip_internal 2304 172 -2132 -- (add/remove: 6/0 grow/shrink: 1/2 up/down: 2972/-2400)Total: 572 bytes -- It's currently guarded by CONFIG_DESKTOP. If you'd like a new config option instead, please say so. - Lauri ___ busybox mailing list busybox@busybox.net http://lists.busybox.net/mailman/listinfo/busybox