Re: [PATCH v2 08/17] virstring: Introduce virStringIsNull()
On Thu, Aug 20, 2020 at 14:36:15 +0200, Michal Privoznik wrote: > On 8/20/20 1:02 PM, Peter Krempa wrote: > > On Tue, Jul 07, 2020 at 21:46:26 +0200, Michal Privoznik wrote: > > > This function will be used to detect zero buffers (which are > > > going to be interpreted as hole in virStream later). > > > > > > I shamelessly took inspiration from coreutils. > > > > Coreutils is proudly GPLv3 ... > > > > Sure. But it was discussed in v1 and I think we agreed that the algorithm is > generic enough that it can be used in Libvirt too: > > https://www.redhat.com/archives/libvir-list/2020-July/msg00170.html > > > > Signed-off-by: Michal Privoznik > > > --- > > > src/libvirt_private.syms | 1 + > > > src/util/virstring.c | 38 > > > src/util/virstring.h | 2 ++ > > > tests/virstringtest.c| 47 > > > 4 files changed, 88 insertions(+) > > > > [...] > > > > > diff --git a/src/util/virstring.c b/src/util/virstring.c > > > index e9e792f3bf..c26bc770d4 100644 > > > --- a/src/util/virstring.c > > > +++ b/src/util/virstring.c > > > @@ -1404,3 +1404,41 @@ int virStringParseYesNo(const char *str, bool > > > *result) > > > return 0; > > > } > > > + > > > + > > > +/** > > > + * virStringIsNull: > > > > IsNull might indicate that this does a check if the pointer is NULL. You > > are checking for NUL bytes. > > Fair enough. I'm out of ideas though. Do you have a suggestion? > > > > > > + * @buf: buffer to check > > > + * @len: the length of the buffer > > > + * > > > + * For given buffer @buf and its size @len determine whether > > > + * it contains only zero bytes (NUL) or not. > > > > Given the semantics of C strings being terminated by the NUL byte I > > don't think this function qualifies as a string helper and thus should > > probably reside somewhere outside of virstring.h > > Alright. I will try to find better location. But since I want to use this > function from virsh too I'd like to have it in utils. > > > > > > + * > > > + * Returns: true if buffer is full of zero bytes, > > > + * false otherwise. > > > + */ > > > +bool virStringIsNull(const char *buf, size_t len) > > > +{ > > > +const char *p = buf; > > > + > > > +if (!len) > > > +return true; > > > + > > > +/* Check up to 16 first bytes. */ > > > +for (;;) { > > > +if (*p) > > > +return false; > > > + > > > +p++; > > > +len--; > > > + > > > +if (!len) > > > +return true; > > > + > > > +if ((len & 0xf) == 0) > > > +break; > > > +} > > > > Do we really need to do this optimization? We could arguably simplify > > this to: > > > > if (*buf != '\0') > > return false; > > > > return memcmp(buf, buf + 1, len - 1); > > > > You can then use the saved lines to explain that comparing a piece of > > memory with itself shifted by any position just ensures that there are > > repeating sequences of itself in the remainder and by shifting it by 1 > > it means that it checks that the strings are just the same byte. The > > check above then ensuers that the one byte is NUL. > > > > The idea is to pass aligned address to memcmp(). But I guess we can let > memcmp() deal with that. Well, this explanation might justify the algorithm above, but it's certainly not obvious, so please add a comment.
Re: [PATCH v2 08/17] virstring: Introduce virStringIsNull()
On 8/20/20 1:02 PM, Peter Krempa wrote: On Tue, Jul 07, 2020 at 21:46:26 +0200, Michal Privoznik wrote: This function will be used to detect zero buffers (which are going to be interpreted as hole in virStream later). I shamelessly took inspiration from coreutils. Coreutils is proudly GPLv3 ... Sure. But it was discussed in v1 and I think we agreed that the algorithm is generic enough that it can be used in Libvirt too: https://www.redhat.com/archives/libvir-list/2020-July/msg00170.html Signed-off-by: Michal Privoznik --- src/libvirt_private.syms | 1 + src/util/virstring.c | 38 src/util/virstring.h | 2 ++ tests/virstringtest.c| 47 4 files changed, 88 insertions(+) [...] diff --git a/src/util/virstring.c b/src/util/virstring.c index e9e792f3bf..c26bc770d4 100644 --- a/src/util/virstring.c +++ b/src/util/virstring.c @@ -1404,3 +1404,41 @@ int virStringParseYesNo(const char *str, bool *result) return 0; } + + +/** + * virStringIsNull: IsNull might indicate that this does a check if the pointer is NULL. You are checking for NUL bytes. Fair enough. I'm out of ideas though. Do you have a suggestion? + * @buf: buffer to check + * @len: the length of the buffer + * + * For given buffer @buf and its size @len determine whether + * it contains only zero bytes (NUL) or not. Given the semantics of C strings being terminated by the NUL byte I don't think this function qualifies as a string helper and thus should probably reside somewhere outside of virstring.h Alright. I will try to find better location. But since I want to use this function from virsh too I'd like to have it in utils. + * + * Returns: true if buffer is full of zero bytes, + * false otherwise. + */ +bool virStringIsNull(const char *buf, size_t len) +{ +const char *p = buf; + +if (!len) +return true; + +/* Check up to 16 first bytes. */ +for (;;) { +if (*p) +return false; + +p++; +len--; + +if (!len) +return true; + +if ((len & 0xf) == 0) +break; +} Do we really need to do this optimization? We could arguably simplify this to: if (*buf != '\0') return false; return memcmp(buf, buf + 1, len - 1); You can then use the saved lines to explain that comparing a piece of memory with itself shifted by any position just ensures that there are repeating sequences of itself in the remainder and by shifting it by 1 it means that it checks that the strings are just the same byte. The check above then ensuers that the one byte is NUL. The idea is to pass aligned address to memcmp(). But I guess we can let memcmp() deal with that. Michal
Re: [PATCH v2 08/17] virstring: Introduce virStringIsNull()
On Thu, Aug 20, 2020 at 13:17:42 +0100, Daniel Berrange wrote: > On Thu, Aug 20, 2020 at 01:02:55PM +0200, Peter Krempa wrote: > > On Tue, Jul 07, 2020 at 21:46:26 +0200, Michal Privoznik wrote: > > > This function will be used to detect zero buffers (which are > > > going to be interpreted as hole in virStream later). > > > > > > I shamelessly took inspiration from coreutils. > > > > Coreutils is proudly GPLv3 ... > > > > > Signed-off-by: Michal Privoznik > > > --- [...] > > Do we really need to do this optimization? We could arguably simplify > > this to: > > > > if (*buf != '\0') > > return false; > > > > return memcmp(buf, buf + 1, len - 1); > > Depends whether we care about this sparsification having goood > performance or not. As a point of reference, QEMU has invested > tonnes of effort in its impl, using highly tuned impls for > SSE, AVX, etc switched at runtime. Well, comments to other patches in this series question whether we actually want to do explicit sparsification at all. Users always can explicitly sparsify their storage using the qemu tools and the implementation this series adds (the sparsification on read) is IMO actually stretching the semantics of the _SPARSE_STREAM flag too much and creating different behaviour depending on whether a block device is used (full sparsification) or a file is used (sparseness of the file is preserved, explicitly allocated but zeroed blocks are transported in full).
Re: [PATCH v2 08/17] virstring: Introduce virStringIsNull()
On Thu, Aug 20, 2020 at 01:02:55PM +0200, Peter Krempa wrote: > On Tue, Jul 07, 2020 at 21:46:26 +0200, Michal Privoznik wrote: > > This function will be used to detect zero buffers (which are > > going to be interpreted as hole in virStream later). > > > > I shamelessly took inspiration from coreutils. > > Coreutils is proudly GPLv3 ... > > > Signed-off-by: Michal Privoznik > > --- > > src/libvirt_private.syms | 1 + > > src/util/virstring.c | 38 > > src/util/virstring.h | 2 ++ > > tests/virstringtest.c| 47 > > 4 files changed, 88 insertions(+) > > [...] > > > diff --git a/src/util/virstring.c b/src/util/virstring.c > > index e9e792f3bf..c26bc770d4 100644 > > --- a/src/util/virstring.c > > +++ b/src/util/virstring.c > > @@ -1404,3 +1404,41 @@ int virStringParseYesNo(const char *str, bool > > *result) > > > > return 0; > > } > > + > > + > > +/** > > + * virStringIsNull: > > IsNull might indicate that this does a check if the pointer is NULL. You > are checking for NUL bytes. > > > + * @buf: buffer to check > > + * @len: the length of the buffer > > + * > > + * For given buffer @buf and its size @len determine whether > > + * it contains only zero bytes (NUL) or not. > > Given the semantics of C strings being terminated by the NUL byte I > don't think this function qualifies as a string helper and thus should > probably reside somewhere outside of virstring.h > > > + * > > + * Returns: true if buffer is full of zero bytes, > > + * false otherwise. > > + */ > > +bool virStringIsNull(const char *buf, size_t len) > > +{ > > +const char *p = buf; > > + > > +if (!len) > > +return true; > > + > > +/* Check up to 16 first bytes. */ > > +for (;;) { > > +if (*p) > > +return false; > > + > > +p++; > > +len--; > > + > > +if (!len) > > +return true; > > + > > +if ((len & 0xf) == 0) > > +break; > > +} > > Do we really need to do this optimization? We could arguably simplify > this to: > > if (*buf != '\0') > return false; > > return memcmp(buf, buf + 1, len - 1); Depends whether we care about this sparsification having goood performance or not. As a point of reference, QEMU has invested tonnes of effort in its impl, using highly tuned impls for SSE, AVX, etc switched at runtime. Regards, Daniel -- |: https://berrange.com -o-https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o-https://fstop138.berrange.com :| |: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|
Re: [PATCH v2 08/17] virstring: Introduce virStringIsNull()
On Tue, Jul 07, 2020 at 21:46:26 +0200, Michal Privoznik wrote: > This function will be used to detect zero buffers (which are > going to be interpreted as hole in virStream later). > > I shamelessly took inspiration from coreutils. Coreutils is proudly GPLv3 ... > Signed-off-by: Michal Privoznik > --- > src/libvirt_private.syms | 1 + > src/util/virstring.c | 38 > src/util/virstring.h | 2 ++ > tests/virstringtest.c| 47 > 4 files changed, 88 insertions(+) [...] > diff --git a/src/util/virstring.c b/src/util/virstring.c > index e9e792f3bf..c26bc770d4 100644 > --- a/src/util/virstring.c > +++ b/src/util/virstring.c > @@ -1404,3 +1404,41 @@ int virStringParseYesNo(const char *str, bool *result) > > return 0; > } > + > + > +/** > + * virStringIsNull: IsNull might indicate that this does a check if the pointer is NULL. You are checking for NUL bytes. > + * @buf: buffer to check > + * @len: the length of the buffer > + * > + * For given buffer @buf and its size @len determine whether > + * it contains only zero bytes (NUL) or not. Given the semantics of C strings being terminated by the NUL byte I don't think this function qualifies as a string helper and thus should probably reside somewhere outside of virstring.h > + * > + * Returns: true if buffer is full of zero bytes, > + * false otherwise. > + */ > +bool virStringIsNull(const char *buf, size_t len) > +{ > +const char *p = buf; > + > +if (!len) > +return true; > + > +/* Check up to 16 first bytes. */ > +for (;;) { > +if (*p) > +return false; > + > +p++; > +len--; > + > +if (!len) > +return true; > + > +if ((len & 0xf) == 0) > +break; > +} Do we really need to do this optimization? We could arguably simplify this to: if (*buf != '\0') return false; return memcmp(buf, buf + 1, len - 1); You can then use the saved lines to explain that comparing a piece of memory with itself shifted by any position just ensures that there are repeating sequences of itself in the remainder and by shifting it by 1 it means that it checks that the strings are just the same byte. The check above then ensuers that the one byte is NUL.
[PATCH v2 08/17] virstring: Introduce virStringIsNull()
This function will be used to detect zero buffers (which are going to be interpreted as hole in virStream later). I shamelessly took inspiration from coreutils. Signed-off-by: Michal Privoznik --- src/libvirt_private.syms | 1 + src/util/virstring.c | 38 src/util/virstring.h | 2 ++ tests/virstringtest.c| 47 4 files changed, 88 insertions(+) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index ae0e253ab9..1d80aeb833 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -3197,6 +3197,7 @@ virStringHasChars; virStringHasControlChars; virStringHasSuffix; virStringIsEmpty; +virStringIsNull; virStringIsPrintable; virStringListAdd; virStringListAutoFree; diff --git a/src/util/virstring.c b/src/util/virstring.c index e9e792f3bf..c26bc770d4 100644 --- a/src/util/virstring.c +++ b/src/util/virstring.c @@ -1404,3 +1404,41 @@ int virStringParseYesNo(const char *str, bool *result) return 0; } + + +/** + * virStringIsNull: + * @buf: buffer to check + * @len: the length of the buffer + * + * For given buffer @buf and its size @len determine whether + * it contains only zero bytes (NUL) or not. + * + * Returns: true if buffer is full of zero bytes, + * false otherwise. + */ +bool virStringIsNull(const char *buf, size_t len) +{ +const char *p = buf; + +if (!len) +return true; + +/* Check up to 16 first bytes. */ +for (;;) { +if (*p) +return false; + +p++; +len--; + +if (!len) +return true; + +if ((len & 0xf) == 0) +break; +} + +/* Now we know first 16 bytes are NUL, memcmp with self. */ +return memcmp(buf, p, len) == 0; +} diff --git a/src/util/virstring.h b/src/util/virstring.h index 360c68395c..d0e54358ac 100644 --- a/src/util/virstring.h +++ b/src/util/virstring.h @@ -185,6 +185,8 @@ int virStringParsePort(const char *str, int virStringParseYesNo(const char *str, bool *result) G_GNUC_WARN_UNUSED_RESULT; +bool virStringIsNull(const char *buf, size_t len); + /** * VIR_AUTOSTRINGLIST: * diff --git a/tests/virstringtest.c b/tests/virstringtest.c index bee49e6cb6..5838a57574 100644 --- a/tests/virstringtest.c +++ b/tests/virstringtest.c @@ -649,6 +649,27 @@ static int testFilterChars(const void *args) return ret; } +struct testIsNulData { +const char *buf; +size_t len; +bool nul; +}; + +static int +testIsNul(const void *args) +{ +const struct testIsNulData *data = args; +int rc; + +rc = virStringIsNull(data->buf, data->len); +if (rc != data->nul) { +fprintf(stderr, "Returned %d, expected %d\n", rc, data->nul); +return -1; +} + +return 0; +} + static int mymain(void) { @@ -977,6 +998,32 @@ mymain(void) TEST_FILTER_CHARS(NULL, NULL, NULL); TEST_FILTER_CHARS("hello 123 hello", "helo", "hellohello"); +#define TEST_IS_NUL(expect, ...) \ +do { \ +const char buf[] = {__VA_ARGS__ }; \ +struct testIsNulData isNulData = { \ +.buf = buf, \ +.len = G_N_ELEMENTS(buf), \ +.nul = expect \ +}; \ +if (virTestRun("isNul check", \ + testIsNul, ) < 0) \ +ret = -1; \ +} while (0) + +TEST_IS_NUL(true); +TEST_IS_NUL(true, +0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, +0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00); +TEST_IS_NUL(false, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01); +TEST_IS_NUL(false, +0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, +0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01); +TEST_IS_NUL(false, +0x00, 0x01, 0x00, 0x01, 0x00, 0x01, 0x00, 0x01, +0x00, 0x01, 0x00, 0x01, 0x00, 0x01, 0x00, 0x01, +0x00); + return ret == 0 ? EXIT_SUCCESS : EXIT_FAILURE; } -- 2.26.2