Re: [PATCH v2 08/17] virstring: Introduce virStringIsNull()

2020-08-20 Thread Peter Krempa
On Thu, Aug 20, 2020 at 14:36:15 +0200, Michal Privoznik wrote:
> On 8/20/20 1:02 PM, Peter Krempa wrote:
> > On Tue, Jul 07, 2020 at 21:46:26 +0200, Michal Privoznik wrote:
> > > This function will be used to detect zero buffers (which are
> > > going to be interpreted as hole in virStream later).
> > > 
> > > I shamelessly took inspiration from coreutils.
> > 
> > Coreutils is proudly GPLv3 ...
> > 
> 
> Sure. But it was discussed in v1 and I think we agreed that the algorithm is
> generic enough that it can be used in Libvirt too:
> 
> https://www.redhat.com/archives/libvir-list/2020-July/msg00170.html
> 
> > > Signed-off-by: Michal Privoznik 
> > > ---
> > >   src/libvirt_private.syms |  1 +
> > >   src/util/virstring.c | 38 
> > >   src/util/virstring.h |  2 ++
> > >   tests/virstringtest.c| 47 
> > >   4 files changed, 88 insertions(+)
> > 
> > [...]
> > 
> > > diff --git a/src/util/virstring.c b/src/util/virstring.c
> > > index e9e792f3bf..c26bc770d4 100644
> > > --- a/src/util/virstring.c
> > > +++ b/src/util/virstring.c
> > > @@ -1404,3 +1404,41 @@ int virStringParseYesNo(const char *str, bool 
> > > *result)
> > >   return 0;
> > >   }
> > > +
> > > +
> > > +/**
> > > + * virStringIsNull:
> > 
> > IsNull might indicate that this does a check if the pointer is NULL. You
> > are checking for NUL bytes.
> 
> Fair enough. I'm out of ideas though. Do you have a suggestion?
> 
> > 
> > > + * @buf: buffer to check
> > > + * @len: the length of the buffer
> > > + *
> > > + * For given buffer @buf and its size @len determine whether
> > > + * it contains only zero bytes (NUL) or not.
> > 
> > Given the semantics of C strings being terminated by the NUL byte I
> > don't think this function qualifies as a string helper and thus should
> > probably reside somewhere outside of virstring.h
> 
> Alright. I will try to find better location. But since I want to use this
> function from virsh too I'd like to have it in utils.
> 
> > 
> > > + *
> > > + * Returns: true if buffer is full of zero bytes,
> > > + *  false otherwise.
> > > + */
> > > +bool virStringIsNull(const char *buf, size_t len)
> > > +{
> > > +const char *p = buf;
> > > +
> > > +if (!len)
> > > +return true;
> > > +
> > > +/* Check up to 16 first bytes. */
> > > +for (;;) {
> > > +if (*p)
> > > +return false;
> > > +
> > > +p++;
> > > +len--;
> > > +
> > > +if (!len)
> > > +return true;
> > > +
> > > +if ((len & 0xf) == 0)
> > > +break;
> > > +}
> > 
> > Do we really need to do this optimization? We could arguably simplify
> > this to:
> > 
> >  if (*buf != '\0')
> >  return false;
> > 
> >  return memcmp(buf, buf + 1, len - 1);
> > 
> > You can then use the saved lines to explain that comparing a piece of
> > memory with itself shifted by any position just ensures that there are
> > repeating sequences of itself in the remainder and by shifting it by 1
> > it means that it checks that the strings are just the same byte. The
> > check above then ensuers that the one byte is NUL.
> > 
> 
> The idea is to pass aligned address to memcmp(). But I guess we can let
> memcmp() deal with that.

Well, this explanation might justify the algorithm above, but it's
certainly not obvious, so please add a comment.



Re: [PATCH v2 08/17] virstring: Introduce virStringIsNull()

2020-08-20 Thread Michal Privoznik

On 8/20/20 1:02 PM, Peter Krempa wrote:

On Tue, Jul 07, 2020 at 21:46:26 +0200, Michal Privoznik wrote:

This function will be used to detect zero buffers (which are
going to be interpreted as hole in virStream later).

I shamelessly took inspiration from coreutils.


Coreutils is proudly GPLv3 ...



Sure. But it was discussed in v1 and I think we agreed that the 
algorithm is generic enough that it can be used in Libvirt too:


https://www.redhat.com/archives/libvir-list/2020-July/msg00170.html


Signed-off-by: Michal Privoznik 
---
  src/libvirt_private.syms |  1 +
  src/util/virstring.c | 38 
  src/util/virstring.h |  2 ++
  tests/virstringtest.c| 47 
  4 files changed, 88 insertions(+)


[...]


diff --git a/src/util/virstring.c b/src/util/virstring.c
index e9e792f3bf..c26bc770d4 100644
--- a/src/util/virstring.c
+++ b/src/util/virstring.c
@@ -1404,3 +1404,41 @@ int virStringParseYesNo(const char *str, bool *result)
  
  return 0;

  }
+
+
+/**
+ * virStringIsNull:


IsNull might indicate that this does a check if the pointer is NULL. You
are checking for NUL bytes.


Fair enough. I'm out of ideas though. Do you have a suggestion?




+ * @buf: buffer to check
+ * @len: the length of the buffer
+ *
+ * For given buffer @buf and its size @len determine whether
+ * it contains only zero bytes (NUL) or not.


Given the semantics of C strings being terminated by the NUL byte I
don't think this function qualifies as a string helper and thus should
probably reside somewhere outside of virstring.h


Alright. I will try to find better location. But since I want to use 
this function from virsh too I'd like to have it in utils.





+ *
+ * Returns: true if buffer is full of zero bytes,
+ *  false otherwise.
+ */
+bool virStringIsNull(const char *buf, size_t len)
+{
+const char *p = buf;
+
+if (!len)
+return true;
+
+/* Check up to 16 first bytes. */
+for (;;) {
+if (*p)
+return false;
+
+p++;
+len--;
+
+if (!len)
+return true;
+
+if ((len & 0xf) == 0)
+break;
+}


Do we really need to do this optimization? We could arguably simplify
this to:

 if (*buf != '\0')
 return false;

 return memcmp(buf, buf + 1, len - 1);

You can then use the saved lines to explain that comparing a piece of
memory with itself shifted by any position just ensures that there are
repeating sequences of itself in the remainder and by shifting it by 1
it means that it checks that the strings are just the same byte. The
check above then ensuers that the one byte is NUL.



The idea is to pass aligned address to memcmp(). But I guess we can let 
memcmp() deal with that.


Michal



Re: [PATCH v2 08/17] virstring: Introduce virStringIsNull()

2020-08-20 Thread Peter Krempa
On Thu, Aug 20, 2020 at 13:17:42 +0100, Daniel Berrange wrote:
> On Thu, Aug 20, 2020 at 01:02:55PM +0200, Peter Krempa wrote:
> > On Tue, Jul 07, 2020 at 21:46:26 +0200, Michal Privoznik wrote:
> > > This function will be used to detect zero buffers (which are
> > > going to be interpreted as hole in virStream later).
> > > 
> > > I shamelessly took inspiration from coreutils.
> > 
> > Coreutils is proudly GPLv3 ...
> > 
> > > Signed-off-by: Michal Privoznik 
> > > ---

[...]

> > Do we really need to do this optimization? We could arguably simplify
> > this to:
> > 
> > if (*buf != '\0')
> > return false;
> > 
> > return memcmp(buf, buf + 1, len - 1);
> 
> Depends whether we care about this sparsification having goood
> performance or not. As a point of reference, QEMU has invested
> tonnes of effort in its impl, using highly tuned impls for
> SSE, AVX, etc switched at runtime.

Well, comments to other patches in this series question whether we
actually want to do explicit sparsification at all.

Users always can explicitly sparsify their storage using the qemu tools
and the implementation this series adds (the sparsification on read) is
IMO actually stretching the semantics of the _SPARSE_STREAM flag too
much and creating different behaviour depending on whether a block
device is used (full sparsification) or a file is used (sparseness of
the file is preserved, explicitly allocated but zeroed blocks are
transported in full).



Re: [PATCH v2 08/17] virstring: Introduce virStringIsNull()

2020-08-20 Thread Daniel P . Berrangé
On Thu, Aug 20, 2020 at 01:02:55PM +0200, Peter Krempa wrote:
> On Tue, Jul 07, 2020 at 21:46:26 +0200, Michal Privoznik wrote:
> > This function will be used to detect zero buffers (which are
> > going to be interpreted as hole in virStream later).
> > 
> > I shamelessly took inspiration from coreutils.
> 
> Coreutils is proudly GPLv3 ...
> 
> > Signed-off-by: Michal Privoznik 
> > ---
> >  src/libvirt_private.syms |  1 +
> >  src/util/virstring.c | 38 
> >  src/util/virstring.h |  2 ++
> >  tests/virstringtest.c| 47 
> >  4 files changed, 88 insertions(+)
> 
> [...]
> 
> > diff --git a/src/util/virstring.c b/src/util/virstring.c
> > index e9e792f3bf..c26bc770d4 100644
> > --- a/src/util/virstring.c
> > +++ b/src/util/virstring.c
> > @@ -1404,3 +1404,41 @@ int virStringParseYesNo(const char *str, bool 
> > *result)
> >  
> >  return 0;
> >  }
> > +
> > +
> > +/**
> > + * virStringIsNull:
> 
> IsNull might indicate that this does a check if the pointer is NULL. You
> are checking for NUL bytes.
> 
> > + * @buf: buffer to check
> > + * @len: the length of the buffer
> > + *
> > + * For given buffer @buf and its size @len determine whether
> > + * it contains only zero bytes (NUL) or not.
> 
> Given the semantics of C strings being terminated by the NUL byte I
> don't think this function qualifies as a string helper and thus should
> probably reside somewhere outside of virstring.h
> 
> > + *
> > + * Returns: true if buffer is full of zero bytes,
> > + *  false otherwise.
> > + */
> > +bool virStringIsNull(const char *buf, size_t len)
> > +{
> > +const char *p = buf;
> > +
> > +if (!len)
> > +return true;
> > +
> > +/* Check up to 16 first bytes. */
> > +for (;;) {
> > +if (*p)
> > +return false;
> > +
> > +p++;
> > +len--;
> > +
> > +if (!len)
> > +return true;
> > +
> > +if ((len & 0xf) == 0)
> > +break;
> > +}
> 
> Do we really need to do this optimization? We could arguably simplify
> this to:
> 
> if (*buf != '\0')
> return false;
> 
> return memcmp(buf, buf + 1, len - 1);

Depends whether we care about this sparsification having goood
performance or not. As a point of reference, QEMU has invested
tonnes of effort in its impl, using highly tuned impls for
SSE, AVX, etc switched at runtime.



Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|



Re: [PATCH v2 08/17] virstring: Introduce virStringIsNull()

2020-08-20 Thread Peter Krempa
On Tue, Jul 07, 2020 at 21:46:26 +0200, Michal Privoznik wrote:
> This function will be used to detect zero buffers (which are
> going to be interpreted as hole in virStream later).
> 
> I shamelessly took inspiration from coreutils.

Coreutils is proudly GPLv3 ...

> Signed-off-by: Michal Privoznik 
> ---
>  src/libvirt_private.syms |  1 +
>  src/util/virstring.c | 38 
>  src/util/virstring.h |  2 ++
>  tests/virstringtest.c| 47 
>  4 files changed, 88 insertions(+)

[...]

> diff --git a/src/util/virstring.c b/src/util/virstring.c
> index e9e792f3bf..c26bc770d4 100644
> --- a/src/util/virstring.c
> +++ b/src/util/virstring.c
> @@ -1404,3 +1404,41 @@ int virStringParseYesNo(const char *str, bool *result)
>  
>  return 0;
>  }
> +
> +
> +/**
> + * virStringIsNull:

IsNull might indicate that this does a check if the pointer is NULL. You
are checking for NUL bytes.

> + * @buf: buffer to check
> + * @len: the length of the buffer
> + *
> + * For given buffer @buf and its size @len determine whether
> + * it contains only zero bytes (NUL) or not.

Given the semantics of C strings being terminated by the NUL byte I
don't think this function qualifies as a string helper and thus should
probably reside somewhere outside of virstring.h

> + *
> + * Returns: true if buffer is full of zero bytes,
> + *  false otherwise.
> + */
> +bool virStringIsNull(const char *buf, size_t len)
> +{
> +const char *p = buf;
> +
> +if (!len)
> +return true;
> +
> +/* Check up to 16 first bytes. */
> +for (;;) {
> +if (*p)
> +return false;
> +
> +p++;
> +len--;
> +
> +if (!len)
> +return true;
> +
> +if ((len & 0xf) == 0)
> +break;
> +}

Do we really need to do this optimization? We could arguably simplify
this to:

if (*buf != '\0')
return false;

return memcmp(buf, buf + 1, len - 1);

You can then use the saved lines to explain that comparing a piece of
memory with itself shifted by any position just ensures that there are
repeating sequences of itself in the remainder and by shifting it by 1
it means that it checks that the strings are just the same byte. The
check above then ensuers that the one byte is NUL.



[PATCH v2 08/17] virstring: Introduce virStringIsNull()

2020-07-07 Thread Michal Privoznik
This function will be used to detect zero buffers (which are
going to be interpreted as hole in virStream later).

I shamelessly took inspiration from coreutils.

Signed-off-by: Michal Privoznik 
---
 src/libvirt_private.syms |  1 +
 src/util/virstring.c | 38 
 src/util/virstring.h |  2 ++
 tests/virstringtest.c| 47 
 4 files changed, 88 insertions(+)

diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index ae0e253ab9..1d80aeb833 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -3197,6 +3197,7 @@ virStringHasChars;
 virStringHasControlChars;
 virStringHasSuffix;
 virStringIsEmpty;
+virStringIsNull;
 virStringIsPrintable;
 virStringListAdd;
 virStringListAutoFree;
diff --git a/src/util/virstring.c b/src/util/virstring.c
index e9e792f3bf..c26bc770d4 100644
--- a/src/util/virstring.c
+++ b/src/util/virstring.c
@@ -1404,3 +1404,41 @@ int virStringParseYesNo(const char *str, bool *result)
 
 return 0;
 }
+
+
+/**
+ * virStringIsNull:
+ * @buf: buffer to check
+ * @len: the length of the buffer
+ *
+ * For given buffer @buf and its size @len determine whether
+ * it contains only zero bytes (NUL) or not.
+ *
+ * Returns: true if buffer is full of zero bytes,
+ *  false otherwise.
+ */
+bool virStringIsNull(const char *buf, size_t len)
+{
+const char *p = buf;
+
+if (!len)
+return true;
+
+/* Check up to 16 first bytes. */
+for (;;) {
+if (*p)
+return false;
+
+p++;
+len--;
+
+if (!len)
+return true;
+
+if ((len & 0xf) == 0)
+break;
+}
+
+/* Now we know first 16 bytes are NUL, memcmp with self.  */
+return memcmp(buf, p, len) == 0;
+}
diff --git a/src/util/virstring.h b/src/util/virstring.h
index 360c68395c..d0e54358ac 100644
--- a/src/util/virstring.h
+++ b/src/util/virstring.h
@@ -185,6 +185,8 @@ int virStringParsePort(const char *str,
 int virStringParseYesNo(const char *str,
 bool *result)
 G_GNUC_WARN_UNUSED_RESULT;
+bool virStringIsNull(const char *buf, size_t len);
+
 /**
  * VIR_AUTOSTRINGLIST:
  *
diff --git a/tests/virstringtest.c b/tests/virstringtest.c
index bee49e6cb6..5838a57574 100644
--- a/tests/virstringtest.c
+++ b/tests/virstringtest.c
@@ -649,6 +649,27 @@ static int testFilterChars(const void *args)
 return ret;
 }
 
+struct testIsNulData {
+const char *buf;
+size_t len;
+bool nul;
+};
+
+static int
+testIsNul(const void *args)
+{
+const struct testIsNulData *data = args;
+int rc;
+
+rc = virStringIsNull(data->buf, data->len);
+if (rc != data->nul) {
+fprintf(stderr, "Returned %d, expected %d\n", rc, data->nul);
+return -1;
+}
+
+return 0;
+}
+
 static int
 mymain(void)
 {
@@ -977,6 +998,32 @@ mymain(void)
 TEST_FILTER_CHARS(NULL, NULL, NULL);
 TEST_FILTER_CHARS("hello 123 hello", "helo", "hellohello");
 
+#define TEST_IS_NUL(expect, ...) \
+do { \
+const char buf[] = {__VA_ARGS__ }; \
+struct testIsNulData isNulData = { \
+.buf = buf, \
+.len = G_N_ELEMENTS(buf), \
+.nul = expect \
+}; \
+if (virTestRun("isNul check", \
+   testIsNul, ) < 0) \
+ret = -1; \
+} while (0)
+
+TEST_IS_NUL(true);
+TEST_IS_NUL(true,
+0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00);
+TEST_IS_NUL(false, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01);
+TEST_IS_NUL(false,
+0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01);
+TEST_IS_NUL(false,
+0x00, 0x01, 0x00, 0x01, 0x00, 0x01, 0x00, 0x01,
+0x00, 0x01, 0x00, 0x01, 0x00, 0x01, 0x00, 0x01,
+0x00);
+
 return ret == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
 }
 
-- 
2.26.2