Re: [osv-dev] Re: [COMMIT osv master] libc: implement getentropy()

2022-06-28 Thread Nadav Har'El
On Tue, Jun 28, 2022 at 4:40 PM Waldek Kozaczuk 
wrote:

> Excellent! Now I can upgrade to Fedora 36 as well.
>
> Do unit tests build and pass as well?
>

I have to admit I didn't check - just getting "make" and "build
image=rogue" (my favorite image :-)) to work took much longer than I
thought it would
(and resulted in over a dozen patch), so I didn't get around to making the
test pass.

I did try "build image=test" but got errors in building Java, which I lost
track on how it's built, so I didn't continue to fix it
(and also didn't try to work around the Java and just try to build the C++
tests).
Maybe if you have time you can try - I'm sure it will be easier for you to
fix the Java problems.


>
> On Sunday, June 26, 2022 at 6:28:34 AM UTC-4 Commit Bot wrote:
>
>> From: Nadav Har'El 
>> Committer: Nadav Har'El 
>> Branch: master
>>
>> libc: implement getentropy()
>>
>> Starting in gcc commit
>> https://gcc.gnu.org/g:3439657b02869299685d259c3a77aa38714565b7,
>> libstdc++
>> started to use the getentropy() function, so the OSv kernel cannot be
>> linked without implementing it.
>>
>> It's easy to implement getentropy() by calling getrandom().
>>
>> Musl also has its own getentropy.c but it has a silly compilation error
>> (when len=0, it returns an uninitialized "ret") so let's just write
>> one of our own and later reconsider taking Musl's one.
>>
>> After this patch, OSv can finally build on Fedora 36 with Gcc 12.1.1.
>>
>> Fixes #1198.
>>
>> Signed-off-by: Nadav Har'El 
>>
>> ---
>> diff --git a/include/api/unistd.h b/include/api/unistd.h
>> --- a/include/api/unistd.h
>> +++ b/include/api/unistd.h
>> @@ -179,6 +179,7 @@ char *getusershell(void);
>> int acct(const char *);
>> long syscall(long, ...);
>> long __syscall(long, ...);
>> +int getentropy(void *, size_t);
>> #endif
>>
>> #ifdef _GNU_SOURCE
>> diff --git a/libc/random.c b/libc/random.c
>> --- a/libc/random.c
>> +++ b/libc/random.c
>> @@ -1,5 +1,6 @@
>> /*
>> * Copyright (C) 2018 Waldemar Kozaczuk
>> + * Copyright (C) 2022 Nadav Har'El
>> *
>> * This work is open source software, licensed under the terms of the
>> * BSD license as described in the LICENSE file in the top-level
>> directory.
>> @@ -49,3 +50,14 @@ ssize_t getrandom(void *buf, size_t count, unsigned
>> int flags)
>> close(fd);
>> return read;
>> }
>> +
>> +int getentropy(void *buf, size_t len)
>> +{
>> + if (len > 256) {
>> + errno = EIO;
>> + return -1;
>> + } else if (len == 0) {
>> + return 0;
>> + }
>> + return getrandom(buf, len, 0) >= 0;
>> +}
>>
> --
> You received this message because you are subscribed to the Google Groups
> "OSv Development" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to osv-dev+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/osv-dev/36d2b09a-c689-446e-83c0-1222c6d9bd52n%40googlegroups.com
> 
> .
>

-- 
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to osv-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/osv-dev/CANEVyjuGDdrGhmBJ%3DrMk0ddzssgiX_9eZ5vmJ07bWWTc1TXQ5Q%40mail.gmail.com.


[osv-dev] [PATCH] syscalls: allocate local iovec copies on the stack instead of the heap

2022-06-28 Thread Waldemar Kozaczuk
This commit is mostly based on the one from the Spirent fork of OSv -
https://github.com/SpirentOrion/osv/commit/29e4d2bbc23d6ddbf8d4b065fc3388c9931e705a
 -
and its original description reads:

"Multiple syscalls used std::vector to manage local iovec copies.  For our
uses cases, this is totally unnecessary overhead and results in 1000's of
small object memory allocations.  So, just use the stack instead."

In nutshell this patch slightly optimizes 4 functions - 2 that are part
of the networking stack and 2 others in VFS layer to - by tweaking them
to use stack instead of malloc()/free() which are relatively constly.
Please note that the objects in question - copies of iovec - are pretty
tiny, typically 16 bytes in size so it does make sense to use stack.

Obviously it is hard to tell without measuring how significant this
change is in terms of performance benefits and what use cases it would
benefit. I did however find two usecases where I could observe some
significant decrease of malloc invocations.

The first is actually the cpiod app used to upload ZFS files during
upload file which seems to be hitting the VFS code path in question.
I this case I could see ~25% drop of malloc/free invocations.

The second use case involved a test misc-tcp.cc which sends and receives
data over socket in multiple threads and seems to be hitting the
networking stack paths. In this case I could see 15-6% drop.

The program was called like so:
./scripts/run.py -e '/tests/misc-tcp.so --remote=192.168.122.1 -c 10 -n 10 -l 5'

with netcat running like so:
ncat -l -k -p  -e /bin/cat

This patch also updates the test makefile to make it build misc-tcp.so
after kernel no longer includes program options. It also slightly
updates the test itself to output some helpful information in progress.

Co-authored-by: "Timmons C. Player" 
Co-authored-by: Waldemar Kozaczuk 
Signed-off-by: Waldemar Kozaczuk 
---
 bsd/sys/kern/uipc_syscalls.cc | 26 --
 fs/vfs/vfs_syscalls.cc| 14 ++
 modules/tests/Makefile|  8 ++--
 tests/misc-tcp.cc | 12 +---
 4 files changed, 41 insertions(+), 19 deletions(-)

diff --git a/bsd/sys/kern/uipc_syscalls.cc b/bsd/sys/kern/uipc_syscalls.cc
index 9f8db3f6..1ae4090e 100644
--- a/bsd/sys/kern/uipc_syscalls.cc
+++ b/bsd/sys/kern/uipc_syscalls.cc
@@ -494,10 +494,12 @@ kern_sendit(int s,
so = (struct socket *)file_data(fp);
 
// Create a local copy of the user's iovec - sosend() is going to 
change it!
-   std::vector uio_iov(mp->msg_iov, mp->msg_iov + mp->msg_iovlen);
+   assert(mp->msg_iovlen <= UIO_MAXIOV);
+   struct iovec uio_iov[mp->msg_iovlen];
+   memcpy(uio_iov, mp->msg_iov, sizeof(uio_iov));
 
-   auio.uio_iov = uio_iov.data();
-   auio.uio_iovcnt = uio_iov.size();
+   auio.uio_iov = uio_iov;
+   auio.uio_iovcnt = mp->msg_iovlen;;
auio.uio_rw = UIO_WRITE;
auio.uio_offset = 0;/* XXX */
auio.uio_resid = 0;
@@ -585,10 +587,12 @@ kern_recvit(int s, struct msghdr *mp, struct mbuf 
**controlp, ssize_t* bytes)
so = (socket*)file_data(fp);
 
// Create a local copy of the user's iovec - sorecieve() is going to 
change it!
-   std::vector uio_iov(mp->msg_iov, mp->msg_iov + mp->msg_iovlen);
+   assert(mp->msg_iovlen <= UIO_MAXIOV);
+   struct iovec uio_iov[mp->msg_iovlen];
+   memcpy(uio_iov, mp->msg_iov, sizeof(uio_iov));
 
-   auio.uio_iov = uio_iov.data();
-   auio.uio_iovcnt = uio_iov.size();
+   auio.uio_iov = uio_iov;
+   auio.uio_iovcnt = mp->msg_iovlen;
auio.uio_rw = UIO_READ;
auio.uio_offset = 0;/* XXX */
auio.uio_resid = 0;
@@ -653,7 +657,7 @@ out:
if (fromsa)
free(fromsa);
 
-   if (error == 0 && controlp != NULL)  
+   if (error == 0 && controlp != NULL)
*controlp = control;
else  if (control)
m_freem(control);
@@ -1038,10 +1042,12 @@ zcopy_tx(int s, struct zmsghdr *zm)
if (so->so_type != SOCK_STREAM)
return (EINVAL);
// Create a local copy of the user's iovec - sosend() is going to 
change it!
-   std::vector uio_iov(mp->msg_iov, mp->msg_iov + mp->msg_iovlen);
+   assert(mp->msg_iovlen <= UIO_MAXIOV);
+   struct iovec uio_iov[mp->msg_iovlen];
+   memcpy(uio_iov, mp->msg_iov, sizeof(uio_iov));
 
-   auio.uio_iov = uio_iov.data();
-   auio.uio_iovcnt = uio_iov.size();
+   auio.uio_iov = uio_iov;
+   auio.uio_iovcnt = mp->msg_iovlen;
auio.uio_rw = UIO_WRITE;
auio.uio_offset = 0;
auio.uio_resid = 0;
diff --git a/fs/vfs/vfs_syscalls.cc b/fs/vfs/vfs_syscalls.cc
index cd0d1745..055a32c7 100644
--- a/fs/vfs/vfs_syscalls.cc
+++ b/fs/vfs/vfs_syscalls.cc
@@ -266,8 +266,11 @@ sys_read(struct file *fp, const struct iovec *iov, size_t 
niov,
 struct uio uio;
 // Unfortunately, the current implement

[osv-dev] [PATCH] virtio-blk: Use multiplex strategy for I/O

2022-06-28 Thread Waldemar Kozaczuk
This allows arbitrarily large block sizes to be used for I/O
requests and matches the behavior of most other block drivers.

This commit is mostly based on the one from the Spirent fork of OSv -
https://github.com/SpirentOrion/osv/commit/45306415040521ac875ec7f6ba0ad6671ea8ad11.

However, it differs slightly by handling the situation when the 
VIRTIO_BLK_F_SEG_MAX
capability is not detected. In this case the driver behaves effectively
as if there was no limit on the size of the request which was the
original behavior.

It is worth noting that we already had a check on the size of the
request in runtime and would return EIO error (see make_request()
method), but this change makes the driver correctly chunk the request
using the multiplexing strategy if the request is larger than the
maximum imposed by the hypervisor.

In practical terms this commit can be tested using the misc-bdev-rw
test which has been adjusted to request reads and writes as large as 2
MB which are higher that the limit on QEMU (the VIRTIO_BLK_F_SEG_MAX is
equal to 254 = eq 1MB - 8K). Before the commit this test would hang
with 'i' equal to 255 and now it behaves correctly.

Finally some hypervisors like Firecracker do not provide the 
VIRTIO_BLK_F_SEG_MAX
capability, in which case our driver behaves as if there was no limit
which is also the old behavior.

Co-authored-by: "Timmons C. Player" 
Co-authored-by: Waldemar Kozaczuk 
Signed-off-by: Waldemar Kozaczuk 
---
 drivers/virtio-blk.cc |  8 ++--
 tests/misc-bdev-rw.cc | 15 +--
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/virtio-blk.cc b/drivers/virtio-blk.cc
index e03d41f7..643ca275 100644
--- a/drivers/virtio-blk.cc
+++ b/drivers/virtio-blk.cc
@@ -54,6 +54,7 @@ int blk::_instance = 0;
 
 
 struct blk_priv {
+devop_strategy_t strategy;
 blk* drv;
 };
 
@@ -63,7 +64,6 @@ blk_strategy(struct bio *bio)
 struct blk_priv *prv = reinterpret_cast(bio->bio_dev->private_data);
 
 trace_virtio_blk_strategy(bio);
-bio->bio_offset += bio->bio_dev->offset;
 prv->drv->make_request(bio);
 }
 
@@ -90,7 +90,7 @@ static struct devops blk_devops {
 blk_write,
 no_ioctl,
 no_devctl,
-blk_strategy,
+multiplex_strategy,
 };
 
 struct driver blk_driver = {
@@ -180,8 +180,10 @@ blk::blk(virtio_device& virtio_dev)
 
 dev = device_create(&blk_driver, dev_name.c_str(), D_BLK);
 prv = reinterpret_cast(dev->private_data);
+prv->strategy = blk_strategy;
 prv->drv = this;
 dev->size = prv->drv->size();
+dev->max_io_size = _config.seg_max ? (_config.seg_max - 1) * 
mmu::page_size : UINT_MAX;
 read_partition_table(dev);
 
 debugf("virtio-blk: Add blk device instances %d as %s, devsize=%lld\n", 
_id, dev_name.c_str(), dev->size);
@@ -208,6 +210,8 @@ void blk::read_config()
 if (get_guest_feature_bit(VIRTIO_BLK_F_SEG_MAX)) {
 READ_CONFIGURATION_FIELD(blk_config,seg_max,_config.seg_max)
 trace_virtio_blk_read_config_seg_max(_config.seg_max);
+} else {
+_config.seg_max = 0;
 }
 if (get_guest_feature_bit(VIRTIO_BLK_F_GEOMETRY)) {
 READ_CONFIGURATION_FIELD(blk_config,geometry,_config.geometry)
diff --git a/tests/misc-bdev-rw.cc b/tests/misc-bdev-rw.cc
index ae8eaf64..ce81c1fa 100644
--- a/tests/misc-bdev-rw.cc
+++ b/tests/misc-bdev-rw.cc
@@ -8,6 +8,17 @@
 
 #define MB (1024*1024)
 
+/*
+This test requires a standalone block device (not a one
+actively used by given filesystem) and can be created and
+mounted like so:
+
+dd if=/dev/zero of=/tmp/test1.raw bs=1M count=512
+qemu-img convert -O qcow2 /tmp/test1.raw /tmp/test1.img
+
+./scripts/run.py -e '/tests/misc-bdev-rw.so vblk1' --cloud-init-image 
/tmp/test1.img
+*/
+
 using namespace std;
 
 atomic bio_inflights(0);
@@ -89,7 +100,7 @@ int main(int argc, char const *argv[])
 long written = 0;
 
 //Do all writes
-for(auto i = 1; i < 32; i++)
+for(auto i = 1; i < 511; i++)
 {
 const size_t buff_size = i * memory::page_size;
 
@@ -142,4 +153,4 @@ int main(int argc, char const *argv[])
  << "Test " << (test_failed.load() ? "FAILED" : "PASSED") << endl;
 
 return test_failed.load() ? 1 : 0;
-}
\ No newline at end of file
+}
-- 
2.35.1

-- 
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to osv-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/osv-dev/20220628154711.108483-1-jwkozaczuk%40gmail.com.


[osv-dev] [PATCH] loader: don't leak directory entries when processing init entries

2022-06-28 Thread Waldemar Kozaczuk
This commit originates from the Spirent fork of OSv -
https://github.com/SpirentOrion/osv/commit/502fa31d631bba073e7bb9bc7ce6623e9159dbdd

Authored-by: "Timmons C. Player" 
Reviewed-by: Waldemar Kozaczuk 
Signed-off-by: Waldemar Kozaczuk 
---
 loader.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/loader.cc b/loader.cc
index b98f9681..ee05033b 100644
--- a/loader.cc
+++ b/loader.cc
@@ -607,6 +607,7 @@ void* do_main_thread(void *_main_args)
 for (int i = 0; i < count; i++) {
 if (!strcmp(".", namelist[i]->d_name) ||
 !strcmp("..", namelist[i]->d_name)) {
+free(namelist[i]);
 continue;
 }
 std::string fn("/init/");
-- 
2.35.1

-- 
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to osv-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/osv-dev/20220628140241.97140-1-jwkozaczuk%40gmail.com.


[osv-dev] Re: [COMMIT osv master] libc: implement getentropy()

2022-06-28 Thread Waldek Kozaczuk
Excellent! Now I can upgrade to Fedora 36 as well.

Do unit tests build and pass as well?

On Sunday, June 26, 2022 at 6:28:34 AM UTC-4 Commit Bot wrote:

> From: Nadav Har'El 
> Committer: Nadav Har'El 
> Branch: master
>
> libc: implement getentropy()
>
> Starting in gcc commit
> https://gcc.gnu.org/g:3439657b02869299685d259c3a77aa38714565b7, libstdc++
> started to use the getentropy() function, so the OSv kernel cannot be
> linked without implementing it.
>
> It's easy to implement getentropy() by calling getrandom().
>
> Musl also has its own getentropy.c but it has a silly compilation error
> (when len=0, it returns an uninitialized "ret") so let's just write
> one of our own and later reconsider taking Musl's one.
>
> After this patch, OSv can finally build on Fedora 36 with Gcc 12.1.1.
>
> Fixes #1198.
>
> Signed-off-by: Nadav Har'El 
>
> ---
> diff --git a/include/api/unistd.h b/include/api/unistd.h
> --- a/include/api/unistd.h
> +++ b/include/api/unistd.h
> @@ -179,6 +179,7 @@ char *getusershell(void);
> int acct(const char *);
> long syscall(long, ...);
> long __syscall(long, ...);
> +int getentropy(void *, size_t);
> #endif
>
> #ifdef _GNU_SOURCE
> diff --git a/libc/random.c b/libc/random.c
> --- a/libc/random.c
> +++ b/libc/random.c
> @@ -1,5 +1,6 @@
> /*
> * Copyright (C) 2018 Waldemar Kozaczuk
> + * Copyright (C) 2022 Nadav Har'El
> *
> * This work is open source software, licensed under the terms of the
> * BSD license as described in the LICENSE file in the top-level directory.
> @@ -49,3 +50,14 @@ ssize_t getrandom(void *buf, size_t count, unsigned int 
> flags)
> close(fd);
> return read;
> }
> +
> +int getentropy(void *buf, size_t len)
> +{
> + if (len > 256) {
> + errno = EIO;
> + return -1;
> + } else if (len == 0) {
> + return 0;
> + }
> + return getrandom(buf, len, 0) >= 0;
> +}
>

-- 
You received this message because you are subscribed to the Google Groups "OSv 
Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to osv-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/osv-dev/36d2b09a-c689-446e-83c0-1222c6d9bd52n%40googlegroups.com.