Re: [PATCH v3 1/1] eventfd: implementation of EFD_MASK flag

2015-10-22 Thread Damian Hobson-Garcia
Hello,

> diff --git a/include/linux/eventfd.h b/include/linux/eventfd.h
> index ff0b981..87de343 100644
> --- a/include/linux/eventfd.h
> +++ b/include/linux/eventfd.h

>  
> -/*
> - * CAREFUL: Check include/uapi/asm-generic/fcntl.h when defining
> - * new flags, since they might collide with O_* ones. We want
> - * to re-use O_* flags that couldn't possibly have a meaning
> - * from eventfd, in order to leave a free define-space for
> - * shared O_* flags.
> - */
> -#define EFD_SEMAPHORE (1 << 0)
> -#define EFD_CLOEXEC O_CLOEXEC
> -#define EFD_NONBLOCK O_NONBLOCK
> -
> -#define EFD_SHARED_FCNTL_FLAGS (O_CLOEXEC | O_NONBLOCK)
> -#define EFD_FLAGS_SET (EFD_SHARED_FCNTL_FLAGS | EFD_SEMAPHORE)
> -
>  struct file;
>  
>  #ifdef CONFIG_EVENTFD
> diff --git a/include/uapi/linux/eventfd.h b/include/uapi/linux/eventfd.h
> new file mode 100644
> index 000..097dcad
> --- /dev/null
> +++ b/include/uapi/linux/eventfd.h
> @@ -0,0 +1,33 @@

> +
> +/*
> + * CAREFUL: Check include/asm-generic/fcntl.h when defining
> + * new flags, since they might collide with O_* ones. We want
> + * to re-use O_* flags that couldn't possibly have a meaning
> + * from eventfd, in order to leave a free define-space for
> + * shared O_* flags.
> + */
> +
> +/* Provide semaphore-like semantics for reads from the eventfd. */
> +#define EFD_SEMAPHORE (1 << 0)
> +/* Provide event mask semantics for the eventfd. */
> +#define EFD_MASK (1 << 1)
> +/*  Set the close-on-exec (FD_CLOEXEC) flag on the eventfd. */
> +#define EFD_CLOEXEC O_CLOEXEC
> +/*  Create the eventfd in non-blocking mode. */
> +#define EFD_NONBLOCK O_NONBLOCK
> +#endif /* _UAPI_LINUX_EVENTFD_H */
> 

Since the latest version of this patch adds only the EFD_MASK definition
to the eventfd header, I was wondering if it was really
necessary/recommended to move the definitions from linux/eventfd.h to
linux/uapi/eventfd.h.  From my understanding, the EFD_SEMAPHORE (and now
EFD_MASK) define(s) are provided to user space from the libc headers
only. Any advice would be greatly appreciated.

Thank you,
Damian
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 1/1] eventfd: implementation of EFD_MASK flag

2015-10-14 Thread Damian Hobson-Garcia
From: Martin Sustrik <sust...@250bpm.com>

When implementing network protocols in user space, one has to implement
fake file descriptors to represent the sockets for the protocol.

Polling on such fake file descriptors is a problem (poll/select/epoll
accept only true file descriptors) and forces protocol implementers to use
various workarounds resulting in complex, non-standard and convoluted APIs.

More generally, ability to create full-blown file descriptors for
userspace-to-userspace signalling is missing. While eventfd(2) goes half
the way towards this goal it has follwoing shorcomings:

I.  There's no way to signal POLLPRI, POLLHUP etc.
II. There's no way to signal arbitrary combination of POLL* flags. Most
notably, simultaneous !POLLIN and !POLLOUT, which is a perfectly valid
combination for a network protocol (rx buffer is empty and tx buffer is
full), cannot be signaled using eventfd.

This patch implements new EFD_MASK flag which solves the above problems.

The semantics of EFD_MASK are as follows:

eventfd(2):

If eventfd is created with EFD_MASK flag set, it is initialised in such a
way as to signal no events on the file descriptor when it is polled on.
The 'initval' argument is ignored.

write(2):

User is allowed to write only buffers containing a 32-bit value
representing any combination of event flags as defined by the poll(2)
function (POLLIN, POLLOUT, POLLERR, POLLHUP etc.). Specified events
will be signaled when polling (select, poll, epoll) on the eventfd is
done later on.

read(2):

read is not supported and will fail with EINVAL.

select(2), poll(2) and similar:

When polling on the eventfd marked by EFD_MASK flag, all the events
specified in last written event flags shall be signaled.

Signed-off-by: Martin Sustrik <sust...@250bpm.com>

[dhobs...@igel.co.jp: Rebased, and resubmitted for Linux 4.3]
Signed-off-by: Damian Hobson-Garcia <dhobs...@igel.co.jp>
---
 fs/eventfd.c | 102 ++-
 include/linux/eventfd.h  |  16 +--
 include/uapi/linux/eventfd.h |  33 ++
 3 files changed, 126 insertions(+), 25 deletions(-)
 create mode 100644 include/uapi/linux/eventfd.h

diff --git a/fs/eventfd.c b/fs/eventfd.c
index 8d0c0df..1310779 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -2,6 +2,7 @@
  *  fs/eventfd.c
  *
  *  Copyright (C) 2007  Davide Libenzi <davi...@xmailserver.org>
+ *  Copyright (C) 2013  Martin Sustrik <sust...@250bpm.com>
  *
  */
 
@@ -22,18 +23,31 @@
 #include 
 #include 
 
+#define EFD_SHARED_FCNTL_FLAGS (O_CLOEXEC | O_NONBLOCK)
+#define EFD_FLAGS_SET (EFD_SHARED_FCNTL_FLAGS | EFD_SEMAPHORE | EFD_MASK)
+#define EFD_MASK_VALID_EVENTS (POLLIN | POLLPRI | POLLOUT | POLLERR | POLLHUP)
+
 struct eventfd_ctx {
struct kref kref;
wait_queue_head_t wqh;
-   /*
-* Every time that a write(2) is performed on an eventfd, the
-* value of the __u64 being written is added to "count" and a
-* wakeup is performed on "wqh". A read(2) will return the "count"
-* value to userspace, and will reset "count" to zero. The kernel
-* side eventfd_signal() also, adds to the "count" counter and
-* issue a wakeup.
-*/
-   __u64 count;
+   union {
+   /*
+* Every time that a write(2) is performed on an eventfd, the
+* value of the __u64 being written is added to "count" and a
+* wakeup is performed on "wqh". A read(2) will return the
+* "count" value to userspace, and will reset "count" to zero.
+* The kernel side eventfd_signal() also, adds to the "count"
+* counter and issue a wakeup.
+*/
+   __u64 count;
+
+   /*
+* When using eventfd in EFD_MASK mode this stracture stores the
+* current events to be signaled on the eventfd (events member)
+* along with opaque user-defined data (data member).
+*/
+   __u32 events;
+   };
unsigned int flags;
 };
 
@@ -134,6 +148,14 @@ static unsigned int eventfd_poll(struct file *file, 
poll_table *wait)
return events;
 }
 
+static unsigned int eventfd_mask_poll(struct file *file, poll_table *wait)
+{
+   struct eventfd_ctx *ctx = file->private_data;
+
+   poll_wait(file, >wqh, wait);
+   return ctx->events;
+}
+
 static void eventfd_ctx_do_read(struct eventfd_ctx *ctx, __u64 *cnt)
 {
*cnt = (ctx->flags & EFD_SEMAPHORE) ? 1 : ctx->count;
@@ -239,6 +261,14 @@ static ssize_t eventfd_read(struct file *file, char __user 
*buf, size_t count,
return put_user(cnt, (__u64 __user *) buf) ? -EFAULT : sizeof(cnt);
 }
 
+static ssize_t eventfd_mask_read(struct file *file, char __user *buf,
+ 

[PATCH v3 0/1] Generalize poll events from eventfd

2015-10-14 Thread Damian Hobson-Garcia
Using eventfd user space can generate POLLIN/POLLOUT events but some
applications may want to generate POLLPRI/POLLERR events as well.
This patch submission aims to generalize the events generated by an
eventfd. This is a resubmission of a patch from Feb 2013[1]. The original
discussion trailed off without any conclusion, but the original author
has recently confirmed[2] that this functionality is still useful, so I
volunteered to rebase and resubmit the patch for discussion.

[1] https://lkml.org/lkml/2013/2/18/147
[2] https://lkml.org/lkml/2015/7/9/153

Changes in v3
-
* replace efd_mask structure with scalar 'events' variable.

Changes in v2
-

* rebased on Linux v4.3-rc1
* Move file operation implementations for EFD_MASK to a seperate structure
* Remove 'data' element from efd_mask structure
* read() is no longer supported when EFD_MASK is set (fails with EINVAL)
* eventfd_ctx_fileget() now returns EINVAL when EFD_MASK is set, eliminating
  the possibility of triggering the orginal BUG_ON() macros which have now
  been removed.

Thank you,
Damian

Martin Sustrik (1):
  eventfd: implementation of EFD_MASK flag

 fs/eventfd.c | 91 ++--
 include/linux/eventfd.h  | 16 +---
 include/uapi/linux/eventfd.h | 40 +++
 3 files changed, 121 insertions(+), 26 deletions(-)
 create mode 100644 include/uapi/linux/eventfd.h

-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 1/1] eventfd: implementation of EFD_MASK flag

2015-09-16 Thread Damian Hobson-Garcia
From: Martin Sustrik <sust...@250bpm.com>

When implementing network protocols in user space, one has to implement
fake file descriptors to represent the sockets for the protocol.

Polling on such fake file descriptors is a problem (poll/select/epoll
accept only true file descriptors) and forces protocol implementers to use
various workarounds resulting in complex, non-standard and convoluted APIs.

More generally, ability to create full-blown file descriptors for
userspace-to-userspace signalling is missing. While eventfd(2) goes half
the way towards this goal it has follwoing shorcomings:

I.  There's no way to signal POLLPRI, POLLHUP etc.
II. There's no way to signal arbitrary combination of POLL* flags. Most
notably, simultaneous !POLLIN and !POLLOUT, which is a perfectly valid
combination for a network protocol (rx buffer is empty and tx buffer is
full), cannot be signaled using eventfd.

This patch implements new EFD_MASK flag which solves the above problems.

Additionally, to provide a way to associate user-space state with eventfd
object, it allows to attach user-space data to the file descriptor.

The semantics of EFD_MASK are as follows:

eventfd(2):

If eventfd is created with EFD_MASK flag set, it is initialised in such a
way as to signal no events on the file descriptor when it is polled on.
The 'initval' argument is ignored.

write(2):

User is allowed to write only buffers containing the following structure:

struct efd_mask {
  uint32_t events;
};

The value of 'events' should be any combination of event flags as defined
by poll(2) function (POLLIN, POLLOUT, POLLERR, POLLHUP etc.) Specified
events will be signaled when polling (select, poll, epoll) on the eventfd
is done later on.

read(2):

read is not supported and will fail with EINVAL.

select(2), poll(2) and similar:

When polling on the eventfd marked by EFD_MASK flag, all the events
specified in last written 'events' field shall be signaled.

Signed-off-by: Martin Sustrik <sust...@250bpm.com>

[dhobs...@igel.co.jp: Rebased, and resubmitted for Linux 4.3]
Signed-off-by: Damian Hobson-Garcia <dhobs...@igel.co.jp>
---
 fs/eventfd.c | 102 ++-
 include/linux/eventfd.h  |  16 +--
 include/uapi/linux/eventfd.h |  39 +
 3 files changed, 132 insertions(+), 25 deletions(-)
 create mode 100644 include/uapi/linux/eventfd.h

diff --git a/fs/eventfd.c b/fs/eventfd.c
index 8d0c0df..1a6a066 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -2,6 +2,7 @@
  *  fs/eventfd.c
  *
  *  Copyright (C) 2007  Davide Libenzi <davi...@xmailserver.org>
+ *  Copyright (C) 2013  Martin Sustrik <sust...@250bpm.com>
  *
  */
 
@@ -22,18 +23,31 @@
 #include 
 #include 
 
+#define EFD_SHARED_FCNTL_FLAGS (O_CLOEXEC | O_NONBLOCK)
+#define EFD_FLAGS_SET (EFD_SHARED_FCNTL_FLAGS | EFD_SEMAPHORE | EFD_MASK)
+#define EFD_MASK_VALID_EVENTS (POLLIN | POLLPRI | POLLOUT | POLLERR | POLLHUP)
+
 struct eventfd_ctx {
struct kref kref;
wait_queue_head_t wqh;
-   /*
-* Every time that a write(2) is performed on an eventfd, the
-* value of the __u64 being written is added to "count" and a
-* wakeup is performed on "wqh". A read(2) will return the "count"
-* value to userspace, and will reset "count" to zero. The kernel
-* side eventfd_signal() also, adds to the "count" counter and
-* issue a wakeup.
-*/
-   __u64 count;
+   union {
+   /*
+* Every time that a write(2) is performed on an eventfd, the
+* value of the __u64 being written is added to "count" and a
+* wakeup is performed on "wqh". A read(2) will return the
+* "count" value to userspace, and will reset "count" to zero.
+* The kernel side eventfd_signal() also, adds to the "count"
+* counter and issue a wakeup.
+*/
+   __u64 count;
+
+   /*
+* When using eventfd in EFD_MASK mode this stracture stores the
+* current events to be signaled on the eventfd (events member)
+* along with opaque user-defined data (data member).
+*/
+   struct efd_mask mask;
+   };
unsigned int flags;
 };
 
@@ -134,6 +148,14 @@ static unsigned int eventfd_poll(struct file *file, 
poll_table *wait)
return events;
 }
 
+static unsigned int eventfd_mask_poll(struct file *file, poll_table *wait)
+{
+   struct eventfd_ctx *ctx = file->private_data;
+
+   poll_wait(file, >wqh, wait);
+   return ctx->mask.events;
+}
+
 static void eventfd_ctx_do_read(struct eventfd_ctx *ctx, __u64 *cnt)
 {
*cnt = (ctx->flags & EFD_SEMAPHORE) ? 1 : ctx->count;
@@ -239,6 +261,14 @@ stat

[PATCH v2 0/1] Generalize poll events from eventfd

2015-09-16 Thread Damian Hobson-Garcia
Using eventfd user space can generate POLLIN/POLLOUT events but some
applications may want to generate POLLPRI/POLLERR events as well.
This patch submission aims to generalize the events generated by an
eventfd. This is a resubmission of a patch from Feb 2013[1]. The original
discussion trailed off without any conclusion, but the original author
has recently confirmed[2] that this functionality is still useful, so I
volunteered to rebase and resubmit the patch for discussion.

[1] https://lkml.org/lkml/2013/2/18/147
[2] https://lkml.org/lkml/2015/7/9/153

Changes in v2
-

* rebased on Linux v4.3-rc1
* Move file operation implementations for EFD_MASK to a seperate structure
* Remove 'data' element from efd_mask structure
* read() is no longer supported when EFD_MASK is set (fails with EINVAL)
* eventfd_ctx_fileget() now returns EINVAL when EFD_MASK is set, eliminating
  the possibility of triggering the orginal BUG_ON() macros which have now
  been removed.

Thank you,
Damian

Martin Sustrik (1):
  eventfd: implementation of EFD_MASK flag

 fs/eventfd.c | 91 ++--
 include/linux/eventfd.h  | 16 +---
 include/uapi/linux/eventfd.h | 40 +++
 3 files changed, 121 insertions(+), 26 deletions(-)
 create mode 100644 include/uapi/linux/eventfd.h

-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/1] eventfd: implementation of EFD_MASK flag

2015-09-16 Thread Damian Hobson-Garcia
Hi Martin,

On 2015-09-16 3:51 PM, Martin Sustrik wrote:
> On 2015-09-16 08:27, Damian Hobson-Garcia wrote:
>>
>> Additionally, to provide a way to associate user-space state with eventfd
>> object, it allows to attach user-space data to the file descriptor.
> 
> The above paragraph is a leftover from the past. The functionality no
> longer exist.
> 
Oops, I forgot to delete that part. I'll get rid of it.

>>
>> The semantics of EFD_MASK are as follows:
>>
>> eventfd(2):
>>
>> If eventfd is created with EFD_MASK flag set, it is initialised in such a
>> way as to signal no events on the file descriptor when it is polled on.
>> The 'initval' argument is ignored.
>>
>> write(2):
>>
>> User is allowed to write only buffers containing the following structure:
>>
>> struct efd_mask {
>>   uint32_t events;
>> };
> 
> Is it worth having a struct here? Why not just uint32_t?
As it stands right now, no, the struct doesn't really add anything.
uint32_t should be just fine.
> 
> Martin

Damian
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] eventfd: implementation of EFD_MASK flag

2015-08-11 Thread Damian Hobson-Garcia


On 2015-08-11 6:16 AM, Martin Sustrik wrote:
 On 2015-08-10 10:57, Damian Hobson-Garcia wrote:
 Hi Martin,

 Thanks for your comments.

 On 2015-08-10 3:39 PM, Martin Sustrik wrote:
 On 2015-08-10 08:23, Damian Hobson-Garcia wrote:
 Replying to my own post, but I had the following comments/questions.
 Martin, if you have any response to my comments I would be very
 happy to
 hear them.

 On 2015-08-10 2:51 PM, Damian Hobson-Garcia wrote:
 From: Martin Sustrik sust...@250bpm.com

 [snip]

 write(2):

 User is allowed to write only buffers containing the following
 structure:

 struct efd_mask {
   __u32 events;
   __u64 data;
 };

 The value of 'events' should be any combination of event flags as
 defined by
 poll(2) function (POLLIN, POLLOUT, POLLERR, POLLHUP etc.) Specified
 events will
 be signaled when polling (select, poll, epoll) on the eventfd is done
 later on.
 'data' is opaque data that are not interpreted by eventfd object.

 I'm not fully clear on the purpose that the 'data' member serves.  Does
 this opaque handle need to be tied together with this event
 synchronization construct?

 It's a convenience thing. Imagine you are implementing your own file
 descriptor type in user space. You create an EFD_MASK socket and a
 structure that will hold any state that you need for the socket (tx/rx
 buffers and such).

 Now you have two things to pass around. If you want to pass the fd to a
 function, it must have two parameters (fd and pointer to the structure).

 To fix it you can put the fd into the structure. That way there's only
 one thing to pass around (the structure).

 The problem with that approach is when you have generic code that deals
 with file descriptors. For example, a simple poller which accepts a list
 of (fd, callback) pairs and invokes the callback when one of the fds
 signals POLLIN. You can't send a pointer to a structure to such
 function. All you can send is the fd, but then, when the callback is
 invoked, fd is all you have. You have no idea where your state is.

 'data' member allows you to put the pointer to the state to the socket
 itself. Thus, if you have a fd, you can always find out where the
 associated data is by reading the mask structure from the fd.


 Ok, I see what you're saying. I guess that keeping track of the mapping
 between the fd and the struct in user space could be non-trivial if
 there are a large number of active fds that are polling very frequently.
 Wouldn't it be sufficient to just use epoll() in this case though?  It
 already seems to support this kind of thing.
 
 My use case was like this:
 
 int s = mysocket();
 ...
 // myrecv() can get the pointer to the structure
 // without user having to pass it as an argument
 myrecv(s, buf, sizeof(buf));
 
 However, same behaviour can be accomplished by simply keeping
 a static array of pointers in the user space.
 
 So let's cut this part out of the patch.
 

Ok, I'll drop the 'data' member. I could add some padding to the
efd_mask structure so that it is the same size as the 64-bit data size
used when EFD_MASK is not set.



 [snip]

 @@ -55,6 +69,9 @@ __u64 eventfd_signal(struct eventfd_ctx *ctx,
 __u64 n)
  {
 +/* This function should never be used with eventfd in the mask
 mode. */
 +BUG_ON(ctx-flags  EFD_MASK);
 +
 ...
 @@ -158,6 +180,9 @@ int eventfd_ctx_remove_wait_queue(struct
 eventfd_ctx *ctx, wait_queue_t *wait,
  {
 +/* This function should never be used with eventfd in the mask
 mode. */
 +BUG_ON(ctx-flags  EFD_MASK);
 +
 ...
 @@ -188,6 +213,9 @@ ssize_t eventfd_ctx_read(struct eventfd_ctx *ctx,
 int no_wait, __u64 *cnt)
 +/* This function should never be used with eventfd in the mask
 mode. */
 +BUG_ON(ctx-flags  EFD_MASK);
 +

 If eventfd_ctx_fileget() returns EINVAL when EFD_MASK is set, I don't
 think that there will be a way to call these functions in the mask
 mode,
 so it should be possible to get rid of the BUG_ON checks.

 Sure. Feel free to do so.


 [snip]
 @@ -230,6 +258,19 @@ static ssize_t eventfd_read(struct file *file,
 char __user *buf, size_t count,
  ssize_t res;
  __u64 cnt;

 +if (ctx-flags  EFD_MASK) {
 +struct efd_mask mask;
 +
 +if (count  sizeof(mask))
 +return -EINVAL;
 +spin_lock_irq(ctx-wqh.lock);
 +mask = ctx-mask;
 +spin_unlock_irq(ctx-wqh.lock);
 +if (copy_to_user(buf, mask, sizeof(mask)))
 +return -EFAULT;
 +return sizeof(mask);
 +}
 +

 For the other eventfd modes, reading the value will update the internal
 state of the eventfd (either clearing or decrementing the counter).
 Should something similar be done here? I'm thinking of a case where a
 process is polling on this fd in a loop. Clearing the efd_mask data  on
 read should provide an easy way for the polling process to know if
 it is
 seeing new poll events.

 No. In this case reading the value has no effect on the state of the fd.
 How it should work is rather:

 // fd is in POLLIN

Re: [PATCH] eventfd: implementation of EFD_MASK flag

2015-08-10 Thread Damian Hobson-Garcia
Hi Martin,

Thanks for your comments.

On 2015-08-10 3:39 PM, Martin Sustrik wrote:
 On 2015-08-10 08:23, Damian Hobson-Garcia wrote:
 Replying to my own post, but I had the following comments/questions.
 Martin, if you have any response to my comments I would be very happy to
 hear them.

 On 2015-08-10 2:51 PM, Damian Hobson-Garcia wrote:
 From: Martin Sustrik sust...@250bpm.com

 [snip]

 write(2):

 User is allowed to write only buffers containing the following
 structure:

 struct efd_mask {
   __u32 events;
   __u64 data;
 };

 The value of 'events' should be any combination of event flags as
 defined by
 poll(2) function (POLLIN, POLLOUT, POLLERR, POLLHUP etc.) Specified
 events will
 be signaled when polling (select, poll, epoll) on the eventfd is done
 later on.
 'data' is opaque data that are not interpreted by eventfd object.

 I'm not fully clear on the purpose that the 'data' member serves.  Does
 this opaque handle need to be tied together with this event
 synchronization construct?
 
 It's a convenience thing. Imagine you are implementing your own file
 descriptor type in user space. You create an EFD_MASK socket and a
 structure that will hold any state that you need for the socket (tx/rx
 buffers and such).
 
 Now you have two things to pass around. If you want to pass the fd to a
 function, it must have two parameters (fd and pointer to the structure).
 
 To fix it you can put the fd into the structure. That way there's only
 one thing to pass around (the structure).
 
 The problem with that approach is when you have generic code that deals
 with file descriptors. For example, a simple poller which accepts a list
 of (fd, callback) pairs and invokes the callback when one of the fds
 signals POLLIN. You can't send a pointer to a structure to such
 function. All you can send is the fd, but then, when the callback is
 invoked, fd is all you have. You have no idea where your state is.
 
 'data' member allows you to put the pointer to the state to the socket
 itself. Thus, if you have a fd, you can always find out where the
 associated data is by reading the mask structure from the fd.
 

Ok, I see what you're saying. I guess that keeping track of the mapping
between the fd and the struct in user space could be non-trivial if
there are a large number of active fds that are polling very frequently.
Wouldn't it be sufficient to just use epoll() in this case though?  It
already seems to support this kind of thing.


 [snip]

 @@ -55,6 +69,9 @@ __u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n)
  {
 +/* This function should never be used with eventfd in the mask
 mode. */
 +BUG_ON(ctx-flags  EFD_MASK);
 +
 ...
 @@ -158,6 +180,9 @@ int eventfd_ctx_remove_wait_queue(struct
 eventfd_ctx *ctx, wait_queue_t *wait,
  {
 +/* This function should never be used with eventfd in the mask
 mode. */
 +BUG_ON(ctx-flags  EFD_MASK);
 +
 ...
 @@ -188,6 +213,9 @@ ssize_t eventfd_ctx_read(struct eventfd_ctx *ctx,
 int no_wait, __u64 *cnt)
 +/* This function should never be used with eventfd in the mask
 mode. */
 +BUG_ON(ctx-flags  EFD_MASK);
 +

 If eventfd_ctx_fileget() returns EINVAL when EFD_MASK is set, I don't
 think that there will be a way to call these functions in the mask mode,
 so it should be possible to get rid of the BUG_ON checks.
 
 Sure. Feel free to do so.
 

 [snip]
 @@ -230,6 +258,19 @@ static ssize_t eventfd_read(struct file *file,
 char __user *buf, size_t count,
  ssize_t res;
  __u64 cnt;

 +if (ctx-flags  EFD_MASK) {
 +struct efd_mask mask;
 +
 +if (count  sizeof(mask))
 +return -EINVAL;
 +spin_lock_irq(ctx-wqh.lock);
 +mask = ctx-mask;
 +spin_unlock_irq(ctx-wqh.lock);
 +if (copy_to_user(buf, mask, sizeof(mask)))
 +return -EFAULT;
 +return sizeof(mask);
 +}
 +

 For the other eventfd modes, reading the value will update the internal
 state of the eventfd (either clearing or decrementing the counter).
 Should something similar be done here? I'm thinking of a case where a
 process is polling on this fd in a loop. Clearing the efd_mask data  on
 read should provide an easy way for the polling process to know if it is
 seeing new poll events.
 
 No. In this case reading the value has no effect on the state of the fd.
 How it should work is rather:
 
 // fd is in POLLIN state
 poll(fd);
 // function exits with POLLIN but fd remains in POLLIN state
 my_recv(fd, buf, size);
 // my_recv function have found out that there's no more data to recv and
 switched off the POLLIN flag
 poll(fd); // we block here waiting for more data to arrive from the network
 

How exactly doe the receiver switch off the POLLIN flag?  Does the
receiver also write to the eventfd? or do you mean that it just doesn't
set POLLIN in the events mask?  It seems cleaner to have the sender only
write the eventfd and the receiver only read it.  Your example would be
exactly the same, except that my_recv(fd

Re: [PATCH] eventfd: implementation of EFD_MASK flag

2015-08-10 Thread Damian Hobson-Garcia
Replying to my own post, but I had the following comments/questions.
Martin, if you have any response to my comments I would be very happy to
hear them.

On 2015-08-10 2:51 PM, Damian Hobson-Garcia wrote:
 From: Martin Sustrik sust...@250bpm.com
 
[snip]
 
 write(2):
 
 User is allowed to write only buffers containing the following structure:
 
 struct efd_mask {
   __u32 events;
   __u64 data;
 };
 
 The value of 'events' should be any combination of event flags as defined by
 poll(2) function (POLLIN, POLLOUT, POLLERR, POLLHUP etc.) Specified events 
 will
 be signaled when polling (select, poll, epoll) on the eventfd is done later 
 on.
 'data' is opaque data that are not interpreted by eventfd object.
 
I'm not fully clear on the purpose that the 'data' member serves.  Does
this opaque handle need to be tied together with this event
synchronization construct?

[snip]

 @@ -55,6 +69,9 @@ __u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n)
  {
 + /* This function should never be used with eventfd in the mask mode. */
 + BUG_ON(ctx-flags  EFD_MASK);
 +
...
 @@ -158,6 +180,9 @@ int eventfd_ctx_remove_wait_queue(struct eventfd_ctx 
 *ctx, wait_queue_t *wait,
  {
 + /* This function should never be used with eventfd in the mask mode. */
 + BUG_ON(ctx-flags  EFD_MASK);
 +
...
 @@ -188,6 +213,9 @@ ssize_t eventfd_ctx_read(struct eventfd_ctx *ctx, int 
 no_wait, __u64 *cnt)
 + /* This function should never be used with eventfd in the mask mode. */
 + BUG_ON(ctx-flags  EFD_MASK);
 +

If eventfd_ctx_fileget() returns EINVAL when EFD_MASK is set, I don't
think that there will be a way to call these functions in the mask mode,
so it should be possible to get rid of the BUG_ON checks.

[snip]
 @@ -230,6 +258,19 @@ static ssize_t eventfd_read(struct file *file, char 
 __user *buf, size_t count,
   ssize_t res;
   __u64 cnt;
  
 + if (ctx-flags  EFD_MASK) {
 + struct efd_mask mask;
 +
 + if (count  sizeof(mask))
 + return -EINVAL;
 + spin_lock_irq(ctx-wqh.lock);
 + mask = ctx-mask;
 + spin_unlock_irq(ctx-wqh.lock);
 + if (copy_to_user(buf, mask, sizeof(mask)))
 + return -EFAULT;
 + return sizeof(mask);
 + }
 +

For the other eventfd modes, reading the value will update the internal
state of the eventfd (either clearing or decrementing the counter).
Should something similar be done here? I'm thinking of a case where a
process is polling on this fd in a loop. Clearing the efd_mask data  on
read should provide an easy way for the polling process to know if it is
seeing new poll events.

 @@ -292,8 +351,13 @@ static void eventfd_show_fdinfo(struct seq_file *m, 
 struct file *f)
   struct eventfd_ctx *ctx = f-private_data;
  
   spin_lock_irq(ctx-wqh.lock);
 - seq_printf(m, eventfd-count: %16llx\n,
 -(unsigned long long)ctx-count);
 + if (ctx-flags  EFD_MASK) {
 + seq_printf(m, eventfd-mask: %x\n,
 +  (unsigned)ctx-mask.events);
 + } else {
 + seq_printf(m, eventfd-count: %16llx\n,
 +  (unsigned long long)ctx-count);
 + }
   spin_unlock_irq(ctx-wqh.lock);
  }
I think that putting the EFD_MASK functionality into a different fops
structure might be useful for reducing the number of if statements.

Thank you,
Damian
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] eventfd: implementation of EFD_MASK flag

2015-08-09 Thread Damian Hobson-Garcia
From: Martin Sustrik sust...@250bpm.com

When implementing network protocols in user space, one has to implement
fake file descriptors to represent the sockets for the protocol.

Polling on such fake file descriptors is a problem (poll/select/epoll accept
only true file descriptors) and forces protocol implementers to use various
workarounds resulting in complex, non-standard and convoluted APIs.

More generally, ability to create full-blown file descriptors for
userspace-to-userspace signalling is missing. While eventfd(2) goes half the way
towards this goal it has follwoing shorcomings:

I.  There's no way to signal POLLPRI, POLLHUP etc.
II. There's no way to signal arbitrary combination of POLL* flags. Most notably,
simultaneous !POLLIN and !POLLOUT, which is a perfectly valid combination
for a network protocol (rx buffer is empty and tx buffer is full), cannot be
signaled using eventfd.

This patch implements new EFD_MASK flag which solves the above problems.

Additionally, to provide a way to associate user-space state with eventfd
object, it allows to attach user-space data to the file descriptor.

The semantics of EFD_MASK are as follows:

eventfd(2):

If eventfd is created with EFD_MASK flag set, it is initialised in such a way
as to signal no events on the file descriptor when it is polled on. 'initval'
argument is ignored.

write(2):

User is allowed to write only buffers containing the following structure:

struct efd_mask {
  __u32 events;
  __u64 data;
};

The value of 'events' should be any combination of event flags as defined by
poll(2) function (POLLIN, POLLOUT, POLLERR, POLLHUP etc.) Specified events will
be signaled when polling (select, poll, epoll) on the eventfd is done later on.
'data' is opaque data that are not interpreted by eventfd object.

read(2):

User is allowed to read an efd_mask structure from the eventfd marked by
EFD_MASK. Returned value shall be the last one written to the eventfd.

select(2), poll(2) and similar:

When polling on the eventfd marked by EFD_MASK flag, all the events specified
in last written 'events' field shall be signaled.

Signed-off-by: Martin Sustrik sust...@250bpm.com
---
 fs/eventfd.c | 93 ++--
 include/linux/eventfd.h  | 16 +---
 include/uapi/linux/eventfd.h | 40 +++
 3 files changed, 123 insertions(+), 26 deletions(-)
 create mode 100644 include/uapi/linux/eventfd.h

diff --git a/fs/eventfd.c b/fs/eventfd.c
index 8d0c0df..11fb92a 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -2,6 +2,7 @@
  *  fs/eventfd.c
  *
  *  Copyright (C) 2007  Davide Libenzi davi...@xmailserver.org
+ *  Copyright (C) 2013  Martin Sustrik sust...@250bpm.com
  *
  */
 
@@ -22,18 +23,31 @@
 #include linux/proc_fs.h
 #include linux/seq_file.h
 
+#define EFD_SHARED_FCNTL_FLAGS (O_CLOEXEC | O_NONBLOCK)
+#define EFD_FLAGS_SET (EFD_SHARED_FCNTL_FLAGS | EFD_SEMAPHORE | EFD_MASK)
+#define EFD_MASK_VALID_EVENTS (POLLIN | POLLPRI | POLLOUT | POLLERR | POLLHUP)
+
 struct eventfd_ctx {
struct kref kref;
wait_queue_head_t wqh;
-   /*
-* Every time that a write(2) is performed on an eventfd, the
-* value of the __u64 being written is added to count and a
-* wakeup is performed on wqh. A read(2) will return the count
-* value to userspace, and will reset count to zero. The kernel
-* side eventfd_signal() also, adds to the count counter and
-* issue a wakeup.
-*/
-   __u64 count;
+   union {
+   /*
+* Every time that a write(2) is performed on an eventfd, the
+* value of the __u64 being written is added to count and a
+* wakeup is performed on wqh. A read(2) will return the
+* count value to userspace, and will reset count to zero.
+* The kernel side eventfd_signal() also, adds to the count
+* counter and issue a wakeup.
+*/
+   __u64 count;
+
+   /*
+* When using eventfd in EFD_MASK mode this stracture stores the
+* current events to be signaled on the eventfd (events member)
+* along with opaque user-defined data (data member).
+*/
+   struct efd_mask mask;
+   };
unsigned int flags;
 };
 
@@ -55,6 +69,9 @@ __u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n)
 {
unsigned long flags;
 
+   /* This function should never be used with eventfd in the mask mode. */
+   BUG_ON(ctx-flags  EFD_MASK);
+
spin_lock_irqsave(ctx-wqh.lock, flags);
if (ULLONG_MAX - ctx-count  n)
n = ULLONG_MAX - ctx-count;
@@ -124,6 +141,11 @@ static unsigned int eventfd_poll(struct file *file, 
poll_table *wait)
smp_rmb();
count = ctx-count;
 
+   if (ctx-flags  EFD_MASK) {
+   events = ctx-mask.events;
+   return 

[RFC/PATCH] Generalize poll events from eventfd

2015-08-09 Thread Damian Hobson-Garcia
Hello all,

eventfd is very useful for generating POLLIN/POLLOUT poll events from user
space but some applications may want to generate POLLPRI/POLLERR events as well.
This patch submission aims to generalize the events generated by an
eventfd. This is a resubmission of a patch from Feb 2013[1]. The original
discussion trailed off without any conclusion, but the original author
has recently confirmed[2] that this functionality is still useful, so I
volunteered to rebase and resubmit the patch for discussion.

I have rebased this patch onto v4.2-rc5 and modified the original patch to
not add an indentation level to the implementations of eventfd_{read,write,
poll}.  I hope these changes will make it easier to review. The description of
the efd_mask structure was also updated.

I have a few comments and questions, but I will add those separately as a
reply for open discussion.

[1] https://lkml.org/lkml/2013/2/18/147
[2] https://lkml.org/lkml/2015/7/9/153

Thank you,
Damian

Martin Sustrik (1):
  eventfd: implementation of EFD_MASK flag

 fs/eventfd.c | 91 ++--
 include/linux/eventfd.h  | 16 +---
 include/uapi/linux/eventfd.h | 40 +++
 3 files changed, 121 insertions(+), 26 deletions(-)
 create mode 100644 include/uapi/linux/eventfd.h

-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 1/1] eventfd: implementation of EFD_MASK flag

2015-07-09 Thread Damian Hobson-Garcia
Hello Martin and all,

I recently came across this (quite old by now) patch submission for an
extension to the functionality of eventfd and I noticed that the
discussion seems to have fizzled out.  Is this functionality still of
use for user space network protocols?  It seems like it would be usable
for other notification use cases as well.  In particular I'm thinking of
poll/select/epoll support for a user space video codec via libv4l.
Is there value in re-examining this patch?

Thank you,
Damian

On 2013-02-18 8:34 PM, Martin Sustrik wrote:
 When implementing network protocols in user space, one has to implement
 fake file descriptors to represent the sockets for the protocol.
 
 Polling on such fake file descriptors is a problem (poll/select/epoll accept
 only true file descriptors) and forces protocol implementers to use various
 workarounds resulting in complex, non-standard and convoluted APIs.
 
 More generally, ability to create full-blown file descriptors for
 userspace-to-userspace signalling is missing. While eventfd(2) goes half the 
 way
 towards this goal it has follwoing shorcomings:
 
 I.  There's no way to signal POLLPRI, POLLHUP etc.
 II. There's no way to signal arbitrary combination of POLL* flags. Most 
 notably,
 simultaneous !POLLIN and !POLLOUT, which is a perfectly valid combination
 for a network protocol (rx buffer is empty and tx buffer is full), cannot 
 be
 signaled using eventfd.
 
 This patch implements new EFD_MASK flag which solves the above problems.
 
 Additionally, to provide a way to associate user-space state with eventfd
 object, it allows to attach user-space data to the file descriptor.
 
 The semantics of EFD_MASK are as follows:
 
 eventfd(2):
 
 If eventfd is created with EFD_MASK flag set, it is initialised in such a way
 as to signal no events on the file descriptor when it is polled on. 'initval'
 argument is ignored.
 
 write(2):
 
 User is allowed to write only buffers containing the following structure:
 
 struct efd_mask {
   short events;
   union {
 void*ptr;
 uint32_t u32;
 uint64_t u64;
   };
 };
 
 The value of 'events' should be any combination of event flags as defined by
 poll(2) function (POLLIN, POLLOUT, POLLERR, POLLHUP etc.) Specified events 
 will
 be signaled when polling (select, poll, epoll) on the eventfd is done later 
 on.
 'ptr', 'u32' or 'u64' are opaque data that are not interpreted by eventfd
 object.
 
 read(2):
 
 User is allowed to read an efd_mask structure from the eventfd marked by
 EFD_MASK. Returned value shall be the last one written to the eventfd.
 
 select(2), poll(2) and similar:
 
 When polling on the eventfd marked by EFD_MASK flag, all the events specified
 in last written 'events' field shall be signaled.
 
 Signed-off-by: Martin Sustrik sust...@250bpm.com
 ---
 Following changes were made to the patch since v2:
 
 - eventfd_mask structure renamed to efd_mask to keep user-space prefixes
   consistent
 - efd_mask is __packed for all architectures
 - eventfd.h header file added to uapi; given there wasn't one so far, in
   addition to moving efd_mask there, I've moved also all the eventfd flags 
 that
   are meant to be visible from the user space
 - comment for 'mask' member eventfd_ctx added 
 - synchronisation bugs in eventfd_read fixed
 - several variable declarations moved from the beginning of the function to
   the blocks where they are used
 - changelog text made simpler and more up to the point
 
 There was also a request to explain why this functionality is needed. I've
 written an article explaining the problems user-space network protocol
 implementers face here: http://www.250bpm.com/blog:16
 ---
  fs/eventfd.c |  194 
 --
  include/linux/eventfd.h  |   17 +---
  include/uapi/linux/eventfd.h |   40 +
  3 files changed, 172 insertions(+), 79 deletions(-)
  create mode 100644 include/uapi/linux/eventfd.h
 
 diff --git a/fs/eventfd.c b/fs/eventfd.c
 index 35470d9..8f821b1 100644
 --- a/fs/eventfd.c
 +++ b/fs/eventfd.c
 @@ -2,6 +2,7 @@
   *  fs/eventfd.c
   *
   *  Copyright (C) 2007  Davide Libenzi davi...@xmailserver.org
 + *  Copyright (C) 2013  Martin Sustrik sust...@250bpm.com
   *
   */
  
 @@ -22,18 +23,31 @@
  #include linux/proc_fs.h
  #include linux/seq_file.h
  
 +#define EFD_SHARED_FCNTL_FLAGS (O_CLOEXEC | O_NONBLOCK)
 +#define EFD_FLAGS_SET (EFD_SHARED_FCNTL_FLAGS | EFD_SEMAPHORE | EFD_MASK)
 +#define EFD_MASK_VALID_EVENTS (POLLIN | POLLPRI | POLLOUT | POLLERR | 
 POLLHUP)
 +
  struct eventfd_ctx {
   struct kref kref;
   wait_queue_head_t wqh;
 - /*
 -  * Every time that a write(2) is performed on an eventfd, the
 -  * value of the __u64 being written is added to count and a
 -  * wakeup is performed on wqh. A read(2) will return the count
 -  * value to userspace, and will reset count to zero. The kernel
 -  * side eventfd_signal() also, adds to the count counter and
 

Re: [PATCH v3 1/1] eventfd: implementation of EFD_MASK flag

2015-07-09 Thread Damian Hobson-Garcia
Hi Martin,

On 2015-07-09 5:41 PM, Martin Sustrik wrote:
 Hi Damian,
 
 Yes, this patch would be geneally useful for implementing stuff in user
 space that otherwise would have to live in kernelspace.
 
 Unfortunately, I have no cycles left to pursue getting it to the
 mainline. If you feel like you can take care of it, that would be great.
I'll see what I can do. I'll rebase it and resubmit it.
Unless there are major changes (which I highly doubt), I'll keep you as
the author, if that's ok with you.
 
 I can help with the documentation, if needed.
Thank you, that would be very much appreciated.

Damian
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html