Re: [PATCH v3] epoll: Support for disabling items, and a self-test app.

2012-11-01 Thread Paton J. Lewis

On 10/31/12 5:43 PM, Michael Wang wrote:

On 11/01/2012 02:57 AM, Paton J. Lewis wrote:

On 10/30/12 11:32 PM, Michael Wang wrote:

On 10/26/2012 08:08 AM, Paton J. Lewis wrote:

From: "Paton J. Lewis" 

It is not currently possible to reliably delete epoll items when
using the
same epoll set from multiple threads. After calling epoll_ctl with
EPOLL_CTL_DEL, another thread might still be executing code related
to an
event for that epoll item (in response to epoll_wait). Therefore the
deleting
thread does not know when it is safe to delete resources pertaining
to the
associated epoll item because another thread might be using those
resources.

The deleting thread could wait an arbitrary amount of time after calling
epoll_ctl with EPOLL_CTL_DEL and before deleting the item, but this is
inefficient and could result in the destruction of resources before
another
thread is done handling an event returned by epoll_wait.

This patch enhances epoll_ctl to support EPOLL_CTL_DISABLE, which
disables an
epoll item. If epoll_ctl returns -EBUSY in this case, then another
thread may
handling a return from epoll_wait for this item. Otherwise if epoll_ctl
returns 0, then it is safe to delete the epoll item. This allows
multiple
threads to use a mutex to determine when it is safe to delete an
epoll item
and its associated resources, which allows epoll items to be deleted
both
efficiently and without error in a multi-threaded environment. Note that
EPOLL_CTL_DISABLE is only useful in conjunction with EPOLLONESHOT,
and using
EPOLL_CTL_DISABLE on an epoll item without EPOLLONESHOT returns -EINVAL.

This patch also adds a new test_epoll self-test program to both
demonstrate
the need for this feature and test it.


Hi, Paton

I'm just think about may be we could use this way.

Seems like currently we are depending on the epoll_ctl() to indicate the
start point of safe section and epoll_wait() for the end point, like:

  while () {
  epoll_wait()--

  fd event arrivedsafe section

  clear fd epi->event.events
  --
  if (fd need stop)
  continue;
  --
  ...fd data process...

  epoll_ctl(MOD)  danger section

  set fd epi->event.events--

  continue;
  }

So we got a safe section and do delete work in this section won't cause
trouble since we have a stop check directly after it.

Actually what we want is to make sure no one will touch the fd any more
after we DISABLE it.

Then what about we add a ref count and a stop flag in epi, maintain it
like:

  epoll_wait()

  check user events and
  dec the ref count of fd ---

  ...

  fd event arrivedsafe sec if ref count is 0

  if epi stop flag set
  do nothing
  else
  inc epi ref count   ---


The pseudecode you provide below (for "DISABLE") seems to indicate that
this "epi ref count" must be maintained by the kernel. Therefore any
userspace modification of a ref count associated with an epoll item will
require a new or changed kernel API.


  send event

And what DISABLE do is:

  set epi stop flag

  if epi ref count is not 0
  wait until ref count be 0


Perhaps I don't fully understand what you're proposing, but I don't
think it's reasonable for a kernel API (epoll_ctl in this case) to block
while waiting for a userspace action (decrementing the ref count) that
might never occur.

Andrew Morton also proposed using ref counting in response to my initial
patch submission; my reply to his proposal might also be applicable to
your proposal. A link to that discussion thread:
http://thread.gmane.org/gmane.linux.kernel/1311457/focus=1315096

Sorry if I am misunderstanding your proposal, but I don't see how it
solves the original problem.


I just try to find out whether we could using DISABLE with out ONESHOT :)

My currently understanding is:

1. we actually want to determine the part between each epoll_wait() in a
while().

2. we can't count on epoll_wait() itself, since no info pass to kernel
to indicate whether it was invoked after another epoll_wait() in the
same while().

3. so we need epoll_ctl(MOD) to tell kernel: user finished process data
after epoll_wait(), and those data belong to which epi.

4. since 3 we need ONESHOT to be enabled.


Is this the reason we rely on ONESHOT to be enabled?


No, we need to use EPOLLONESHOT to ensure that only one worker thread at 
a time can ever be handling private data associated with a given fd. 
This constraint allows a deletion t

Re: [PATCH v3] epoll: Support for disabling items, and a self-test app.

2012-11-01 Thread Paton J. Lewis

On 10/31/12 5:43 PM, Michael Wang wrote:

On 11/01/2012 02:57 AM, Paton J. Lewis wrote:

On 10/30/12 11:32 PM, Michael Wang wrote:

On 10/26/2012 08:08 AM, Paton J. Lewis wrote:

From: Paton J. Lewis pale...@adobe.com

It is not currently possible to reliably delete epoll items when
using the
same epoll set from multiple threads. After calling epoll_ctl with
EPOLL_CTL_DEL, another thread might still be executing code related
to an
event for that epoll item (in response to epoll_wait). Therefore the
deleting
thread does not know when it is safe to delete resources pertaining
to the
associated epoll item because another thread might be using those
resources.

The deleting thread could wait an arbitrary amount of time after calling
epoll_ctl with EPOLL_CTL_DEL and before deleting the item, but this is
inefficient and could result in the destruction of resources before
another
thread is done handling an event returned by epoll_wait.

This patch enhances epoll_ctl to support EPOLL_CTL_DISABLE, which
disables an
epoll item. If epoll_ctl returns -EBUSY in this case, then another
thread may
handling a return from epoll_wait for this item. Otherwise if epoll_ctl
returns 0, then it is safe to delete the epoll item. This allows
multiple
threads to use a mutex to determine when it is safe to delete an
epoll item
and its associated resources, which allows epoll items to be deleted
both
efficiently and without error in a multi-threaded environment. Note that
EPOLL_CTL_DISABLE is only useful in conjunction with EPOLLONESHOT,
and using
EPOLL_CTL_DISABLE on an epoll item without EPOLLONESHOT returns -EINVAL.

This patch also adds a new test_epoll self-test program to both
demonstrate
the need for this feature and test it.


Hi, Paton

I'm just think about may be we could use this way.

Seems like currently we are depending on the epoll_ctl() to indicate the
start point of safe section and epoll_wait() for the end point, like:

  while () {
  epoll_wait()--

  fd event arrivedsafe section

  clear fd epi-event.events
  --
  if (fd need stop)
  continue;
  --
  ...fd data process...

  epoll_ctl(MOD)  danger section

  set fd epi-event.events--

  continue;
  }

So we got a safe section and do delete work in this section won't cause
trouble since we have a stop check directly after it.

Actually what we want is to make sure no one will touch the fd any more
after we DISABLE it.

Then what about we add a ref count and a stop flag in epi, maintain it
like:

  epoll_wait()

  check user events and
  dec the ref count of fd ---

  ...

  fd event arrivedsafe sec if ref count is 0

  if epi stop flag set
  do nothing
  else
  inc epi ref count   ---


The pseudecode you provide below (for DISABLE) seems to indicate that
this epi ref count must be maintained by the kernel. Therefore any
userspace modification of a ref count associated with an epoll item will
require a new or changed kernel API.


  send event

And what DISABLE do is:

  set epi stop flag

  if epi ref count is not 0
  wait until ref count be 0


Perhaps I don't fully understand what you're proposing, but I don't
think it's reasonable for a kernel API (epoll_ctl in this case) to block
while waiting for a userspace action (decrementing the ref count) that
might never occur.

Andrew Morton also proposed using ref counting in response to my initial
patch submission; my reply to his proposal might also be applicable to
your proposal. A link to that discussion thread:
http://thread.gmane.org/gmane.linux.kernel/1311457/focus=1315096

Sorry if I am misunderstanding your proposal, but I don't see how it
solves the original problem.


I just try to find out whether we could using DISABLE with out ONESHOT :)

My currently understanding is:

1. we actually want to determine the part between each epoll_wait() in a
while().

2. we can't count on epoll_wait() itself, since no info pass to kernel
to indicate whether it was invoked after another epoll_wait() in the
same while().

3. so we need epoll_ctl(MOD) to tell kernel: user finished process data
after epoll_wait(), and those data belong to which epi.

4. since 3 we need ONESHOT to be enabled.


Is this the reason we rely on ONESHOT to be enabled?


No, we need to use EPOLLONESHOT to ensure that only one worker thread at 
a time can ever be handling private data associated with a given fd. 
This constraint allows a deletion thread to coordinate

Re: [PATCH v3] epoll: Support for disabling items, and a self-test app.

2012-10-31 Thread Paton J. Lewis

On 10/30/12 11:32 PM, Michael Wang wrote:

On 10/26/2012 08:08 AM, Paton J. Lewis wrote:

From: "Paton J. Lewis" 

It is not currently possible to reliably delete epoll items when using the
same epoll set from multiple threads. After calling epoll_ctl with
EPOLL_CTL_DEL, another thread might still be executing code related to an
event for that epoll item (in response to epoll_wait). Therefore the deleting
thread does not know when it is safe to delete resources pertaining to the
associated epoll item because another thread might be using those resources.

The deleting thread could wait an arbitrary amount of time after calling
epoll_ctl with EPOLL_CTL_DEL and before deleting the item, but this is
inefficient and could result in the destruction of resources before another
thread is done handling an event returned by epoll_wait.

This patch enhances epoll_ctl to support EPOLL_CTL_DISABLE, which disables an
epoll item. If epoll_ctl returns -EBUSY in this case, then another thread may
handling a return from epoll_wait for this item. Otherwise if epoll_ctl
returns 0, then it is safe to delete the epoll item. This allows multiple
threads to use a mutex to determine when it is safe to delete an epoll item
and its associated resources, which allows epoll items to be deleted both
efficiently and without error in a multi-threaded environment. Note that
EPOLL_CTL_DISABLE is only useful in conjunction with EPOLLONESHOT, and using
EPOLL_CTL_DISABLE on an epoll item without EPOLLONESHOT returns -EINVAL.

This patch also adds a new test_epoll self-test program to both demonstrate
the need for this feature and test it.


Hi, Paton

I'm just think about may be we could use this way.

Seems like currently we are depending on the epoll_ctl() to indicate the
start point of safe section and epoll_wait() for the end point, like:

 while () {
 epoll_wait()--

 fd event arrivedsafe section

 clear fd epi->event.events
 --
 if (fd need stop)
 continue;
 --
 ...fd data process...

 epoll_ctl(MOD)  danger section

 set fd epi->event.events--

 continue;
 }

So we got a safe section and do delete work in this section won't cause
trouble since we have a stop check directly after it.

Actually what we want is to make sure no one will touch the fd any more
after we DISABLE it.

Then what about we add a ref count and a stop flag in epi, maintain it like:

 epoll_wait()

 check user events and
 dec the ref count of fd ---

 ...

 fd event arrivedsafe sec if ref count is 0

 if epi stop flag set
 do nothing
 else
 inc epi ref count   ---


The pseudecode you provide below (for "DISABLE") seems to indicate that 
this "epi ref count" must be maintained by the kernel. Therefore any 
userspace modification of a ref count associated with an epoll item will 
require a new or changed kernel API.



 send event

And what DISABLE do is:

 set epi stop flag

 if epi ref count is not 0
 wait until ref count be 0


Perhaps I don't fully understand what you're proposing, but I don't 
think it's reasonable for a kernel API (epoll_ctl in this case) to block 
while waiting for a userspace action (decrementing the ref count) that 
might never occur.


Andrew Morton also proposed using ref counting in response to my initial 
patch submission; my reply to his proposal might also be applicable to 
your proposal. A link to that discussion thread: 
http://thread.gmane.org/gmane.linux.kernel/1311457/focus=1315096


Sorry if I am misunderstanding your proposal, but I don't see how it 
solves the original problem.


Pat


So after DISABLE return, we can safely delete any thing related to that epi.

One thing is that the user should not change the events info returned by
epoll_wait().

It's just a propose, but if it works, there will be no limit on ONESHOT
any more ;-)

Regards,
Michael Wang



Signed-off-by: Paton J. Lewis 
---
  fs/eventpoll.c |   40 ++-
  include/linux/eventpoll.h  |1 +
  tools/testing/selftests/Makefile   |2 +-
  tools/testing/selftests/epoll/Makefile |   11 +
  tools/testing/selftests/epoll/test_epoll.c |  364 
  5 files changed, 414 insertions(+), 4 deletions(-)
  create mode 100644 tools/testing/selftests/epoll/Makefile
  create mode 100644 tools/testing/selftests/epoll/test_epoll.c

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
in

Re: [PATCH v3] epoll: Support for disabling items, and a self-test app.

2012-10-31 Thread Paton J. Lewis

On 10/30/12 11:32 PM, Michael Wang wrote:

On 10/26/2012 08:08 AM, Paton J. Lewis wrote:

From: Paton J. Lewis pale...@adobe.com

It is not currently possible to reliably delete epoll items when using the
same epoll set from multiple threads. After calling epoll_ctl with
EPOLL_CTL_DEL, another thread might still be executing code related to an
event for that epoll item (in response to epoll_wait). Therefore the deleting
thread does not know when it is safe to delete resources pertaining to the
associated epoll item because another thread might be using those resources.

The deleting thread could wait an arbitrary amount of time after calling
epoll_ctl with EPOLL_CTL_DEL and before deleting the item, but this is
inefficient and could result in the destruction of resources before another
thread is done handling an event returned by epoll_wait.

This patch enhances epoll_ctl to support EPOLL_CTL_DISABLE, which disables an
epoll item. If epoll_ctl returns -EBUSY in this case, then another thread may
handling a return from epoll_wait for this item. Otherwise if epoll_ctl
returns 0, then it is safe to delete the epoll item. This allows multiple
threads to use a mutex to determine when it is safe to delete an epoll item
and its associated resources, which allows epoll items to be deleted both
efficiently and without error in a multi-threaded environment. Note that
EPOLL_CTL_DISABLE is only useful in conjunction with EPOLLONESHOT, and using
EPOLL_CTL_DISABLE on an epoll item without EPOLLONESHOT returns -EINVAL.

This patch also adds a new test_epoll self-test program to both demonstrate
the need for this feature and test it.


Hi, Paton

I'm just think about may be we could use this way.

Seems like currently we are depending on the epoll_ctl() to indicate the
start point of safe section and epoll_wait() for the end point, like:

 while () {
 epoll_wait()--

 fd event arrivedsafe section

 clear fd epi-event.events
 --
 if (fd need stop)
 continue;
 --
 ...fd data process...

 epoll_ctl(MOD)  danger section

 set fd epi-event.events--

 continue;
 }

So we got a safe section and do delete work in this section won't cause
trouble since we have a stop check directly after it.

Actually what we want is to make sure no one will touch the fd any more
after we DISABLE it.

Then what about we add a ref count and a stop flag in epi, maintain it like:

 epoll_wait()

 check user events and
 dec the ref count of fd ---

 ...

 fd event arrivedsafe sec if ref count is 0

 if epi stop flag set
 do nothing
 else
 inc epi ref count   ---


The pseudecode you provide below (for DISABLE) seems to indicate that 
this epi ref count must be maintained by the kernel. Therefore any 
userspace modification of a ref count associated with an epoll item will 
require a new or changed kernel API.



 send event

And what DISABLE do is:

 set epi stop flag

 if epi ref count is not 0
 wait until ref count be 0


Perhaps I don't fully understand what you're proposing, but I don't 
think it's reasonable for a kernel API (epoll_ctl in this case) to block 
while waiting for a userspace action (decrementing the ref count) that 
might never occur.


Andrew Morton also proposed using ref counting in response to my initial 
patch submission; my reply to his proposal might also be applicable to 
your proposal. A link to that discussion thread: 
http://thread.gmane.org/gmane.linux.kernel/1311457/focus=1315096


Sorry if I am misunderstanding your proposal, but I don't see how it 
solves the original problem.


Pat


So after DISABLE return, we can safely delete any thing related to that epi.

One thing is that the user should not change the events info returned by
epoll_wait().

It's just a propose, but if it works, there will be no limit on ONESHOT
any more ;-)

Regards,
Michael Wang



Signed-off-by: Paton J. Lewis pale...@adobe.com
---
  fs/eventpoll.c |   40 ++-
  include/linux/eventpoll.h  |1 +
  tools/testing/selftests/Makefile   |2 +-
  tools/testing/selftests/epoll/Makefile |   11 +
  tools/testing/selftests/epoll/test_epoll.c |  364 
  5 files changed, 414 insertions(+), 4 deletions(-)
  create mode 100644 tools/testing/selftests/epoll/Makefile
  create mode 100644 tools/testing/selftests/epoll/test_epoll.c

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 739b098..c718afd

Re: [PATCH v2] epoll: Support for disabling items, and a self-test app.

2012-10-25 Thread Paton J. Lewis

On 10/23/12 12:15 PM, Andreas Jaeger wrote:

On 10/23/2012 07:23 PM, Paton J. Lewis wrote:

[Re-sending without HTML formatting; all vger.kernel.org destination
addresses bounced my original response.]

On 10/16/12 8:12 AM, Michael Kerrisk (man-pages) wrote:

[CC += linux-api@]


Thank you; is this sufficient to coordinate the required changes to the
glibc version of epoll.h?


Paton, we normally review the diffs between kernel versions but noticing
us via libc-alpha is great.

So, you ask to get this added to ?

#define EPOLL_CTL_DISABLE 4


Yes, that's correct (assuming that v3 the patch proposal is approved).


Once 3.7 is out and contains it, we will add it. A friendly reminder
once the patch is in would be nice so that we don't miss it during the
review.

thanks,
Andreas



Thank you,
Pat
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v3] epoll: Support for disabling items, and a self-test app.

2012-10-25 Thread Paton J. Lewis
From: "Paton J. Lewis" 

It is not currently possible to reliably delete epoll items when using the
same epoll set from multiple threads. After calling epoll_ctl with
EPOLL_CTL_DEL, another thread might still be executing code related to an
event for that epoll item (in response to epoll_wait). Therefore the deleting
thread does not know when it is safe to delete resources pertaining to the
associated epoll item because another thread might be using those resources.

The deleting thread could wait an arbitrary amount of time after calling
epoll_ctl with EPOLL_CTL_DEL and before deleting the item, but this is
inefficient and could result in the destruction of resources before another
thread is done handling an event returned by epoll_wait.

This patch enhances epoll_ctl to support EPOLL_CTL_DISABLE, which disables an
epoll item. If epoll_ctl returns -EBUSY in this case, then another thread may
handling a return from epoll_wait for this item. Otherwise if epoll_ctl
returns 0, then it is safe to delete the epoll item. This allows multiple
threads to use a mutex to determine when it is safe to delete an epoll item
and its associated resources, which allows epoll items to be deleted both
efficiently and without error in a multi-threaded environment. Note that
EPOLL_CTL_DISABLE is only useful in conjunction with EPOLLONESHOT, and using
EPOLL_CTL_DISABLE on an epoll item without EPOLLONESHOT returns -EINVAL.

This patch also adds a new test_epoll self-test program to both demonstrate
the need for this feature and test it.

Signed-off-by: Paton J. Lewis 
---
 fs/eventpoll.c |   40 ++-
 include/linux/eventpoll.h  |1 +
 tools/testing/selftests/Makefile   |2 +-
 tools/testing/selftests/epoll/Makefile |   11 +
 tools/testing/selftests/epoll/test_epoll.c |  364 
 5 files changed, 414 insertions(+), 4 deletions(-)
 create mode 100644 tools/testing/selftests/epoll/Makefile
 create mode 100644 tools/testing/selftests/epoll/test_epoll.c

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 739b098..c718afd 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -339,7 +339,7 @@ static inline struct epitem 
*ep_item_from_epqueue(poll_table *p)
 /* Tells if the epoll_ctl(2) operation needs an event copy from userspace */
 static inline int ep_op_has_event(int op)
 {
-   return op != EPOLL_CTL_DEL;
+   return op == EPOLL_CTL_ADD || op == EPOLL_CTL_MOD;
 }
 
 /* Initialize the poll safe wake up structure */
@@ -664,6 +664,36 @@ static int ep_remove(struct eventpoll *ep, struct epitem 
*epi)
return 0;
 }
 
+/*
+ * Disables a "struct epitem" in the eventpoll set. Returns -EBUSY if the item
+ * had no event flags set, indicating that another thread may be currently
+ * handling that item's events (in the case that EPOLLONESHOT was being
+ * used). Otherwise a zero result indicates that the item has been disabled
+ * from receiving events. A disabled item may be re-enabled via
+ * EPOLL_CTL_MOD. Must be called with "mtx" held.
+ */
+static int ep_disable(struct eventpoll *ep, struct epitem *epi)
+{
+   int result = 0;
+   unsigned long flags;
+
+   spin_lock_irqsave(>lock, flags);
+   if (epi->event.events & EPOLLONESHOT) {
+   if (epi->event.events & ~EP_PRIVATE_BITS) {
+   if (ep_is_linked(>rdllink))
+   list_del_init(>rdllink);
+   /* Ensure ep_poll_callback will not add epi back onto
+  ready list: */
+   epi->event.events &= EP_PRIVATE_BITS;
+   } else
+   result = -EBUSY;
+   } else
+   result = -EINVAL;
+   spin_unlock_irqrestore(>lock, flags);
+
+   return result;
+}
+
 static void ep_free(struct eventpoll *ep)
 {
struct rb_node *rbp;
@@ -996,8 +1026,6 @@ static void ep_rbtree_insert(struct eventpoll *ep, struct 
epitem *epi)
rb_insert_color(>rbn, >rbr);
 }
 
-
-
 #define PATH_ARR_SIZE 5
 /*
  * These are the number paths of length 1 to 5, that we are allowing to emanate
@@ -1701,6 +1729,12 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
} else
error = -ENOENT;
break;
+   case EPOLL_CTL_DISABLE:
+   if (epi)
+   error = ep_disable(ep, epi);
+   else
+   error = -ENOENT;
+   break;
}
mutex_unlock(>mtx);
 
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
index 657ab55..e91f7e3 100644
--- a/include/linux/eventpoll.h
+++ b/include/linux/eventpoll.h
@@ -25,6 +25,7 @@
 #define EPOLL_CTL_ADD 1
 #define EPOLL_CTL_DEL 2
 #define EPOLL_CTL_MOD 3
+#define EPOLL_CTL_DISABLE 4
 
 /* Set the One Shot behaviour for the target file descriptor */
 #define EPOLLONESHOT (1 <&l

Re: [PATCH v2] epoll: Support for disabling items, and a self-test app.

2012-10-25 Thread Paton J. Lewis

On 10/25/12 3:23 AM, Michael Kerrisk (man-pages) wrote:

Hi Pat,



I suppose that I have a concern that goes in the other direction. Is
there not some other solution possible that doesn't require the use of
EPOLLONESHOT? It seems overly restrictive to require that the caller
must employ this flag, and imposes the burden that the caller must
re-enable monitoring after each event.

Does a solution like the following (with no requirement for EPOLLONESHOT)
work?

0. Implement an epoll_ctl() operation EPOLL_CTL_XXX
where the name XXX might be chosen based on the decision
in 4(a).
1. EPOLL_CTL_XXX employs a private flag, EPOLLUSED, in the
per-fd events mask in the ready list. By default,
that flag is off.
2. epoll_wait() always clears the EPOLLUSED flag if a
file descriptor is found to be ready.
3. If an epoll_ctl(EPOLL_CTL_XXX) discovers that the EPOLLUSED
flag is NOT set, then
 a) it sets the EPOLLUSED flag
 b) It disables I/O events (as per EPOLL_CTL_DISABLE)
(I'm not 100% sure if this is necesary).
 c) it returns EBUSY to the caller
4. If an epoll_ctl(EPOLL_CTL_XXX) discovers that the EPOLLUSED
flag IS set, then it
 a) either deletes the fd or disables events for the fd
(the choice here is a matter of design taste, I think;
deletion has the virtue of simplicity; disabling provides
the option to re-enable the fd later, if desired)
 b) returns 0 to the caller.

All of the above with suitable locking around the user-space cache.

Cheers,

Michael



I don't believe that proposal will solve the problem. Consider the case
where a worker thread has just executed epoll_wait and is about to execute
the next line of code (which will access the data associated with the fd
receiving the event). If the deletion thread manages to call
epoll_ctl(EPOLL_CTL_XXX) for that fd twice in a row before the worker thread
is able to execute the next statement, then the deletion thread will
mistakenly conclude that it is safe to destroy the data that the worker
thread is about to access.


Okay -- I had the idea there might be a hole in my proposal ;-).

By the way, have you been reading the comments in the two LWN articles
on EPOLL_CTL_DISABLE?
https://lwn.net/Articles/520012/
http://lwn.net/SubscriberLink/520198/fd81ba0ecb1858a2/

There's some interesting proposals there--some suggesting that an
entirely user-space solution might be possible. I haven't looked
deeply into the ideas though.


Yes, thanks, I read through the article and comments. I believe all of 
the objections raised there were either addressed by responses there, or 
they were also voiced here on the kernel.org mailing lists and addressed 
by either my or Paul Holland's responses here. If there is another 
objection to the original proposal that you feel I have overlooked or 
which has not been properly addressed, please let me know.


Pat
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] epoll: Support for disabling items, and a self-test app.

2012-10-25 Thread Paton J. Lewis

On 10/25/12 3:23 AM, Michael Kerrisk (man-pages) wrote:

Hi Pat,



I suppose that I have a concern that goes in the other direction. Is
there not some other solution possible that doesn't require the use of
EPOLLONESHOT? It seems overly restrictive to require that the caller
must employ this flag, and imposes the burden that the caller must
re-enable monitoring after each event.

Does a solution like the following (with no requirement for EPOLLONESHOT)
work?

0. Implement an epoll_ctl() operation EPOLL_CTL_XXX
where the name XXX might be chosen based on the decision
in 4(a).
1. EPOLL_CTL_XXX employs a private flag, EPOLLUSED, in the
per-fd events mask in the ready list. By default,
that flag is off.
2. epoll_wait() always clears the EPOLLUSED flag if a
file descriptor is found to be ready.
3. If an epoll_ctl(EPOLL_CTL_XXX) discovers that the EPOLLUSED
flag is NOT set, then
 a) it sets the EPOLLUSED flag
 b) It disables I/O events (as per EPOLL_CTL_DISABLE)
(I'm not 100% sure if this is necesary).
 c) it returns EBUSY to the caller
4. If an epoll_ctl(EPOLL_CTL_XXX) discovers that the EPOLLUSED
flag IS set, then it
 a) either deletes the fd or disables events for the fd
(the choice here is a matter of design taste, I think;
deletion has the virtue of simplicity; disabling provides
the option to re-enable the fd later, if desired)
 b) returns 0 to the caller.

All of the above with suitable locking around the user-space cache.

Cheers,

Michael



I don't believe that proposal will solve the problem. Consider the case
where a worker thread has just executed epoll_wait and is about to execute
the next line of code (which will access the data associated with the fd
receiving the event). If the deletion thread manages to call
epoll_ctl(EPOLL_CTL_XXX) for that fd twice in a row before the worker thread
is able to execute the next statement, then the deletion thread will
mistakenly conclude that it is safe to destroy the data that the worker
thread is about to access.


Okay -- I had the idea there might be a hole in my proposal ;-).

By the way, have you been reading the comments in the two LWN articles
on EPOLL_CTL_DISABLE?
https://lwn.net/Articles/520012/
http://lwn.net/SubscriberLink/520198/fd81ba0ecb1858a2/

There's some interesting proposals there--some suggesting that an
entirely user-space solution might be possible. I haven't looked
deeply into the ideas though.


Yes, thanks, I read through the article and comments. I believe all of 
the objections raised there were either addressed by responses there, or 
they were also voiced here on the kernel.org mailing lists and addressed 
by either my or Paul Holland's responses here. If there is another 
objection to the original proposal that you feel I have overlooked or 
which has not been properly addressed, please let me know.


Pat
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v3] epoll: Support for disabling items, and a self-test app.

2012-10-25 Thread Paton J. Lewis
From: Paton J. Lewis pale...@adobe.com

It is not currently possible to reliably delete epoll items when using the
same epoll set from multiple threads. After calling epoll_ctl with
EPOLL_CTL_DEL, another thread might still be executing code related to an
event for that epoll item (in response to epoll_wait). Therefore the deleting
thread does not know when it is safe to delete resources pertaining to the
associated epoll item because another thread might be using those resources.

The deleting thread could wait an arbitrary amount of time after calling
epoll_ctl with EPOLL_CTL_DEL and before deleting the item, but this is
inefficient and could result in the destruction of resources before another
thread is done handling an event returned by epoll_wait.

This patch enhances epoll_ctl to support EPOLL_CTL_DISABLE, which disables an
epoll item. If epoll_ctl returns -EBUSY in this case, then another thread may
handling a return from epoll_wait for this item. Otherwise if epoll_ctl
returns 0, then it is safe to delete the epoll item. This allows multiple
threads to use a mutex to determine when it is safe to delete an epoll item
and its associated resources, which allows epoll items to be deleted both
efficiently and without error in a multi-threaded environment. Note that
EPOLL_CTL_DISABLE is only useful in conjunction with EPOLLONESHOT, and using
EPOLL_CTL_DISABLE on an epoll item without EPOLLONESHOT returns -EINVAL.

This patch also adds a new test_epoll self-test program to both demonstrate
the need for this feature and test it.

Signed-off-by: Paton J. Lewis pale...@adobe.com
---
 fs/eventpoll.c |   40 ++-
 include/linux/eventpoll.h  |1 +
 tools/testing/selftests/Makefile   |2 +-
 tools/testing/selftests/epoll/Makefile |   11 +
 tools/testing/selftests/epoll/test_epoll.c |  364 
 5 files changed, 414 insertions(+), 4 deletions(-)
 create mode 100644 tools/testing/selftests/epoll/Makefile
 create mode 100644 tools/testing/selftests/epoll/test_epoll.c

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 739b098..c718afd 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -339,7 +339,7 @@ static inline struct epitem 
*ep_item_from_epqueue(poll_table *p)
 /* Tells if the epoll_ctl(2) operation needs an event copy from userspace */
 static inline int ep_op_has_event(int op)
 {
-   return op != EPOLL_CTL_DEL;
+   return op == EPOLL_CTL_ADD || op == EPOLL_CTL_MOD;
 }
 
 /* Initialize the poll safe wake up structure */
@@ -664,6 +664,36 @@ static int ep_remove(struct eventpoll *ep, struct epitem 
*epi)
return 0;
 }
 
+/*
+ * Disables a struct epitem in the eventpoll set. Returns -EBUSY if the item
+ * had no event flags set, indicating that another thread may be currently
+ * handling that item's events (in the case that EPOLLONESHOT was being
+ * used). Otherwise a zero result indicates that the item has been disabled
+ * from receiving events. A disabled item may be re-enabled via
+ * EPOLL_CTL_MOD. Must be called with mtx held.
+ */
+static int ep_disable(struct eventpoll *ep, struct epitem *epi)
+{
+   int result = 0;
+   unsigned long flags;
+
+   spin_lock_irqsave(ep-lock, flags);
+   if (epi-event.events  EPOLLONESHOT) {
+   if (epi-event.events  ~EP_PRIVATE_BITS) {
+   if (ep_is_linked(epi-rdllink))
+   list_del_init(epi-rdllink);
+   /* Ensure ep_poll_callback will not add epi back onto
+  ready list: */
+   epi-event.events = EP_PRIVATE_BITS;
+   } else
+   result = -EBUSY;
+   } else
+   result = -EINVAL;
+   spin_unlock_irqrestore(ep-lock, flags);
+
+   return result;
+}
+
 static void ep_free(struct eventpoll *ep)
 {
struct rb_node *rbp;
@@ -996,8 +1026,6 @@ static void ep_rbtree_insert(struct eventpoll *ep, struct 
epitem *epi)
rb_insert_color(epi-rbn, ep-rbr);
 }
 
-
-
 #define PATH_ARR_SIZE 5
 /*
  * These are the number paths of length 1 to 5, that we are allowing to emanate
@@ -1701,6 +1729,12 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
} else
error = -ENOENT;
break;
+   case EPOLL_CTL_DISABLE:
+   if (epi)
+   error = ep_disable(ep, epi);
+   else
+   error = -ENOENT;
+   break;
}
mutex_unlock(ep-mtx);
 
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
index 657ab55..e91f7e3 100644
--- a/include/linux/eventpoll.h
+++ b/include/linux/eventpoll.h
@@ -25,6 +25,7 @@
 #define EPOLL_CTL_ADD 1
 #define EPOLL_CTL_DEL 2
 #define EPOLL_CTL_MOD 3
+#define EPOLL_CTL_DISABLE 4
 
 /* Set the One Shot behaviour for the target file descriptor */
 #define EPOLLONESHOT (1  30)
diff --git a/tools/testing/selftests

Re: [PATCH v2] epoll: Support for disabling items, and a self-test app.

2012-10-25 Thread Paton J. Lewis

On 10/23/12 12:15 PM, Andreas Jaeger wrote:

On 10/23/2012 07:23 PM, Paton J. Lewis wrote:

[Re-sending without HTML formatting; all vger.kernel.org destination
addresses bounced my original response.]

On 10/16/12 8:12 AM, Michael Kerrisk (man-pages) wrote:

[CC += linux-api@]


Thank you; is this sufficient to coordinate the required changes to the
glibc version of epoll.h?


Paton, we normally review the diffs between kernel versions but noticing
us via libc-alpha is great.

So, you ask to get this added to sys/epoll.h?

#define EPOLL_CTL_DISABLE 4


Yes, that's correct (assuming that v3 the patch proposal is approved).


Once 3.7 is out and contains it, we will add it. A friendly reminder
once the patch is in would be nice so that we don't miss it during the
review.

thanks,
Andreas



Thank you,
Pat
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2] epoll: Support for disabling items, and a self-test app.

2012-10-23 Thread Paton J. Lewis

On 10/23/12 6:26 AM, Michael Kerrisk (man-pages) wrote:

On 10/23/12 10:23 AM, Paton J. Lewis wrote:

[Re-sending without HTML formatting; all vger.kernel.org destination
addresses bounced my original response.]

On 10/16/12 8:12 AM, Michael Kerrisk (man-pages) wrote:

[CC += linux-api@]


Thank you; is this sufficient to coordinate the required changes to the
glibc version of epoll.h?


I'm not sure. With a bit of luck, someone at glibc might monitor that list.


Hello Paton,

On Thu, Aug 23, 2012 at 11:15 PM, Paton J. Lewis 
wrote:

From: "Paton J. Lewis" 

Enhanced epoll_ctl to support EPOLL_CTL_DISABLE, which disables an
epoll item.
If epoll_ctl doesn't return -EBUSY in this case, it is then safe to
delete the
epoll item in a multi-threaded environment. Also added a new
test_epoll self-
test app to both demonstrate the need for this feature and test it.


(There's a lot of background missing from this version of the patch
that was included in the previous version
[http://thread.gmane.org/gmane.linux.kernel/1311457]. It helps to
include the full rationale with a revised patch--best not to assume
that someone has the context of past mails when reading a revised
patch.)


I was trying to present the same information in a more concise manner. I
was hoping that the test application would provide a clearer description
of the problem and the solution. However, I can include some of my
original prose from the v1 email in the next revision's email if you
think that would be helpful.


Yes, it would be a good idea.


I've taken a look at this patch as it currently stands in 3.7-rc1, and
done a bit of testing. (By the way, the test program
tools/testing/selftests/epoll/test_epoll.c does not compile...)


Sorry, the test program compiled against commit
ecca5c3acc0d0933d89abc44e60afb0cc8170e35 from
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git, which
I believe was the current head commit when I emailed the v2 patch in
August. I thought that git's format-patch command would include the
necessary information so people could see which commit the diff was
created from. Should I be pulling from a different repository or working
on a different branch? Since this is my first patch submission and I
have essentially zero experience with git, I would appreciate any
guidance you could provide here.


I'm not expert enough to provide guidance, unfortunately. I'd just
make sure that the program gets updated with any future patch
revision.

I should have added that the changes to fix the program were fairly
trivial (but they involved a misnamed variable--it looks like you did
a scan renaming a variable but didn't catch all instances, so I'm not
sure how the test program could have ever compiled in the form that
was committed), and I did get it going.


Sorry, that error arose from my lack of experience with git and its 
format-patch command. I created the patch without first committing the 
final fix to my branch.



There are one or two places where the behavior seems a little strange,
so I have a question or two at the end of this mail. But other than
that, I want to check my understanding so that the interface can be
correctly documented.

Just to go though my understanding, the problem is the following
scenario in a multithreaded application:

1. Multiple threads are performing epoll_wait() operations,
and maintaining a user-space cache that contains information
corresponding to each file descriptor being monitored by
epoll_wait().

2. At some point, a thread wants to delete (EPOLL_CTL_DEL)
a file descriptor from the epoll interest list, and
delete the corresponding record from the user-space cache.

3. The problem with (2) is that some other thread may have
previously done an epoll_wait() that retrieved information
about the fd in question, and may be in the middle of using
information in the cache that relates to that fd. Thus,
there is a potential race.

4. The race can't solved purely in user space, because doing
so would require applying a mutex across the epoll_wait()
call, which would of course blow thread concurrency.

Right?


Yes, that is an accurate description of the problem.


Okay.


Your solution is the EPOLL_CTL_DISABLE operation. I want to
confirm my understanding about how to use this flag, since
the description that has accompanied the patches so far
has been a bit sparse

0. In the scenario you're concerned about, deleting a file
descriptor means (safely) doing the following:
(a) Deleting the file descriptor from the epoll interest list
using EPOLL_CTL_DEL
(b) Deleting the corresponding record in the user-space cache

1. It's only meaningful to use this EPOLL_CTL_DISABLE in
conjunction with EPOLLONESHOT.

2. Using EPOLL_CTL_DISABLE without using EPOLLONESHOT in
conjunction is a logical error.

3. The correct way to code multithreaded applications using
EPOLL_CTL_DISABLE and EPOLLONESHOT is 

Re: [PATCH v2] epoll: Support for disabling items, and a self-test app.

2012-10-23 Thread Paton J. Lewis
[Re-sending without HTML formatting; all vger.kernel.org destination 
addresses bounced my original response.]


On 10/16/12 8:12 AM, Michael Kerrisk (man-pages) wrote:

[CC += linux-api@]


Thank you; is this sufficient to coordinate the required changes to the 
glibc version of epoll.h?



Hello Paton,

On Thu, Aug 23, 2012 at 11:15 PM, Paton J. Lewis  wrote:

From: "Paton J. Lewis" 

Enhanced epoll_ctl to support EPOLL_CTL_DISABLE, which disables an epoll item.
If epoll_ctl doesn't return -EBUSY in this case, it is then safe to delete the
epoll item in a multi-threaded environment. Also added a new test_epoll self-
test app to both demonstrate the need for this feature and test it.


(There's a lot of background missing from this version of the patch
that was included in the previous version
[http://thread.gmane.org/gmane.linux.kernel/1311457]. It helps to
include the full rationale with a revised patch--best not to assume
that someone has the context of past mails when reading a revised
patch.)


I was trying to present the same information in a more concise manner. I 
was hoping that the test application would provide a clearer description 
of the problem and the solution. However, I can include some of my 
original prose from the v1 email in the next revision's email if you 
think that would be helpful.



I've taken a look at this patch as it currently stands in 3.7-rc1, and
done a bit of testing. (By the way, the test program
tools/testing/selftests/epoll/test_epoll.c does not compile...)


Sorry, the test program compiled against commit 
ecca5c3acc0d0933d89abc44e60afb0cc8170e35 from 
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git, which 
I believe was the current head commit when I emailed the v2 patch in 
August. I thought that git's format-patch command would include the 
necessary information so people could see which commit the diff was 
created from. Should I be pulling from a different repository or working 
on a different branch? Since this is my first patch submission and I 
have essentially zero experience with git, I would appreciate any 
guidance you could provide here.



There are one or two places where the behavior seems a little strange,
so I have a question or two at the end of this mail. But other than
that, I want to check my understanding so that the interface can be
correctly documented.

Just to go though my understanding, the problem is the following
scenario in a multithreaded application:

1. Multiple threads are performing epoll_wait() operations,
and maintaining a user-space cache that contains information
corresponding to each file descriptor being monitored by
epoll_wait().

2. At some point, a thread wants to delete (EPOLL_CTL_DEL)
a file descriptor from the epoll interest list, and
delete the corresponding record from the user-space cache.

3. The problem with (2) is that some other thread may have
previously done an epoll_wait() that retrieved information
about the fd in question, and may be in the middle of using
information in the cache that relates to that fd. Thus,
there is a potential race.

4. The race can't solved purely in user space, because doing
so would require applying a mutex across the epoll_wait()
call, which would of course blow thread concurrency.

Right?


Yes, that is an accurate description of the problem.


Your solution is the EPOLL_CTL_DISABLE operation. I want to
confirm my understanding about how to use this flag, since
the description that has accompanied the patches so far
has been a bit sparse

0. In the scenario you're concerned about, deleting a file
descriptor means (safely) doing the following:
(a) Deleting the file descriptor from the epoll interest list
using EPOLL_CTL_DEL
(b) Deleting the corresponding record in the user-space cache

1. It's only meaningful to use this EPOLL_CTL_DISABLE in
conjunction with EPOLLONESHOT.

2. Using EPOLL_CTL_DISABLE without using EPOLLONESHOT in
conjunction is a logical error.

3. The correct way to code multithreaded applications using
EPOLL_CTL_DISABLE and EPOLLONESHOT is as follows:

a. All EPOLL_CTL_ADD and EPOLL_CTL_MOD operations should
   should EPOLLONESHOT.

b. When a thread wants to delete a file descriptor, it
   should do the following:

   [1] Call epoll_ctl(EPOLL_CTL_DISABLE)
   [2] If the return status from epoll_ctl(EPOLL_CTL_DISABLE)
   was zero, then the file descriptor can be safely
   deleted by the thread that made this call.
   [3] If the epoll_ctl(EPOLL_CTL_DISABLE) fails with EBUSY,
   then the descriptor is in use. In this case, the calling
   thread should set a flag in the user-space cache to
   indicate that the thread that is using the descriptor
   should perform the deletion operation.

Is all of the above correct?


Yes, that is correct.


The implementation depends on checking on wheth

Re: [PATCH v2] epoll: Support for disabling items, and a self-test app.

2012-10-23 Thread Paton J. Lewis
[Re-sending without HTML formatting; all vger.kernel.org destination 
addresses bounced my original response.]


On 10/16/12 8:12 AM, Michael Kerrisk (man-pages) wrote:

[CC += linux-api@]


Thank you; is this sufficient to coordinate the required changes to the 
glibc version of epoll.h?



Hello Paton,

On Thu, Aug 23, 2012 at 11:15 PM, Paton J. Lewis pale...@adobe.com wrote:

From: Paton J. Lewis pale...@adobe.com

Enhanced epoll_ctl to support EPOLL_CTL_DISABLE, which disables an epoll item.
If epoll_ctl doesn't return -EBUSY in this case, it is then safe to delete the
epoll item in a multi-threaded environment. Also added a new test_epoll self-
test app to both demonstrate the need for this feature and test it.


(There's a lot of background missing from this version of the patch
that was included in the previous version
[http://thread.gmane.org/gmane.linux.kernel/1311457]. It helps to
include the full rationale with a revised patch--best not to assume
that someone has the context of past mails when reading a revised
patch.)


I was trying to present the same information in a more concise manner. I 
was hoping that the test application would provide a clearer description 
of the problem and the solution. However, I can include some of my 
original prose from the v1 email in the next revision's email if you 
think that would be helpful.



I've taken a look at this patch as it currently stands in 3.7-rc1, and
done a bit of testing. (By the way, the test program
tools/testing/selftests/epoll/test_epoll.c does not compile...)


Sorry, the test program compiled against commit 
ecca5c3acc0d0933d89abc44e60afb0cc8170e35 from 
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git, which 
I believe was the current head commit when I emailed the v2 patch in 
August. I thought that git's format-patch command would include the 
necessary information so people could see which commit the diff was 
created from. Should I be pulling from a different repository or working 
on a different branch? Since this is my first patch submission and I 
have essentially zero experience with git, I would appreciate any 
guidance you could provide here.



There are one or two places where the behavior seems a little strange,
so I have a question or two at the end of this mail. But other than
that, I want to check my understanding so that the interface can be
correctly documented.

Just to go though my understanding, the problem is the following
scenario in a multithreaded application:

1. Multiple threads are performing epoll_wait() operations,
and maintaining a user-space cache that contains information
corresponding to each file descriptor being monitored by
epoll_wait().

2. At some point, a thread wants to delete (EPOLL_CTL_DEL)
a file descriptor from the epoll interest list, and
delete the corresponding record from the user-space cache.

3. The problem with (2) is that some other thread may have
previously done an epoll_wait() that retrieved information
about the fd in question, and may be in the middle of using
information in the cache that relates to that fd. Thus,
there is a potential race.

4. The race can't solved purely in user space, because doing
so would require applying a mutex across the epoll_wait()
call, which would of course blow thread concurrency.

Right?


Yes, that is an accurate description of the problem.


Your solution is the EPOLL_CTL_DISABLE operation. I want to
confirm my understanding about how to use this flag, since
the description that has accompanied the patches so far
has been a bit sparse

0. In the scenario you're concerned about, deleting a file
descriptor means (safely) doing the following:
(a) Deleting the file descriptor from the epoll interest list
using EPOLL_CTL_DEL
(b) Deleting the corresponding record in the user-space cache

1. It's only meaningful to use this EPOLL_CTL_DISABLE in
conjunction with EPOLLONESHOT.

2. Using EPOLL_CTL_DISABLE without using EPOLLONESHOT in
conjunction is a logical error.

3. The correct way to code multithreaded applications using
EPOLL_CTL_DISABLE and EPOLLONESHOT is as follows:

a. All EPOLL_CTL_ADD and EPOLL_CTL_MOD operations should
   should EPOLLONESHOT.

b. When a thread wants to delete a file descriptor, it
   should do the following:

   [1] Call epoll_ctl(EPOLL_CTL_DISABLE)
   [2] If the return status from epoll_ctl(EPOLL_CTL_DISABLE)
   was zero, then the file descriptor can be safely
   deleted by the thread that made this call.
   [3] If the epoll_ctl(EPOLL_CTL_DISABLE) fails with EBUSY,
   then the descriptor is in use. In this case, the calling
   thread should set a flag in the user-space cache to
   indicate that the thread that is using the descriptor
   should perform the deletion operation.

Is all of the above correct?


Yes, that is correct.


The implementation depends

Re: [PATCH v2] epoll: Support for disabling items, and a self-test app.

2012-10-23 Thread Paton J. Lewis

On 10/23/12 6:26 AM, Michael Kerrisk (man-pages) wrote:

On 10/23/12 10:23 AM, Paton J. Lewis wrote:

[Re-sending without HTML formatting; all vger.kernel.org destination
addresses bounced my original response.]

On 10/16/12 8:12 AM, Michael Kerrisk (man-pages) wrote:

[CC += linux-api@]


Thank you; is this sufficient to coordinate the required changes to the
glibc version of epoll.h?


I'm not sure. With a bit of luck, someone at glibc might monitor that list.


Hello Paton,

On Thu, Aug 23, 2012 at 11:15 PM, Paton J. Lewis pale...@adobe.com
wrote:

From: Paton J. Lewis pale...@adobe.com

Enhanced epoll_ctl to support EPOLL_CTL_DISABLE, which disables an
epoll item.
If epoll_ctl doesn't return -EBUSY in this case, it is then safe to
delete the
epoll item in a multi-threaded environment. Also added a new
test_epoll self-
test app to both demonstrate the need for this feature and test it.


(There's a lot of background missing from this version of the patch
that was included in the previous version
[http://thread.gmane.org/gmane.linux.kernel/1311457]. It helps to
include the full rationale with a revised patch--best not to assume
that someone has the context of past mails when reading a revised
patch.)


I was trying to present the same information in a more concise manner. I
was hoping that the test application would provide a clearer description
of the problem and the solution. However, I can include some of my
original prose from the v1 email in the next revision's email if you
think that would be helpful.


Yes, it would be a good idea.


I've taken a look at this patch as it currently stands in 3.7-rc1, and
done a bit of testing. (By the way, the test program
tools/testing/selftests/epoll/test_epoll.c does not compile...)


Sorry, the test program compiled against commit
ecca5c3acc0d0933d89abc44e60afb0cc8170e35 from
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git, which
I believe was the current head commit when I emailed the v2 patch in
August. I thought that git's format-patch command would include the
necessary information so people could see which commit the diff was
created from. Should I be pulling from a different repository or working
on a different branch? Since this is my first patch submission and I
have essentially zero experience with git, I would appreciate any
guidance you could provide here.


I'm not expert enough to provide guidance, unfortunately. I'd just
make sure that the program gets updated with any future patch
revision.

I should have added that the changes to fix the program were fairly
trivial (but they involved a misnamed variable--it looks like you did
a scan renaming a variable but didn't catch all instances, so I'm not
sure how the test program could have ever compiled in the form that
was committed), and I did get it going.


Sorry, that error arose from my lack of experience with git and its 
format-patch command. I created the patch without first committing the 
final fix to my branch.



There are one or two places where the behavior seems a little strange,
so I have a question or two at the end of this mail. But other than
that, I want to check my understanding so that the interface can be
correctly documented.

Just to go though my understanding, the problem is the following
scenario in a multithreaded application:

1. Multiple threads are performing epoll_wait() operations,
and maintaining a user-space cache that contains information
corresponding to each file descriptor being monitored by
epoll_wait().

2. At some point, a thread wants to delete (EPOLL_CTL_DEL)
a file descriptor from the epoll interest list, and
delete the corresponding record from the user-space cache.

3. The problem with (2) is that some other thread may have
previously done an epoll_wait() that retrieved information
about the fd in question, and may be in the middle of using
information in the cache that relates to that fd. Thus,
there is a potential race.

4. The race can't solved purely in user space, because doing
so would require applying a mutex across the epoll_wait()
call, which would of course blow thread concurrency.

Right?


Yes, that is an accurate description of the problem.


Okay.


Your solution is the EPOLL_CTL_DISABLE operation. I want to
confirm my understanding about how to use this flag, since
the description that has accompanied the patches so far
has been a bit sparse

0. In the scenario you're concerned about, deleting a file
descriptor means (safely) doing the following:
(a) Deleting the file descriptor from the epoll interest list
using EPOLL_CTL_DEL
(b) Deleting the corresponding record in the user-space cache

1. It's only meaningful to use this EPOLL_CTL_DISABLE in
conjunction with EPOLLONESHOT.

2. Using EPOLL_CTL_DISABLE without using EPOLLONESHOT in
conjunction is a logical error.

3. The correct way to code multithreaded applications using
EPOLL_CTL_DISABLE

[PATCH v2] epoll: Support for disabling items, and a self-test app.

2012-08-23 Thread Paton J. Lewis
From: "Paton J. Lewis" 

Enhanced epoll_ctl to support EPOLL_CTL_DISABLE, which disables an epoll item.
If epoll_ctl doesn't return -EBUSY in this case, it is then safe to delete the
epoll item in a multi-threaded environment. Also added a new test_epoll self-
test app to both demonstrate the need for this feature and test it.

Signed-off-by: Paton J. Lewis 
---
 fs/eventpoll.c |   38 ++-
 include/linux/eventpoll.h  |1 +
 tools/testing/selftests/Makefile   |2 +-
 tools/testing/selftests/epoll/Makefile |   11 +
 tools/testing/selftests/epoll/test_epoll.c |  344 
 5 files changed, 392 insertions(+), 4 deletions(-)
 create mode 100644 tools/testing/selftests/epoll/Makefile
 create mode 100644 tools/testing/selftests/epoll/test_epoll.c

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 739b098..1c47bbd 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -339,7 +339,7 @@ static inline struct epitem 
*ep_item_from_epqueue(poll_table *p)
 /* Tells if the epoll_ctl(2) operation needs an event copy from userspace */
 static inline int ep_op_has_event(int op)
 {
-   return op != EPOLL_CTL_DEL;
+   return op == EPOLL_CTL_ADD || op == EPOLL_CTL_MOD;
 }
 
 /* Initialize the poll safe wake up structure */
@@ -664,6 +664,34 @@ static int ep_remove(struct eventpoll *ep, struct epitem 
*epi)
return 0;
 }
 
+/*
+ * Disables a "struct epitem" in the eventpoll set. Returns -EBUSY if the item
+ * had no event flags set, indicating that another thread may be currently
+ * handling that item's events (in the case that EPOLLONESHOT was being
+ * used). Otherwise a zero result indicates that the item has been disabled
+ * from receiving events. A disabled item may be re-enabled via
+ * EPOLL_CTL_MOD. Must be called with "mtx" held.
+ */
+static int ep_disable(struct eventpoll *ep, struct epitem *epi)
+{
+   int result = 0;
+   unsigned long flags;
+
+   spin_lock_irqsave(>lock, flags);
+   if (epi->event.events & ~EP_PRIVATE_BITS) {
+   if (ep_is_linked(>rdllink))
+   list_del_init(>rdllink);
+   /* Ensure ep_poll_callback will not add epi back onto ready
+  list: */
+   epi->event.events &= EP_PRIVATE_BITS;
+   }
+   else
+   result = -EBUSY;
+   spin_unlock_irqrestore(>lock, flags);
+
+   return result;
+}
+
 static void ep_free(struct eventpoll *ep)
 {
struct rb_node *rbp;
@@ -996,8 +1024,6 @@ static void ep_rbtree_insert(struct eventpoll *ep, struct 
epitem *epi)
rb_insert_color(>rbn, >rbr);
 }
 
-
-
 #define PATH_ARR_SIZE 5
 /*
  * These are the number paths of length 1 to 5, that we are allowing to emanate
@@ -1701,6 +1727,12 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
} else
error = -ENOENT;
break;
+   case EPOLL_CTL_DISABLE:
+   if (epi)
+   error = ep_disable(ep, epi);
+   else
+   error = -ENOENT;
+   break;
}
mutex_unlock(>mtx);
 
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
index 657ab55..e91f7e3 100644
--- a/include/linux/eventpoll.h
+++ b/include/linux/eventpoll.h
@@ -25,6 +25,7 @@
 #define EPOLL_CTL_ADD 1
 #define EPOLL_CTL_DEL 2
 #define EPOLL_CTL_MOD 3
+#define EPOLL_CTL_DISABLE 4
 
 /* Set the One Shot behaviour for the target file descriptor */
 #define EPOLLONESHOT (1 << 30)
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 28bc57e..4cf477f 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -1,4 +1,4 @@
-TARGETS = breakpoints vm
+TARGETS = breakpoints epoll vm
 
 all:
for TARGET in $(TARGETS); do \
diff --git a/tools/testing/selftests/epoll/Makefile 
b/tools/testing/selftests/epoll/Makefile
new file mode 100644
index 000..19806ed
--- /dev/null
+++ b/tools/testing/selftests/epoll/Makefile
@@ -0,0 +1,11 @@
+# Makefile for epoll selftests
+
+all: test_epoll
+%: %.c
+   gcc -pthread -g -o $@ $^
+
+run_tests: all
+   ./test_epoll
+
+clean:
+   $(RM) test_epoll
diff --git a/tools/testing/selftests/epoll/test_epoll.c 
b/tools/testing/selftests/epoll/test_epoll.c
new file mode 100644
index 000..cbd8f8b
--- /dev/null
+++ b/tools/testing/selftests/epoll/test_epoll.c
@@ -0,0 +1,344 @@
+/*
+ *  tools/testing/selftests/epoll/test_epoll.c
+ *
+ *  Copyright 2012 Adobe Systems Incorporated
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  Paton J. Lewis 
+ *
+ */
+
+#include 
+#include 
+#include 
+#i

[PATCH v2] epoll: Support for disabling items, and a self-test app.

2012-08-23 Thread Paton J. Lewis
From: Paton J. Lewis pale...@adobe.com

Enhanced epoll_ctl to support EPOLL_CTL_DISABLE, which disables an epoll item.
If epoll_ctl doesn't return -EBUSY in this case, it is then safe to delete the
epoll item in a multi-threaded environment. Also added a new test_epoll self-
test app to both demonstrate the need for this feature and test it.

Signed-off-by: Paton J. Lewis pale...@adobe.com
---
 fs/eventpoll.c |   38 ++-
 include/linux/eventpoll.h  |1 +
 tools/testing/selftests/Makefile   |2 +-
 tools/testing/selftests/epoll/Makefile |   11 +
 tools/testing/selftests/epoll/test_epoll.c |  344 
 5 files changed, 392 insertions(+), 4 deletions(-)
 create mode 100644 tools/testing/selftests/epoll/Makefile
 create mode 100644 tools/testing/selftests/epoll/test_epoll.c

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 739b098..1c47bbd 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -339,7 +339,7 @@ static inline struct epitem 
*ep_item_from_epqueue(poll_table *p)
 /* Tells if the epoll_ctl(2) operation needs an event copy from userspace */
 static inline int ep_op_has_event(int op)
 {
-   return op != EPOLL_CTL_DEL;
+   return op == EPOLL_CTL_ADD || op == EPOLL_CTL_MOD;
 }
 
 /* Initialize the poll safe wake up structure */
@@ -664,6 +664,34 @@ static int ep_remove(struct eventpoll *ep, struct epitem 
*epi)
return 0;
 }
 
+/*
+ * Disables a struct epitem in the eventpoll set. Returns -EBUSY if the item
+ * had no event flags set, indicating that another thread may be currently
+ * handling that item's events (in the case that EPOLLONESHOT was being
+ * used). Otherwise a zero result indicates that the item has been disabled
+ * from receiving events. A disabled item may be re-enabled via
+ * EPOLL_CTL_MOD. Must be called with mtx held.
+ */
+static int ep_disable(struct eventpoll *ep, struct epitem *epi)
+{
+   int result = 0;
+   unsigned long flags;
+
+   spin_lock_irqsave(ep-lock, flags);
+   if (epi-event.events  ~EP_PRIVATE_BITS) {
+   if (ep_is_linked(epi-rdllink))
+   list_del_init(epi-rdllink);
+   /* Ensure ep_poll_callback will not add epi back onto ready
+  list: */
+   epi-event.events = EP_PRIVATE_BITS;
+   }
+   else
+   result = -EBUSY;
+   spin_unlock_irqrestore(ep-lock, flags);
+
+   return result;
+}
+
 static void ep_free(struct eventpoll *ep)
 {
struct rb_node *rbp;
@@ -996,8 +1024,6 @@ static void ep_rbtree_insert(struct eventpoll *ep, struct 
epitem *epi)
rb_insert_color(epi-rbn, ep-rbr);
 }
 
-
-
 #define PATH_ARR_SIZE 5
 /*
  * These are the number paths of length 1 to 5, that we are allowing to emanate
@@ -1701,6 +1727,12 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
} else
error = -ENOENT;
break;
+   case EPOLL_CTL_DISABLE:
+   if (epi)
+   error = ep_disable(ep, epi);
+   else
+   error = -ENOENT;
+   break;
}
mutex_unlock(ep-mtx);
 
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
index 657ab55..e91f7e3 100644
--- a/include/linux/eventpoll.h
+++ b/include/linux/eventpoll.h
@@ -25,6 +25,7 @@
 #define EPOLL_CTL_ADD 1
 #define EPOLL_CTL_DEL 2
 #define EPOLL_CTL_MOD 3
+#define EPOLL_CTL_DISABLE 4
 
 /* Set the One Shot behaviour for the target file descriptor */
 #define EPOLLONESHOT (1  30)
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 28bc57e..4cf477f 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -1,4 +1,4 @@
-TARGETS = breakpoints vm
+TARGETS = breakpoints epoll vm
 
 all:
for TARGET in $(TARGETS); do \
diff --git a/tools/testing/selftests/epoll/Makefile 
b/tools/testing/selftests/epoll/Makefile
new file mode 100644
index 000..19806ed
--- /dev/null
+++ b/tools/testing/selftests/epoll/Makefile
@@ -0,0 +1,11 @@
+# Makefile for epoll selftests
+
+all: test_epoll
+%: %.c
+   gcc -pthread -g -o $@ $^
+
+run_tests: all
+   ./test_epoll
+
+clean:
+   $(RM) test_epoll
diff --git a/tools/testing/selftests/epoll/test_epoll.c 
b/tools/testing/selftests/epoll/test_epoll.c
new file mode 100644
index 000..cbd8f8b
--- /dev/null
+++ b/tools/testing/selftests/epoll/test_epoll.c
@@ -0,0 +1,344 @@
+/*
+ *  tools/testing/selftests/epoll/test_epoll.c
+ *
+ *  Copyright 2012 Adobe Systems Incorporated
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  Paton J. Lewis pale...@adobe.com
+ *
+ */
+
+#include errno.h
+#include fcntl.h
+#include pthread.h

Re: [PATCH] epoll: Improved support for multi-threaded clients

2012-08-14 Thread Paton J. Lewis

At 8/14/2012 01:21 PM, Christof Meerwald wrote:

Hi Paton,

On Thu, Aug 02, 2012 at 06:37:06PM -0700, Paton J. Lewis wrote:
[...]
> My first concern is about code clarity. Using a custom event to
> delete an event type (either EPOLLIN or EPOLLOUT) from an epoll item
> requires that functionality to be split across two areas of code:
> the code that requests the deletion (via the call to epoll_ctl), and
> the code that responds to it (via epoll_wait).

But don't you have a similar problem in your proposal as well as you
might get an EBUSY when trying to disabling the item - in which case
you would have to do the deletion in the epoll_wait loop.


Good point.


> However, my main concern is about performance. Handling a custom
> event means that each return from epoll_wait requires the responding
> thread to check for possible custom events, which in the case of
> deletion is going to be relatively rare. Thus code which was once
> purely concerned with responding to I/O events must now spend a
> fraction of its time testing for exceptional conditions. In
> addition, handling deletion in this manner now requires a thread or
> context switch.

But in your initial proposal you also had the code checking for
deletion in the epoll_wait loop.


Also true. However, I believe the context switch is always required 
by the custom message passing technique, but will not always happen 
when using the event disabling technique.




> Given the drawbacks listed above, and the kernel design philosophy
> of only implementing what is actually needed, I would argue for
> sticking with the original EPOLL_CTL_DISABLE proposal for now.

I have finally had some chance to play around with your patch a bit
and I really think that you don't want to check for
ep_is_linked(>rdllink) in ep_disable as I don't see that this
would provide any useful semantics with respect to race-conditions.
I.e. consider the point in the epoll_wait loop just after you have
re-enabled to item - in this case ep_disable would (almost certainly)
return EBUSY, but there is no guarantee that epoll_wait will be woken
up on the next iteration.

As I mentioned, I think it would be much more useful to check for
"epi->event.events & ~EP_PRIVATE_BITS" instead which I believe would
provide more useful semantics.


You are correct. Thanks for being patient and persistent here. I 
discovered this problem myself last week during testing, and I am 
planning to post a v2 patch proposal that includes this fix.


I am also working on an epoll self-test as Andrew Morton suggested. 
I'm going to finish that before re-submitting the EPOLL_CTL_DISABLE 
patch to help reduce the possibility that the v2 patch still contains bugs.


Pat



Christof

--

http://cmeerw.org  sip:cmeerw at cmeerw.org
mailto:cmeerw at cmeerw.org   xmpp:cmeerw at cmeerw.org


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] epoll: Improved support for multi-threaded clients

2012-08-14 Thread Paton J. Lewis

At 8/14/2012 01:21 PM, Christof Meerwald wrote:

Hi Paton,

On Thu, Aug 02, 2012 at 06:37:06PM -0700, Paton J. Lewis wrote:
[...]
 My first concern is about code clarity. Using a custom event to
 delete an event type (either EPOLLIN or EPOLLOUT) from an epoll item
 requires that functionality to be split across two areas of code:
 the code that requests the deletion (via the call to epoll_ctl), and
 the code that responds to it (via epoll_wait).

But don't you have a similar problem in your proposal as well as you
might get an EBUSY when trying to disabling the item - in which case
you would have to do the deletion in the epoll_wait loop.


Good point.


 However, my main concern is about performance. Handling a custom
 event means that each return from epoll_wait requires the responding
 thread to check for possible custom events, which in the case of
 deletion is going to be relatively rare. Thus code which was once
 purely concerned with responding to I/O events must now spend a
 fraction of its time testing for exceptional conditions. In
 addition, handling deletion in this manner now requires a thread or
 context switch.

But in your initial proposal you also had the code checking for
deletion in the epoll_wait loop.


Also true. However, I believe the context switch is always required 
by the custom message passing technique, but will not always happen 
when using the event disabling technique.




 Given the drawbacks listed above, and the kernel design philosophy
 of only implementing what is actually needed, I would argue for
 sticking with the original EPOLL_CTL_DISABLE proposal for now.

I have finally had some chance to play around with your patch a bit
and I really think that you don't want to check for
ep_is_linked(epi-rdllink) in ep_disable as I don't see that this
would provide any useful semantics with respect to race-conditions.
I.e. consider the point in the epoll_wait loop just after you have
re-enabled to item - in this case ep_disable would (almost certainly)
return EBUSY, but there is no guarantee that epoll_wait will be woken
up on the next iteration.

As I mentioned, I think it would be much more useful to check for
epi-event.events  ~EP_PRIVATE_BITS instead which I believe would
provide more useful semantics.


You are correct. Thanks for being patient and persistent here. I 
discovered this problem myself last week during testing, and I am 
planning to post a v2 patch proposal that includes this fix.


I am also working on an epoll self-test as Andrew Morton suggested. 
I'm going to finish that before re-submitting the EPOLL_CTL_DISABLE 
patch to help reduce the possibility that the v2 patch still contains bugs.


Pat



Christof

--

http://cmeerw.org  sip:cmeerw at cmeerw.org
mailto:cmeerw at cmeerw.org   xmpp:cmeerw at cmeerw.org


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] epoll: Improved support for multi-threaded clients

2012-08-02 Thread Paton J. Lewis

Christof,

I notice that Windows (via I/O Completion Ports) and both BSD and 
OS/X (via kqueue) all appear to have support for both of the concepts 
we have been discussing: 1) the ability to disable epoll items, and 
2) the ability to send custom events. This suggests that either 
solution may be acceptable to a broader community.


I implemented and tested the method of using a new custom epoll event 
as a way to signal that an epoll item should be deleted. Although 
this alternative technique works, the approach has several drawbacks 
that I feel make it less desirable than the original proposal.


My first concern is about code clarity. Using a custom event to 
delete an event type (either EPOLLIN or EPOLLOUT) from an epoll item 
requires that functionality to be split across two areas of code: the 
code that requests the deletion (via the call to epoll_ctl), and the 
code that responds to it (via epoll_wait).


However, my main concern is about performance. Handling a custom 
event means that each return from epoll_wait requires the responding 
thread to check for possible custom events, which in the case of 
deletion is going to be relatively rare. Thus code which was once 
purely concerned with responding to I/O events must now spend a 
fraction of its time testing for exceptional conditions. In addition, 
handling deletion in this manner now requires a thread or context switch.


Given the drawbacks listed above, and the kernel design philosophy of 
only implementing what is actually needed, I would argue for sticking 
with the original EPOLL_CTL_DISABLE proposal for now.


Pat

At 6/29/2012 02:43 PM, Paton J. Lewis wrote:

At 6/19/2012 11:17 AM, Christof Meerwald wrote:

Hi Paton,

On Mon, Jun 18, 2012 at 04:24:35PM -0700, Paton J. Lewis wrote:
> We believe that EPOLLONESHOT is required in order to make any
> sensible use of calling epoll_wait on a single epoll set
> concurrently in multiple threads.

I guess we have to disagree here - though it might be more difficult.


> >On Mon, 11 Jun 2012 15:34:49 -0700, Paton Lewis wrote:
> >> This patch introduces the new epoll_ctl command 
EPOLL_CTL_DISABLE, which

> >> disables the associated epoll item and returns -EBUSY if the
> >epoll item is not
> >> currently in the epoll ready queue. This allows multiple 
threads to use a

> >> mutex to determine when it is safe to delete an epoll item and
> >its associated
> >> resources. This allows epoll items to be deleted and closed 
efficiently and

> >> without error.

Maybe I am missing something here (as I am not really familiar with
the kernel internals), but I don't really understand the logic behind
your patch. Isn't the "expected" case that the item is not on the
ready list and no I/O is being processed for that item?


Consider the case where we want to have a set of threads waiting for 
'write' events on a set of pipes or sockets. We have no control over 
when code on the other side of a pipe or socket might write into it, 
and so have no control over when one of the threads calling 
epoll_wait will receive events relative to the timing of the thread 
that is attempting to cancel the pending read operation.


I believe there is no "expected" case, because the probability for 
an item to be on the ready list is a complex function of the number 
of file descriptors being monitored, the frequency at which those 
descriptors receive events, the number of threads calling 
epoll_wait, and the complexity of the code responding to events.


Therefore for some clients of epoll it will be the case that items 
will often be on the ready list, and for others it will not.



So I think instead of checking for the item being on the ready list,
checking for the event mask would make more sense for me, e.g.

  if (!(epi->event.events & ~EP_PRIVATE_BITS))


I think that would be fine. However, the inlined function 
ep_is_linked boils down to a call to the inline function list_empty, 
which is just a single comparison. I haven't compared the 
disassembly, but I would expect the two methods to be roughly 
equivalent in terms of performance. Given that the operation of 
deleting an epoll item is likely to be an exceptional circumstance 
and therefore not performance-critical, would it be better to test 
against ep_is_linked for clarity's sake in the code?


However, I believe these discussions are rendered moot by your 
suggestion below:




But, taking one step back - wouldn't an alternative approach be to add
some mechanism to allow a thread to post a user-event for an fd? So in
delete_epoll_item you would post a user event (e.g. EPOLLUSER) for the
fd which you can then handle in your epoll_wait processing thread -
with no additional synchronisation necessary.


I think this is an excellent suggestion, and in fact your proposal 
is more similar to what Windows provides when solving this problem. 
I'll

Re: [PATCH] epoll: Improved support for multi-threaded clients

2012-08-02 Thread Paton J. Lewis

Christof,

I notice that Windows (via I/O Completion Ports) and both BSD and 
OS/X (via kqueue) all appear to have support for both of the concepts 
we have been discussing: 1) the ability to disable epoll items, and 
2) the ability to send custom events. This suggests that either 
solution may be acceptable to a broader community.


I implemented and tested the method of using a new custom epoll event 
as a way to signal that an epoll item should be deleted. Although 
this alternative technique works, the approach has several drawbacks 
that I feel make it less desirable than the original proposal.


My first concern is about code clarity. Using a custom event to 
delete an event type (either EPOLLIN or EPOLLOUT) from an epoll item 
requires that functionality to be split across two areas of code: the 
code that requests the deletion (via the call to epoll_ctl), and the 
code that responds to it (via epoll_wait).


However, my main concern is about performance. Handling a custom 
event means that each return from epoll_wait requires the responding 
thread to check for possible custom events, which in the case of 
deletion is going to be relatively rare. Thus code which was once 
purely concerned with responding to I/O events must now spend a 
fraction of its time testing for exceptional conditions. In addition, 
handling deletion in this manner now requires a thread or context switch.


Given the drawbacks listed above, and the kernel design philosophy of 
only implementing what is actually needed, I would argue for sticking 
with the original EPOLL_CTL_DISABLE proposal for now.


Pat

At 6/29/2012 02:43 PM, Paton J. Lewis wrote:

At 6/19/2012 11:17 AM, Christof Meerwald wrote:

Hi Paton,

On Mon, Jun 18, 2012 at 04:24:35PM -0700, Paton J. Lewis wrote:
 We believe that EPOLLONESHOT is required in order to make any
 sensible use of calling epoll_wait on a single epoll set
 concurrently in multiple threads.

I guess we have to disagree here - though it might be more difficult.


 On Mon, 11 Jun 2012 15:34:49 -0700, Paton Lewis wrote:
  This patch introduces the new epoll_ctl command 
EPOLL_CTL_DISABLE, which

  disables the associated epoll item and returns -EBUSY if the
 epoll item is not
  currently in the epoll ready queue. This allows multiple 
threads to use a

  mutex to determine when it is safe to delete an epoll item and
 its associated
  resources. This allows epoll items to be deleted and closed 
efficiently and

  without error.

Maybe I am missing something here (as I am not really familiar with
the kernel internals), but I don't really understand the logic behind
your patch. Isn't the expected case that the item is not on the
ready list and no I/O is being processed for that item?


Consider the case where we want to have a set of threads waiting for 
'write' events on a set of pipes or sockets. We have no control over 
when code on the other side of a pipe or socket might write into it, 
and so have no control over when one of the threads calling 
epoll_wait will receive events relative to the timing of the thread 
that is attempting to cancel the pending read operation.


I believe there is no expected case, because the probability for 
an item to be on the ready list is a complex function of the number 
of file descriptors being monitored, the frequency at which those 
descriptors receive events, the number of threads calling 
epoll_wait, and the complexity of the code responding to events.


Therefore for some clients of epoll it will be the case that items 
will often be on the ready list, and for others it will not.



So I think instead of checking for the item being on the ready list,
checking for the event mask would make more sense for me, e.g.

  if (!(epi-event.events  ~EP_PRIVATE_BITS))


I think that would be fine. However, the inlined function 
ep_is_linked boils down to a call to the inline function list_empty, 
which is just a single comparison. I haven't compared the 
disassembly, but I would expect the two methods to be roughly 
equivalent in terms of performance. Given that the operation of 
deleting an epoll item is likely to be an exceptional circumstance 
and therefore not performance-critical, would it be better to test 
against ep_is_linked for clarity's sake in the code?


However, I believe these discussions are rendered moot by your 
suggestion below:




But, taking one step back - wouldn't an alternative approach be to add
some mechanism to allow a thread to post a user-event for an fd? So in
delete_epoll_item you would post a user event (e.g. EPOLLUSER) for the
fd which you can then handle in your epoll_wait processing thread -
with no additional synchronisation necessary.


I think this is an excellent suggestion, and in fact your proposal 
is more similar to what Windows provides when solving this problem. 
I'll test this idea out with our code and get back to you. Is there 
an existing kernel technique that you would recommend for posting a 
user event