Re: [tipc-discussion] [PATCH net-next v2 0/4] tipc: introduce multicast through replication
> -----Original Message-----
> From: Parthasarathy Bhuvaragan
> Sent: Friday, 11 November, 2016 06:56
> To: Jon Maloy; tipc-discussion@lists.sourceforge.net; Ying Xue
> Cc: ma...@donjonn.com; thompa@gmail.com
> Subject: Re: [PATCH net-next v2 0/4] tipc: introduce multicast through
> replication
>
> On 11/10/2016 05:08 PM, Jon Maloy wrote:
> >
> >> -----Original Message-----
> >> From: Parthasarathy Bhuvaragan
> >> Sent: Thursday, 10 November, 2016 10:50
> >> To: Jon Maloy; tipc-discussion@lists.sourceforge.net; Ying Xue
> >> Cc: ma...@donjonn.com; thompa@gmail.com
> >> Subject: Re: [PATCH net-next v2 0/4] tipc: introduce multicast through
> >> replication
> >>
> >> On 10/27/2016 04:35 PM, Jon Maloy wrote:
> >>> TIPC multicast messages are currently distributed via L2 broadcast
> >>> or IP multicast to all nodes in the cluster, irrespective of the
> >>> number of real destinations of the message.
> >>>
> >>> In this series we introduce an option to transport messages via
> >>> replication ("replicast") across a selected number of unicast links,
> >>> instead of relying on the underlying media. This option is used when
> >>> true broadcast/multicast is not supported by the media, or when the
> >>> number of true destinations is much smaller than the cluster size.
> >>>
> >>> v2: - Fixed a counter bug when removing nodes from destination node list
> >>>     - Moved definition of node destination list from to bcast.{h,c}
> >>>
> >>> Jon Maloy (4):
> >>>   tipc: add function for checking broadcast support in bearer
> >>>   tipc: add functionality to lookup multicast destination nodes
> >>>   tipc: introduce replicast as transport option for multicast
> >>>   tipc: make replicast a user selectable option
> >>>
> >>>  include/uapi/linux/tipc.h |   6 +-
> >>>  net/tipc/bcast.c          | 245 +-
> >>>  net/tipc/bcast.h          |  40 +++-
> >>>  net/tipc/bearer.c         |  15 ++-
> >>>  net/tipc/bearer.h         |   6 ++
> >>>  net/tipc/link.c           |  12 ++-
> >>>  net/tipc/msg.c            |  17
> >>>  net/tipc/msg.h            |   2 +
> >>>  net/tipc/name_table.c     |  33 +++
> >>>  net/tipc/name_table.h     |   4 +
> >>>  net/tipc/node.c           |  27 +++--
> >>>  net/tipc/node.h           |   4 +-
> >>>  net/tipc/socket.c         |  89 ++---
> >>>  net/tipc/udp_media.c      |   8 +-
> >>>  14 files changed, 424 insertions(+), 84 deletions(-)
> >>>
> >> [partha]
> >> I have a general concern that this design might not work for
> >> non-blocking sockets or blocking socket which set MSG_DONTWAIT flag.
> >>
> >> Consider that the user is using replicasting to 4 peers.
> >> For ex, in tipc_rcast_xmit() we manage to xmit to the first two peers
> >> successfully but the next peer(3) fails due to link congestion. Since
> >> this is a non blocking call we return EAGAIN to user.
> >> The subsequent retry from user will re-deliver the same message to the
> >> first two peers.
> >>
> >> The checks for the congestion are now based on the limits on unicast
> >> links. We will get into the above situation easily as the traffic
> >> pattern on all links are not the same.
> >>
> >> I think you will have a solution to this as always :-).
> >> [/partha]
> >
> > The solution is already there. This is why I have an "unsent" and a "sent"
> > queue in struct tipc_nlist.
> > When a message has been successfully sent to a node (or when an error
> > code other than -ELINKCONG is returned) the corresponding node item is
> > moved from the "unsent" to the "sent" queue, and will be disregarded at
> > next send attempt. But I now see that I have done a stupid mistake
> > during the last iteration of this code; I purge the destination list
> > before returning to the user, even when returning -EAGAIN. I'll fix that.
> >
> > You may also wonder why I have the two queues in struct tipc_nlist,
> > instead of just deleting the items for sent nodes. This is because this
> > list will be reused across different sending sessions in later commits.
> >
> > ///jon
> >
>
> [partha]
> Jon, the proposal works only for blocking sockets with default send
> timeout (MAX_SCHEDULE_TIMEOUT). When the socket transmits to a peer it's
> moved from unsent to sent and put to sleep if link is congested. Once
> the socket receives the wakeup message (congestion ceases), it
> continues from the list of unsent peers until the message is sent to all.
>
> Now, consider for the following three socket variants:
> a) blocking sockets with a fixed send timeout (say 100ms)
> b) blocking sockets which set MSG_DONTWAIT flag
> c) non-blocking sockets
>
> The struct tipc_nlist is stored in the stack of tipc_sendmcast(),
> implying that it's stateless between subsequent calls for the same
> socket. This leads to the following issue:

You are right. I realized that already when I had sent my response yesterday.
One obvious solution would be to allocate the list on the
Re: [tipc-discussion] [PATCH net-next v2 0/4] tipc: introduce multicast through replication
On 11/10/2016 05:08 PM, Jon Maloy wrote:
>
>> -----Original Message-----
>> From: Parthasarathy Bhuvaragan
>> Sent: Thursday, 10 November, 2016 10:50
>> To: Jon Maloy; tipc-discussion@lists.sourceforge.net; Ying Xue
>> Cc: ma...@donjonn.com; thompa@gmail.com
>> Subject: Re: [PATCH net-next v2 0/4] tipc: introduce multicast through
>> replication
>>
>> On 10/27/2016 04:35 PM, Jon Maloy wrote:
>>> TIPC multicast messages are currently distributed via L2 broadcast
>>> or IP multicast to all nodes in the cluster, irrespective of the
>>> number of real destinations of the message.
>>>
>>> In this series we introduce an option to transport messages via
>>> replication ("replicast") across a selected number of unicast links,
>>> instead of relying on the underlying media. This option is used when
>>> true broadcast/multicast is not supported by the media, or when the
>>> number of true destinations is much smaller than the cluster size.
>>>
>>> v2: - Fixed a counter bug when removing nodes from destination node list
>>>     - Moved definition of node destination list from to bcast.{h,c}
>>>
>>> Jon Maloy (4):
>>>   tipc: add function for checking broadcast support in bearer
>>>   tipc: add functionality to lookup multicast destination nodes
>>>   tipc: introduce replicast as transport option for multicast
>>>   tipc: make replicast a user selectable option
>>>
>>>  include/uapi/linux/tipc.h |   6 +-
>>>  net/tipc/bcast.c          | 245 +-
>>>  net/tipc/bcast.h          |  40 +++-
>>>  net/tipc/bearer.c         |  15 ++-
>>>  net/tipc/bearer.h         |   6 ++
>>>  net/tipc/link.c           |  12 ++-
>>>  net/tipc/msg.c            |  17
>>>  net/tipc/msg.h            |   2 +
>>>  net/tipc/name_table.c     |  33 +++
>>>  net/tipc/name_table.h     |   4 +
>>>  net/tipc/node.c           |  27 +++--
>>>  net/tipc/node.h           |   4 +-
>>>  net/tipc/socket.c         |  89 ++---
>>>  net/tipc/udp_media.c      |   8 +-
>>>  14 files changed, 424 insertions(+), 84 deletions(-)
>>>
>> [partha]
>> I have a general concern that this design might not work for
>> non-blocking sockets or blocking socket which set MSG_DONTWAIT flag.
>>
>> Consider that the user is using replicasting to 4 peers.
>> For ex, in tipc_rcast_xmit() we manage to xmit to the first two peers
>> successfully but the next peer(3) fails due to link congestion. Since
>> this is a non blocking call we return EAGAIN to user.
>> The subsequent retry from user will re-deliver the same message to the
>> first two peers.
>>
>> The checks for the congestion are now based on the limits on unicast
>> links. We will get into the above situation easily as the traffic
>> pattern on all links are not the same.
>>
>> I think you will have a solution to this as always :-).
>> [/partha]
>
> The solution is already there. This is why I have an "unsent" and a "sent"
> queue in struct tipc_nlist. When a message has been successfully sent to a
> node (or when an error code other than -ELINKCONG is returned) the
> corresponding node item is moved from the "unsent" to the "sent" queue, and
> will be disregarded at next send attempt. But I now see that I have done a
> stupid mistake during the last iteration of this code; I purge the
> destination list before returning to the user, even when returning -EAGAIN.
> I'll fix that.
>
> You may also wonder why I have the two queues in struct tipc_nlist, instead
> of just deleting the items for sent nodes. This is because this list will be
> reused across different sending sessions in later commits.
>
> ///jon
>
[partha]
Jon, the proposal works only for blocking sockets with default send
timeout (MAX_SCHEDULE_TIMEOUT). When the socket transmits to a peer it's
moved from unsent to sent and put to sleep if link is congested. Once
the socket receives the wakeup message (congestion ceases), it
continues from the list of unsent peers until the message is sent to all.
Now, consider for the following three socket variants:
a) blocking sockets with a fixed send timeout (say 100ms)
b) blocking sockets which set MSG_DONTWAIT flag
c) non-blocking sockets

The struct tipc_nlist is stored in the stack of tipc_sendmcast(),
implying that it's stateless between subsequent calls for the same
socket. This leads to the following issue:

1. In tipc_sendmcast(), when we send the message to partial recipients
   and experience link congestion, we try to sleep. However, the socket
   send timeout value is so small that we might return from
   tipc_sendmcast() before we get the wakeup message. Thus we managed
   to send only to a subset of the recipients. The replicast state
   tipc_nlist is destroyed as we exit tipc_sendmcast().
2. The application receives an EAGAIN and re-tries with the same
   message. This time tipc_sendmcast() is successful and the message
   is sent to all the replicast recipients. Now some of the recipients
   receive the same message o
Re: [tipc-discussion] [PATCH net-next v2 0/4] tipc: introduce multicast through replication
> -----Original Message-----
> From: Parthasarathy Bhuvaragan
> Sent: Thursday, 10 November, 2016 10:50
> To: Jon Maloy; tipc-discussion@lists.sourceforge.net; Ying Xue
> Cc: ma...@donjonn.com; thompa@gmail.com
> Subject: Re: [PATCH net-next v2 0/4] tipc: introduce multicast through
> replication
>
> On 10/27/2016 04:35 PM, Jon Maloy wrote:
> > TIPC multicast messages are currently distributed via L2 broadcast
> > or IP multicast to all nodes in the cluster, irrespective of the
> > number of real destinations of the message.
> >
> > In this series we introduce an option to transport messages via
> > replication ("replicast") across a selected number of unicast links,
> > instead of relying on the underlying media. This option is used when
> > true broadcast/multicast is not supported by the media, or when the
> > number of true destinations is much smaller than the cluster size.
> >
> > v2: - Fixed a counter bug when removing nodes from destination node list
> >     - Moved definition of node destination list from to bcast.{h,c}
> >
> > Jon Maloy (4):
> >   tipc: add function for checking broadcast support in bearer
> >   tipc: add functionality to lookup multicast destination nodes
> >   tipc: introduce replicast as transport option for multicast
> >   tipc: make replicast a user selectable option
> >
> >  include/uapi/linux/tipc.h |   6 +-
> >  net/tipc/bcast.c          | 245 +-
> >  net/tipc/bcast.h          |  40 +++-
> >  net/tipc/bearer.c         |  15 ++-
> >  net/tipc/bearer.h         |   6 ++
> >  net/tipc/link.c           |  12 ++-
> >  net/tipc/msg.c            |  17
> >  net/tipc/msg.h            |   2 +
> >  net/tipc/name_table.c     |  33 +++
> >  net/tipc/name_table.h     |   4 +
> >  net/tipc/node.c           |  27 +++--
> >  net/tipc/node.h           |   4 +-
> >  net/tipc/socket.c         |  89 ++---
> >  net/tipc/udp_media.c      |   8 +-
> >  14 files changed, 424 insertions(+), 84 deletions(-)
> >
> [partha]
> I have a general concern that this design might not work for
> non-blocking sockets or blocking socket which set MSG_DONTWAIT flag.
>
> Consider that the user is using replicasting to 4 peers.
> For ex, in tipc_rcast_xmit() we manage to xmit to the first two peers
> successfully but the next peer(3) fails due to link congestion. Since
> this is a non blocking call we return EAGAIN to user.
> The subsequent retry from user will re-deliver the same message to the
> first two peers.
>
> The checks for the congestion are now based on the limits on unicast
> links. We will get into the above situation easily as the traffic
> pattern on all links are not the same.
>
> I think you will have a solution to this as always :-).
> [/partha]

The solution is already there. This is why I have an "unsent" and a "sent"
queue in struct tipc_nlist. When a message has been successfully sent to a
node (or when an error code other than -ELINKCONG is returned) the
corresponding node item is moved from the "unsent" to the "sent" queue, and
will be disregarded at next send attempt. But I now see that I have done a
stupid mistake during the last iteration of this code; I purge the
destination list before returning to the user, even when returning -EAGAIN.
I'll fix that.

You may also wonder why I have the two queues in struct tipc_nlist, instead
of just deleting the items for sent nodes. This is because this list will be
reused across different sending sessions in later commits.

///jon
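The "unsent"/"sent" bookkeeping described in this reply can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the code from the patch series: struct nlist_model, rcast_xmit_model(), ELINKCONG_MODEL and the demo_* helpers are all hypothetical names, the queues are plain arrays instead of kernel lists, and the choice to keep trying the remaining nodes after one congested link is an assumption.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_DESTS 8
#define ELINKCONG_MODEL 1000	/* stand-in for the kernel-internal ELINKCONG */

/* Hypothetical model of the two queues in struct tipc_nlist. */
struct nlist_model {
	unsigned unsent[MAX_DESTS];	/* nodes still waiting for the message */
	size_t n_unsent;
	unsigned sent[MAX_DESTS];	/* nodes already handled */
	size_t n_sent;
};

typedef int (*xmit_fn)(unsigned node);

/*
 * Try every node on the unsent queue once. A node stays on that queue only
 * if its link reported congestion; success or any other error moves it to
 * the sent queue, so a retry with the same list skips it.
 */
int rcast_xmit_model(struct nlist_model *nl, xmit_fn xmit)
{
	size_t keep = 0;

	for (size_t i = 0; i < nl->n_unsent; i++) {
		unsigned node = nl->unsent[i];

		if (xmit(node) == -ELINKCONG_MODEL)
			nl->unsent[keep++] = node;
		else
			nl->sent[nl->n_sent++] = node;
	}
	nl->n_unsent = keep;
	return keep ? -EAGAIN : 0;
}

/* Demo link layer: one node is congested, the others count deliveries. */
unsigned demo_congested = 3;
unsigned demo_delivered[MAX_DESTS];

int demo_xmit(unsigned node)
{
	if (node == demo_congested)
		return -ELINKCONG_MODEL;
	demo_delivered[node]++;
	return 0;
}
```

With destinations 1 to 4 and node 3 congested, the first call returns -EAGAIN and leaves only node 3 on the unsent queue. A second call with the same list, once the congestion has ceased, delivers to node 3 alone. This only holds if the list outlives the first call, which is the purge bug acknowledged above.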
Re: [tipc-discussion] [PATCH net-next v2 0/4] tipc: introduce multicast through replication
On 10/27/2016 04:35 PM, Jon Maloy wrote:
> TIPC multicast messages are currently distributed via L2 broadcast
> or IP multicast to all nodes in the cluster, irrespective of the
> number of real destinations of the message.
>
> In this series we introduce an option to transport messages via
> replication ("replicast") across a selected number of unicast links,
> instead of relying on the underlying media. This option is used when
> true broadcast/multicast is not supported by the media, or when the
> number of true destinations is much smaller than the cluster size.
>
> v2: - Fixed a counter bug when removing nodes from destination node list
>     - Moved definition of node destination list from to bcast.{h,c}
>
> Jon Maloy (4):
>   tipc: add function for checking broadcast support in bearer
>   tipc: add functionality to lookup multicast destination nodes
>   tipc: introduce replicast as transport option for multicast
>   tipc: make replicast a user selectable option
>
>  include/uapi/linux/tipc.h |   6 +-
>  net/tipc/bcast.c          | 245 +-
>  net/tipc/bcast.h          |  40 +++-
>  net/tipc/bearer.c         |  15 ++-
>  net/tipc/bearer.h         |   6 ++
>  net/tipc/link.c           |  12 ++-
>  net/tipc/msg.c            |  17
>  net/tipc/msg.h            |   2 +
>  net/tipc/name_table.c     |  33 +++
>  net/tipc/name_table.h     |   4 +
>  net/tipc/node.c           |  27 +++--
>  net/tipc/node.h           |   4 +-
>  net/tipc/socket.c         |  89 ++---
>  net/tipc/udp_media.c      |   8 +-
>  14 files changed, 424 insertions(+), 84 deletions(-)
>
[partha]
I have a general concern that this design might not work for
non-blocking sockets or blocking socket which set MSG_DONTWAIT flag.

Consider that the user is using replicasting to 4 peers.
For ex, in tipc_rcast_xmit() we manage to xmit to the first two peers
successfully but the next peer(3) fails due to link congestion. Since
this is a non blocking call we return EAGAIN to user.
The subsequent retry from user will re-deliver the same message to the
first two peers.

The checks for the congestion are now based on the limits on unicast links.
We will get into the above situation easily as the traffic pattern on
all links are not the same.

I think you will have a solution to this as always :-).
[/partha]
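The duplicate-delivery scenario raised in this message can be reproduced with a toy model. It is a sketch only: struct peer_model and stateless_rcast() are hypothetical names, and stopping at the first congested peer is an assumption made to mirror the 4-peer example, not a statement about what tipc_rcast_xmit() does.

```c
#include <errno.h>
#include <stddef.h>

#define NPEERS 4

/* Per-peer state of a toy link layer. */
struct peer_model {
	int congested;	/* non-zero: the link refuses the message */
	int delivered;	/* number of copies this peer has received */
};

/*
 * Stateless replicast: every call walks the peers from the start, because
 * nothing records who already got the message on an earlier attempt.
 */
int stateless_rcast(struct peer_model *peers, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (peers[i].congested)
			return -EAGAIN;
		peers[i].delivered++;
	}
	return 0;
}
```

If the third peer is congested on the first call and the caller retries after -EAGAIN, the first two peers end up with two copies each while the last two receive one.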
[tipc-discussion] [PATCH net-next v2 0/4] tipc: introduce multicast through replication
TIPC multicast messages are currently distributed via L2 broadcast
or IP multicast to all nodes in the cluster, irrespective of the
number of real destinations of the message.

In this series we introduce an option to transport messages via
replication ("replicast") across a selected number of unicast links,
instead of relying on the underlying media. This option is used when
true broadcast/multicast is not supported by the media, or when the
number of true destinations is much smaller than the cluster size.

v2: - Fixed a counter bug when removing nodes from destination node list
    - Moved definition of node destination list from to bcast.{h,c}

Jon Maloy (4):
  tipc: add function for checking broadcast support in bearer
  tipc: add functionality to lookup multicast destination nodes
  tipc: introduce replicast as transport option for multicast
  tipc: make replicast a user selectable option

 include/uapi/linux/tipc.h |   6 +-
 net/tipc/bcast.c          | 245 +-
 net/tipc/bcast.h          |  40 +++-
 net/tipc/bearer.c         |  15 ++-
 net/tipc/bearer.h         |   6 ++
 net/tipc/link.c           |  12 ++-
 net/tipc/msg.c            |  17
 net/tipc/msg.h            |   2 +
 net/tipc/name_table.c     |  33 +++
 net/tipc/name_table.h     |   4 +
 net/tipc/node.c           |  27 +++--
 net/tipc/node.h           |   4 +-
 net/tipc/socket.c         |  89 ++---
 net/tipc/udp_media.c      |   8 +-
 14 files changed, 424 insertions(+), 84 deletions(-)

--
2.7.4
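The selection rule stated in the cover letter (replicast when the media lacks broadcast/multicast support, or when the destinations are few relative to the cluster) might look roughly like the sketch below. The cover letter gives no threshold, so RCAST_RATIO is a purely hypothetical value, and select_method() and enum mcast_method are illustrative names that do not come from the patches.

```c
#include <stdbool.h>

/*
 * Hypothetical threshold: prefer replicast when the destinations make up
 * at most 1/RCAST_RATIO of the cluster. Not taken from the patch series.
 */
#define RCAST_RATIO 4

enum mcast_method { MCAST_BROADCAST, MCAST_REPLICAST };

/* Pick a transport method for one multicast message. */
enum mcast_method select_method(bool bcast_supported, unsigned dests,
				unsigned cluster_size)
{
	/* No true broadcast/multicast on the media: replication is the only way. */
	if (!bcast_supported)
		return MCAST_REPLICAST;

	/* Few real destinations: avoid disturbing every node in the cluster. */
	if (dests * RCAST_RATIO <= cluster_size)
		return MCAST_REPLICAST;

	return MCAST_BROADCAST;
}
```

For a 16-node cluster this picks replicast for 2 destinations and broadcast for 8, and it always picks replicast on a bearer without broadcast support.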