Re: [tipc-discussion] [PATCH net-next v2 0/4] tipc: introduce multicast through replication

2016-11-11 Thread Jon Maloy


> -----Original Message-----
> From: Parthasarathy Bhuvaragan
> Sent: Friday, 11 November, 2016 06:56
> To: Jon Maloy ; tipc-discussion@lists.sourceforge.net;
> Ying Xue 
> Cc: ma...@donjonn.com; thompa@gmail.com
> Subject: Re: [PATCH net-next v2 0/4] tipc: introduce multicast through 
> replication
> 
> On 11/10/2016 05:08 PM, Jon Maloy wrote:
> >
> >
> >> -----Original Message-----
> >> From: Parthasarathy Bhuvaragan
> >> Sent: Thursday, 10 November, 2016 10:50
> >> To: Jon Maloy ; tipc-discussion@lists.sourceforge.net;
> >> Ying Xue 
> >> Cc: ma...@donjonn.com; thompa@gmail.com
> >> Subject: Re: [PATCH net-next v2 0/4] tipc: introduce multicast through
> >> replication
> >>
> >> On 10/27/2016 04:35 PM, Jon Maloy wrote:
> >>> TIPC multicast messages are currently distributed via L2 broadcast
> >>> or IP multicast to all nodes in the cluster, irrespective of the
> >>> number of real destinations of the message.
> >>>
> >>> In this series we introduce an option to transport messages via
> >>> replication ("replicast") across a selected number of unicast links,
> >>> instead of relying on the underlying media. This option is used when
> >>> true broadcast/multicast is not supported by the media, or when the
> >>> number of true destinations is much smaller than the cluster size.
> >>>
> >>> v2: - Fixed a counter bug when removing nodes from the destination node list
> >>>     - Moved the definition of the node destination list to bcast.{h,c}
> >>>
> >>> Jon Maloy (4):
> >>>   tipc: add function for checking broadcast support in bearer
> >>>   tipc: add functionality to lookup multicast destination nodes
> >>>   tipc: introduce replicast as transport option for multicast
> >>>   tipc: make replicast a user selectable option
> >>>
> >>>  include/uapi/linux/tipc.h |   6 +-
> >>>  net/tipc/bcast.c  | 245 +-
> >>>  net/tipc/bcast.h  |  40 +++-
> >>>  net/tipc/bearer.c |  15 ++-
> >>>  net/tipc/bearer.h |   6 ++
> >>>  net/tipc/link.c   |  12 ++-
> >>>  net/tipc/msg.c  |  17
> >>>  net/tipc/msg.h|   2 +
> >>>  net/tipc/name_table.c |  33 +++
> >>>  net/tipc/name_table.h |   4 +
> >>>  net/tipc/node.c   |  27 +++--
> >>>  net/tipc/node.h   |   4 +-
> >>>  net/tipc/socket.c |  89 ++---
> >>>  net/tipc/udp_media.c  |   8 +-
> >>>  14 files changed, 424 insertions(+), 84 deletions(-)
> >>>
> >> [partha]
> >> I have a general concern that this design might not work for
> >> non-blocking sockets, or for blocking sockets which set the MSG_DONTWAIT flag.
> >>
> >> Consider that the user is replicasting to 4 peers.
> >> For example, in tipc_rcast_xmit() we manage to transmit to the first two
> >> peers successfully, but the next peer (3) fails due to link congestion.
> >> Since this is a non-blocking call, we return EAGAIN to the user.
> >> The subsequent retry from the user will re-deliver the same message to
> >> the first two peers.
> >>
> >> The checks for congestion are now based on the limits of the individual
> >> unicast links. We will easily get into the above situation, as the
> >> traffic patterns on the links are not the same.
> >>
> >> I think you will have a solution to this as always :-).
> >> [/partha]
> >
> > The solution is already there. This is why I have an "unsent" and a "sent"
> > queue in struct tipc_nlist. When a message has been successfully sent to a
> > node (or when an error code other than -ELINKCONG is returned) the
> > corresponding node item is moved from the "unsent" to the "sent" queue, and
> > will be disregarded at the next send attempt. But I now see that I have made
> > a stupid mistake during the last iteration of this code; I purge the
> > destination list before returning to the user, even when returning -EAGAIN.
> > I'll fix that.
> >
> > You may also wonder why I have the two queues in struct tipc_nlist, instead
> > of just deleting the items for sent nodes. This is because this list will be
> > reused across different sending sessions in later commits.
> >
> > ///jon
> >
> >
> [partha]
> Jon, the proposal works only for blocking sockets with the default send
> timeout (MAX_SCHEDULE_TIMEOUT). When the socket transmits to a peer, that
> peer is moved from unsent to sent, and the socket is put to sleep if the
> link is congested. Once the socket receives the wakeup message (congestion
> ceases), it continues with the list of unsent peers until the message is
> sent to all of them.
> 
> Now, consider the following three socket variants:
> a) blocking sockets with a fixed send timeout (say 100ms)
> b) blocking sockets which set MSG_DONTWAIT flag
> c) non-blocking sockets
> 
> The struct tipc_nlist is stored on the stack of tipc_sendmcast(),
> implying that it is stateless between subsequent calls for the same
> socket. This leads to the following issue:

You are right. I realized that already after I had sent my response yesterday.
One obvious solution would be to allocate the list on the 
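Jon's message is truncated in the archive here; one plausible reading, given
the statelessness problem above, is that the list would be allocated on the
socket, so that it survives across send() calls. A minimal, compilable
userspace sketch of that idea; every name here (mc_sock, mc_dests,
mc_pending, mc_mark_sent) is invented for illustration, not taken from the
patches:

#include <stddef.h>

struct dest {
    int node;
    struct dest *next;
};

struct nlist {
    struct dest *unsent;    /* peers still to be served */
    struct dest *sent;      /* peers already served */
};

/* Hypothetical per-socket state: the replicast destination list hangs
 * off the socket instead of living on the sender's stack. */
struct mc_sock {
    struct nlist mc_dests;
    int mc_pending;         /* a partially replicated message exists */
};

/* Mark one destination as served. Because mc_dests lives on the
 * socket, this bookkeeping survives an EAGAIN return to the user. */
static void mc_mark_sent(struct mc_sock *sk, struct dest *d)
{
    sk->mc_dests.unsent = d->next;
    d->next = sk->mc_dests.sent;
    sk->mc_dests.sent = d;
}

int main(void)
{
    struct dest d2 = { 2, NULL }, d1 = { 1, &d2 };
    struct mc_sock sk = { .mc_dests = { .unsent = &d1 }, .mc_pending = 1 };

    mc_mark_sent(&sk, sk.mc_dests.unsent); /* node 1 now in "sent" */
    return sk.mc_dests.sent->node == 1 ? 0 : 1;
}

A retry after EAGAIN would then walk only mc_dests.unsent, so no peer is
served twice.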

Re: [tipc-discussion] [PATCH net-next v2 0/4] tipc: introduce multicast through replication

2016-11-11 Thread Parthasarathy Bhuvaragan
On 11/10/2016 05:08 PM, Jon Maloy wrote:
>
>
>> -----Original Message-----
>> From: Parthasarathy Bhuvaragan
>> Sent: Thursday, 10 November, 2016 10:50
>> To: Jon Maloy ; 
>> tipc-discussion@lists.sourceforge.net;
>> Ying Xue 
>> Cc: ma...@donjonn.com; thompa@gmail.com
>> Subject: Re: [PATCH net-next v2 0/4] tipc: introduce multicast through 
>> replication
>>
>> On 10/27/2016 04:35 PM, Jon Maloy wrote:
>>> TIPC multicast messages are currently distributed via L2 broadcast
>>> or IP multicast to all nodes in the cluster, irrespective of the
>>> number of real destinations of the message.
>>>
>>> In this series we introduce an option to transport messages via
>>> replication ("replicast") across a selected number of unicast links,
>>> instead of relying on the underlying media. This option is used when
>>> true broadcast/multicast is not supported by the media, or when the
>>> number of true destinations is much smaller than the cluster size.
>>>
>>> v2: - Fixed a counter bug when removing nodes from the destination node list
>>>     - Moved the definition of the node destination list to bcast.{h,c}
>>>
>>> Jon Maloy (4):
>>>   tipc: add function for checking broadcast support in bearer
>>>   tipc: add functionality to lookup multicast destination nodes
>>>   tipc: introduce replicast as transport option for multicast
>>>   tipc: make replicast a user selectable option
>>>
>>>  include/uapi/linux/tipc.h |   6 +-
>>>  net/tipc/bcast.c  | 245 +-
>>>  net/tipc/bcast.h  |  40 +++-
>>>  net/tipc/bearer.c |  15 ++-
>>>  net/tipc/bearer.h |   6 ++
>>>  net/tipc/link.c   |  12 ++-
>>>  net/tipc/msg.c  |  17
>>>  net/tipc/msg.h|   2 +
>>>  net/tipc/name_table.c |  33 +++
>>>  net/tipc/name_table.h |   4 +
>>>  net/tipc/node.c   |  27 +++--
>>>  net/tipc/node.h   |   4 +-
>>>  net/tipc/socket.c |  89 ++---
>>>  net/tipc/udp_media.c  |   8 +-
>>>  14 files changed, 424 insertions(+), 84 deletions(-)
>>>
>> [partha]
>> I have a general concern that this design might not work for
>> non-blocking sockets, or for blocking sockets which set the MSG_DONTWAIT flag.
>>
>> Consider that the user is replicasting to 4 peers.
>> For example, in tipc_rcast_xmit() we manage to transmit to the first two
>> peers successfully, but the next peer (3) fails due to link congestion.
>> Since this is a non-blocking call, we return EAGAIN to the user.
>> The subsequent retry from the user will re-deliver the same message to
>> the first two peers.
>>
>> The checks for congestion are now based on the limits of the individual
>> unicast links. We will easily get into the above situation, as the
>> traffic patterns on the links are not the same.
>>
>> I think you will have a solution to this as always :-).
>> [/partha]
>
> The solution is already there. This is why I have an "unsent" and a "sent"
> queue in struct tipc_nlist. When a message has been successfully sent to a
> node (or when an error code other than -ELINKCONG is returned) the
> corresponding node item is moved from the "unsent" to the "sent" queue, and
> will be disregarded at the next send attempt. But I now see that I have made
> a stupid mistake during the last iteration of this code; I purge the
> destination list before returning to the user, even when returning -EAGAIN.
> I'll fix that.
>
> You may also wonder why I have the two queues in struct tipc_nlist, instead 
> of just deleting the items for sent nodes. This is because this list will be 
> reused across different sending sessions in later commits.
>
> ///jon
>
>
[partha]
Jon, the proposal works only for blocking sockets with the default send
timeout (MAX_SCHEDULE_TIMEOUT). When the socket transmits to a peer, that
peer is moved from unsent to sent, and the socket is put to sleep if the
link is congested. Once the socket receives the wakeup message (congestion
ceases), it continues with the list of unsent peers until the message is
sent to all of them.

Now, consider the following three socket variants:
a) blocking sockets with a fixed send timeout (say 100ms)
b) blocking sockets which set MSG_DONTWAIT flag
c) non-blocking sockets

The struct tipc_nlist is stored on the stack of tipc_sendmcast(),
implying that it is stateless between subsequent calls for the same
socket. This leads to the following issue (a small model follows the
two points below):

1. In tipc_sendmcast(), when we have sent the message to some of the
recipients and experience link congestion, we try to sleep. However, the
socket send timeout value may be so small that we return from
tipc_sendmcast() before we get the wakeup message. Thus we managed to
send to only a subset of the recipients.
The replicast state tipc_nlist is destroyed as we exit tipc_sendmcast().

2. The application receives an EAGAIN and retries with the same
message. This time tipc_sendmcast() is successful and the message is
sent to all the replicast recipients.

Now some of the recipients receive the same message twice.
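As a toy model of this, here is a compilable userspace sketch in which the
sent/unsent bookkeeping lives on the stack of the send routine, exactly the
statelessness described above. All names are invented for illustration,
not the actual TIPC code:

#include <errno.h>
#include <stdio.h>

static int attempt;     /* congestion clears after the first attempt */

/* Pretend node 3 sits behind a link that is congested at first. */
static int unicast(int node)
{
    if (node == 3 && attempt == 0)
        return -EAGAIN;
    printf("delivered to node %d (attempt %d)\n", node, attempt);
    return 0;
}

/* The "already sent" bookkeeping is a stack variable, so it dies
 * with every return, just like an on-stack struct tipc_nlist. */
static int sendmcast(int ndests)
{
    int sent[8] = { 0 };

    for (int n = 1; n <= ndests; n++) {
        if (sent[n])
            continue;
        if (unicast(n))
            return -EAGAIN; /* sent[] is discarded right here */
        sent[n] = 1;
    }
    return 0;
}

int main(void)
{
    if (sendmcast(4) == -EAGAIN) {  /* nodes 1 and 2 are delivered */
        attempt++;                  /* congestion ceases */
        sendmcast(4);               /* retry re-delivers to nodes 1 and 2 */
    }
    return 0;
}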

Re: [tipc-discussion] [PATCH net-next v2 0/4] tipc: introduce multicast through replication

2016-11-10 Thread Jon Maloy


> -----Original Message-----
> From: Parthasarathy Bhuvaragan
> Sent: Thursday, 10 November, 2016 10:50
> To: Jon Maloy ; tipc-discussion@lists.sourceforge.net;
> Ying Xue 
> Cc: ma...@donjonn.com; thompa@gmail.com
> Subject: Re: [PATCH net-next v2 0/4] tipc: introduce multicast through 
> replication
> 
> On 10/27/2016 04:35 PM, Jon Maloy wrote:
> > TIPC multicast messages are currently distributed via L2 broadcast
> > or IP multicast to all nodes in the cluster, irrespective of the
> > number of real destinations of the message.
> >
> > In this series we introduce an option to transport messages via
> > replication ("replicast") across a selected number of unicast links,
> > instead of relying on the underlying media. This option is used when
> > true broadcast/multicast is not supported by the media, or when the
> > number of true destinations is much smaller than the cluster size.
> >
> > v2: - Fixed a counter bug when removing nodes from the destination node list
> >     - Moved the definition of the node destination list to bcast.{h,c}
> >
> > Jon Maloy (4):
> >   tipc: add function for checking broadcast support in bearer
> >   tipc: add functionality to lookup multicast destination nodes
> >   tipc: introduce replicast as transport option for multicast
> >   tipc: make replicast a user selectable option
> >
> >  include/uapi/linux/tipc.h |   6 +-
> >  net/tipc/bcast.c  | 245 +-
> >  net/tipc/bcast.h  |  40 +++-
> >  net/tipc/bearer.c |  15 ++-
> >  net/tipc/bearer.h |   6 ++
> >  net/tipc/link.c   |  12 ++-
> >  net/tipc/msg.c  |  17
> >  net/tipc/msg.h|   2 +
> >  net/tipc/name_table.c |  33 +++
> >  net/tipc/name_table.h |   4 +
> >  net/tipc/node.c   |  27 +++--
> >  net/tipc/node.h   |   4 +-
> >  net/tipc/socket.c |  89 ++---
> >  net/tipc/udp_media.c  |   8 +-
> >  14 files changed, 424 insertions(+), 84 deletions(-)
> >
> [partha]
> I have a general concern that this design might not work for
> non-blocking sockets, or for blocking sockets which set the MSG_DONTWAIT flag.
> 
> Consider that the user is replicasting to 4 peers.
> For example, in tipc_rcast_xmit() we manage to transmit to the first two
> peers successfully, but the next peer (3) fails due to link congestion.
> Since this is a non-blocking call, we return EAGAIN to the user.
> The subsequent retry from the user will re-deliver the same message to
> the first two peers.
> 
> The checks for congestion are now based on the limits of the individual
> unicast links. We will easily get into the above situation, as the
> traffic patterns on the links are not the same.
> 
> I think you will have a solution to this as always :-).
> [/partha]

The solution is already there. This is why I have an "unsent" and a "sent"
queue in struct tipc_nlist. When a message has been successfully sent to a node
(or when an error code other than -ELINKCONG is returned) the corresponding
node item is moved from the "unsent" to the "sent" queue, and will be
disregarded at the next send attempt. But I now see that I have made a stupid
mistake during the last iteration of this code; I purge the destination list
before returning to the user, even when returning -EAGAIN. I'll fix that.

You may also wonder why I have the two queues in struct tipc_nlist, instead of 
just deleting the items for sent nodes. This is because this list will be 
reused across different sending sessions in later commits.
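As a rough illustration, a compilable userspace sketch of this scheme: two
queues, nodes moved from "unsent" to "sent" on success, and the list kept
intact on congestion so a retry resumes where it stopped. The struct layout
and names are guesses for the sketch (hard errors, which also move the
node, are omitted); the real definitions are in bcast.{h,c}:

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

struct dest {
    int node;
    struct dest *next;
};

/* Two queues, as described above. */
struct nlist {
    struct dest *unsent;
    struct dest *sent;
};

static bool congested = true;   /* node 3's link, cleared by "wakeup" */

static int unicast(int node)
{
    if (node == 3 && congested)
        return -EAGAIN;         /* stands in for -ELINKCONG here */
    printf("sent to node %d\n", node);
    return 0;
}

static int rcast_xmit(struct nlist *nl)
{
    struct dest *d;

    while ((d = nl->unsent)) {
        if (unicast(d->node))
            return -EAGAIN;     /* d stays on unsent; retry resumes here */
        nl->unsent = d->next;   /* success: move node to the sent queue */
        d->next = nl->sent;
        nl->sent = d;
    }
    return 0;                   /* all destinations served; safe to purge nl */
}

int main(void)
{
    struct dest d4 = { 4, NULL }, d3 = { 3, &d4 }, d2 = { 2, &d3 }, d1 = { 1, &d2 };
    struct nlist nl = { .unsent = &d1, .sent = NULL };

    if (rcast_xmit(&nl) == -EAGAIN) {   /* nodes 1 and 2 moved to sent */
        congested = false;              /* congestion ceases */
        rcast_xmit(&nl);                /* resumes at node 3: no duplicates */
    }
    return 0;
}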

///jon





Re: [tipc-discussion] [PATCH net-next v2 0/4] tipc: introduce multicast through replication

2016-11-10 Thread Parthasarathy Bhuvaragan
On 10/27/2016 04:35 PM, Jon Maloy wrote:
> TIPC multicast messages are currently distributed via L2 broadcast
> or IP multicast to all nodes in the cluster, irrespective of the
> number of real destinations of the message.
>
> In this series we introduce an option to transport messages via
> replication ("replicast") across a selected number of unicast links,
> instead of relying on the underlying media. This option is used when
> true broadcast/multicast is not supported by the media, or when the
> number of true destinations is much smaller than the cluster size.
>
> v2: - Fixed a counter bug when removing nodes from the destination node list
>     - Moved the definition of the node destination list to bcast.{h,c}
>
> Jon Maloy (4):
>   tipc: add function for checking broadcast support in bearer
>   tipc: add functionality to lookup multicast destination nodes
>   tipc: introduce replicast as transport option for multicast
>   tipc: make replicast a user selectable option
>
>  include/uapi/linux/tipc.h |   6 +-
>  net/tipc/bcast.c  | 245 +-
>  net/tipc/bcast.h  |  40 +++-
>  net/tipc/bearer.c |  15 ++-
>  net/tipc/bearer.h |   6 ++
>  net/tipc/link.c   |  12 ++-
>  net/tipc/msg.c  |  17
>  net/tipc/msg.h|   2 +
>  net/tipc/name_table.c |  33 +++
>  net/tipc/name_table.h |   4 +
>  net/tipc/node.c   |  27 +++--
>  net/tipc/node.h   |   4 +-
>  net/tipc/socket.c |  89 ++---
>  net/tipc/udp_media.c  |   8 +-
>  14 files changed, 424 insertions(+), 84 deletions(-)
>
[partha]
I have a general concern that this design might not work for
non-blocking sockets, or for blocking sockets which set the MSG_DONTWAIT flag.

Consider that the user is replicasting to 4 peers.
For example, in tipc_rcast_xmit() we manage to transmit to the first two
peers successfully, but the next peer (3) fails due to link congestion.
Since this is a non-blocking call, we return EAGAIN to the user.
The subsequent retry from the user will re-deliver the same message to
the first two peers.

The checks for congestion are now based on the limits of the individual
unicast links. We will easily get into the above situation, as the
traffic patterns on the links are not the same.
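To make the failure mode concrete, here is a tiny compilable userspace
model of a replicating transmit loop with no per-destination state. Names
and behavior are invented for illustration; this is not the actual
tipc_rcast_xmit():

#include <errno.h>
#include <stdio.h>

/* Pretend peer 3 sits behind a permanently congested link. */
static int unicast(int peer)
{
    if (peer == 3)
        return -EAGAIN;
    printf("delivered to peer %d\n", peer);
    return 0;
}

/* No state survives between calls, so every retry starts at peer 1. */
static int naive_rcast_xmit(int npeers)
{
    for (int p = 1; p <= npeers; p++) {
        if (unicast(p))
            return -EAGAIN;
    }
    return 0;
}

int main(void)
{
    if (naive_rcast_xmit(4) == -EAGAIN) /* peers 1 and 2 get the message */
        naive_rcast_xmit(4);            /* retry: peers 1 and 2 get it again */
    return 0;
}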

I think you will have a solution to this as always :-).
[/partha]




[tipc-discussion] [PATCH net-next v2 0/4] tipc: introduce multicast through replication

2016-10-27 Thread Jon Maloy
TIPC multicast messages are currently distributed via L2 broadcast
or IP multicast to all nodes in the cluster, irrespective of the 
number of real destinations of the message.

In this series we introduce an option to transport messages via
replication ("replicast") across a selected number of unicast links,
instead of relying on the underlying media. This option is used when
true broadcast/multicast is not supported by the media, or when the
number of true destinations is much smaller than the cluster size.
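
As a rough sketch of the selection policy described above (userspace C,
with assumed names and an assumed threshold; the actual criteria are in
the patches):

#include <stdbool.h>
#include <stdio.h>

/* Assumed cut-over: replicate when the destination set is much smaller
 * than the cluster. The real threshold is a design choice in the code. */
#define RCAST_FACTOR 4

static bool use_replicast(bool bearer_bcast_support, int ndests, int cluster_size)
{
    if (!bearer_bcast_support)
        return true;    /* media cannot do true broadcast/multicast */
    return ndests * RCAST_FACTOR < cluster_size; /* few real destinations */
}

int main(void)
{
    /* 3 destinations in a 64-node cluster: replicate on unicast links */
    printf("%d\n", use_replicast(true, 3, 64));
    /* 40 destinations in a 64-node cluster: true broadcast is cheaper */
    printf("%d\n", use_replicast(true, 40, 64));
    return 0;
}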

v2: - Fixed a counter bug when removing nodes from the destination node list
    - Moved the definition of the node destination list to bcast.{h,c}

Jon Maloy (4):
  tipc: add function for checking broadcast support in bearer
  tipc: add functionality to lookup multicast destination nodes
  tipc: introduce replicast as transport option for multicast
  tipc: make replicast a user selectable option

 include/uapi/linux/tipc.h |   6 +-
 net/tipc/bcast.c  | 245 +-
 net/tipc/bcast.h  |  40 +++-
 net/tipc/bearer.c |  15 ++-
 net/tipc/bearer.h |   6 ++
 net/tipc/link.c   |  12 ++-
 net/tipc/msg.c  |  17
 net/tipc/msg.h|   2 +
 net/tipc/name_table.c |  33 +++
 net/tipc/name_table.h |   4 +
 net/tipc/node.c   |  27 +++--
 net/tipc/node.h   |   4 +-
 net/tipc/socket.c |  89 ++---
 net/tipc/udp_media.c  |   8 +-
 14 files changed, 424 insertions(+), 84 deletions(-)

-- 
2.7.4

