From RFC 3549, Netlink looks like a good protocol for communication between
the data plane and the control plane. The messages are also defined by that
protocol. At the very least we should do something similar.
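
For reference, this is the fixed header (struct nlmsghdr from
<linux/netlink.h>) that every Netlink message starts with; if we reuse the
Netlink model, something equivalent would have to be carried in the MBUS
message payload. Shown only for illustration:

    struct nlmsghdr {
        __u32 nlmsg_len;    /* Length of message including header */
        __u16 nlmsg_type;   /* Message type/content */
        __u16 nlmsg_flags;  /* Additional flags, e.g. NLM_F_ACK */
        __u32 nlmsg_seq;    /* Sequence number */
        __u32 nlmsg_pid;    /* Sending process port ID */
    };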

Maxim.

On 21 May 2015 at 17:46, Ola Liljedahl <ola.liljed...@linaro.org> wrote:

> On 21 May 2015 at 15:56, Alexandru Badicioiu <
> alexandru.badici...@linaro.org> wrote:
>
>> I got the impression that ODP MBUS API would define a transport
>> protocol/API between an ODP
>>
> No, the MBUS API is just an API for message passing (think of the OSE IPC
> API) and doesn't specify use cases or content. Just like the ODP packet API
> doesn't specify what the content in a packet means or the format of the
> content.
>
>
>> application and a control plane application, like TCP is the transport
>> protocol for HTTP applications (e.g. the Web). Netlink defines exactly that:
>> a transport protocol for configuration messages.
>> Maxim asked about the messages - should applications define the message
>> format and/or the message content? Wouldn't it be an easier task for the
>> application to define only the content and let ODP define the format?
>>
> How can you define a format when you don't know what the messages are used
> for and what data needs to be transferred? Why should the MBUS API or
> implementations care about the message format? It's just payload and none
> of their business.
>
> If you want to, you can specify formats for specific purposes, e.g. reuse
> Netlink formats for the functions that Netlink supports. Some ODP
> applications may use this, others not (because they use some other protocol
> or they implement some other functionality).
>
>
>
>> Reliability could be an issue, but the Netlink spec says how applications
>> can create reliable protocols:
>>
>>
>> One could create a reliable protocol between an FEC and a CPC by
>>    using the combination of sequence numbers, ACKs, and retransmit
>>    timers.  Both sequence numbers and ACKs are provided by Netlink;
>>    timers are provided by Linux.
>>
> And you could do the same in ODP, but I prefer not to; this adds a level of
> complexity to the application code that I do not want. Perhaps the actual
> MBUS implementation has to do this, but then it is hidden from the
> applications. Just like TCP reliability, ordering etc. are hidden from
> applications that just do read and write.
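> To be concrete: the MBUS implementation could prepend its own small header
> to each message, invisible to applications. Purely a sketch of one possible
> implementation (these names are made up, nothing like this is in the API):
>
>     struct mbus_impl_hdr {
>         uint32_t seq;   /* sequence number assigned by the sender side */
>         uint32_t ack;   /* highest contiguous sequence number received */
>         uint16_t flags; /* e.g. retransmission or connection control */
>     };
>
> Applications would still only see the payload, just like with TCP.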
>
>>    One could create a heartbeat protocol between the FEC and CPC by
>>    using the ECHO flags and the NLMSG_NOOP message.
>>
>> On 21 May 2015 at 16:23, Ola Liljedahl <ola.liljed...@linaro.org> wrote:
>>
>>> On 21 May 2015 at 15:05, Alexandru Badicioiu <
>>> alexandru.badici...@linaro.org> wrote:
>>>
>>>> I was referring to the Netlink protocol in itself, as a model for ODP
>>>> MBUS (or IPC).
>>>>
>>> Isn't the Netlink protocol what the endpoints send between them? This is
>>> not specified by the ODP IPC/MBUS API; applications can define or re-use
>>> whatever protocol they like. The protocol definition is heavily dependent
>>> on what you actually use the IPC for, and we shouldn't force ODP users to
>>> use some specific predefined protocol.
>>>
>>> Also the "wire protocol" is left undefined, this is up to the
>>> implementation to define and each platform can have its own definition.
>>>
>>> And Netlink isn't even reliable. I know that creates problems, e.g. it can
>>> be impossible to get a clean and complete snapshot of the routing table.
>>>
>>>
>>>> The interaction between the FEC and the CPC, in the Netlink context,
>>>>    defines a protocol.  Netlink provides mechanisms for the CPC
>>>>    (residing in user space) and the FEC (residing in kernel space) to
>>>>    have their own protocol definition -- *kernel space and user space
>>>>    just mean different protection domains*.  Therefore, a wire protocol
>>>>    is needed to communicate.  The wire protocol is normally provided by
>>>>    some privileged service that is able to copy between multiple
>>>>    protection domains.  We will refer to this service as the Netlink
>>>>    service.  The Netlink service can also be encapsulated in a different
>>>>    transport layer, if the CPC executes on a different node than the
>>>>    FEC.  The FEC and CPC, using Netlink mechanisms, may choose to define
>>>>    a reliable protocol between each other.  By default, however, Netlink
>>>>    provides an unreliable communication.
>>>>
>>>>    Note that the FEC and CPC can both live in the same memory protection
>>>>    domain and use the connect() system call to create a path to the peer
>>>>    and talk to each other.  We will not discuss this mechanism further
>>>>    other than to say that it is available. Throughout this document, we
>>>>    will refer interchangeably to the FEC to mean kernel space and the
>>>>    CPC to mean user space.  This denomination is not meant, however, to
>>>>    restrict the two components to these protection domains or to the
>>>>    same compute node.
>>>>
>>>>
>>>>
>>>> On 21 May 2015 at 15:55, Ola Liljedahl <ola.liljed...@linaro.org>
>>>> wrote:
>>>>
>>>>> On 21 May 2015 at 13:22, Alexandru Badicioiu <
>>>>> alexandru.badici...@linaro.org> wrote:
>>>>> > Hi,
>>>>> > would the Netlink protocol (https://tools.ietf.org/html/rfc3549) fit
>>>>> > the purpose of ODP IPC (within a single OS instance)?
>>>>> I interpret this as a question of whether Netlink would be suitable as an
>>>>> implementation of the ODP IPC (now called message bus because "IPC" is so
>>>>> contended and imbued with different meanings).
>>>>>
>>>>> It is perhaps possible. Netlink seems a bit focused on intra-kernel
>>>>> and kernel-to-user while the ODP IPC-MBUS is focused on user-to-user
>>>>> (application-to-application).
>>>>>
>>>>> I see a couple of primary requirements:
>>>>>
>>>>>    - Support communication (message exchange) between user space
>>>>>    processes.
>>>>>    - Support arbitrary user-defined messages.
>>>>>    - Ordered, reliable delivery of messages.
>>>>>
>>>>>
>>>>> From the little I can quickly read up on Netlink, the first two
>>>>> requirements do not seem to be supported. But perhaps someone with more
>>>>> intimate knowledge of Netlink can prove me wrong. Or maybe Netlink can be
>>>>> extended to support user-to-user and user-defined messages, but the
>>>>> current specialization (e.g. specialized addressing, specialized message
>>>>> formats) seems contrary to the goal of providing generic mechanisms in
>>>>> the kernel that can be used for different things.
>>>>>
>>>>> My IPC/MBUS reference implementation for linux-generic builds upon
>>>>> POSIX message queues. One of my issues is that I want the message queue
>>>>> associated with a process to go away when the process goes away. The
>>>>> message queues are not independent entities.
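>>>>>
>>>>> To illustrate the lifetime problem (my own sketch, not code from the
>>>>> reference implementation; the "/mbus-<pid>" naming scheme is made up):
>>>>>
>>>>>     #include <fcntl.h>
>>>>>     #include <mqueue.h>
>>>>>     #include <stdio.h>
>>>>>     #include <unistd.h>
>>>>>
>>>>>     /* Create the receive queue for this process's MBUS endpoint. */
>>>>>     static mqd_t create_endpoint_queue(void)
>>>>>     {
>>>>>         char name[32];
>>>>>         snprintf(name, sizeof(name), "/mbus-%ld", (long)getpid());
>>>>>         return mq_open(name, O_CREAT | O_RDONLY, 0600, NULL);
>>>>>     }
>>>>>
>>>>> The queue is a kernel-persistent object: unless someone calls
>>>>> mq_unlink(name), it outlives the process, and a cleanup handler only
>>>>> covers orderly exits, not crashes.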
>>>>>
>>>>> -- Ola
>>>>>
>>>>> >
>>>>> > Thanks,
>>>>> > Alex
>>>>> >
>>>>> > On 21 May 2015 at 14:12, Ola Liljedahl <ola.liljed...@linaro.org>
>>>>> wrote:
>>>>> >>
>>>>> >> On 21 May 2015 at 11:50, Savolainen, Petri (Nokia - FI/Espoo)
>>>>> >> <petri.savolai...@nokia.com> wrote:
>>>>> >> >
>>>>> >> >
>>>>> >> >> -----Original Message-----
>>>>> >> >> From: lng-odp [mailto:lng-odp-boun...@lists.linaro.org] On Behalf
>>>>> >> >> Of ext Ola Liljedahl
>>>>> >> >> Sent: Tuesday, May 19, 2015 1:04 AM
>>>>> >> >> To: lng-odp@lists.linaro.org
>>>>> >> >> Subject: [lng-odp] [RFC] Add ipc.h
>>>>> >> >>
>>>>> >> >> As promised, here is my first attempt at a standalone API for IPC
>>>>> >> >> - inter-process communication in a shared nothing architecture
>>>>> >> >> (message passing between processes which do not share memory).
>>>>> >> >>
>>>>> >> >> Currently all definitions are in the file ipc.h but it is possible
>>>>> >> >> to break out some message/event related definitions (everything
>>>>> >> >> from odp_ipc_sender) in a separate file message.h. This would mimic
>>>>> >> >> the packet_io.h/packet.h separation.
>>>>> >> >>
>>>>> >> >> The semantics of message passing is that sending a message to an
>>>>> >> >> endpoint will always look like it succeeds. The appearance of
>>>>> >> >> endpoints is explicitly notified through user-defined messages
>>>>> >> >> specified in the odp_ipc_resolve() call. Similarly, the
>>>>> >> >> disappearance (e.g. death or otherwise lost connection) is also
>>>>> >> >> explicitly notified through user-defined messages specified in the
>>>>> >> >> odp_ipc_monitor() call. The send call does not fail because the
>>>>> >> >> addressed endpoint has disappeared.
>>>>> >> >>
>>>>> >> >> Messages (from endpoint A to endpoint B) are delivered in order.
>>>>> >> >> If message N sent to an endpoint is delivered, then all messages
>>>>> >> >> <N have also been delivered. Message delivery does not guarantee
>>>>> >> >> actual processing by the
>>>>> >> >
>>>>> >> > Ordered is an OK requirement, but "all messages <N have also been
>>>>> >> > delivered" means in practice lossless delivery (== retries,
>>>>> >> > retransmission windows, etc). Lossy vs. lossless link should be a
>>>>> >> > configuration option.
>>>>> >> I am just targeting internal communication which I expect to be
>>>>> >> reliable. There is not any physical "link" involved. If an
>>>>> >> implementation chooses to use some unreliable media, then it will
>>>>> >> need to take some countermeasures. Any loss of a message could be
>>>>> >> detected
>>>>> >> using sequence numbers (and timeouts) and handled by (temporary)
>>>>> >> disconnection (so that no more messages will be delivered should one
>>>>> >> go missing).
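>>>>> >> E.g. something like this on the receiving side (just a sketch of the
>>>>> >> idea, all names are made up):
>>>>> >>
>>>>> >>     if (hdr->seq != ep->next_expected_seq) {
>>>>> >>         /* A message was lost: stop delivering from this sender and
>>>>> >>          * notify monitoring endpoints that the connection is lost. */
>>>>> >>         disconnect_endpoint(ep);
>>>>> >>     } else {
>>>>> >>         ep->next_expected_seq++;
>>>>> >>         deliver_to_application(msg);
>>>>> >>     }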
>>>>> >>
>>>>> >> I am OK with adding the lossless/lossy configuration to the API as
>>>>> >> long as the lossless option is always implemented. Is this a
>>>>> >> configuration when creating the local IPC endpoint or when sending a
>>>>> >> message to another endpoint?
>>>>> >>
>>>>> >> >
>>>>> >> > Also, what does "delivered" mean?
>>>>> >> >
>>>>> >> > Message:
>>>>> >> >  - transmitted successfully over the link ?
>>>>> >> >  - is now under control of the remote node (post office) ?
>>>>> >> >  - delivered into application input queue ?
>>>>> >> Probably this one, but I am not sure the exact definition matters:
>>>>> >> "has been delivered" or "will eventually be delivered unless the
>>>>> >> connection to the destination is lost". Maybe there is a better word
>>>>> >> than "delivered"?
>>>>> >>
>>>>> >> "Made available into the destination (recipient) address space"?
>>>>> >>
>>>>> >> >  - has been dequeued from application queue ?
>>>>> >> >
>>>>> >> >
>>>>> >> >> recipient. End-to-end acknowledgements (using messages) should be
>>>>> >> >> used if this guarantee is important to the user.
>>>>> >> >>
>>>>> >> >> IPC endpoints can be seen as interfaces (taps) to an internal
>>>>> >> >> reliable multidrop network where each endpoint has a unique address
>>>>> >> >> which is only valid for the lifetime of the endpoint. I.e. if an
>>>>> >> >> endpoint is destroyed and then recreated (with the same name), the
>>>>> >> >> new endpoint will have a new address (eventually endpoint addresses
>>>>> >> >> will have to be recycled, but not for a very long time). Endpoint
>>>>> >> >> names do not necessarily have to be unique.
>>>>> >> >
>>>>> >> > How widely are these addresses unique: inside one VM, across
>>>>> >> > multiple VMs under the same host, across multiple devices on a LAN
>>>>> >> > (VLAN), ...?
>>>>> >> Currently, the scope of the name and address space is defined by the
>>>>> >> implementation. Perhaps we should define it? My current interest is
>>>>> >> within an OS instance (bare metal or virtualised). Between different
>>>>> >> OS instances, I expect something based on IP to be used (because you
>>>>> >> don't know where those different OS/VM instances will be deployed so
>>>>> >> you need topology-independent addressing).
>>>>> >>
>>>>> >> Based on other feedback, I have dropped the contended usage of "IPC"
>>>>> >> and now call it "message bus" (MBUS).
>>>>> >>
>>>>> >> "MBUS endpoints can be seen as interfaces (taps) to an OS-internal
>>>>> >> reliable multidrop network"...
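>>>>> >>
>>>>> >> For illustration, the intended usage would be roughly as follows (a
>>>>> >> sketch only; the message pool and the message handles are assumed to
>>>>> >> have been created/allocated elsewhere, and error handling is omitted):
>>>>> >>
>>>>> >>     odp_ipc_t ipc = odp_ipc_create("my-endpoint", msg_pool);
>>>>> >>
>>>>> >>     /* Ask to get resolve_msg back when "peer" exists. */
>>>>> >>     odp_ipc_resolve(ipc, "peer", resolve_msg);
>>>>> >>
>>>>> >>     /* ...later, when resolve_msg is returned, read the address... */
>>>>> >>     uint8_t peer[ODP_IPC_ADDR_SIZE];
>>>>> >>     odp_ipc_sender(resolve_msg, peer);
>>>>> >>
>>>>> >>     /* Watch for peer death and send it a message. */
>>>>> >>     odp_ipc_monitor(ipc, peer, monitor_msg);
>>>>> >>     odp_ipc_send(ipc, msg, peer);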
>>>>> >>
>>>>> >> >
>>>>> >> >
>>>>> >> >>
>>>>> >> >> Signed-off-by: Ola Liljedahl <ola.liljed...@linaro.org>
>>>>> >> >> ---
>>>>> >> >> (This document/code contribution attached is provided under the
>>>>> >> >> terms of agreement LES-LTM-21309)
>>>>> >> >>
>>>>> >> >
>>>>> >> >
>>>>> >> >> +/**
>>>>> >> >> + * Create IPC endpoint
>>>>> >> >> + *
>>>>> >> >> + * @param name Name of local IPC endpoint
>>>>> >> >> + * @param pool Pool for incoming messages
>>>>> >> >> + *
>>>>> >> >> + * @return IPC handle on success
>>>>> >> >> + * @retval ODP_IPC_INVALID on failure and errno set
>>>>> >> >> + */
>>>>> >> >> +odp_ipc_t odp_ipc_create(const char *name, odp_pool_t pool);
>>>>> >> >
>>>>> >> > This creates (implicitly) the local endpoint address.
>>>>> >> >
>>>>> >> >
>>>>> >> >> +
>>>>> >> >> +/**
>>>>> >> >> + * Set the default input queue for an IPC endpoint
>>>>> >> >> + *
>>>>> >> >> + * @param ipc   IPC handle
>>>>> >> >> + * @param queue Queue handle
>>>>> >> >> + *
>>>>> >> >> + * @retval  0 on success
>>>>> >> >> + * @retval <0 on failure
>>>>> >> >> + */
>>>>> >> >> +int odp_ipc_inq_setdef(odp_ipc_t ipc, odp_queue_t queue);
>>>>> >> >
>>>>> >> > Multiple input queues are likely needed for different priority
>>>>> >> > messages.
>>>>> >> >
>>>>> >> >> +
>>>>> >> >> +/**
>>>>> >> >> + * Resolve endpoint by name
>>>>> >> >> + *
>>>>> >> >> + * Look up an existing or future endpoint by name.
>>>>> >> >> + * When the endpoint exists, return the specified message with
>>>>> >> >> + * the endpoint as the sender.
>>>>> >> >> + *
>>>>> >> >> + * @param ipc IPC handle
>>>>> >> >> + * @param name Name to resolve
>>>>> >> >> + * @param msg Message to return
>>>>> >> >> + */
>>>>> >> >> +void odp_ipc_resolve(odp_ipc_t ipc,
>>>>> >> >> +                  const char *name,
>>>>> >> >> +                  odp_ipc_msg_t msg);
>>>>> >> >
>>>>> >> > How widely are these names visible? Inside one VM, across multiple
>>>>> >> > VMs under the same host, across multiple devices on a LAN (VLAN),
>>>>> >> > ...?
>>>>> >> >
>>>>> >> > I think a name service (or address resolution) is better handled in
>>>>> >> > a middleware layer. If ODP provides unique addresses and a message
>>>>> >> > passing mechanism, additional services can be built on top.
>>>>> >> >
>>>>> >> >
>>>>> >> >> +
>>>>> >> >> +/**
>>>>> >> >> + * Monitor endpoint
>>>>> >> >> + *
>>>>> >> >> + * Monitor an existing (potentially already dead) endpoint.
>>>>> >> >> + * When the endpoint is dead, return the specified message with
>>>>> >> >> + * the endpoint as the sender.
>>>>> >> >> + *
>>>>> >> >> + * Unrecognized or invalid endpoint addresses are treated as dead
>>>>> >> >> + * endpoints.
>>>>> >> >> + *
>>>>> >> >> + * @param ipc IPC handle
>>>>> >> >> + * @param addr Address of monitored endpoint
>>>>> >> >> + * @param msg Message to return
>>>>> >> >> + */
>>>>> >> >> +void odp_ipc_monitor(odp_ipc_t ipc,
>>>>> >> >> +                  const uint8_t addr[ODP_IPC_ADDR_SIZE],
>>>>> >> >> +                  odp_ipc_msg_t msg);
>>>>> >> >
>>>>> >> > Again, I'd see node health monitoring and alarms as middleware
>>>>> >> > services.
>>>>> >> >
>>>>> >> >> +
>>>>> >> >> +/**
>>>>> >> >> + * Send message
>>>>> >> >> + *
>>>>> >> >> + * Send a message to an endpoint (which may already be dead).
>>>>> >> >> + * Message delivery is ordered and reliable. All (accepted)
>>>>> >> >> + * messages will be delivered up to the point of endpoint death or
>>>>> >> >> + * lost connection.
>>>>> >> >> + * Actual reception and processing is not guaranteed (use
>>>>> >> >> + * end-to-end acknowledgements for that).
>>>>> >> >> + * Monitor the remote endpoint to detect death or lost connection.
>>>>> >> >> + *
>>>>> >> >> + * @param ipc IPC handle
>>>>> >> >> + * @param msg Message to send
>>>>> >> >> + * @param addr Address of remote endpoint
>>>>> >> >> + *
>>>>> >> >> + * @retval 0 on success
>>>>> >> >> + * @retval <0 on error
>>>>> >> >> + */
>>>>> >> >> +int odp_ipc_send(odp_ipc_t ipc,
>>>>> >> >> +              odp_ipc_msg_t msg,
>>>>> >> >> +              const uint8_t addr[ODP_IPC_ADDR_SIZE]);
>>>>> >> >
>>>>> >> > This would be used to send a message to an address, but normal
>>>>> >> > odp_queue_enq() could be used to circulate this event inside an
>>>>> >> > application (ODP instance).
>>>>> >> >
>>>>> >> >> +
>>>>> >> >> +/**
>>>>> >> >> + * Get address of sender (source) of message
>>>>> >> >> + *
>>>>> >> >> + * @param msg Message handle
>>>>> >> >> + * @param addr Address of sender endpoint
>>>>> >> >> + */
>>>>> >> >> +void odp_ipc_sender(odp_ipc_msg_t msg,
>>>>> >> >> +                 uint8_t addr[ODP_IPC_ADDR_SIZE]);
>>>>> >> >> +
>>>>> >> >> +/**
>>>>> >> >> + * Message data pointer
>>>>> >> >> + *
>>>>> >> >> + * Return a pointer to the message data
>>>>> >> >> + *
>>>>> >> >> + * @param msg Message handle
>>>>> >> >> + *
>>>>> >> >> + * @return Pointer to the message data
>>>>> >> >> + */
>>>>> >> >> +void *odp_ipc_data(odp_ipc_msg_t msg);
>>>>> >> >> +
>>>>> >> >> +/**
>>>>> >> >> + * Message data length
>>>>> >> >> + *
>>>>> >> >> + * Return length of the message data.
>>>>> >> >> + *
>>>>> >> >> + * @param msg Message handle
>>>>> >> >> + *
>>>>> >> >> + * @return Message length
>>>>> >> >> + */
>>>>> >> >> +uint32_t odp_ipc_length(const odp_ipc_msg_t msg);
>>>>> >> >> +
>>>>> >> >> +/**
>>>>> >> >> + * Set message length
>>>>> >> >> + *
>>>>> >> >> + * Set length of the message data.
>>>>> >> >> + *
>>>>> >> >> + * @param msg Message handle
>>>>> >> >> + * @param len New length
>>>>> >> >> + *
>>>>> >> >> + * @retval 0 on success
>>>>> >> >> + * @retval <0 on error
>>>>> >> >> + */
>>>>> >> >> +int odp_ipc_reset(const odp_ipc_msg_t msg, uint32_t len);
>>>>> >> >
>>>>> >> > When the data pointer or data length is modified: push/pull head
>>>>> >> > and push/pull tail would be the analogies from the packet API.
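>>>>> >> > E.g. something along these lines (hypothetical names, only to show
>>>>> >> > the analogy with odp_packet_push_head() and friends):
>>>>> >> >
>>>>> >> >     void *odp_ipc_push_head(odp_ipc_msg_t msg, uint32_t len);
>>>>> >> >     void *odp_ipc_pull_head(odp_ipc_msg_t msg, uint32_t len);
>>>>> >> >     void *odp_ipc_push_tail(odp_ipc_msg_t msg, uint32_t len);
>>>>> >> >     void *odp_ipc_pull_tail(odp_ipc_msg_t msg, uint32_t len);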
>>>>> >> >
>>>>> >> >
>>>>> >> > -Petri
>>>>> >> >
>>>>> >> >
>>>>> >
>>>>> >
>>>>>
>>>>>
>>>>
>>>
>>
>
>
>
_______________________________________________
lng-odp mailing list
lng-odp@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/lng-odp
