On 20 May 2015 at 13:29, Benoît Ganne <bga...@kalray.eu> wrote:
> Hi Ola,
>
> thanks for your feedback. I think part of the problem is that today we have
> 2 things dubbed "IPC" trying to solve different problems (I think): yours vs
> Maxim and me. Maybe we should clarify that.
Yes. Maybe all of us should stay away from the phrase IPC. Message
passing (which is what I am looking for) is just one type of IPC
(useful for shared-nothing and distributed systems); in the past, many
did IPC using shared memory, mutexes and semaphores. The horror.


> From my understanding, what Maxim is proposing is closer to what I was
> trying to achieve. The main differences of my proposal vs Maxim proposal
> was:
>  - use a more "POSIX namespace" approach for naming resources (eg.
> "/ipc/..." vs "ipc_...")
I see the names of pktio interfaces as platform-specific, so each
platform can use whatever syntax it wants.

>  - extend pktio to allow unidirectional communication to save HW resources
A slight tweak to the current packet_io API. Post a patch.

> I agree your need is different and need a different API. Maybe we could call
> it "message bus" or "odp bus" or something like that to disambiguate?
Yes. Need a nice three-letter acronym as well...

>
> Other comments inline.
>
>>>   - use it for control/data plane signaling but also to exchange packets,
>>> so as to be able to build packet-processing pipelines
>>
>> Control/data plane signaling must be reliable (per the definition in
>> my RFC). But do you have the same requirements for packet transfer
>> between pipeline stages (on different cores)? Or does it just happen to be
>> reliable on your hardware? Is reliability (guaranteed delivery) an
>> inherent requirement for pipelined packet processing? (this would be
>> contrary to my experience).
>
>
> Right, it won't be reliable, especially if the rx is too slow to consume
> messages; tx should get EAGAIN and needs to retry later. But reliability can
> be built on top of that (buffering on tx side, spinning...).
Well, if the sender gets a notice that the send didn't succeed and the data
is still around (so it can be resent later), I still see this as
reliable. Unreliable is when data goes missing without notification.
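
Something like this, just as a sketch; pipe_send() here is a made-up
placeholder (not an existing ODP call) that returns -1 with errno set
to EAGAIN when the rx side is too slow:

#include <errno.h>
#include <sched.h>

/* Hypothetical unidirectional packet-pipe send: returns 0 on success,
 * -1 with errno == EAGAIN when the receiver cannot keep up. */
extern int pipe_send(void *pipe, void *msg);

/* Sender-side "reliability": as long as the failure is reported and
 * the message is still owned by the sender, it can simply be retried. */
static int pipe_send_blocking(void *pipe, void *msg)
{
        while (pipe_send(pipe, msg) != 0) {
                if (errno != EAGAIN)
                        return -1;  /* real error, e.g. dead endpoint */
                sched_yield();      /* back off, then retry */
        }
        return 0;
}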

>
>> Also couldn't you use ODP queues as the abstraction for transferring
>> packets between cores/pipeline stages? Per your pktio proposal below,
>> if you connect input and output queues to your "ipc" pktio instances,
>> applications can just enqueue and dequeue packets from these queues
>> and that will cause packets to be transferred between pipeline stages
>> (on different cores/AS's).
>
>
> Sure. But as far as I understand, queues are not associated to pools by
> themselves. We still need a way to move data from one AS to another, and it
> means from one pool to another. I need a way to identify the rx pool.
Yes, a queue by itself is not enough.

An egress queue leads to a (transmit-only) pktio interface which then
can magically transport packets to another (receive-only) pktio
interface in another AS. That receive pktio interface uses a pool in
its AS to allocate packet buffers from. Received buffers can be put on
an ingress queue. So the application code just sees the queues; only
the main logic needs to handle the "packet pipe" pktio interfaces.
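
Roughly like this, as a sketch only; it assumes the current
two-argument odp_pktio_open() plus odp_pktio_inq_setdef() and
odp_pktio_outq_getdef() from linux-generic, "ipc:stage1" is an invented
platform-specific name, and error handling is omitted:

#include <odp.h>

/* Receive side of the packet pipe, in address space B. The pipe pktio
 * allocates packets from a pool local to this AS; workers only ever
 * see the ingress queue. */
static odp_queue_t setup_rx_stage(odp_pool_t local_pool)
{
        odp_pktio_t pipe = odp_pktio_open("ipc:stage1", local_pool);
        odp_queue_t inq = odp_queue_create("stage1-in",
                                           ODP_QUEUE_TYPE_SCHED, NULL);

        odp_pktio_inq_setdef(pipe, inq); /* received packets land here */
        return inq;
}

/* Transmit side, in address space A. With the amendment discussed
 * further down, ODP_POOL_INVALID would mark it as transmit-only. */
static odp_queue_t setup_tx_stage(void)
{
        odp_pktio_t pipe = odp_pktio_open("ipc:stage1", ODP_POOL_INVALID);

        return odp_pktio_outq_getdef(pipe); /* workers enqueue here */
}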

>
>> This use case does not want or need the ability to specify the
>> destination address for each individual send operation and this would
>> likely just add overhead to a performance critical operation.
>
>
> It needs to identify the destination pool, as we are moving from one AS to
> another, and pools are defined per-AS.
But you don't specify the destination pool on a per-packet basis, and
the producer (sender) of packets doesn't care which pool is used; it
just enqueues the packet on a queue.
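
As a sketch, under the same assumptions as above, the producer side is
just:

#include <odp.h>

/* Producer: hand the packet over without knowing, or caring, which
 * pool the receiving AS uses on its side of the pipe. */
static int forward_packet(odp_queue_t egress, odp_packet_t pkt)
{
        return odp_queue_enq(egress, odp_packet_to_event(pkt));
}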

>
>>>   - IPC ops should be mapped to our NoC HW as much as possible,
>>> especially
>>> for send/recv operations
>>
>> Is there anything in the proposed API that would limit your
>> implementation freedom?
>
>
> Not in the case of an application messaging bus as you propose. I just
> wanted to highlight the fact that we need a lower-level IPC mechanism where
> I can do unidirectional communications (to save HW resources) without any
> centralized logic.
>
>>>  From your proposal, it seems to me that what you proposed is more like
>>> an
>>> application messaging bus such as d-bus (especially the deferred lookup()
>>> and monitor()),
>>
>> Perhaps d-bus can be used as the implementation, that could save me
>> some work.
>
>
> I didn't mean to use D-Bus nor CORBA :)
> It was just to better understand what you were trying to achieve. That said,
> I think we can learn a few things from D-Bus in this case. For example, you
> did mention your need for delivery reliability, but what about message
> ordering and broadcast/multicast?
Messages are ordered on a per source/destination basis.
So far, I haven't seen any absolute need for broadcast or multicast. A
sender normally wants to know which endpoints it is communicating
with; perhaps it is expecting replies.
What is probably more useful is a way to subscribe to all new endpoints.
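
To make the intended usage concrete, a sketch against the proposed
header quoted further down (include path as in the RFC, error handling
omitted): resolve a peer by name, pick the peer address out of the
notification message, then send to it.

#include <string.h>
#include <odp.h>
#include <odp/api/ipc.h>  /* the proposed header quoted below */

/* Ask to be notified when "peer" exists; the user-defined message
 * 'note' comes back with the peer endpoint as its sender. */
static void watch_peer(odp_ipc_t ipc, odp_ipc_msg_t note)
{
        odp_ipc_resolve(ipc, "peer", note);
}

/* When the notification arrives, extract the peer address and send. */
static void on_peer_resolved(odp_ipc_t ipc, odp_ipc_msg_t note,
                             odp_ipc_msg_t msg)
{
        uint8_t peer[ODP_IPC_ADDR_SIZE];

        odp_ipc_sender(note, peer);
        odp_ipc_reset(msg, 5);
        memcpy(odp_ipc_data(msg), "hello", 5);
        (void)odp_ipc_send(ipc, msg, peer);
}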

>
>>> But in that case you need to be able to identify the endpoints. What I
>>> proposed in precedent thread, was to use a hierarchical device naming:
>>>   - "/dev/<device>" for real devices
>>>   - "/ipc/<identifier>" for IPC
>>> What I had in mind so far for our platform was to use {AS identifier +
>>> pool
>>> name in the AS} as IPC identifier, which would avoid the need of a
>>> centralized rendez-vous point.
>>
>> The semantics of the IPC endpoint names are not defined in my
>> proposal. I think it is a mistake to impose meaning on the names.
>
>
> Agreed. I was just proposing a naming scheme for pktio in my case to ease
> readability.
>
>> Also I don't envision any (active) rendez-vous point, it's just a
>> global shared table in my implementation for linux-generic. This is
>> needed since names are user-defined and arbitrary.
>
>
> Hmm a global shared table looks like a rendez-vous point to me. You cannot
> implement that this way in a 0-sharing architecture.
The global shared table is an implementation detail. A true shared-nothing
architecture will have to implement it in some other way.

> But anyway, I
> completely see the value of a messaging bus with a discovery service; the
> implementation will use whatever suits it for this.
>
>> Why are these pktio instances of type IPC? They could just as well be
>> network interfaces on which you send and/or receive packets with the
>> normal semantics. No need for the IPC API I proposed. What stops you
>> from implementing this today?
>
>
> A specific type is still useful in my opinion:
>  - it eases readability
>  - real pktio may have specific characteristics missing for IPC pktio (eg.
> in our case, we have HW acceleration for packet classification/extraction
> for real pktio, but not for IPC pktio)
>
>> So we need to amend the API spec that it is OK to specify
>> ODP_POOL_INVALID for the pool parameter to the odp_pktio_open() call
>> and this will indicate that the pktio interface is write-only. Is
>> there anything similar we need to do for read-only interfaces? You
>> would like to have the indication of read-only/write-only/read-write
>> at the time of pktio_open. Perhaps we need a new parameter for
>> odp_pktio_open(), the pool parameter is not enough for this purpose.
>
>
> OK. What is the best way to make a proposal? An RFC patch series for
> linux-generic proposing such an implementation?
Or just vanilla patches. These are just minor tweaks to the API.
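
For concreteness, one possible shape of such a parameter; the names
below are invented purely for illustration, whatever the actual patch
proposes is what counts:

/* Illustrative only -- not an agreed API. */
typedef enum {
        ODP_PKTIO_DIR_RECV_SEND,  /* today's default behaviour */
        ODP_PKTIO_DIR_RECV_ONLY,
        ODP_PKTIO_DIR_SEND_ONLY   /* pool may be ODP_POOL_INVALID */
} odp_pktio_dir_t;

odp_pktio_t odp_pktio_open_dir(const char *dev, odp_pool_t pool,
                               odp_pktio_dir_t dir);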

>
>> Your needs are real but I think reusing the ODP pktio concept is a
>> better solution for you, not trying to hijack the application
>> messaging API which is intended for a very different use case. IPC
>> seems to mean different things to different people so perhaps should
>> be avoided in order not to give people the wrong ideas.
>
>
> Agreed. We need to define better names. I personally like IPC pktio for my
> case and application message bus (odp_bus?) for your case but if you have
> better ideas, I would be happy to hear from you.
I have used "IPC" to mean application messaging network (it's more
than a simple bus, works for distributed systems as well) for twenty
years so yours and Maxim's use of IPC is confusing to me. I think none
of us should use "IPC". You both need pacḱet pipes, I need a messaging
network.


>
>
> ben
>
>>> On 05/19/2015 12:03 AM, Ola Liljedahl wrote:
>>>>
>>>>
>>>> As promised, here is my first attempt at a standalone API for IPC -
>>>> inter
>>>> process communication in a shared nothing architecture (message passing
>>>> between processes which do not share memory).
>>>>
>>>> Currently all definitions are in the file ipc.h but it is possible to
>>>> break out some message/event related definitions (everything from
>>>> odp_ipc_sender) in a separate file message.h. This would mimic the
>>>> packet_io.h/packet.h separation.
>>>>
>>>> The semantics of message passing is that sending a message to an
>>>> endpoint
>>>> will always look like it succeeds. The appearance of endpoints is
>>>> explicitly
>>>> notified through user-defined messages specified in the
>>>> odp_ipc_resolve()
>>>> call. Similarly, the disappearance (e.g. death or otherwise lost
>>>> connection)
>>>> is also explicitly notified through user-defined messages specified in
>>>> the
>>>> odp_ipc_monitor() call. The send call does not fail because the
>>>> addressed
>>>> endpoint has disappeared.
>>>>
>>>> Messages (from endpoint A to endpoint B) are delivered in order. If
>>>> message
>>>> N sent to an endpoint is delivered, then all messages <N have also been
>>>> delivered. Message delivery does not guarantee actual processing by the
>>>> recipient. End-to-end acknowledgements (using messages) should be used
>>>> if
>>>> this guarantee is important to the user.
>>>>
>>>> IPC endpoints can be seen as interfaces (taps) to an internal reliable
>>>> multidrop network where each endpoint has a unique address which is only
>>>> valid for the lifetime of the endpoint. I.e. if an endpoint is destroyed
>>>> and then recreated (with the same name), the new endpoint will have a
>>>> new address (eventually endpoints addresses will have to be recycled but
>>>> not for a very long time). Endpoint names do not necessarily have to be
>>>> unique.
>>>>
>>>> Signed-off-by: Ola Liljedahl <ola.liljed...@linaro.org>
>>>> ---
>>>> (This document/code contribution attached is provided under the terms of
>>>> agreement LES-LTM-21309)
>>>>
>>>>    include/odp/api/ipc.h | 261
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>    1 file changed, 261 insertions(+)
>>>>    create mode 100644 include/odp/api/ipc.h
>>>>
>>>> diff --git a/include/odp/api/ipc.h b/include/odp/api/ipc.h
>>>> new file mode 100644
>>>> index 0000000..3395a34
>>>> --- /dev/null
>>>> +++ b/include/odp/api/ipc.h
>>>> @@ -0,0 +1,261 @@
>>>> +/* Copyright (c) 2015, Linaro Limited
>>>> + * All rights reserved.
>>>> + *
>>>> + * SPDX-License-Identifier:     BSD-3-Clause
>>>> + */
>>>> +
>>>> +
>>>> +/**
>>>> + * @file
>>>> + *
>>>> + * ODP IPC API
>>>> + */
>>>> +
>>>> +#ifndef ODP_API_IPC_H_
>>>> +#define ODP_API_IPC_H_
>>>> +
>>>> +#ifdef __cplusplus
>>>> +extern "C" {
>>>> +#endif
>>>> +
>>>> +/** @defgroup odp_ipc ODP IPC
>>>> + *  @{
>>>> + */
>>>> +
>>>> +/**
>>>> + * @typedef odp_ipc_t
>>>> + * ODP IPC handle
>>>> + */
>>>> +
>>>> +/**
>>>> + * @typedef odp_ipc_msg_t
>>>> + * ODP IPC message handle
>>>> + */
>>>> +
>>>> +
>>>> +/**
>>>> + * @def ODP_IPC_ADDR_SIZE
>>>> + * Size of the address of an IPC endpoint
>>>> + */
>>>> +
>>>> +/**
>>>> + * Create IPC endpoint
>>>> + *
>>>> + * @param name Name of local IPC endpoint
>>>> + * @param pool Pool for incoming messages
>>>> + *
>>>> + * @return IPC handle on success
>>>> + * @retval ODP_IPC_INVALID on failure and errno set
>>>> + */
>>>> +odp_ipc_t odp_ipc_create(const char *name, odp_pool_t pool);
>>>> +
>>>> +/**
>>>> + * Destroy IPC endpoint
>>>> + *
>>>> + * @param ipc IPC handle
>>>> + *
>>>> + * @retval 0 on success
>>>> + * @retval <0 on failure
>>>> + */
>>>> +int odp_ipc_destroy(odp_ipc_t ipc);
>>>> +
>>>> +/**
>>>> + * Set the default input queue for an IPC endpoint
>>>> + *
>>>> + * @param ipc   IPC handle
>>>> + * @param queue Queue handle
>>>> + *
>>>> + * @retval  0 on success
>>>> + * @retval <0 on failure
>>>> + */
>>>> +int odp_ipc_inq_setdef(odp_ipc_t ipc, odp_queue_t queue);
>>>> +
>>>> +/**
>>>> + * Remove the default input queue
>>>> + *
>>>> + * Remove (disassociate) the default input queue from an IPC endpoint.
>>>> + * The queue itself is not touched.
>>>> + *
>>>> + * @param ipc  IPC handle
>>>> + *
>>>> + * @retval 0 on success
>>>> + * @retval <0 on failure
>>>> + */
>>>> +int odp_ipc_inq_remdef(odp_ipc_t ipc);
>>>> +
>>>> +/**
>>>> + * Resolve endpoint by name
>>>> + *
>>>> + * Look up an existing or future endpoint by name.
>>>> + * When the endpoint exists, return the specified message with the
>>>> endpoint
>>>> + * as the sender.
>>>> + *
>>>> + * @param ipc IPC handle
>>>> + * @param name Name to resolve
>>>> + * @param msg Message to return
>>>> + */
>>>> +void odp_ipc_resolve(odp_ipc_t ipc,
>>>> +                    const char *name,
>>>> +                    odp_ipc_msg_t msg);
>>>> +
>>>> +/**
>>>> + * Monitor endpoint
>>>> + *
>>>> + * Monitor an existing (potentially already dead) endpoint.
>>>> + * When the endpoint is dead, return the specified message with the
>>>> endpoint
>>>> + * as the sender.
>>>> + *
>>>> + * Unrecognized or invalid endpoint addresses are treated as dead
>>>> endpoints.
>>>> + *
>>>> + * @param ipc IPC handle
>>>> + * @param addr Address of monitored endpoint
>>>> + * @param msg Message to return
>>>> + */
>>>> +void odp_ipc_monitor(odp_ipc_t ipc,
>>>> +                    const uint8_t addr[ODP_IPC_ADDR_SIZE],
>>>> +                    odp_ipc_msg_t msg);
>>>> +
>>>> +/**
>>>> + * Send message
>>>> + *
>>>> + * Send a message to an endpoint (which may already be dead).
>>>> + * Message delivery is ordered and reliable. All (accepted) messages
>>>> will
>>>> be
>>>> + * delivered up to the point of endpoint death or lost connection.
>>>> + * Actual reception and processing is not guaranteed (use end-to-end
>>>> + * acknowledgements for that).
>>>> + * Monitor the remote endpoint to detect death or lost connection.
>>>> + *
>>>> + * @param ipc IPC handle
>>>> + * @param msg Message to send
>>>> + * @param addr Address of remote endpoint
>>>> + *
>>>> + * @retval 0 on success
>>>> + * @retval <0 on error
>>>> + */
>>>> +int odp_ipc_send(odp_ipc_t ipc,
>>>> +                odp_ipc_msg_t msg,
>>>> +                const uint8_t addr[ODP_IPC_ADDR_SIZE]);
>>>> +
>>>> +/**
>>>> + * Get address of sender (source) of message
>>>> + *
>>>> + * @param msg Message handle
>>>> + * @param addr Address of sender endpoint
>>>> + */
>>>> +void odp_ipc_sender(odp_ipc_msg_t msg,
>>>> +                   uint8_t addr[ODP_IPC_ADDR_SIZE]);
>>>> +
>>>> +/**
>>>> + * Message data pointer
>>>> + *
>>>> + * Return a pointer to the message data
>>>> + *
>>>> + * @param msg Message handle
>>>> + *
>>>> + * @return Pointer to the message data
>>>> + */
>>>> +void *odp_ipc_data(odp_ipc_msg_t msg);
>>>> +
>>>> +/**
>>>> + * Message data length
>>>> + *
>>>> + * Return length of the message data.
>>>> + *
>>>> + * @param msg Message handle
>>>> + *
>>>> + * @return Message length
>>>> + */
>>>> +uint32_t odp_ipc_length(const odp_ipc_msg_t msg);
>>>> +
>>>> +/**
>>>> + * Set message length
>>>> + *
>>>> + * Set length of the message data.
>>>> + *
>>>> + * @param msg Message handle
>>>> + * @param len New length
>>>> + *
>>>> + * @retval 0 on success
>>>> + * @retval <0 on error
>>>> + */
>>>> +int odp_ipc_reset(const odp_ipc_msg_t msg, uint32_t len);
>>>> +
>>>> +/**
>>>> + * Allocate message
>>>> + *
>>>> + * Allocate a message of a specific size.
>>>> + *
>>>> + * @param pool Message pool to allocate message from
>>>> + * @param len Length of the allocated message
>>>> + *
>>>> + * @return IPC message handle on success
>>>> + * @retval ODP_IPC_MSG_INVALID on failure and errno set
>>>> + */
>>>> +odp_ipc_msg_t odp_ipc_alloc(odp_pool_t pool, uint32_t len);
>>>> +
>>>> +/**
>>>> + * Free message
>>>> + *
>>>> + * Free message back to the message pool it was allocated from.
>>>> + *
>>>> + * @param msg Handle of message to free
>>>> + */
>>>> +void odp_ipc_free(odp_ipc_msg_t msg);
>>>> +
>>>> +/**
>>>> + * Get message handle from event
>>>> + *
>>>> + * Converts an ODP_EVENT_MESSAGE type event to a message.
>>>> + *
>>>> + * @param ev   Event handle
>>>> + *
>>>> + * @return Message handle
>>>> + *
>>>> + * @see odp_event_type()
>>>> + */
>>>> +odp_ipc_msg_t odp_message_from_event(odp_event_t ev);
>>>> +
>>>> +/**
>>>> + * Convert message handle to event
>>>> + *
>>>> + * @param msg  Message handle
>>>> + *
>>>> + * @return Event handle
>>>> + */
>>>> +odp_event_t odp_message_to_event(odp_ipc_msg_t msg);
>>>> +
>>>> +/**
>>>> + * Get printable value for an odp_ipc_t
>>>> + *
>>>> + * @param ipc  IPC handle to be printed
>>>> + * @return     uint64_t value that can be used to print/display this
>>>> + *             handle
>>>> + *
>>>> + * @note This routine is intended to be used for diagnostic purposes
>>>> + * to enable applications to generate a printable value that represents
>>>> + * an odp_ipc_t handle.
>>>> + */
>>>> +uint64_t odp_ipc_to_u64(odp_ipc_t ipc);
>>>> +
>>>> +/**
>>>> + * Get printable value for an odp_ipc_msg_t
>>>> + *
>>>> + * @param msg  Message handle to be printed
>>>> + * @return     uint64_t value that can be used to print/display this
>>>> + *             handle
>>>> + *
>>>> + * @note This routine is intended to be used for diagnostic purposes
>>>> + * to enable applications to generate a printable value that represents
>>>> + * an odp_ipc_msg_t handle.
>>>> + */
>>>> +uint64_t odp_ipc_msg_to_u64(odp_ipc_msg_t msg);
>>>> +
>>>> +/**
>>>> + * @}
>>>> + */
>>>> +
>>>> +#ifdef __cplusplus
>>>> +}
>>>> +#endif
>>>> +
>>>> +#endif
>>>>
>>>
>>>
>>> --
>>> Benoît GANNE
>>> Field Application Engineer, Kalray
>>> +33 (0)648 125 843
>
>
>
> --
> Benoît GANNE
> Field Application Engineer, Kalray
> +33 (0)648 125 843
_______________________________________________
lng-odp mailing list
lng-odp@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/lng-odp
