I realized I forgot to respond to s23.  Corrected here.

On Fri, Jun 3, 2016 at 4:15 AM, Christophe Milard <
christophe.mil...@linaro.org> wrote:

> since V3: Update following Bill's comments
> since V2: Update following Barry and Bill's comments
> since V1: Update following arch call 31 may 2016
>
> This is an attempt to sum up the discussions around the thread/process
> topic that have been going on over the last weeks.
> Sorry for the formalism of this mail, but it seems we need accuracy
> here...
>
> This summary is organized as follows:
>
> It is a set of statements, each of them expecting a separate answer
> from you. When no specific ODP version is specified, the statement
> regards the "ultimate" goal (i.e. what we want eventually to achieve).
> Each statement is prefixed with:
>   - a statement number for further reference (e.g. S1)
>   - a status word (one of 'agreed', 'open' or 'closed').
> Agreed statements expect a yes/no answer: 'yes' means that you
> acknowledge that this is your understanding of the agreement and will
> not nack an implementation based on this statement. You can comment
> after a yes, but your comment will not block any implementation based
> on the agreed statement. A 'no' implies that the statement does not
> reflect your understanding of the agreement, or that you refuse the
> proposal.
> Any 'no' received on an 'agreed' statement pushes it back to 'open'.
> Open statements are fully open for further discussion.
>
> S1  -agreed: an ODP thread is an OS/platform concurrent execution
> environment object (as opposed to an ODP object). No more specific
> definition is given by the ODP API itself.
>
> Barry: YES
> Bill: Yes
>
> ---------------------------
>
> S2  -agreed: Each ODP implementation must state what is allowed to be
> used as an ODP thread for that specific implementation: a linux-based
> implementation, for instance, will have to state whether odp threads
> can be linux pthreads, linux processes, both, or any other type of
> concurrent execution environment. ODP implementations can put any
> restriction they wish on what an ODP thread is allowed to be. This
> should be documented in the ODP implementation documentation.
>
> Barry: YES
> Bill: Yes
>
> ---------------------------
>
> S3  -agreed: in the linux generic ODP implementation an odp thread will
> be either:
>         * a linux process that is a descendant of (or the same as) the
> odp instantiation process, or
>         * a pthread 'member' of a linux process that is a descendant of
> (or the same as) the odp instantiation process.
>
> Barry: YES
> Bill: Yes
>
> ---------------------------
>
> S4  -agreed: For monarch, the linux generic ODP implementation only
> supports odp threads that are pthreads belonging to the instantiation
> process.
>
> Barry: YES
> Bill: Yes
>
> ---------------------------
>
> S5  -agreed: whether multiple instances of ODP can be run on the same
> machine is left as an implementation decision. The ODP implementation
> document should state what is supported; any restriction is allowed.
>
> Barry: YES
> Bill: Yes
>
> ---------------------------
>
> S6  -agreed: The l-g odp implementation will support multiple odp
> instances whose instantiation processes are different and are not
> ancestors/descendants of each other. Different instances of ODP will,
> of course, be restricted in sharing common OS resources (the total
> amount of memory available for each ODP instance may decrease as the
> number of instances increases, access to a network interface will
> probably be granted to the first instance grabbing the interface and
> denied to the others... other rules may apply when sharing other
> common ODP resources).
>
> Bill: Yes
>
> ---------------------------
>
> S7  -agreed: the l-g odp implementation will not support multiple ODP
> instances initiated from the same linux process (calling
> odp_init_global() multiple times).
> As an illustration, this means that a single process P is not allowed
> to execute the following calls (in any order):
> instance1 = odp_init_global()
> instance2 = odp_init_global()
> pthread_create (and, in that thread, run odp_init_local(instance1) )
> pthread_create (and, in that thread, run odp_init_local(instance2) )
>
> Bill: Yes
>
> -------------------
>
> S8  -agreed: the l-g odp implementation will not support multiple ODP
> instances initiated from related linux processes (descendant/ancestor
> of each other), i.e. it will not enable ODP 'sub-instances'. As an
> illustration, this means that the following is not supported:
> instance1 = odp_init_global()
> pthread_create (and, in that thread, run odp_init_local(instance1) )
> if (fork()==0) {
>     instance2 = odp_init_global()
>     pthread_create (and, in that thread, run odp_init_local(instance2) )
> }
>
> Bill: Yes
>
> --------------------
>
> S9  -agreed: the odp instance passed as parameter to odp_init_local()
> must always be one of the odp_instances returned by odp_init_global().
>
> Barry: YES
> Bill: Yes
>
> ---------------------------
>
> S10 -agreed: For l-g, if the answers to S7 and S8 are 'yes', then due to
> S3 the odp_instance an odp thread can attach to is completely defined
> by the ancestry of the thread, making the odp_instance parameter of
> odp_init_local() redundant. The odp l-g implementation guide will
> highlight this redundancy, but will stress that even in this case the
> parameter to odp_init_local() still has to be set correctly, as its
> usage is internal to the implementation.
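>
> As a rough illustration (a sketch only: the prototypes follow the
> monarch-era API as I recall it and may differ slightly), the instance
> handle is still passed explicitly by every new odp thread:
>
> #include <pthread.h>
> #include <odp_api.h>   /* umbrella header; may be <odp.h> on older trees */
>
> static odp_instance_t instance;  /* set by odp_init_global() in the main thread */
>
> static void *worker_fn(void *arg)
> {
>     (void)arg;
>     /* Even though the instance is fully determined by this pthread's
>      * ancestry (S3, S7, S8), the handle must still be passed and must
>      * be correct: its usage is internal to the implementation. */
>     if (odp_init_local(instance, ODP_THREAD_WORKER))
>         return NULL;
>
>     /* ... packet processing ... */
>
>     odp_term_local();
>     return NULL;
> }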
>
> Barry: I think so
> Bill: This practice also ensures that applications behave unchanged if
> and when multi-instance support is added, so I don't think we need to
> be apologetic about this parameter requirement.
>
> ---------------------------
>
> S11 -agreed: at odp_init_global() time, the application will provide 3
> sets of cpus (i.e. 3 cpu masks):
>         - the control cpu mask
>         - the worker cpu mask
>         - the odp service cpu mask (i.e. the set of cpus odp can take
> for its own usage)
> Note: The service cpu mask will be introduced post monarch.
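>
> As a sketch of what the application side could look like (the field
> names follow the monarch-era odp_init_t as I recall them and may
> differ; the service mask is post-monarch and only shown as a comment):
>
> #include <string.h>
> #include <odp_api.h>
>
> int init_with_masks(odp_instance_t *instance)
> {
>     odp_init_t param;
>     odp_cpumask_t control, worker;
>
>     memset(&param, 0, sizeof(param));
>
>     odp_cpumask_zero(&control);
>     odp_cpumask_set(&control, 0);      /* cpu 0 for control threads */
>
>     odp_cpumask_zero(&worker);
>     odp_cpumask_set(&worker, 1);       /* cpus 1-3 for workers */
>     odp_cpumask_set(&worker, 2);
>     odp_cpumask_set(&worker, 3);
>
>     param.control_cpus = &control;
>     param.num_control  = 1;
>     param.worker_cpus  = &worker;
>     param.num_worker   = 3;
>     /* param.service_cpus = ...;  hypothetical post-monarch service mask */
>
>     return odp_init_global(instance, &param, NULL);
> }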
>
> Bill: Yes
> Barry: YES
> ---------------------------
>
> S12 -agreed: the odp implementation may return an error from the
> odp_init_global() call if the number of cpus in the odp service mask
> (or their 'position') does not match the ODP implementation's needs.
>
> Barry: YES
> Bill: Yes. However, an implementation may fail an odp_init_global() call
> for any resource insufficiency, not just cpus.
>
>
> ---------------------------
>
> S13 -agreed: the application is fully responsible for pinning its own
> odp threads to different cpus, and this is done directly through OS
> system calls, or via helper functions (as opposed to ODP API calls).
> This pinning should be done among the cpus that are members of the
> worker cpu mask or the control cpu mask.
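>
> For odp-linux this can be as simple as the following sketch (plain
> Linux calls shown; the odp helper library wraps the same mechanism):
>
> #define _GNU_SOURCE
> #include <pthread.h>
> #include <sched.h>
>
> /* Pin the calling thread to one cpu taken from the worker (or control)
>  * cpu mask. This is a plain OS call, not an ODP API. */
> static int pin_self_to_cpu(int cpu)
> {
>     cpu_set_t set;
>
>     CPU_ZERO(&set);
>     CPU_SET(cpu, &set);
>
>     return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
> }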
>
> Barry: YES, but I support the existence of helper functions to do this
> – including the
> important case of pinning the main thread
>
> Bill: Yes. And agree an ODP helper is useful here (which is why odp-linux
> provides one).
>
> ---------------------------
>
> S14 -agreed: whether more than one odp thread can be pinned to the
> same cpu is left as an implementation choice (and the answer to that
> question can be different for the service, worker and control
> threads). This choice should be well documented in the implementation
> user manual.
>
> Barry: YES
> Bill: Yes
>
> ---------------------------
>
> S15 -agreed: the odp implementation is responsible for pinning its own
> service threads among the cpus that are members of the odp service cpu
> mask.
>
> Barry: YES, in principle – BUT be aware that currently the l-g ODP
> implementation (and perhaps many others) cannot call the helper
> functions (unless inlined), so this internal pinning may not be well
> coordinated with the helpers.
>
> Bill: Yes.  And I agree with Barry on the helper recursion issue. We should
> fix that so there is no conflict between implementation internal pinning
> and application pinning attempts.
>
> ---------------------------
>
> S16 -open: why does the odp implementation need to know the control and
> worker masks? If S13 is true, shouldn't these two masks be part of the
> helper only? (meaning that S11 is wrong)
>
> Barry: Currently it probably doesn’t NEED them, but perhaps in the
> future, with some new APIs and capabilities, it might benefit from this
> information, and so I would leave them in.
>
> Bill: The implementation sees these because they are how a provisioning
> agent (e.g., OpenDaylight) would pass higher-level configuration
> information through the application to the underlying ODP implementation.
> The initial masks specified on odp_init_global() are used in the
> implementation of the odp_cpumask_default_worker(),
> odp_cpumask_default_control(), and odp_cpumask_all_available() APIs.
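>
> A sketch of how that information flows back to the application through
> one of those APIs (the prototype is as I recall it from the monarch
> headers, so treat this as an illustration):
>
> #include <odp_api.h>
>
> static int pick_worker_cpus(odp_cpumask_t *workers)
> {
>     /* The implementation derives this answer from the worker mask that
>      * the application (or a provisioning agent) handed to
>      * odp_init_global(); passing 0 asks for all available worker cpus.
>      * The return value is the number of cpus placed in the mask. */
>     return odp_cpumask_default_worker(workers, 0);
> }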
>
> ---------------------------
>
> S17 -open: should masks passed as parameter to odp_init_global() have the
> same "namespace" as those used internally within ODP?
>
> Barry: YES
> Bill: Yes. I'm not sure what it would mean for them to be in a different
> namespace. How would those be bridged if they weren't?
>
>
> ---------------------------
>
> S18 -agreed: ODP handles are valid over the whole ODP instance, i.e.
> any odp handle remains valid among all the odp threads of the ODP
> instance regardless of the odp thread type (process, thread or
> whatever): an ODP thread A can pass an odp handle to another ODP
> thread B (using any kind of IPC), and B can use the handle.
>
> Bill: Yes
>
> -----------------
>
> S19 -open: any pointer retrieved by an ODP call (such as
> odp_*_get_addr()) follows the rules defined by the OS, with the
> possible exception defined in S21. For the linux generic ODP
> implementation, this means that pointers are fully shareable when
> using pthreads, and that pointers to shared memory areas will be
> shareable as long as the fork() happens after the shm_reserve().
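>
> To illustrate the odp-linux behaviour referred to here (a sketch only,
> and, since this statement is still open, nothing the API guarantees):
>
> #include <unistd.h>
> #include <odp_api.h>
>
> void reserve_then_fork(void)
> {
>     /* error checks omitted */
>     odp_shm_t shm = odp_shm_reserve("example_shm", 4096, 64, 0);
>     void *base = odp_shm_addr(shm);
>
>     if (fork() == 0) {
>         /* child: in odp-linux the mapping is inherited, so 'base' is
>          * the same virtual address as in the parent */
>         *(int *)base = 42;
>         _exit(0);
>     }
> }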
>
> Barry: NO. Disagree. I would prefer to see a consistent ODP answer on
> this topic, and in particular I don’t even believe that most OS’s “have
> rules defining …”, since in fact one can make programs run under Linux
> which can share pointers regardless of the ordering of fork() calls.
> Most OSes have lots of (continually evolving) capabilities in the
> category of sharing memory, and so “following the rules of the OS” is
> not well defined.
> Instead, I prefer a simpler rule. Memory reserved using the special
> flag is guaranteed to use the same addresses across processes, and all
> other pointers are not guaranteed to be the same nor guaranteed to be
> different, so the ODP programmer should avoid any such assumptions for
> maximum portability. But of course programmers often only consider a
> subset of possible targets (e.g. how many programmers consider porting
> to an 8-bit CPU or a machine with a 36-bit word length), and so they
> may happily take advantage of certain non-guaranteed assumptions.
>
>
> Bill: As I noted earlier we have to distinguish between different types of
> memory and where these pointers come from. If the application is using
> malloc() or some other OS API to get memory and then using that memory's
> address as, for example, a queue context pointer, then it is taking
> responsibility for ensuring that these pointers are meaningful to whoever
> sees them. ODP isn't going to do anything to help there. So this question
> really only refers to addresses returned from ODP APIs. If we look for void
> * returns in the ODP API we see that the only addresses ODP returns are:
>
> 1) Those that enable addressability to buffers and packets
> (odp_buffer_addr(), odp_packet_data(), odp_packet_offset(), etc.)
>
> These addresses are intended to be used within the scope of the calling
> thread and should not be assumed to have any validity outside of that
> context because the buffer/packet is the durable object and any addresses
> are just (potentially transient) mappings of that object for use by that
> thread. Packet and buffer handles (not addresses) are passed between
> threads via queues and the receiver issues its own such calls on the
> received handles to get its own addressability to these objects. Whether or
> not these addresses are the same is purely internal to an ODP
> implementation and is not visible to the application.
>
> 2) Packet user areas (odp_packet_user_area()).  This API returns the
> address of a preallocated user area associated with the packet (size
> defined by the pool that the packet was drawn from at odp_pool_create()
> time by the max_uarea_size entry in the odp_pool_param_t). Since this is
> metadata associated with the packet this API may be called by any thread
> that obtains the odp_packet_t for the packet that contains that user area.
> However, like the packet itself, the scope of this returned address is the
> calling thread. So the address returned by odp_packet_user_area() should
> not be cached or passed to any other thread. Each thread that needs
> addressability to this area makes its own call and whether these returned
> addresses are the same or different is again internal to the implementation
> and not visible to the application. Note that just as two threads should
> not have ownership of an odp_packet_t at the same time, two threads should
> not be trying to access the user area associated with a packet either.
>
> 3) Context pointer getters (odp_queue_context(), odp_packet_user_ptr(),
> odp_timeout_user_ptr(), odp_tm_node_context(), odp_tm_queue_context(), and
> the user context pointer carried in the odp_crypto_params_t struct)
>
> These are set by the application using corresponding setter APIs or
> provided as values in structs, so the application either obtains these
> pointers on its own, in which case it is responsible for ensuring that they
> are meaningful to whoever retrieves them, or from an odp_shm_t.  So these
> are not a special case in themselves.
>
> 4) ODP shared memory (odp_shm_addr(), odp_shm_info()).  These APIs return
> addresses to odp_shm_t objects that are specifically created to support
> sharing. The rule here is simple: the scope of any returned shm address is
> determined by the sharing flag specified at odp_shm_reserve() time. ODP
> currently defines two such flags: ODP_SHM_SW_ONLY and ODP_SHM_PROC. We
> simply need to define precisely the intended sharing scope of these two (or
> any new flags we define) to answer this question.  Note that context
> pointers drawn from odp_shm_t objects would then have whatever sharing
> attributes that the shm object has, thus completely defining case (3).
>
> ---------------------
>
> S20 -open: by default, shmem addresses (returned by odp_shm_addr())
> follow the OS rules, as defined by S19.
>
> Ola: The question is which OS rules apply (an OS can have different rules
> for
> different OS objects, e.g. memory regions allocated using malloc and mmap
> will behave differently). I think the answer depends on ODP shmem objects
> are implemented. Only the ODP implementation knows how ODP shmem objects
> are created (e.g. use some OS system call, manipulate the page tables
> directly). So essentially the sharability of pointers is ODP implementation
> specific (although ODP implementations on the same OS can be expected to
> behave the same). Conclusion: we actually don't specify anything at all
> here, it is completely up to the ODP implementation.
> What is required/expected by ODP applications? If we don't make
> applications happy, ODP is unlikely to succeed.
> I think many applications are happy with a single-process thread model
> where all memory is shared and pointers can be shared freely.
> I hear of some applications that require a multi-process thread model; I
> expect that those applications also want to be able to share memory and
> pointers freely between them, at least memory that was specifically
> allocated to be shared (so-called shared memory regions: what's otherwise
> the purpose of such memory regions?).
>
> Barry: Disagree with the same comments as in S19.
>
> Bill: I believe my discourse on S19 completely resolves this question. This
> is controlled by the share flag specified at odp_shm_reserve() time. We
> just need to specify the sharing scope implied by each of these and then it
> is up to each implementation to see that such scope is realized.
>
> ---------------------
>
> S21 -open: shm will support an extra flag at shm_reserve() call time:
> SHM_XXX. The usage of this flag will allocate shared memory guaranteed
> to be located at the same virtual address in all odp threads of the
> odp_instance. Pointers to this shared memory type are therefore fully
> sharable, even between odp threads running in different VA spaces (e.g.
> processes). The amount of memory which can be allocated using this flag
> can be limited to any value by the ODP implementation, down to zero
> bytes, meaning that some odp implementations may not support this
> option at all; shm_reserve() will return an error in this case. The
> usage of this flag by the application is therefore not recommended. The
> ODP implementation may require a hint about the size of this area at
> odp_init_global() call time.
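>
> A sketch of the intended usage (SHM_XXX stands for whatever name S22
> settles on; the flag does not exist in the current API, so the define
> below is purely a placeholder):
>
> #include <odp_api.h>
>
> #define ODP_SHM_XXX 0x10   /* hypothetical flag from S21/S22 */
>
> void *reserve_shared_va(uint64_t size)
> {
>     odp_shm_t shm = odp_shm_reserve("same_va_region", size,
>                                     ODP_CACHE_LINE_SIZE, ODP_SHM_XXX);
>
>     if (shm == ODP_SHM_INVALID)
>         return NULL;  /* implementation may not support (enough of) this memory */
>
>     /* Guaranteed to be the same virtual address in every odp thread of
>      * the instance, including odp threads that are separate processes. */
>     return odp_shm_addr(shm);
> }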
>
> Barry: Mostly agree, except for the comment about the special flag not
> being recommended.
>
> Ola: Agree. Some/many applications will want to share memory between
> threads/processes and must be able to do so. Some ODP platforms may have
> limitations to the amount of memory (if any) that can be shared and may
> thus fail to run certain applications. Such is life. I don't see a problem
> with that. Possibly we should remove the phrase "not recommended" and just
> state that portability may be limited.
>
>
> Bill: Yes. As noted in S19 and S20 the intent of the share flag is to
> specify desired addressability scope for the returned odp_shm_t. It's
> perfectly reasonable to define multiple such scopes that may have different
> intended uses (and implementation costs).
>
> ------------------
>
> S22 -open: please put here your name suggestions for this SHM_XXX flag
> :-).
>
> Ola:
> SHM_I_REALLY_WANT_TO_SHARE_THIS_MEMORY
>
> Bill: I previously suggested ODP_SHM_INSTANCE that specifies that the
> sharing scope of this odp_shm_t is the entire ODP instance.
>
> ------------------
>
> S23 -open: The rules above define relatively well the behaviour of
> pointers retrieved by a call to odp_shm_addr(). But many points need to
> be defined regarding pointers to other ODP objects: what is the
> validity of a pointer to a packet, for instance? If process A creates
> a packet pool P, then forks B and C, and B allocates a packet from P
> and retrieves a pointer to that packet... Is this pointer valid in A
> and C? In the current l-g implementation, it will be... Is this
> behaviour something we wish to enforce on every odp implementation?
> What about other objects: buffers, atomics... Some clear rule has to
> be defined here: how things behave, and whether this behaviour is part
> of the ODP API or just specific to different implementations.
>
> Ola: Perhaps we need the option to specify the
> I_REALLY_WANT_TO_SHARE_THIS_MEMORY flag when creating all types of ODP
> pools?
> An ODP implementation can always fail to create such a pool if the
> sharability requirement can not be satisfied.
> Allocation of locations used for atomic operations is the responsibility
> of the application which can (and must) choose a suitable type of memory.
> It is better that sharability is an explicit requirement from the
> application. It should be specified as a flag parameter to the different
> calls that create/allocate regions of memory (shmem, different types of
> pools).
>
>
> Barry:
> Again refer to S19 answer. Specifically it is about what is GUARANTEED
> regarding pointer validity, not whether the pointers in certain cases
> will happen to be the same. So for your example, the pointer is not
> guaranteed to be valid in A and C, but the programmer might well
> believe that for all the ODP platforms and implementations they expect
> to run on, this is very likely to be the case, in which case we can’t
> stop them from constraining their program’s portability – no more than
> requiring them to be able to port to a ternary (3-valued “bit”)
> architecture.
>


Bill: See s19 response, which answers all these questions.  The fork case
you mention is moot because addresses returned by ODP packet operations are
only valid in the thread that makes those calls. The handle must be valid
throughout the instance, so that may affect how the implementation chooses
to implement packet pool creation, but such questions are internal to each
implementation and not part of the API.

Atomics are application declared entities and so memory sharing is again
covered by my response to s19. If the atomic is in storage the application
allocated from the OS, then its sharability is the responsibility of the
application. If the atomic is part of a struct that resides in an
odp_shm_t, then its sharability is governed by the share flag of the
odp_shm it is taken from.

No additional parameters are required for pools because what is shared from
them is handles, not addresses. As noted, when these handles are converted
to addresses those addresses are only valid for the calling thread. All
other threads that have access to the handle make their own calls and
whether or not those two addresses have any relationship to each other is
not visible to the application.
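
A minimal sketch of the pattern described above (error handling mostly
omitted; the receiver re-derives its own address from the handle):

#include <odp_api.h>

/* Producer side: only the handle travels through the queue. */
void send_pkt(odp_queue_t q, odp_packet_t pkt)
{
    odp_queue_enq(q, odp_packet_to_event(pkt));
}

/* Consumer side (possibly another process): make our own mapping. */
void recv_pkt(odp_queue_t q)
{
    odp_event_t ev = odp_queue_deq(q);

    if (ev != ODP_EVENT_INVALID) {
        odp_packet_t pkt = odp_packet_from_event(ev);
        void *data = odp_packet_data(pkt); /* valid in this thread only */

        /* ... process data ... */
        odp_packet_free(pkt);
    }
}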


>

> ---------------------
>
> Thanks for your feedback!
>
>
>
>
_______________________________________________
lng-odp mailing list
lng-odp@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/lng-odp
