If we intend to add another top-level key to the data to make it more 
accessible for index/search, we should do so in a manner that is extensible to 
any number of IDs. Index/search, as well as security and business audits, 
need identifiers only, and that, in my view, sets them apart from general 
metadata, which should be more descriptive and disposable.

The approach that I have seen work elsewhere I refer to as "tagging": that is, 
"tagging" data (in this case, activations) with domain-specific identifiers used 
to construct different views for different domains.

A single key is associated with a list of any number of these domain-specific 
identifiers, each expressed as a URI. The URI components include a 
prefix/domain that identifies the domain wherein the ID is unique (and 
consequently how to interpret the ID); optional path segments can further 
describe the ID's unique space (resource or purpose); and the URI ends with the 
actual ID. URIs, aside from being self-descriptive, are desirable because they 
intrinsically avoid collisions and do not require a separate key: the 
prefix/domain/path identifies the domain/purpose of the identifier within the 
same string.
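
To make the URI layout described above concrete, here is a minimal Python 
sketch of how a consumer might split a full tag into its domain, optional path, 
and ID. The names Tag and parse_tag are hypothetical, not part of OpenWhisk, 
and the sketch assumes the ID is always the last path segment.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class Tag(NamedTuple):
    """Hypothetical container for the pieces of a full tag URI."""
    domain: str  # the domain within which the ID is unique
    path: str    # optional resource/purpose path (may be empty)
    ident: str   # the actual ID, assumed to be the last path segment


def parse_tag(tag: str) -> Tag:
    """Split a full tag URI such as //domain/some/path/<ID> into its parts."""
    parts = urlsplit(tag)
    segments = [s for s in parts.path.split("/") if s]
    if not parts.netloc or not segments:
        raise ValueError(f"not a full tag URI: {tag!r}")
    return Tag(parts.netloc, "/".join(segments[:-1]), segments[-1])


if __name__ == "__main__":
    print(parse_tag("//GRC20.gov/cloud/security/pci-dss/transaction/abc123"))
```

A processor that only cares about one domain can then compare the domain 
field and ignore everything else.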

We could define any number of IDs that are recognized by the OW domain and 
even create a reserved prefix to keep them short, e.g.:

full: "//openwhisk.apache.org/transaction/<UID>"
prefixed: "ow:transaction-<UID>"
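
As a rough sketch of how the short form could map back to the full form, 
assuming a reserved prefix table and a "prefix:kind-ID" layout (both are my 
assumptions for illustration; RESERVED_PREFIXES and expand_tag are hypothetical 
names):

```python
# Assumed table of reserved short prefixes and the URI base each stands for.
RESERVED_PREFIXES = {"ow": "//openwhisk.apache.org"}


def expand_tag(tag: str) -> str:
    """Expand 'ow:kind-ID' to '//openwhisk.apache.org/kind/ID'.

    Tags that do not start with a reserved prefix are returned unchanged.
    """
    prefix, sep, rest = tag.partition(":")
    base = RESERVED_PREFIXES.get(prefix)
    if not sep or base is None or rest.startswith("//"):
        return tag
    # Assumption: the first '-' separates the kind from the ID.
    kind, dash, ident = rest.partition("-")
    if not dash or not kind or not ident:
        raise ValueError(f"malformed prefixed tag: {tag!r}")
    return f"{base}/{kind}/{ident}"


if __name__ == "__main__":
    print(expand_tag("ow:transaction-abc123"))
```

Splitting on the first "-" means the ID itself may contain dashes, but the kind 
may not; a different separator would work just as well.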

For example, if an activation handled credit card data, one could "tag" 
the record with a PCI indicator:

"//GRC20.gov/cloud/security/pci-dss/transaction/<UID>"

These could appear under an optional key such as:

{
   "tags":[
      "p1://d1/id1",
      "p2://d2/id2",
      ...
   ]
}

Tags do not necessarily need to be for IDs alone; they can also help 
in aggregating search data. For example, we could "tag" all data that was 
assigned to a certain region or cluster using this method as well:

{
   "tags":[
      "//ibmcloud.com/icf/region/us-south/cluster/0fdeg1",
      "ow:cluster-kube-055b10f",
      "ow:trans-0555ffca456919",
      ...
   ]
}

Of course, the array could be limited in size, and downstream processors (search 
or otherwise) could easily "pick out" the tags they care about and discard 
the ones they do not.
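
A minimal sketch of that "pick out" step, assuming a processor matches tags by 
string prefix and applies an optional size cap (select_tags and its parameters 
are hypothetical names for illustration):

```python
from typing import Iterable, List, Optional


def select_tags(tags: Iterable[str],
                wanted_prefixes: Iterable[str],
                limit: Optional[int] = None) -> List[str]:
    """Keep only tags that start with one of the wanted prefixes.

    If limit is given, at most that many tags are returned, in original order.
    """
    prefixes = tuple(wanted_prefixes)
    kept = [t for t in tags if t.startswith(prefixes)]
    return kept if limit is None else kept[:limit]


if __name__ == "__main__":
    activation = {
        "tags": [
            "//ibmcloud.com/icf/region/us-south/cluster/0fdeg1",
            "ow:cluster-kube-055b10f",
            "ow:trans-0555ffca456919",
        ]
    }
    # A processor interested only in OW-reserved tags.
    print(select_tags(activation["tags"], ["ow:"]))
```

Because each tag carries its own domain, the processor needs no schema 
knowledge beyond the prefixes it cares about.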

On 2019/08/20 10:30:19, Chetan Mehrotra <[email protected]> wrote: 
> Hi Team,
> 
> Branching the thread [1] to discuss how to record some metadata
> related to activation. Based on some of the usecases I see a need to
> record some more metadata related to activation. Some examples are
> 
> 1. transactionId - Record the transactionId for which the activation is part 
> of
> 2. pod name - Records the pod running the action container when using
> KubernetesContainerFactory
> 3. invocationId - Some id returned by underlying system when
> integrating with AWS Lambda or Azure Function
> 4. clusterId - If running multiple clusters for same system we would
> like to know which cluster handed the given execution
> 
> Some of these ids are determined as part of `ContainerResponse` itself
> and have to be made part of activation json such that later we can
> correlate the activation with other parts.
> 
> Now we need to determine how to store such id
> 
> Option 1 - New "meta" sub document
> -----------
> 
> Introduce a new "meta" key in activation json under which we store such ids
> 
> "meta" : {
>             "transactionId" : "xxx",
>             "podId" : "ow_xxx"
>         }
> 
> 
> Option 2 - Store them as annotations
> -------------
> 
> Instead of  introducing a new field we store them as annotations. Note
> we still make change in code to capture such data as part of
> `ContainerResponse` but just map it to annotations
> 
> One drawback of this approach is that current approach of annotations
> make it harder to index such fields easily. Having a flat structure
> like with "meta" field enables indexing such fields in db's other than
> Couch
> 
> Chetan Mehrotra
> [1]: 
> https://lists.apache.org/thread.html/f8b73a9ffb0d09a50aecfb54538da2e8365c54dcc3e26a78382ad7bd@%3Cdev.openwhisk.apache.org%3E
> 
