Re: Follow up ideas, to simplify creating MonitoringInfos.

Lukasz Cwik Wed, 31 Oct 2018 11:19:31 -0700

I see and don't know how to help you beyond what your already suggesting.
>From what I remember, maps were added as syntactic sugar of lists of key
value pairs.


On Tue, Oct 30, 2018 at 5:37 PM Alex Amato <ajam...@google.com> wrote:

> I am not sure on the correct syntax to populate the instances of my
> MonitoringInfoSpec messages
>
> message MonitoringInfoSpec {
>
> string urn = 1;
>
> string type_urn = 2;
>
> repeated string required_labels = 3;
>
> * map<string, string> annotations = 4;*
>
> }
>
>
> Notice how the annotations field is not used anywhere. I was unable to get
> this to compile and could find no examples of this on the proto github.
> Perhaps I'll have to reach out to them. I was wondering if anyone here was
> familiar first.
>
>
> message MonitoringInfoSpecs {
>
> enum MonitoringInfoSpecsEnum {
>
>   USER_COUNTER = 0 [(monitoring_info_spec) = {
>
>     urn: "beam:metric:user",
>
>     type_urn: "beam:metrics:sum_int_64",
>
>   }];
>
>
>   ELEMENT_COUNT = 1 [(monitoring_info_spec) = {
>
>     urn: "beam:metric:element_count:v1",
>
>     type_urn: "beam:metrics:sum_int_64",
>
>     required_labels: ["PTRANSFORM"],
>
>   }];
>
>
>   START_BUNDLE_MSECS = 2 [(monitoring_info_spec) = {
>
>     urn: "beam:metric:pardo_execution_time:start_bundle_msecs:v1",
>
>     type_urn: "beam:metrics:sum_int_64",
>
>     required_labels: ["PTRANSFORM"],
>
>   }];
>
>
>   PROCESS_BUNDLE_MSECS = 3 [(monitoring_info_spec) = {
>
>     urn: "beam:metric:pardo_execution_time:process_bundle_msecs:v1",
>
>     type_urn: "beam:metrics:sum_int_64",
>
>     required_labels: ["PTRANSFORM"],
>
>   }];
>
>
>   FINISH_BUNDLE_MSECS = 4 [(monitoring_info_spec) = {
>
>     urn: "beam:metric:pardo_execution_time:finish_bundle_msecs:v1",
>
>     type_urn: "beam:metrics:sum_int_64",
>
>     required_labels: ["PTRANSFORM"],
>
>   }];
>
>
>   TOTAL_MSECS = 5 [(monitoring_info_spec) = {
>
>     urn: "beam:metric:ptransform_execution_time:total_msecs:v1",
>
>     type_urn: "beam:metrics:sum_int_64",
>
>     required_labels: ["PTRANSFORM"],
>
>   }];
>
> }
>
> }
>
>
>
>
> On Tue, Oct 30, 2018 at 2:01 PM Lukasz Cwik <lc...@google.com> wrote:
>
>> I'm not sure what you mean by "Using a map in an option."
>>
>> For your second issue, the google docs around this show[1]:
>>
>> Note that if you want to use a custom option in a package other than the
>> one in which it was defined, you must prefix the option name with the
>> package name, just as you would for type names. For example:
>>
>> // foo.proto
>> import "google/protobuf/descriptor.proto";
>> package foo;
>> extend google.protobuf.MessageOptions {
>>   optional string my_option = 51234;
>> }
>>
>> // bar.proto
>> import "foo.proto";
>> package bar;
>> message MyMessage {
>>   option (foo.my_option) = "Hello world!";
>> }
>>
>>
>> 1:
>> https://developers.google.com/protocol-buffers/docs/proto#customoptions
>>
>>
>> On Mon, Oct 29, 2018 at 5:19 PM Alex Amato <ajam...@google.com> wrote:
>>
>>> Hi Robert and community, :)
>>>
>>> I was starting to code this up, but I wasn't sure exactly how to do some
>>> of the proto syntax. Would you mind taking a look at this doc
>>> <https://docs.google.com/document/d/1SB59MMVZXO0Aa6w0gf4m0qM4oYt4SiofDq3QxnpQaK4/edit?usp=sharing>
>>> and let me know if you know how to resolve any of these issues:
>>>
>>>    - Using a map in an option.
>>>    - Referring to string "constants" from other enum options in .proto
>>>    files.
>>>
>>> Please see the comments I have listed in the doc
>>> <https://docs.google.com/document/d/1SB59MMVZXO0Aa6w0gf4m0qM4oYt4SiofDq3QxnpQaK4/edit?usp=sharing>,
>>> and a few alternative suggestions.
>>> Thanks
>>>
>>> On Wed, Oct 24, 2018 at 10:08 AM Alex Amato <ajam...@google.com> wrote:
>>>
>>>> Okay. That makes sense. Using runtime validation and protos is what I
>>>> was thinking as well.
>>>> I'll include you as a reviewer in my PRs.
>>>>
>>>> As for the choice of a builder/constructor/factory. If we go with
>>>> factory methods/constructor then we will need a method for each metric type
>>>> (intCounter, latestInt64, ...). Plus, then I don't think we can have any
>>>> constructor parameters for labels, we will just need to accept a dictionary
>>>> for label keys+values like you say. This is because we cannot offer a
>>>> method for each URN and its combination of labels, and we should avoid such
>>>> static detection, as you say.
>>>>
>>>> The builder however, allows us to add a method for setting each label.
>>>> Since there are a small number of labels I think this should be fine. A
>>>> specific metric URN will have a specific set of labels which we can
>>>> validate being set. Any variant of this should use a different label (or a
>>>> new version in the label). Thus, the builder can give an advantage, making
>>>> it easier to set label values without needing to provide the correct key
>>>> for the label, when its set. You just need to call the correct method.
>>>>
>>>> Perhaps it might be best to leave this open to each SDK based on its
>>>> language style (Builder, Factory, etc.) , but make sure we use the
>>>> protos+runtime validation.
>>>>
>>>> On Wed, Oct 24, 2018 at 7:02 AM Robert Bradshaw <rober...@google.com>
>>>> wrote:
>>>>
>>>>> Thanks for bringing this to the list; it's a good question.
>>>>>
>>>>> I think the difficulty comes from trying to statically define a lists
>>>>> of possibilities that should instead be runtime values. E.g. we
>>>>> currently we're up to about a dozen distinct types, and having a
>>>>> setter for each is both verbose and not very extensible (especially
>>>>> replicated cross language). I'm not sure the set of possible labels is
>>>>> fixed either. Generally in the FnAPI we've been using shared constants
>>>>> for this kind of thing instead. (I was wary about the protos for the
>>>>> same reasons; it would be good to avoid leaking this through to each
>>>>> of the various SDKs.) The amount of static typing/validation one gets
>>>>> depends on how much logic you build into each of these methods (e.g.
>>>>> does it (almost?) always "metric" which is assumed to already be
>>>>> encoded correctly, or a specific type that has tradeoffs with the
>>>>> amount you can do generically (e.g. we have code that first loops over
>>>>> counters, then over distributions, then over gauges, and I don't think
>>>>> we want continue that pattern all M places for all N types)).
>>>>>
>>>>> I would probably lean towards mostly doing runtime validation here.
>>>>> Specifically, one would have a data file that defines, for each known
>>>>> URN, its type along with the set of permitted/expected/required
>>>>> labels. On construction a MonitoringInfo data point, one would provide
>>>>> a URN + value + labelMap, and optionally a type. On construction, the
>>>>> constructor (method, factory, whatever) would look up the URN to
>>>>> determine the type (or throw an error if it's both not known and not
>>>>> provided), which could be then used to fetch an encoder of sorts (that
>>>>> can go from value <-> proto encoding, possibly with some validation).
>>>>> If labels and/or annotations are provided and the URN is known, we can
>>>>> validate these as well.
>>>>>
>>>>> As for proto/enums vs. yaml, the former is nice because code
>>>>> generation comes for free, but has turned out to be much more verbose
>>>>> (both the definition and the use) and I'm still on the fence whether
>>>>> it's a net win.
>>>>>
>>>>> (Unfortunately AutoValue won't help much here, as the ultimate goal is
>>>>> to wrap a very nested proto OneOf, ideally with some validation.
>>>>> (Also, in Python, generally, having optional, named arguments makes
>>>>> this kind of builder pattern less needed.))
>>>>>
>>>>> On Wed, Oct 24, 2018 at 4:12 AM Kenneth Knowles <k...@apache.org>
>>>>> wrote:
>>>>> >
>>>>> > FWIW AutoValue will build most of that class for you, if it is as
>>>>> you say.
>>>>> >
>>>>> > Kenn
>>>>> >
>>>>> > On Tue, Oct 23, 2018 at 6:04 PM Alex Amato <ajam...@google.com>
>>>>> wrote:
>>>>> >>
>>>>> >> Hi Robert + beam dev list,
>>>>> >>
>>>>> >> I was thinking about your feedback in PR#6205, and agree that this
>>>>> monitoring_infos.py became a bit big.
>>>>> >>
>>>>> >> I'm working on the Java Implementation of this now, and want to
>>>>> incorporate some of these ideas and improve on this.
>>>>> >>
>>>>> >> I that that I should make something like a MonitoringInfoBuilder
>>>>> class. With a few methods
>>>>> >>
>>>>> >> setUrn
>>>>> >> setTimestamp
>>>>> >> setting the value (One method for each Type we support. Setting
>>>>> this will also set the type string)
>>>>> >>
>>>>> >> setInt64CounterValue
>>>>> >> setDoubleCounterValue
>>>>> >> setLatestInt64
>>>>> >> setTopNInt64
>>>>> >> setMonitoringDataTable
>>>>> >> setDistributionInt64
>>>>> >> ...
>>>>> >>
>>>>> >> setting labels (will set the key and value properly for the label)
>>>>> >>
>>>>> >> setPTransform(value)
>>>>> >> setPcollection(value)
>>>>> >> ...
>>>>> >>
>>>>> >>
>>>>> >> I think this will make building a metric much easier, you would
>>>>> just call the 4 methods and the .build(). These builders are common in
>>>>> Java. (I guess there is a similar thing way we could do in python? I'd 
>>>>> like
>>>>> to go back and refactor that as well when I am done)
>>>>> >>
>>>>> >> -------------
>>>>> >>
>>>>> >> As for your other suggestion to define metrics in the proto/enum
>>>>> file instead of the yaml file. I am not too sure about the best strategy
>>>>> for this. My initial thoughts are:
>>>>> >>
>>>>> >> Make a proto extension allowing you to describe/define a
>>>>> MonitoringInfo's (the same info as the metric_definitions.yaml file):
>>>>> >>
>>>>> >> URN
>>>>> >> Type
>>>>> >> Labels required
>>>>> >> Annotations: Description, Units, etc.
>>>>> >>
>>>>> >> Make the builder read in that MonitoringInfo definision/description
>>>>> assert everything is set properly? I think this would be a decent data
>>>>> driven approach.
>>>>> >>
>>>>> >> I was wondering if you had something else in mind?
>>>>> >>
>>>>> >> Thanks
>>>>> >> Alex
>>>>> >>
>>>>> >>
>>>>>
>>>>

Re: Follow up ideas, to simplify creating MonitoringInfos.

Reply via email to