[ 
https://issues.apache.org/jira/browse/MESOS-2487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360011#comment-14360011
 ] 

Michael Park edited comment on MESOS-2487 at 3/13/15 6:33 AM:
--------------------------------------------------------------

h3. TL;DR
* Note that we don't always want full member-wise equality.
* I don't think we can solve this at the protobuf level.
* Ideal solution I think is to a tool like {{clang-tidy}}.
* Considered adding a default {{operator==}} for {{google::protobuf::Message}} 
which does full member-wise equality by default, but I think we're worse off.
* I don't have a good short-term recommendation, so if you're looking for a 
solution for right now... stop reading.

h3. Important Note

An important thing to remember is that full member-wise equality is not always 
what we want. Some objects include *inessential* parts that should be ignored. 
Here are a few examples in Mesos:

1. {{DiskInfo}}: This one's fairly new so I remember it off the top of my head. 
Here's [~jieyu]'s comment:

{code}
// NOTE: We ignore 'volume' inside DiskInfo when doing comparison
// because it describes how this resource will be used which has
// nothing to do with the Resource object itself. A framework can
// use this resource and specify different 'volume' every time it
// uses it
{code}

In our context this means: {{volume}} is an inessential component of 
{{DiskInfo}} as far as equality is concerned.

2. {{FrameworkInfo}}: It has many fields, but the {{operator==}} implementation 
only compares for {{user}} and {{name}}. These 2 fields are what's considered 
*essential* for {{FrameworkInfo}}.

Only the author of the class knows what parts are essential. Therefore if we 
decide to provide a full member-wise equality as the default implementation of 
{{operator==}}, we must ensure that the author can *override* this behavior. If 
protobuf generated the {{operator==}} for all messages, the wouldn't be 
possible. This I think is the more important point than [overhead in generated 
code size|https://code.google.com/p/protobuf/issues/detail?id=154].

h3. Problem

The problem then is that when we add a new essential member, we end up with 
unexpected behavior and it doesn't result in a compile-time error. If we add a 
non-essential member, things work fine... but ideally we want an explicit 
acknowledgement from the programmer that it really is non-essential.

h3. Ideal Solution

First off, I don't think we can solve the problem at the protobuf level since 
we can't auto-generate the correct implementations. Even if we had a way to add 
attributes to a field to mark it as *non-essential*, we still don't know what 
exact equality comparisons the programmer wants for the *essential* parts. For 
example, {{repeated Resource resources;}} fields need to construct into our C++ 
{{Resources}} classes so that we can collapse them and perform {{contains}} 
both ways, another example is a list of environment variables which we perform 
set equality on rather than list equality.

I think the ideal solution would be to: extract the protobuf message schema and 
feed it into a tool like {{clang-tidy}} to check something like: "for 
{{operator==}} implementations that compares 2 protobuf messages, enforce that 
all of the members are mentioned".

For a full member-wise equality type such as {{FrameworkID}}, we would have:

{code}
bool operator == (const FrameworkID& lhs, const FrameworkID& rhs)
{
  return lhs.value() == rhs.value();  // All members mentioned.
}
{code}

For a partial member-wise equality such as {{DiskInfo}}, we would need:

{code}
bool operator == (const DiskInfo& lhs, const DiskInfo& rhs)
{
  // Explicit acknowledgement that this is non-essential... plus the existing 
note about why.
  (void)lhs.volume(); (void)rhs.volume();
  if (left.has_persistence() != rhs.has_persistence()) {
    return false;
  }
  if (lhs.has_persistence()) {
    return lhs.persistence().id() == rhs.persistence().id();
  }
}
{code}

When a new member is added, we'll be notified to handle the case whether the 
new member is essential or non-essential. If we added a new member to a message 
that doesn't have an {{operator==}} implemented, well then it doesn't matter to 
us.

I think this would be nice, but it's probably far out.

h3. Considered Solution

What if we provide a default {{operator==}} and {{operator!=}} implementation 
which compares all the fields for all protobuf messages? If there's no specific 
{{operator==}} defined, adding a new essential member just works, but adding a 
non-essential member now gives unexpected behavior with no errors or warnings. 
If there *is* a specific {{operator==}} defined, adding a new essential member 
still breaks without warning, and adding a non-essential member still works. 
Now we end up with 4 cases instead 2. I considered this solution but I think 
this might be actually be worse.

The implementation is pretty simple if we wanted to pursue it as a stopgap 
solution however.
As suggested by Kenton Varda (protobuf author) 
[here|https://code.google.com/p/protobuf/issues/detail?id=154], and his quote 
[here|https://groups.google.com/forum/#!topic/protobuf/BPIL4jQ7D3U].

{quote}
Another thing you could do which might be faster is serialize the message and 
compare the raw bytes.  As long as the message contains no unknown fields, it 
will be serialized in canonical order.  You can use DiscardUnknownFields() in 
C++ to remove all unknown fields from the message.
{quote}

*NOTE*: I actually don't know what unknown fields are and I can't seem to get 
anything useful from googling "protobuf unknown fields" but my guess is that we 
don't use it. If we do... well, I don't like this solution anyway.

{code}
inline bool operator == (
    const google::protobuf::Message& left,
    const google::protobuf::Message& right)
{
  return left.SerializeAsString() == right.SerializeAsString();
}

inline bool operator != (
    const google::protobuf::Message& left,
    const google::protobuf::Message& right)
{
  return !(left == right);
}
{code}

I did a little test by adding the above function and removing the following 
{{operator==}} implementations that checked for full member-wise equality. 
That's not all of them, I just picked out a few for testing purposes.
* {{ContainerID}}
* {{ExecutorID}}
* {{FrameworkID}}
* {{OfferID}}
* {{SlaveID}}
* {{TaskID}}
* {{MasterInfo}}

I accidentally removed {{FrameworkInfo}} as well and {{make check}} broke. Then 
realized that {{FrameworkInfo}} only compares {{user}} and {{name}}, so I added 
it back and {{make check}} worked. So... I think it works.

h3. Conclusion

I don't have a good short-term recommendation for a solution. Just wanted to 
document my ideas and thoughts around this.


was (Author: mcypark):
h3. TL;DR
* Note that we don't always want full member-wise equality.
* I don't think we can solve this at the protobuf level.
* Ideal solution I think is to a tool like {{clang-tidy}}.
* Considered adding a default {{operator==}} for {{google::protobuf::Message}} 
which does full member-wise equality by default, but I think we're worse off.
* I don't have a good short-term recommendation, so if you're looking for a 
solution for right now... stop reading.

h3. Important Note

An important thing to remember is that full member-wise equality is not always 
what we want. Some objects include *inessential* parts that should be ignored. 
Here are a few examples in Mesos:

1. {{DiskInfo}}: This one's fairly new so I remember it off the top of my head. 
Here's [~jieyu]'s comment:

{code}
// NOTE: We ignore 'volume' inside DiskInfo when doing comparison
// because it describes how this resource will be used which has
// nothing to do with the Resource object itself. A framework can
// use this resource and specify different 'volume' every time it
// uses it
{code}

In our context this means: {{volume}} is an inessential component of 
{{DiskInfo}} as far as equality is concerned.

2. {{FrameworkInfo}}: It has many fields, but the {{operator==}} implementation 
only compares for {{user}} and {{name}}. These 2 fields are what's considered 
*essential* for {{FrameworkInfo}}.

Only the author of the class knows what parts are essential. Therefore if we 
decide to provide a full member-wise equality as the default implementation of 
{{operator==}}, we must ensure that the author can *override* this behavior. If 
protobuf generated the {{operator==}} for all messages, the wouldn't be 
possible. This I think is the more important point than [overhead in generated 
code size|https://code.google.com/p/protobuf/issues/detail?id=154].

h3. Problem

The problem then is that when we add a new essential member, we end up with 
unexpected behavior and it doesn't result in a compile-time error. If we add a 
non-essential member, things work fine... but ideally we want an explicit 
acknowledgement from the programmer that it really is non-essential.

h3. Ideal Solution

First off, I don't think we can solve the problem at the protobuf level since 
we can't auto-generate the correct implementations. Even if we had a way to add 
attributes to a field to mark it as *non-essential*, we still don't know what 
exact equality comparisons the programmer wants for the *essential* parts. For 
example, {{repeated Resource resources;}} fields need to construct into our C++ 
{{Resources}} classes so that we can collapse them and perform {{contains}} 
both ways, another example is a list of environment variables which we perform 
set equality on rather than list equality.

I think the ideal solution would be to: extract the protobuf message schema and 
feed it into a tool like {{clang-tidy}} to check something like: "for 
{{operator==}} implementations that compares 2 protobuf messages, enforce that 
all of the members are mentioned".

For a full member-wise equality type such as {{FrameworkID}}, we would have:

{code}
bool operator == (const FrameworkID& lhs, const FrameworkID& rhs)
{
  return lhs.value() == rhs.value();  // All members mentioned.
}
{code}

For a partial member-wise equality such as {{DiskInfo}}, we would need:

{code}
bool operator == (const DiskInfo& lhs, const DiskInfo& rhs)
{
  // Explicit acknowledgement that this is non-essential... plus the existing 
note about why.
  (void)lhs.volume(); (void)rhs.volume();
  if (left.has_persistence() != rhs.has_persistence()) {
    return false;
  }
  if (lhs.has_persistence()) {
    return lhs.persistence().id() == rhs.persistence().id();
  }
}
{code}

When a new member is added, we'll be notified to handle the case whether the 
new member is essential or non-essential. If we added a new member to a message 
that doesn't have an {{operator==}} implemented, well then it doesn't matter to 
us.

I think this would be nice, but it's probably far out.

h3. Considered Solution

What if we provide a default {{operator==}} and {{operator!=}} implementation 
which compares all the fields for all protobuf messages? If there's no specific 
{{operator==}} defined, adding a new essential member just works, but adding a 
non-essential member now gives unexpected behavior with no errors or warnings. 
If there *is* a specific {{operator==}} defined, adding a new essential member 
still breaks without warning, and adding a non-essential member still works. 
Now we end up with 4 cases instead 2. I considered this solution but I think 
this might be actually be worse.

The implementation is pretty simple if we wanted to pursue it as a stopgap 
solution however.
As suggested by Kenton Varda (protobuf author) 
[here|https://code.google.com/p/protobuf/issues/detail?id=154], and his quote 
[here|https://groups.google.com/forum/#!topic/protobuf/BPIL4jQ7D3U].

{quote}
Another thing you could do which might be faster is serialize the message and 
compare the raw bytes.  As long as the message contains no unknown fields, it 
will be serialized in canonical order.  You can use DiscardUnknownFields() in 
C++ to remove all unknown fields from the message.
{quote}

*NOTE*: I actually don't know what unknown fields are and I can't seem to get 
anything useful from googling "protobuf unknown fields" but my guess is that we 
don't use it. If we do... well, I don't like this solution anyway.

{code}
inline bool operator == (
    const google::protobuf::Message& left,
    const google::protobuf::Message& right)
{
  return left.SerializeAsString() == right.SerializeAsString();
}

inline bool operator != (
    const google::protobuf::Message& left,
    const google::protobuf::Message& right)
{
  return !(left == right);
}
{code}

I did a little test by adding the above function and removing the following 
{{operator==}} implementations that checked for full member-wise equality. 
That's not all of them, I just picked out a few for testing purposes.
* {{ContainerID}}
* {{ExecutorID}}
* {{FrameworkID}}
* {{OfferID}}
* {{SlaveID}}
* {{TaskID}}
* {{MasterInfo}}

I accidentally removed {{FrameworkInfo}} as well and {{make check}} broke. Then 
realized that {{FrameworkInfo}} only compares {{user}} and {{name}}, so I added 
it back and {{make check}} worked. So in conclusion, I think it works.

h3. Conclusion

I don't have a good short-term recommendation for a solution. Just wanted to 
document my ideas and thoughts around this.

> Ensure protobuf "==" operator does not go out of sync with new protobuf fields
> ------------------------------------------------------------------------------
>
>                 Key: MESOS-2487
>                 URL: https://issues.apache.org/jira/browse/MESOS-2487
>             Project: Mesos
>          Issue Type: Task
>            Reporter: Vinod Kone
>
> Currently when a new field is added to a protobuf that has a custom "==" 
> operator defined,  we don't make sure that the field is accounted for in the 
> comparison. Ideally we should catch such errors at build time or 'make check' 
> time. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to