[ https://issues.apache.org/jira/browse/MESOS-2487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360011#comment-14360011 ]
Michael Park edited comment on MESOS-2487 at 3/13/15 6:33 AM: -------------------------------------------------------------- h3. TL;DR * Note that we don't always want full member-wise equality. * I don't think we can solve this at the protobuf level. * Ideal solution I think is to a tool like {{clang-tidy}}. * Considered adding a default {{operator==}} for {{google::protobuf::Message}} which does full member-wise equality by default, but I think we're worse off. * I don't have a good short-term recommendation, so if you're looking for a solution for right now... stop reading. h3. Important Note An important thing to remember is that full member-wise equality is not always what we want. Some objects include *inessential* parts that should be ignored. Here are a few examples in Mesos: 1. {{DiskInfo}}: This one's fairly new so I remember it off the top of my head. Here's [~jieyu]'s comment: {code} // NOTE: We ignore 'volume' inside DiskInfo when doing comparison // because it describes how this resource will be used which has // nothing to do with the Resource object itself. A framework can // use this resource and specify different 'volume' every time it // uses it {code} In our context this means: {{volume}} is an inessential component of {{DiskInfo}} as far as equality is concerned. 2. {{FrameworkInfo}}: It has many fields, but the {{operator==}} implementation only compares for {{user}} and {{name}}. These 2 fields are what's considered *essential* for {{FrameworkInfo}}. Only the author of the class knows what parts are essential. Therefore if we decide to provide a full member-wise equality as the default implementation of {{operator==}}, we must ensure that the author can *override* this behavior. If protobuf generated the {{operator==}} for all messages, the wouldn't be possible. This I think is the more important point than [overhead in generated code size|https://code.google.com/p/protobuf/issues/detail?id=154]. h3. Problem The problem then is that when we add a new essential member, we end up with unexpected behavior and it doesn't result in a compile-time error. If we add a non-essential member, things work fine... but ideally we want an explicit acknowledgement from the programmer that it really is non-essential. h3. Ideal Solution First off, I don't think we can solve the problem at the protobuf level since we can't auto-generate the correct implementations. Even if we had a way to add attributes to a field to mark it as *non-essential*, we still don't know what exact equality comparisons the programmer wants for the *essential* parts. For example, {{repeated Resource resources;}} fields need to construct into our C++ {{Resources}} classes so that we can collapse them and perform {{contains}} both ways, another example is a list of environment variables which we perform set equality on rather than list equality. I think the ideal solution would be to: extract the protobuf message schema and feed it into a tool like {{clang-tidy}} to check something like: "for {{operator==}} implementations that compares 2 protobuf messages, enforce that all of the members are mentioned". For a full member-wise equality type such as {{FrameworkID}}, we would have: {code} bool operator == (const FrameworkID& lhs, const FrameworkID& rhs) { return lhs.value() == rhs.value(); // All members mentioned. } {code} For a partial member-wise equality such as {{DiskInfo}}, we would need: {code} bool operator == (const DiskInfo& lhs, const DiskInfo& rhs) { // Explicit acknowledgement that this is non-essential... plus the existing note about why. (void)lhs.volume(); (void)rhs.volume(); if (left.has_persistence() != rhs.has_persistence()) { return false; } if (lhs.has_persistence()) { return lhs.persistence().id() == rhs.persistence().id(); } } {code} When a new member is added, we'll be notified to handle the case whether the new member is essential or non-essential. If we added a new member to a message that doesn't have an {{operator==}} implemented, well then it doesn't matter to us. I think this would be nice, but it's probably far out. h3. Considered Solution What if we provide a default {{operator==}} and {{operator!=}} implementation which compares all the fields for all protobuf messages? If there's no specific {{operator==}} defined, adding a new essential member just works, but adding a non-essential member now gives unexpected behavior with no errors or warnings. If there *is* a specific {{operator==}} defined, adding a new essential member still breaks without warning, and adding a non-essential member still works. Now we end up with 4 cases instead 2. I considered this solution but I think this might be actually be worse. The implementation is pretty simple if we wanted to pursue it as a stopgap solution however. As suggested by Kenton Varda (protobuf author) [here|https://code.google.com/p/protobuf/issues/detail?id=154], and his quote [here|https://groups.google.com/forum/#!topic/protobuf/BPIL4jQ7D3U]. {quote} Another thing you could do which might be faster is serialize the message and compare the raw bytes. As long as the message contains no unknown fields, it will be serialized in canonical order. You can use DiscardUnknownFields() in C++ to remove all unknown fields from the message. {quote} *NOTE*: I actually don't know what unknown fields are and I can't seem to get anything useful from googling "protobuf unknown fields" but my guess is that we don't use it. If we do... well, I don't like this solution anyway. {code} inline bool operator == ( const google::protobuf::Message& left, const google::protobuf::Message& right) { return left.SerializeAsString() == right.SerializeAsString(); } inline bool operator != ( const google::protobuf::Message& left, const google::protobuf::Message& right) { return !(left == right); } {code} I did a little test by adding the above function and removing the following {{operator==}} implementations that checked for full member-wise equality. That's not all of them, I just picked out a few for testing purposes. * {{ContainerID}} * {{ExecutorID}} * {{FrameworkID}} * {{OfferID}} * {{SlaveID}} * {{TaskID}} * {{MasterInfo}} I accidentally removed {{FrameworkInfo}} as well and {{make check}} broke. Then realized that {{FrameworkInfo}} only compares {{user}} and {{name}}, so I added it back and {{make check}} worked. So... I think it works. h3. Conclusion I don't have a good short-term recommendation for a solution. Just wanted to document my ideas and thoughts around this. was (Author: mcypark): h3. TL;DR * Note that we don't always want full member-wise equality. * I don't think we can solve this at the protobuf level. * Ideal solution I think is to a tool like {{clang-tidy}}. * Considered adding a default {{operator==}} for {{google::protobuf::Message}} which does full member-wise equality by default, but I think we're worse off. * I don't have a good short-term recommendation, so if you're looking for a solution for right now... stop reading. h3. Important Note An important thing to remember is that full member-wise equality is not always what we want. Some objects include *inessential* parts that should be ignored. Here are a few examples in Mesos: 1. {{DiskInfo}}: This one's fairly new so I remember it off the top of my head. Here's [~jieyu]'s comment: {code} // NOTE: We ignore 'volume' inside DiskInfo when doing comparison // because it describes how this resource will be used which has // nothing to do with the Resource object itself. A framework can // use this resource and specify different 'volume' every time it // uses it {code} In our context this means: {{volume}} is an inessential component of {{DiskInfo}} as far as equality is concerned. 2. {{FrameworkInfo}}: It has many fields, but the {{operator==}} implementation only compares for {{user}} and {{name}}. These 2 fields are what's considered *essential* for {{FrameworkInfo}}. Only the author of the class knows what parts are essential. Therefore if we decide to provide a full member-wise equality as the default implementation of {{operator==}}, we must ensure that the author can *override* this behavior. If protobuf generated the {{operator==}} for all messages, the wouldn't be possible. This I think is the more important point than [overhead in generated code size|https://code.google.com/p/protobuf/issues/detail?id=154]. h3. Problem The problem then is that when we add a new essential member, we end up with unexpected behavior and it doesn't result in a compile-time error. If we add a non-essential member, things work fine... but ideally we want an explicit acknowledgement from the programmer that it really is non-essential. h3. Ideal Solution First off, I don't think we can solve the problem at the protobuf level since we can't auto-generate the correct implementations. Even if we had a way to add attributes to a field to mark it as *non-essential*, we still don't know what exact equality comparisons the programmer wants for the *essential* parts. For example, {{repeated Resource resources;}} fields need to construct into our C++ {{Resources}} classes so that we can collapse them and perform {{contains}} both ways, another example is a list of environment variables which we perform set equality on rather than list equality. I think the ideal solution would be to: extract the protobuf message schema and feed it into a tool like {{clang-tidy}} to check something like: "for {{operator==}} implementations that compares 2 protobuf messages, enforce that all of the members are mentioned". For a full member-wise equality type such as {{FrameworkID}}, we would have: {code} bool operator == (const FrameworkID& lhs, const FrameworkID& rhs) { return lhs.value() == rhs.value(); // All members mentioned. } {code} For a partial member-wise equality such as {{DiskInfo}}, we would need: {code} bool operator == (const DiskInfo& lhs, const DiskInfo& rhs) { // Explicit acknowledgement that this is non-essential... plus the existing note about why. (void)lhs.volume(); (void)rhs.volume(); if (left.has_persistence() != rhs.has_persistence()) { return false; } if (lhs.has_persistence()) { return lhs.persistence().id() == rhs.persistence().id(); } } {code} When a new member is added, we'll be notified to handle the case whether the new member is essential or non-essential. If we added a new member to a message that doesn't have an {{operator==}} implemented, well then it doesn't matter to us. I think this would be nice, but it's probably far out. h3. Considered Solution What if we provide a default {{operator==}} and {{operator!=}} implementation which compares all the fields for all protobuf messages? If there's no specific {{operator==}} defined, adding a new essential member just works, but adding a non-essential member now gives unexpected behavior with no errors or warnings. If there *is* a specific {{operator==}} defined, adding a new essential member still breaks without warning, and adding a non-essential member still works. Now we end up with 4 cases instead 2. I considered this solution but I think this might be actually be worse. The implementation is pretty simple if we wanted to pursue it as a stopgap solution however. As suggested by Kenton Varda (protobuf author) [here|https://code.google.com/p/protobuf/issues/detail?id=154], and his quote [here|https://groups.google.com/forum/#!topic/protobuf/BPIL4jQ7D3U]. {quote} Another thing you could do which might be faster is serialize the message and compare the raw bytes. As long as the message contains no unknown fields, it will be serialized in canonical order. You can use DiscardUnknownFields() in C++ to remove all unknown fields from the message. {quote} *NOTE*: I actually don't know what unknown fields are and I can't seem to get anything useful from googling "protobuf unknown fields" but my guess is that we don't use it. If we do... well, I don't like this solution anyway. {code} inline bool operator == ( const google::protobuf::Message& left, const google::protobuf::Message& right) { return left.SerializeAsString() == right.SerializeAsString(); } inline bool operator != ( const google::protobuf::Message& left, const google::protobuf::Message& right) { return !(left == right); } {code} I did a little test by adding the above function and removing the following {{operator==}} implementations that checked for full member-wise equality. That's not all of them, I just picked out a few for testing purposes. * {{ContainerID}} * {{ExecutorID}} * {{FrameworkID}} * {{OfferID}} * {{SlaveID}} * {{TaskID}} * {{MasterInfo}} I accidentally removed {{FrameworkInfo}} as well and {{make check}} broke. Then realized that {{FrameworkInfo}} only compares {{user}} and {{name}}, so I added it back and {{make check}} worked. So in conclusion, I think it works. h3. Conclusion I don't have a good short-term recommendation for a solution. Just wanted to document my ideas and thoughts around this. > Ensure protobuf "==" operator does not go out of sync with new protobuf fields > ------------------------------------------------------------------------------ > > Key: MESOS-2487 > URL: https://issues.apache.org/jira/browse/MESOS-2487 > Project: Mesos > Issue Type: Task > Reporter: Vinod Kone > > Currently when a new field is added to a protobuf that has a custom "==" > operator defined, we don't make sure that the field is accounted for in the > comparison. Ideally we should catch such errors at build time or 'make check' > time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)