> On June 16, 2014, 5:45 p.m., Maxim Khutornenko wrote: > > src/main/python/apache/aurora/client/cli/jobs.py, lines 181-182 > > <https://reviews.apache.org/r/22457/diff/3/?file=609592#file609592line181> > > > > I don't think it's enough to json-serialize a thrift task. This is > > bound to set/dict serialization sorting problem we have battled with in > > updater.py. Specifically this block: > > https://github.com/apache/incubator-aurora/blob/master/src/main/python/apache/aurora/client/api/updater.py#L200-L223 > > > > Without sorting, the diff will only work 99% of the time. This has been > > demonstrated in a few real-life job updates. > > Mark Chu-Carroll wrote: > This isn't a new thing in this updated diff system - this is the current > behavior. > > If you use the new json-tree-diff, you won't have this problem; but for > people who rely on the current diff > behavior, I'd rather not change that. > > > Maxim Khutornenko wrote: > | If you use the new json-tree-diff, you won't have this problem; > > Perhaps I am missing something but how would the new approach ensure > equivalence of these two structs? > { 'set1': { ['k1':'v1', 'k2':'v2']} } > { 'set2': { ['k2':'v2', 'k1':'v1']} } > > It's "normal" to see element re-ordering like this during thrift struct > serialization. > > Maxim Khutornenko wrote: > Here is a real TaskConfig json fragment and its reordered twin: > > 1: u'constraints': [{u'name': u'value', u'constraint': {u'values': [u'1', > u'2']}}, {u'name': u'limit', u'constraint': {u'limit': {u'limit': 10}}}] > 2: u'constraints': [{u'constraint': {u'values': [u'2', u'1']}, u'name': > u'value'}, {u'name': u'limit', u'constraint': {u'limit': {u'limit': 10}}}] > > Mark Chu-Carroll wrote: > I don't think that the reordering that you show in your first example > isn't valid json. Key-value pairs belong in a dictionary, not a list. In a > dictionary, the comparison routine does the right thing: it compares key by > key, without regard to ordering. > > In the second example: I'd argue that that's actually a bug in our > serializer: it's using an ordered list for an unordered constraint. Diff > can't tell that the list structure isn't really supposed to be ordered; the > serializer should be canonicalizing it. > > > Maxim Khutornenko wrote: > Not sure how else you could represent a python set in JSON without > completely changing the underlying data structure (e.g. emitting fake keys > for every element and converting set into a dictionary with sorted keys). > Agree, the TJSONProtocol could be a bit smarter when dealing with python sets > but the reality is that python sets are inherently "unhashable" structs. I > have seen cases when completely repr-equivalent python sets were not > comparing as equal.
All the serializer would need to do is just canonicalize when it converts to a list. It's not an issue of being unhashable. But it should at least bother to be *uniform*. If you're converting a set to a list, chose a convention for ordering elements. Put 'em in lexicographic order, problem solved. As it stands, that problem is unsolvable for this diff tool: there's no way to tell that a given field was supposed to be a set, and should be compared without ordering. I'm not sure what you'd like me to do here, short of giving up and throwing away the new diff tool. I don't see any way to solve this, short of creating something that uses thrift introspection. - Mark ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22457/#review45835 ----------------------------------------------------------- On June 18, 2014, 10:59 a.m., Mark Chu-Carroll wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/22457/ > ----------------------------------------------------------- > > (Updated June 18, 2014, 10:59 a.m.) > > > Review request for Aurora, David McLaughlin and Brian Wickman. > > > Bugs: aurora-520 > https://issues.apache.org/jira/browse/aurora-520 > > > Repository: aurora > > > Description > ------- > > Add a new diff method, which uses field-by-field comparison of JSON trees for > comparing running job configurations to potentially updated configs. > > - Allow exclusion of semantically irrelevant fields. > - Provide a clearer list of the differences between configs. > - Provide a scripting-friendly alternative JSON syntax for diffs. > > The old diff behavior is still available under the "--use-shell-diff" option. > > > Diffs > ----- > > src/main/python/apache/aurora/client/cli/BUILD > ebe681a0d1735b7cc695dc3b7a14c4292d87ae32 > src/main/python/apache/aurora/client/cli/jobs.py > 4fa03a6c9919651551238b0dc211ed69a8dfe565 > src/main/python/apache/aurora/client/cli/json_tree_diff.py PRE-CREATION > src/test/python/apache/aurora/client/cli/BUILD > 3c88ed7cf9f654bbbd80d1d44aa1dd1c8655e378 > src/test/python/apache/aurora/client/cli/test_diff.py > 38629b63c082cf81cb891dace2a70d9e8f418e18 > src/test/python/apache/aurora/client/cli/test_json_diff.py PRE-CREATION > > Diff: https://reviews.apache.org/r/22457/diff/ > > > Testing > ------- > > New unit tests of the JSON tree diff code, plus a bunch of new "job diff" > tests of the new functionality. > All tests pass. > > > Thanks, > > Mark Chu-Carroll > >