> On June 16, 2014, 5:45 p.m., Maxim Khutornenko wrote:
> > src/main/python/apache/aurora/client/cli/jobs.py, lines 181-182
> > <https://reviews.apache.org/r/22457/diff/3/?file=609592#file609592line181>
> >
> >     I don't think it's enough to json-serialize a thrift task. This is 
> > bound to set/dict serialization sorting problem we have battled with in 
> > updater.py. Specifically this block: 
> > https://github.com/apache/incubator-aurora/blob/master/src/main/python/apache/aurora/client/api/updater.py#L200-L223
> >     
> >     Without sorting, the diff will only work 99% of the time. This has been 
> > demonstrated in a few real-life job updates.
> 
> Mark Chu-Carroll wrote:
>     This isn't a new thing in this updated diff system - this is the current 
> behavior.
>     
>     If you use the new json-tree-diff, you won't have this problem; but for 
> people who rely on the current diff
>     behavior, I'd rather not change that.
>
> 
> Maxim Khutornenko wrote:
>     | If you use the new json-tree-diff, you won't have this problem; 
>     
>     Perhaps I am missing something but how would the new approach ensure 
> equivalence of these two structs?
>     { 'set1': { ['k1':'v1', 'k2':'v2']} }
>     { 'set2': { ['k2':'v2', 'k1':'v1']} }
>     
>     It's "normal" to see element re-ordering like this during thrift struct 
> serialization.
> 
> Maxim Khutornenko wrote:
>     Here is a real TaskConfig json fragment and its reordered twin:
>     
>     1: u'constraints': [{u'name': u'value', u'constraint': {u'values': [u'1', 
> u'2']}}, {u'name': u'limit', u'constraint': {u'limit': {u'limit': 10}}}]
>     2: u'constraints': [{u'constraint': {u'values': [u'2', u'1']}, u'name': 
> u'value'}, {u'name': u'limit', u'constraint': {u'limit': {u'limit': 10}}}]
> 
> Mark Chu-Carroll wrote:
>     I don't think that the reordering that you show in your first example 
> isn't valid json. Key-value pairs belong in a dictionary, not a list.  In a 
> dictionary, the comparison routine does the right thing: it compares key by 
> key, without regard to ordering.
>     
>     In the second example: I'd argue that that's actually a bug in our 
> serializer: it's using an ordered list for an unordered constraint. Diff 
> can't tell that the list structure isn't really supposed to be ordered; the 
> serializer should be canonicalizing it.
>
> 
> Maxim Khutornenko wrote:
>     Not sure how else you could represent a python set in JSON without 
> completely changing the underlying data structure (e.g. emitting fake keys 
> for every element and converting set into a dictionary with sorted keys). 
> Agree, the TJSONProtocol could be a bit smarter when dealing with python sets 
> but the reality is that python sets are inherently "unhashable" structs. I 
> have seen cases when completely repr-equivalent python sets were not 
> comparing as equal.

All the serializer would need to do is just canonicalize when it converts to a 
list. It's not an issue of being unhashable. But it should at least bother to 
be *uniform*. If you're converting a set to a list, chose a convention for 
ordering elements. Put 'em in lexicographic order, problem solved.

As it stands, that problem is unsolvable for this diff tool: there's no way to 
tell that a given field was supposed to be a set, and should be compared 
without ordering. 

I'm not sure what you'd like me to do here, short of giving up and throwing 
away the new diff tool. I don't see any way to solve this, short of creating 
something that uses thrift introspection.


- Mark


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22457/#review45835
-----------------------------------------------------------


On June 18, 2014, 10:59 a.m., Mark Chu-Carroll wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22457/
> -----------------------------------------------------------
> 
> (Updated June 18, 2014, 10:59 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin and Brian Wickman.
> 
> 
> Bugs: aurora-520
>     https://issues.apache.org/jira/browse/aurora-520
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Add a new diff method, which uses field-by-field comparison of JSON trees for 
> comparing running job configurations to potentially updated configs.
> 
> - Allow exclusion of semantically irrelevant fields.
> - Provide a clearer list of the differences between configs.
> - Provide a scripting-friendly alternative JSON syntax for diffs.
> 
> The old diff behavior is still available under the "--use-shell-diff" option.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/aurora/client/cli/BUILD 
> ebe681a0d1735b7cc695dc3b7a14c4292d87ae32 
>   src/main/python/apache/aurora/client/cli/jobs.py 
> 4fa03a6c9919651551238b0dc211ed69a8dfe565 
>   src/main/python/apache/aurora/client/cli/json_tree_diff.py PRE-CREATION 
>   src/test/python/apache/aurora/client/cli/BUILD 
> 3c88ed7cf9f654bbbd80d1d44aa1dd1c8655e378 
>   src/test/python/apache/aurora/client/cli/test_diff.py 
> 38629b63c082cf81cb891dace2a70d9e8f418e18 
>   src/test/python/apache/aurora/client/cli/test_json_diff.py PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/22457/diff/
> 
> 
> Testing
> -------
> 
> New unit tests of the JSON tree diff code, plus a bunch of new "job diff" 
> tests of the new functionality.
> All tests pass.
> 
> 
> Thanks,
> 
> Mark Chu-Carroll
> 
>

Reply via email to