Re: JobConfig diff API

Maxim Khutornenko Wed, 16 Sep 2015 10:23:48 -0700

The benefit of assuming a certain format is the richer experience we
can give to our users. The *blob* may be too large to make any sense
of it during diffing. I don't propose to enforce any schema though,
that would be too restrictive. I do however believe assuming JSON
format would be an acceptable tradeoff.


Alternatively, we may allow non-JSON (e.g. binary) executor data blobs
and disabling JSON diffs for any executor types that don't follow the
guidance. This will result in a degraded user experience but may be
the middle ground here. Thoughts?

On Tue, Sep 15, 2015 at 11:43 AM,  <meghdoo...@yahoo.com.invalid> wrote:
> Isn't this data supposed to be any blob that transparently passes in to the 
> executor through mesos data blob. Why would we want to impose any sort of 
> format? It could be a binary blob. Executor writers should be able to move 
> between different schedulers/frameworks without any rework ideally. This 
> field seems like more like garbage in and garbage out and only understood by 
> end client and the executor. Scheduler may stay out of it.
> If you compute hash and indicate same or different data between 2 job update 
> diff, that may be reasonable.
>
> My 2 cents.
>
> Thx
>
> Sent from my iPhone
>
>> On Sep 15, 2015, at 11:01 AM, Zameer Manji <zma...@apache.org> wrote:
>>
>> I'm a proponent of firming up our executor <-> scheduler contract. Since we
>> are going to get multiple executor support soon I think it would be nice if
>> we said that ExecutorConfig.data was JSON.
>>
>> On Tue, Sep 15, 2015 at 10:47 AM, Maxim Khutornenko <ma...@apache.org>
>> wrote:
>>
>>> | I hope this doesn't mean we would be returning a textual
>>> representation of a diff
>>>
>>> If we can make an assumption that executor data is always JSON, we can
>>> deliver a much more specific answer by applying JSON diff tools.
>>> Something like:
>>>
>>> - "environment": "prod"
>>> + "environment": "test"
>>>
>>> Otherwise, we would have to output the entire ExecutorConfig.data blob
>>> content for both left and right sides and let users figure out the
>>> problem. I don't think that's acceptable.
>>>
>>> Does it make sense? Any suggestions on the output format of the diff?
>>> I think it should be structured but at the same time we have to get
>>> down to text level at some point to report concrete discrepancies.
>>>
>>>> On Mon, Sep 14, 2015 at 8:58 PM, Bill Farner <wfar...@apache.org> wrote:
>>>> The 'blob'-iness of ExecutorConfig is intentional so that we can support
>>>> alternative executors.  I'd hate for that to go away.
>>>>
>>>>> On Mon, Sep 14, 2015 at 8:56 PM, Jake Farrell <jfarr...@apache.org>
>>>> wrote:
>>>>
>>>>> This is one of the hoops encountered when using the Thrift api directly
>>> and
>>>>> not using the client, I'd love to see ExecutorConfig.data move to a
>>> thrift
>>>>> object and not be a string blob
>>>>>
>>>>> -Jake
>>>>>
>>>>> On Mon, Sep 14, 2015 at 9:28 PM, Bill Farner <wfar...@apache.org>
>>> wrote:
>>>>>
>>>>>> I like the idea of adding this API, but i don't see why it requires
>>> us to
>>>>>> make assumptions about ExecutorConfig.data.  I hope this doesn't mean
>>> we
>>>>>> would be returning a textual representation of a diff.  Can you
>>> elaborate
>>>>>> on that?
>>>>>>
>>>>>> On Mon, Sep 14, 2015 at 4:14 PM, Maxim Khutornenko <ma...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Problem:
>>>>>>> We currently don't have a good user experience around "aurora job
>>>>>>> diff" command. The task configs are dumped as "prettified" JSON
>>>>>>> strings and diffed with the system diff tool. Anyone who tried to
>>> use
>>>>>>> it knows it can be very hard to read especially when it comes to
>>>>>>> executor data deltas. Also, the implementation is done completely
>>>>>>> within the Aurora client making it hard to reuse this feature by
>>> other
>>>>>>> clients (e.g.: an external deploy coordination tool).
>>>>>>>
>>>>>>> Proposal:
>>>>>>> Move the diff logic to the scheduler and expose it via a new
>>>>>>> jobConfigDiff thrift API.
>>>>>>>
>>>>>>> Benefits:
>>>>>>> - Client will no longer have the custom non-reusable logic moving us
>>>>>>> closer towards a "thin client" goal.
>>>>>>> - The new RPC can be fully used by any existing or new API clients.
>>>>>>> - The diff output will be improved via leveraging third party POJO
>>>>>>> and/or JSON diff libraries [1,2,3, etc.].
>>>>>>> - The server updater can be partially/fully unified with the new
>>> diff
>>>>>>> logic further improving the overall DRY-ness.
>>>>>>>
>>>>>>> Concerns:
>>>>>>> - The executor data is currently treated as an opaque string blob on
>>>>>>> the scheduler side. In reality, it's almost guaranteed to be JSON.
>>> In
>>>>>>> order to deliver the best UX, the scheduler would have to start
>>>>>>> requiring ExecutorConfig.data to be a valid JSON.
>>>>>>>
>>>>>>> Any other concerns/objections/comments? I would like to formalize
>>> the
>>>>>>> proposal be EOW if we reach consensus quickly.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Maxim
>>>>>>>
>>>>>>> [1] -
>>> http://java-object-diff.readthedocs.org/en/latest/getting-started/#getting-started
>>>>>>> [2] - http://javers.org/documentation/diff-examples/
>>>>>>> [3] - https://github.com/skyscreamer/JSONassert
>>>
>>> --
>>> Zameer Manji
>>>
>>>

Re: JobConfig diff API

Reply via email to