The benefit of assuming a certain format is the richer experience we can give to our users. The *blob* may be too large to make any sense of it during diffing. I don't propose to enforce any schema though, that would be too restrictive. I do however believe assuming JSON format would be an acceptable tradeoff.
Alternatively, we may allow non-JSON (e.g. binary) executor data blobs and disabling JSON diffs for any executor types that don't follow the guidance. This will result in a degraded user experience but may be the middle ground here. Thoughts? On Tue, Sep 15, 2015 at 11:43 AM, <meghdoo...@yahoo.com.invalid> wrote: > Isn't this data supposed to be any blob that transparently passes in to the > executor through mesos data blob. Why would we want to impose any sort of > format? It could be a binary blob. Executor writers should be able to move > between different schedulers/frameworks without any rework ideally. This > field seems like more like garbage in and garbage out and only understood by > end client and the executor. Scheduler may stay out of it. > If you compute hash and indicate same or different data between 2 job update > diff, that may be reasonable. > > My 2 cents. > > Thx > > Sent from my iPhone > >> On Sep 15, 2015, at 11:01 AM, Zameer Manji <zma...@apache.org> wrote: >> >> I'm a proponent of firming up our executor <-> scheduler contract. Since we >> are going to get multiple executor support soon I think it would be nice if >> we said that ExecutorConfig.data was JSON. >> >> On Tue, Sep 15, 2015 at 10:47 AM, Maxim Khutornenko <ma...@apache.org> >> wrote: >> >>> | I hope this doesn't mean we would be returning a textual >>> representation of a diff >>> >>> If we can make an assumption that executor data is always JSON, we can >>> deliver a much more specific answer by applying JSON diff tools. >>> Something like: >>> >>> - "environment": "prod" >>> + "environment": "test" >>> >>> Otherwise, we would have to output the entire ExecutorConfig.data blob >>> content for both left and right sides and let users figure out the >>> problem. I don't think that's acceptable. >>> >>> Does it make sense? Any suggestions on the output format of the diff? >>> I think it should be structured but at the same time we have to get >>> down to text level at some point to report concrete discrepancies. >>> >>>> On Mon, Sep 14, 2015 at 8:58 PM, Bill Farner <wfar...@apache.org> wrote: >>>> The 'blob'-iness of ExecutorConfig is intentional so that we can support >>>> alternative executors. I'd hate for that to go away. >>>> >>>>> On Mon, Sep 14, 2015 at 8:56 PM, Jake Farrell <jfarr...@apache.org> >>>> wrote: >>>> >>>>> This is one of the hoops encountered when using the Thrift api directly >>> and >>>>> not using the client, I'd love to see ExecutorConfig.data move to a >>> thrift >>>>> object and not be a string blob >>>>> >>>>> -Jake >>>>> >>>>> On Mon, Sep 14, 2015 at 9:28 PM, Bill Farner <wfar...@apache.org> >>> wrote: >>>>> >>>>>> I like the idea of adding this API, but i don't see why it requires >>> us to >>>>>> make assumptions about ExecutorConfig.data. I hope this doesn't mean >>> we >>>>>> would be returning a textual representation of a diff. Can you >>> elaborate >>>>>> on that? >>>>>> >>>>>> On Mon, Sep 14, 2015 at 4:14 PM, Maxim Khutornenko <ma...@apache.org> >>>>>> wrote: >>>>>> >>>>>>> Problem: >>>>>>> We currently don't have a good user experience around "aurora job >>>>>>> diff" command. The task configs are dumped as "prettified" JSON >>>>>>> strings and diffed with the system diff tool. Anyone who tried to >>> use >>>>>>> it knows it can be very hard to read especially when it comes to >>>>>>> executor data deltas. Also, the implementation is done completely >>>>>>> within the Aurora client making it hard to reuse this feature by >>> other >>>>>>> clients (e.g.: an external deploy coordination tool). >>>>>>> >>>>>>> Proposal: >>>>>>> Move the diff logic to the scheduler and expose it via a new >>>>>>> jobConfigDiff thrift API. >>>>>>> >>>>>>> Benefits: >>>>>>> - Client will no longer have the custom non-reusable logic moving us >>>>>>> closer towards a "thin client" goal. >>>>>>> - The new RPC can be fully used by any existing or new API clients. >>>>>>> - The diff output will be improved via leveraging third party POJO >>>>>>> and/or JSON diff libraries [1,2,3, etc.]. >>>>>>> - The server updater can be partially/fully unified with the new >>> diff >>>>>>> logic further improving the overall DRY-ness. >>>>>>> >>>>>>> Concerns: >>>>>>> - The executor data is currently treated as an opaque string blob on >>>>>>> the scheduler side. In reality, it's almost guaranteed to be JSON. >>> In >>>>>>> order to deliver the best UX, the scheduler would have to start >>>>>>> requiring ExecutorConfig.data to be a valid JSON. >>>>>>> >>>>>>> Any other concerns/objections/comments? I would like to formalize >>> the >>>>>>> proposal be EOW if we reach consensus quickly. >>>>>>> >>>>>>> Thanks, >>>>>>> Maxim >>>>>>> >>>>>>> [1] - >>> http://java-object-diff.readthedocs.org/en/latest/getting-started/#getting-started >>>>>>> [2] - http://javers.org/documentation/diff-examples/ >>>>>>> [3] - https://github.com/skyscreamer/JSONassert >>> >>> -- >>> Zameer Manji >>> >>>