I'm not convinced by your argument about adding a read only RPC that's not
covered by the traditional integration test is going to cause bit rot. We
already have an RPC that is not covered by the traditional integration test
`pulseJobUpdate` and it hasn't bit rotted AFAIK.

Further some members of the community have been writing alternative clients
around our API and I think adding this RPC will better support their use
cases. `JobUpdate` is a first class struct in our API and I think it makes
sense to expose a query interface for it.

If integration tests are your primary concern, I can also add an
integration test for this.

Regarding adding `JobUpdateQuery` to `getJobUpdateDetails`, I'm no longer
convinced that it's the best way to go because of the risk it contains.
It's all to easy to craft a query that can pull a lot of data that can take
down the scheduler. In my case, I sometimes see it being useful for getting
all of the `JobUpdates` of a role (completed and active), but that risks
pulling a lot of unnecessary data.

I will also add that since `JobUpdate` is a first class struct in our
storage and APIs, this RPC contains very minimal code. I think it's highly
unlikely that it will bit rot.

On Fri, Sep 2, 2016 at 2:52 PM, Maxim Khutornenko <ma...@apache.org> wrote:

> As I mentioned in Slack, I am ok with the new RPC as long as there is
> a use for it elsewhere in the client or UI. Adding a read-only RPC
> that isn't going to be called by our traditional integration test
> clients sets a fertile ground for bit rot.
>
> I am actually warming up to your original proposal of adding
> JobUpdateQuery into the existing getJobUpdateDetails RPC. While it may
> be more expensive to pull multiple updates, we don't necessarily risk
> much after we migrated to MVStore on the H2 side. There are no table
> locks acquired and the only downside would be pulling events along
> with what you need. Provided the query is narrowly scoped, that should
> deliver acceptable performance.
>
> On Thu, Sep 1, 2016 at 2:24 PM, Zameer Manji <zma...@apache.org> wrote:
> > Hey,
> >
> > I've noticed a hole in our current API which makes it difficult to write
> > external clients and other tooling around fetching the state of updates.
> >
> > Currently, to fetch updates we are given two RPCs:
> > ````
> > /** Gets job update summaries. */
> > Response getJobUpdateSummaries(1: JobUpdateQuery jobUpdateQuery)
> >
> > /** Gets job update details. */
> > Response getJobUpdateDetails(1: JobUpdateKey key)
> >
> > ````
> >
> > The `getJobUpdateSummaries` RPC is not scoped to a single update and
> > returns a
> > set of `JobUpdateSummary` structs. The struct is defined:
> > ````
> > /** Summary of the job update including job key, user and current state.
> */
> > struct JobUpdateSummary {
> >   /** Unique identifier for the update. */
> >   5: JobUpdateKey key
> >
> >   /** User initiated an update. */
> >   3: string user
> >
> >   /** Current job update state. */
> >   4: JobUpdateState state
> > }
> > ````
> >
> > The `getJobUpdateDetails` RPC is scoped to a single update and returns
> the
> > following struct:
> >
> > ````
> > struct JobUpdateDetails {
> >   /** Update definition. */
> >   1: JobUpdate update
> >
> >   /** History for this update. */
> >   2: list<JobUpdateEvent> updateEvents
> >
> >   /** History for the individual instances updated. */
> >   3: list<JobInstanceUpdateEvent> instanceEvents
> > }
> >
> > ````
> >
> > Maxim mentioned to me yesterday that this RPC is scoped to a single
> update
> > because assembling the `instanceEvents` can be extremely expensive. A
> query
> > that
> > could span more than a single update risks taking down the scheduler in a
> > large
> > cluster.
> >
> >
> > The problem I discovered is that there is no batch API to get the
> > inexpensive
> > information inside the `JobUpdate` struct. For reference this struct
> > contains:
> >
> > ````
> > /** Full definition of the job update. */
> > struct JobUpdate {
> >   /** Update summary. */
> >   1: JobUpdateSummary summary
> >
> >   /** Update configuration. */
> >   2: JobUpdateInstructions instructions
> > }
> > ````
> >
> > Consumers are forced to make several `getJobUpdateDetails` calls to get
> > multiple
> > `JobUpdate` structs. Since the `JobUpdate` struct is not expensive to
> > assemble,
> > I'm proposing a new RPC that will allow consumers to get several
> `JobUpdate`
> > structs in a single call.
> >
> > ````
> > /** Gets job updates. */
> > Response getJobUpdates(1: JobUpdateQuery jobUpdateQuery)
> > ````
> >
> > If there are no objections, I will file tickets and put up a patch to
> > implement
> > this.
> >
> > --
> > Zameer Manji
>

Reply via email to