Maxim and I discussed offline after the Aurora dev sync today and we think
the best way forward is improving the `getJobUpdateDetails` RPC to take a
`JobUpdateQuery`.

I'll be filing a ticket and moving forward on this.

On Fri, Sep 2, 2016 at 3:36 PM, Maxim Khutornenko <ma...@apache.org> wrote:

> The pulseJobUpdate RPC was added as an entirely new feature. This
> proposal suggests a read-only "convenience" method to pull data that
> is already available via existing means.
>
> As for "taking down the scheduler", this argument is moot. It's
> possible to DDOS scheduler via existing RPCs today and the suggested
> alternative has the same inherent risk of crafting an unscoped query
> that will pull all updates from the store. Also, unless there are a
> LOT of those queries (read dos-attack that OOMs scheduler) it's
> unlikely that the scheduler will sustain any damage as read-only
> queries don't acquire any global or table locks.
>
> On Fri, Sep 2, 2016 at 3:24 PM, Zameer Manji <zma...@apache.org> wrote:
> > I'm not convinced by your argument about adding a read only RPC that's
> not
> > covered by the traditional integration test is going to cause bit rot. We
> > already have an RPC that is not covered by the traditional integration
> test
> > `pulseJobUpdate` and it hasn't bit rotted AFAIK.
> >
> > Further some members of the community have been writing alternative
> clients
> > around our API and I think adding this RPC will better support their use
> > cases. `JobUpdate` is a first class struct in our API and I think it
> makes
> > sense to expose a query interface for it.
> >
> > If integration tests are your primary concern, I can also add an
> > integration test for this.
> >
> > Regarding adding `JobUpdateQuery` to `getJobUpdateDetails`, I'm no longer
> > convinced that it's the best way to go because of the risk it contains.
> > It's all to easy to craft a query that can pull a lot of data that can
> take
> > down the scheduler. In my case, I sometimes see it being useful for
> getting
> > all of the `JobUpdates` of a role (completed and active), but that risks
> > pulling a lot of unnecessary data.
> >
> > I will also add that since `JobUpdate` is a first class struct in our
> > storage and APIs, this RPC contains very minimal code. I think it's
> highly
> > unlikely that it will bit rot.
> >
> > On Fri, Sep 2, 2016 at 2:52 PM, Maxim Khutornenko <ma...@apache.org>
> wrote:
> >
> >> As I mentioned in Slack, I am ok with the new RPC as long as there is
> >> a use for it elsewhere in the client or UI. Adding a read-only RPC
> >> that isn't going to be called by our traditional integration test
> >> clients sets a fertile ground for bit rot.
> >>
> >> I am actually warming up to your original proposal of adding
> >> JobUpdateQuery into the existing getJobUpdateDetails RPC. While it may
> >> be more expensive to pull multiple updates, we don't necessarily risk
> >> much after we migrated to MVStore on the H2 side. There are no table
> >> locks acquired and the only downside would be pulling events along
> >> with what you need. Provided the query is narrowly scoped, that should
> >> deliver acceptable performance.
> >>
> >> On Thu, Sep 1, 2016 at 2:24 PM, Zameer Manji <zma...@apache.org> wrote:
> >> > Hey,
> >> >
> >> > I've noticed a hole in our current API which makes it difficult to
> write
> >> > external clients and other tooling around fetching the state of
> updates.
> >> >
> >> > Currently, to fetch updates we are given two RPCs:
> >> > ````
> >> > /** Gets job update summaries. */
> >> > Response getJobUpdateSummaries(1: JobUpdateQuery jobUpdateQuery)
> >> >
> >> > /** Gets job update details. */
> >> > Response getJobUpdateDetails(1: JobUpdateKey key)
> >> >
> >> > ````
> >> >
> >> > The `getJobUpdateSummaries` RPC is not scoped to a single update and
> >> > returns a
> >> > set of `JobUpdateSummary` structs. The struct is defined:
> >> > ````
> >> > /** Summary of the job update including job key, user and current
> state.
> >> */
> >> > struct JobUpdateSummary {
> >> >   /** Unique identifier for the update. */
> >> >   5: JobUpdateKey key
> >> >
> >> >   /** User initiated an update. */
> >> >   3: string user
> >> >
> >> >   /** Current job update state. */
> >> >   4: JobUpdateState state
> >> > }
> >> > ````
> >> >
> >> > The `getJobUpdateDetails` RPC is scoped to a single update and returns
> >> the
> >> > following struct:
> >> >
> >> > ````
> >> > struct JobUpdateDetails {
> >> >   /** Update definition. */
> >> >   1: JobUpdate update
> >> >
> >> >   /** History for this update. */
> >> >   2: list<JobUpdateEvent> updateEvents
> >> >
> >> >   /** History for the individual instances updated. */
> >> >   3: list<JobInstanceUpdateEvent> instanceEvents
> >> > }
> >> >
> >> > ````
> >> >
> >> > Maxim mentioned to me yesterday that this RPC is scoped to a single
> >> update
> >> > because assembling the `instanceEvents` can be extremely expensive. A
> >> query
> >> > that
> >> > could span more than a single update risks taking down the scheduler
> in a
> >> > large
> >> > cluster.
> >> >
> >> >
> >> > The problem I discovered is that there is no batch API to get the
> >> > inexpensive
> >> > information inside the `JobUpdate` struct. For reference this struct
> >> > contains:
> >> >
> >> > ````
> >> > /** Full definition of the job update. */
> >> > struct JobUpdate {
> >> >   /** Update summary. */
> >> >   1: JobUpdateSummary summary
> >> >
> >> >   /** Update configuration. */
> >> >   2: JobUpdateInstructions instructions
> >> > }
> >> > ````
> >> >
> >> > Consumers are forced to make several `getJobUpdateDetails` calls to
> get
> >> > multiple
> >> > `JobUpdate` structs. Since the `JobUpdate` struct is not expensive to
> >> > assemble,
> >> > I'm proposing a new RPC that will allow consumers to get several
> >> `JobUpdate`
> >> > structs in a single call.
> >> >
> >> > ````
> >> > /** Gets job updates. */
> >> > Response getJobUpdates(1: JobUpdateQuery jobUpdateQuery)
> >> > ````
> >> >
> >> > If there are no objections, I will file tickets and put up a patch to
> >> > implement
> >> > this.
> >> >
> >> > --
> >> > Zameer Manji
> >>
>

Reply via email to