Hi Stephan!

Thanks for checking this out. I agree that wrapping the other
PipelineExecutors with a delegating AtlasExecutor would be a good
alternative approach to implement this, but I actually feel that it suffers
from even more problems than exposing the Pipeline instance in the
JobListener.

The main idea with the Atlas integration would be to have the Atlas hook
logic live in the Atlas project, where it would be maintained. This means
that any approach we take has to rely on public APIs. The JobListener is
already a public, evolving API, while the PipelineExecutor and its factory
are purely internal. Even if we made them public, they would still expose
the Pipeline, so we would not gain much on the public/internal API front.

I also feel that, since the Atlas hook logic should only observe the
pipeline and collect information, the JobListener interface is an ideal
match and the implementation can be pretty lightweight. From a pure
implementation perspective, adding an Executor would be heavier, as it has
to properly delegate to another executor while making sure that we don't
break anything.
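
To make the comparison concrete, here is a rough sketch of what the hook
could look like against the 1.10 JobListener interface (illustrative only;
the class name and method bodies are hypothetical, and the open question in
the proposal is exactly how the Pipeline gets handed to the listener):

import org.apache.flink.api.common.JobExecutionResult;
import org.apache.flink.core.execution.JobClient;
import org.apache.flink.core.execution.JobListener;

// Sketch: a purely observational hook; nothing is delegated, so it cannot
// break job submission or execution.
public class AtlasJobListener implements JobListener {

    @Override
    public void onJobSubmitted(JobClient jobClient, Throwable throwable) {
        // Under the proposal this callback (or an overload of it) would
        // also receive the Pipeline, so the hook could walk the graph here
        // and emit the lineage entities to Atlas.
    }

    @Override
    public void onJobExecuted(JobExecutionResult result, Throwable throwable) {
        // Record the job's completion status in Atlas.
    }
}

The listener is registered on the environment with
env.registerJobListener(...), so there is no extra wiring beyond that.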

Don't get me wrong, I am not opposed to reworking the implementation we
have, as it's very simple at this point, but I also want to make sure that
we take the approach that is simplest from a maintainability standpoint. Of
course, my argument rests on the assumption that the AtlasHook itself will
live outside of the Flink project, but that's another question.
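
For comparison, the delegating executor would look roughly like this
against the 1.10 PipelineExecutor interface (again just a sketch; note that
PipelineExecutor is annotated @Internal today):

import java.util.concurrent.CompletableFuture;

import org.apache.flink.api.dag.Pipeline;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.core.execution.JobClient;
import org.apache.flink.core.execution.PipelineExecutor;

// Sketch: observes (or potentially rewrites) the pipeline, then hands it
// off to the "proper" executor it wraps.
public class AtlasExecutor implements PipelineExecutor {

    private final PipelineExecutor delegate;

    public AtlasExecutor(PipelineExecutor delegate) {
        this.delegate = delegate;
    }

    @Override
    public CompletableFuture<JobClient> execute(Pipeline pipeline,
            Configuration configuration) throws Exception {
        // The Atlas hook logic would inspect `pipeline` here before
        // delegating, taking care not to change submission semantics.
        return delegate.execute(pipeline, configuration);
    }
}

On top of the class itself we would also need a PipelineExecutorFactory
registered through the service loader so that execution.target resolves to
the wrapper, which is exactly the extra surface I would like to avoid.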

Cheers,
Gyula

On Thu, Mar 12, 2020 at 11:34 AM Stephan Ewen <se...@apache.org> wrote:

> Hi all!
>
> In general, nice idea to support this integration with Atlas.
>
> I think we could make this a bit easier and more lightweight with a small
> change. One of the issues that is not super nice is that this starts
> exposing the (currently empty) Pipeline interface in the public API.
> The Pipeline is an SPI interface that would be good to hide from the API.
>
> Since 1.10, Flink has the notion of Executors, which take the pipeline and
> execute it. Meaning each pipeline is passed to an executor anyway. And
> executors are already configurable in the Flink configuration.
> So, instead of passing the pipeline both "down" (to the executor) and "to
> the side" (JobListener), could we just have a wrapping "AtlasExecutor" that
> takes the pipeline, does whatever it wants, and then passes it to the
> proper executor? This would also have the advantage that it supports making
> changes to the pipeline, if needed in the future. For example, if there is
> ever the need to add additional configuration fields, set properties, or
> add "labels", this could easily be done in the suggested approach.
>
> I tried to sketch this in the picture below, pardon my bad drawing.
>
> [image: Listener_Executor.png]
>
>
> https://xjcrkw.bn.files.1drv.com/y4pWH57aEvLU5Ww4REC9XLi7nJMLGHq2smPSzaslU8ogywFDcMkP-_Rsl8B1njf4qphodim6bgnLTNFwNoEuwFdTuA2Xmf7CJ_8lTJjrKlFlDwrugVeBQzEhAY7n_5j2bumwDBf29jn_tZ1ueZxe2slhLkPC-9K6Dry_vpvRvZRY-CSnQXxj9jDf7P53Vz1K9Ez/Listener_Executor.png?psid=1
>
>
> Best,
> Stephan
>
>
> On Wed, Mar 11, 2020 at 11:41 AM Aljoscha Krettek <aljos...@apache.org>
> wrote:
>
>> Thanks! I'm reading the document now and will get back to you.
>>
>> Best,
>> Aljoscha
>>
>
