I still have a concern about introducing an assumption that populateDAG is used for anything other than constructing an application DAG. The only use case provided does not justify changing a primary API (even though it is changed without breaking semantic versioning). I would prefer an alternate solution, so -0.5.

Thank you,

Vlad

On 12/22/17 11:50, Pramod Immaneni wrote:
On Fri, Dec 22, 2017 at 8:19 AM, Vlad Rozov <vro...@apache.org> wrote:

I don't see more complexity in implementing a plugin compared to
implementing an application. Additionally, for the use case you mention, a
plugin is a better option, as the behavior likely applies not to a single
application but to all applications in that environment.

It would be developing a plugin in addition to an application as opposed to
doing something directly in the application. Also what they may want to do
may not be general enough or have enough reuse to justify developing a
plugin. Our users typically build applications and not plugins so I would
say most who have this need would not build a plugin but would do this
directly in the application.


The websocket gateway address is a configuration parameter. A DAG may change
depending on configuration parameters (presence of a gateway, hadoop
version/vendor, security being enabled or disabled), but it should not
change depending on whether the DAG is populated for a launch or to get info.

The configuration does not prompt the user to return a different DAG, and
even with plugins some kind of configuration hint is needed. There is no
formal definition of what the method should and shouldn't do, and an attempt
to define the method to only construct a DAG and not do anything else is not
only retrospective but restrictive. For example, should I not be able to
connect to a syslog server and log something? What you are saying about the
behavior w.r.t. the population of the DAG with varying properties and the
like is good practice. Like I mentioned earlier, users have always had full
control of what they want to do in the populateDAG method and what DAG they
want to return, and the platform does not particularly care what DAG is
returned. It neither enforces nor relies on the DAGs returned by multiple
calls to populateDAG being the same DAG.


Thank you,

Vlad


On 12/21/17 10:05, Pramod Immaneni wrote:

Asking users to create plugins for something they want to do in their
application logic is to do things in an indirect and cumbersome way with an
added level of complexity. I don't think users will elect to do that. There
is a reason populateDAG and the operators give users the flexibility they
do today to have any custom logic they want. populateDAG isn't only for
returning a constant DAG for an application. The configuration that is
passed today to populateDAG, apart from hadoop environmental properties
that could be considered constant, also includes a variable component, which
is the user-customizable configuration properties. There are already
examples of applications that have used these properties to do something
different. Apart from the properties, some attributes are also injected
into the DAG in a deliberate fashion by the platform to provide users with
these so they can create the DAG accordingly. One example is a websocket
gateway address. If this is present, applications create a websocket output
operator; otherwise they end up creating a console or some other output
operator.
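
The gateway-address example above can be sketched as follows. This is only an illustration under stated assumptions, not the platform's API: a plain Map stands in for the attributes injected into the DAG, the key name GATEWAY_ADDRESS_KEY is hypothetical, and the returned strings stand in for real operator classes.

```java
import java.util.HashMap;
import java.util.Map;

class OutputChoiceSketch {
    // Hypothetical attribute key; the thread does not give the real name.
    static final String GATEWAY_ADDRESS_KEY = "example.gateway.address";

    // Picks an output operator based on whether a gateway address is present.
    // The returned names are placeholders for actual operator classes.
    static String chooseOutputOperator(Map<String, String> attributes) {
        String address = attributes.get(GATEWAY_ADDRESS_KEY);
        if (address != null && !address.isEmpty()) {
            return "WebSocketOutput";
        }
        return "ConsoleOutput";
    }

    public static void main(String[] args) {
        Map<String, String> attributes = new HashMap<>();
        System.out.println(chooseOutputOperator(attributes));
        attributes.put(GATEWAY_ADDRESS_KEY, "localhost:9090");
        System.out.println(chooseOutputOperator(attributes));
    }
}
```

The choice depends only on what the platform passed in, which is the kind of configuration-driven variation both sides of this thread accept.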

On Thu, Dec 21, 2017 at 8:27 AM, Vlad Rozov <vro...@apache.org> wrote:

"Sometimes" is not a use case. Config is not a context.
Without concrete use cases the proposed change is not well justified.
populateDAG() is supposed to populate DAG, not to record anything in an
external system. It was a design goal for plugins.

Thank you,

Vlad


On 12/20/17 02:23, Priyanka Gugale wrote:

+1
Sometimes this context is required. We shouldn't change any default
behaviour other than making this config available.

-Priyanka



On Wed, Dec 20, 2017 at 5:32 AM, Pramod Immaneni <pra...@datatorrent.com> wrote:

The external system recording was just an example, not a specific use
case. The idea is to provide comprehensive information to populateDAG as to
the context it is being called under. It is akin to the test mode or
simulate flag that you see with various utilities. The platform cannot
control what populateDAG does; even without this information, across the
multiple calls that you mention the application can return different DAGs
depending on any external factor such as time of day or some external
variable. This is merely to provide more context information in the config.
It is up to the application to do what it wishes with it.

On Tue, Dec 19, 2017 at 2:28 PM, Vlad Rozov <vro...@apache.org> wrote:

-0.5: populateDAG() may be called by the platform as many times as it
needs (even in case it calls it only once now to launch an application).
Passing different parameters to populateDAG() in simulate launch mode and
actual launch may lead to different DAG being constructed for those two
modes. Can't the use case you described be handled by a plugin?

Thank you,

Vlad


On 12/19/17 10:06, Sanjay Pujare wrote:

+1, although I prefer something that is more enforceable. So I like the
idea of another method, but that introduces incompatibility, so maybe in
4.0?

On Tue, Dec 19, 2017 at 9:40 AM, Munagala Ramanath <amberar...@yahoo.com.invalid> wrote:

     +1

Ram
On Tuesday, December 19, 2017, 8:33:21 AM PST, Pramod Immaneni <pra...@datatorrent.com> wrote:

I have a mini proposal. The command get-app-package-info runs the
populateDAG method of an application to construct the DAG but does not
actually launch the DAG. An application developer does not know in which
context populateDAG is being called. For example, if they are recording
application starts in an external system from populateDAG, they will have
false entries there. This can be solved in different ways, such as
introducing another method in StreamingApplication or more parameters to
populateDAG, but a non-disruptive option would be to add a property in the
configuration object that is passed to populateDAG to indicate whether it is
simulate/test mode or real launch. An application developer can use this
property to take the appropriate actions.
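
A minimal sketch of how an application might use such a property, under loud assumptions: the property name SIMULATE_KEY is hypothetical (no name is given here), java.util.Properties stands in for the configuration object, a list of operator names stands in for the DAG, and an in-memory list stands in for the external system.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

class LaunchModeSketch {
    // Hypothetical property name; not defined anywhere in this thread.
    static final String SIMULATE_KEY = "example.launch.simulate";

    // Stand-in for the external system that records application starts.
    static final List<String> externalLog = new ArrayList<>();

    // Stand-in for populateDAG: builds the same "DAG" in both modes and
    // only skips the external side effect when the simulate flag is set.
    static List<String> populateDag(Properties conf) {
        List<String> operators = new ArrayList<>();
        operators.add("input");
        operators.add("output");

        boolean simulate =
            Boolean.parseBoolean(conf.getProperty(SIMULATE_KEY, "false"));
        if (!simulate) {
            externalLog.add("application started");
        }
        return operators;
    }

    public static void main(String[] args) {
        Properties simulated = new Properties();
        simulated.setProperty(SIMULATE_KEY, "true");
        populateDag(simulated);          // no entry recorded
        populateDag(new Properties());   // one entry recorded
        System.out.println(externalLog.size());
    }
}
```

In this sketch the flag gates only the side effect, so the DAG returned is identical in simulate mode and in a real launch.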

Thanks




