Thank you,
Vlad
On 2/3/18 09:13, Pramod Immaneni wrote:
I too agree that the discussion has veered off from the original topic.
Why can't LIBRARY_JARS be used for this, albeit with a minor improvement?
Currently, our attribute layering is an override, so if you have an
attribute that is specified as apex.application.<appname>.attr.<attrname>,
it overrides apex.attr.<attrname> for that application. What if we were to
expand the attribute definition to allow for the specification of how the
layering of attributes is combined, override being one option and merge
being another, with these implemented via a combiner interface? This way
a set of common jars could be specified using dt.attr.LIBRARY_JARS and
applications could still add extra jars on top.
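
For illustration, such a combiner interface might look roughly like this
(the names below are hypothetical, not existing Apex API):

    /** Hypothetical policy for combining a layered attribute value. */
    public interface AttributeCombiner<T> {
      /** Combine the application-level value with the platform-level value. */
      T combine(T applicationValue, T platformValue);
    }

    /** Override: the application-level value wins (today's behavior). */
    class OverrideCombiner<T> implements AttributeCombiner<T> {
      public T combine(T applicationValue, T platformValue) {
        return applicationValue != null ? applicationValue : platformValue;
      }
    }

    /** Merge: concatenate comma-separated jar lists, e.g. for LIBRARY_JARS. */
    class MergeJarListCombiner implements AttributeCombiner<String> {
      public String combine(String applicationValue, String platformValue) {
        if (platformValue == null) return applicationValue;
        if (applicationValue == null) return platformValue;
        return platformValue + "," + applicationValue;
      }
    }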
On Fri, Feb 2, 2018 at 6:32 PM, Vlad Rozov <vro...@apache.org> wrote:
IMO, support for Kubernetes, Docker images, Mesos and anything outside of
YARN deployments is a topic by itself, and the design for such support
needs to be discussed. I do not want to propose any specific design, but
I assume that the logic to create a proper execution environment would be
coded into the Apex client. Whether it (hardcoded logic to create an
execution environment) can be expressed simply as a list of dependent
classes or jars is at minimum questionable. Until a design is proposed
and agreed upon, I'd prefer to use plugins for the subject.
Thank you,
Vlad
On 2/2/18 13:17, Sanjay Pujare wrote:
In cases where we have an "über" docker image containing support for
multiple execution environments, it might be useful for the Apex core to
infer what kind of execution environment to use for a particular
invocation (say, based on configuration values or environment variables)
and, in that case, load the corresponding libraries. I think this kind of
flexibility or support would be difficult to achieve through plugins,
hence I think Sergey's proposal will be useful.
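
A rough sketch of that kind of inference (the environment variable and
type names here are made up for illustration; Apex has no such API):

    enum ExecutionEnvironment { YARN, KUBERNETES, MESOS }

    final class EnvironmentDetector {
      /** Infer the target environment from a hypothetical env var; default to YARN. */
      static ExecutionEnvironment detect() {
        String env = System.getenv("APEX_EXEC_ENV"); // hypothetical variable
        if ("kubernetes".equalsIgnoreCase(env)) return ExecutionEnvironment.KUBERNETES;
        if ("mesos".equalsIgnoreCase(env)) return ExecutionEnvironment.MESOS;
        return ExecutionEnvironment.YARN;
      }
    }

The core would then load only the libraries registered for the detected
environment.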
Sanjay
On Fri, Feb 2, 2018 at 11:18 AM, Sergey Golovko <ser...@datatorrent.com> wrote:
Unfortunately, moving the .apa file to a docker image cannot resolve all
problems with the dependencies. If we assume an Apex application should
be run in different execution environments, the application docker image
must contain all possible execution environment dependencies.

I think the better way is to assume that the original application docker
image, like the current .apa file, should contain the application-specific
dependencies only, and that some smart client tool should create the
executable application docker image from the original one, including the
execution-specific environment dependencies in the target application
docker image.

It means that, in any case, a smart Apex client tool should have an
interface to define the different environment dependencies, or
combinations of different dimensions of the environment dependencies.
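
One possible shape for such an interface (purely illustrative; none of
these names exist in Apex today):

    import java.util.List;

    /** Supplies the extra jars a given execution environment requires. */
    public interface EnvironmentDependencyProvider {
      /** Name of the environment this provider serves, e.g. "kubernetes". */
      String environmentName();
      /** Jar paths to bake into the target application image. */
      List<String> dependencyJars();
    }

The client building the target image would consult the provider(s)
matching the chosen environment and copy the listed jars in.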
Thanks,
Sergey
On Fri, Feb 2, 2018 at 10:23 AM, Thomas Weise <t...@apache.org> wrote:
The current dependencies are based on how the Apex YARN client works.
YARN depends on a DFS implementation for deployment (not necessarily
HDFS).

I think a better way to look at this is to consider that instead of an
.apa file the application is a docker image, which would contain Apex and
all the dependencies that the "StramClient" today adds for YARN. In that
world there would be no Apex CLI or Apex-specific client.
Thomas
On Thu, Feb 1, 2018 at 5:57 PM, Sergey Golovko <ser...@datatorrent.com> wrote:
I agree, it can be implemented with the usage of plugins. But if I need
to enable and configure the plugin, I need to put this information into
dt-site.xml. That means the plugin and its parameters must be documented,
and the list of the added specific jars will be visible and available for
updates to the end-user. The implementation via plugins is a more dynamic
solution that is more convenient for the application developers. But I'm
talking about the static configuration of the Apex build or installation,
which relates more to platform development.
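
For instance, enabling such a plugin would mean documenting something
like the following in dt-site.xml (the property names and values here are
made up for illustration):

    <property>
      <name>apex.plugin.jar-injector.classname</name>
      <value>com.example.JarInjectorPlugin</value>
    </property>
    <property>
      <name>apex.plugin.jar-injector.jars</name>
      <value>/opt/libs/cloud-fs.jar,/opt/libs/service-client.jar</value>
    </property>

Everything in it is then visible to, and editable by, the end-user, which
is exactly what I'd like to avoid for platform-level jars.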
The current Apex core implementation has used a static, unchanged list of
jars for a long time, because the implementation still contains several
basic static assumptions (for instance, the usage of YARN, HDFS, etc.),
and these assumptions are hardcoded in the implementation. But if we are
going to improve Apex and use Java interfaces in the generic Apex
implementation, the current static approach of hardcoding a list of
dependent jars in the Apex code will not work anymore. It will require a
new solution to add/change jars in specific Apex builds/configurations.
And I don't think the usage of plugins will be good for that.
Thanks,
Sergey
On Thu, Feb 1, 2018 at 1:47 PM, Vlad Rozov <vro...@apache.org> wrote:
There is a way to get the same end result by using plugins. It would be
good to understand why plugins can't be used, and whether they can be
extended to provide the required functionality.
Thank you,
Vlad
On 1/29/18 15:14, Sergey Golovko wrote:
Hello All,
In Apex there are two ways to deploy non-Hadoop jars to the deployed
cluster.

The first approach is static (hardcoded), and it is used by Apex platform
developers only. There are several final static arrays of Java classes in
StramClient.java that define which of the available jars should be
included in the deployment for every Apex application.
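
(Simplified illustration of that static approach, not the actual
StramClient source:

    // each listed class marks a jar that is shipped with every application
    private static final Class<?>[] PLATFORM_CLASSES = {
      org.apache.hadoop.fs.FileSystem.class, // Hadoop/DFS client jar
      com.esotericsoftware.kryo.Kryo.class   // serialization jar
    };

at deployment time, the jar containing each listed class is located and
added.)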
The second approach is to add the paths of all dependent jar-files to the
value of the attribute LIB_JARS. The end-user can set/update the value of
the attribute LIB_JARS via dt-site.xml files, command line parameters,
application properties and plugins. The usage of the attribute LIB_JARS
is the official, documented way for all Apex users to manage the
deployment jars.
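
For example, in dt-site.xml (the property name follows the dt.attr.<attrname>
pattern; the jar paths are placeholders):

    <property>
      <name>dt.attr.LIB_JARS</name>
      <value>/path/to/first.jar,/path/to/second.jar</value>
    </property>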
But some of the dependent jars (not from the Apex core) can be common to
all of a customer's applications for a specific installation and/or
execution environment. Unfortunately, the Apex implementation does not
contain a middle solution that would allow the Apex developers and
customer support to define and add new dependent jar-files (jars that
should not be configurable/managed by the end-user) without
updates/recompilation of the Apex Java code during the Apex build process
and/or installation/configuration.
Also, having this kind of flexibility would allow the Apex core
developers to use Java interfaces during development to define an
abstraction layer in the Apex implementation, and to configure the Apex
core to add some specific jars to all Apex applications without
recompilation of the Apex source code.
For instance, the usage of HDFS is now hardcoded in the Apex platform
code, but it could be replaced with any other distributed or cloud-based
file system. The Apex core code could use an interface for all I/O
operations, while the support for a real, specific file system
implementation could be added as an independent jar-file. Or the
implementation of some Apex operators may depend on a specific service,
and it may be necessary to add some of the service jars to every Apex
application implicitly.
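
As a sketch, such an interface could be as small as (illustrative only):

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;

    /** Minimal file-system abstraction the core could code against. */
    public interface DistributedStorage {
      InputStream open(String path) throws IOException;
      OutputStream create(String path) throws IOException;
    }

with the HDFS (or cloud) implementation shipped as its own jar and pulled
in at deployment time through the mechanism proposed below.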
The proposal:

- add a predefined configuration text file (we can make any choice for
the file syntax: XML, JSON or Properties) to the Apex engine resources,
with predefined values of some of the Apex attributes (for now we can
include the LIB_JARS attribute only);
- allow a configuration text file with the same functionality in the Apex
installation folder "conf";
- read the content of the predefined configuration text files in the
stram client at runtime and add the jars to the list of dependent jars;
- allow both paths to jars and Java classes to be used to refer to the
dependent jars (the references can have the extensions .class and .jar);
see the example after this list.
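
For illustration, with the XML syntax such a file could look like this
(the attribute name and values are placeholders, one possible choice for
the proposed format):

    <configuration>
      <property>
        <name>apex.attr.LIB_JARS</name>
        <value>/opt/apex/ext/cloud-fs.jar,com.example.ServiceClient.class</value>
      </property>
    </configuration>

A .jar reference would be added as-is; a .class reference would be
resolved to the jar that contains that class, the same way StramClient
resolves its hardcoded class list today.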
Thanks,
Sergey