Thank you,
Vlad
On 2/3/18 09:13, Pramod Immaneni wrote:
I too agree that the discussion has veered off from the original topic.
Why can't LIBRARY_JARS be used for this, albeit with a minor improvement?
Currently, our attribute layering is an override, so if you have an
attribute that is specified as apex.application.<appname>.attr.<attrname>,
it overrides apex.attr.<attrname> for that application. What if we were to
expand the attribute definition to allow for the specification of how the
layering of attributes is combined, override being one option and merge
being another, with these implemented via a combiner interface? This way
a set of common jars could be specified using dt.attr.LIBRARY_JARS and
applications could still add extra jars on top.
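
For illustration, such a combiner interface might look roughly like this
(the names below are hypothetical, not existing Apex API):

    /** Hypothetical policy for combining a layered attribute value. */
    public interface AttributeCombiner<T> {
      /** Combine the application-level value with the platform-level value. */
      T combine(T applicationValue, T platformValue);
    }

    /** Override: the application-level value wins (today's behavior). */
    class OverrideCombiner<T> implements AttributeCombiner<T> {
      public T combine(T applicationValue, T platformValue) {
        return applicationValue != null ? applicationValue : platformValue;
      }
    }

    /** Merge: concatenate comma-separated jar lists, e.g. for LIBRARY_JARS. */
    class MergeJarListCombiner implements AttributeCombiner<String> {
      public String combine(String applicationValue, String platformValue) {
        if (platformValue == null) return applicationValue;
        if (applicationValue == null) return platformValue;
        return platformValue + "," + applicationValue;
      }
    }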
On Fri, Feb 2, 2018 at 6:32 PM, Vlad Rozov <vro...@apache.org> wrote:
IMO, support for Kubernetes, Docker images, Mesos and anything outside of
YARN deployments is a topic by itself, and the design for such support
needs to be discussed. I do not want to propose any specific design, but
I assume that the logic to create a proper execution environment would be
coded into the Apex client. Whether it (hardcoded logic to create an
execution environment) can be expressed simply as a list of dependent
classes or jars is at minimum questionable. Until a design is proposed
and agreed upon, I'd prefer to use plugins for the subject.
Thank you,
Vlad
On 2/2/18 13:17, Sanjay Pujare wrote:
In cases where we have an "über" docker image containing support for
multiple execution environments, it might be useful for the Apex core to
infer what kind of execution environment to use for a particular
invocation (say, based on configuration values or environment variables)
and, in that case, load the corresponding libraries. I think this kind of
flexibility or support would be difficult to achieve through plugins,
hence I think Sergey's proposal will be useful.
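
A rough sketch of that kind of inference (the environment variable and
type names here are made up for illustration; Apex has no such API):

    enum ExecutionEnvironment { YARN, KUBERNETES, MESOS }

    final class EnvironmentDetector {
      /** Infer the target environment from a hypothetical env var; default to YARN. */
      static ExecutionEnvironment detect() {
        String env = System.getenv("APEX_EXEC_ENV"); // hypothetical variable
        if ("kubernetes".equalsIgnoreCase(env)) return ExecutionEnvironment.KUBERNETES;
        if ("mesos".equalsIgnoreCase(env)) return ExecutionEnvironment.MESOS;
        return ExecutionEnvironment.YARN;
      }
    }

The core would then load only the libraries registered for the detected
environment.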
Sanjay
On Fri, Feb 2, 2018 at 11:18 AM, Sergey Golovko <ser...@datatorrent.com> wrote:
Unfortunately, moving the .apa file to a docker image cannot resolve all
problems with the dependencies. If we assume an Apex application should
be run in different execution environments, the application docker image
must contain all possible execution environment dependencies.

I think the better way is to assume that the original application docker
image, like the current .apa file, should contain the application-specific
dependencies only, and that some smart client tool should create the
executable application docker image from the original one, including the
execution-specific environment dependencies in the target application
docker image.

It means that, in any case, a smart Apex client tool should have an
interface to define the different environment dependencies, or
combinations of different dimensions of the environment dependencies.
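
One possible shape for such an interface (purely illustrative; none of
these names exist in Apex today):

    import java.util.List;

    /** Supplies the extra jars a given execution environment requires. */
    public interface EnvironmentDependencyProvider {
      /** Name of the environment this provider serves, e.g. "kubernetes". */
      String environmentName();
      /** Jar paths to bake into the target application image. */
      List<String> dependencyJars();
    }

The client building the target image would consult the provider(s)
matching the chosen environment and copy the listed jars in.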
Thanks,
Sergey
On Fri, Feb 2, 2018 at 10:23 AM, Thomas Weise <t...@apache.org> wrote:
The current dependencies are based on how the Apex YARN client works.
YARN depends on a DFS implementation for deployment (not necessarily
HDFS).

I think a better way to look at this is to consider that instead of an
.apa file the application is a docker image, which would contain Apex and
all the dependencies that the "StramClient" today adds for YARN. In that
world there would be no Apex CLI or Apex-specific client.
Thomas
On Thu, Feb 1, 2018 at 5:57 PM, Sergey Golovko <ser...@datatorrent.com> wrote:
I agree, it can be implemented with the usage of plugins. But if I need
to enable and configure the plugin, I need to put this information into
dt-site.xml. That means the plugin and its parameters must be documented,
and the list of the added specific jars will be visible and available for
updates to the end-user. The implementation via plugins is a more dynamic
solution that is more convenient for the application developers. But I'm
talking about the static configuration of the Apex build or installation,
which relates more to platform development.
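
For instance, enabling such a plugin would mean documenting something
like the following in dt-site.xml (the property names and values here are
made up for illustration):

    <property>
      <name>apex.plugin.jar-injector.classname</name>
      <value>com.example.JarInjectorPlugin</value>
    </property>
    <property>
      <name>apex.plugin.jar-injector.jars</name>
      <value>/opt/libs/cloud-fs.jar,/opt/libs/service-client.jar</value>
    </property>

Everything in it is then visible to, and editable by, the end-user, which
is exactly what I'd like to avoid for platform-level jars.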
The current Apex core implementation has used a static, unchanged list of
jars for a long time, because the implementation still contains several
basic static assumptions (for instance, the usage of YARN, HDFS, etc.),
and these assumptions are hardcoded in the implementation. But if we are
going to improve Apex and use Java interfaces in the generic Apex
implementation, the current static approach of hardcoding a list of
dependent jars in the Apex code will not work anymore. It will require a
new solution to add/change jars in specific Apex builds/configurations.
And I don't think the usage of plugins will be good for that.
Thanks,
Sergey
On Thu, Feb 1, 2018 at 1:47 PM, Vlad Rozov <vro...@apache.org> wrote:
There is a way to get the same end result by using plugins. It would be
good to understand why plugins can't be used, and whether they can be
extended to provide the required functionality.
Thank you,
Vlad
On 1/29/18 15:14, Sergey Golovko wrote:
Hello All,
In Apex there are two ways to deploy non-Hadoop jars to the deployed
cluster.

The first approach is static (hardcoded), and it is used by Apex platform
developers only. There are several final static arrays of Java classes in
StramClient.java that define which of the available jars should be
included in the deployment for every Apex application.
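
(Simplified illustration of that static approach, not the actual
StramClient source:

    // each listed class marks a jar that is shipped with every application
    private static final Class<?>[] PLATFORM_CLASSES = {
      org.apache.hadoop.fs.FileSystem.class, // Hadoop/DFS client jar
      com.esotericsoftware.kryo.Kryo.class   // serialization jar
    };

at deployment time, the jar containing each listed class is located and
added.)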
The second approach is to add the paths of all dependent jar-files to the
value of the attribute LIB_JARS. The end-user can set/update the value of
the attribute LIB_JARS via dt-site.xml files, command line parameters,
application properties and plugins. The usage of the attribute LIB_JARS
is the official, documented way for all Apex users to manage the
deployment jars.
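
For example, in dt-site.xml (the property name follows the dt.attr.<attrname>
pattern; the jar paths are placeholders):

    <property>
      <name>dt.attr.LIB_JARS</name>
      <value>/path/to/first.jar,/path/to/second.jar</value>
    </property>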
But some of the dependent jars (not from the Apex core) can be common to
all of a customer's applications for a specific installation and/or
execution environment. Unfortunately, the Apex implementation does not
contain a middle solution that would allow the Apex developers and
customer support to define and add new dependent jar-files (jars that
should not be configurable/managed by the end-user) without
updates/recompilation of the Apex Java code during the Apex build process
and/or installation/configuration.
Also, having this kind of flexibility would allow the Apex core
developers to use Java interfaces during development to define an
abstraction layer in the Apex implementation, and to configure the Apex
core to add some specific jars to all Apex applications without
recompilation of the Apex source code.
For instance, the usage of HDFS is now hardcoded in the Apex platform
code, but it could be replaced with any other distributed or cloud-based
file system. The Apex core code could use an interface for all I/O
operations, while the support for a real, specific file system
implementation could be added as an independent jar-file. Or the
implementation of some Apex operators may depend on a specific service,
and it may be necessary to add some of the service jars to every Apex
application implicitly.
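
As a sketch, such an interface could be as small as (illustrative only):

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;

    /** Minimal file-system abstraction the core could code against. */
    public interface DistributedStorage {
      InputStream open(String path) throws IOException;
      OutputStream create(String path) throws IOException;
    }

with the HDFS (or cloud) implementation shipped as its own jar and pulled
in at deployment time through the mechanism proposed below.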
The proposal:

- add a predefined configuration text file (we can make any choice for
the file syntax: XML, JSON or Properties) to the Apex engine resources,
with predefined values of some of the Apex attributes (for now we can
include the LIB_JARS attribute only);
- allow a configuration text file with the same functionality in the Apex
installation folder "conf";
- read the content of the predefined configuration text files in the
stram client at runtime and add the jars to the list of dependent jars;
- allow both paths to jars and Java classes to be used to refer to the
dependent jars (the references can have the extensions .class and .jar);
see the example after this list.
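
For illustration, with the XML syntax such a file could look like this
(the attribute name and values are placeholders, one possible choice for
the proposed format):

    <configuration>
      <property>
        <name>apex.attr.LIB_JARS</name>
        <value>/opt/apex/ext/cloud-fs.jar,com.example.ServiceClient.class</value>
      </property>
    </configuration>

A .jar reference would be added as-is; a .class reference would be
resolved to the jar that contains that class, the same way StramClient
resolves its hardcoded class list today.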
Thanks,
Sergey