Re: [Proposal] Extension of the Apex configuration to add dependent jar files in runtime.

Pramod Immaneni Sat, 03 Feb 2018 10:02:38 -0800
Yes generic in the Attribute class
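
For illustration only, here is a minimal sketch of how the merge/override layering described in the quoted thread below might look if it were generic. The names `Combiner`, `override()`, `merge()` and `parseJars()` are hypothetical and are not existing Apex APIs; the sketch only assumes that a list-valued attribute such as LIBRARY_JARS is a comma-separated string.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class AttributeCombinerSketch {

    /** Hypothetical interface (not an Apex API): layers a specific value over a general one. */
    public interface Combiner<T> {
        T combine(T general, T specific);
    }

    /** Override layering: the more specific value wins whenever it is present. */
    public static <T> Combiner<T> override() {
        return (general, specific) -> specific != null ? specific : general;
    }

    /** Merge layering for set-valued attributes: union, general entries first. */
    public static <E> Combiner<Set<E>> merge() {
        return (general, specific) -> {
            Set<E> result = new LinkedHashSet<>();
            if (general != null) {
                result.addAll(general);
            }
            if (specific != null) {
                result.addAll(specific);
            }
            return result;
        };
    }

    /** Splits a comma-separated attribute value into an ordered set of entries. */
    public static Set<String> parseJars(String csv) {
        Set<String> jars = new LinkedHashSet<>();
        if (csv == null) {
            return jars;
        }
        for (String part : csv.split(",")) {
            String jar = part.trim();
            if (!jar.isEmpty()) {
                jars.add(jar);
            }
        }
        return jars;
    }

    public static void main(String[] args) {
        // Assumed values: a global jar list and a per-application jar list.
        Set<String> global = parseJars("common-a.jar,common-b.jar");
        Set<String> perApp = parseJars("app-x.jar,common-a.jar");

        Combiner<Set<String>> overriding = override();
        Combiner<Set<String>> merging = merge();

        System.out.println("override: " + overriding.combine(global, perApp));
        System.out.println("merge:    " + merging.combine(global, perApp));
    }
}
```

With override, the per-application list replaces the global one; with merge, the common jars stay in place and the application adds its own on top. Because the combiner is typed on the value rather than on a particular attribute, the same mechanism would apply to any list or set attribute.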

> On Feb 3, 2018, at 10:00 AM, Vlad Rozov <vro...@apache.org> wrote:
> 
> +1 assuming that support for merge/override will be generic for all 
> attributes that support list/set of values and not limited to LIBRARY_JARS 
> attribute only.
> 
> Thank you,
> 
> Vlad
> 
> On 2/3/18 09:13, Pramod Immaneni wrote:
>> I too agree that the discussion has veered off from the original topic. Why
>> can't LIBRARY_JARS be used for this, albeit with a minor improvement?
>> Currently, our attribute layering is an override, so if you have an
>> attribute that is specified as apex.application.<appname>.attr.<attrname>
>> it overrides apex.attr.<attrname> for that application. What if we were to
>> expand the attribute definition to allow for the specification of how the
>> layering of attributes will be combined, override being one option, merge
>> being another with these being implemented with a combiner interface? This
>> way a set of common jars could be specified using dt.attr.LIBRARY_JARS and
>> applications can still add extra jars on top.
>> 
>> On Fri, Feb 2, 2018 at 6:32 PM, Vlad Rozov <vro...@apache.org> wrote:
>> 
>>> IMO, support for Kubernetes, Docker images, Mesos and anything outside of
>>> Yarn deployments is a topic by itself and design for such support needs to
>>> be discussed. I do not want to propose any specific design, but assume that
>>> logic to create proper execution environment would be coded into Apex
>>> client. Whether it (hardcoded logic to create an execution environment) can
>>> be expressed simply as a list of dependent classes or jars is at minimum
>>> questionable. Until design is proposed and agreed upon, I'd prefer to use
>>> plugins for the subject.
>>> 
>>> Thank you,
>>> 
>>> Vlad
>>> 
>>> 
>>> On 2/2/18 13:17, Sanjay Pujare wrote:
>>> 
>>>> In cases where we have an "über" docker image containing support for
>>>> multiple execution environments it might be useful for the Apex core to
>>>> infer what kind of execution environment to use for a particular
>>>> invocation (say, based on configuration values or environment variables),
>>>> and in that case the core would load the corresponding libraries. I think
>>>> this kind of flexibility would be difficult to support through plugins,
>>>> hence I think Sergey's proposal will be useful.
>>>> 
>>>> Sanjay
>>>> 
>>>> 
>>>> On Fri, Feb 2, 2018 at 11:18 AM, Sergey Golovko <ser...@datatorrent.com>
>>>> wrote:
>>>> 
>>>>> Unfortunately, moving the .apa file into a docker image cannot resolve
>>>>> all of the dependency problems. If we assume an Apex application should
>>>>> run in different execution environments, the application docker image
>>>>> must contain all possible execution environment dependencies.
>>>>> 
>>>>> I think the better way is to assume that the original application docker
>>>>> image, like the current .apa file, should contain the
>>>>> application-specific dependencies only. Some smart client tool should
>>>>> then create the executable application docker image from the original
>>>>> one and include the environment-specific dependencies in the target
>>>>> image. Either way, a smart Apex client tool should have an interface to
>>>>> define the different environment dependencies, or combinations of
>>>>> different dimensions of the environment dependencies.
>>>>> 
>>>>> Thanks,
>>>>> Sergey
>>>>> 
>>>>> 
>>>>> On Fri, Feb 2, 2018 at 10:23 AM, Thomas Weise <t...@apache.org> wrote:
>>>>> 
>>>>>> The current dependencies are based on how Apex YARN client works. YARN
>>>>>> depends on a DFS implementation for deployment (not necessarily HDFS).
>>>>>> 
>>>>>> I think a better way to look at this is to consider that instead of an
>>>>>> .apa file the application is a docker image, which would contain Apex
>>>>>> and all dependencies that the "StramClient" today adds for YARN.
>>>>>> 
>>>>>> In that world there would be no Apex CLI or Apex specific client.
>>>>>> 
>>>>>> Thomas
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Thu, Feb 1, 2018 at 5:57 PM, Sergey Golovko <ser...@datatorrent.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> I agree. It can be implemented using plugins. But if I need to enable
>>>>>>> and configure the plugin, I need to put this information into
>>>>>>> dt-site.xml. That means the plugin and its parameters must be
>>>>>>> documented, and the list of added jars will be visible to the end-user
>>>>>>> and available for the end-user to update. The implementation via
>>>>>>> plugins is a more dynamic solution that is more convenient for
>>>>>>> application developers. But I'm talking about the static configuration
>>>>>>> of the Apex build or installation, which relates more to platform
>>>>>>> development.
>>>>>>> 
>>>>>>> The current Apex core implementation has used a static, unchanged list
>>>>>>> of jars for a long time, because the Apex implementation still contains
>>>>>>> several basic static assumptions (for instance, the usage of YARN,
>>>>>>> HDFS, etc.), and these assumptions are hardcoded in the implementation.
>>>>>>> But if we are going to improve Apex and use Java interfaces in a
>>>>>>> generic Apex implementation, the current approach of hardcoding a list
>>>>>>> of dependent jars in the Apex code will not work anymore. It will
>>>>>>> require a new solution to add/change jars in specific Apex
>>>>>>> builds/configurations, and I don't think plugins will be a good fit
>>>>>>> for that.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Sergey
>>>>>>> 
>>>>>>> 
>>>>>>> On Thu, Feb 1, 2018 at 1:47 PM, Vlad Rozov <vro...@apache.org> wrote:
>>>>>>> 
>>>>>>>> There is a way to get the same end result by using plugins. It would
>>>>>>>> be good to understand why plugins can't be used and whether they can
>>>>>>>> be extended to provide the required functionality.
>>>>>>>> 
>>>>>>>> Thank you,
>>>>>>>> 
>>>>>>>> Vlad
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 1/29/18 15:14, Sergey Golovko wrote:
>>>>>>>> 
>>>>>>>>> Hello All,
>>>>>>>>> 
>>>>>>>>> In Apex there are two ways to deploy non-Hadoop jars to the cluster.
>>>>>>>>> 
>>>>>>>>> The first approach is static (hardcoded) and is used by Apex platform
>>>>>>>>> developers only. Several final static arrays of Java classes in
>>>>>>>>> StramClient.java define which of the available jars are included in
>>>>>>>>> the deployment of every Apex application.
>>>>>>>>> 
>>>>>>>>> The second approach is to add the paths of all dependent jar files to
>>>>>>>>> the value of the attribute LIB_JARS. The end-user can set/update the
>>>>>>>>> value of LIB_JARS via dt-site.xml files, command line parameters,
>>>>>>>>> application properties and plugins. Using the attribute LIB_JARS is
>>>>>>>>> the officially documented way for all Apex users to manage the
>>>>>>>>> deployment jars.
>>>>>>>>> 
>>>>>>>>> But some of the dependent jars (not from the Apex core) can be common
>>>>>>>>> to all of a customer's applications for a specific installation
>>>>>>>>> and/or execution environment. Unfortunately, Apex has no middle
>>>>>>>>> solution that would allow Apex developers and customer support to
>>>>>>>>> define and add new dependent jar files (jars that should not be
>>>>>>>>> configurable/managed by the end-user) during the Apex build and/or
>>>>>>>>> installation/configuration without updating and recompiling the Apex
>>>>>>>>> Java code.
>>>>>>>>> 
>>>>>>>>> Having this kind of flexibility would also allow the Apex core
>>>>>>>>> developers to use Java interfaces to define an abstraction layer in
>>>>>>>>> the Apex implementation, and to configure the Apex core to add
>>>>>>>>> specific jars to all Apex applications without recompiling the Apex
>>>>>>>>> source code.
>>>>>>>>> 
>>>>>>>>> For instance, the usage of HDFS is currently hardcoded in the Apex
>>>>>>>>> platform code, but it could be replaced with any other distributed or
>>>>>>>>> cloud-based file system. The Apex core code can use an interface for
>>>>>>>>> all I/O operations, while support for a specific file system
>>>>>>>>> implementation can be added as an independent jar file. Another case
>>>>>>>>> is when the implementation of some Apex operators depends on a
>>>>>>>>> specific service, so that some of the service jars must be added to
>>>>>>>>> every Apex application implicitly.
>>>>>>>>> 
>>>>>>>>> The proposal:
>>>>>>>>> 
>>>>>>>>> - add a predefined configuration text file (we can make any choice
>>>>>>>>> for the file syntax: XML, JSON or Properties) to the Apex engine
>>>>>>>>> resources with predefined values of some of the Apex attributes (for
>>>>>>>>> now we can include the LIB_JARS attribute only);
>>>>>>>>> - allow a configuration text file with the same functionality in the
>>>>>>>>> Apex installation folder "conf";
>>>>>>>>> - have the stram client read the content of the predefined
>>>>>>>>> configuration text files at runtime and add the jars to the list of
>>>>>>>>> dependent jars;
>>>>>>>>> - allow both paths to jars and Java classes to refer to the dependent
>>>>>>>>> jars (the references can have the extensions .class and .jar).
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Sergey
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>
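
As a rough illustration of the last two points of the proposal quoted above (reading a configuration file and accepting both .jar and .class references), the sketch below resolves a comma-separated list into jar locations. It is not Apex code: the class name, the property key `apex.attr.LIBRARY_JARS` as a Properties key, and the choice of the Properties syntax are all assumptions made for the example. A `.class` entry is mapped to the location its class was loaded from via the standard `ProtectionDomain`/`CodeSource` API.

```java
import java.net.URL;
import java.security.CodeSource;
import java.util.LinkedHashSet;
import java.util.Properties;
import java.util.Set;

public class DependentJarsSketch {

    /** Hypothetical property key; the real file format and key were left open in the thread. */
    public static final String KEY = "apex.attr.LIBRARY_JARS";

    /** Resolves every entry of the configured list to a jar path or code-source location. */
    public static Set<String> resolve(Properties props) {
        Set<String> jars = new LinkedHashSet<>();
        for (String entry : props.getProperty(KEY, "").split(",")) {
            String ref = entry.trim();
            if (ref.isEmpty()) {
                continue;
            }
            if (ref.endsWith(".jar")) {
                // A jar reference is taken as a path and used as-is.
                jars.add(ref);
            } else if (ref.endsWith(".class")) {
                // A class reference is replaced by the location that contains the class.
                String className = ref.substring(0, ref.length() - ".class".length());
                String location = locate(className);
                if (location != null) {
                    jars.add(location);
                }
            } else {
                throw new IllegalArgumentException("Expected a .jar or .class reference: " + ref);
            }
        }
        return jars;
    }

    /** Returns where the class was loaded from, or null if unknown (e.g. JDK classes). */
    static String locate(String className) {
        try {
            Class<?> type = Class.forName(className, false, DependentJarsSketch.class.getClassLoader());
            CodeSource source = type.getProtectionDomain().getCodeSource();
            URL location = source == null ? null : source.getLocation();
            return location == null ? null : location.toString();
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Assumed example values; the jar path is made up for the illustration.
        props.setProperty(KEY, "/opt/apex/lib/fs-impl.jar, DependentJarsSketch.class");
        System.out.println(resolve(props));
    }
}
```

Silently skipping classes that cannot be located is a choice made here for brevity; a real client would probably want to log or fail on them.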