Strongly +1 on this. One thing that shows this would be useful for Apex is
Hadoop Streaming, where Python is used to write map-reduce jobs. This would
not only increase Apex's reach among developers but would also appeal to
administrators who want to write an app, as they usually know Python.


A few suggestions (in no particular order):
1. As part of supporting Python execution in operator code, we should
allow the complete lifecycle of an operator to be specified from Python.

2. I would personally not worry about providing Python bindings for
low-level Apex client APIs like addOperator, addStream, etc. If one has to
use them, I think it's best to use the Java API, as the full power of those
low-level APIs can be leveraged there.

3. For client APIs, I would rather suggest we focus on high-level APIs like
the Apex stream API (malhar-stream). We should provide a complete Python
binding for it. Python is well suited to functional programming, and the
Stream API provides exactly that.

4. Thinking at a very high level, I don't think we need any change in
apex-core for this. This could be another project in Malhar itself. There
are Python libraries like py4j, pyjnius, or JPype which allow access to
Java objects from Python.
Basically, we just need to establish the right bridge between the Java and
Python VMs. We need to be thoughtful about performance, as these bridges
across programming languages are costly.

5. We need to decide what code execution will look like here. For
example, should a py file be an alternative to Application.java in the
package? This means the starting point is the Apex CLI, i.e. Java. Hence,
instead of finding classes implementing StreamingApplication, the Apex CLI
needs to find the py file which defines the DAG.
Or should the flow start with "__main__" of the Python file and end up in
Java?

6. This might be too early, but it is important to emphasize that we need
to plan for writing examples and documentation for the Python binding.
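
To make point 1 a bit more concrete, here is a rough sketch of what a
Python-side operator could look like if it mirrored the Java lifecycle
callbacks (setup, beginWindow, process on the input port, endWindow,
teardown). All names here (PythonOperator, WordCounter, run_locally) are
made up for illustration and do not exist in Apex; run_locally only stands
in for the engine so the sketch can be run on its own.

```python
# Hypothetical sketch: none of these classes or functions exist in Apex.
class PythonOperator:
    """Base class mirroring the Java operator lifecycle callbacks."""

    def setup(self, context):
        pass

    def begin_window(self, window_id):
        pass

    def process(self, tuple_):
        raise NotImplementedError

    def end_window(self):
        pass

    def teardown(self):
        pass


class WordCounter(PythonOperator):
    """Counts words per streaming window and records one dict per window."""

    def setup(self, context):
        self.emitted = []

    def begin_window(self, window_id):
        self.counts = {}

    def process(self, tuple_):
        for word in tuple_.split():
            self.counts[word] = self.counts.get(word, 0) + 1

    def end_window(self):
        self.emitted.append(dict(self.counts))


def run_locally(operator, windows):
    """Stand-in for the engine: drives the callbacks in lifecycle order."""
    operator.setup(context={})
    for window_id, tuples in enumerate(windows):
        operator.begin_window(window_id)
        for t in tuples:
            operator.process(t)
        operator.end_window()
    operator.teardown()
    return operator


op = run_locally(WordCounter(), [["a b a"], ["b"]])
```

In a real integration the engine, not Python, would drive these callbacks
across the Java/Python bridge, which is where the performance concern in
point 4 comes in.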

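On point 3, this is roughly the style of client code I have in mind. It is
only a sketch: PyStream and its methods are hypothetical names, not the
malhar-stream API, and collect() runs everything locally in Python, whereas
a real binding would translate the chain into a DAG on the Java side.

```python
from itertools import chain


# Hypothetical sketch: PyStream is a made-up name, not part of Malhar.
class PyStream:
    """Immutable, chainable description of a pipeline over a source."""

    def __init__(self, source, stages=()):
        self._source = source
        self._stages = tuple(stages)

    def _with_stage(self, kind, fn):
        return PyStream(self._source, self._stages + ((kind, fn),))

    def map(self, fn):
        return self._with_stage("map", fn)

    def filter(self, predicate):
        return self._with_stage("filter", predicate)

    def flat_map(self, fn):
        return self._with_stage("flat_map", fn)

    def collect(self):
        """Run the stages locally; a real binding would build a DAG instead."""
        items = iter(self._source)
        for kind, fn in self._stages:
            if kind == "map":
                items = map(fn, items)
            elif kind == "filter":
                items = filter(fn, items)
            else:  # flat_map
                items = chain.from_iterable(map(fn, items))
        return list(items)


words = (
    PyStream(["to be or", "not to be"])
    .flat_map(str.split)
    .filter(lambda w: w != "or")
    .map(str.upper)
    .collect()
)
```

The point is that lambdas and plain functions slot in naturally, which is
why I think the high-level API is the better target for a Python binding.
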
-Chinmay.



On Fri, Sep 16, 2016 at 2:36 AM, Thomas Weise <[email protected]> wrote:

> Hi,
>
> Python (not Jython) seems to be a popular language and frequently used for
> data analysis, especially where flexibility matters. It has a comprehensive
> library and it is generally considered low barrier to entry. I have also
> seen Python used in critical back-end components, although that's probably
> not very common?
>
> I think Python support could potentially expand the user base for Apex.
> There are 2 main areas that can be considered:
>
> 1) Support to execute Python code through an operator
> 2) A client API that lets users construct pipelines in Python
>
> The former can exist without the latter. And it would enable users to
> leverage existing code that otherwise would have to be rewritten in a JVM
> language. The engine could ship scripts/packages so they are automatically
> distributed on the cluster.
>
> A useful client API probably requires back-end support for lambda functions
> and more complex UDFs.
>
> Would be great to get some feedback, especially from those that have
> experience with Python, on how an integration could potentially open up new
> use cases for Apex.
>
> Thanks,
> Thomas
>
