[ https://issues.apache.org/jira/browse/APEXMALHAR-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129959#comment-16129959 ]
Ananth edited comment on APEXMALHAR-2260 at 8/17/17 5:39 AM:
-------------------------------------------------------------

Thanks for the comment [~vikram]. My understanding is that [https://issues.apache.org/jira/browse/APEXMALHAR-2261] is about using Apex from a Python environment, i.e. the streaming application is launched via Python, whereas this JIRA [2260] is about invoking Python code from a Java Apex application. I see a lot of value in both use cases.

I glanced at pull request 613 earlier, and it looked like the pull request addresses APEXMALHAR-2261 in its entirety and not APEXMALHAR-2260. The use case I am trying to solve is the latter: the application is primarily coded in Java, and we want to invoke a Python function for scoring, with the data points extracted and streamed from an upstream operator.

The pain point this use case addresses is the following situation. A data scientist develops a model and pickles it into a repo. The model is then pulled in by this operator, or an operator derived from it, which executes it and collects back a score. The params to the Python scoring function would typically come from an upstream operator, say a Cassandra read operator, with basic feature engineering done in the current operator before it invokes the configured Python function.

Another aspect I would like to see is a virtualenv construct for this operator, so that multiple versions of Python libraries can coexist on the datanode where the operator is executing.

Happy to collaborate and discuss pull request 613, but I wanted to confirm the above thinking before the task is taken up.

> Python execution for operator logic
> ------------------------------------
>
>                 Key: APEXMALHAR-2260
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2260
>             Project: Apache Apex Malhar
>          Issue Type: New Feature
>            Reporter: Thomas Weise
>              Labels: roadmap
>
> Support execution of Python code in an operator.
> https://lists.apache.org/thread.html/9837b1dee8f909ed400c6030ce5c6a94a12f43183718019dd0bfd228@%3Cdev.apex.apache.org%3E

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)