Hi - This is very interesting. I’ve been thinking about using Heron for this functionality.
An Admin API for configuring the functions on live Executors and specifying a unique return value Topic needs discussion. I would also like to chain Functions. I think Functions will need Profiles to include metadata for parallelism, memory, configuration, etc.

Regards,
Dave

Sent from my iPhone

> On Feb 20, 2018, at 4:05 PM, Sanjeev Kulkarni <sanjee...@gmail.com> wrote:
>
> https://github.com/apache/incubator-pulsar/wiki/PIP-15:-Pulsar-Functions
>
> -------
>
> * **Status**: Proposal
> * **Author**: Sanjeev Kulkarni/Sijie Guo/Jerry Peng - Streamlio
> * **Pull Request**: See Below
> * **Mailing List discussion**:
>
> Motivation
>
> There has been a renewed interest from users in lightweight computing
> frameworks. What they typically mean by lightweight is:
>
> 1. They are not compute systems that need to be installed/run/monitored.
>    Thus they are much more ops light. Some of them are offered as pure
>    SaaS (like AWS Lambda) while others are integrated with message queues
>    (like KStreams).
> 2. Their interface should be as simple as it gets. Typically it takes
>    the form of a function/subroutine, which is the basic compute block in
>    most programming languages. The API must also be multi-language capable.
> 3. The deployment models should be flexible. Users should be able to run
>    these functions using their favorite management tools, or they can run
>    them with the brokers.
>
> The aim of all of these would be to dramatically increase the pace of
> experimentation/dev productivity. They also fit in the event driven
> architecture that most companies are moving towards, where data is
> constantly arriving. The aim is for users to run simple functions against
> arriving data without having to master a complicated API/semantics or
> manage/monitor a complex compute infra.
>
> A message queue like Pulsar sits at the heart of any event driven
> architecture. Data coming in from all sources typically lands in the
> message bus first. Thus if Pulsar (or a Pulsar extension) has this feature
> of being able to register/run simple user functions, it could go a long
> way toward driving Pulsar adoption. Users could just deploy Pulsar and
> instantly have a very flexible way of doing basic computation.
>
> This document outlines the goals/design of what we want in such a system
> and how they can be built into Pulsar.
>
> Goals
>
> 1. Simplest possible programmability: This is the overarching goal.
>    Anyone with the ability to write a function in a supported language
>    should be able to get productive in a matter of minutes.
> 2. Multi Language Capability: We should provide the API in at least the
>    most popular languages, Java/Scala/Python/Go/JavaScript.
> 3. Flexible runtime deployment: Users should be able to run these
>    functions as a simple process using their favorite management tools.
>    They should also be able to submit their functions to be run in a
>    Pulsar cluster.
> 4. Built in State Management: Computations should be allowed to keep
>    state across computations. The system should take care of persisting
>    this state in a robust manner. Basic things like
>    incrBy/get/put/update functionality are a must. This dramatically
>    simplifies the architecture for the developer.
> 5. Queryable State: The state written by a function should be queryable
>    using standard REST APIs.
> 6. Automatic Load Balancing: The managed runtime should take care of
>    assigning workers to the functions.
> 7. Scale Up/Down: Users should be able to scale up/down the number of
>    function instances in the managed runtime.
> 8. Flexible Invocation: Thread based, process based and docker based
>    invocation should be supported for running each function.
> 9. Metrics: Basic metrics like events processed per second, failures,
>    latency, etc. should be made available on a per function basis. Users
>    should also be able to publish their own metrics.
> 10. REST interface: Function control should use a REST protocol to have
>     the widest adoption.
> 11. Library/CLI: Simple libraries in all supported languages should
>     exist. There should also be a basic CLI for register/list/query/stats
>     and other admin activities.
>
> More details on the PIP page.
> Thanks!
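
To make goals 1 and 4 a bit more concrete, here is a minimal sketch of what "a plain function plus built-in state" could look like. This is not the API proposed in the PIP: `InMemoryState`, `incr_by`, `word_count`, and `run` are all hypothetical names made up for illustration, and the in-memory dict only stands in for whatever persistent store the managed runtime would provide.

```python
class InMemoryState:
    """Hypothetical stand-in for the managed state store described in goal 4."""

    def __init__(self):
        self._values = {}

    def get(self, key, default=None):
        # Return the stored value, or the default if the key was never written.
        return self._values.get(key, default)

    def put(self, key, value):
        self._values[key] = value

    def incr_by(self, key, amount):
        # Treat a missing key as 0, then add the amount and return the new total.
        self._values[key] = self._values.get(key, 0) + amount
        return self._values[key]


def word_count(message, state):
    """Hypothetical user function: update per-word counters in state and
    return the number of words in this message (the value that would be
    published to an output topic)."""
    words = message.split()
    for word in words:
        state.incr_by(word, 1)
    return len(words)


def run(function, messages, state):
    """Hypothetical runtime loop: apply the function to each arriving message."""
    return [function(message, state) for message in messages]


state = InMemoryState()
outputs = run(word_count, ["hello pulsar", "hello functions"], state)
```

The point of the sketch is only that the user writes `word_count` and nothing else; where the state lives, how it is persisted, and how it is exposed for querying (goal 5) would be the runtime's job.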