https://github.com/apache/incubator-pulsar/wiki/PIP-15:-Pulsar-Functions
-------

* **Status**: Proposal
* **Author**: Sanjeev Kulkarni/Sijie Guo/Jerry Peng - Streamlio
* **Pull Request**: See Below
* **Mailing List discussion**:

Motivation

There has been renewed interest from users in lightweight computing frameworks. By lightweight, they typically mean the following:

1. They are not compute systems that need to be installed, run, and monitored, so they are much lighter on operations. Some are offered as pure SaaS (like AWS Lambda), while others are integrated with message queues (like KStreams).
2. Their interface should be as simple as it gets. Typically it takes the form of a function/subroutine, which is the basic compute block in most programming languages. The API must also be multi-language capable.
3. The deployment model should be flexible. Users should be able to run these functions using their favorite management tools, or run them with the brokers.

The aim of all of this is to dramatically increase the pace of experimentation and developer productivity. These frameworks also fit the event-driven architecture that most companies are moving towards, where data is constantly arriving. Users should be able to run simple functions against arriving data without having to master a complicated API and its semantics, or manage and monitor a complex compute infrastructure.

A message queue like Pulsar sits at the heart of any event-driven architecture. Data coming in from all sources typically lands in the message bus first. Thus, if Pulsar (or a Pulsar extension) were able to register and run simple user functions, it could go a long way toward driving Pulsar adoption. Users could just deploy Pulsar and instantly have a very flexible way of doing basic computation.

This document outlines the goals and design of such a system and how it can be built into Pulsar.

Goals

1. Simplest possible programmability: This is the overarching goal.
Anyone with the ability to write a function in a supported language should be able to get productive in a matter of minutes.
2. Multi Language Capability: We should provide the API in at least the most popular languages: Java/Scala/Python/Go/JavaScript.
3. Flexible runtime deployment: Users should be able to run these functions as a simple process using their favorite management tools. They should also be able to submit their functions to be run in a Pulsar cluster.
4. Built in State Management: Functions should be allowed to keep state across invocations. The system should take care of persisting this state in a robust manner. Basic operations like incrBy/get/put/update are a must. This dramatically simplifies the architecture for the developer.
5. Queryable State: The state written by a function should be queryable using standard REST APIs.
6. Automatic Load Balancing: The managed runtime should take care of assigning workers to the functions.
7. Scale Up/Down: Users should be able to scale the number of function instances in the managed runtime up or down.
8. Flexible Invocation: Thread-based, process-based, and Docker-based invocation should be supported for running each function.
9. Metrics: Basic metrics like events processed per second, failures, latency, etc. should be made available on a per-function basis. Users should also be able to publish their own metrics.
10. REST interface: Function control should use the REST protocol to have the widest adoption.
11. Library/CLI: Simple libraries in all supported languages should exist. A basic CLI should also be provided for register/list/query/stats and other admin activities.

More details on the PIP page. Thanks!
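
As a rough illustration of goals 1 and 4, the sketch below shows what a user function with built-in state could look like. It is not the proposed API: `InMemoryState`, `incr_by`, `count_words`, and `run` are hypothetical names chosen for this example, a plain in-memory dictionary stands in for the persistent state store, and a simple loop stands in for the runtime.

```python
from collections import defaultdict


class InMemoryState:
    """Hypothetical stand-in for the managed state store (not a Pulsar API).

    A real runtime would persist these values; here they live in a dict.
    """

    def __init__(self):
        self._values = defaultdict(int)

    def incr_by(self, key, amount):
        # Add `amount` to the counter stored under `key` and return the new value.
        self._values[key] += amount
        return self._values[key]

    def get(self, key):
        # Return the stored value, or 0 if the key was never written.
        return self._values.get(key, 0)

    def put(self, key, value):
        # Overwrite whatever is stored under `key`.
        self._values[key] = value


def count_words(message, state):
    """The user's function: called once per arriving message."""
    words = message.split()
    for word in words:
        state.incr_by(word, 1)
    return len(words)


def run(function, messages, state):
    """Toy loop standing in for the runtime that feeds messages to the function."""
    return [function(message, state) for message in messages]


state = InMemoryState()
outputs = run(count_words, ["hello world", "hello pulsar"], state)
print(outputs)
print(state.get("hello"))
```

The point of the sketch is that the user writes only `count_words`; reading from the input, persisting state, and scaling would all be the responsibility of the runtime.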