[DISCUSS] PIP 15: Pulsar Functions

Sanjeev Kulkarni Tue, 20 Feb 2018 16:05:52 -0800

https://github.com/apache/incubator-pulsar/wiki/PIP-15:-Pulsar-Functions


-------

 * **Status**: Proposal
 * **Author**: Sanjeev Kulkarni/Sijie Guo/Jerry Peng - Streamlio
 * **Pull Request**: See Below
 * **Mailing List discussion**:

Motivation

There has been renewed interest from users in lightweight computing
frameworks. By lightweight, they typically mean the following:

   1. They are not compute systems that need to be installed/run/monitored.
   Thus they are much lighter on ops. Some of them are offered as pure
   SaaS (like AWS Lambda) while others are integrated with message queues
   (like KStreams).
   2. Their interface should be as simple as it gets. Typically it takes
   the form of a function/subroutine, which is the basic compute block in most
   programming languages. The API must also be multi-language capable.
   3. The deployment model should be flexible. Users should be able to run
   these functions using their favorite management tools, or run them
   with the brokers.

The aim of all of these is to dramatically increase the pace of
experimentation and developer productivity. They also fit the event-driven
architecture that most companies are moving towards, where data is
constantly arriving. Users should be able to run simple functions against
arriving data without having to master a complicated API and its semantics
or manage and monitor a complex compute infrastructure.
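
As a rough illustration of the "simple function" model described above, here is a minimal Python sketch. The names `exclaim` and `run` are hypothetical and are not part of any Pulsar API; `run` only stands in for whatever runtime would feed arriving messages to the user's function.

```python
def exclaim(message: str) -> str:
    """Hypothetical user function: one input message in, one result out."""
    return message + "!"


def run(function, messages):
    """Stand-in for the runtime (an assumption): apply the function to each message."""
    return [function(m) for m in messages]


# The user only writes `exclaim`; everything else belongs to the runtime.
results = run(exclaim, ["hello", "pulsar"])
```

The point of the sketch is that the user's code has no knowledge of topics, consumers or producers; it is just a function of its input.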

A message queue like Pulsar sits at the heart of any event-driven
architecture. Data coming in from all sources typically lands in the
message bus first. Thus if Pulsar (or a Pulsar extension) were able to
register and run simple user functions, it could go a long way toward
driving Pulsar adoption. Users could just deploy Pulsar and instantly have
a very flexible way of doing basic computation.

This document outlines the goals/design of what we want in such a system
and how it can be built into Pulsar.

Goals

   1. Simplest Possible Programmability: This is the overarching goal.
   Anyone with the ability to write a function in a supported language should
   be able to get productive in a matter of minutes.
   2. Multi-Language Capability: We should provide the API in at least the
   most popular languages: Java/Scala/Python/Go/JavaScript.
   3. Flexible Runtime Deployment: Users should be able to run these
   functions as a simple process using their favorite management tools. They
   should also be able to submit their functions to be run in a Pulsar cluster.
   4. Built-in State Management: Functions should be allowed to keep
   state across invocations. The system should take care of persisting this
   state in a robust manner. Basic operations like incrBy/get/put/update
   are a must. This dramatically simplifies the architecture for
   the developer.
   5. Queryable State: The state written by a function should be queryable
   using standard REST APIs.
   6. Automatic Load Balancing: The managed runtime should take care of
   assigning workers to the functions.
   7. Scale Up/Down: Users should be able to scale up/down the number of
   function instances in the managed runtime.
   8. Flexible Invocation: Thread-based, process-based and Docker-based
   invocation should be supported for running each function.
   9. Metrics: Basic metrics like events processed per second, failures,
   latency, etc. should be made available on a per-function basis. Users
   should also be able to publish their own metrics.
   10. REST Interface: Function control should use a REST protocol to
   have the widest adoption.
   11. Library/CLI: Simple libraries should exist in all supported
   languages. A basic CLI should also be provided to register, list and
   query functions, fetch stats, and perform other admin activities.
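
To make the state management goal (item 4) a bit more concrete, here is a minimal Python sketch of a word-count function that keeps counters across invocations. `InMemoryState`, `incr_by` and `count_words` are hypothetical names chosen for illustration and are not the proposed API; the in-memory dict merely stands in for state that the system would persist robustly.

```python
class InMemoryState:
    """Hypothetical stand-in for the managed state store (not a Pulsar API)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def incr_by(self, key, amount=1):
        # Counterpart of the incrBy operation mentioned in goal 4.
        self._data[key] = self._data.get(key, 0) + amount
        return self._data[key]


def count_words(sentence, state):
    """Hypothetical user function: updates per-word counters held in state."""
    for word in sentence.split():
        state.incr_by(word, 1)


state = InMemoryState()
for sentence in ["to be or not to be", "be quick"]:
    count_words(sentence, state)
```

Here the function itself stays stateless; all durable information goes through the state handle, which is what would let the runtime persist it and expose it for queries.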

More details on the PIP page.
Thanks!
