Re: [DISCUSS] PIP 15: Pulsar Functions

Sanjeev Kulkarni Tue, 20 Feb 2018 19:34:09 -0800

Hi Dave,
Chaining functions is certainly on the roadmap. The PIP document briefly
talks about at least two ways of doing it, but it probably requires a
separate PIP of its own at a later stage.
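
To make the chaining idea concrete, here is a minimal sketch of one possible approach: the output topic of one function is the input topic of the next. This is only an illustration, not necessarily either of the ways the PIP document describes. The `topics` dict, `run_function`, and the two sample functions are hypothetical in-memory stand-ins, not the Pulsar client API.

```python
from collections import defaultdict, deque

# Hypothetical in-memory stand-in for topics; not the Pulsar client API.
topics = defaultdict(deque)


def run_function(fn, input_topic, output_topic):
    """Drain input_topic, apply fn to each message, publish results to output_topic."""
    while topics[input_topic]:
        message = topics[input_topic].popleft()
        topics[output_topic].append(fn(message))


def sanitize(message):
    """First function in the chain: trim whitespace and lowercase."""
    return message.strip().lower()


def exclaim(message):
    """Second function in the chain: append an exclamation mark."""
    return message + "!"


topics["raw"].extend(["  Hello ", "WORLD"])

# Chaining: the output topic of `sanitize` is the input topic of `exclaim`.
run_function(sanitize, "raw", "clean")
run_function(exclaim, "clean", "final")

chained = list(topics["final"])
```

Each function stays unaware of the other in this sketch; only the topic names tie the chain together.
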
With respect to parallelism, for functions managed by the Pulsar cluster,
parallelism can be provided at submission time. For functions that are run
as a simple process, the parallelism should be managed by the user.
With respect to cpu/memory and other configuration, the aim of the inbuilt
Pulsar cluster is to keep it simple by just doing some simple distribution
across multiple workers. The aim is not to replicate features that are
already present in full-fledged schedulers like Mesos/YARN/Kubernetes. If
one needs memory/cpu bounds for a function, the ideal way to do that would
be to run it on one of these full-blown schedulers. We could provide an
easier path for users to run these functions on these schedulers by
providing launch templates.
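
As a rough sketch of what "parallelism provided at submission time" plus "simple distribution across multiple workers" could look like, assuming a round-robin policy (the policy, the `assign_instances` helper, and the worker and function names are all hypothetical, not part of the PIP):

```python
from itertools import cycle


def assign_instances(function_name, parallelism, workers):
    """Spread `parallelism` instances of a function across workers round-robin.

    Hypothetical helper: no cpu/memory bounds are considered, on purpose.
    """
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    if not workers:
        raise ValueError("at least one worker is required")

    assignment = {worker: [] for worker in workers}
    worker_cycle = cycle(workers)
    for index in range(parallelism):
        assignment[next(worker_cycle)].append(f"{function_name}-{index}")
    return assignment


# The user asks for 5 instances at submission time; two workers are available.
plan = assign_instances("wordcount", parallelism=5, workers=["worker-a", "worker-b"])
```

Anything smarter than this, such as placement by memory or cpu, is what a full-blown scheduler would be used for.
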
Hope that helps.


On Tue, Feb 20, 2018 at 6:08 PM, Dave Fisher <dave2w...@comcast.net> wrote:

> Hi -
>
> This is very interesting. I’ve been thinking about using Heron for this
> functionality.
>
> An Admin API for configuring the functions on live Executors and
> specifying a unique return value Topic need discussion. I would also like
> to chain Functions.
>
> I think Functions will need Profiles to include metadata for parallelism,
> memory, configuration, etc.
>
> Regards,
> Dave
>
> Sent from my iPhone
>
> > On Feb 20, 2018, at 4:05 PM, Sanjeev Kulkarni <sanjee...@gmail.com> wrote:
> >
> > https://github.com/apache/incubator-pulsar/wiki/PIP-15:-Pulsar-Functions
> >
> > -------
> >
> > * **Status**: Proposal
> > * **Author**: Sanjeev Kulkarni/Sijie Guo/Jerry Peng - Streamlio
> > * **Pull Request**: See Below
> > * **Mailing List discussion**:
> >
> > Motivation
> >
> > There has been a renewed interest from users in lightweight computing
> > frameworks. By lightweight, they typically mean:
> >
> >  1. They are not compute systems that need to be installed/run/monitored.
> >  Thus they are much more ops light. Some of them are offered as pure
> >  SaaS (like AWS Lambda) while others are integrated with message queues
> >  (like KStreams).
> >  2. Their interface should be as simple as it gets. Typically it takes
> >  the form of a function/subroutine, which is the basic compute block in
> >  most programming languages. The API must also be multi-language capable.
> >  3. The deployment models should be flexible. Users should be able to run
> >  these functions using their favorite management tools, or they can run
> >  them with the brokers.
> >
> > The aim of all of these would be to dramatically increase the pace of
> > experimentation/dev productivity. They also fit in the event driven
> > architecture that most companies are moving towards where data is
> > constantly arriving. The aim is for users to run simple functions against
> > arriving data and not really worry about mastering the complicated
> > API/semantics as well as managing/monitoring a complex compute infra.
> >
> > A message queue like Pulsar sits at the heart of any event driven
> > architecture. Data coming in from all sources typically lands in the
> > message bus first. Thus if Pulsar (or a Pulsar extension) has this feature
> > of being able to register/run simple user functions, it could go a long
> > way toward driving Pulsar adoption. Users could just deploy Pulsar and
> > instantly have a very flexible way of doing basic computation.
> >
> > This document outlines the goals/design of what we want in such a system
> > and how they can be built into Pulsar.
> >
> > Goals
> >
> >  1. Simplest possible programmability: This is the overarching goal.
> >  Anyone with the ability to write a function in a supported language
> >  should be able to get productive in a matter of minutes.
> >  2. Multi Language Capability: We should provide the API in at least the
> >  most popular languages, Java/Scala/Python/Go/JavaScript.
> >  3. Flexible runtime deployment: Users should be able to run these
> >  functions as a simple process using their favorite management tools.
> >  They should also be able to submit their functions to be run in a
> >  Pulsar cluster.
> >  4. Built in State Management: Computations should be allowed to keep
> >  state across computations. The system should take care of persisting
> >  this state in a robust manner. Basic functionality like
> >  incrBy/get/put/update is a must. This dramatically simplifies the
> >  architecture for the developer.
> >  5. Queryable State: The state written by a function should be queryable
> >  using standard REST APIs.
> >  6. Automatic Load Balancing: The managed runtime should take care of
> >  assigning workers to the functions.
> >  7. Scale Up/Down: Users should be able to scale up/down the number of
> >  function instances in the managed runtime.
> >  8. Flexible Invocation: Thread based, process based and docker based
> >  invocation should be supported for running each function.
> >  9. Metrics: Basic metrics like events processed per second, failures,
> >  latency etc. should be made available on a per function basis. Users
> >  should also be able to publish their own metrics.
> >  10. REST interface: Function control should use a REST protocol to
> >  have the widest adoption.
> >  11. Library/CLI: Simple libraries in all supported languages should
> >  exist. It should also come with a basic CLI for register/list/query/stats
> >  and other admin activities.
> >
> > More details on the PIP page.
> > Thanks!
>
>
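
For readers of the quoted proposal, goal 4 (incrBy/get/put on built-in state) could be sketched roughly as below. This is an illustration under stated assumptions only: `InMemoryState`, its method names, and the `count_words` function signature are hypothetical and are not the interface defined by the PIP. A real runtime would persist the state rather than hold it in a dict.

```python
class InMemoryState:
    """Hypothetical stand-in for the managed state store; keeps everything in a dict."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def put(self, key, value):
        self._data[key] = value

    def incr_by(self, key, amount=1):
        # Missing keys start at 0, so counters need no explicit initialization.
        self._data[key] = self._data.get(key, 0) + amount
        return self._data[key]


def count_words(message, state):
    """A user function: one message in, per-word counters updated in state."""
    for word in message.split():
        state.incr_by(word)
    state.put("last_message", message)


state = InMemoryState()
for incoming in ["to be or not to be", "be quick"]:
    count_words(incoming, state)
```

The user function itself holds no state here; everything that must survive between messages goes through the state object, which is the part the system would be responsible for persisting.
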
