Re: [DISCUSS] PIP 15: Pulsar Functions

Sanjeev Kulkarni Wed, 21 Feb 2018 13:48:43 -0800

Thread-based and process-based runtimes already exist in code. The
Docker-based runtime is still to be done. We plan to release a preview
version in a couple of weeks and evolve it from there based on community
feedback.
Hope that helps!
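Dave's question about the Runtime API for the three runtime types can be illustrated with a minimal Java sketch. The `FunctionRuntime` interface and the `ThreadRuntime`/`ProcessRuntime` classes are hypothetical names, not the actual Pulsar Functions code. A Docker runtime could plausibly implement the same interface by launching a container, but it is omitted here because it is not built yet.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

// Hypothetical runtime abstraction (not the real Pulsar Functions API).
interface FunctionRuntime {
    void start();
    void stop();
    boolean isAlive();
}

// Thread runtime: the instance runs inside the worker's JVM, so Java only.
final class ThreadRuntime implements FunctionRuntime {
    private final Thread thread;

    ThreadRuntime(Runnable instance, String name) {
        this.thread = new Thread(instance, name);
    }

    @Override public void start() { thread.start(); }

    @Override public void stop() {
        thread.interrupt(); // asks a well-behaved instance to exit
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    @Override public boolean isAlive() { return thread.isAlive(); }
}

// Process runtime: the instance runs as a separate OS process (any language).
final class ProcessRuntime implements FunctionRuntime {
    private final List<String> command;
    private Process process;

    ProcessRuntime(List<String> command) { this.command = command; }

    @Override public void start() {
        try {
            process = new ProcessBuilder(command).inheritIO().start();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override public void stop() {
        if (process == null) return;
        process.destroy();
        try {
            process.waitFor();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    @Override public boolean isAlive() { return process != null && process.isAlive(); }
}
```

Putting all runtimes behind one interface would let a worker manage instances uniformly, while the thread runtime trades isolation for lower overhead.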


On Wed, Feb 21, 2018 at 1:12 PM Dave Fisher <dave2w...@comcast.net> wrote:

> Hi Sanjeev -
>
> I have read the PIP more carefully on my computer (rather than on my iPhone).
>
>
>    1. Process Runtime, in which each instance is run as a process.
>    2. Docker Runtime, in which each instance is run as a Docker container.
>    3. Threaded Runtime, in which each instance is run as a thread. This
>    type is applicable only to Java instances, since the Pulsar Functions
>    framework itself is written in Java.
>
> I’m interested in knowing a bit more about the Runtime API for these three
> types.
>
> How much of the PIP exists in code?
>
> Best Regards,
> Dave
>
>
> On Feb 20, 2018, at 7:33 PM, Sanjeev Kulkarni <sanjee...@gmail.com> wrote:
>
> Hi Dave,
> Chaining functions is certainly on the roadmap. The PIP document briefly
> talks about at least two ways of doing it, but it probably requires a
> separate PIP at a later stage.
> With respect to parallelism: for functions managed by the Pulsar cluster,
> parallelism can be specified at submission time. For functions that are run
> as a simple process, parallelism has to be managed by the user.
> With respect to CPU/memory and other configuration, the aim of the built-in
> Pulsar cluster runtime is to keep things simple by only distributing
> functions across multiple workers. The aim is not to replicate features that
> are already present in full-fledged schedulers like Mesos/YARN/Kubernetes. If
> one needs memory/CPU bounds for a function, the ideal way to get them is to
> run it on one of these schedulers. We could give users an easier path to
> running these functions on such schedulers by providing launch templates.
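The thread does not specify how the built-in runtime distributes instances, so the following is only a sketch assuming simple round-robin placement: a function submitted with a given parallelism gets that many instances, spread over the available workers, and successive submissions continue from the next worker. `RoundRobinScheduler` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical round-robin placement of function instances onto workers.
// Deliberately ignores CPU/memory: resource bounds are left to external schedulers.
final class RoundRobinScheduler {
    private final List<String> workers;
    private int next = 0; // index of the worker that receives the next instance

    RoundRobinScheduler(List<String> workers) {
        if (workers.isEmpty()) throw new IllegalArgumentException("no workers available");
        this.workers = List.copyOf(workers);
    }

    // Returns the worker chosen for each instance 0..parallelism-1.
    List<String> assign(int parallelism) {
        if (parallelism < 1) throw new IllegalArgumentException("parallelism must be >= 1");
        List<String> placement = new ArrayList<>(parallelism);
        for (int i = 0; i < parallelism; i++) {
            placement.add(workers.get(next));
            next = (next + 1) % workers.size();
        }
        return placement;
    }
}
```

With workers `w1, w2, w3`, a function with parallelism 4 lands on `w1, w2, w3, w1`, and the next function starts at `w2`.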
> Hope that helps.
>
> On Tue, Feb 20, 2018 at 6:08 PM, Dave Fisher <dave2w...@comcast.net> wrote:
>
> Hi -
>
> This is very interesting. I’ve been thinking about using Heron for this
> functionality.
>
> An Admin API for configuring the functions on live Executors and for
> specifying a unique return-value Topic needs discussion. I would also like
> to chain Functions.
>
> I think Functions will need Profiles to include metadata for parallelism,
> memory, configuration, etc.
>
> Regards,
> Dave
>
> Sent from my iPhone
>
> On Feb 20, 2018, at 4:05 PM, Sanjeev Kulkarni <sanjee...@gmail.com> wrote:
>
>
> https://github.com/apache/incubator-pulsar/wiki/PIP-15:-Pulsar-Functions
>
> -------
>
> * **Status**: Proposal
> * **Author**: Sanjeev Kulkarni/Sijie Guo/Jerry Peng - Streamlio
> * **Pull Request**: See Below
> * **Mailing List discussion**:
>
> Motivation
>
> There has been renewed interest from users in lightweight computing
> frameworks. Typically, what they mean by lightweight is:
>
> 1. They are not compute systems that need to be installed/run/monitored
> separately, so they are much lighter on operations. Some of them are
> offered as pure SaaS (like AWS Lambda), while others are integrated with
> message queues (like KStreams).
> 2. Their interface should be as simple as it gets. Typically it takes the
> form of a function/subroutine, the basic compute block in most programming
> languages. The API must also be multi-language capable.
> 3. The deployment models should be flexible. Users should be able to run
> these functions using their favorite management tools, or run them
> alongside the brokers.
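To illustrate item 2, a user's logic can be nothing more than a plain Java function. This sketch uses the standard `java.util.function.Function` interface; the `LocalRunner` harness, the in-memory lists standing in for input/output topics, and the "null means emit nothing" convention are hypothetical and use no Pulsar client API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical local harness: feeds each message of a simulated input topic
// to the user function and collects results for a simulated output topic.
final class LocalRunner {
    private LocalRunner() {}

    static <I, O> List<O> run(Function<I, O> userFunction, List<I> inputTopic) {
        List<O> outputTopic = new ArrayList<>();
        for (I message : inputTopic) {
            O result = userFunction.apply(message);
            if (result != null) { // assumed convention: null means "emit nothing"
                outputTopic.add(result);
            }
        }
        return outputTopic;
    }
}

// User code: the whole "application" is a single function.
final class ExclamationFunction implements Function<String, String> {
    @Override public String apply(String input) {
        return input + "!";
    }
}
```

Running `LocalRunner.run(new ExclamationFunction(), List.of("hello", "pulsar"))` yields `["hello!", "pulsar!"]`; the user never touches consumers, producers, or acknowledgements.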
>
> The aim of all of these is to dramatically increase the pace of
> experimentation and developer productivity. They also fit the event-driven
> architecture that most companies are moving towards, in which data arrives
> constantly. The aim is for users to run simple functions against arriving
> data without having to master a complicated API and semantics, or manage and
> monitor a complex compute infrastructure.
>
> A message queue like Pulsar sits at the heart of any event-driven
> architecture. Data coming in from all sources typically lands in the
> message bus first. Thus, if Pulsar (or a Pulsar extension) can register and
> run simple user functions, it could go a long way toward driving Pulsar
> adoption. Users could simply deploy Pulsar and instantly have a very
> flexible way of doing basic computation.
>
> This document outlines the goals and design of such a system and how it
> can be built into Pulsar.
>
> Goals
>
> 1. Simplest possible programmability: This is the overarching goal.
> Anyone with the ability to write a function in a supported language
> should be able to become productive in a matter of minutes.
> 2. Multi-Language Capability: We should provide the API in at least the
> most popular languages: Java, Scala, Python, Go, and JavaScript.
> 3. Flexible Runtime Deployment: Users should be able to run these
> functions as simple processes using their favorite management tools.
> They should also be able to submit their functions to be run in a Pulsar
> cluster.
> 4. Built-in State Management: Computations should be able to keep state
> across invocations. The system should take care of persisting this state
> in a robust manner. Basic incrBy/get/put/update functionality is a must.
> This dramatically simplifies the architecture for the developer.
> 5. Queryable State: The state written by a function should be queryable
> using standard REST APIs.
> 6. Automatic Load Balancing: The managed runtime should take care of
> assigning workers to the functions.
> 7. Scale Up/Down: Users should be able to scale the number of function
> instances in the managed runtime up or down.
> 8. Flexible Invocation: Thread-based, process-based, and Docker-based
> invocation should be supported for running each function.
> 9. Metrics: Basic metrics like events processed per second, failures,
> latency, etc. should be made available on a per-function basis. Users
> should also be able to publish their own metrics.
> 10. REST Interface: Function control should use the REST protocol to
> achieve the widest adoption.
> 11. Library/CLI: Simple libraries should exist in all supported languages,
> along with a basic CLI for register/list/query/stats and other admin
> activities.
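Goal 4 names incrBy/get/put/update as the minimum state operations. The following in-memory sketch shows that contract only; `StateStore` is a hypothetical name, and unlike the goal's requirement it does not persist anything (a `ConcurrentHashMap` stands in for durable storage).

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

// Hypothetical, non-durable state store illustrating incrBy/get/put/update.
final class StateStore {
    private final Map<String, Long> counters = new ConcurrentHashMap<>();
    private final Map<String, String> values = new ConcurrentHashMap<>();

    // Atomically adds delta to the counter (starting from 0) and returns the new total.
    long incrBy(String key, long delta) {
        return counters.merge(key, delta, Long::sum);
    }

    long getCounter(String key) {
        return counters.getOrDefault(key, 0L);
    }

    void put(String key, String value) {
        values.put(key, value);
    }

    Optional<String> get(String key) {
        return Optional.ofNullable(values.get(key));
    }

    // Atomically replaces the value with fn(old); old is null if the key is absent.
    String update(String key, UnaryOperator<String> fn) {
        return values.compute(key, (k, old) -> fn.apply(old));
    }
}
```

A word-count function, for example, would call `incrBy(word, 1)` per word, with the framework (not the user) responsible for making that state durable and queryable.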
>
> More details on the PIP page.
> Thanks!
