Hi all,

Thank you to everyone in this email thread. It seems that people are interested in this feature. We'd like to contribute our initial work as part of Pulsar and continue developing this idea under the ASF.

I am going to send a pull request later this week. It is going to be a "giant" pull request; however, we have made sure that all the changes are made under a submodule called "pulsar-functions", so the pull request will not change anything else in the main Pulsar repo. Hopefully that makes it easier for the community to accept this feature :-)

I know the Pulsar repo only accepts squash merges now. Is there any way to accept this feature while keeping all of its commits? We would also like to know whether there is a better approach for merging this change :-)

Thanks,
Sijie

On Wed, Feb 21, 2018 at 1:48 PM, Sanjeev Kulkarni <sanjee...@gmail.com> wrote:

> Thread based and process based run times exist in code. Docker based
> runtime is still to be done. We plan to release a preview version in a
> couple of weeks. And based on community feedback evolve from there.
> Hope that helps!
>
> On Wed, Feb 21, 2018 at 1:12 PM Dave Fisher <dave2w...@comcast.net> wrote:
>
> > Hi Sanjeev -
> >
> > I have read the PIP more carefully on my computer (rather than iPhone).
> >
> > 1. Process Runtime in which each instance is run as a process.
> > 2. Docker Runtime in which each instance is run as a docker container
> > 3. Threaded Runtime in which each instance is run as a thread. This
> > type is applicable only to Java instance since Pulsar Functions
> > framework itself is written in Java.
> >
> > I’m interested in knowing a bit more about the Runtime API for these
> > three types.
> >
> > How much of the PIP exists in code?
> >
> > Best Regards,
> > Dave
> >
> > On Feb 20, 2018, at 7:33 PM, Sanjeev Kulkarni <sanjee...@gmail.com>
> > wrote:
> >
> > Hi Dave,
> > Chaining functions is certainly on the roadmap. The PIP document
> > briefly talks about at least two ways of doing it, but it probably
> > requires another PIP by itself at a later stage.
> > Wrt parallelism, for functions managed by Pulsar cluster, parallelism
> > can be provided at submission time. For functions that will be run as
> > a simple process, the parallelism should be managed by the user.
> > WRT cpu/memory and other configuration, the aim of the inbuilt Pulsar
> > cluster is to keep it simple by just doing some simple distribution
> > across multiple workers. The aim is not to replicate features that are
> > already present in full-fledged schedulers like Mesos/Yarn/K8. If one
> > needs memory/cpu bounds for a function, the ideal way to do that would
> > be to run them on one of these full-blown schedulers. We could provide
> > an easier path for users to run these functions onto these schedulers
> > by providing launch templates.
> > Hope that helps.
> >
> > On Tue, Feb 20, 2018 at 6:08 PM, Dave Fisher <dave2w...@comcast.net>
> > wrote:
> >
> > Hi -
> >
> > This is very interesting. I’ve been thinking about using Heron for
> > this functionality.
> >
> > An Admin API for configuring the functions on live Executors and
> > specifying a unique return value Topic need discussion. I would also
> > like to chain Functions.
> >
> > I think Functions will need Profiles to include metadata for
> > parallelism, memory, configuration, etc.
> >
> > Regards,
> > Dave
> >
> > Sent from my iPhone
> >
> > On Feb 20, 2018, at 4:05 PM, Sanjeev Kulkarni <sanjee...@gmail.com>
> > wrote:
> >
> > https://github.com/apache/incubator-pulsar/wiki/PIP-15:-Pulsar-Functions
> >
> > -------
> >
> > * **Status**: Proposal
> > * **Author**: Sanjeev Kulkarni/Sijie Guo/Jerry Peng - Streamlio
> > * **Pull Request**: See Below
> > * **Mailing List discussion**:
> >
> > Motivation
> >
> > There has been a renewed interest from users in lightweight computing
> > frameworks. Typically, what they mean by lightweight is:
> >
> > 1. They are not compute systems that need to be
> > installed/run/monitored. Thus they are much more ops light. Some of
> > them are offered as pure SaaS (like AWS Lambda) while others are
> > integrated with message queues (like KStreams).
> > 2. Their interface should be as simple as it gets. Typically it takes
> > the form of a function/subroutine that is the basic compute block in
> > most programming languages. And the API must be multi-language capable.
> > 3. The deployment models should be flexible. Users should be able to
> > run these functions using their favorite management tools, or they can
> > run them with the brokers.
> >
> > The aim of all of these would be to dramatically increase the pace of
> > experimentation/dev productivity. They also fit in the event driven
> > architecture that most companies are moving towards where data is
> > constantly arriving. The aim is for users to run simple functions
> > against arriving data and not really worry about mastering the
> > complicated API/semantics as well as managing/monitoring a complex
> > compute infra.
> >
> > A message queue like Pulsar sits at the heart of any event driven
> > architecture. Data coming in from all sources typically lands in the
> > message bus first. Thus if Pulsar (or a Pulsar extension) has this
> > feature of being able to register/run simple user functions, it could
> > go a long way to drive Pulsar adoption. Users could just deploy Pulsar
> > and instantly have a very flexible way of doing basic computation.
> >
> > This document outlines the goals/design of what we want in such a
> > system and how they can be built into Pulsar.
> >
> > <https://github.com/apache/incubator-pulsar/wiki/PIP-15:-Pulsar-Functions#goals>
> >
> > Goals
> >
> > 1. Simplest possible programmability: This is the overarching goal.
> > Anyone with the ability to write a function in a supported language
> > should be able to get productive in a matter of minutes.
> > 2. Multi Language Capability:- We should provide the API in at least
> > the most popular languages, Java/Scala/Python/Go/JavaScript.
> > 3. Flexible runtime deployment:- User should be able to run these
> > functions as a simple process using their favorite management tools.
> > They should also be able to submit their functions to be run in a
> > Pulsar cluster.
> > 4. Built in State Management:- Computations should be allowed to keep
> > state across computations. The system should take care of persisting
> > this state in a robust manner. Basic things like
> > incrBy/get/put/update functionality is a must. This dramatically
> > simplifies the architecture for the developer.
> > 5. Queryable State:- The state written by a function should be
> > queryable using standard REST APIs.
> > 6. Automatic Load Balancing:- The Managed runtime should take care of
> > assigning workers to the functions.
> > 7. Scale Up/Down:- Users should be able to scale up/down the number of
> > function instances in the managed runtime.
> > 8. Flexible Invocation:- Thread based, process based and docker based
> > invocation should be supported for running each function.
> > 9. Metrics:- Basic metrics like events processed per second, failures,
> > latency etc should be made available on a per function basis. Users
> > should also be able to publish their own metrics.
> > 10. REST interface:- Function control should be using REST protocol to
> > have the widest adoption.
> > 11. Library/CLI:- Simple Libraries in all supported languages should
> > exist. Also should come with basic CLI to register/list/query/stats
> > and other admin activities.
> >
> > More details on the PIP page.
> > Thanks!
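To make goals 1 and 4 of the quoted PIP a little more concrete, here is a minimal sketch of a user function that keeps a running count across messages. It is illustrative only and is not the Pulsar Functions API: `InMemoryState`, `incr_by`, `get`, and `count_words` are hypothetical names, and a plain in-process dictionary stands in for the persistent state store that the PIP says the system would manage.

```python
from collections import defaultdict
from typing import Dict


class InMemoryState:
    """Hypothetical stand-in for the managed state store (not a Pulsar API)."""

    def __init__(self) -> None:
        self._counters: Dict[str, int] = defaultdict(int)

    def incr_by(self, key: str, amount: int) -> int:
        # Add `amount` to the counter for `key` and return the new value.
        self._counters[key] += amount
        return self._counters[key]

    def get(self, key: str) -> int:
        # Return the current value, or 0 if the key was never written.
        return self._counters.get(key, 0)


def count_words(message: str, state: InMemoryState) -> None:
    """User function: called once per arriving message."""
    for word in message.split():
        state.incr_by(word, 1)


# Simulate two messages arriving on an input topic.
state = InMemoryState()
for msg in ["hello pulsar", "hello functions"]:
    count_words(msg, state)
```

The user code is just `count_words`. In the model the PIP describes, the runtime would be responsible for delivering messages and for persisting the state, neither of which this sketch attempts.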