Hi all, We've sent out a PR for contributing the pulsar-functions: https://github.com/apache/incubator-pulsar/pull/1314
All the changes are made within a sub-module called `pulsar-functions`.

Looking forward to comments and reviews.

- Sijie

On Tue, Feb 27, 2018 at 2:43 PM, Sijie Guo <guosi...@gmail.com> wrote:

> Hi all,
>
> Thank you to everyone in this email thread. It seems that people are
> interested in this feature. We'd like to contribute our initial work as
> part of Pulsar and continue the development of this idea under the ASF.
>
> I am going to send a pull request this week. The pull request is going
> to be a "giant" pull request; however, we have made sure all the changes
> are made under a submodule called "pulsar-functions", so that the pull
> request will not contain any changes to the main pulsar repo. Hopefully
> that will make it easier for the community to accept this feature :-)
>
> I know the pulsar repo only accepts squash merges now. I am wondering
> whether there is any way to accept this feature while keeping all of its
> commits?
>
> We would also like to know whether there are any better approaches for
> merging this change :-)
>
> Thanks,
> Sijie
>
> On Wed, Feb 21, 2018 at 1:48 PM, Sanjeev Kulkarni <sanjee...@gmail.com>
> wrote:
>
>> Thread based and process based runtimes exist in code. The Docker based
>> runtime is still to be done. We plan to release a preview version in a
>> couple of weeks and evolve from there based on community feedback.
>> Hope that helps!
>>
>> On Wed, Feb 21, 2018 at 1:12 PM Dave Fisher <dave2w...@comcast.net>
>> wrote:
>>
>> > Hi Sanjeev -
>> >
>> > I have read the PIP more carefully on my computer (rather than iPhone).
>> >
>> > 1. Process Runtime in which each instance is run as a process.
>> > 2. Docker Runtime in which each instance is run as a docker container
>> > 3. Threaded Runtime in which each instance is run as a thread. This
>> > type is applicable only to Java instance since Pulsar Functions
>> > framework itself is written in Java.
>> >
>> > I’m interested in knowing a bit more about the Runtime API for these
>> > three types.
>> >
>> > How much of the PIP exists in code?
>> >
>> > Best Regards,
>> > Dave
>> >
>> > On Feb 20, 2018, at 7:33 PM, Sanjeev Kulkarni <sanjee...@gmail.com>
>> > wrote:
>> >
>> > Hi Dave,
>> > Chaining functions is certainly on the roadmap. The PIP document briefly
>> > talks about at least two ways of doing it, but it probably requires
>> > another PIP by itself at a later stage.
>> > Wrt parallelism, for functions managed by the Pulsar cluster,
>> > parallelism can be provided at submission time. For functions that will
>> > be run as a simple process, the parallelism should be managed by the
>> > user.
>> > Wrt cpu/memory and other configuration, the aim of the inbuilt Pulsar
>> > cluster is to keep it simple by just doing some simple distribution
>> > across multiple workers. The aim is not to replicate features that are
>> > already present in full-fledged schedulers like Mesos/YARN/K8s. If one
>> > needs memory/cpu bounds for a function, the ideal way to do that would
>> > be to run it on one of these full-blown schedulers. We could provide an
>> > easier path for users to run these functions on these schedulers by
>> > providing launch templates.
>> > Hope that helps.
>> >
>> > On Tue, Feb 20, 2018 at 6:08 PM, Dave Fisher <dave2w...@comcast.net>
>> > wrote:
>> >
>> > Hi -
>> >
>> > This is very interesting. I’ve been thinking about using Heron for this
>> > functionality.
>> >
>> > An Admin API for configuring the functions on live Executors and
>> > specifying a unique return value Topic need discussion. I would also
>> > like to chain Functions.
>> >
>> > I think Functions will need Profiles to include metadata for
>> > parallelism, memory, configuration, etc.
>> >
>> > Regards,
>> > Dave
>> >
>> > Sent from my iPhone
>> >
>> > On Feb 20, 2018, at 4:05 PM, Sanjeev Kulkarni <sanjee...@gmail.com>
>> > wrote:
>> >
>> > https://github.com/apache/incubator-pulsar/wiki/PIP-15:-Pulsar-Functions
>> >
>> > -------
>> >
>> > * **Status**: Proposal
>> > * **Author**: Sanjeev Kulkarni/Sijie Guo/Jerry Peng - Streamlio
>> > * **Pull Request**: See Below
>> > * **Mailing List discussion**:
>> >
>> > Motivation
>> >
>> > There has been a renewed interest from users in lightweight computing
>> > frameworks. What they typically mean by lightweight is:
>> >
>> > 1. They are not compute systems that need to be installed/run/monitored.
>> > Thus they are much more ops light. Some of them are offered as pure
>> > SaaS (like AWS Lambda) while others are integrated with message queues
>> > (like KStreams).
>> > 2. Their interface should be as simple as it gets. Typically it takes
>> > the form of a function/subroutine that is the basic compute block in
>> > most programming languages. And the API must be multi-language capable.
>> > 3. The deployment models should be flexible. Users should be able to run
>> > these functions using their favorite management tools, or they can run
>> > them with the brokers.
>> >
>> > The aim of all of these would be to dramatically increase the pace of
>> > experimentation/dev productivity. They also fit in the event driven
>> > architecture that most companies are moving towards, where data is
>> > constantly arriving. The aim is for users to run simple functions
>> > against arriving data and not really worry about mastering a complicated
>> > API/semantics or managing/monitoring a complex compute infra.
>> >
>> > A message queue like Pulsar sits at the heart of any event driven
>> > architecture. Data coming in from all sources typically lands in the
>> > message bus first. Thus if Pulsar (or a Pulsar extension) has this
>> > feature of being able to register/run simple user functions, it could
>> > go a long way toward driving Pulsar adoption. Users could just deploy
>> > Pulsar and instantly have a very flexible way of doing basic
>> > computation.
>> >
>> > This document outlines the goals/design of what we want in such a system
>> > and how they can be built into Pulsar.
>> >
>> > <https://github.com/apache/incubator-pulsar/wiki/PIP-15:-Pulsar-Functions#goals>
>> >
>> > Goals
>> >
>> > 1. Simplest possible programmability: This is the overarching goal.
>> > Anyone with the ability to write a function in a supported language
>> > should be able to get productive in a matter of minutes.
>> > 2. Multi Language Capability: We should provide the API in at least the
>> > most popular languages, Java/Scala/Python/Go/JavaScript.
>> > 3. Flexible runtime deployment: Users should be able to run these
>> > functions as a simple process using their favorite management tools.
>> > They should also be able to submit their functions to be run in a
>> > Pulsar cluster.
>> > 4. Built in State Management: Computations should be allowed to keep
>> > state across computations. The system should take care of persisting
>> > this state in a robust manner. Basic functionality like
>> > incrBy/get/put/update is a must. This dramatically simplifies the
>> > architecture for the developer.
>> > 5. Queryable State: The state written by a function should be queryable
>> > using standard REST APIs.
>> > 6. Automatic Load Balancing: The managed runtime should take care of
>> > assigning workers to the functions.
>> > 7. Scale Up/Down: Users should be able to scale up/down the number of
>> > function instances in the managed runtime.
>> > 8. Flexible Invocation: Thread based, process based and docker based
>> > invocation should be supported for running each function.
>> > 9. Metrics: Basic metrics like events processed per second, failures,
>> > latency etc. should be made available on a per function basis. Users
>> > should also be able to publish their own metrics.
>> > 10. REST interface: Function control should use the REST protocol to
>> > have the widest adoption.
>> > 11. Library/CLI: Simple libraries in all supported languages should
>> > exist. They should also come with a basic CLI to
>> > register/list/query/stats and other admin activities.
>> >
>> > More details on the PIP page.
>> > Thanks!
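
For readers skimming the thread, goals 1 and 4 of the quoted PIP (a function as a plain subroutine, plus built-in state with incrBy/get style operations) can be sketched roughly as below. This is only an illustration under stated assumptions, not the Pulsar Functions API: `InMemoryState`, `incr_by`, `word_count` and `run` are hypothetical names, and the state lives in a plain dict rather than being persisted by the system.

```python
from typing import Callable, Dict, Iterable, List


class InMemoryState:
    """Hypothetical stand-in for managed state (goal 4).

    A real system would persist this robustly; here it is just a dict.
    """

    def __init__(self) -> None:
        self._counters: Dict[str, int] = {}

    def incr_by(self, key: str, amount: int) -> int:
        # Add `amount` to the counter for `key` and return the new value.
        self._counters[key] = self._counters.get(key, 0) + amount
        return self._counters[key]

    def get(self, key: str) -> int:
        # Unknown keys read as zero.
        return self._counters.get(key, 0)


def word_count(message: str, state: InMemoryState) -> str:
    """User code: a plain subroutine invoked once per message (goal 1)."""
    words = message.split()
    for word in words:
        state.incr_by(word, 1)
    return f"{len(words)} words"


def run(
    function: Callable[[str, InMemoryState], str],
    messages: Iterable[str],
    state: InMemoryState,
) -> List[str]:
    """Hypothetical driver: apply the function to each input message.

    A real runtime would read from an input topic and publish each
    result to an output topic instead of returning a list.
    """
    return [function(message, state) for message in messages]


state = InMemoryState()
outputs = run(word_count, ["hello pulsar", "hello functions"], state)
print(outputs)
print(state.get("hello"))
```

The point of the sketch is that the user writes only `word_count`; everything else (message delivery, state storage, scaling) is what the PIP proposes the framework should own.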