Hi all,

We've sent out a PR to contribute Pulsar Functions:
https://github.com/apache/incubator-pulsar/pull/1314

All the changes are made within a sub-module called `pulsar-functions`.

We look forward to your comments and reviews.

- Sijie

On Tue, Feb 27, 2018 at 2:43 PM, Sijie Guo <guosi...@gmail.com> wrote:

> Hi all,
>
> Thank you to everyone in this email thread. It seems that people are
> interested in this feature. We'd like to contribute our initial work as
> part of pulsar and continue the development of this idea under the ASF.
>
> I am going to send a pull request later this week. The pull request is
> going to be a "giant" pull request; however, we have made sure all the
> changes are made under a submodule called "pulsar-functions", so the pull
> request will not contain any changes to the main pulsar repo. Hopefully
> this will make it easier for the community to accept this feature :-)
>
> I know the Pulsar repo only accepts squash merges now. I am wondering
> whether there is any way to accept this feature while keeping all of its
> commits.
>
> We would also like to know whether there are any better approaches for
> merging this change :-)
>
> Thanks,
> Sijie
>
> On Wed, Feb 21, 2018 at 1:48 PM, Sanjeev Kulkarni <sanjee...@gmail.com>
> wrote:
>
>> Thread-based and process-based runtimes exist in code. The Docker-based
>> runtime is still to be done. We plan to release a preview version in a
>> couple of weeks and, based on community feedback, evolve from there.
>> Hope that helps!
>>
>> On Wed, Feb 21, 2018 at 1:12 PM Dave Fisher <dave2w...@comcast.net>
>> wrote:
>>
>> > Hi Sanjeev -
>> >
>> > I have read the PIP more carefully on my computer (rather than my iPhone).
>> >
>> >
>> >    1. Process Runtime, in which each instance is run as a process.
>> >    2. Docker Runtime, in which each instance is run as a Docker container.
>> >    3. Threaded Runtime, in which each instance is run as a thread. This
>> >    type is applicable only to Java instances, since the Pulsar Functions
>> >    framework itself is written in Java.
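The three runtime types listed above differ mainly in where an instance lives, so they could share one small start/stop interface. Below is a minimal Python sketch under that assumption; `Runtime`, `ThreadRuntime`, and `ProcessRuntime` are hypothetical names for illustration, not the Pulsar Functions Runtime API being asked about. The host here is Python, so the threaded runtime runs a Python function (in the PIP it applies only to Java functions), and a Docker runtime, which would wrap a container launch in the same way, is not shown.

```python
import abc
import queue
import subprocess
import sys
import threading
from typing import Callable, List, Optional, Sequence


class Runtime(abc.ABC):
    """Hypothetical interface: each Runtime hosts exactly one function instance."""

    @abc.abstractmethod
    def start(self) -> None: ...

    @abc.abstractmethod
    def stop(self) -> None: ...


class ThreadRuntime(Runtime):
    """Runs one instance as a thread inside the worker's own process."""

    def __init__(self, fn: Callable[[str], str], inbox: "queue.Queue[Optional[str]]", outbox: List[str]) -> None:
        self._fn = fn
        self._inbox = inbox
        self._outbox = outbox
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def _loop(self) -> None:
        while True:
            message = self._inbox.get()
            if message is None:  # sentinel value asks the instance to exit
                return
            self._outbox.append(self._fn(message))

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        # Messages queued before the sentinel are still processed in FIFO order.
        self._inbox.put(None)
        self._thread.join()


class ProcessRuntime(Runtime):
    """Runs one instance as a separate OS process launched from argv."""

    def __init__(self, argv: Sequence[str]) -> None:
        self._argv = list(argv)
        self.proc: Optional[subprocess.Popen] = None

    def start(self) -> None:
        self.proc = subprocess.Popen(self._argv)

    def stop(self) -> None:
        if self.proc is not None:
            self.proc.terminate()  # ask the instance process to exit
            self.proc.wait()       # reap it so no zombie is left behind


if __name__ == "__main__":
    inbox: "queue.Queue[Optional[str]]" = queue.Queue()
    results: List[str] = []
    rt = ThreadRuntime(lambda s: s + "!", inbox, results)
    rt.start()
    for m in ["hello", "pulsar"]:
        inbox.put(m)
    rt.stop()
    print(results)  # ['hello!', 'pulsar!']
```

A worker could then treat all instances uniformly, calling `start()` and `stop()` without caring whether the instance is a thread, a process, or (in a full implementation) a container.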
>> >
>> > I’m interested in knowing a bit more about the Runtime API for these
>> > three types.
>> >
>> > How much of the PIP exists in code?
>> >
>> > Best Regards,
>> > Dave
>> >
>> >
>> > On Feb 20, 2018, at 7:33 PM, Sanjeev Kulkarni <sanjee...@gmail.com>
>> > wrote:
>> >
>> > Hi Dave,
>> > Chaining functions is certainly on the roadmap. The PIP document briefly
>> > talks about at least two ways of doing it, but it probably requires
>> > another PIP by itself at a later stage.
>> > With respect to parallelism, for functions managed by the Pulsar cluster,
>> > parallelism can be provided at submission time. For functions that are
>> > run as a simple process, the parallelism should be managed by the user.
>> > With respect to CPU/memory and other configuration, the aim of the
>> > built-in Pulsar cluster is to keep it simple by just doing some simple
>> > distribution across multiple workers. The aim is not to replicate
>> > features that are already present in full-fledged schedulers like
>> > Mesos/YARN/Kubernetes. If one needs memory/CPU bounds for a function, the
>> > ideal way to do that would be to run it on one of these full-blown
>> > schedulers. We could provide an easier path for users to run these
>> > functions on these schedulers by providing launch templates.
>> > Hope that helps.
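The reply above describes parallelism fixed at submission time plus "some simple distribution across multiple workers". A round-robin placement is one plausible reading of that, sketched below in Python. `assign_instances` and the `<function>-<index>` instance naming are hypothetical; the thread does not say which placement strategy the built-in cluster actually uses.

```python
from typing import Dict, List


def assign_instances(function: str, parallelism: int, workers: List[str]) -> Dict[str, List[str]]:
    """Spread `parallelism` instances of `function` over `workers` round-robin.

    No CPU/memory accounting is done on purpose: per the thread, resource
    bounds are left to external schedulers such as Mesos, YARN or Kubernetes.
    """
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    if not workers:
        raise ValueError("at least one worker is required")
    assignment: Dict[str, List[str]] = {w: [] for w in workers}
    for index in range(parallelism):
        worker = workers[index % len(workers)]
        assignment[worker].append(f"{function}-{index}")
    return assignment


print(assign_instances("wordcount", 5, ["worker-1", "worker-2"]))
# {'worker-1': ['wordcount-0', 'wordcount-2', 'wordcount-4'], 'worker-2': ['wordcount-1', 'wordcount-3']}
```

Under this sketch, scaling a function up or down amounts to recomputing the assignment with a new `parallelism` value.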
>> >
>> > On Tue, Feb 20, 2018 at 6:08 PM, Dave Fisher <dave2w...@comcast.net>
>> > wrote:
>> >
>> > Hi -
>> >
>> > This is very interesting. I’ve been thinking about using Heron for this
>> > functionality.
>> >
>> > An Admin API for configuring the functions on live Executors and for
>> > specifying a unique return-value Topic needs discussion. I would also
>> > like to chain Functions.
>> >
>> > I think Functions will need Profiles that include metadata for
>> > parallelism, memory, configuration, etc.
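The suggestion above, that each function carry a profile with parallelism, memory and configuration metadata plus its own return-value topic, could be modelled as a small validated record. The Python sketch below is hypothetical: `FunctionProfile` and its field names are illustrative and not part of any Pulsar API. Memory is optional, in line with the reply earlier in this thread that resource bounds are better left to external schedulers.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class FunctionProfile:
    """Hypothetical per-function metadata, as suggested in this thread."""

    name: str
    input_topic: str
    output_topic: str  # the function's own return-value topic
    parallelism: int = 1
    memory_mb: Optional[int] = None  # None: leave resource bounds to the scheduler
    config: Dict[str, str] = field(default_factory=dict)  # free-form user settings

    def __post_init__(self) -> None:
        # Reject profiles that no runtime could honour.
        if self.parallelism < 1:
            raise ValueError("parallelism must be at least 1")
        if self.memory_mb is not None and self.memory_mb <= 0:
            raise ValueError("memory_mb must be positive when set")


profile = FunctionProfile("wordcount", "sentences", "word-counts", parallelism=3)
print(profile.parallelism, profile.memory_mb)  # 3 None
```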
>> >
>> > Regards,
>> > Dave
>> >
>> > On Feb 20, 2018, at 4:05 PM, Sanjeev Kulkarni <sanjee...@gmail.com>
>> > wrote:
>> >
>> > https://github.com/apache/incubator-pulsar/wiki/PIP-15:-Pulsar-Functions
>> >
>> > -------
>> >
>> > * **Status**: Proposal
>> > * **Author**: Sanjeev Kulkarni/Sijie Guo/Jerry Peng - Streamlio
>> > * **Pull Request**: See Below
>> > * **Mailing List discussion**:
>> >
>> > Motivation
>> >
>> > There has been a renewed interest from users in lightweight computing
>> > frameworks. What they typically mean by lightweight is:
>> >
>> > 1. They are not compute systems that need to be installed/run/monitored.
>> > Thus they are much more ops-light. Some of them are offered as pure
>> > SaaS (like AWS Lambda), while others are integrated with message queues
>> > (like KStreams).
>> > 2. Their interface should be as simple as it gets. Typically it takes
>> > the form of a function/subroutine, which is the basic compute block in
>> > most programming languages. And the API must be multi-language capable.
>> > 3. The deployment models should be flexible. Users should be able to run
>> > these functions using their favorite management tools, or they can run
>> > them with the brokers.
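Point 2 above (the interface is just a function) can be made concrete with a short Python sketch. The user code is an ordinary function with no framework types. `run_function` is a hypothetical stand-in for whatever feeds messages from an input topic to the function and forwards its results; it is not part of any Pulsar API.

```python
from typing import Callable, Iterable, Iterator


def exclamation(message: str) -> str:
    """The user's entire program: an ordinary function from input to output."""
    return message + "!"


def run_function(fn: Callable[[str], str], messages: Iterable[str]) -> Iterator[str]:
    """Hypothetical driver: call `fn` once per incoming message, yield each result."""
    for message in messages:
        yield fn(message)


print(list(run_function(exclamation, ["hello", "pulsar"])))  # ['hello!', 'pulsar!']
```

Keeping the user-facing contract this small is what makes it easy to offer in several languages: each language only needs a driver, not a new programming model.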
>> >
>> > The aim of all of these would be to dramatically increase the pace of
>> > experimentation and developer productivity. They also fit into the
>> > event-driven architecture that most companies are moving towards, where
>> > data is constantly arriving. The aim is for users to run simple functions
>> > against arriving data without having to master complicated
>> > APIs/semantics or manage/monitor complex compute infrastructure.
>> >
>> > A message queue like Pulsar sits at the heart of any event-driven
>> > architecture. Data coming in from all sources typically lands in the
>> > message bus first. Thus if Pulsar (or a Pulsar extension) had this
>> > feature of being able to register/run simple user functions, it could go
>> > a long way toward driving Pulsar adoption. Users could just deploy Pulsar
>> > and instantly have a very flexible way of doing basic computation.
>> >
>> > This document outlines the goals/design of what we want in such a system
>> > and how they can be built into Pulsar.
>> >
>> > Goals
>> >
>> > 1. Simplest possible programmability: This is the overarching goal.
>> > Anyone with the ability to write a function in a supported language
>> > should be able to get productive in a matter of minutes.
>> > 2. Multi-Language Capability: We should provide the API in at least the
>> > most popular languages: Java/Scala/Python/Go/JavaScript.
>> > 3. Flexible runtime deployment: Users should be able to run these
>> > functions as a simple process using their favorite management tools.
>> > They should also be able to submit their functions to be run in a Pulsar
>> > cluster.
>> > 4. Built-in State Management: Computations should be allowed to keep
>> > state across invocations. The system should take care of persisting this
>> > state in a robust manner. Basic incrBy/get/put/update functionality is a
>> > must. This dramatically simplifies the architecture for the developer.
>> > 5. Queryable State: The state written by a function should be queryable
>> > using standard REST APIs.
>> > 6. Automatic Load Balancing: The managed runtime should take care of
>> > assigning workers to the functions.
>> > 7. Scale Up/Down: Users should be able to scale up/down the number of
>> > function instances in the managed runtime.
>> > 8. Flexible Invocation: Thread-based, process-based and Docker-based
>> > invocation should be supported for running each function.
>> > 9. Metrics: Basic metrics like events processed per second, failures,
>> > latency, etc. should be made available on a per-function basis. Users
>> > should also be able to publish their own metrics.
>> > 10. REST interface: Function control should use a REST protocol to
>> > achieve the widest adoption.
>> > 11. Library/CLI: Simple libraries should exist in all supported
>> > languages, along with a basic CLI to register/list/query/stats and other
>> > admin activities.
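Goal 4 lists incrBy/get/put/update as the minimum state operations. The Python sketch below shows what such an interface might look like from inside a function. `StateStore`, its snake_case method names, and the word-count example are hypothetical; the store is in-memory only, whereas the PIP requires the system to persist state robustly.

```python
import threading
from typing import Callable, Dict, Optional


class StateStore:
    """Hypothetical per-function key/value state. In-memory only: no persistence."""

    def __init__(self) -> None:
        self._data: Dict[str, int] = {}
        self._lock = threading.Lock()  # guards _data if several threads share one store

    def get(self, key: str) -> Optional[int]:
        with self._lock:
            return self._data.get(key)

    def put(self, key: str, value: int) -> None:
        with self._lock:
            self._data[key] = value

    def incr_by(self, key: str, amount: int) -> int:
        """Add `amount` to the counter at `key` (missing keys start at 0)."""
        with self._lock:
            self._data[key] = self._data.get(key, 0) + amount
            return self._data[key]

    def update(self, key: str, fn: Callable[[Optional[int]], int]) -> int:
        """Atomically replace the value at `key` with fn(old value, or None)."""
        with self._lock:
            self._data[key] = fn(self._data.get(key))
            return self._data[key]


def word_count(sentence: str, state: StateStore) -> None:
    """A stateful function: counts words across all the messages it sees."""
    for word in sentence.split():
        state.incr_by(word, 1)


store = StateStore()
word_count("to be or not to be", store)
word_count("to see", store)
print(store.get("to"), store.get("see"), store.get("missing"))  # 3 1 None
```

Because the counts live in the store rather than in the function, the function itself stays a plain per-message callable, which is the simplification goal 4 is after.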
>> >
>> > More details on the PIP page.
>> > Thanks!