Re: [DISCUSS] Retire pulsar-all Docker image and spin-off Python Functions runtime

2024-03-07 Thread Neng Lu
+1

This can reduce the image size significantly and thus improve the
efficiency and reduce the cost.

On Tue, Mar 5, 2024 at 11:25 PM Enrico Olivelli  wrote:

> +1
>
> Great idea
>
> Enrico
>
> On Wed, Mar 6, 2024, 08:23 Zixuan Liu wrote:
>
> > +1
> >
> > This is a good idea. We must then provide documentation on building your
> > own connector image and Python functions runtime image.
> >
> > Thanks,
> > Zixuan
> >
> > On Wed, Mar 6, 2024 at 07:04, Matteo Merli wrote:
> >
> > > The docker image `pulsar-all` is a convenience image that is created on top
> > > of the base `pulsar` image, including all the Pulsar IO connectors as well
> > > as the tiered storage offloaders.
> > >
> > > The Dockerfile for `pulsar-all` can be found here:
> > >
> > > https://github.com/apache/pulsar/blob/master/docker/pulsar-all/Dockerfile
> > >
> > > The resulting image is very big:
> > >
> > > ```
> > > apachepulsar/pulsar-all   3.1.2   3d1aa250bf6c   2 months ago   3.68GB
> > > ```
> > >
> > > This poses a challenge in many ways:
> > >  1. Our CI pipeline needs to build these images and cache them across
> > > different stages of the pipeline
> > >  2. It takes a lot of time for release managers to build and push these
> > > images to Docker Hub
> > >  3. Users using this image in production see very long download times,
> > > something that can affect the availability of the system (e.g. a higher
> > > chance of a 2nd broker crashing if a restart takes a very long time).
> > >  4. It's very unlikely that one user will require all the connectors;
> > > most likely, they would use just 2-3 of them.
> > >
> > > The problem is that `pulsar-all` was introduced at a time when there were
> > > ~3 Pulsar IO connectors. Right now we have 35 connectors, with a 1.9 GB
> > > total size.
> > >
> > > The proposal here is to drop this image altogether. Users will be able to
> > > construct their own targeted images in a very simple way:
> > >
> > > ```
> > > FROM apachepulsar/pulsar:latest
> > > RUN mkdir -p connectors && \
> > >     cd connectors && \
> > >     wget https://downloads.apache.org/pulsar/pulsar-3.2.0/connectors/pulsar-io-elastic-search-3.2.0.nar
> > > ```
> > >
> > >
> > >
> > > ### Pulsar Functions Python Runtime
> > >
> > > In order to support the Python functions runtime, we have been shipping
> > > the Pulsar base image with quite a lot of dependencies, from the
> > > `pulsar-client` Python SDK to gRPC, which is quite a heavy package with
> > > many transitive dependencies.
> > >
> > > Given that the vast majority would be using the `pulsar` base image to run
> > > brokers and not Python functions, it would make sense to split the Python
> > > support into a different image, like `pulsar-functions-python`, which
> > > extends from the base image and adds all the needed Python dependencies.
> > >
> > > This way it will be very easy for users to select the appropriate image, and
> > > we wouldn't be shipping a large amount of useless Python dependencies to
> > > users who don't need them.
> > >
> > >
> > > What are people's opinions with respect to this?
> > >
> > > Matteo
> > >
> > > --
> > > Matteo Merli
> > > 
> > >
> >
>


-- 
Best Regards,
Neng


Re: (Apache committer criteria) [ANNOUNCE] New Committer: Asaf Mesika

2024-03-07 Thread Neng Lu
I normally tend to be a silent observer of the community due to my
limited time schedule.
But I feel the obligation to share my thoughts in this thread so that
people are not misled.

> Most of the PIP monitoring involves adding plugins to existing
> metrics APIs. He has also contributed to the PIP reviews, but his
> contribution is more philosophical rather than technical. Most of his
> comments are comparing Pulsar to other projects, rather than focusing on
> the internal insights that Pulsar brings to the table

Please be aware, per ASF Community Guide[1], "There is nothing at the
Apache Software Foundation that says you must write code in order to be a
committer. Anyone who is supportive of the community and works in any of
the CoPDoC areas is a likely candidate for committership."

And I personally hold this view (with which you may disagree): at this stage,
philosophical contributions have a very big impact on the overall health of
Pulsar and its community.
And many of Asaf's contributions are guiding us in the right direction.


> The Pulsar community needs to break away from the monopolies of
> the provider companies, start focusing on stable releases, and let other
> companies make their enhancements to meet their requirements, and
> experienced contributors or Pulsar creators should be active to prevent
> unfairness in the community.

First of all, if there's unfair treatment of any contributor, please
collect the specific proof and file a complaint with the Pulsar PMC and ASF
Board if you want.
I believe they will investigate and decide whether that's true in an open
and fair way.

Secondly, you can help increase the community's diversity by starting to work
on Pulsar issues and making more contributions right now.

Thirdly, having a company behind an open source project isn't necessarily a
bad thing. A few examples:
- Apache Spark -- Databricks
- Apache Kafka -- Confluent
- Apache Hadoop -- Cloudera, Hortonworks, MapR
- Apache Cassandra -- DataStax
- Apache Flink -- Ververica
- Apache Beam -- Google
These companies are paying their employees to work on the open source
project. As long as all the activities follow the community rules, everyone
using these projects benefits from these companies' contributions.


> Also, Pulsar may have
> numbers of non-SM PMC’s and committers, but if you look at the numbers over
> the last 2-3 years, you’ll see that 99% are from SM.

People have replied previously regarding your statement about committers,
so I won't tackle it again.
Instead of just looking at where these new committers are coming from, I
would suggest you think about the following:

  1. Does this new committer make meaningful contributions to the
community?
  2. Is the standard for becoming a Pulsar committer consistent regardless
of where people are coming from?
  3. Have you found anyone who makes great contributions but is left behind or
prevented from becoming a committer?


> We also had a very difficult time in finding and running a stable version
> of Pulsar that has no major issues and we are constantly looking for the
> root cause in the main branch or newer versions for possible bug fixes and
> that is very painful.

Apache Pulsar is an evolving open source project, so bugs may be identified
along the way as people use it in various scenarios.
But the community has invested a lot in its correctness and stability and, as
other people mentioned, every new release improves significantly.

BTW, if your team is really tired of managing a stable Pulsar, StreamNative
can help you :)


> I'm sure that many users are feeling the same way and
> it's only a question of time until they start to speak up about their
> experiences.

Please focus on sharing your own specific problems and experiences.
I do see many non-SN people speak up in this thread, and they share a
completely different experience than you.


> Many companies have faced similar issues with Confluent. Pulsar was an
> alternative solution, but unfortunately, we have come to the conclusion
> that Pulsar isn’t better with a streamnative provider and reviewers. We
> also found major bugs in each Pulsar release, which forced us to
> continually upgrade Pulsar with little to no stability. With unstable
> releases, a lazy review process, and a single provider-driven system,
> Pulsar will be extremely risky to use for any company and we would rather
> continue with our legacy Kafka clusters.

Pulsar has its own unique advantages over Kafka: multi-tenancy,
geo-replication, messaging queue support, and zero-downtime scaling out and in.
We also see many companies (DiDi, Tencent, PayPal, etc.) adopting OSS Pulsar.
Some of the problems, like bugs in releases and responsiveness of PR reviews,
can be improved along the way.
To be fair, DataStax is also a vendor for Pulsar, and many of their folks
are also active in the community.
It's okay to choose other technologies that best fit your own
requirements, but please don't assume 

Re: [VOTE] PIP-312 Use StateStoreProvider to manage state in Pulsar Functions endpoints

2023-11-14 Thread Neng Lu
+1

On 2023/11/15 03:39:42 Pengcheng Jiang wrote:
> Hi Pulsar Community,
> 
> This thread is to start a vote for PIP-312: Use StateStoreProvider to
> manage state in Pulsar Functions endpoints.
> 
> I am starting the voting process since there are some approvals on the PIP PR.
> 
> PR: https://github.com/apache/pulsar/pull/21438
> Discussion thread:
> https://lists.apache.org/thread/0rz29wotonmdck76pdscwbqo19t3rbds
> 
> Sincerely,
> Pengcheng Jiang
> 


Re: [DISCUSS] PIP 289: Secure Pulsar Connector Configuration

2023-07-28 Thread Neng Lu
Hi Michael,

Thanks for writing the PIP for the connector secret issue.

One question I have is why not reuse the `context.getSecret()` method inside
connectors to get sensitive values.

This way, no API/framework changes are needed and all we need to do is
update each connector to get the secret value with `context.getSecret()` first.
If nothing is provided, it falls back to the plain-text way.
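
To make the idea concrete, here is a minimal sketch of the fallback, not actual
connector code. `SecretContext` and `SecretLookup` are made-up names; the
interface only imitates the shape of `context.getSecret(name)`, and I'm
assuming a missing secret comes back as null or empty.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the part of the connector context used here.
// It only mirrors the shape of `context.getSecret(name)` for illustration.
interface SecretContext {
    String getSecret(String secretName);
}

final class SecretLookup {
    private SecretLookup() {}

    // Prefer the secret from the context; fall back to the plain-text config value.
    static Optional<String> resolve(SecretContext context,
                                    Map<String, Object> config,
                                    String key) {
        String secret = context.getSecret(key);
        if (secret != null && !secret.isEmpty()) {
            return Optional.of(secret);
        }
        Object plain = config.get(key);
        return plain == null ? Optional.empty() : Optional.of(plain.toString());
    }
}
```

A connector would call something like `SecretLookup.resolve(context, config,
"password")` and only see the plain-text value when no secret is configured.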

What do you think?

On 2023/07/28 21:59:57 Michael Marshall wrote:
> Hi Pulsar Community,
> 
> This is the discussion thread for PIP
> https://github.com/apache/pulsar/pull/20903.
> 
> This PIP will help improve Pulsar Connector Security by giving users
> the ability to remove all plaintext secrets from their configurations.
> 
> Thanks,
> Michael
> 


Re: [DISCUSS] forbid user to upload `BYTES` schema

2023-05-31 Thread Neng Lu
Treating the BYTES schema differently from other schemas (i.e. forbidding
uploads of it) will be confusing to users.

If the BYTES schema is the default schema for a new topic, then how about
saving a BYTES schema in the registry instead of saving nothing?



On Mon, Apr 17, 2023 at 8:42 PM PengHui Li  wrote:

> I have pushed out a PR to revert this change first.
>
> https://github.com/apache/pulsar/pull/20123
>
> Please help review.
>
> Thanks,
> Penghui
>
> On Tue, Apr 18, 2023 at 11:36 AM PengHui Li  wrote:
>
> > > Flink uses the schema to store some kv based properties. If we can
> > expose all the operations of the topic level metadata store. We can
> > truly drop the use of uploading the BYTES schema in the Flink
> > connector.
> >
> > After 3.0.0, Pulsar will provide the ability to set properties for a topic:
> > https://github.com/apache/pulsar/pull/17238.
> >
> > Thanks,
> > Penghui
> >
> > On Mon, Apr 17, 2023 at 11:44 PM Yufan Sheng  wrote:
> >
> >> > I support reverting the PR first and then looking for a long-term
> >> > solution.
> >>
> >> Flink uses the schema to store some kv-based properties. If we can
> >> expose all the operations of the topic-level metadata store, we can
> >> truly drop the use of uploading the BYTES schema in the Flink
> >> connector.
> >>
> >> On Mon, Apr 17, 2023 at 12:12 PM PengHui Li  wrote:
> >> >
> >> > I'm sorry. I have provided the wrong description of the changes from the PR.
> >> > The PR has changed the server side, so it's hard to ask users to
> >> > upgrade the Flink connector if the Pulsar server is upgraded.
> >> >
> >> > I support reverting the PR first and then looking for a long-term solution.
> >> >
> >> > Best,
> >> > Penghui
> >> >
> >> > On Mon, Apr 17, 2023 at 10:16 AM PengHui Li 
> wrote:
> >> >
> >> > > > Did we consider
> >> > > > making a call to upload a Bytes schema a no-op?
> >> > >
> >> > > It was a BUG that the PR fixed.
> >> > > You will not be able to get the uploaded schema as expected.
> >> > > Please take a look at the details from the GitHub issue.
> >> > >
> >> > > What is the challenge for the Flink connector now?
> >> > > The changes only take effect on the client side.
> >> > > So, the issue will only happen if they use a new connector.
> >> > > Upgrading the Pulsar server will not have any impact?
> >> > > Is it better to fix the Flink connector?
> >> > > IMO, the Flink connector should not use admin-api
> >> > > to upload a BYTES schema. It's a redundant operation.
> >> > > Pulsar will do nothing.
> >> > >
> >> > > What do you think about a long-term solution?
> >> > >
> >> > > Regards
> >> > > - Penghui
> >> > >
> >> > > On Sat, Apr 15, 2023 at 12:52 AM Michael Marshall <
> >> mmarsh...@apache.org>
> >> > > wrote:
> >> > >
> >> > >> I think the primary point is that unless there is a strict need, we
> >> > >> shouldn't introduce breaking changes to the implementation. Why did we
> >> > >> choose to forbid users from uploading a Bytes schema? Did we consider
> >> > >> making a call to upload a Bytes schema a no-op?
> >> > >>
> >> > >> Thanks,
> >> > >> Michael
> >> > >>
> >> > >> On Fri, Apr 14, 2023 at 10:46 AM SiNan Liu wrote:
> >> > >> >
> >> > >> > 1. I don't know much about Flink, but what I see here is that you need to
> >> > >> > save a `ResolvedCatalogTable`, which I see has `CatalogTable`, so it is
> >> > >> > used to record the metadata information of the table.
> >> > >> > **In
> >> > >> > org.apache.flink.connector.pulsar.table.catalog.impl.PulsarCatalogSupport#createTable**
> >> > >> > ```java
> >> > >> > @Override
> >> > >> > public void createTable(ObjectPath tablePath, ResolvedCatalogTable table)
> >> > >> >         throws PulsarAdminException {
> >> > >> >     // only allow creating table in explict database, the topic is used
> >> > >> >     // to save table information
> >> > >> >     if (!isExplicitDatabase(tablePath.getDatabaseName())) {
> >> > >> >         throw new CatalogException(
> >> > >> >                 String.format(
> >> > >> >                         "Can't create explict table under pulsar tenant/namespace: %s because it's a native database",
> >> > >> >                         tablePath.getDatabaseName()));
> >> > >> >     }
> >> > >> >
> >> > >> >     String mappedTopic = findExplicitTablePlaceholderTopic(tablePath);
> >> > >> >     pulsarAdminTool.createTopic(mappedTopic, 1);
> >> > >> >
> >> > >> >     // use pulsar schema to store explicit table information
> >> > >> >     try {
> >> > >> >         SchemaInfo schemaInfo =
> >> > >> >                 TableSchemaHelper.generateSchemaInfo(table.toProperties());
> >> > >> >         pulsarAdminTool.uploadSchema(mappedTopic, schemaInfo);
> >> > >> >     } catch (Exception e) {
> >> > >> > // delete topic if table info cannot be 

Re: [DISCUSS] PIP-272 Add a `StateStoreConfig` to the `WorkerConfig`

2023-05-30 Thread Neng Lu
thanks for the improvements, +1

On Tue, May 30, 2023 at 2:20 AM Pengcheng Jiang wrote:

> Hi Mesika:
>
> Thanks for the suggestions. I updated the PIP, and for the remaining questions:
>
> 5. yes, all config goes through arguments instead of a file
> 6. it should be a JSON string that can be deserialized to a `Map<String, Object>`, updated in pip
> 7. it should be `pulsar-admin functions localrun` command, updated in pip
> 8. the `stateStorageServiceUrl` won't be touched
>
> Sincerely
> Pengcheng Jiang
>
> On Mon, May 29, 2023 at 19:53, Asaf Mesika wrote:
>
> > Hi Pengcheng,
> >
> > Looks like a solid improvement, definitely helping people using their own
> > state store.
> >
> > I have a few comments:
> >
> > 1. Background knowledge should explain what a state store is
> > 2. Move problem description from Background Knowledge to Motivation.
> >
> > I'm quoting the template to understand what should be included in
> > the Background knowledge section:
> >
> > 
> >
> > 3. `WorkerConfig` - explain briefly what a Worker is and how it differs from
> > the Broker. This should be in the background knowledge section.
> >
> > 4. Background knowledge should explain briefly what a runtime and a
> > runtime factory are.
> >
> > 5.
> >
> > Add a new cli argument to JavaInstanceStarter and LocalRunner so
> > > process runtime can pass state related config to them
> >
> >
> > Today all config goes through arguments and not a file?
> >
> > 6. `--stateStorageConfig`
> >   What format is the expected value?
> >
> > 7. `functions local run`
> >  What is this?
> >
> > 8. Are you keeping `stateStorageServiceUrl`? Maybe people rely on it?
> >
> > 9. Don't forget to include a link to the discussion thread using Apache Pony Mail
> >
> >
> > On Mon, May 29, 2023 at 10:44 AM Rui Fu  wrote:
> >
> > > Hi Pengcheng,
> > >
> > > Thanks for bringing this up, the PIP lgtm, +1.
> > >
> > > Best,
> > >
> > > Rui Fu
> > > On May 29, 2023 at 13:52 +0800, Enrico Olivelli wrote:
> > > > Looks good
> > > > +1
> > > >
> > > > Enrico
> > > >
> > > > On Mon, May 29, 2023, 04:47 Pengcheng Jiang wrote:
> > > >
> > > > > Dear Pulsar community,
> > > > >
> > > > > I created a PIP to make Pulsar Functions' `StateStoreProvider` configurable
> > > > > with custom configurations:
> > > > > https://github.com/apache/pulsar/issues/20419
> > > > >
> > > > > Any feedback and suggestions are welcome
> > > > >
> > > > > Sincerely
> > > > > Pengcheng Jiang
> > > > >
> > >
> >
>


Re: [DISCUSS] Improve Pulsar Function Source Primitive Schema Mapping

2023-05-10 Thread Neng Lu
Hi All,

Here's the PR for this proposed change:
https://github.com/apache/pulsar/pull/20294
If you have time, please take a look.

On Fri, May 5, 2023 at 6:08 AM Rui Fu  wrote:

> Hi Neng,
>
> Thanks for bringing this issue up. Using JSON as the default schema to
> wrap other primitive types is counterintuitive, so +1 to make
> [2] align with [1] so that both Pulsar Source and Pulsar Sink correctly
> support the other primitive types.
>
> And as per the code [3], if the topic already exists, it will try to use
> the existing schema instead of the schema type returned by [2]. So the
> changes will only affect the newly deployed instances.
>
> [3]
> https://github.com/apache/pulsar/blob/branch-3.0/pulsar-functions/instance/src/main/java/org/apache/pulsar/functions/source/TopicSchema.java#L102-L122
>
> Best,
>
> Rui Fu
> On Apr 28, 2023 at 13:36 +0800, Pengcheng Jiang wrote:
> > Hello Neng,
> >
> > IMO, we should update code [2] to follow the doc. Existing functions that
> > are in running status won't touch code [2]; on a new run, functions
> > will fail to start, which will remind users to update their function.
> >
> > Regards,
> > Pengcheng Jiang
> >
> > On Fri, Apr 28, 2023 at 06:59, Neng Lu wrote:
> >
> > > Hi All,
> > >
> > > Based on [1], Pulsar has various primitive schema types and has a very
> > > clear mapping between java classes to primitive schema types.
> > >
> > > > But in code [2], Pulsar Function Source only handles the byte and String
> > > > java classes primitive schema mapping while default all other primitive
> > > > types to JSON schema. Also for byte class types, the NONE schema is used
> > > > instead of the BYTES schema.
> > > >
> > > > All these differences cause confusion for users trying to use Pulsar
> > > > Functions for the first time, and also make Pulsar Function not following
> > > > the Pulsar Schema official document.
> > > >
> > > > Ideally, we should change the code [2], to make it following [1]. But such
> > > > changes may lead to breaking behaviors for existing users who adapted their
> > > > code to run the Pulsar Functions.
> > > >
> > > > I would like to hear your thoughts on this and see how we should proceed.
> > > >
> > > > Thank you! Regards
> > > >
> > > > [1] https://pulsar.apache.org/docs/2.11.x/schema-understand/#primitive-type
> > > > [2]
> > > > https://github.com/apache/pulsar/blob/master/pulsar-functions/instance/src/main/java/org/apache/pulsar/functions/source/TopicSchema.java#L124
> > >
>


Re: Public Stable class SchemaInfoImpl breaks

2023-05-06 Thread Neng Lu
+1

The SchemaInfo class is claimed to be public and stable, but it's actually
changed even between patch releases.
This is really concerning.

On Fri, May 5, 2023 at 11:51 PM tison  wrote:

> PR-16380 [1] adds a new field timestamp. With Lombok's auto-generated
> constructors, this causes the constructor signature to change [2].
>
> Fortunately, we also generate a builder class, and that path remains
> compatible.
>
> But I'd like to raise the issue that Lombok annotations can break
> interfaces silently, so reviewing field changes in Public Stable classes
> requires extra attention, even for a bare addition.
>
> Best,
> tison.
>
> [1] https://github.com/apache/pulsar/pull/16380
> [2]
>
> https://github.com/apache/flink-connector-pulsar/pull/44#issuecomment-1537065990
>


[DISCUSS] Improve Pulsar Function Source Primitive Schema Mapping

2023-04-27 Thread Neng Lu
Hi All,

Based on [1], Pulsar has various primitive schema types and a very
clear mapping from Java classes to primitive schema types.

But in code [2], the Pulsar Function Source only handles the primitive schema
mapping for the byte and String Java classes, while defaulting all other
primitive types to the JSON schema. Also, for byte class types, the NONE schema
is used instead of the BYTES schema.

All these differences cause confusion for users trying Pulsar
Functions for the first time, and also mean Pulsar Functions do not follow
the official Pulsar Schema documentation.

Ideally, we should change the code [2] to make it follow [1]. But such
changes may break behavior for existing users who adapted their
code to run the Pulsar Functions.
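
To illustrate the gap, here is a small sketch comparing the two behaviors. It
is not the code in [2]: the class and method names are hypothetical, schema
types are shown as plain strings, the table is only a subset of [1], and I'm
assuming `byte[]` is the class meant by "byte class types".

```java
import java.util.Map;

final class PrimitiveSchemaMapping {
    private PrimitiveSchemaMapping() {}

    // Java class -> primitive schema type, following the table in [1]
    // (a subset, for illustration only).
    private static final Map<Class<?>, String> DOCUMENTED = Map.of(
            byte[].class, "BYTES",
            String.class, "STRING",
            Boolean.class, "BOOLEAN",
            Byte.class, "INT8",
            Short.class, "INT16",
            Integer.class, "INT32",
            Long.class, "INT64",
            Float.class, "FLOAT",
            Double.class, "DOUBLE");

    // Behavior described above for code [2]: only byte[] and String are
    // special-cased, everything else defaults to JSON.
    static String currentMapping(Class<?> type) {
        if (type == byte[].class) {
            return "NONE";
        }
        if (type == String.class) {
            return "STRING";
        }
        return "JSON";
    }

    // Proposed behavior: follow the documented table and keep JSON only
    // for types that are not primitives.
    static String proposedMapping(Class<?> type) {
        return DOCUMENTED.getOrDefault(type, "JSON");
    }
}
```

With this sketch, `Integer.class` maps to JSON today and to INT32 after the
change, which is exactly the kind of difference existing users could trip over.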

I would like to hear your thoughts on this and see how we should proceed.

Thank you! Regards

[1] https://pulsar.apache.org/docs/2.11.x/schema-understand/#primitive-type
[2]
https://github.com/apache/pulsar/blob/master/pulsar-functions/instance/src/main/java/org/apache/pulsar/functions/source/TopicSchema.java#L124


Re: [DISCUSS] Separate the Python Functions from the installation of the Python client

2023-04-19 Thread Neng Lu
Hi Yunze,

+1 for separating Python client and Python Pulsar Functions pip installation.
On the Java side, the client lib and functions lib are also published 
separately.

My concern is what the migration process should look like:
1. we need to set up the functions lib so that users can install it using `pip
install pulsar-functions`
2. the current `pip install pulsar-client[functions]` should prompt users to use
the new way
3. all docs need to be updated
4. for historical versions, what can we do?



On 2023/04/19 15:23:49 Yunze Xu wrote:
> Hi all,
> 
> The Python client has been separated since PIP-209 [1] and now the
> Python client is maintained in a separate repository [2]. However,
> the Python Function is still maintained in the main repo [3].
> 
> Currently, we can install the Python client with the following ways:
> 1. pip install pulsar-client
> 2. pip install pulsar-client[avro]
> 3. pip install pulsar-client[functions]
> 4. pip install pulsar-client[all]
> 
> See [4] for the difference. The 3rd and 4th ways install all the
> dependencies required by the Python Functions. However, they are broken
> in recent releases because of outdated dependencies [5], and these
> dependencies come from the Python Functions [3], not the Python client
> library itself. Also, there are no tests in the Python client repo [2]
> for these functions, so these dependencies cannot be verified.
> 
> IMO, these dependencies should be maintained in the directory of the
> Python Functions. We should not rely on the Python client to install
> the dependencies for the Python Functions.
> 
> Therefore, my suggestion is to drop the 3rd and 4th installation
> methods in future releases of the Python client. After that, we should
> update the scripts in the main repo to install the Python Functions in
> [3].
> 
> I'm not familiar with the Pulsar Functions, so feel free to show your
> suggestions if any of you have any concerns.
> 
> [1] https://github.com/apache/pulsar/pull/17881
> [2] http://github.com/apache/pulsar-client-python
> [3] 
> https://github.com/apache/pulsar/tree/master/pulsar-functions/instance/src/main/python
> [4] https://pulsar.apache.org/docs/2.11.x/client-libraries-python/
> [5] 
> https://github.com/apache/pulsar-client-python/blob/a6476d9c45508f55a7af4b25001038a8e3a27489/setup.py#L80-L88
> 
> Thanks,
> Yunze
> 


Re: [Discussion] Allowing configure if function consumer should skip to latest

2023-03-08 Thread Neng Lu
Hi everyone,

This discussion has been open for a week. If there are no objections or concerns,
I'll move forward and close the discussion with the conclusion that we are good
with the proposed change.

The PR was already merged, so in that case the merged change will simply stay in
place and won't be reverted.

On 2023/03/07 03:45:02 Neng Lu wrote:
> Hi Penghui,
> 
> Thanks for your question.
> 
> One case is failure recovery for a windowing function.
> 
> A windowing function will not ack messages until its window is emitted. If the
> window function fails due to issues such as OOM and restarts, it has a
> massive backlog to catch up on. The function will never be able to recover
> by itself, since the backlog keeps growing and it keeps hitting OOM.
> 
> Our user prefers an automatic way to recover, given they are okay with
> skipping some backlog data (this is acceptable in IoT cases). Also, users
> may deploy hundreds of functions in their environment. Manually resetting the
> cursor does not scale and is a heavy burden for the on-call person in such
> cases.
> 
> Hope the above use case can help provide some more context regarding the 
> change.
> 
> On 2023/03/03 03:51:35 PengHui Li wrote:
> > Hi Neng,
> > 
> > Thanks for raising up the discussion
> > 
> > > In certain failure cases, the function needs to skip all the content
> > > between the last successfully acked message and the latest message in the
> > > topic in order to skip the huge backlog and quick recovery.
> > 
> > Do you have some real cases that can help us to understand it
> > is necessary to introduce a new flag? Another possibility is users
> > can use pulsar admin to reset the cursor to the latest position,
> > Why will it not work for users? 
> > 
> > Regards,
> > Penghui
> > 
> > > On Mar 1, 2023, at 10:16, Neng Lu  wrote:
> > > 
> > > In certain failure cases, the function needs to skip all the content
> > > between the last successfully acked message and the latest message in the
> > > topic in order to skip the huge backlog and quick recovery.
> > 
> > 
> 


Re: [Discussion] Allowing configure if function consumer should skip to latest

2023-03-06 Thread Neng Lu
Hi Penghui,

Thanks for your question.

One case is failure recovery for a windowing function.

A windowing function will not ack messages until its window is emitted. If the
window function fails due to issues such as OOM and restarts, it has a massive
backlog to catch up on. The function will never be able to recover by itself,
since the backlog keeps growing and it keeps hitting OOM.

Our user prefers an automatic way to recover, given they are okay with
skipping some backlog data (this is acceptable in IoT cases). Also, users may
deploy hundreds of functions in their environment. Manually resetting the
cursor does not scale and is a heavy burden for the on-call person in such
cases.

Hope the above use case can help provide some more context regarding the change.

On 2023/03/03 03:51:35 PengHui Li wrote:
> Hi Neng,
> 
> Thanks for raising up the discussion
> 
> > In certain failure cases, the function needs to skip all the content
> > between the last successfully acked message and the latest message in the
> > topic in order to skip the huge backlog and quick recovery.
> 
> Do you have some real cases that can help us to understand it
> is necessary to introduce a new flag? Another possibility is users
> can use pulsar admin to reset the cursor to the latest position,
> Why will it not work for users? 
> 
> Regards,
> Penghui
> 
> > On Mar 1, 2023, at 10:16, Neng Lu  wrote:
> > 
> > In certain failure cases, the function needs to skip all the content
> > between the last successfully acked message and the latest message in the
> > topic in order to skip the huge backlog and quick recovery.
> 
> 


Re: [DISCUSS] Support custom compressionType for pulsar functions

2023-02-28 Thread Neng Lu
+1 for the change.
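
For reference, the fallback described in the quoted proposal could look roughly
like the sketch below. `FunctionCompressionType` and `CompressionDefaults` are
hypothetical names used for illustration, not the generated protobuf classes,
and treating a null value like `NOTSET` is my own assumption.

```java
// Hypothetical mirror of the enum proposed for Function.proto;
// NOTSET comes first so that it is the "zero" value.
enum FunctionCompressionType { NOTSET, NONE, LZ4, ZLIB, ZSTD, SNAPPY }

final class CompressionDefaults {
    private CompressionDefaults() {}

    // NOTSET keeps the previous LZ4 behavior; every explicit choice,
    // including NONE, is respected as-is.
    static FunctionCompressionType effective(FunctionCompressionType configured) {
        if (configured == null || configured == FunctionCompressionType.NOTSET) {
            return FunctionCompressionType.LZ4;
        }
        return configured;
    }
}
```

The point of the extra value is that an unset field and an explicit `NONE` can
be told apart, so existing functions keep using `LZ4`.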

On 2023/02/28 01:06:51 Pengcheng Jiang wrote:
> Hello, community:
> 
> ### Motivation
> 
> Currently, Pulsar Functions use `LZ4` as the compression type, and
> users cannot change it, yet some users may want to customize this behavior.
> 
> ### Modifications
> 
> Add a `CompressionType` field (an enum) to the `ProducerSpec` in
> `Function.proto`. The enum has six values: `NOTSET`, `NONE`, `LZ4`,
> `ZLIB`, `ZSTD` and `SNAPPY`. `NOTSET` is added alongside the 5
> supported compression types so that when users don't set the
> `CompressionType`, it falls back to its "zero" value `NOTSET` instead
> of `NONE`; in that case, Pulsar Function instances will use `LZ4` to
> keep the same behavior as before.
> 
> PTAL when you have time and feel free to leave any comments.
> 
> Best Regards,
> Pengcheng Jiang
> 
> [0] https://github.com/apache/pulsar/pull/19470


[Discussion] Allowing configure if function consumer should skip to latest

2023-02-28 Thread Neng Lu
Hello Community,

In this [PR](https://github.com/apache/pulsar/pull/17214), we changed the
function protobuf by adding one more field `bool skipToLatest = 14;`.

The change itself is minimal and self-contained for function internal usage.

Given the new community guideline that protobuf changes should be announced to
the community, I'm writing this email to initiate the (late) discussion.

### Motivation
In certain failure cases, the function needs to skip all the content
between the last successfully acked message and the latest message in the
topic in order to skip the huge backlog and recover quickly.

### Modifications
1. Provide a boolean config for the function submission command
2. PulsarSource will call `consumer.seek(MessageId.latest)` if the skip
flag is set
3. The internal function protobuf file will have a new field `skipToLatest`
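
A rough sketch of step 2, assuming the flag is checked once when the source
starts. `SeekableConsumer` and `SkipToLatest` are hypothetical stand-ins so the
snippet compiles on its own; in PulsarSource the seek would be the
`consumer.seek(MessageId.latest)` call mentioned above.

```java
// Hypothetical stand-in for the consumer held by PulsarSource. The real
// code would call consumer.seek(MessageId.latest) on the client consumer.
interface SeekableConsumer {
    void seekToLatest();
}

final class SkipToLatest {
    private SkipToLatest() {}

    // Issues a seek only when the flag is set; returns whether it did.
    static boolean applyOnStart(SeekableConsumer consumer, boolean skipToLatest) {
        if (!skipToLatest) {
            return false; // default: resume from the last acked message
        }
        consumer.seekToLatest(); // drop the backlog and continue from the tail
        return true;
    }
}
```

Because the flag defaults to false, existing functions keep their current
behavior unless they opt in.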

Let me know your thoughts. If there are big concerns regarding this
change, we will revert the above PR first and then make modifications to it.

Best Regards,
Neng Lu


Re: [ANNOUNCE] Bo Cong as new PMC member in Apache Pulsar

2023-01-19 Thread Neng Lu
Congratulations!

On Thu, Jan 19, 2023 at 7:18 AM Zike Yang  wrote:

> Congratulations! Bo Cong
>
> BR,
> Zike Yang
>
> On Thu, Jan 19, 2023 at 6:11 PM Jiuming Tao
>  wrote:
> >
> > Congrats!
> >
> > > On Jan 18, 2023, at 21:50, PengHui Li wrote:
> > >
> > > Hi all,
> > >
> > > The Apache Pulsar Project Management Committee (PMC) has invited Bo Cong
> > > (https://github.com/congbobo184) as a member of the PMC and we are
> > > pleased to announce that he has accepted.
> > >
> > > He has been very active in the community over the past few years and has
> > > made a lot of great contributions, such as transactions and schemas.
> > >
> > > Welcome Bo Cong to the Apache Pulsar PMC.
> > >
> > > Best Regards,
> > > Penghui on behalf of the Pulsar PMC
> >
>


Re: [DISCUSS] PIP-237: Make PulsarAdmin accessible in SinkContext and SourceContext

2023-01-18 Thread Neng Lu
The proposal makes sense to me.

The only note I want to add is to make sure the PulsarAdmin access in Source 
and Sink is also controlled by the flag `exposePulsarAdminClientEnabled`. 
Similar to the following:

https://github.com/apache/pulsar/blob/master/pulsar-functions/runtime/src/main/java/org/apache/pulsar/functions/runtime/thread/ThreadRuntimeFactory.java#L111
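
Roughly what I have in mind, as a self-contained sketch rather than the real
runtime code. `AdminAccess` and its members are hypothetical names; only the
flag `exposePulsarAdminClientEnabled` comes from the existing configuration.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical helper: hands out an admin client to a Source/Sink context
// only when the worker-level flag allows it.
final class AdminAccess<A> {
    private final boolean exposePulsarAdminClientEnabled;
    private final Supplier<A> adminFactory;

    AdminAccess(boolean exposePulsarAdminClientEnabled, Supplier<A> adminFactory) {
        this.exposePulsarAdminClientEnabled = exposePulsarAdminClientEnabled;
        this.adminFactory = adminFactory;
    }

    // The admin client is created lazily and never built when the flag is off.
    Optional<A> adminForConnector() {
        return exposePulsarAdminClientEnabled
                ? Optional.of(adminFactory.get())
                : Optional.empty();
    }
}
```

Whether a disabled flag should yield an empty result or an exception is a
design choice for the PIP; the sketch only shows that the same flag gates
connectors too.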

On 2023/01/03 14:31:52 Alexander Preuss wrote:
> Hi all,
> 
> I opened a PIP to discuss making PulsarAdmin accessible in Pulsar IO
> connectors through SinkContext and SourceContext:
> https://github.com/apache/pulsar/issues/19123
> 
> Looking forward to hearing your thoughts,
> Alex
> 


Re: [VOTE] PIP-193 : Sink preprocessing Function

2022-08-09 Thread Neng Lu
Hi,

I'm not sure whether this is too late, but I replied in the discussion thread
with some thoughts on whether we should tweak the sink connector or instead
allow flexible, general function creation.

On Mon, Aug 1, 2022 at 5:03 PM mattison chao 
wrote:

> +1 (non-binding)
>
> Best,
> Mattison
>
> On Tue, 2 Aug 2022 at 01:56, Dave Fisher  wrote:
> >
> > +1 (binding)
> >
> > On 2022/07/28 10:39:35 Christophe Bornet wrote:
> > > Hi, Pulsar community,
> > >
> > > I'd like to start a vote on PIP-193 : Sink preprocessing Function
> > >
> > > You can find the proposal at
> https://github.com/apache/pulsar/issues/16739 and
> > > the discussion thread at
> > > https://lists.apache.org/thread/qn59jwn47w9ngxpkvq3kswbl1y882jth.
> > >
> > > The vote will stay open for at least 48 hours.
> > >
> > > Best regards.
> > >
> > > Christophe Bornet
> > >
>


-- 
Best Regards,
Neng


Re: [DISCUSS] PIP 193 : Sink preprocessing Function

2022-08-09 Thread Neng Lu
To my understanding, the Pulsar IO Connectors (i.e. Sources/Sinks) are
quite self-contained. They move data around.

If we want to enable the functionality described in the PIP (process ->
write to another place), can we think about it another way -- allow flexible
configuration of a Pulsar Function?

Originally, the Pulsar Function pipeline is:
PulsarSource -> func() -> PulsarSink()

Can we look into allowing users to change a source/sink in the
PulsarFunction pipeline instead of tweaking the Sink?

Syntax could be:
```
pulsar-admin functions create --sink ... --source ...
```

This would be more flexible and open up a lot of possibilities for further
development.
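
To make the idea concrete, here is a minimal sketch of a pipeline whose source and sink are both pluggable. `SimpleSource`, `SimpleSink`, and `FunctionPipeline` are hypothetical names for illustration, not existing Pulsar interfaces, and treating a null result as "drop the record" is an assumption of this sketch.

```java
import java.util.Iterator;
import java.util.function.Function;

// Hypothetical minimal source: yields records until exhausted.
interface SimpleSource<I> {
    Iterator<I> records();
}

// Hypothetical minimal sink: receives each processed record.
interface SimpleSink<O> {
    void write(O record);
}

// source -> func() -> sink, where both ends can be swapped by configuration.
class FunctionPipeline<I, O> {
    private final SimpleSource<I> source;
    private final Function<I, O> func;
    private final SimpleSink<O> sink;

    FunctionPipeline(SimpleSource<I> source, Function<I, O> func, SimpleSink<O> sink) {
        this.source = source;
        this.func = func;
        this.sink = sink;
    }

    void run() {
        Iterator<I> it = source.records();
        while (it.hasNext()) {
            O out = func.apply(it.next());
            if (out != null) { // assumption: a null result means "drop this record"
                sink.write(out);
            }
        }
    }
}
```

With this shape, a "sink with preprocessing" is simply a pipeline with a Pulsar topic as the source and an external system as the sink.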



On Tue, Jul 26, 2022 at 2:56 AM Christophe Bornet 
wrote:

> Thanks for the feedback Jerry.
> We don't modify the way sources, sinks and functions are detected when it's
> based on their fields. The proposal is just to modify the classname of the
> function applied in the instance so the same detection rules apply. The
> only difference is when detecting if the sink or function is built-in. For
> this we add some code to do this detection also based on the ComponentType
> (either detected or explicit). You can check the implementation PR about
> it: https://github.com/apache/pulsar/pull/16740
>
> IMO, making it a separate implementation of what currently exists would make
> things more complex and thus more error prone for no good reason. The
> proposal is "just" to replace the name of the already existing function
> (IdentityFunction) by another one and to provide the location of the
> function JAR.
>
> Best regards
> Christophe
>
> On Mon, Jul 25, 2022 at 23:31, Jerry Peng wrote:
>
> > My feedback is to make this change as self contained as possible.  Can we
> > just have a special implementation of a sink that will run the logic of
> the
> > "preprocess" function?  There are many places in the code where we figure
> > out if it is a source, sink or a function based on the fields in the
> > Function metadata.  Changing that may have unintended consequences.
> >
> > On Mon, Jul 25, 2022 at 5:55 AM Baodi Shi 
> > wrote:
> >
> > > > Can you explain more what you mean ?
> > > This PIP doesn't change the API of a Function and it's already possible
> > to
> > > write a Function>.
> > > And when declaring a Sink with a Function we'll check that it's the
> case.
> > >
> > > I mean: we should constrain the function interface, otherwise, the user
> > > may return a structure that is not a record.
> > >
> > > Thanks,
> > > Baodi Shi
> > >
> > > > On Jul 25, 2022, at 01:0233, Christophe Bornet <
> bornet.ch...@gmail.com
> > >
> > > wrote:
> > > >
> > > > Thanks for the feedback Asaf
> > > >
> > > >
> > > >>>   - preprocess-function: the preprocess function applied before the
> > > >>>   Sink. Starts by builtin:// for built-in functions, function://
> for
> > > >>>   package function, http:// or file://
> > > >>>
> > > > >>> 1. Why is this function applied only before the sink? I thought it
> > > replaces
> > > >> the identity function, so why a source can't have a function that
> > reads
> > > >> from the source (say S3), runs the function and only then writes to
> a
> > > >> pulsar topic?
> > > >>
> > > >
> > > > Yes that's totally possible to implement and will be done in future
> > work
> > > > like written in the PIP.
> > > >
> > > >
> > > >> 2. Can you clarify more about built in and function for package
> > > function?
> > > >> Is this an existing functionality ?
> > > >>
> > > > Yes those are existing functionalities.
> > > > Built-in functions are not documented (and we should do something
> about
> > > > that).
> > > > Package management of functions is described in
> > > >
> > >
> >
> https://pulsar.apache.org/docs/functions-deploy#use-package-management-service
> > > >
> > > >
> > > >> 3. Regarding http - Are you loading a class through that URL? Aren't
> > we
> > > >> exposed to same problem Log4Shell security issue had? If so, what
> > > measures
> > > >> are you taking to protect ?
> > > >>
> > > > Yes we are loading code via URL. This feature already exists for
> > > > Sources/Sinks/Functions.
> > > > I guess you need to have a huge trust of the source from where you
> > > download.
> > > > This PIP has the same security level as what already exists for this
> > > > functionality.
> > > >
> > > >
> > > >>
> > > >> The field extraFunctionPackageLocation to the protobuf structure
> > > >>> FunctionMetaData will be added. This field will be filled with the
> > > >>> location of the extra function to apply when registering a sink and
> > > used
> > > >> in
> > > >>> the Runtime to load the function code.
> > > >>
> > > >> Can you please expand on that? You mean the JAR location, which you
> > will
> > > >> search that class name and function specified in the 3 fields you've
> > > added
> > > >> to the config?
> > > >>
> > > > Not exactly. It's the location of where the JAR is stored. It can be
> > > > BookKeeper, package management, built-in NAR, etc...
> > > > In 

Re: [DISCUSS] PIP-175: Extend time based release process

2022-06-08 Thread Neng Lu
1. We can compose some charts/tables like https://endoflife.date/python on
the Pulsar website to help people understand the time frame.

2. Regarding the LTS release proposal, one thing I need clarified is whether
an LTS release will include major new features compared to the previous release.
For example, in the following releases, is 4.0 basically the same as
3.2 except that it's supported longer, or does 4.0 have major new features
beyond 3.2?
3.0 <- LTS
3.1
3.2
4.0 <- LTS


On Wed, Jun 8, 2022 at 4:45 PM Michael Marshall 
wrote:

> > From the code-freeze point, to minimize the risk of delaying the
> > release, only bug fixes involving a regression of behavior compared to
> > a previous release should be allowed. Occasional exceptions will be
> > possible after higher scrutiny of the change.
>
> It's a frequent point of discussion to debate which bug fixes are
> urgent enough to trigger a new RC candidate.
>
> I propose that we try to set guidelines now so that we can rely on
> them when issues come up.
>
> In my opinion, net new bugs that were introduced in the version should
> be fixed. Bugs that are not new, should not trigger a new RC, unless
> there is general consensus (however we define that) that the bug is
> sufficiently bad that we should trigger a new RC.
>
> If a bug is discovered during the RC testing period and it is not
> fixed, we should note this in the release notes for the version under
> a "known issues" section. This way, users of the impacted feature may
> decide to delay upgrading, while users that do not rely on the feature
> will have confidence upgrading. The main goal here is to ensure that
> we don't miss release deadlines, as Matteo just mentioned, and to give
> our users confidence when upgrading.
>
> Performance regressions are another issue that will arise. Dave Fisher
> has been working extensively on building out performance testing
> infrastructure to measure many Pulsar features. I expect that he (or
> someone else doing performance testing) will find degraded performance
> for some feature in the future. I think we should have a general rule
> of thumb that some percent regression (maybe 10%?) is serious enough
> to trigger a new RC. A percent might not be the only measure. If we
> have absolute metrics, like that a Pulsar cluster of x size must be
> able to handle y topics, that might be meaningful, too.
>
> What do others think?
>
> - Michael
>
> On Wed, Jun 8, 2022 at 2:26 PM Matteo Merli 
> wrote:
> >
> > > Schedules always slip. Would you say that if the 3.x feature releases
> take too long with these hypothetical dates that 3.5 would be dropped in
> order to release 4.0 on schedule?
> >
> > Yes, there needs to be clarity for the users on when releases are to
> > be expected, even more so for LTS releases. And while it's true that
> > there are always delays, we have a lot that we can do to try to
> > minimize it. (eg: having a public deadline to hit will help in getting
> > everyone on the same page).
>


-- 
Best Regards,
Neng


Re: [DISCUSS] PIP-173 : Create a built-in Function implementing the most common basic transformations

2022-06-08 Thread Neng Lu
I would suggest first having some concrete implementations in a separate
repo.
After verifying their functionality and performance, we can move them into
the main Pulsar repo.

On Fri, Jun 3, 2022 at 5:09 AM Enrico Olivelli  wrote:

> Overall I agree with the proposal.
> I left some minor feedback on the issue
>
> Thank you
>
> Enrico
>
> On Thu, Jun 2, 2022 at 16:57, Christophe Bornet wrote:
> >
> > Dear Pulsar community,
> >
> > I opened PIP-173 https://github.com/apache/pulsar/issues/15902 to
> create a
> > built-in Function implementing the most common basic transformations.
> >
> > Let me know what you think.
> >
> > Best regards,
> >
> > Christophe
> >
> > --
> >
> > ## Motivation
> >
> > Currently, when users want to modify the data in Pulsar, they need to
> write
> > a Function.
> > For a lot of use cases, it would be handy for them to be able to use a
> > ready-made built-in Function that implements the most common basic
> > transformations like the ones available in [Kafka Connect’s SMTs](
> >
> https://docs.confluent.io/platform/current/connect/transforms/overview.html
> > ).
> > This relieves users of the burden of writing the Function themselves,
> > having to understand the perks of Pulsar Schemas, and coding in a language
> > that they may not master (probably Java if they want to do advanced stuff),
> > and they benefit from battle-tested, maintained, performance-optimised code.
> >
> > ## Goal
> >
> > This PIP is about providing a `TransformFunction` that executes a
> sequence
> > of basic transformations on the data.
> > The `TransformFunction` shall be easy to configure, launchable as a
> > built-in NAR.
> > The `TransformFunction` shall be able to apply a sequence of common
> > transformations in-memory so we don’t need to execute the
> > `TransformFunction` multiple times and read/write to a topic each time.
> >
> > This PIP is not about appending such a Function to a Source or a Sink.
> > While this is the ultimate goal, so we can provide an experience similar
> to
> > Kafka SMTs and avoid a read/write to a topic, this work will be done in a
> > future PIP.
> > It is expected that the code written for this PIP will be reusable in
> this
> > future work.
> >
> > ## API Changes
> >
> > This PIP will introduce a new `transform` module in `pulsar-function`
> > multi-module project.  The produced artifact will be a NAR of the
> > TransformFunction.
> >
> > ## Implementation
> >
> > When it processes a record, `TransformFunction` will :
> >
> > * Create a mutable structure `TransformContext` that contains
> >
> > ```java
> > @Data
> > public class TransformContext {
> >     private Context context;
> >     private Schema keySchema;
> >     private Object keyObject;
> >     private boolean keyModified;
> >     private Schema valueSchema;
> >     private Object valueObject;
> >     private boolean valueModified;
> >     private KeyValueEncodingType keyValueEncodingType;
> >     private String key;
> >     private Map properties;
> >     private String outputTopic;
> > }
> > ```
> >
> > If the record is a `KeyValue`, the key and value schemas and object are
> > unpacked. Otherwise the `keySchema` and `keyObject` are null.
> >
> > * Call in sequence the process method of a series of `TransformStep` on
> > this `TransformContext`
> >
> > ```java
> > public interface TransformStep {
> > void process(TransformContext transformContext) throws Exception;
> > }
> > ```
> >
> > Each `TransformStep` can then modify the `TransformContext` as needed.
> >
> > * Call the `send()` method of the `TransformContext` which will create
> the
> > message to send to the outputTopic, repacking the KeyValue if needed.
> >
> > The `TransformFunction` will read its configuration as Json from
> > `userConfig` in the format:
> >
> > ```json
> > {
> >   "steps": [
> > {
> >   "type": "drop-fields", "fields": "keyField1,keyField2", "part":
> "key"
> > },
> > {
> >   "type": "merge-key-value"
> > },
> > {
> >   "type": "unwrap-key-value"
> > },
> > {
> >   "type": "cast", "schema-type": "STRING"
> > }
> >   ]
> > }
> > ```
> >
> > Each step is defined by its `type` and uses its own arguments.
> >
> > This example config applied on a KeyValue input record with
> > value `{key={keyField1: key1, keyField2: key2, keyField3: key3},
> > value={valueField1: value1, valueField2: value2, valueField3: value3}}`
> > will give after each step:
> > ```
> > {key={keyField1: key1, keyField2: key2, keyField3: key3},
> > value={valueField1: value1, valueField2: value2, valueField3:
> > value3}}(KeyValue)
> >|
> >| "type": "drop-fields", "fields": "keyField1,keyField2",
> > "part": "key"
> >|
> > {key={keyField3: key3}, value={valueField1: value1, valueField2: value2,
> > valueField3: value3}} (KeyValue)
> >|
> >| "type": "merge-key-value"
> >|
> > {key={keyField3: key3}, value={keyField3: key3, 

Re: [VOTE] PIP-166: Function add MANUAL delivery semantics

2022-06-07 Thread Neng Lu
Hi All,

+1 (non-binding)

On Tue, Jun 7, 2022 at 5:42 AM Enrico Olivelli  wrote:

> I have left one last minute comment, can you please take a look ? then
> I will post my +1
>
> thanks
> Enrico
>


-- 
Best Regards,
Neng


Re: [DISCUSS] PIP-166: Function add NONE delivery semantics

2022-05-31 Thread Neng Lu
Hi Baodi,

Thanks for the reply and update of the PIP.

1. Pulsar Functions isn't integrated with the Transaction feature yet, so
there's no EXACTLY_ONCE support.

2. And yes, "EFFECTIVELY_ONCE = ATLEAST_ONCE + Message Deduplication".
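
A small sketch of why point 2 holds: under at-least-once delivery the same message may be redelivered, and dropping anything at or below the highest sequence ID already accepted makes the overall result effectively once. `DedupingSink` is a hypothetical illustration that assumes per-producer, monotonically increasing sequence IDs; it is not the broker's deduplication code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Keeps the highest sequence ID accepted per producer and drops replays.
class DedupingSink {
    private final Map<String, Long> highestSeqByProducer = new HashMap<>();
    final List<String> accepted = new ArrayList<>();

    // Returns true if the message was accepted, false if it was a duplicate.
    boolean write(String producer, long sequenceId, String payload) {
        long last = highestSeqByProducer.getOrDefault(producer, -1L);
        if (sequenceId <= last) {
            return false; // redelivery under at-least-once: already persisted
        }
        highestSeqByProducer.put(producer, sequenceId);
        accepted.add(payload);
        return true;
    }
}
```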



On Tue, May 31, 2022 at 9:16 AM 石宝迪 
wrote:

> >> If you fail to start the function, you immediately break people's
> > functions when they upgrade to this version. How about notifying them
> once
> > via logger (WARN)?
>
>
> I lean toward failing. Although this breaks the current logic, the current
> implementation can be considered a bug.
>
> > It will flood their logs if they used it wrong. Maybe write to log once?
>
>
> Agreed, I updated the PIP.
>
> Thanks,
> Baodi Shi
>
> > On May 31, 2022, at 23:57:20, Asaf Mesika wrote:
> >
> > Hi Baodi,
> >
> > Regarding
> >
> >>
> >>   1. When the delivery semantic is ATMOST_ONCE, add verify autoAck must
> >>   be true. If the validation fails, let the function fail to start (This
> >>   temporarily resolves the configuration ambiguity). When autoAck is
> >>   subsequently removed, the message will be acked immediately after
> receiving
> >>   the message.
> >>
> >>
> >> If you fail to start the function, you immediately break people's
> > functions when they upgrade to this version. How about notifying them
> once
> > via logger (WARN)?
> >
> > Regarding
> >
> >>
> >>   1.
> >>
> >>
> >>   When the user calls record.ack() in a function, it takes effect only if
> >>   ProcessingGuarantees == MANUAL. Conversely, when ProcessingGuarantees !=
> >>   MANUAL, calling record.ack() is invalid (a warning is logged).
> >>
> >> It will flood their logs if they used it wrong. Maybe write to log once?
> >
> > On Tue, May 31, 2022 at 12:24 PM Baozi wrote:
> >
> >> Hi, Asaf.
> >>
> >> Thanks for the review.
> >>
> >>>> I'm not entirely sure that is accurate. The Effectively-Once as I
> >>> understand it is achieved using transactions, thus the consumption of
> >> that
> >>> message and production of any messages, as a result, are considered one
> >>> atomic unit - either message acknowledged and messages produced or
> >> neither.
> >>
> >>
> >> We are not using transactions now. My understanding is: EFFECTIVELY_ONCE =
> >> ATLEAST_ONCE + Message Deduplication.
> >>
> >> @Neng Lu @Rui Fu Can you help confirm?
> >>
> >>>> I would issue a WARN when reading configuring the function (thus
> emitted
> >>> once) when the user actively configured autoAck=false and warn them
> that
> >>> this configuration is deprecated and they should switch to the MANUAL
> >>> ProcessingGuarantee configuration option.
> >>
> >>
> >> Added to API Change(2)
> >>
> >>>> suggest you clarify what existing behavior remains for backward
> >>> compatibility with the appropriate comment that this is deprecated and
> >> will
> >>> be removed.
> >>
> >> Yes, I have rewritten it, please see Implementation(1)
> >>
> >>> 5. Regarding Test Plan
> >>> * I would add: Validate the test of autoAck=false still works (you
> >> haven't
> >>> broken anything)
> >>> * I would add: Validate existing ProcessingGuarantee test for
> AtMostOnce,
> >>> AtLeastOnce, ExactlyOnce still works (when autoAck=true)
> >>
> >>
> >> Nice, I added to PIP.
> >>
> >>
> >> Thanks,
> >> Baodi Shi
> >>
> >>> On May 30, 2022, at 22:00:11, Asaf Mesika wrote:
> >>>
> >>> Thanks for applying the fixes.
> >>>
> >>> 1. Regarding
> >>>
> >>>>
> >>>>  - EFFECTIVELY_ONCE: The message is acknowledged *after* the function
> >>>>  finished execution. Depends on pulsar deduplication, and provides
> >>>>  end-to-end effectively once processing.
> >>>>
> >>>> I'm not entirely sure that is accurate. The Effectively-Once as I
> >>> understand it is achieved using transactions, thus the consumption of
> >> that
> >>> message and production of any messages, as a result, are considered one
> >>> atomic unit - either message acknowledged and messages produced or
> >> neither.
> >>>
> >>> 2. Regarding
> >>>
> >>>>

Re: [Discuss] Update Helm Chart to Support 2.10 Docker Image

2022-05-31 Thread Neng Lu
Hi Michael,

Thanks for the detailed explanation.

On Thu, May 26, 2022 at 11:08 PM Michael Marshall 
wrote:

> Hi Neng Lu,
>
> I put together a doc [0] that includes some tips for troubleshooting a
> non-root docker image. Some of the details depend on how you're
> deploying Pulsar.
>
> If you can ssh to the host as the root user, you can run `docker exec
> --user 0 ...` to get a shell in the container as the root user.
>
> When running on Kubernetes, you might be able to utilize [1] to gain
> root access to the host node for the pod, and then you can exec into
> the container as the root user, as described in the doc [0]. Or, if
> you don't have any pod security policies, you can set the pod's
> securityContext so that the container runs as the root user.
>
> The final option is to build a custom image with additional tooling.
>
> If you find other helpful resources, feel free to update that doc or
> send a note here, and I'll update the doc.
>
> - Michael
>
> [0]
> https://github.com/apache/pulsar/blob/master/docker/README.md#troubleshooting-non-root-containers
> [1] https://github.com/kvaps/kubectl-node-shell
>
> On Thu, May 26, 2022 at 5:24 PM Neng Lu  wrote:
> >
> > Hi All,
> >
> > I'm curious to learn once the image is run as non-root, how can we debug
> or
> > investigate production issues inside a running cluster?
> >
> > On Thu, May 19, 2022 at 12:14 PM Michael Marshall 
> > wrote:
> >
> > > Hello Pulsar Community,
> > >
> > > With the 2.10.0 release, our Pulsar Docker images default to run as a
> > > non-root user. In order to use the 2.10.0 Docker image with the Apache
> > > Pulsar Helm Chart, we need to merge this PR [0]. If you're able,
> > > please review it. Once merged, I propose that we follow up with a
> > > release so that users wanting to upgrade to 2.10.0 have an upgrade
> > > path.
> > >
> > > Thanks,
> > > Michael
> > >
> > > [0] https://github.com/apache/pulsar-helm-chart/pull/266
> > >
> >
> >
> > --
> > Best Regards,
> > Neng
>


-- 
Best Regards,
Neng


Re: [VOTE] PIP-168: Support zero-copy of NIC to NIC on Proxy

2022-05-31 Thread Neng Lu
+1 (non-binding)

On Tue, May 31, 2022 at 6:21 AM Haiting Jiang 
wrote:

> +1
>
> Thanks,
> Haiting
>
> On 2022/05/26 13:40:30 zhaocong wrote:
> > Hi Pulsar Community,
> >
> >
> > I would like to start a VOTE on "Support zero-copy of NIC to NIC on
> Proxy"
> > (PIP-168).
> >
> >
> > The proposal can be read at
> https://github.com/apache/pulsar/issues/15631
> >
> > and the discussion thread is available at
> >
> > https://lists.apache.org/thread/gjys9tvbd5hy28mbkbcq7wkqfldycn7v
> >
> >
> > Voting will stay open for at least 48h.
> >
> >
> > Thanks,
> >
> > Cong Zhao
> >
>


-- 
Best Regards,
Neng


Re: [DISCUSS] [PIP-165] Auto release client useless connections

2022-05-26 Thread Neng Lu
This is a good idea.

Also, one thing I noticed is that people are replying with +1. Is this
considered a vote?

On Thu, May 26, 2022 at 6:26 AM mattison chao 
wrote:

> +1
>
> Best,
> Mattison
>
> On Thu, 26 May 2022 at 15:25, Gavin Gao  wrote:
>
> > +1
> >
> >
> > On 2022/05/26 06:31:37 Yubiao Feng wrote:
> > > I opened a PIP to discuss automatically releasing unused client
> > > connections. Could you help me review it?
> > >
> > >
> > > ## Motivation
> > > Currently, the Pulsar client keeps the connection even if no producers
> or
> > > consumers use this connection.
> > > If a client produces messages to topic A and we have 3 brokers (1, 2, 3),
> > > then due to bundle unloading (load manager), ownership of the topic may
> > > move from broker 1 to broker 2 and finally to broker 3. For now, the
> > > client side will keep connections to all 3 brokers.
> > > We can optimize this part to reduce the broker-side connections: the
> > > client should close the unused connections.
> > >
> > > So a mechanism needs to be added to release unwanted connections.
> > >
> > > ### Why are there idle connections?
> > >
> > > 1. When the configuration `maxConnectionsPerHosts` is not set to 0, the
> > > connection is not closed at all.
> > > The design is to hold a fixed number of connections per Host, avoiding
> > > frequent closing and creation.
> > >
> > >
> >
> https://github.com/apache/pulsar/blob/72349117c4fd9825adaaf16d3588a695e8a9dd27/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConnectionPool.java#L325-L335
> > >
> > > 2-1. When clients receive `command-close`, they will reconnect immediately.
> > > It's designed to make it possible to reconnect, rebalance, and unload.
> > >
> > >
> >
> https://github.com/apache/pulsar/blob/72349117c4fd9825adaaf16d3588a695e8a9dd27/pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConnectionHandler.java#L122-L141
> > >
> > > 2-2. The broker will close client connections before writing ownership
> > > info to ZK. Clients will then get an outdated broker address when they
> > > try to look up the topic.
> > >
> > >
> >
> https://github.com/apache/pulsar/blob/72349117c4fd9825adaaf16d3588a695e8a9dd27/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentTopic.java#L1282-L1293
> > >
> > > ## Goal
> > > Automatically release connections that are no longer used.
> > >
> > > - Scope
> > >   - **Pulsar client**
> > > Contains connections used by consumers, Producers, and Transactions.
> > >
> > >   - **Pulsar proxy**
> > > Contains only the connection between Proxy and broker
> > >
> > > ## Approach
> > > Periodically check for idle connections and close them.
> > >
> > > ## Changes
> > >
> > > ### API changes
> > > **ClientCnx** added an idle check method to mark idle time.
> > >
> > > ```java
> > > /** Create time. **/
> > > private final long createTime;
> > > /** The time when marks the connection is idle. **/
> > > > private long idleMarkTime;
> > > /** The time when the last valid data was transmitted. **/
> > > private long lastWorkTime;
> > > /** Stat. enumerated values: using, idle_marked, before_release,
> > released**/
> > > private int stat;
> > > /**
> > >   * Check client connection is now free. This method may change the
> state
> > > to idle.
> > >   * This method will not change the state to idle.
> > >   */
> > > > public boolean doIdleCheck();
> > > /** Get stat **/
> > > public int getStat();
> > > /** Change stat **/
> > > public int setStat(int originalStat, int newStat);
> > > ```
> > >
> > > ### Configuration changes
> > > We can set the check frequency and release rule for idle connections at
> > > `ClientConfigurationData`.
> > >
> > > ```java
> > > @ApiModelProperty(
> > > name = "autoReleaseIdleConnectionsEnabled",
> > > value = "Do you want to automatically clean up unused
> > connections"
> > > )
> > > private boolean autoReleaseIdleConnectionsEnabled = true;
> > >
> > > @ApiModelProperty(
> > > name = "connectionMaxIdleSeconds",
> > > value = "Release the connection if it is not used for more than
> > > [connectionMaxIdleSeconds] seconds"
> > > )
> > > private int connectionMaxIdleSeconds = 180;
> > >
> > > @ApiModelProperty(
> > > name = "connectionIdleDetectionIntervalSeconds",
> > > value = "How often check idle connections"
> > > )
> > > private int connectionIdleDetectionIntervalSeconds = 60;
> > > ```
> > >
> > > ## Implementation
> > >
> > > - **Pulsar client**
> > > If no consumer, producer, or transaction uses the current connection,
> > > release it.
> > >
> > > - **Pulsar proxy**
> > > If the connection has not transmitted valid data for a long time,
> release
> > > it.
> > >
> > >
> > > Yubiao Feng
> > > Thanks
> > >
> >
>


-- 
Best Regards,
Neng


Re: [Discuss] Update Helm Chart to Support 2.10 Docker Image

2022-05-26 Thread Neng Lu
Hi All,

I'm curious: once the image runs as non-root, how can we debug or
investigate production issues inside a running cluster?

On Thu, May 19, 2022 at 12:14 PM Michael Marshall 
wrote:

> Hello Pulsar Community,
>
> With the 2.10.0 release, our Pulsar Docker images default to run as a
> non-root user. In order to use the 2.10.0 Docker image with the Apache
> Pulsar Helm Chart, we need to merge this PR [0]. If you're able,
> please review it. Once merged, I propose that we follow up with a
> release so that users wanting to upgrade to 2.10.0 have an upgrade
> path.
>
> Thanks,
> Michael
>
> [0] https://github.com/apache/pulsar-helm-chart/pull/266
>


-- 
Best Regards,
Neng


Re: [DISCUSS] PIP-157 Exclusive Producer: new mode ExclusiveWithFencing

2022-05-26 Thread Neng Lu
Hi All,

Would "Preemptive" Mode make sense?
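
To illustrate the behavior the name would describe, here is a toy model of the proposed mode: a newly attached producer always wins and the previous holder is fenced. `TopicLock` and `ToyProducer` are hypothetical names, and `IllegalStateException` stands in for `ProducerFencedException`; the real change lives in the broker and the wire protocol.

```java
// Toy producer handle that can be invalidated by the topic.
class ToyProducer {
    final String name;
    private volatile boolean fenced;

    ToyProducer(String name) {
        this.name = name;
    }

    void fence() {
        fenced = true;
    }

    // A fenced producer can no longer publish.
    void send(String msg) {
        if (fenced) {
            throw new IllegalStateException("producer " + name + " was fenced");
        }
    }
}

// Models the exclusive slot on a topic under the proposed access mode.
class TopicLock {
    private ToyProducer owner;

    // The newcomer always succeeds; the current owner, if any, is preempted.
    synchronized void attachWithFencing(ToyProducer newcomer) {
        if (owner != null && owner != newcomer) {
            owner.fence();
        }
        owner = newcomer;
    }

    synchronized ToyProducer owner() {
        return owner;
    }
}
```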

On Wed, May 11, 2022 at 8:56 AM Matteo Merli  wrote:

> +1
>
>
> On Tue, May 10, 2022 at 5:56 AM Enrico Olivelli 
> wrote:
>
> > Hello,
> > I created a new PIP about a new AccessMode for the Producer.
> > https://github.com/apache/pulsar/issues/15528
> >
> > This is the PR: https://github.com/apache/pulsar/pull/15488
> >
> > Honestly I don't like the name "ExclusiveWithFencing", any suggestion
> > is really appreciated.
> >
> > Enrico
> >
> > Motivation
> >
> > In PIP-68 we introduced two access modes for the Producer:
> >
> > Exclusive: The producer is the only one who can publish to the topic.
> > Fail if there is another Exclusive Producer connected to the topic
> > while creating the new Producer.
> > WaitForExclusive: Like Exclusive, but instead of Failing we are going
> > to wait for the current Exclusive Producer to disconnect.
> >
> > Those two modes are very powerful and allow you to perform some kind
> > of Locking on a topic.
> >
> > We are missing a third mode, in which the Producer always succeeds in
> > acquiring the Exclusive lock on the topic by fencing out any other
> > Producer that is connected, even the current Exclusive Producer and
> > the other Producers waiting in WaitForExclusive mode.
> >
> > Goal
> >
> > The modes that are available with PIP-68 require a writer to acquire
> > the lock and release it as soon as possible in order to allow other
> > clients to write to the topic.
> >
> > With the new mode it will be possible to implement locking in another
> > way: the Producer holds the lock until someone else steals it. This
> > way when you have very low contention you can achieve better latency
> > for writes because you don't have to acquire the lock every time you
> > want to write.
> >
> > API and Wire protocol Changes
> >
> > Changes:
> >
> > a new constant on the Wire Protocol for AccessMode
> > a new constant in the Java Client API AccessMode#ExclusiveWithFencing
> >
> > Implementation
> >
> > The new mode will behave mostly like AccessMode#Exclusive but instead
> > of failing in case of the presence of other Producers it will force
> > all of the current connected Producers to be removed and invalidated
> > (they will see ProducerFencedException).
> >
> > Rejected Alternatives
> >
> > None
> >
> --
> --
> Matteo Merli
> 
>


-- 
Best Regards,
Neng


Re: [DISCUSS] PIP-166: Function add NONE delivery semantics

2022-05-26 Thread Neng Lu
Some suggestions:

1. Instead of deleting `autoAck`, keep it but don't use it in the code.
Document clearly that it's deprecated for the following 2~3 releases, and
then delete it.
2. For `PulsarSinkAtLeastOnceProcessor` and
`PulsarSinkEffectivelyOnceProcessor`, if `NONE` is configured, it defaults
to ATLEAST_ONCE.
3. We need to let users know the behavior when they call `record.ack()` inside
the function implementation.
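
A rough sketch of suggestions 1 and 2, assuming the `NONE` value proposed in the PIP. `GuaranteeResolver` and its methods are hypothetical helpers for illustration; only the enum values come from the proposal.

```java
// Enum values as listed in the PIP's API Changes section.
enum ProcessingGuarantees {
    ATLEAST_ONCE,
    ATMOST_ONCE,
    EFFECTIVELY_ONCE,
    NONE
}

class GuaranteeResolver {
    // Suggestion 2: sink processors treat NONE as ATLEAST_ONCE.
    static ProcessingGuarantees forSink(ProcessingGuarantees configured) {
        return configured == ProcessingGuarantees.NONE
                ? ProcessingGuarantees.ATLEAST_ONCE
                : configured;
    }

    // Suggestion 1: autoAck is still accepted but ignored; only warn about it.
    static String deprecationWarning(Boolean autoAck) {
        return autoAck == null
                ? null
                : "autoAck is deprecated and ignored; configure ProcessingGuarantees instead";
    }

    // Per the PIP, the framework acks for the user in every mode except NONE.
    static boolean frameworkAcks(ProcessingGuarantees configured) {
        return configured != ProcessingGuarantees.NONE;
    }
}
```

Keeping the field readable during the deprecation window means existing function configs still parse, and users see one warning instead of a startup failure.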

On Thu, May 12, 2022 at 1:52 AM Baozi 
wrote:

> Hi Pulsar community,
>
> I opened https://github.com/apache/pulsar/issues/15560 to add NONE
> delivery semantics to Functions.
>
> Let me know what you think.
>
>
> Thanks,
> Baodi Shi
>
>
> ## Motivation
>
> Currently Function supports three delivery semantics, and also provides
> autoAck to control whether to automatically ack.
> Because autoAck affects the delivery semantics of Function, it can be
> confusing for users to understand the relationship between these two
> parameters.
>
> For example, when the user configures `Guarantees == ATMOST_ONCE` and
> `autoAck == false`, then the framework will not help the user to ack
> messages, and the processing semantics may become `ATLEAST_ONCE`.
>
> The delivery semantics provided by Function should be clear. When the user
> sets the guarantees, the framework should ensure point-to-point semantic
> processing and cannot be affected by other parameters.
>
> ## Goal
>
> Add `NONE` delivery semantics and delete the `autoAck` config.
>
> The original intention of `autoAck` semantics is that users want to
> control the timing of ack by themselves. When autoAck == false, the
> processing semantics provided by the framework should be invalid. Then we
> can add `NONE` processing semantics to replace the autoAck == false
> scenario.
>
> When the user configuration `ProcessingGuarantees == NONE`, the framework
> does not help the user to do any ack operations, and the ack is left to the
> user to handle. In other cases, the framework guarantees processing
> semantics.
>
> ## API Changes
> 1. Add `NONE` type to ProcessingGuarantees
> ``` java
> public enum ProcessingGuarantees {
>   ATLEAST_ONCE,
>   ATMOST_ONCE,
>   EFFECTIVELY_ONCE,
>   NONE
> }
> ```
>
> 2. Delete autoAck config in FunctionConfig
> ``` java
> public class FunctionConfig {
> -private Boolean autoAck;
> }
> ```
>
> ## Implementation
>
> 1. In `PulsarSinkAtLeastOnceProcessor` and
> `PulsarSinkEffectivelyOnceProcessor`, messages are acked only when
> `ProcessingGuarantees != NONE`.
>
> <
> https://github.com/apache/pulsar/blob/c49a977de4b0b525ec80e5070bc90eddcc7cddad/pulsar-functions/instance/src/main/java/org/apache/pulsar/functions/sink/PulsarSink.java#L274-L276
> >
>
> 2. When the delivery semantic is `ATMOST_ONCE`, the message will be acked
> immediately after receiving the message, no longer affected by the autoAck
> configuration.
>
>
> https://github.com/apache/pulsar/blob/c49a977de4b0b525ec80e5070bc90eddcc7cddad/pulsar-functions/instance/src/main/java/org/apache/pulsar/functions/instance/JavaInstanceRunnable.java#L271-L276
>
> 3. When the user calls `record.ack()` in a function, it takes effect only when
> `ProcessingGuarantees == NONE`.
>
> ## Test plan
> The main assertion to test is that when ProcessingGuarantees == NONE, the
> function framework does not perform any ack operations for the user.
>
> ## Compatibility
> 1. This change will invalidate the user's `autoAck` setting. This should be
> explained in the documentation, and parameter validation should be added to
> remind the user.
> 2. The runtimes for other languages (Python, Go) need to keep their processing
> logic consistent.
>
>
>
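
The ack rules proposed in the quoted PIP can be sketched roughly as follows. This is a minimal illustration, not Pulsar code: the `AckPolicy` class and its method names are hypothetical, and only the enum values come from the proposal.

```java
// Hypothetical sketch of the ack rules described in the proposal above.
// Only the enum values are taken from the PIP; everything else is illustrative.
enum ProcessingGuarantees { ATLEAST_ONCE, ATMOST_ONCE, EFFECTIVELY_ONCE, NONE }

final class AckPolicy {
    private AckPolicy() {}

    /** ATMOST_ONCE: the framework acks right after receiving the message. */
    static boolean ackOnReceive(ProcessingGuarantees g) {
        return g == ProcessingGuarantees.ATMOST_ONCE;
    }

    /** ATLEAST_ONCE / EFFECTIVELY_ONCE: the framework acks after the sink write succeeds. */
    static boolean ackAfterSinkWrite(ProcessingGuarantees g) {
        return g == ProcessingGuarantees.ATLEAST_ONCE
                || g == ProcessingGuarantees.EFFECTIVELY_ONCE;
    }

    /** NONE: the framework never acks, so record.ack() from user code is what counts. */
    static boolean userAckHonoured(ProcessingGuarantees g) {
        return g == ProcessingGuarantees.NONE;
    }
}
```

With this shape, exactly one of the three predicates is true for each guarantee, which is the "not affected by other parameters" property the proposal asks for.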

-- 
Best Regards,
Neng


Re: [VOTE] [PIP-153] Optimize metadataPositions in MLPendingAckStore

2022-05-26 Thread Neng Lu
Hi All,

+1 (non-binding)

On Mon, May 23, 2022 at 5:09 AM Enrico Olivelli  wrote:

> +1 (binding)
> it looks like the patch has already been committed
>
>
> https://github.com/apache/pulsar/commit/ebca19b522fd9f4496689ca7d32ede345d28511a
>
> Enrico
>
> On Mon, May 16, 2022 at 07:18, Hang Chen wrote:
> >
> > +1 (binding)
> >
> > Thanks,
> > Hang
> >
> > On Thu, May 12, 2022 at 21:17, mattison chao wrote:
> > >
> > > +1 (non-binding)
> > >
> > > Best,
> > > Mattison
> > >
> > > On Thu, 12 May 2022 at 12:15, Ran Gao  wrote:
> > >
> > > > +1 (non-binding)
> > > >
> > > > Best,
> > > > Ran
> > > >
> > > > On 2022/04/25 06:54:58 一苇以航 wrote:
> > > > > Hi Pulsar Community.
> > > > >
> > > > > > This is the voting thread for PIP-153. It will stay open for at least 48 hours.
> > > > > >
> > > > > > The proposal can be found: https://github.com/apache/pulsar/issues/15073
> > > > >
> > > > > Discuss thread:
> > > > > https://lists.apache.org/thread/svmbp8ybn6l8o0o8dmvsysnb86qj77r3
> > > > >
> > > > > Thanks,
> > > > > Xiangying
> > > > >
> > > > > 
> > > >
>


Re: [DISCUSS] Cancel the configuration of autoAck in Function framework

2022-05-09 Thread Neng Lu
Regarding your question "why AUTO_ACK is designed this way"

I think at the time it was first implemented, AUTO_ACK was just a
convenient way to help users ack messages.

We can discuss the gap between expected behavior and actual behavior and try to 
resolve or simplify it.

On 2022/05/10 01:14:07 Baozi wrote:
> Thanks for the reply,
> 
> > If AUTO_ACK is TRUE, then the JavaInstanceRunnable will be acking messages.
> > If AUTO_ACK is  FALSE, then the acking will be done by Sink implementation.
> 
> A little confused, I want to know why AUTO_ACK is designed this way.
> 
> I'll give another example:
> 
> > If AUTO_ACK is TRUE, then the JavaInstanceRunnable will be acking messages.
> 
> 
> And if Guarantees != ATMOST_ONCE, then the JavaInstanceRunnable will not ack
> the message.
> 
> > JavaInstanceRunnable#run():Line273
> 
> 
> Thanks,
> Baodi Shi
> 
> > On May 10, 2022, at 01:00, Neng Lu wrote:
> > 
> > Thanks for this detailed discussion about processing guarantee and ack.
> > These two settings are together affecting the behavior of a running 
> > function.
> > 
> > One thing I want to clarify is: 
> > AUTO_ACK setting means if the function runtime will ack messages or not. 
> > ("function runtime" here specifically refers to the JavaInstanceRunnable. 
> > If the ack happens inside a sink's implemented write method, it's not 
> > auto-ack). 
> > 
> > If AUTO_ACK is TRUE, then the JavaInstanceRunnable will be acking messages.
> > If AUTO_ACK is  FALSE, then the acking will be done by Sink implementation.
> > 
> > Now with this context, let's review your two scenarios:
> > 
> >> 1.If the user set Guarantees == ATMOST_ONCE and autoAck == false.
> > To be precise, the processing semantics is not ATLEAST_ONCE. It's actually
> > left to the implemented Sink to decide which semantics it is. It can be
> > ATMOST_ONCE, ATLEAST_ONCE and probably EFFECTIVELY_ONCE.
> > 
> >> 2. If the user thinks that the framework doesn’t auto ack when autoAck == 
> >> false
> > This behavior is actually correct based on our previous context.
> > 
> > A real problematic scenario here is when the user sets
> > ATLEAST_ONCE/EFFECTIVELY_ONCE and AUTO_ACK=true. I don't think the
> > JavaInstanceRunnable can ack for the user in these cases. So there should be
> > some check to prevent users from submitting functions with such configs.
> > 
> > 
> > 
> > On 2022/05/09 09:02:12 Baozi wrote:
> >> Hi, guys:
> >> 
> >> I found out that autoAck configuration in function framework now affects 
> >> Delivery semantics, and make it difficult for users to understand. Refer 
> >> to the following two scenarios.
> >> 
> >> 1. If the user understands that the semantics of Guarantees shall prevail
> >> 
> >> If the user set Guarantees == ATMOST_ONCE and autoAck == false. Then the 
> >> processing semantics of the actual Function will become ATLEAST_ONCE. 
> >> Refer to the following code, this scenario will not immediately ack.
> >> 
> >> JavaInstanceRunnable#run():Line273  
> >> <https://github.com/apache/pulsar/blob/c49a977de4b0b525ec80e5070bc90eddcc7cddad/pulsar-functions/instance/src/main/java/org/apache/pulsar/functions/instance/JavaInstanceRunnable.java#L271-L276>
> >> if (instanceConfig.getFunctionDetails().getProcessingGuarantees()
> >>         == org.apache.pulsar.functions.proto.Function.ProcessingGuarantees.ATMOST_ONCE) {
> >>     // auto ack only when autoAck == true
> >>     if (instanceConfig.getFunctionDetails().getAutoAck()) {
> >>         currentRecord.ack();
> >>     }
> >> }
> >> 
> >> 2. If the user thinks that the framework doesn’t auto ack when autoAck == 
> >> false
> >> 
> >> According to the following code, the framework is still automatically 
> >> acked.
> >> 
> >> PulsarSinkAtLeastOnceProcessor#sendOutputMessage():Line275 
> >> <https://github.com/apache/pulsar/blob/c49a977de4b0b525ec80e5070bc90eddcc7cddad/pulsar-functions/instance/src/main/java/org/apache/pulsar/functions/sink/PulsarSink.java#L274-L276>
> >> PulsarSinkEffectivelyOnceProcessor#sendOutputMessage():Line325 
> >> <https://github.com/apache/pulsar/blob/c49a977de4b0b525ec80e5070bc90eddcc7cddad/pulsar-functions/instance/src/main/java/org/apache/pulsar/functions/sink/PulsarSink.java#L325>
> >> 
> >> public void sendOutputMessage(TypedMessageBuilder msg, SinkRecord 
> >> record) {
> >>msg.sendAsync()
> >>

Re: [DISCUSS] Cancel the configuration of autoAck in Function framework

2022-05-09 Thread Neng Lu
> For users, sink is also part of the function framework.

^^ Is this written anywhere in the Pulsar documentation? If you look at the code
closely, the source and sink are actually configurable in the Java runtime. Users
can provide their own source/sink implementations.

On 2022/05/10 01:38:48 Baozi wrote:
> > AUTO_ACK setting means if the function runtime will ack messages or not. 
> > ("function runtime" here specifically refers to the JavaInstanceRunnable. 
> > If the ack happens inside a sink's implemented write method, it's not 
> > auto-ack). 
> The official website documentation describes it as: "Whether or not the
> framework acknowledges messages automatically."
> For users, sink is also part of the function framework.
> 
> 
> Thanks,
> Baodi Shi
> 
> > On May 10, 2022, at 09:14, Baozi wrote:
> > 
> > Thanks for this detailed discussion about processing guarantee and ack.
> > These two settings are together affecting the behavior of a running 
> > function.
> > 
> > One thing I want to clarify is: 
> > AUTO_ACK setting means if the function runtime will ack messages or not. 
> > ("function runtime" here specifically refers to the JavaInstanceRunnable. 
> > If the ack happens inside a sink's implemented write method, it's not 
> > auto-ack). 
> > 
> > If AUTO_ACK is TRUE, then the JavaInstanceRunnable will be acking messages.
> > If AUTO_ACK is FALSE, then the acking will be done by Sink implementation.
> > 
> > Now with this context, let's review your two scenarios:
> > 
> >> 1.If the user set Guarantees == ATMOST_ONCE and autoAck == false.
> > To be precise, the processing semantics is not ATLEAST_ONCE. It's actually
> > left to the implemented Sink to decide which semantics it is. It can be
> > ATMOST_ONCE, ATLEAST_ONCE and probably EFFECTIVELY_ONCE.
> > 
> >> 2. If the user thinks that the framework doesn’t auto ack when autoAck == 
> >> false
> > This behavior is actually correct based on our previous context.
> > 
> > A real problematic scenario here is when the user sets
> > ATLEAST_ONCE/EFFECTIVELY_ONCE and AUTO_ACK=true. I don't think the
> > JavaInstanceRunnable can ack for the user in these cases. So there should be
> > some check to prevent users from submitting functions with such configs.
> > 
> > 
> > 
> > On 2022/05/09 09:02:12 Baozi wrote:
> >> Hi, guys:
> >> 
> >> I found out that autoAck configuration in function framework now affects 
> >> Delivery semantics, and make it difficult for users to understand. Refer 
> >> to the following two scenarios.
> >> 
> >> 1. If the user understands that the semantics of Guarantees shall prevail
> >> 
> >> If the user set Guarantees == ATMOST_ONCE and autoAck == false. Then the 
> >> processing semantics of the actual Function will become ATLEAST_ONCE. 
> >> Refer to the following code, this scenario will not immediately ack.
> >> 
> >> JavaInstanceRunnable#run():Line273 
> >> if (instanceConfig.getFunctionDetails().getProcessingGuarantees()
> >>         == org.apache.pulsar.functions.proto.Function.ProcessingGuarantees.ATMOST_ONCE) {
> >>     // auto ack only when autoAck == true
> >>     if (instanceConfig.getFunctionDetails().getAutoAck()) {
> >>         currentRecord.ack();
> >>     }
> >> }
> >> 
> >> 2. If the user thinks that the framework doesn’t auto ack when autoAck == 
> >> false
> >> 
> >> According to the following code, the framework is still automatically 
> >> acked.
> >> 
> >> PulsarSinkAtLeastOnceProcessor#sendOutputMessage():Line275 
> >> PulsarSinkEffectivelyOnceProcessor#sendOutputMessage():Line325 
> >> 
> >> public void sendOutputMessage(TypedMessageBuilder msg, SinkRecord record) {
> >>     msg.sendAsync()
> >>         .thenAccept(messageId -> record.ack())
> >>         .exceptionally(getPublishErrorHandler(record, true));
> >> }
> >> 
> >> To sum up, users may be confused when configuring Guarantees and autoAck, 
> >> and cannot judge their 

Re: [DISCUSS] Cancel the configuration of autoAck in Function framework

2022-05-09 Thread Neng Lu
Thanks for this detailed discussion about processing guarantee and ack.
These two settings together affect the behavior of a running function.

One thing I want to clarify is:
The AUTO_ACK setting determines whether the function runtime acks messages.
("function runtime" here specifically refers to the JavaInstanceRunnable. If
the ack happens inside a sink's implemented write method, it's not auto-ack).

If AUTO_ACK is TRUE, then the JavaInstanceRunnable will be acking messages.
If AUTO_ACK is FALSE, then the acking will be done by the Sink implementation.

Now with this context, let's review your two scenarios:

> 1.If the user set Guarantees == ATMOST_ONCE and autoAck == false.
To be precise, the processing semantics is not ATLEAST_ONCE. It's actually left
to the implemented Sink to decide which semantics it is. It can be ATMOST_ONCE,
ATLEAST_ONCE and probably EFFECTIVELY_ONCE.

> 2. If the user thinks that the framework doesn’t auto ack when autoAck == 
> false
This behavior is actually correct based on our previous context.

A real problematic scenario here is when the user sets
ATLEAST_ONCE/EFFECTIVELY_ONCE and AUTO_ACK=true. I don't think the
JavaInstanceRunnable can ack for the user in these cases. So there should be some
check to prevent users from submitting functions with such configs.
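
A minimal sketch of what such a submit-time check could look like. The `FunctionConfigCheck` class, the `Guarantee` enum and the error message are hypothetical and are not existing Pulsar code; the rule itself is just the one stated above.

```java
// Hypothetical submit-time validation; names are illustrative, not Pulsar's API.
enum Guarantee { ATLEAST_ONCE, ATMOST_ONCE, EFFECTIVELY_ONCE }

final class FunctionConfigCheck {
    private FunctionConfigCheck() {}

    /** Rejects configs where the sink owns the ack but autoAck is also enabled. */
    static void validate(Guarantee guarantee, boolean autoAck) {
        boolean sinkOwnsAck = guarantee == Guarantee.ATLEAST_ONCE
                || guarantee == Guarantee.EFFECTIVELY_ONCE;
        if (sinkOwnsAck && autoAck) {
            throw new IllegalArgumentException(
                    "autoAck=true cannot be combined with " + guarantee);
        }
    }
}
```

Whether the right reaction is to reject the config or to ignore `autoAck` with a warning is a separate design question; rejecting is simply the stricter of the two.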



On 2022/05/09 09:02:12 Baozi wrote:
> Hi, guys:
> 
> I found out that autoAck configuration in function framework now affects 
> Delivery semantics, and make it difficult for users to understand. Refer to 
> the following two scenarios.
> 
> 1. If the user understands that the semantics of Guarantees shall prevail
> 
> If the user set Guarantees == ATMOST_ONCE and autoAck == false. Then the 
> processing semantics of the actual Function will become ATLEAST_ONCE. Refer 
> to the following code, this scenario will not immediately ack.
> 
> JavaInstanceRunnable#run():Line273  
> 
> if (instanceConfig.getFunctionDetails().getProcessingGuarantees()
>         == org.apache.pulsar.functions.proto.Function.ProcessingGuarantees.ATMOST_ONCE) {
>     // auto ack only when autoAck == true
>     if (instanceConfig.getFunctionDetails().getAutoAck()) {
>         currentRecord.ack();
>     }
> }
> 
> 2. If the user thinks that the framework doesn’t auto ack when autoAck == 
> false
> 
> According to the following code, the framework is still automatically acked.
> 
> PulsarSinkAtLeastOnceProcessor#sendOutputMessage():Line275 
> 
> PulsarSinkEffectivelyOnceProcessor#sendOutputMessage():Line325 
> 
> 
> public void sendOutputMessage(TypedMessageBuilder msg, SinkRecord record) {
>     msg.sendAsync()
>         .thenAccept(messageId -> record.ack())
>         .exceptionally(getPublishErrorHandler(record, true));
> }
> 
> To sum up, users may be confused when configuring Guarantees and autoAck, and 
> cannot judge their correct expected behavior.
> 
> I would like to discuss whether it is possible to cancel the autoAck 
> configuration and add a CUSTOM type for Guarantees.
> 
> switch (processingGuarantees) {
>   case Guarantees.ATMOST_ONCE:      // the framework acks immediately after consuming the message
>   case Guarantees.ATLEAST_ONCE:     // the framework acks after processing on the source side
>   case Guarantees.EFFECTIVELY_ONCE: // the framework acks after processing on the source side
>   case Guarantees.CUSTOM:           // the framework performs no acks and gives no semantic guarantees
> }
> 
> If you have any ideas, please share them. If everyone agrees with this idea,
> I will open a PIP to drive the implementation.
> 
> Thanks,
> Baodi Shi
> 
> 


Re: Build Pulsar Server on Java 17- too strict ?

2022-05-09 Thread Neng Lu
+1 for requiring JDK11 and preparing for JDK17

On 2022/05/09 11:03:27 Enrico Olivelli wrote:
> I am sorry,
> I have missed this thread.
> 
> I believe that requiring JDK17 to build and especially to RUN the
> Pulsar broker is not a good idea currently.
> Many enterprises, especially the bigger ones, or banks and insurance
> companies, have strict requirements on some components and they are
> very slow to accept bleeding-edge technologies.
> 
> I believe that it is good to run CI on JDK17 and also to build the
> docker images on JDK17.
> But I know a few companies who won't be able to switch to JDK17 very quickly.
> 
> I think it is better to require JDK11 at this moment, and not JDK17,
> otherwise users will be stuck with Pulsar 2.10 for a long time.
> 
> Requiring JDK17 would be justified only if there is some required new
> feature, but this is not the case.
> 
> So I propose to change the required JDK version to build and run to
> JDK11 for the server part and JDK8 for the client.
> 
> Enrico
> 
> On Mon, May 9, 2022 at 12:03, Lari Hotari wrote:
> >
> > PIP-156 PR https://github.com/apache/pulsar/pull/15264 has been merged to 
> > master branch.
> >
> > Please notice that Java 17 is now required for building Pulsar master 
> > branch.
> >
> > btw. https://sdkman.io/ is handy for managing multiple JDK versions in 
> > local development environments.
> >
> > -Lari
> >
> >
> > On 2022/04/20 16:37:21 Heesung Sohn wrote:
> > > Dear Pulsar Community,
> > >
> > > Please review and vote on this PIP.
> > >
> > > PIP link : https://github.com/apache/pulsar/issues/15207
> > >
> > > Thank you,
> > > --
> > >
> > > 
> > >
> > > Heesung Sohn
> > >
> > > Platform Engineer
> > >
> > > e: heesung.s...@streamnative.io
> > >
> > > streamnative.io
> > >
> > > 
> > > 
> > > 
> > >
> 


Re: [DISCUSS] Releasing Pulsar-client-go 0.8.0

2022-02-10 Thread Neng Lu
+1 non-binding

On Wed, Feb 9, 2022 at 6:46 PM Matteo Merli  wrote:

> +1
>
>
> --
> Matteo Merli
> 
>
> On Wed, Feb 9, 2022 at 6:44 PM r...@apache.org 
> wrote:
> >
> > Hello Everyone:
> >
> >
> > I hope you’ve all been doing well. In the past two months, we have
> >
> > fixed a number of bugs related to connection leaks and added
> >
> > some new features. For more information refer to:
> >
> >
> > https://github.com/apache/pulsar-client-go/milestone/9
> > 
> >
> >
> > For that reason, I think we should be releasing a 0.8.0 version with
> >
> > what we have today.
> >
> >
> > --
> >
> >
> > Thanks
> >
> > Xiaolong Ran
>


-- 
Best Regards,
Neng


Re: [DISCUSS] Release Pulsar 2.8.3

2022-02-10 Thread Neng Lu
+1 non-binding

On Thu, Feb 10, 2022 at 1:09 AM Hang Chen  wrote:

> +1
>
> Best,
> Hang
>
> On Thu, Feb 10, 2022 at 12:06, PengHui Li wrote:
> >
> > +1
> >
> > Thank you!
> >
> > - Penghui
> >
> > On Thu, Feb 10, 2022 at 3:18 AM Lari Hotari  wrote:
> >
> > > +1
> > > Thank you, Michael, for volunteering to be the release manager for
> 2.8.3.
> > >
> > > -Lari
> > >
> > > On Wed, Feb 9, 2022 at 8:16 PM Michael Marshall 
> > > wrote:
> > >
> > > > Hello Pulsar Community,
> > > >
> > > > We have had several important fixes since we released 2.8.2 a month
> > > > ago. I propose we start the process to release 2.8.3, and I volunteer
> > > > to be the release manager.
> > > >
> > > > Here [0] you can find the list of 90 commits to branch-2.8 since the
> > > > 2.8.2 release. There are 14 closed PRs targeting 2.8.3 that have not
> > > > yet been cherry-picked [1]. I will start reviewing and cherry-picking
> > > > these.
> > > >
> > > > There are 16 open PRs labeled with `release/2.8.3` [1]. I'll follow
> up
> > > > on each of those PRs to see if they will be completed soon or will
> > > > need to be pushed to 2.8.4.
> > > >
> > > > Thanks,
> > > > Michael
> > > >
> > > > [0] - https://github.com/apache/pulsar/compare/v2.8.2...branch-2.8
> > > > [1] -
> > > >
> > >
> https://github.com/apache/pulsar/pulls?q=is%3Apr+label%3Arelease%2F2.8.3+-label%3Acherry-picked%2Fbranch-2.8+is%3Aopen
> > > >
> > >
>


-- 
Best Regards,
Neng


Re: [VOTE] PIP-86: Pulsar Functions: Preload and release external resources

2022-01-24 Thread Neng Lu
Thanks for your participation.
Closing the vote with 3 binding +1s and 1 non-binding +1.

On Fri, Jan 21, 2022 at 3:19 PM 陳智弘  wrote:

> +1
>
> On Sat, Jan 22, 2022 at 05:27, Niclas Hedhman wrote:
>
> >
> > +1, non-binding
> >
> > On 2022-01-21 21:07, Neng Lu wrote:
> > > Hi All,
> > >
> > > I would like to start a VOTE on the PIP 86. (If it's already been
> > > voted,
> > > please let me know.)
> > >
> > > The issue for PIP 86 is here:
> > >
> >
> https://github.com/apache/pulsar/wiki/PIP-86%3A-Pulsar-Functions%3A-Preload-and-release-external-resources
> > > And the initial implementation is here:
> > > https://github.com/apache/pulsar/pull/13205
> > >
> > > Please VOTE within 48 hours.
> > >
> > > Best Regards,
> > > Neng Lu
> >
>


-- 
Best Regards,
Neng


[VOTE] PIP-86: Pulsar Functions: Preload and release external resources

2022-01-21 Thread Neng Lu
Hi All,

I would like to start a VOTE on PIP-86. (If it has already been voted on,
please let me know.)

The issue for PIP 86 is here:
https://github.com/apache/pulsar/wiki/PIP-86%3A-Pulsar-Functions%3A-Preload-and-release-external-resources
And the initial implementation is here:
https://github.com/apache/pulsar/pull/13205

Please VOTE within 48 hours.

Best Regards,
Neng Lu


Re: [DISCUSS] PIP-86: Pulsar Functions: Preload and release external resources

2022-01-20 Thread Neng Lu
Hi Everyone,

I would like to call for a vote on PIP-86 in a separate email.
Let me know if we've already done that.

On Thu, Jan 20, 2022 at 12:07 PM Enrico Olivelli 
wrote:

> Neng,
>
> On Thu, Jan 20, 2022 at 20:20, Neng Lu wrote:
>
> > Hi All,
> >
> > Just want to bring this PIP[1] to your attention. The PRs [2] have been
> > open for quite some time. The feature is valuable for many use cases and
> I
> > would like to help the original author to push the effort on it.
> >
> > The general idea is introducing a new API for Pulsar Functions which
> allows
> > develop to customize some setup and close logic.
>
>
> I am +1  on your proposal.
> I left some feedback on the second PR. Basically we need some unit tests
> and integration tests.
> The first PR looks obsolete, please close it.
>
> Enrico
>
> The API should look like
> > this:
> >
> > ```
> > public interface RichFunction extends Function{
> >
> > /**
> >  * Called when function instance start
> >  *
> >  * @throws Exception
> >  */
> > void setup(Context context) throws Exception;
> >
> > /**
> >  * Called when function instance close
> >  *
> >  * @throws Exception
> >  */
> > void tearDown() throws Exception;
> > }
> > ```
> >
> > Please join the discussion if you have any questions or concerns for this
> > new API.
> >
> > [1] PIP-86
> > <
> >
> https://github.com/apache/pulsar/wiki/PIP-86%3A-Pulsar-Functions%3A-Preload-and-release-external-resources
> > >
> > [2] PR-2 <https://github.com/apache/pulsar/pull/2> PR-13205
> > <https://github.com/apache/pulsar/pull/13205>
> >
> > Best regards,
> > Neng
> >
>


-- 
Best Regards,
Neng


[DISCUSS] PIP-86: Pulsar Functions: Preload and release external resources

2022-01-20 Thread Neng Lu
Hi All,

Just want to bring this PIP[1] to your attention. The PRs [2] have been
open for quite some time. The feature is valuable for many use cases and I
would like to help the original author to push the effort on it.

The general idea is to introduce a new API for Pulsar Functions that allows
developers to customize setup and teardown logic. The API should look like
this:

```
public interface RichFunction extends Function {

    /**
     * Called when the function instance starts.
     *
     * @throws Exception
     */
    void setup(Context context) throws Exception;

    /**
     * Called when the function instance closes.
     *
     * @throws Exception
     */
    void tearDown() throws Exception;
}
```
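
For illustration, a function using these hooks might look like the sketch below. The `Context`, `Function` and `RichFunction` declarations here are simplified stand-ins so that the example compiles on its own (the type parameters and the single `getFunctionName` method are assumptions), and `LookupFunction` is a hypothetical user function, not part of the PR.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins; the real Pulsar Functions interfaces have more methods.
interface Context { String getFunctionName(); }
interface Function<I, O> { O process(I input, Context context) throws Exception; }
interface RichFunction<I, O> extends Function<I, O> {
    void setup(Context context) throws Exception;
    void tearDown() throws Exception;
}

// Hypothetical function that loads a lookup table once instead of once per message.
class LookupFunction implements RichFunction<String, String> {
    private Map<String, String> table; // the "external resource"

    @Override
    public void setup(Context context) {
        table = new HashMap<>();
        table.put("a", "alpha");
    }

    @Override
    public String process(String input, Context context) {
        return table.getOrDefault(input, "unknown");
    }

    @Override
    public void tearDown() {
        table = null; // release the resource when the instance closes
    }

    boolean isReleased() {
        return table == null;
    }
}
```

The point of the two hooks is that expensive resources (connections, caches, models) are created once per instance in `setup` and released in `tearDown`, rather than being managed inside `process`.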

Please join the discussion if you have any questions or concerns for this
new API.

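[1] PIP-86
<https://github.com/apache/pulsar/wiki/PIP-86%3A-Pulsar-Functions%3A-Preload-and-release-external-resources>
[2] PR-2 <https://github.com/apache/pulsar/pull/2> PR-13205
<https://github.com/apache/pulsar/pull/13205>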

Best regards,
Neng


Re: [VOTE] PIP-135: Include MetadataStore backend for Etcd

2022-01-20 Thread Neng Lu
+1 (non-binding)

This is very interesting

On Mon, Jan 17, 2022 at 5:08 AM Lan Liang  wrote:

> +1, Thanks for your work!
>
>
> - lan.liang
>
>
> On 01/17/2022 20:50,ZhangJian He wrote:
> +1
>
> On Mon, Jan 17, 2022 at 14:01, PengHui Li wrote:
>
> +1 (binding)
>
> Regards,
> Penghui
>
> On Sat, Jan 15, 2022 at 9:24 PM Joe F  wrote:
>
> +1 (binding)
>
> On Sat, Jan 15, 2022 at 4:46 AM Enrico Olivelli 
> wrote:
>
> On Sat, Jan 15, 2022 at 09:10, tamer Abdlatif wrote:
>
> Will that affect the existing ZK metadata in a Pulsar instance when we
> upgrade from a lower version to 2.10? In other words,
> do we need a metadata migration to switch from ZK to Etcd?
>
>
> There is no need to migrate.
>
> Most probably the first release will bring this feature as non-production-ready,
> and it will take some time to stabilise.
>
> Enrico
>
>
>
>
> Thanks
> Tamer
>
>
>
> On Fri, 14 Jan 2022, 22:52 Matteo Merli, 
> wrote:
>
> https://github.com/apache/pulsar/issues/13717
>
> -
>
> ## Motivation
>
> Since all the pieces that composed the proposal in PIP-45 were finally merged
> and are currently ready for the 2.10 release, it is now possible to add other
> metadata backends that can be used to support a BookKeeper + Pulsar cluster.
>
> One of the popular systems that is most commonly used as an alternative to
> ZooKeeper is Etcd, thus it makes sense to have this as the first non-ZooKeeper
> implementation.
>
> ## Goal
>
> Provide an Etcd implementation for the `MetadataStore` API. This will allow
> users to deploy Pulsar clusters using the Etcd service for the metadata, and it
> will not require the presence of ZooKeeper.
>
>
> ## Implementation
>
> * Use the existing JEtcd Java client library for Etcd
> * Extend the `AbstractBatchedMetadataStore` class, in order to reuse the
>   transparent batching logic that will be shared with the ZooKeeper
>   implementation.
>
> Work in progress: https://github.com/apache/pulsar/pull/13225
>
> --
> Matteo Merli
> 
>
>
>
>
>
>

-- 
Best Regards,
Neng


Re: [VOTE] PIP-132: Include message header size when check maxMessageSize of non-batch message on the client side.

2022-01-13 Thread Neng Lu
+1 (non-binding)

On Wed, Jan 12, 2022 at 7:07 PM PengHui Li  wrote:

> +1 (binding)
>
> This is a behavior change, which we should highlight in the release note.
>
> Penghui
>
> On Thu, Jan 13, 2022 at 12:44 AM Chris Herzog 
> wrote:
>
> > +1 (non)
> >
> >
> > On Tue, Jan 11, 2022 at 9:46 PM r...@apache.org wrote:
> >
> > > +1 (non-binding)
> > >
> > > --
> > > Thanks
> > > Xiaolong Ran
> > >
> > > mattison chao  于2022年1月12日周三 10:15写道:
> > >
> > > > +1  (non-binding)
> > > >
> > > > On Wed, 12 Jan 2022 at 09:59, Zike Yang wrote:
> > > >
> > > > > +1 (non-binding)
> > > > >
> > > > > > On Wed, Jan 12, 2022 at 9:58 AM Haiting Jiang <jianghait...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > This is the voting thread for PIP-132. It will stay open for at least 48 hours.
> > > > > >
> > > > > > https://github.com/apache/pulsar/issues/13591
> > > > > >
> > > > > > Pasted below for quoting convenience.
> > > > > >
> > > > > > 
> > > > > >
> > > > > > ## Motivation
> > > > > >
> > > > > > > Currently, the Pulsar client (Java) only checks the payload size for max message size validation.
> > > > > > >
> > > > > > > The client throws TimeoutException if we produce a message with too many properties, see [1].
> > > > > > > But the root cause is a TooLongFrameException triggered in the broker.
> > > > > > >
> > > > > > > In this PIP, I propose to include the message header size when checking maxMessageSize of
> > > > > > > non-batch messages. This brings the following benefits:
> > > > > > > 1. Clients can throw InvalidMessageException immediately if properties take too much storage space.
> > > > > > > 2. This will make the behaviour consistent with the topic-level max message size check in the broker.
> > > > > > > 3. Strictly limit the entry size to less than maxMessageSize, avoiding failures when sending
> > > > > > > messages to BookKeeper.
> > > > > > >
> > > > > > > ## Goal
> > > > > > >
> > > > > > > Include the message header size when checking maxMessageSize for non-batch messages on the client side.
> > > > > >
> > > > > > ## Implementation
> > > > > >
> > > > > > ```
> > > > > > > // Add a size check in org.apache.pulsar.client.impl.ProducerImpl#processOpSendMsg
> > > > > > > if (op.msg != null // for non-batch messages only
> > > > > > >         && op.getMessageHeaderAndPayloadSize() > ClientCnx.getMaxMessageSize()) {
> > > > > > >     // finish send op with InvalidMessageException
> > > > > > >     releaseSemaphoreForSendOp(op);
> > > > > > >     op.sendComplete(new PulsarClientException(new InvalidMessageException, op.sequenceId));
> > > > > > > }
> > > > > > >
> > > > > > >
> > > > > > > // org.apache.pulsar.client.impl.ProducerImpl.OpSendMsg#getMessageHeaderAndPayloadSize
> > > > > > > public int getMessageHeaderAndPayloadSize() {
> > > > > > >     ByteBuf cmdHeader = cmd.getFirst();
> > > > > > >     cmdHeader.markReaderIndex();
> > > > > > >     int totalSize = cmdHeader.readInt();
> > > > > > >     int cmdSize = cmdHeader.readInt();
> > > > > > >     int msgHeadersAndPayloadSize = totalSize - cmdSize - 4;
> > > > > > >     cmdHeader.resetReaderIndex();
> > > > > > >     return msgHeadersAndPayloadSize;
> > > > > > > }
> > > > > > ```
> > > > > >
> > > > > > > ## Rejected Alternatives
> > > > > > > Add a new property like "maxPropertiesSize" or "maxHeaderSize" in broker.conf and pass it to
> > > > > > > the client like maxMessageSize. But the implementation is much more complex, and it doesn't
> > > > > > > have benefits 2 and 3 mentioned in Motivation.
> > > > > > >
> > > > > > > ## Compatibility Issue
> > > > > > > As a matter of fact, this PIP narrows down the sendable range. Previously, when maxMessageSize
> > > > > > > is 1KB, it's OK to send a message with 1KB properties and 1KB payload. But with this PIP, the
> > > > > > > send will fail with InvalidMessageException.
> > > > > > >
> > > > > > > One conservative way is to add a boolean config "includeHeaderInSizeCheck" to enable this
> > > > > > > feature. But I think it's OK to enable this directly as it's more reasonable, and I don't see
> > > > > > > a good migration plan if we add a config for this.
> > > > > >
> > > > > > [1] https://github.com/apache/pulsar/issues/13560
> > > > > >
> > > > > > Thanks,
> > > > > > Haiting Jiang
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Zike Yang
> > > > >
> > > >
> > >
> >
> >
> > --
> >
> >
> > Chris Herzog Messaging Team | O 630 300 7718 | M 815 263 3764 |
> > www.tibco.com
> >
> > 
> >
>
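
The size arithmetic in the quoted helper can be illustrated with a small standalone sketch. It uses `java.nio.ByteBuffer` instead of Netty's `ByteBuf`, and it assumes a frame laid out as `[totalSize][cmdSize][command][headers + payload]`, where `totalSize` does not count its own 4 bytes; that layout is inferred from the quoted code, and `FrameSizeCheck` is a hypothetical name.

```java
import java.nio.ByteBuffer;

// Standalone illustration of "totalSize - cmdSize - 4"; not Pulsar client code.
final class FrameSizeCheck {
    private FrameSizeCheck() {}

    /** Builds [totalSize][cmdSize][command][headersAndPayload] (assumed layout). */
    static ByteBuffer buildFrame(byte[] command, byte[] headersAndPayload) {
        // totalSize covers the cmdSize field (4 bytes), the command, and the rest.
        int totalSize = 4 + command.length + headersAndPayload.length;
        ByteBuffer frame = ByteBuffer.allocate(4 + totalSize);
        frame.putInt(totalSize).putInt(command.length).put(command).put(headersAndPayload);
        frame.flip();
        return frame;
    }

    /** Reads both size fields with absolute gets, so the buffer position is untouched. */
    static int headersAndPayloadSize(ByteBuffer frame) {
        int start = frame.position();
        int totalSize = frame.getInt(start);
        int cmdSize = frame.getInt(start + 4);
        return totalSize - cmdSize - 4;
    }

    /** True when message headers plus payload exceed the configured limit. */
    static boolean exceeds(ByteBuffer frame, int maxMessageSize) {
        return headersAndPayloadSize(frame) > maxMessageSize;
    }
}
```

Under this layout, a message with 1KB of properties and 1KB of payload yields a headers-plus-payload size of 2048 bytes, which is the case the Compatibility section says would now be rejected when maxMessageSize is 1KB.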


-- 
Best Regards,
Neng


Re: [DISCUSS] Recent Checkstyle PRs

2022-01-12 Thread Neng Lu
Hi all,

Not all modules have checkstyle enabled.
I think we need to make sure the behavior is consistent across all modules.

On Wed, Jan 12, 2022 at 9:42 AM Michael Marshall 
wrote:

> Hi Pulsar Community,
>
> I notice that we have had several recent PRs adding checkstyle to more
> of our modules:
>
> https://github.com/apache/pulsar/pull/13409
> https://github.com/apache/pulsar/pull/13413
> https://github.com/apache/pulsar/pull/13343
> https://github.com/apache/pulsar/pull/13284
> https://github.com/apache/pulsar/pull/13676
>
> The above is an incomplete list.
>
> In order to minimize divergence for release branches and master, I
> think we should cherry-pick all of these PRs to our active release
> branches, and I propose that future CheckStyle PRs be cherry-picked to
> active branches when they're merged to master.
>
> At the time of writing this email, none of the above PRs have been
> cherry-picked yet.
>
> Let me know what you think. I am not able to do any cherry picking
> this week, but I might be able to help out on this task next week.
>
> Thanks,
> Michael
>


-- 
Best Regards,
Neng


Re: [VOTE] PIP-117: Change Pulsar standalone defaults

2022-01-11 Thread Neng Lu
+1 (non-binding)

On Wed, Jan 5, 2022 at 7:19 AM Lan Liang  wrote:

> +1
>
>
>
>
>
>
> Best Regards,
> Lan Liang
> On 12/23/2021 19:21,Haiting Jiang wrote:
> +1
>
> Thanks,
> Haiting
>
> On 2021/12/23 05:35:03 Michael Marshall wrote:
> +1
>
> - Michael
>
> On Wed, Dec 22, 2021 at 6:18 PM Sijie Guo  wrote:
>
> +1
>
> On Tue, Dec 21, 2021 at 3:49 PM Matteo Merli  wrote:
>
> This is the voting thread for PIP-117. It will stay open for at least 48h.
>
> https://github.com/apache/pulsar/issues/13302
>
> 
>
> ## Motivation
>
> Pulsar standalone is the "Pulsar in a box" version of a Pulsar cluster, where
> all the components are started within the context of a single JVM process.
>
> Users are using the standalone as a way to get quickly started with Pulsar or
> in all the cases where it makes sense to have a single node deployment.
>
> Right now, the standalone is starting by default with many components, several of
> which are quite complex, since they are designed to be deployed in a distributed
> fashion.
>
> ## Goal
>
> Simplify the components of Pulsar standalone to achieve:
>
> 1. Reduce complexity
> 2. Reduce startup time
> 3. Reduce memory and CPU footprint of running standalone
>
> ## Proposed changes
>
> The proposal here is to change some of the default implementations that are
> used for the Pulsar standalone.
>
> 1. **Metadata Store implementation** -->
> Change from ZooKeeper to RocksDB
>
> 2. **Pulsar functions package backend** -->
> Change from using DistributedLog to using local filesystem, storing
> the
> jars directly in the data folder instead of uploading them into BK.
>
> 3. **Pulsar functions state store implementation** -->
> Change the state store to be backed by a MetadataStore-based backend,
> with the RocksDB implementation.
>
> 4. **Table Service** -->
> Do not start BK table service by default
>
> ## Compatibility considerations
>
> In order to avoid compatibility issues where users have existing Pulsar
> standalone services that they want to upgrade without conflicts, we will
> follow the principle of keeping the old defaults where there is existing
> data on the disk.
>
> We will add a file that serves as a flag in the `data/standalone`
> directory, for example `new-2.10-defaults`.
>
> If the file is present, or if the data directory is completely missing, we
> will adopt the new set of default configuration settings.
>
> If the file is not there, we will continue to use existing defaults and we
> will
> not break the upgrade operation.
>
>
> --
> Matteo Merli
> 
>
>
>

-- 
Best Regards,
Neng
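
The compatibility rule quoted above (new defaults when the flag file is present or the data directory is missing, old defaults otherwise) can be sketched roughly as follows. This is only an illustration, not the actual standalone code: the class and method names are made up, and `new-2.10-defaults` is just the example file name from the proposal.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (not Pulsar's real code) that applies the PIP-117
// compatibility rule to pick a set of standalone defaults.
final class StandaloneDefaults {

    // Flag file name taken from the example given in the proposal.
    static final String FLAG_FILE = "new-2.10-defaults";

    // New defaults apply when the data directory is missing entirely,
    // or when the flag file is present inside it.
    static boolean useNewDefaults(Path dataDir) {
        if (!Files.exists(dataDir)) {
            return true;
        }
        return Files.exists(dataDir.resolve(FLAG_FILE));
    }

    // On a fresh start, create the directory and the flag file so that
    // later restarts keep selecting the new defaults.
    static void markNewDefaults(Path dataDir) throws IOException {
        Files.createDirectories(dataDir);
        Path flag = dataDir.resolve(FLAG_FILE);
        if (!Files.exists(flag)) {
            Files.createFile(flag);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dataDir = Path.of(args.length > 0 ? args[0] : "data/standalone");
        boolean newDefaults = useNewDefaults(dataDir);
        if (newDefaults) {
            markNewDefaults(dataDir);
        }
        System.out.println(newDefaults ? "new defaults" : "legacy defaults");
    }
}
```

An existing data directory without the flag file falls through to the old defaults, which is what keeps upgrades of existing standalone installations safe.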


Re: [DISCUSSION] PIP-133 Pulsar Functions Add API For Accessing Other Function States

2022-01-11 Thread Neng Lu
Before we advance further, we first need to get on the same page about the
pros and cons of allowing this feature.

If functions can access other functions' state (especially with write
access), data ownership will be a mess, isolation will be broken, and data
security might be compromised.





On Wed, Jan 5, 2022 at 3:45 PM Ethan Merrill 
wrote:

> Original PIP: https://github.com/apache/pulsar/issues/13633
>
> Pasted below for quoting convenience.
>
> -
>
> ## Motivation
>
> Certain uses of Pulsar functions could benefit from the ability to access
> the states of other functions. Currently functions can only access their
> own states, and so sharing information between functions requires other
> solutions such as writing to a separate database.
>
> ## Goal
>
> The goal is to enable the ability for a function to access another
> function's state. Given another function's tenant, namespace, and name, any
> function should be able to access the other function's state for read and
> write purposes. This PIP is not concerned with expanding the capabilities
> of function states; it only deals with expanding access to function states.
>
> ## API Changes
>
> The Pulsar function API would be modified to have the function context
> implement the following interface for accessing function states using a
> tenant, namespace, and name.
>
> ```
> public interface SharedContext {
> /**
>  * Update the state value for the key.
>  *
>  * @param key   name of the key
>  * @param value state value of the key
>  * @param tenant the state tenant name
>  * @param ns the state namespace name
>  * @param name the state store name
>  */
> void putState(String key, ByteBuffer value, String tenant, String ns,
> String name);
>
> /**
>  * Update the state value for the key, but don't wait for the
>  * operation to be completed.
>  *
>  * @param key   name of the key
>  * @param value state value of the key
>  * @param tenant the state tenant name
>  * @param ns the state namespace name
>  * @param name the state store name
>  */
> CompletableFuture<Void> putStateAsync(String key, ByteBuffer value,
> String tenant, String ns, String name);
>
> /**
>  * Retrieve the state value for the key.
>  *
>  * @param key name of the key
>  * @param tenant the state tenant name
>  * @param ns the state namespace name
>  * @param name the state store name
>  * @return the state value for the key.
>  */
> ByteBuffer getState(String key, String tenant, String ns, String name);
>
> /**
>  * Retrieve the state value for the key, but don't wait for the
>  * operation to be completed.
>  *
>  * @param key name of the key
>  * @param tenant the state tenant name
>  * @param ns the state namespace name
>  * @param name the state store name
>  * @return the state value for the key.
>  */
> CompletableFuture<ByteBuffer> getStateAsync(String key, String tenant,
> String ns, String name);
>
> /**
>  * Delete the state value for the key.
>  *
>  * @param key   name of the key
>  * @param tenant the state tenant name
>  * @param ns the state namespace name
>  * @param name the state store name
>  */
> void deleteState(String key, String tenant, String ns, String name);
>
> /**
>  * Delete the state value for the key, but don't wait for the
>  * operation to be completed.
>  *
>  * @param key   name of the key
>  * @param tenant the state tenant name
>  * @param ns the state namespace name
>  * @param name the state store name
>  */
> CompletableFuture<Void> deleteStateAsync(String key, String tenant,
> String ns, String name);
>
> /**
>  * Increment the builtin distributed counter referred by key.
>  *
>  * @param key    The name of the key
>  * @param amount The amount to be incremented
>  * @param tenant the state tenant name
>  * @param ns the state namespace name
>  * @param name the state store name
>  */
> void incrCounter(String key, long amount, String tenant, String ns,
> String name);
>
> /**
>  * Increment the builtin distributed counter referred by key,
>  * but don't wait for the completion of the increment operation.
>  *
>  * @param key    The name of the key
>  * @param amount The amount to be incremented
>  * @param tenant the state tenant name
>  * @param ns the state namespace name
>  * @param name the state store name
>  */
> CompletableFuture<Void> incrCounterAsync(String key, long amount,
> String tenant, String ns, String name);
>
> /**
>  * Retrieve the counter value for the key.
>  *
>  * @param key name of the key
>  * @param tenant the state tenant name
>  * @param ns the state namespace name
>  * @param name the state store name
>  * @return the amount of the counter value for this key
>  */
> long getCounter(String key, 

Re: [VOTE] PIP-121: Pulsar cluster level auto failover on client side

2022-01-11 Thread Neng Lu
+1 (non-binding)

On Mon, Jan 10, 2022 at 12:40 AM PengHui Li  wrote:

> +1 (binding)
>
> Penghui
>
> On Mon, Jan 10, 2022 at 4:38 PM Enrico Olivelli 
> wrote:
>
> > +1 (binding)
> >
> > Enrico
> >
> > Il giorno lun 10 gen 2022 alle ore 07:45 Hang Chen
> >  ha scritto:
> > >
> > > This is the voting thread for PIP-121. It will stay open for at least
> 48
> > > hours.
> > >
> > > https://github.com/apache/pulsar/issues/13315
> > >
> > > Pasted below for quoting convenience.
> > >
> > > -
> > > ### Motivation
> > > We have geo-replication to support Pulsar cluster level failover. We
> > > can set up Pulsar cluster A as a primary cluster in data center A, and
> > > setup Pulsar cluster B as backup cluster in data center B. Then we
> > > configure geo-replication between cluster A and cluster B. All the
> > > clients are connected to the Pulsar cluster by DNS. If cluster A is
> > > down, we should switch the DNS to point the target Pulsar cluster from
> > > cluster A to cluster B. After the clients are resolved to cluster B,
> > > they can produce and consume messages normally. After cluster A
> > > recovers, the administrator should switch the DNS back to cluster A.
> > >
> > > However, the current method has two shortcomings.
> > > 1. The administrator should monitor the status of all Pulsar clusters,
> > > and switch the DNS as soon as possible when cluster A is down. The
> > > switch and recovery is not automatic and recovery time is controlled
> > > by the administrator, which will put the administrator under heavy
> > > load.
> > > > 2. The Pulsar client and the DNS system both cache lookups. When the
> > > > administrator switches the DNS from cluster A to cluster B, it takes
> > > > some time for the caches to expire, which delays client recovery and
> > > > causes produce/consume requests to fail.
> > >
> > > ### Goal
> > > It's better to provide an automatic cluster level failure recovery
> > > mechanism to make pulsar cluster failover more effective. We should
> > > support pulsar clients auto switching from cluster A to cluster B when
> > > it detects cluster A has been down according to the configured
> > > detecting policy and switch back to cluster A when it has recovered.
> > > The reason why we should switch back to cluster A is that most
> > > applications may be deployed in data center A and they have low
> > > network cost for communicating with pulsar cluster A. If they keep
> > > visiting pulsar cluster B, they have high network cost, and cause high
> > > produce/consume latency.
> > >
> > > > To mitigate the DNS cache problem, we should provide an
> > > > administrator controlled switch provider for administrators to update
> > > > service URLs.
> > >
> > > In the end, we should provide an auto service URL switch provider and
> > > administrator controlled switch provider.
> > >
> > > ### Design
> > > > We have already provided the `ServiceUrlProvider` interface to support
> > > > different service URLs. In order to support automatic cluster level
> > > > failure recovery, we can provide different `ServiceUrlProvider`
> > > > implementations. For current requirements, we can provide
> > > > `AutoClusterFailover` and `ControlledClusterFailover`.
> > >
> > > > #### AutoClusterFailover
> > > In order to support auto switching from the primary cluster to the
> > > secondary, we can provide a probe task, which will probe the activity
> > > of the primary cluster and the secondary one. When it finds the
> > > primary cluster failed more than `failoverDelayMs`, it will switch to
> > > the secondary cluster by calling `updateServiceUrl`. After switching
> > > to the secondary cluster, the `AutoClusterFailover` will continue to
> > > probe the primary cluster. If the primary cluster comes back and
> > > remains active for `switchBackDelayMs`, it will switch back to the
> > > primary cluster.
> > > The APIs are listed as follows.
> > >
> > > > In order to support multiple secondary clusters, use List to store
> > > > secondary cluster urls. When the primary cluster probe fails for
> > > > failoverDelayMs, it will start to probe the secondary cluster list one
> > > > by one. Once it finds an active cluster, it will switch to that
> > > > cluster. Note: if you configure multiple clusters, you should turn
> > > > on cluster level geo-replication to keep the topic data in sync between
> > > > all primary and secondary clusters. Otherwise, the topic data may be
> > > > spread across different clusters, and consumers won't get the
> > > > complete data of the topic.
> > >
> > > > In order to support different authentication configurations between
> > > > clusters, the authentication-related configuration is updated
> > > > together with the target cluster.
> > >
> > > ```Java
> > > public class AutoClusterFailover implements ServiceUrlProvider {
> > >
> > >private AutoClusterFailover(AutoClusterFailoverBuilderImpl builder)
> {
> > > //
> > > }
> > >
> > > @Override
> > > public void 

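The quoted PIP-121 text is cut off before its API listing, so here is a rough, self-contained sketch of the probe logic it describes: fail over once the primary has been down for `failoverDelayMs`, walk the secondary list in order, and switch back once the primary has stayed up for `switchBackDelayMs`. The class below is hypothetical and is not the `AutoClusterFailover` implementation. The health check is passed in as a predicate, and "for at least the delay" is an assumption about how the thresholds are compared.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical state machine for the failover rule described in PIP-121.
// It only tracks which service URL should be in use; it does no networking.
final class FailoverSketch {
    private final String primary;
    private final List<String> secondaries;
    private final long failoverDelayMs;
    private final long switchBackDelayMs;

    private String current;
    private long primaryFailedSince = -1;    // -1: primary not currently failing
    private long primaryRecoveredSince = -1; // -1: primary not currently recovering

    FailoverSketch(String primary, List<String> secondaries,
                   long failoverDelayMs, long switchBackDelayMs) {
        this.primary = primary;
        this.secondaries = secondaries;
        this.failoverDelayMs = failoverDelayMs;
        this.switchBackDelayMs = switchBackDelayMs;
        this.current = primary;
    }

    // Runs one probe round at time nowMs and returns the URL to use afterwards.
    String probe(long nowMs, Predicate<String> isUp) {
        boolean primaryUp = isUp.test(primary);
        if (current.equals(primary)) {
            if (primaryUp) {
                primaryFailedSince = -1;
            } else {
                if (primaryFailedSince < 0) {
                    primaryFailedSince = nowMs;
                }
                if (nowMs - primaryFailedSince >= failoverDelayMs) {
                    // Probe secondaries one by one and take the first active one.
                    for (String candidate : secondaries) {
                        if (isUp.test(candidate)) {
                            current = candidate;
                            primaryRecoveredSince = -1;
                            break;
                        }
                    }
                }
            }
        } else if (primaryUp) {
            if (primaryRecoveredSince < 0) {
                primaryRecoveredSince = nowMs;
            }
            if (nowMs - primaryRecoveredSince >= switchBackDelayMs) {
                current = primary;
                primaryFailedSince = -1;
            }
        } else {
            primaryRecoveredSince = -1; // primary flapped; restart the stability window
        }
        return current;
    }

    public static void main(String[] args) {
        FailoverSketch f = new FailoverSketch("pulsar://a", List.of("pulsar://b"), 30, 60);
        System.out.println(f.probe(0, url -> true));
        System.out.println(f.probe(10, url -> !url.equals("pulsar://a")));
        System.out.println(f.probe(40, url -> !url.equals("pulsar://a")));
    }
}
```

In the real proposal the switch would be applied by calling `updateServiceUrl` on the client; the sketch leaves that out and only returns the chosen URL.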
Re: [VOTE] PIP-122: Change loadBalancer default loadSheddingStrategy to ThresholdShedder

2022-01-11 Thread Neng Lu
+1 (non-binding)

On Mon, Jan 10, 2022 at 12:39 AM Enrico Olivelli 
wrote:

> +1 (binding)
>
> Enrico
>
> Il giorno lun 10 gen 2022 alle ore 07:47 Hang Chen
>  ha scritto:
>
> >
> > This is the voting thread for PIP-122. It will stay open for at least 48
> > hours.
> >
> > https://github.com/apache/pulsar/issues/13340
> >
> > Pasted below for quoting convenience.
> >
> > 
> >
> > ### Motivation
> > > The ThresholdShedder load balance policy has been available since Pulsar
> > > 2.6.0 (added by https://github.com/apache/pulsar/pull/6772). It can resolve
> > > many load balance issues of `OverloadShedder` and works well in many Pulsar
> > > production clusters.
> >
> > In Pulsar 2.6.0, 2.7.0, 2.8.0 and 2.9.0, Pulsar's default load balance
> > policy is `OverloadShedder`.
> >
> > > I think it's a good time for 2.10 to change the default load balance
> > > policy to `ThresholdShedder`. It will make throughput more balanced
> > > between brokers.
> >
> > ### Proposed Changes
> > > In the 2.10 release, for `broker.conf`, change
> > `loadBalancerLoadSheddingStrategy` from
> > `org.apache.pulsar.broker.loadbalance.impl.OverloadShedder` to
> > `org.apache.pulsar.broker.loadbalance.impl.ThresholdShedder`
>


Re: [DISCUSSION] PIP-117: Change Pulsar standalone defaults

2021-12-14 Thread Neng Lu
+1 (non-binding)

On 2021/12/14 17:18:03 Matteo Merli wrote:
> https://github.com/apache/pulsar/issues/13302
> 
> Copying here for quoting convenience
> 
> 
> 
> 
> 
> ## Motivation
> 
> Pulsar standalone is the "Pulsar in a box" version of a Pulsar cluster, where
> all the components are started within the context of a single JVM process.
> 
> Users are using the standalone as a way to get quickly started with Pulsar or
> in all the cases where it makes sense to have a single node deployment.
> 
> Right now, the standalone is starting by default with many components,
> several of
> which are quite complex, since they are designed to be deployed in a 
> distributed
> fashion.
> 
> ## Goal
> 
> Simplify the components of Pulsar standalone to achieve:
> 
>  1. Reduce complexity
>  2. Reduce startup time
>  3. Reduce memory and CPU footprint of running standalone
> 
> ## Proposed changes
> 
> The proposal here is to change some of the default implementations that are
> used for the Pulsar standalone.
> 
>  1. **Metadata Store implementation** -->
>   Change from ZooKeeper to RocksDB
> 
>  2. **Pulsar functions package backend** -->
>   Change from using DistributedLog to using local filesystem, storing the
>   jars directly in the data folder instead of uploading them into BK.
> 
>  3. **Pulsar functions state store implementation** -->
>   Change the state store to be backed by a MetadataStore-based backend,
>   with the RocksDB implementation.
> 
>  4. **Table Service** -->
>   Do not start BK table service by default
> 
> ## Compatibility considerations
> 
> In order to avoid compatibility issues where users have existing Pulsar
> standalone services that they want to upgrade without conflicts, we will
> follow the principle of keeping the old defaults where there is existing
> data on the disk.
> 
> We will add a file that serves as a flag in the `data/standalone`
> directory, for example `new-2.10-defaults`.
> 
> If the file is present, or if the data directory is completely missing, we
> will adopt the new set of default configuration settings.
> 
> If the file is not there, we will continue to use existing defaults and we 
> will
> not break the upgrade operation.
> 
> 
> 
> 
> 
> --
> Matteo Merli
> 
> 


Re: [DISCUSSION] PIP-120: Enable client memory limit by default

2021-12-14 Thread Neng Lu
+1 (non-binding)

On 2021/12/14 19:20:02 Matteo Merli wrote:
> https://github.com/apache/pulsar/issues/13306
> 
> 
> Pasted below for quoting convenience.
> 
> 
> 
> 
> ## Motivation
> 
> In Pulsar 2.8, we have introduced a setting to control the amount of memory
> used by a client instance.
> 
> ```java
> interface ClientBuilder {
> ClientBuilder memoryLimit(long memoryLimit, SizeUnit unit);
> }
> ```
> 
> By default, in 2.8 and 2.9 this setting is set to 0, meaning no limit is being
> enforced.
> 
> I think it's a good time for 2.10 to enable this setting by default and,
> correspondingly, to disable by default the producer queue size limit.
> 
> This will greatly simplify the configuration that a producer application has
> to come up with when publishing to many topics/partitions or when messages
> are bigger than expected.
> 
> ## Proposed changes
> 
> In the 2.10 release, for the `ClientBuilder`, change
>   * `memoryLimit`: 0 -> 64 MB
> 
> For the `ProducerBuilder`, change
>   * `maxPendingMessages`: 1000 -> 0
> 
> 64MB is picked because it's a small enough memory size that will guarantee
> a very high producer throughput, irrespective of the individual message size.
> 
> 
> 
> --
> Matteo Merli
> 
> 
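
To illustrate why a byte-based limit is easier to reason about than a pending-message count, here is a minimal sketch of a client-wide memory budget: each pending message reserves its size before being queued and releases it when it completes. This is not the Pulsar client's internal implementation. The class and method names are hypothetical, and only the "0 means no limit" convention and the 64 MB figure come from the proposal.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical byte budget shared by all producers of one client instance.
final class MemoryLimitSketch {
    private final long limitBytes; // 0 means "no limit", as in the 2.8/2.9 default
    private final AtomicLong usedBytes = new AtomicLong();

    MemoryLimitSketch(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    // Tries to reserve room for one pending message; false means the budget
    // is exhausted and the caller must block or fail the send.
    boolean tryReserve(long size) {
        if (limitBytes == 0) {
            usedBytes.addAndGet(size);
            return true;
        }
        while (true) {
            long used = usedBytes.get();
            if (used + size > limitBytes) {
                return false;
            }
            if (usedBytes.compareAndSet(used, used + size)) {
                return true;
            }
        }
    }

    // Called once the message has been acknowledged or has failed.
    void release(long size) {
        usedBytes.addAndGet(-size);
    }

    long usedBytes() {
        return usedBytes.get();
    }

    public static void main(String[] args) {
        MemoryLimitSketch budget = new MemoryLimitSketch(64L * 1024 * 1024);
        System.out.println(budget.tryReserve(1024));
        System.out.println(budget.usedBytes());
    }
}
```

With a count-based limit of 1000 pending messages, the memory held depends entirely on message size (1000 messages of 5 MB each would be about 5 GB), whereas the byte budget caps memory regardless of how many topics or partitions the application publishes to.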


Re: [DISCUSS] Release Pulsar 2.7.4

2021-12-09 Thread Neng Lu
+1

On 2021/12/09 15:29:55 Michael Marshall wrote:
> Hello Pulsar Community,
> 
> I'd like to propose that we release 2.7.4. We have merged several
> important fixes since we released 2.7.3 in August.
> 
> I am happy to volunteer to be the release manager.
> 
> Here [0] you can find the list of 36 commits cherry-picked to
> branch-2.7 since 2.7.3 release. It looks like there are more PRs
> labeled with `release/2.7.4` than commits cherry-picked, so I will
> need to work on cherry-picking those before we can create the tag for
> the release [1].
> 
> Also, I see 3 open PRs labeled with `release/2.7.4`. I'll follow up on
> each of those PRs to see if they will be completed soon.
> 
> Thanks,
> Michael
> 
> [0] - https://github.com/apache/pulsar/compare/v2.7.3...branch-2.7
> [1] - 
> https://github.com/apache/pulsar/pulls?q=is%3Aopen+is%3Apr+label%3Arelease%2F2.7.4
> 


[RESULT][VOTE] PIP 104: Add new consumer type: TableView

2021-12-09 Thread Neng Lu
Hi All,

The VOTE passed with a total of 8 +1s and no -1s.

6 binding +1 from:
Matteo Merli
Enrico Olivelli
Jerry Peng
Guangning E
Penghui Li
Hang Chen

2 non-binding +1 from:
Michael Marshall
Li Li

I'll mark the PR as ready for review and let's push this feature : )
Thank you for all your help.

Best regards,
Neng Lu


[Vote] PIP 104: Add new consumer type: TableView

2021-12-01 Thread Neng Lu
Hi Pulsar Community,

I would like to start a VOTE on the Pulsar TableView consumer (PIP 104).

The issue for PIP 104 is here: https://github.com/apache/pulsar/issues/12356
And the prototype implementation is here:
https://github.com/apache/pulsar/pull/12838

Please VOTE within 72 hours.

Best Regards,
Neng Lu


Re: [ANNOUNCE] Apache Pulsar Go Client 0.7.0 released

2021-11-18 Thread Neng Lu
Thank you Chris, I also updated the function's go runtime to utilize the most 
recent release.

On 2021/11/16 05:41:11 Chris Kellogg wrote:
> The Apache Pulsar team is proud to announce Apache Pulsar Go Client version
> 0.7.0.
> 
> Pulsar is a highly scalable, low latency messaging platform running on
> commodity hardware. It provides simple pub-sub semantics over topics,
> guaranteed at-least-once delivery of messages, automatic cursor management
> for
> subscribers, and cross-datacenter replication.
> 
> For Pulsar release details and downloads, visit:
> https://github.com/apache/pulsar-client-go/releases/tag/v0.7.0
> 
> Release Notes are at:
> https://github.com/apache/pulsar-client-go/blob/master/CHANGELOG.md
> 
> We would like to thank the contributors that made the release possible.
> 
> Regards,
> 
> The Pulsar Team
> 


RE: [DISCUSS] The processing of residual schema and the convergence of schema operation auth

2021-11-18 Thread Neng Lu
+1 for dropping the schema when a topic is deleted by default.

I previously hit a strange error: after creating a topic with the same name 
as a previously deleted topic, the "ghost" schema was associated with the new 
topic again, which caused a lot of confusion for us.


On 2021/11/18 03:35:50 Ruguo Yu wrote:
> Hi All,
> 
>  
> After discussing with Penghui and HangChen about the consistency of topic and 
> schema deletion, our preliminary conclusion is to drop the `--deleteSchema` 
> parameter in `bin/pulsar-admin topics delete`, which can ensure the schema is 
> deleted when the topic is deleted, and the default value is `true` on the 
> broker side to be compatible with the lower version client’s deletion request.
> This change is planned to be merged in the next major version release. What 
> are your thoughts on this?
>  
> 
> Thanks,
> 
> Ruguo Yu
> 
>  
> 
> On 2021/11/14 14:14:52 yuruguo wrote:
> > Dear all,
> >
> > Currently, topics and schemas are managed separately, so a topic can be
> > deleted while its schema still exists. Should we deal with these residual
> > schemas? For this problem I created an issue[1].
> >
> > In addition, the operation auth of the schema should also converge. To a
> > certain extent, it is related to the operation auth of the topic. For this
> > problem I created an issue[2].
> >
> > Regarding the two problems of the schema, please give guidance or a better
> > solution.
> >
> > [1] https://github.com/apache/pulsar/issues/12795
> > [2] https://github.com/apache/pulsar/issues/12419
> >
> > Thanks,
> > Ruguo Yu
> >
> 


Re: [DISCUSS] Pulsar Protocol For Client Timeouts and Creating Producers

2021-11-18 Thread Neng Lu
How about making the behavior on timeout configurable, with sending the 
`CloseProducer` cmd as the default?

On 2021/11/17 05:52:21 Michael Marshall wrote:
> Hi All,
> 
> I noticed that the `ServerCnxTest#testCreateProducerTimeout` test
> indicates that a producer will send a `CloseProducer` command when
> producer creation times out for the client.
> 
> The Java client does not follow this protocol. When the producer
> creation times out, it just schedules another attempt to create the
> producer.
> 
> The C++ client does follow this protocol and is annotated with the
> following comment:
> 
> > // Creating the producer has timed out. We need to ensure the broker closes 
> > the producer
> > // in case it was indeed created, otherwise it might prevent new create 
> > producer operation,
> > // since we are not closing the connection
> 
> I don't see anything in our official protocol spec indicating the
> "right" behavior. Given the test coverage, it seems like the initial
> design was to expect a `CloseProducer` command. However, I also see that
> the broker's `ServerCnx` class will reply to a redundant `Producer`
> command with a `ProducerSuccess` command, as long as the producer
> is already created.
> 
> Should I submit a PR to update the Java client to send a
> `CloseProducer` command when a `Producer` command times out?
> 
> Thanks,
> Michael
> 
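
As a rough illustration of the "configurable, defaulting to `CloseProducer`" idea, the sketch below models the two behaviors discussed in this thread as a policy enum and lists the commands a client would emit when producer creation times out. Everything here is hypothetical (the enum, the method, and the string rendering of commands). It is not the Java or C++ client code, and real clients would also deal with request ids and reconnection backoff.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of what a client sends after a create-producer timeout.
final class ProducerTimeoutSketch {

    // CLOSE_THEN_RETRY mirrors the C++ client behavior described above;
    // RETRY_ONLY mirrors the Java client behavior described above.
    enum OnCreateTimeout { CLOSE_THEN_RETRY, RETRY_ONLY }

    // Suggested default from this thread: send CloseProducer first.
    static final OnCreateTimeout DEFAULT_POLICY = OnCreateTimeout.CLOSE_THEN_RETRY;

    // Returns the commands, in order, that would be sent after the timeout.
    static List<String> commandsOnTimeout(OnCreateTimeout policy, long producerId) {
        List<String> commands = new ArrayList<>();
        if (policy == OnCreateTimeout.CLOSE_THEN_RETRY) {
            // Make sure the broker drops the producer in case it was created.
            commands.add("CloseProducer(producerId=" + producerId + ")");
        }
        // In both cases the client then schedules another create attempt.
        commands.add("Producer(producerId=" + producerId + ")");
        return commands;
    }

    public static void main(String[] args) {
        System.out.println(commandsOnTimeout(DEFAULT_POLICY, 1));
        System.out.println(commandsOnTimeout(OnCreateTimeout.RETRY_ONLY, 1));
    }
}
```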


Re: [DISCUSS] Add Pulsar io Pulsar connector

2021-11-17 Thread Neng Lu
@sijie the case here might be tricky. They may want to move data across
pulsar clusters operated by different org or teams.

Remember that we previously added the ability to send messages to external
pulsar clusters for pulsar functions, but it got reverted. I think this is the
case they are trying to tackle.

On Wed, Nov 17, 2021 at 10:29 AM Sijie Guo  wrote:

> I don't think you need a separate connector.
>
> An identity function should be able to do the job for you.
>
> - Sijie
>
> On Mon, Nov 15, 2021 at 3:34 PM Neng Lu  wrote:
>
> > Just did a quick search, it's interesting we don't have a pulsar
> connector
> > to move data among pulsar clusters.
> > I guess people usually write their own pulsar client to move data around.
> >
> >
> > On Mon, Nov 15, 2021 at 3:11 PM ZhangJian He  wrote:
> >
> > > Yes, move data across different pulsar clusters which belongs to
> > different
> > > company or organization
> > >
> > > Thanks
> > > ZhangJian He
> > >
> > > Neng Lu  于2021年11月16日周二 上午2:50写道:
> > >
> > > > Hi,
> > > >
> > > > What's your new connector used for in the customer use cases?
> > > > > A `pulsar-io-kafka-connector` is used for moving data between kafka
> and
> > > > pulsar.
> > > > But in your case, a `pulsar-io-pulsar-connector`, do you mean you
> want
> > to
> > > > move data across different pulsar clusters?
> > > >
> > > >
> > > > On Mon, Nov 15, 2021 at 6:51 AM ZhangJian He 
> > wrote:
> > > >
> > > > > Dear all
> > > > >
> > > > > My team are suggesting some of our customers use pulsar instead of
> > > kafka
> > > > > for their needs.
> > > > > Before, my team used a pulsar-io-kafka-connector, now my team wants
> > to
> > > > use
> > > > > a pulsar-io-to-pulsar-connector server for these customers.
> > > > >
> > > > > And I notice now we don't have a pulsar-io-pulsar-connector.
> > > > >
> > > > > Should I develop a connector?
> > > > > And should the connector be maintained in the pulsar main repo ?
> > > > >
> > > > > > IMO, if we decided to develop a pulsar-io-connector, it's more
> > > reasonable
> > > > > to maintain it in the pulsar main repo. (At least, the
> > > > > pulsar-io-kafka-connector is in main repo)
> > > > >
> > > > > Looking forward to your opinions.
> > > > >
> > > > >
> > > > > Thanks
> > > > > ZhangJian He
> > > > >
> > > >
> > >
> >
>


Re: Dropping Presto SQL in 2.9.0 - status ?

2021-11-17 Thread Neng Lu
Just curious: is there any progress on moving all the connectors into 
separate repos?
Maybe I can help if the decision is finalized.

On 2021/11/17 06:18:52 Lari Hotari wrote:
> Dear Pulsar community members,
> 
> PIP-62[1], "PIP 62: Move connectors, adapters and Pulsar Presto to separate
> repositories" was created in April 2020. The repositories for
> pulsar-connectors, pulsar-adapters and pulsar-sql were created about a year
> ago [2].
> 
> What is the current roadmap for completing PIP-62 and moving
> pulsar-connectors and pulsar-sql out of apache/pulsar repository?
> 
> BR,
> 
> Lari
> 
> [1]
> https://github.com/apache/pulsar/wiki/PIP-62%3A-Move-connectors%2C-adapters-and-Pulsar-Presto-to-separate-repositories
> [2]
> https://lists.apache.org/thread.html/r9e6ec742e2896da1f0ce7d4adc7cb84fc6db6dbf797732ccdd50fb86%40%3Cdev.pulsar.apache.org%3E
> 
> Other email threads:
> * [Discuss] Don't include presto/trino in the normal Pulsar distribution -
> https://lists.apache.org/thread/jn96tct54mn0tvdot62vdslrvs38fm6d
> * Updates on Presto connector for PIP-62 -
> https://lists.apache.org/thread/f9n6sc2mrboq5sxhjbr7gvdl8vqp9fpk
> 
> On Tue, Nov 2, 2021 at 3:59 PM Nicolò Boschi  wrote:
> 
> > Resurrecting this thread.
> >
> > 2.9 is almost released and it hasn't been merged yet
> >
> > Extending the discussion to other connectors, it looks like there has been
> > no progress on PIP-62.
> > My concern is that a lot of Pulsar IO connectors dependencies we are
> > running are obsolete with several security reports
> >
> > I see there are interesting comments in the issue (
> > https://github.com/apache/pulsar/issues/10219) and Sijie exported the
> > pulsar-io dir to https://github.com/apache/pulsar-connectors but it's
> > outdated
> >
> > From my point of view, we have to:
> > - reimport all the connectors source codes with newest ones (including
> > integration tests)
> > - add periodic CI jobs for connectors to run against master, 2.9-latest,
> > 2.8-latest, 2.7-latest to verify breaking changes
> > - define a release cycle/management for connectors (we should improve the
> > PIP doc). IMO it's not clear if each connector will have its own release
> > versions and how we'll handle it (git tags, artifacts deployment..)
> > - update pulsar release script in order to get the connectors artifacts
> > (retrieving the .nar or building it from source?)
> > - update docs
> > - remove pulsar-io dir from Pulsar repo
> >
> > It's the perfect timing to schedule this work for 2.10
> >
> > What is missing? How's the situation? Is there a roadblock I haven't seen?
> > I think it's better to take another discussion for Presto since it will
> > come to another end
> >
> >
> > Il giorno sab 14 ago 2021 alle ore 15:21 Enrico Olivelli <
> > eolive...@gmail.com> ha scritto:
> >
> > > Sijie
> > >
> > > Il Ven 13 Ago 2021, 22:00 Sijie Guo  ha scritto:
> > >
> > > > You can follow the progress at
> > > https://github.com/trinodb/trino/pull/8020.
> > > >
> > >
> > > Thanks for the pointer
> > >
> > > >
> > > > The original code doesn't conform to TrinoDB's standard. Marvin is
> > > > actively following up on that.
> > > >
> > > > Our goal is still to get this completed as part of the 2.9 release.
> > > >
> > >
> > > Wonderful
> > >
> > > Thanks
> > > Enrico
> > >
> > > >
> > > > - Sijie
> > > >
> > > > On Fri, Aug 13, 2021 at 2:04 AM Enrico Olivelli 
> > > > wrote:
> > > > >
> > > > > Hello,
> > > > > How is the Presto work going ?
> > > > > IIRC the plan was to remove it from the Pulsar code base and let it
> > be
> > > > > hosted at Trino.
> > > > >
> > > > > If this is not going to happen within the 2.9.0 release timeline
> > > > > (September?) I would prefer to upgrade to "Trino".
> > > > > Probably we will have a downside problem that recent versions of
> > > > > Presto/Trino do not work on JDK8 but only on JDK11.
> > > > >
> > > > > I believe that in that case we could open a separate thread to say
> > that
> > > > > Pulsar SQL in 2.9.0 will work only on JDK11.
> > > > > In Pulsar 2.8.0 we added official compatibility with JDK11 (and it is
> > > the
> > > > > preferred version, as it is the version used in the Docker images),
> > so
> > > > > requiring JDK11 for Pulsar SQL 2.9.0 does not sound bad to me.
> > > > >
> > > > > My primary concern is that the version of Presto that we are running
> > is
> > > > > obsolete and there are several security reports against it or its
> > third
> > > > > party dependencies.
> > > > >
> > > > > Thoughts ?
> > > > >
> > > > > Enrico
> > > >
> > >
> >
> >
> > --
> > Nicolò Boschi
> >
> 


Re: [DISCUSS] Add remove-clusters command for namespace in pulsar-admin

2021-11-16 Thread Neng Lu
+1 (non-binding)

On Mon, Nov 15, 2021 at 5:50 PM PengHui Li  wrote:

> +1,
>
> Penghui
> On Nov 16, 2021, 9:27 AM +0800, Ruguo Yu , wrote:
> > Hi Community,
> >
> > The `pulsar-admin` tool supports the `set-clusters` and `get-clusters`
> commands so that we can `set` / `get` replication clusters for a namespace.
> But it lacks a corresponding `remove-clusters` command to restore the
> unset state. I think it is necessary to add this command to ensure the
> closed-loop operation of the replication cluster.
> >
> >
> >
> > I have created an issue[1] which contains possible implementation details
> for this problem. Please discuss and share your opinions.
> >
> >
> >
> > Thanks,
> >
> > Ruguo Yu
> >
> >
> >
> > [1] https://github.com/apache/pulsar/issues/12822
> >
> >
> >
>


Re: [DISCUSS] Add Pulsar io Pulsar connector

2021-11-15 Thread Neng Lu
Just did a quick search, it's interesting we don't have a pulsar connector
to move data among pulsar clusters.
I guess people usually write their own pulsar client to move data around.


On Mon, Nov 15, 2021 at 3:11 PM ZhangJian He  wrote:

> Yes, move data across different pulsar clusters which belongs to different
> company or organization
>
> Thanks
> ZhangJian He
>
> Neng Lu  于2021年11月16日周二 上午2:50写道:
>
> > Hi,
> >
> > What's your new connector used for in the customer use cases?
> > A `pulsar-io-kafka-connector` is used for moving data between kafka and
> > pulsar.
> > But in your case, a `pulsar-io-pulsar-connector`, do you mean you want to
> > move data across different pulsar clusters?
> >
> >
> > On Mon, Nov 15, 2021 at 6:51 AM ZhangJian He  wrote:
> >
> > > Dear all
> > >
> > > My team are suggesting some of our customers use pulsar instead of
> kafka
> > > for their needs.
> > > Before, my team used a pulsar-io-kafka-connector, now my team wants to
> > use
> > > a pulsar-io-to-pulsar-connector server for these customers.
> > >
> > > And I notice now we don't have a pulsar-io-pulsar-connector.
> > >
> > > Should I develop a connector?
> > > And should the connector be maintained in the pulsar main repo ?
> > >
> > > IMO, if we dicided to develop a pulsar-io-connector, it's more
> reasonable
> > > to maintain it in the pulsar main repo. (At least, the
> > > pulsar-io-kafka-connector is in main repo)
> > >
> > > Looking forward to your opinions.
> > >
> > >
> > > Thanks
> > > ZhangJian He
> > >
> >
>


Re: [DISCUSS] Add Pulsar io Pulsar connector

2021-11-15 Thread Neng Lu
Hi,

What's your new connector used for in the customer use cases?
A `pulsar-io-kafka-connector` is used for moving data between kafka and
pulsar.
But in your case, a `pulsar-io-pulsar-connector`, do you mean you want to
move data across different pulsar clusters?


On Mon, Nov 15, 2021 at 6:51 AM ZhangJian He  wrote:

> Dear all
>
> My team are suggesting some of our customers use pulsar instead of kafka
> for their needs.
> Before, my team used a pulsar-io-kafka-connector, now my team wants to use
> a pulsar-io-to-pulsar-connector server for these customers.
>
> And I notice now we don't have a pulsar-io-pulsar-connector.
>
> Should I develop a connector?
> And should the connector be maintained in the pulsar main repo ?
>
> IMO, if we decided to develop a pulsar-io-connector, it's more reasonable
> to maintain it in the pulsar main repo. (At least, the
> pulsar-io-kafka-connector is in main repo)
>
> Looking forward to your opinions.
>
>
> Thanks
> ZhangJian He
>


Re: Moving questions from Slack to Stack Overflow

2021-07-23 Thread Neng Lu
+1

On 2021/07/23 16:48:42 Aaron Williams wrote:
> Hello Apache Pulsar Community,
> 
> Many of us community members were having a conversation about Slack and how
> it is not working very well as a tool to answer questions from the wider
> community and we should try to push more of the questions to Stack
> Overflow.
> 
> We would like to suggest-
> 
>    1. Add a link to Stack Overflow to the contact page of the website:
>       https://pulsar.apache.org/en/contact/
>    2. Add text to the webpage stating that we recommend that if you have a
>       technical question, please ask it on Stack Overflow
>    3. On Slack, we encourage them to post their question to Stack Overflow
> 
> 
> Reasons:
> 
>    - We are on the free version of Slack and thus all questions/answers
>      before the middle of June are lost (10k messages max).
>    - We answer the same question over and over.
>       - We want to make it easier for our community to find the answers.
>    - Developers expect answers to be on Stack Overflow
>    - It is easier to copy and paste an error message into Google and if it
>      has been posted to Stack Overflow, then the questioner will find the
>      answer quickly.
>       - Slack is not searchable by Google, plus we won't lose the
>         question/answer after 5 weeks
> 
> 
> Thoughts?
> 
> Thanks,
> 
> Aaron Williams
> 


Fix Debezium Connectors to pass integration test

2021-07-22 Thread Neng Lu
Hi All,

I spent some time digging into the debezium connector integration test
issue and found that the connector's `ack()` method is currently a
blocking call. This blocks two threads
(public/default/debezium-mongodb-source-0 and pulsar-io-X), and thus the
offset is never committed successfully.

I have prepared a fix here: https://github.com/apache/pulsar/pull/11435
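
To illustrate the difference between a blocking and a non-blocking
acknowledgment, here is a minimal generic sketch. It is not the change made
in the PR above: `OffsetAckSketch`, `ackBlocking` and `ackAsync` are
hypothetical names, and the `CompletableFuture` only stands in for whatever
operation commits the offset.

```java
// Hypothetical stand-in for a source record whose offset is committed on ack.
// Not the actual Debezium connector code.
final class OffsetAckSketch {
    // Completes when the (simulated) offset commit has finished.
    private final java.util.concurrent.CompletableFuture<Void> commitDone;
    private volatile boolean committed = false;

    OffsetAckSketch(java.util.concurrent.CompletableFuture<Void> commitDone) {
        this.commitDone = commitDone;
    }

    // Blocking variant: parks the calling thread until the commit completes.
    // If the commit itself needs the calling thread, this never returns.
    void ackBlocking() {
        commitDone.join();
        committed = true;
    }

    // Non-blocking variant: registers a callback and returns immediately.
    void ackAsync() {
        commitDone.thenRun(() -> committed = true);
    }

    boolean isCommitted() {
        return committed;
    }
}
```

With the non-blocking variant the caller's thread stays free to do the work
that eventually completes the commit.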


Best Regards,
Neng Lu


Re: [DISCUSS] Releasing Pulsar-client-go 0.6.0

2021-07-19 Thread Neng Lu
+1 

Neng Lu

On 2021/07/19 08:44:11 "r...@apache.org" wrote:
> Hello Everyone:
> 
> I hope you’ve all been doing well. In the past two months, we have
> fixed a number of bugs related to connection leaks and added
> some new features. For more information refer to:
> 
> https://github.com/apache/pulsar-client-go/milestone/7?closed=1
> 
> For that reason, I think we should be releasing a 0.6.0 version with
> what we have today.
> 
> --
> Thanks
> Xiaolong Ran
> 


Re: Re: Problems with Functions/IO in Upgrading Pulsar from 2.7 to 2.8

2021-07-19 Thread Neng Lu
Based on my local test, it's fine for String Schema.

On 2021/07/19 18:47:49 Devin Bost wrote:
> > This leads to an IncompatibleClassChangeError  when you have a Function or
> > a Connector that is using Schema.JSON(Pojo.class)
> 
> I just noticed this detail. Do we have a sense of how often people are
> using Schema.JSON in Functions/Connectors?
> Most of our functions are using a string schema, so it's not clear to me if
> they would be impacted.
> 
> Devin G. Bost
> 
> 
> On Mon, Jul 19, 2021 at 12:41 PM Devin Bost  wrote:
> 
> > > I think Sijie is referring to using KubernetesRuntime to deploy functions
> > > where each function/source/sink runs as an independent statefulset in
> > K8s.
> > > In this scenario, it is possible to have fine grained control over which
> > > version of the function container the function is using.
> >
> > Not everybody is using the KubernetesRuntime yet (especially since the
> > Helm charts aren't feature-complete), and it appears that those who aren't
> > running KubernetesRuntime would be impacted the most by this issue.
> >
> > Devin G. Bost
> >
> >
> > On Mon, Jul 19, 2021 at 12:36 PM Devin Bost  wrote:
> >
> >> > For example, if you are upgrading Flink from one version to the other
> >> > version, you have to make a save point in the previous version for all
> >> > the Flink jobs.
> >> > Upgrade the Flink cluster and resume jobs in a new version.
> >> >
> >> >
> >> https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/ops/upgrading/
> >> >
> >> > So it is not unreasonable for asking people to do that when dealing
> >> > with upgrading a centralized computing engine.
> >>
> >> One difference with Flink is that organizations running Flink in job mode
> >> or application mode can upgrade jobs independently of one another, so teams
> >> can upgrade jobs when they are ready without impacting other teams. In the
> >> Pulsar case, Pulsar is multi-tenant, so upgrading the entire cluster would
> >> break every tenant simultaneously and would block the flow of all messages
> >> until all functions are upgraded. If one team takes a year to upgrade their
> >> one function, the cluster could not be upgraded until that happened. Also,
> >> after all the functions have been upgraded, there would be production
> >> downtime while deploying all the upgraded functions, which would be a major
> >> outage... It might be possible to write a script to speed up the deployment
> >> to shrink the outage window, but there's currently a bug that wipes out
> >> existing userConfigs when a function is upgraded, so that adds to the
> >> complexity of upgrading all the functions since someone would need to know
> >> all the userConfigs for all the functions.
> >>
> >> So, I don't think we're really comparing the same things here.
> >>
> >> Devin G. Bost
> >>
> >>
> >> On Mon, Jul 19, 2021 at 12:17 PM Sijie Guo  wrote:
> >>
> >>> On Mon, Jul 19, 2021 at 10:32 AM Jerry Peng 
> >>> wrote:
> >>> >
> >>> > I agree that the best we can do right now is to just clearly document
> >>> this
> >>> > as a potential problem when updating 2.7 to 2.8.
> >>> >
> >>> > We should definitely make every attempt to not make BC breaking
> >>> changes.
> >>> > However, there are times when we have to make these tough decisions
> >>> for one
> >>> > reason or another. The bigger problem I see here is not necessarily a
> >>> BC
> >>> > breaking change occurred, but rather we didn't know about it
> >>> beforehand so
> >>> > we can clearly document this caveat when 2.8 is released.  Perhaps
> >>> this is
> >>> > where we can improve our backwards compatibility testing.  We already
> >>> have
> >>> > some but probably not enough as highlighted by this case.
> >>> >
> >>> > In regards to
> >>> >
> >>> > This is partially correct, because you can wait to upgrade the workers
> >>> pod,
> >>> > > but there is no fine grained control over which version  of each pod
> >>> will
> >>> > > be running your function, especially in a big cluster with many
> >>> tenants and
> >>> > > functions with this problem
> >>> > >
> >>> >
> >>> >
> >>> > I think Sijie is referring to using KubernetesRuntime to deploy
> >>> functions
> >>> > where each function/source/sink runs as an independent statefulset in
> >>> K8s.
> >>> > In this scenario, it is possible to have fine grained control over
> >>> which
> >>> > version of the function container the function is using.  There
> >>> currently
> >>> > might not be tools to easily allow users to do this but using kubectl
> >>> one
> >>> > can definitely determine which container version is running and
> >>> potentially
> >>> > update the container version on a per function basis.
> >>>
> >>> Jerry - Thank you! That was what I meant.
> >>>
> >>> >
> >>> > Best,
> >>> >
> >>> > Jerry
> >>> >
> >>> > On Mon, Jul 19, 2021 at 12:50 AM Enrico Olivelli 
> >>> > wrote:
> >>> >
> >>> > > Sijie,
> >>> > > Thank you for your feedback
> >>> > > Some additional considerations inline
> >>> > >
> >>> > > 

Re: Re: Re: Re: Discussion about https://github.com/apache/pulsar/pull/11112

2021-07-07 Thread Neng Lu
- Regarding the interface design:
One thing we need to confirm is whether we **don't want old runtime to execute 
new function without proper initialization** or not. 

If this rule must be followed, then one possible way is we define a separate 
`HookFunction` interface which includes `setup`, `process`, `tearDown` methods, 
and add support in the new runtime to handle this interface. Just as currently 
the runtime handles `pulsar.Function` and `java.util.Function` separately. The 
old runtime will not be able to recognize the new interface and thus won't 
execute functions which need initialization.

Documentation and examples can be added to help users learn and understand this 
new interface.

- Regarding the exception:
I think the user is responsible for properly handling any checked exceptions, 
so the interface should not allow throwing any exception. For any unchecked 
exceptions, the runtime/process will fail fast.

- Regarding the contract:
Given that checked exceptions are handled properly by the user as described 
above, the `setup` method either completes successfully or the process exits. 
So, there's no need to call the `tearDown` method after a failed `setup`.
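
As a rough sketch of the separate-interface option (simplified: no `Context` 
argument, `String` in and out). `PlainFunction`, `OldRuntimeSketch`, 
`NewRuntimeSketch` and `RecordingHookFunction` are hypothetical stand-ins for 
illustration, not Pulsar classes:

```java
// Simplified stand-in for the existing function interface.
interface PlainFunction {
    String process(String input);
}

// Hypothetical separate interface with lifecycle hooks. It deliberately does
// NOT extend PlainFunction, so a runtime that only knows PlainFunction
// cannot execute it.
interface HookFunction {
    void setup();
    String process(String input);
    void tearDown();
}

// Old runtime: only recognizes PlainFunction.
final class OldRuntimeSketch {
    static java.util.List<String> run(Object fn, java.util.List<String> inputs) {
        if (!(fn instanceof PlainFunction)) {
            throw new IllegalArgumentException("unsupported function type");
        }
        java.util.List<String> out = new java.util.ArrayList<>();
        for (String in : inputs) {
            out.add(((PlainFunction) fn).process(in));
        }
        return out;
    }
}

// New runtime: handles the two interfaces separately.
final class NewRuntimeSketch {
    static java.util.List<String> run(Object fn, java.util.List<String> inputs) {
        if (fn instanceof HookFunction) {
            HookFunction hook = (HookFunction) fn;
            // If setup() throws, we fail fast and tearDown() is not called.
            hook.setup();
            java.util.List<String> out = new java.util.ArrayList<>();
            try {
                for (String in : inputs) {
                    out.add(hook.process(in));
                }
            } finally {
                hook.tearDown();
            }
            return out;
        }
        return OldRuntimeSketch.run(fn, inputs);
    }
}

// Example hook function that records the order of lifecycle calls.
final class RecordingHookFunction implements HookFunction {
    final java.util.List<String> events = new java.util.ArrayList<>();
    private String prefix; // the "resource" created in setup()

    @Override public void setup() { prefix = "hello "; events.add("setup"); }
    @Override public String process(String input) {
        events.add("process");
        return prefix + input;
    }
    @Override public void tearDown() { prefix = null; events.add("tearDown"); }
}
```

In this sketch the old runtime rejects a `HookFunction` outright instead of 
running it uninitialized, which is the property discussed above.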


On 2021/07/06 21:02:17 Enrico Olivelli wrote:
> Il Mar 6 Lug 2021, 21:30 Neng Lu  ha scritto:
> 
> > IMHO, The old runtime should not be able to run the new functions.
> 
> 
> It is not possible to enforce this because we cannot change old code.
> 
> >
> >
> > New functions require resource initialization via hooks. If the actual
> > `setup()` method is not called (or only a default no-op one is called),
> > then the function is not properly initialized and there'll be problems if
> > they are executed.
> >
> 
> I believe that using default methods is better, as it is clearer for new
> users.
> 
> If you add a new interface then it is difficult for a new user to discover
> this feature.
> 
> Additionally we have to define the contract for calling the teardown
> function.
> Will it be called in case of failure of the setup function?
> 
> Are those functions allowed to throw exceptions? I believe it is good to
> let them throw Exception in order to not force the user to catch everything
> and rethrow it as a RuntimeException
> 
> Enrico
> 
> 
> >
> > On 2021/07/06 19:03:09 Sijie Guo wrote:
> > > So there are two compatibility issues we need to consider here.
> > >
> > > 1) Old runtime to run new functions.
> > > 2) New runtime to run old functions.
> > >
> > > Making the methods with default no-op implementation will resolve 1).
> > > Is that correct?
> > >
> > > We can use reflection to check if the methods exist or not to solve 2),
> > no?
> > >
> > > - Sijie
> > >
> > > On Tue, Jul 6, 2021 at 11:34 AM Neng Lu  wrote:
> > > >
> > > > I think the reason is for keeping the original `Function` unchanged,
> > so that existing implemented functions are not affected.
> > > >
> > > >
> > > > On 2021/07/06 03:34:49 Sijie Guo wrote:
> > > > > Thank you for starting the discussion!
> > > > >
> > > > > I have added this proposal to PIP-86:
> > > > >
> > https://github.com/apache/pulsar/wiki/PIP-86:-Pulsar-Functions:-Preload-and-release-external-resources
> > > > >
> > > > > I have one question: why do you introduce HookFunction? Why not just
> > add
> > > > > two default methods to the existing Functions API?
> > > > >
> > > > > - Sijie
> > > > >
> > > > > On Mon, Jul 5, 2021 at 6:49 PM 陈磊 
> > wrote:
> > > > >
> > > > > > Motivation
> > > > > >
> > > > > > It is very useful in many scenarios to provide safe and convenient
> > > > > > capabilities for function's external resource initialization and
> > release.
> > > > > > In addition to the normal data processing path, it enables
> > functions to use
> > > > > > HookFunction to manage external resources
> > > > > >
> > > > > > At present, in order to process data, only the logic of resource
> > > > > > initialization -> processing -> release and shutdown can be
> > written in the
> > > > > > process() of Function. This method is complicated, insecure, and
> > > > > > unnecessary.
> > > > > >
> > > > > > Instead, we should have a new standard way for users to use
> > Function
> > > > > > easily and safely. Summarized as follows:
> > > 

Re: Re: Re: Discussion about https://github.com/apache/pulsar/pull/11112

2021-07-06 Thread Neng Lu
IMHO, the old runtime should not be able to run the new functions. 

New functions require resource initialization via hooks. If the actual 
`setup()` method is not called (or only a default no-op one is called), then 
the function is not properly initialized and there will be problems when it is 
executed.
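
A minimal sketch of this concern, assuming the hooks were added as default 
no-op methods. The types are simplified stand-ins, not the real Pulsar 
Functions API; `SketchFunction`, `LookupFunction` and `RuntimeSketches` are 
hypothetical names:

```java
// Simplified stand-in for a function interface with default no-op hooks.
interface SketchFunction {
    default void setup() { }      // no-op unless overridden
    String process(String input);
    default void cleanup() { }    // no-op unless overridden
}

// A function that relies on setup() to create its lookup table.
final class LookupFunction implements SketchFunction {
    private java.util.Map<String, String> table; // created in setup()

    @Override public void setup() {
        table = new java.util.HashMap<>();
        table.put("k", "v");
    }

    @Override public String process(String input) {
        if (table == null) {
            // Defensive guard; without it this would be a NullPointerException.
            throw new IllegalStateException("setup() was never called");
        }
        return table.getOrDefault(input, "unknown");
    }

    @Override public void cleanup() { table = null; }
}

final class RuntimeSketches {
    // Runtime unaware of the hooks: calls process() only.
    static String runWithoutHooks(SketchFunction fn, String input) {
        return fn.process(input);
    }

    // Hook-aware runtime: setup, process, then cleanup.
    static String runWithHooks(SketchFunction fn, String input) {
        fn.setup();
        try {
            return fn.process(input);
        } finally {
            fn.cleanup();
        }
    }
}
```

Here the hook-unaware runtime accepts the function but fails at the first 
message, because nothing ever initialized the resource.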


On 2021/07/06 19:03:09 Sijie Guo wrote:
> So there are two compatibility issues we need to consider here.
> 
> 1) Old runtime to run new functions.
> 2) New runtime to run old functions.
> 
> Making the methods with default no-op implementation will resolve 1).
> Is that correct?
> 
> We can use reflection to check if the methods exist or not to solve 2), no?
> 
> - Sijie
> 
> On Tue, Jul 6, 2021 at 11:34 AM Neng Lu  wrote:
> >
> > I think the reason is for keeping the original `Function` unchanged, so 
> > that existing implemented functions are not affected.
> >
> >
> > On 2021/07/06 03:34:49 Sijie Guo wrote:
> > > Thank you for starting the discussion!
> > >
> > > I have added this proposal to PIP-86:
> > > https://github.com/apache/pulsar/wiki/PIP-86:-Pulsar-Functions:-Preload-and-release-external-resources
> > >
> > > I have one question: why do you introduce HookFunction? Why not just add
> > > two default methods to the existing Functions API?
> > >
> > > - Sijie
> > >
> > > On Mon, Jul 5, 2021 at 6:49 PM 陈磊  wrote:
> > >
> > > > Motivation
> > > >
> > > > It is very useful in many scenarios to provide safe and convenient
> > > > capabilities for function's external resource initialization and 
> > > > release.
> > > > In addition to the normal data processing path, it enables functions to 
> > > > use
> > > > HookFunction to manage external resources
> > > >
> > > > At present, in order to process data, only the logic of resource
> > > > initialization -> processing -> release and shutdown can be written in 
> > > > the
> > > > process() of Function. This method is complicated, insecure, and
> > > > unnecessary.
> > > >
> > > > Instead, we should have a new standard way for users to use Function
> > > > easily and safely. Summarized as follows:
> > > >
> > > > Before Function starts, some resources only need to be initialized once,
> > > > and there is no need to make various judgments in the process() method 
> > > > of
> > > > the Function interface
> > > >
> > > > After closing the Function, in the process of using process(), you need 
> > > > to
> > > > manually close the referenced external resources, which need to be 
> > > > released
> > > > separately in the close() of javaInstance
> > > > API and Implementation Changes
> > > >
> > > > The organization of the function implementation hierarchy has been 
> > > > added,
> > > > it currently looks like the following figure:
> > > >
> > > > Use Cases:
> > > >
> > > > Before transformation
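> > > > public class DemoFunction implements Function<String, String> {
> > > >     RedisClient client;
> > > >
> > > >     @Override
> > > >     public String process(String str, Context context) {
> > > >         // 1. Initialize the client on every call
> > > >         client = init();
> > > >         // 2. Look up the data used for enrichment
> > > >         Object object = client.get(key);
> > > >         // Data enrichment
> > > >         // 3. Close the client on every call
> > > >         client.close();
> > > >         return null;
> > > >     }
> > > > }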
> > > >
> > > > After the transformation
> > > > public class DemoFunction implements HookFunction<String, String> {
> > > >     RedisClient client;
> > > >
> > > >     @Override
> > > >     public void setup(Context context) {
> > > >         Map<String, Object> connectInfo = context.getUserConfigMap();
> > > >         client = init();
> > > >     }
> > > >
> > > >     @Override
> > > >     public String process(String str, Context context) {
> > > >         Object object = client.get(key);
> > > >         // Data enrichment
> > > >         return null;
> > > >     }
> > > >
> > > >     @Override
> > > >     public void cleanup() {
> > > >         client.close();
> > > >     }
> > > > }
> > > >
> > > > It is quite simple and clear to use in function processing code.
> > > > <http://conf.cmss.com/pages/viewpage.action?pageId=140726144>
> > > >
> > > >
> > > >
> > >
> 


Re: Re: Discussion about https://github.com/apache/pulsar/pull/11112

2021-07-06 Thread Neng Lu
I think the reason is to keep the original `Function` unchanged, so that 
existing function implementations are not affected.


On 2021/07/06 03:34:49 Sijie Guo wrote:
> Thank you for starting the discussion!
> 
> I have added this proposal to PIP-86:
> https://github.com/apache/pulsar/wiki/PIP-86:-Pulsar-Functions:-Preload-and-release-external-resources
> 
> I have one question: why do you introduce HookFunction? Why not just add
> two default methods to the existing Functions API?
> 
> - Sijie
> 
> On Mon, Jul 5, 2021 at 6:49 PM 陈磊  wrote:
> 
> > Motivation
> >
> > It is very useful in many scenarios to provide safe and convenient
> > capabilities for function's external resource initialization and release.
> > In addition to the normal data processing path, it enables functions to use
> > HookFunction to manage external resources
> >
> > At present, in order to process data, only the logic of resource
> > initialization -> processing -> release and shutdown can be written in the
> > process() of Function. This method is complicated, insecure, and
> > unnecessary.
> >
> > Instead, we should have a new standard way for users to use Function
> > easily and safely. Summarized as follows:
> >
> > Before Function starts, some resources only need to be initialized once,
> > and there is no need to make various judgments in the process() method of
> > the Function interface
> >
> > After closing the Function, in the process of using process(), you need to
> > manually close the referenced external resources, which need to be released
> > separately in the close() of javaInstance
> > API and Implementation Changes
> >
> > The organization of the function implementation hierarchy has been added,
> > it currently looks like the following figure:
> >
> > Use Cases:
> >
> > Before transformation
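> > public class DemoFunction implements Function<String, String> {
> >     RedisClient client;
> >
> >     @Override
> >     public String process(String str, Context context) {
> >         // 1. Initialize the client on every call
> >         client = init();
> >         // 2. Look up the data used for enrichment
> >         Object object = client.get(key);
> >         // Data enrichment
> >         // 3. Close the client on every call
> >         client.close();
> >         return null;
> >     }
> > }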
> >
> > After the transformation
> > public class DemoFunction implements HookFunction<String, String> {
> >     RedisClient client;
> >
> >     @Override
> >     public void setup(Context context) {
> >         Map<String, Object> connectInfo = context.getUserConfigMap();
> >         client = init();
> >     }
> >
> >     @Override
> >     public String process(String str, Context context) {
> >         Object object = client.get(key);
> >         // Data enrichment
> >         return null;
> >     }
> >
> >     @Override
> >     public void cleanup() {
> >         client.close();
> >     }
> > }
> >
> > It is quite simple and clear to use in function processing code.
> > 
> >
> >
> >
> 


Re: Re: [PIP] Expose Pulsar-Client via Function/Connector BaseContext

2021-07-01 Thread Neng Lu
Thank you for reviewing the PIP and the PR

On 2021/07/01 06:04:15 Enrico Olivelli wrote:
> overall the Proposal is good to me
> so +1 from my side
> 
> This is the implementation PR
> https://github.com/apache/pulsar/pull/11056
> 
> I am happy that this work will fix a long standing problem with Functions
> and Pulsar IO
> 
> I would like to cite that
> we still have some problem on using the Pulsar API inside Functions/Pulsar
> IO
> there is an open work from Matteo that is willing to fix and unblock all
> the API features inside that context
> https://github.com/apache/pulsar/pull/10922
> 
> Thank you Neng for sending the PIP
> 
> Enrico
> 
> Il giorno mer 30 giu 2021 alle ore 23:57 Sijie Guo  ha
> scritto:
> 
> > Hi Neng,
> >
> > Thank you for starting the discussion!
> >
> > I have assigned PIP-85 to your PIP.
> >
> > https://github.com/apache/pulsar/wiki/PIP-85:-Expose-Pulsar-Client-via-Function-Connector-BaseContext
> >
> > The proposal looks good to me. +1
> >
> > - Sijie
> >
> > On Wed, Jun 30, 2021 at 12:40 PM Neng Lu  wrote:
> > >
> > > Hi All,
> > >
> > > I've prepared a brief PIP for the pulsarclient changes we have
> > discussed. Please take a look and let me know what you think.
> > >
> > > I would really appreciate it.
> > >
> > > Best Regards,
> > > Neng Lu
> >
> 


Re: Re: [PIP] Expose Pulsar-Client via Function/Connector BaseContext

2021-07-01 Thread Neng Lu
Thank you for the help.

On 2021/06/30 21:57:07 Sijie Guo wrote:
> Hi Neng,
> 
> Thank you for starting the discussion!
> 
> I have assigned PIP-85 to your PIP.
> https://github.com/apache/pulsar/wiki/PIP-85:-Expose-Pulsar-Client-via-Function-Connector-BaseContext
> 
> The proposal looks good to me. +1
> 
> - Sijie
> 
> On Wed, Jun 30, 2021 at 12:40 PM Neng Lu  wrote:
> >
> > Hi All,
> >
> > I've prepared a brief PIP for the pulsarclient changes we have discussed. 
> > Please take a look and let me know what you think.
> >
> > I would really appreciate it.
> >
> > Best Regards,
> > Neng Lu
> 


[PIP] Expose Pulsar-Client via Function/Connector BaseContext

2021-06-30 Thread Neng Lu
Hi All,

I've prepared a brief PIP for the pulsarclient changes we have discussed.
Please take a look and let me know what you think.

I would really appreciate it.

Best Regards,
Neng Lu
# PIP: Expose Pulsar-Client via Function/Connector BaseContext

- Status: Proposal
- Authors: Neng Lu
- Pull Request: https://github.com/apache/pulsar/pull/11056
- Mailing List discussion:
- Release:

## Motivation

Giving Functions/Connectors the ability to securely and easily access the underlying Pulsar cluster is very useful in many scenarios. It enables functions/connectors to use the Pulsar cluster as a backend store in addition to the normal data processing path.

Currently, in order to access the Pulsar cluster, users have to create the Pulsar client on their own and provide all the auth parameters. This approach is complicated, insecure and unnecessary.

Instead, we should have a standard way to allow users to access the Pulsar client easily and securely.

## API and Implementation Changes

Since we have refactored the function context object hierarchy, it currently looks like the right side of the following graph:

[![context hierarchy](https://user-images.githubusercontent.com/16407807/118730483-8ebf5200-b7ec-11eb-9220-d41261f148bb.png)](https://user-images.githubusercontent.com/16407807/118730483-8ebf5200-b7ec-11eb-9220-d41261f148bb.png)

To give Functions, Source Connectors and Sink Connectors the ability to access the Pulsar cluster, we need to introduce a new API in the `BaseContext` interface:

```java
/**
 * Get the pulsar client.
 *
 * @return the instance of pulsar client
 */
default PulsarClient getPulsarClient() {
    throw new UnsupportedOperationException("not implemented");
}
```

And for the implementation, the `ContextImpl` object should return the `PulsarClient` it's using for various purposes:

```java
@Override
public PulsarClient getPulsarClient() {
    return client;
}
```

Note that the returned `PulsarClient` is already connected to the underlying Pulsar cluster and inherits its auth parameters from the function/source/sink submitter. So it should be properly scoped and will only be able to perform operations that it is authorized to do.

Using the client in function/source/sink code is fairly straightforward:

```java

public String func(String input, Context context) {
    PulsarClient client = context.getPulsarClient();

    return ...;
}
```
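
The effect of declaring `getPulsarClient()` as a `default` method can be sketched with stand-in types. `ClientStub`, `BaseContextSketch`, `LegacyContext`, `ContextImplSketch` and `ContextUsageSketch` are hypothetical and only mirror the shape of the API above; they are not the Pulsar classes.

```java
// Stand-in for PulsarClient; the real interface is much richer.
interface ClientStub {
    String serviceUrl();
}

// Mirrors the proposed BaseContext change: a default method that throws.
interface BaseContextSketch {
    default ClientStub getPulsarClient() {
        throw new UnsupportedOperationException("not implemented");
    }
}

// A context implementation written before the change: it still compiles
// unchanged and inherits the throwing default.
final class LegacyContext implements BaseContextSketch { }

// Mirrors ContextImpl: returns the client it already holds.
final class ContextImplSketch implements BaseContextSketch {
    private final ClientStub client;

    ContextImplSketch(ClientStub client) {
        this.client = client;
    }

    @Override
    public ClientStub getPulsarClient() {
        return client;
    }
}

final class ContextUsageSketch {
    // Returns the service URL, or "unsupported" when the context has no client.
    static String describe(BaseContextSketch context) {
        try {
            return context.getPulsarClient().serviceUrl();
        } catch (UnsupportedOperationException e) {
            return "unsupported";
        }
    }
}
```

Under these assumptions, existing context implementations keep compiling after the API is added, and callers can tell an unsupported context apart from a working one.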







[pulsar-io] Allow connector to access pulsar cluster via Context

2021-06-24 Thread Neng Lu
Hi All,

I'm working on a PR that allows connectors to get the pulsar client via
context object. In this way, the connectors will be able to utilize the
pulsar cluster to do message pub/sub.

The use case for the change comes from the debezium connector. In order to
access a pulsar topic as backing store, the connector currently creates its
own pulsar client without any auth. This is undesirable and causes some
problems for us. A cleaner way is to let it access the pulsar
client inherited from FunctionWorker. This client is authenticated and secure.

The draft PR is here: https://github.com/apache/pulsar/pull/11056

Based on some comments, it seems the community has previously discussed
similar issues. Just want to hear people's thoughts and see how we can move
this forward.

Best Regards,
Neng Lu