[GitHub] [pulsar] michaeljmarshall added a comment to the discussion: Test / Experiment Batching locally

2022-08-09 Thread GitBox


GitHub user michaeljmarshall added a comment to the discussion: Test / 
Experiment Batching locally

> How do I confirm the messages are batched?

There are a few details here. First, message batching is an internal Pulsar 
optimization that is transparent to the producer and consumer, so it isn't 
technically something that a user should have to verify. That being said, if 
you are not getting the throughput that you would like to see, the batching 
settings may need tuning. The primary settings are configured on the producer 
builder returned by `newProducer()`: `batchingMaxPublishDelay(long, TimeUnit)`, 
`batchingMaxMessages(int)`, and `enableBatching(boolean)`. By default, the 
delay is 1 millisecond, the maximum is 1000 messages per batch, and batching 
is enabled.

Based on the code you shared, batching is enabled but is not being leveraged 
because synchronous sends trigger flushes.
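
To illustrate why a flush after every send defeats batching, here is a toy model in plain Java. It is only a sketch and not Pulsar's implementation: `ToyBatcher`, `add`, `flush`, and `batchSizes` are hypothetical names, and the publish-delay timer is left out so that only the max-messages rule and explicit flushes close a batch.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of producer-side batching; NOT Pulsar's actual implementation. */
public class ToyBatcher {
    private final int maxMessages;                              // stands in for batchingMaxMessages
    private final List<String> open = new ArrayList<>();        // batch currently being filled
    private final List<List<String>> sent = new ArrayList<>();  // batches "sent to the broker"

    public ToyBatcher(int maxMessages) {
        this.maxMessages = maxMessages;
    }

    /** Queues a message and closes the batch only when it is full. */
    public void add(String msg) {
        open.add(msg);
        if (open.size() >= maxMessages) {
            flush();
        }
    }

    /** Sends whatever is queued as one batch (no-op when nothing is queued). */
    public void flush() {
        if (!open.isEmpty()) {
            sent.add(new ArrayList<>(open));
            open.clear();
        }
    }

    /** Sizes of the batches sent so far, in order. */
    public List<Integer> batchSizes() {
        List<Integer> sizes = new ArrayList<>();
        for (List<String> batch : sent) {
            sizes.add(batch.size());
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Synchronous pattern: every message is flushed before the next one is queued.
        ToyBatcher sync = new ToyBatcher(3);
        for (int i = 0; i < 5; i++) {
            sync.add("Message " + i);
            sync.flush();
        }
        System.out.println("sync:  " + sync.batchSizes());   // [1, 1, 1, 1, 1]

        // Asynchronous pattern: queue everything, flush once at the end.
        ToyBatcher async = new ToyBatcher(3);
        for (int i = 0; i < 5; i++) {
            async.add("Message " + i);
        }
        async.flush();
        System.out.println("async: " + async.batchSizes());  // [3, 2]
    }
}
```

With a limit of 3 and 5 messages, the flush-per-send loop produces five single-message batches, while the queue-then-flush loop produces one batch of 3 and one of 2.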

GitHub link: 
https://github.com/apache/pulsar/discussions/17027#discussioncomment-3363493


This is an automatically sent email for dev@pulsar.apache.org.
To unsubscribe, please send an email to: dev-unsubscr...@pulsar.apache.org



[GitHub] [pulsar] michaeljmarshall added a comment to the discussion: Test / Experiment Batching locally

2022-08-09 Thread GitBox


GitHub user michaeljmarshall added a comment to the discussion: Test / 
Experiment Batching locally

Thank you for this sample code. The first thing to note is that using 
synchronous sends by calling `.send()` will likely not result in batched 
messages because, on a persistent topic, `send()` returns only after the 
broker acknowledges that the message has been persisted. Technically, if you 
share the producer across threads and call `send` within the message batching 
window, it is possible to use synchronous sends and still get some benefit 
from batching. The main way to benefit from batching is to use the 
`sendAsync` method and then track the result with the returned future.
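
The "send asynchronously, then track the futures" pattern can be sketched with the JDK alone. In the Pulsar Java client, `sendAsync` returns a `CompletableFuture<MessageId>`; in the sketch below, `FakeProducer` and `sendAll` are hypothetical stand-ins that use `Long` ids so the example runs without a broker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicLong;

/** Sketch of "send asynchronously, then track the returned futures". */
public class AsyncSendSketch {

    /** Hypothetical stand-in for a producer; it is not a Pulsar class. */
    static class FakeProducer {
        private final AtomicLong nextId = new AtomicLong();

        /** Returns immediately; the future completes later on another thread. */
        CompletableFuture<Long> sendAsync(String payload) {
            long id = nextId.getAndIncrement();
            return CompletableFuture.supplyAsync(() -> id);
        }
    }

    /** Queues every message first, then waits once for all acknowledgements. */
    static List<Long> sendAll(FakeProducer producer, int numMsgs) {
        List<CompletableFuture<Long>> pending = new ArrayList<>();
        for (int i = 0; i < numMsgs; i++) {
            pending.add(producer.sendAsync("Message " + i));  // does not block
        }
        // Block once, after everything has been handed to the producer.
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();

        List<Long> ids = new ArrayList<>();
        for (CompletableFuture<Long> f : pending) {
            ids.add(f.join());
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(sendAll(new FakeProducer(), 5));
    }
}
```

Because the loop never waits on an individual send, a real producer would be free to group the queued messages into batches before the single wait at the end.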

GitHub link: 
https://github.com/apache/pulsar/discussions/17027#discussioncomment-3363431





[GitHub] [pulsar] Shalima added a comment to the discussion: Test / Experiment Batching locally

2022-08-09 Thread GitBox


GitHub user Shalima added a comment to the discussion: Test / Experiment 
Batching locally

Producer

```
Producer<byte[]> producer = client.newProducer().topic(topicName).create();
log.info("Created producer: " + topicName);

for (int i = 0; i < numMsgs; i++) {
    String msg = "Message  " + i;
    byte[] bytePayload = msg.getBytes();
    // Synchronous send: blocks until the broker acknowledges this message.
    MessageId msgId = producer.newMessage().value(bytePayload).send();
    log.info("Message Produced. Producer Name: " + producer.getProducerName() + " Message Id: " + msgId);
}
producer.close();
```

Consumer

```
Consumer<byte[]> consumer = client.newConsumer()
        .topic(topicName)
        .subscriptionName(subscriptionName)
        .subscribe();
List<String> msgReceived = new ArrayList<>();
while (true) {
    Message<byte[]> msg = consumer.receive();

    byte[] plainPayload = decryptor.decrypt(msg.getData());
    String msgText = new String(plainPayload);
    msgReceived.add(msgText);
    log.info("Consumer Name: " + consumer.getConsumerName() + " Message : " + msgText);
    System.out.println(String.format("Message Received consumer_name=%s message=%s",
            consumer.getConsumerName(), msgText));
    consumer.acknowledge(msg);
}
// Not reached as written, because the loop above never exits.
consumer.close();
client.close();
```


GitHub link: 
https://github.com/apache/pulsar/discussions/17027#discussioncomment-3363364





Re: [Vote] PIP 198: Standardize PR Naming Convention using GitHub Actions

2022-08-09 Thread Michael Marshall
+1 (binding) to the overall proposal. I also agree with the
clarifications/requests raised by Tison and Lari. I prefer the
abbreviated prefixes, and I think using a GitHub Action to enforce the
formatting would guarantee that the formatting is correct. For
example, when I review PRs and merge them, I often forget to sanitize
the PR title because I am so focused on the PR's content itself, and
once it's committed, it's in the history forever.

> I hope we could get rid of the brackets too and simply use a similar format 
> as Angular does.

I do like the look of brackets :), but I agree that the current format
is more verbose than necessary, and if we want to target < 50 chars,
every char matters.
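
As a rough illustration of what an automated title check might enforce, here is a small Java sketch. It is not the proposed GitHub Action: the bracketed `[type][scope] summary` pattern follows the current convention, but the `TYPES` and `SCOPES` sets are illustrative placeholders, and applying the "< 50 chars" target to the whole title is an assumption.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical PR title validator; prefix lists and length limit are assumptions. */
public class PrTitleCheck {
    // Illustrative lists only; the authoritative prefixes are whatever the PIP settles on.
    static final Set<String> TYPES = Set.of("feat", "improve", "fix", "cleanup", "refactor", "revert");
    static final Set<String> SCOPES = Set.of("broker", "client", "fn", "io", "test", "ci", "doc");

    // "[type][scope] summary": two bracketed lowercase prefixes, a space, then a non-empty summary.
    static final Pattern TITLE = Pattern.compile("^\\[([a-z]+)\\]\\[([a-z]+)\\] (\\S.*)$");

    static final int MAX_LEN = 50;  // assumed reading of the "< 50 chars" target

    /** Returns true when the title has known prefixes and stays under the length limit. */
    static boolean isValid(String title) {
        Matcher m = TITLE.matcher(title);
        return m.matches()
                && TYPES.contains(m.group(1))
                && SCOPES.contains(m.group(2))
                && title.length() < MAX_LEN;
    }

    public static void main(String[] args) {
        String[] samples = {
            "[fix][broker] Fix NPE in dispatcher",
            "Fix NPE in dispatcher",
            "[bugfix][broker] Fix NPE"
        };
        for (String title : samples) {
            System.out.println(isValid(title) + "  " + title);
        }
    }
}
```

A check like this runs before merge, so a reviewer who forgets to tidy the title is caught by the tooling rather than by the commit history.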

Thanks,
Michael

On Tue, Aug 9, 2022 at 7:57 AM Anon Hxy  wrote:
>
> +1 (non-binding)
>
> Thanks,
> Xiaoyu Hou
>
> On Thu, Aug 4, 2022 at 16:13 Yu wrote:
>
> > Hi team,
> >
> > It has been 4 months since we discussed the [Guideline] Pulsar PR Naming
> > Convention [1].
> >
> > Nowadays, when reading the PR list [2], you’ll find more and more people
> > follow and get used to this rule.
> >
> > It improves collaboration efficiency, that is great!
> >
> > This makes us think about moving the rule forward, that is, standardizing
> > PR title naming using GitHub Actions, which is a more efficient way.
> >
> > So we'd like to start a vote on PIP 198: Standardize PR Naming Convention
> > using GitHub Actions [3].
> >
> >
> > This proposal contains:
> >
> > - Why do this?
> >
> > - How do this?
> >
> > - Pre-discussions and other thoughts
> >
> > Feel free to comment, thank you!
> >
> > [1] https://lists.apache.org/thread/sk9ops3t94jmzc5tndk08y9khf7pj6so
> >
> > [2] https://github.com/apache/pulsar/pulls
> >
> > [3]
> >
> > https://docs.google.com/document/d/1sJlUNAHnYAbvu9UtEgCrn_oVTnVc1M5nHC19x1bFab4/edit?pli=1#
> >
> >
> > Yu, Max, mangoGoForward
> >


[GitHub] [pulsar] michaeljmarshall added a comment to the discussion: Test / Experiment Batching locally

2022-08-09 Thread GitBox


GitHub user michaeljmarshall added a comment to the discussion: Test / 
Experiment Batching locally

Hi @Shalima, would you mind sharing a sample of your producer and consumer 
code? That will likely be an efficient way to help answer your question. Thanks.

GitHub link: 
https://github.com/apache/pulsar/discussions/17027#discussioncomment-3363134





Re: [VOTE] PIP-174: Provide new implementation for broker dispatch cache

2022-08-09 Thread Michael Marshall
+1 - thanks for a great write up Matteo, it looks like it'll be a
great improvement!

(Sorry for casting a late vote, but I figure it's worth affirming the
PIP even if we've already closed the vote.)

Thanks,
Michael

On Sun, Jul 24, 2022 at 9:32 PM Matteo Merli  wrote:
>
> Closing this vote with 3 +1s and no -1s:
>
> +1s:
>  * Matteo
>  * PengHui
>  * Dave
>
> Thanks,
> Matteo
>
> --
> Matteo Merli
> 
>
> On Thu, Jul 21, 2022 at 7:58 PM Dave Fisher  wrote:
> >
> > Sorry I’m late to this discussion.
> >
> > I think that the motivation is correct. There is really quite a bit of 
> > activity around this issue. Let’s take extra efforts to engage extra time 
> > with commits to confirm performance improvements.
> >
> > Let’s particularly pay attention to threading.
> >
> > +1
> >
> > Regards,
> > Dave
> >
> > Sent from my iPhone
> >
> > > On Jul 21, 2022, at 11:37 AM, Matteo Merli  wrote:
> > >
> > > ## Motivation
> > >
> > > The current implementation of the read cache in the Pulsar broker has 
> > > largely
> > > remained unchanged for a long time, except for a few minor tweaks.
> > >
> > > While the implementation is stable and reasonably efficient for
> > > typical workloads,
> > > the overhead required for managing the cache evictions in a broker
> > > that is running
> > > many topics can be pretty high in terms of extra CPU utilization and on 
> > > the JVM
> > > garbage collection to track an increased number of medium-lived objects.
> > >
> > > The goal is to provide an alternative implementation that can adapt 
> > > better to
> > > a wider variety of operating conditions.
> > >
> > > ### Current implementation details
> > >
> > > The broker cache is implemented as part of the `ManagedLedger` component,
> > > > which sits in the Pulsar broker and provides a higher level of
> > > > abstraction on top of BookKeeper.
> > >
> > > Each topic (and managed-ledger) has its own private cache space. This
> > > cache is implemented
> > > as a `ConcurrentSkipList` sorted map that maps `(ledgerId, entryId) ->
> > > payload`. The payload
> > > is a `ByteBuf` reference that can either be a slice of a `ByteBuf` that 
> > > we got
> > > when reading from a socket, or it can be a copied buffer.
> > >
> > > Each topic cache is allowed to use the full broker max cache size before 
> > > an
> > > eviction is triggered. The total cache size is effectively a resource
> > > shared across all
> > > the topics, where a topic can use a more prominent portion of it if it
> > > "asks for more".
> > >
> > > When the eviction happens, we need to do an expensive ranking of all
> > > the caches in the broker
> > > and do an eviction in a proportional way to the currently used space
> > > for each of them.
> > >
> > > The bigger problem is represented by the `ConcurrentSkipList` and the
> > > `ByteBuf` objects
> > > that need to be tracked. The skip list is essentially like a "tree"
> > > structure and needs to
> > > maintain Java objects for each entry in the cache. We also need to
> > > potentially have
> > > a huge number of ByteBuf objects.
> > >
> > > A cache workload is typically the worst-case scenario for each garbage
> > > collector implementation because it involves creating objects, storing
> > > them for some amount of
> > > > time and then throwing them away. During that time, the GC would have
> > > > already tenured these objects and copied them into an "old generation"
> > > > space, and sometime later, a costly compaction of that memory would
> > > > have to be performed.
> > >
> > > To mitigate the effect of the cache workload on the GC, we're being
> > > very aggressive in
> > > purging the cache by triggering time-based eviction. By putting a max
> > > TTL on the elements in
> > > the cache, we can avoid keeping the objects around for too long to be
> > > a problem for the GC.
> > >
> > > The reverse side of this is that we're artificially reducing the cache
> > > capacity to a very
> > > short time frame, reducing the cache usefulness.
> > >
> > > The other problem is the CPU cost involved in doing these frequent
> > > evictions, which can
> > > be very high when there are 10s of thousands of topics in a broker.
> > >
> > >
> > > ## Proposed changes
> > >
> > > Instead of dealing with individual caches for each topic, let's adopt
> > > a model where
> > > there is a single cache space for the broker.
> > >
> > > This cache is broken into N segments which act as a circular buffer.
> > > Whenever a segment
> > > is full, we start writing into the next one, and when we reach the
> > > last one, we will
> > > restart recycling the first segment.
> > >
> > > This model has been working very well for the BookKeeper `ReadCache`:
> > > https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/storage/ldb/ReadCache.java
> > >
> > > The eviction becomes a completely trivial operation, buffers are just
> > > rotated and
> > > overwritten. We don't need to do any per-topic 

[GitHub] [pulsar-adapters] casuallc commented on issue #38: kafka adaptor can not handle non-partitioned topic

2022-08-09 Thread GitBox


casuallc commented on issue #38:
URL: https://github.com/apache/pulsar-adapters/issues/38#issuecomment-1210083991

   #37 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [VOTE] PIP-193 : Sink preprocessing Function

2022-08-09 Thread Neng Lu
Hi,

I'm not sure whether this is too late, but I replied in the discussion thread
with some thoughts on whether we should tweak the sink connector or allow
flexible, general function creation instead.

On Mon, Aug 1, 2022 at 5:03 PM mattison chao 
wrote:

> +1 (non-binding)
>
> Best,
> Mattison
>
> On Tue, 2 Aug 2022 at 01:56, Dave Fisher  wrote:
> >
> > +1 (binding)
> >
> > On 2022/07/28 10:39:35 Christophe Bornet wrote:
> > > Hi, Pulsar community,
> > >
> > > I'd like to start a vote on PIP-193 : Sink preprocessing Function
> > >
> > > You can find the proposal at
> https://github.com/apache/pulsar/issues/16739 and
> > > the discussion thread at
> > > https://lists.apache.org/thread/qn59jwn47w9ngxpkvq3kswbl1y882jth.
> > >
> > > The vote will stay open for at least 48 hours.
> > >
> > > Best regards.
> > >
> > > Christophe Bornet
> > >
>


-- 
Best Regards,
Neng


[GitHub] [pulsar] Shalima created a discussion: Test / Experiment Batching locally

2022-08-09 Thread GitBox


GitHub user Shalima created a discussion: Test / Experiment Batching locally

Hi,

I am trying to experiment with the batching parameters in the producer 
configuration locally. Since batching is enabled by default, I set 
maxMessages to 3. I have a consumer listening to the topic, and the producer 
calls send in a for loop. I sent 5 messages, and all 5 were consumed at the 
same time. How do I confirm the messages are batched? This may be a beginner 
question, as I am new to Pulsar. Any help will be appreciated. Thank you.

GitHub link: https://github.com/apache/pulsar/discussions/17027





Re: [DISCUSS] PIP 193 : Sink preprocessing Function

2022-08-09 Thread Neng Lu
To my understanding, the Pulsar IO Connectors (i.e. Sources/Sinks) are
quite self-contained. They move data around.

If we want to enable the functionality described in the PIP (process ->
write to another place), can we approach it another way: allow flexible
configuration of a Pulsar Function?

The original Pulsar Function pipeline is:
PulsarSource -> func() -> PulsarSink()

Can we look into allowing users to change the source/sink in the
Pulsar Function pipeline instead of tweaking the Sink?

The syntax could be:
```
pulsar-admin functions create --sink ... --source ...
```

This would be more flexible and opens up many possibilities for further
development.
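
To make the idea concrete, here is a toy source -> function -> sink pipeline in plain Java. It is only a sketch of the shape of the proposal: `PipelineSketch`, `run`, and `sourceOf` are hypothetical names and none of this is Pulsar API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

/** Toy source -> function -> sink pipeline; all names are hypothetical, not Pulsar APIs. */
public class PipelineSketch {

    /** Pulls records until the source returns null, transforms each, and hands it to the sink. */
    static <I, O> void run(Supplier<I> source, Function<I, O> func, Consumer<O> sink) {
        I record;
        while ((record = source.get()) != null) {
            sink.accept(func.apply(record));
        }
    }

    /** Wraps a list as a source that yields each element once, then null. */
    static <T> Supplier<T> sourceOf(List<T> items) {
        Iterator<T> it = items.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    public static void main(String[] args) {
        Function<String, String> func = String::toUpperCase;

        // Same function, two different sinks: only the endpoint changes.
        List<String> inMemorySink = new ArrayList<>();
        run(sourceOf(List.of("a", "b")), func, inMemorySink::add);
        run(sourceOf(List.of("c")), func, System.out::println);

        System.out.println(inMemorySink);
    }
}
```

In this shape the function stays unchanged and the source or sink is simply a parameter, which is the flexibility the `--sink ... --source ...` syntax is aiming for.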



On Tue, Jul 26, 2022 at 2:56 AM Christophe Bornet 
wrote:

> Thanks for the feedback Jerry.
> We don't modify the way sources, sinks and functions are detected when it's
> based on their fields. The proposal is just to modify the classname of the
> function applied in the instance so the same detection rules apply. The
> only difference is when detecting if the sink or function is built-in. For
> this we add some code to do this detection also based on the ComponentType
> (either detected or explicit). You can check the implementation PR about
> it: https://github.com/apache/pulsar/pull/16740
>
> IMO, making it separate implementation of what currently exist would make
> things more complex and this more error prone for no good reason. The
> proposal is "just" to replace the name of the already existing function
> (IdentityFunction) by another one and to provide the location of the
> function JAR.
>
> Best regards
> Christophe
>
> On Mon, Jul 25, 2022 at 23:31, Jerry Peng wrote:
>
> > My feedback is to make this change as self contained as possible.  Can we
> > just have a special implementation of a sink that will run the logic of
> the
> > "preprocess" function?  There are many places in the code where we figure
> > out if it is a source, sink or a function based on the fields in the
> > Function metadata.  Changing that may have unintended consequences.
> >
> > On Mon, Jul 25, 2022 at 5:55 AM Baodi Shi 
> > wrote:
> >
> > > > Can you explain more what you mean ?
> > > This PIP doesn't change the API of a Function and it's already possible
> > to
> > > write a Function>.
> > > And when declaring a Sink with a Function we'll check that it's the
> case.
> > >
> > > I mean: we should constrain the function interface, otherwise, the user
> > > may return a structure that is not a record.
> > >
> > > Thanks,
> > > Baodi Shi
> > >
> > > > On Jul 25, 2022, at 01:0233, Christophe Bornet <
> bornet.ch...@gmail.com
> > >
> > > wrote:
> > > >
> > > > Thanks for the feedback Asaf
> > > >
> > > >
> > > >>>   - preprocess-function: the preprocess function applied before the
> > > >>>   Sink. Starts by builtin:// for built-in functions, function://
> for
> > > >>>   package function, http:// or file://
> > > >>>
> > > >>> 1. While this function is applied only before sink? I thought it
> > > replaces
> > > >> the identity function, so why a source can't have a function that
> > reads
> > > >> from the source (say S3), runs the function and only then writes to
> a
> > > >> pulsar topic?
> > > >>
> > > >
> > > > Yes that's totally possible to implement and will be done in future
> > work
> > > > like written in the PIP.
> > > >
> > > >
> > > >> 2. Can you clarify more about built in and function for package
> > > function?
> > > >> Is this an existing functionality ?
> > > >>
> > > > Yes those are existing functionalities.
> > > > Built-in functions are not documented (and we should do something
> about
> > > > that).
> > > > Package management of functions is described in
> > > >
> > >
> >
> https://pulsar.apache.org/docs/functions-deploy#use-package-management-service
> > > >
> > > >
> > > >> 3. Regarding http - Are you loading a class through that URL? Aren't
> > we
> > > >> exposed to same problem Log4Shell security issue had? If so, what
> > > measures
> > > >> are you taking to protect ?
> > > >>
> > > > Yes we are loading code via URL. This feature already exists for
> > > > Sources/Sinks/Functions.
> > > > I guess you need to have a huge trust of the source from where you
> > > download.
> > > > This PIP has the same security level as what already exists for this
> > > > functionality.
> > > >
> > > >
> > > >>
> > > >> The field extraFunctionPackageLocation to the protobuf structure
> > > >>> FunctionMetaData will be added. This field will be filled with the
> > > >>> location of the extra function to apply when registering a sink and
> > > used
> > > >> in
> > > >>> the Runtime to load the function code.
> > > >>
> > > >> Can you please expand on that? You mean the JAR location, which you
> > will
> > > >> search that class name and function specified in the 3 fields you've
> > > added
> > > >> to the config?
> > > >>
> > > > Not exactly. It's the location of where the JAR is stored. It can be
> > > > BookKeeper, package management, built-in NAR, etc...
> > > > In 

Brokers in a cluster become unhealthy collectively

2022-08-09 Thread FootfallCam Raven
Details are in https://github.com/apache/pulsar/issues/17024. I would 
appreciate it if any Pulsar experts could share some pointers.

Best regards,
Raven

--
Raven Tan, MEng (Hons)
Software Development
Research Engineer

FootfallCam
51-2, Jalan SL 1/4, Bandar Sg Long,
43000 Kajang, Malaysia
Website: www.footfallcam.com



Re: [DISCUSS] Does stale bot make value for you?

2022-08-09 Thread tison
A good example is Apache SkyWalking, which has fewer than 100 issues. If we
reach a similar state, we won't have to worry about the stale bot at all and
can even simply remove it; our committers should be able to handle that level
of traffic.

Best,
tison.


On Tue, Aug 9, 2022 at 23:18 tison wrote:

> After a discussion with @codelipenghui we agree that spending time on
> handling issues is better than writing more automation or rules.
>
> One thing to help us handle the backlog is close stale issues directly as
> not planned[1]. As committers can be confident close stale issues, we can
> significantly reduce the backlog.
>
> I will send a dedicated mail with title "[COMMITTER ATTENTION] You can
> close stale issues as not planned" tomorrow.
>
> Best,
> tison.
>
> [1]
> https://github.blog/changelog/2022-03-10-the-new-github-issues-march-10th-update/#%F0%9F%95%B5%F0%9F%8F%BD%E2%99%80%EF%B8%8F-issue-closed-reasons
>
>
> On Tue, Aug 2, 2022 at 10:30 Zixuan Liu wrote:
>
>> I agreed with @tison, LGTM
>>
>> Thanks,
>> Zixuan
>>
>>
>> On Mon, Aug 1, 2022 at 23:07 tison wrote:
>>
>> > Sum up the ideas above:
>> >
>> > * Keep it as is @penghui
>> > * Extend the interval @dave
>> > * Change the message @eolivelli
>> > * Remove the message @zixuan
>> >
>> > I may try to remove the stale bot at the beginning, but when I consider
>> it
>> > more constructively, the goal here is that we'd like to nudge the
>> ping-pong
>> > circle between the author and the reviewer.
>> >
>> > So I'd like to prototype a feature based on pulsarbot, simulate ASF
>> INFRA's
>> > waiting-for-user, waiting-for-infra cycle to react to:
>> >
>> > /pulsarbot waiting-for-author
>> > /pulsarbot waiting-for-reviewer
>> >
>> > ... which labels the issue. The original stale logic can be integrated
>> into
>> > waiting-for-reviewer, which is better than "you are stale/inactive".
>> Then a
>> > reviewer can label it as waiting-for-author so that we learn the state
>> and
>> > the stale bot will skip it.
>> >
>> > What do you think?
>> >
>> > Best,
>> > tison.
>> >
>> >
>> > On Mon, Aug 1, 2022 at 11:42 Zixuan Liu wrote:
>> >
>> > > Hi tison,
>> > >
>> > > Good catch! I also noticed some issues with a stable label and a no
>> > > activity commit message. This is going to add a lot of useless
>> > information
>> > > to the issue.
>> > >
>> > > I don't recommend leaving a commit message.
>> > >
>> > > > For example, even if we close (and lock?) the issue or pull request
>> > after
>> > > a
>> > > certain interval, the stale bot helps on transforming issues state
>> with a
>> > > clear rule (although a human action could be more friendly).
>> > >
>> > > We can use the bot to mark the issue or PR but don't leave a commit
>> > > message, and then it's up to the committer/PMC to decide whether to
>> close
>> > > it or continue handling it.
>> > >
>> > > Thanks,
>> > > Zixuan
>> > >
>> > > On Sun, Jul 31, 2022 at 01:39 Enrico Olivelli wrote:
>> > >
>> > > > On Sat, Jul 30, 2022 at 17:53 tison wrote:
>> > > >
>> > > > > For example, even if we close (and lock?) the issue or pull
>> request
>> > > > after a
>> > > > > certain interval, the stale bot helps on transforming issues state
>> > > with a
>> > > > > clear rule (although a human action could be more friendly).
>> > > > >
>> > > > > Instead, we leave a comment and add a label which information can
>> be
>> > > > > filtered as the search query mentioned above. I'm curious if our
>> > > members
>> > > > > treat an issue with/without the stale label differently. If not, I
>> > > don't
>> > > > > see the value we gain from running workflows and potentially
>> spamming
>> > > > > comments.
>> > > > >
>> > > >
>> > > > Totally agreed.
>> > > > It is very hard to follow the overwhelming flow of github pull
>> > requests.
>> > > > On one hand this is great because the project is very active.
>> > > > On the other hand it is very hard to take time to pay attention to
>> > > > everyone.
>> > > >
>> > > > The stale bot is useful only because it bumps up the pr by sending a
>> > > > notification and possibly you find it.
>> > > >
>> > > >
>> > > > I think that we should encourage people to talk about their PRs on
>> the
>> > > dev@
>> > > > list.
>> > > > We should add some message on the PR template to advise folks to
>> > > advertise
>> > > > their patches here.
>> > > >
>> > > > In the stale bot the comment should suggest to the author of the PR
>> to
>> > > ask
>> > > > for review here on dev@.
>> > > > It will be less frustrating.
>> > > > Like:
>> > > > We are sorry if your patch has not make it yet. Please advertise
>> about
>> > > your
>> > > > patch on dev@pulsar.apache.org
>> > > >
>> > > >
>> > > > Enrico
>> > > >
>> > > >
>> > > >
>> > > > > Best,
>> > > > > tison.
>> > > > >
>> > > > >
>> > > > > On Sat, Jul 30, 2022 at 23:00 tison wrote:
>> > > > >
>> > > > > > Hi Dave,
>> > > > > >
>> > > > > > > The other aspect is it would be helpful if many Pulsar
>> committers
>> > > > would
>> > > > > > spend effort every few weeks reviewing issues and PRs to engage
>> the
>> > > > > > community.
>> > > > 

Re: [DISCUSS] Does stale bot make value for you?

2022-08-09 Thread tison
After a discussion with @codelipenghui we agree that spending time on
handling issues is better than writing more automation or rules.

One thing that would help us handle the backlog is closing stale issues
directly as not planned [1]. If committers can confidently close stale issues,
we can significantly reduce the backlog.

I will send a dedicated mail with title "[COMMITTER ATTENTION] You can
close stale issues as not planned" tomorrow.

Best,
tison.

[1]
https://github.blog/changelog/2022-03-10-the-new-github-issues-march-10th-update/#%F0%9F%95%B5%F0%9F%8F%BD%E2%99%80%EF%B8%8F-issue-closed-reasons


On Tue, Aug 2, 2022 at 10:30 Zixuan Liu wrote:

> I agreed with @tison, LGTM
>
> Thanks,
> Zixuan
>
>
> On Mon, Aug 1, 2022 at 23:07 tison wrote:
>
> > Sum up the ideas above:
> >
> > * Keep it as is @penghui
> > * Extend the interval @dave
> > * Change the message @eolivelli
> > * Remove the message @zixuan
> >
> > I may try to remove the stale bot at the beginning, but when I consider
> it
> > more constructively, the goal here is that we'd like to nudge the
> ping-pong
> > circle between the author and the reviewer.
> >
> > So I'd like to prototype a feature based on pulsarbot, simulate ASF
> INFRA's
> > waiting-for-user, waiting-for-infra cycle to react to:
> >
> > /pulsarbot waiting-for-author
> > /pulsarbot waiting-for-reviewer
> >
> > ... which labels the issue. The original stale logic can be integrated
> into
> > waiting-for-reviewer, which is better than "you are stale/inactive".
> Then a
> > reviewer can label it as waiting-for-author so that we learn the state
> and
> > the stale bot will skip it.
> >
> > What do you think?
> >
> > Best,
> > tison.
> >
> >
> > On Mon, Aug 1, 2022 at 11:42 Zixuan Liu wrote:
> >
> > > Hi tison,
> > >
> > > Good catch! I also noticed some issues with a stable label and a no
> > > activity commit message. This is going to add a lot of useless
> > information
> > > to the issue.
> > >
> > > I don't recommend leaving a commit message.
> > >
> > > > For example, even if we close (and lock?) the issue or pull request
> > after
> > > a
> > > certain interval, the stale bot helps on transforming issues state
> with a
> > > clear rule (although a human action could be more friendly).
> > >
> > > We can use the bot to mark the issue or PR but don't leave a commit
> > > message, and then it's up to the committer/PMC to decide whether to
> close
> > > it or continue handling it.
> > >
> > > Thanks,
> > > Zixuan
> > >
> > > On Sun, Jul 31, 2022 at 01:39 Enrico Olivelli wrote:
> > >
> > > > On Sat, Jul 30, 2022 at 17:53 tison wrote:
> > > >
> > > > > For example, even if we close (and lock?) the issue or pull request
> > > > after a
> > > > > certain interval, the stale bot helps on transforming issues state
> > > with a
> > > > > clear rule (although a human action could be more friendly).
> > > > >
> > > > > Instead, we leave a comment and add a label which information can
> be
> > > > > filtered as the search query mentioned above. I'm curious if our
> > > members
> > > > > treat an issue with/without the stale label differently. If not, I
> > > don't
> > > > > see the value we gain from running workflows and potentially
> spamming
> > > > > comments.
> > > > >
> > > >
> > > > Totally agreed.
> > > > It is very hard to follow the overwhelming flow of github pull
> > requests.
> > > > On one hand this is great because the project is very active.
> > > > On the other hand it is very hard to take time to pay attention to
> > > > everyone.
> > > >
> > > > The stale bot is useful only because it bumps up the pr by sending a
> > > > notification and possibly you find it.
> > > >
> > > >
> > > > I think that we should encourage people to talk about their PRs on
> the
> > > dev@
> > > > list.
> > > > We should add some message on the PR template to advise folks to
> > > advertise
> > > > their patches here.
> > > >
> > > > In the stale bot the comment should suggest to the author of the PR
> to
> > > ask
> > > > for review here on dev@.
> > > > It will be less frustrating.
> > > > Like:
> > > > We are sorry if your patch has not make it yet. Please advertise
> about
> > > your
> > > > patch on dev@pulsar.apache.org
> > > >
> > > >
> > > > Enrico
> > > >
> > > >
> > > >
> > > > > Best,
> > > > > tison.
> > > > >
> > > > >
> > > > > On Sat, Jul 30, 2022 at 23:00 tison wrote:
> > > > >
> > > > > > Hi Dave,
> > > > > >
> > > > > > > The other aspect is it would be helpful if many Pulsar
> committers
> > > > would
> > > > > > spend effort every few weeks reviewing issues and PRs to engage
> the
> > > > > > community.
> > > > > >
> > > > > > Agree. I'll try to help with reviewing issues and PRs as I
> handled
> > > > > > backlogs for the Apache Curator project.
> > > > > >
> > > > > > The topic here is whether "the stale bot" helps or it creates
> > > > > frustration,
> > > > > > spamming comments, and consumes resources unnecessarily. We
> should
> > > > always
> > > > > > handle backlogs in some way, but may not with a stale bot.
> > > > > >
> > > > > > 

Re: [Vote] PIP 198: Standardize PR Naming Convention using GitHub Actions

2022-08-09 Thread Anon Hxy
+1 (non-binding)

Thanks,
Xiaoyu Hou

On Thu, Aug 4, 2022 at 16:13 Yu wrote:

> Hi team,
>
> It has been 4 months since we discussed the [Guideline] Pulsar PR Naming
> Convention [1].
>
> Nowadays, when reading the PR list [2], you’ll find more and more people
> follow and get used to this rule.
>
> It improves collaboration efficiency, that is great!
>
> This makes us think about moving the rule forward, that is, standardizing
> PR title naming using GitHub Actions, which is a more efficient way.
>
> So we'd like to start a vote on PIP 198: Standardize PR Naming Convention
> using GitHub Actions [3].
>
>
> This proposal contains:
>
> - Why do this?
>
> - How do this?
>
> - Pre-discussions and other thoughts
>
> Feel free to comment, thank you!
>
> [1] https://lists.apache.org/thread/sk9ops3t94jmzc5tndk08y9khf7pj6so
>
> [2] https://github.com/apache/pulsar/pulls
>
> [3]
>
> https://docs.google.com/document/d/1sJlUNAHnYAbvu9UtEgCrn_oVTnVc1M5nHC19x1bFab4/edit?pli=1#
>
>
> Yu, Max, mangoGoForward
>


Re: [Vote] PIP-192 New Pulsar Broker Load Balancer

2022-08-09 Thread guo jiwei
+1

Regards
Jiwei Guo (Tboy)


On Tue, Aug 9, 2022 at 3:12 PM Qiang Huang 
wrote:

> +1 (non-binding)
>
> On Fri, Aug 5, 2022 at 10:18 Kai Wang wrote:
>
> > +1 (non-binding)
> >
> > Thanks,
> > Kai
> >
> > On Tue, Aug 2, 2022 at 08:50 Heesung Sohn wrote:
> >
> > > Dear Pulsar Community,
> > >
> > > Please review and vote on this PIP.
> > >
> > > PIP link: https://github.com/apache/pulsar/issues/16691
> > >
> > > Thank you,
> > > -Heesung
> > >
> >
>
>
> --
> BR,
> Qiang Huang
>


[GitHub] [pulsar-adapters] casuallc opened a new issue, #38: kafka adaptor can not handle non-partitioned topic

2022-08-09 Thread GitBox


casuallc opened a new issue, #38:
URL: https://github.com/apache/pulsar-adapters/issues/38

   **Reproduce**
   - create a non-partitioned topic
   - send a message to this topic
   
   **error**
   
![image](https://user-images.githubusercontent.com/9473606/183640375-517bcea0-fa50-48af-8881-9696fab6d558.png)
   
   **probable reason**
   PulsarKafkaConsumer -> poll
   ```
   public ConsumerRecords<K, V> poll(long timeoutMillis) {
       try {
           QueueItem item = receivedMessages.poll(timeoutMillis, TimeUnit.MILLISECONDS);
           if (item == null) {
               return (ConsumerRecords<K, V>) ConsumerRecords.EMPTY;
           }

           Map<TopicPartition, List<ConsumerRecord<K, V>>> records = new HashMap<>();

           int numberOfRecords = 0;

           while (item != null) {
               TopicName topicName = TopicName.get(item.consumer.getTopic());
               String topic = topicName.getPartitionedTopicName();
               int partition = topicName.isPartitioned() ? topicName.getPartitionIndex() : 0;
               Message<byte[]> msg = item.message;
               MessageId msgId = msg.getMessageId();
               if (msgId instanceof TopicMessageIdImpl) {
                   msgId = ((TopicMessageIdImpl) msgId).getInnerMessageId();
               }
               long offset = MessageIdUtils.getOffset(msgId);

               TopicPartition tp = new TopicPartition(topic, partition);
               if (lastReceivedOffset.get(tp) == null && !unpolledPartitions.contains(tp)) {
                   log.info("When polling offsets, invalid offsets were detected. Resetting topic partition {}", tp);
                   resetOffsets(tp);
               }

               // .. other code

           // If no interceptor is provided, the interceptors list will be an empty list
           // and the original ConsumerRecords will be returned.
           return applyConsumerInterceptorsOnConsume(interceptors, new ConsumerRecords<>(records));
       } catch (InterruptedException e) {
           throw new RuntimeException(e);
       }
   }
   ```
   **int partition = topicName.isPartitioned() ? topicName.getPartitionIndex() : 0;**
   This line cannot distinguish a partitioned topic from a non-partitioned topic: a non-partitioned topic is treated as partition 0.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@pulsar.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [Vote] PIP 198: Standardize PR Naming Convention using GitHub Actions

2022-08-09 Thread Lari Hotari
+1, with some conditions about the details of PIP 198 that are listed below:

Would it be possible to improve the proposal so that the valid prefixes for 
type and component are kept in a file in the repository and the checker uses 
this file as the source of truth? Tison already pointed out in a Slack 
discussion that a GHA exists which uses a yaml file for this.

I also hope that the prefixes are as short as possible since there's a general 
recommendation to keep a commit title under 50 characters as I have explained 
in https://lists.apache.org/thread/67fqbo25oq75wrpsq5s4xw9rr55mlbms . I know 
it's not a hard limit, but it does harm readability of the commit log in many 
tools if prefixes use up a majority of the title length. 

So as long as the prefixes are short and easy, I'm fine with this proposal. 

I would have hoped that the proposal would have been more like the Angular 
commit message format: 
https://github.com/angular/angular/blob/main/CONTRIBUTING.md#-commit-message-format
I like the short prefixes and how the "component" is called "scope" and maps 
to an npm module. 

In our case, the scope (we are calling this "component") could map directly to 
the Maven artifactId by dropping the "pulsar-" prefix. That would prevent making 
up new names for the components that are different from existing names. 
For example:
artifactId: pulsar-broker, use "broker"
artifactId: pulsar-io-kafka, use "io-kafka"
There would be some exceptions in the apache/pulsar repository for the cpp 
client. That could be client-cpp (from the directory name "pulsar-client-cpp" 
by dropping the "pulsar-" prefix). 
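As a rough sketch of that mapping (the class and method names here are hypothetical and are not taken from PIP 198 or any existing tool), the scope could be derived like this:

```java
// Hypothetical helper: derives a PR title scope from a Maven artifactId
// or directory name by dropping the "pulsar-" prefix.
final class ScopeNames {
    private static final String PREFIX = "pulsar-";

    private ScopeNames() {}

    static String scopeFromArtifactId(String artifactId) {
        // Names without the prefix (or consisting only of it) are returned unchanged.
        if (artifactId.startsWith(PREFIX) && artifactId.length() > PREFIX.length()) {
            return artifactId.substring(PREFIX.length());
        }
        return artifactId;
    }
}
```

With this, `pulsar-broker` becomes `broker`, `pulsar-io-kafka` becomes `io-kafka`, and the directory name `pulsar-client-cpp` becomes `client-cpp`.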

Another concern that I had was about duplicating the information with labels. 
Tison explained to me that the automation could add the labels based on the 
title, so the user wouldn't have to add duplicate information if such a 
solution exists.

Summary:
I hope that PIP 198 could be revisited with the proposed way to map the 
component name directly from the Maven artifactId. Another request is to 
shorten the type (use "feat" instead of "feature", etc.) to save characters. 
I hope we could get rid of the brackets too and simply use a similar format as 
Angular does.
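
To make the request concrete, here is a minimal sketch of such a title check. It assumes a title shape of `type(scope): summary` with an optional scope, as in the Angular guideline. The class name `PrTitleCheck` is hypothetical, and the allowed types and scopes are passed in as sets that stand in for the prefix file kept in the repository. This is not the check that PIP 198 defines.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical checker for titles shaped like "type(scope): summary".
final class PrTitleCheck {
    // Group 1: type, group 2: optional scope, group 3: summary.
    private static final Pattern TITLE =
            Pattern.compile("^([a-z]+)(?:\\(([a-z0-9-]+)\\))?: (\\S.*)$");

    private PrTitleCheck() {}

    static boolean isValid(String title, Set<String> types, Set<String> scopes) {
        Matcher m = TITLE.matcher(title);
        if (!m.matches()) {
            return false;
        }
        String scope = m.group(2);
        // The type must be allowed; the scope is optional but must be allowed when present.
        return types.contains(m.group(1)) && (scope == null || scopes.contains(scope));
    }
}
```

For example, with types `feat` and `fix` and scopes `broker` and `io-kafka`, the title `feat(broker): add lookup cache` passes, while a bracket-style title such as `[feature][broker] add lookup cache` does not.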

-Lari

On 2022/08/04 08:12:21 Yu wrote:
> Hi team,
> 
> It has been 4 months since we discussed the [Guideline] Pulsar PR Naming
> Convention [1].
> 
> Nowadays, when reading the PR list [2], you’ll find more and more people
> follow and get used to this rule.
> 
> It improves collaboration efficiency, that is great!
> 
> This makes us think about moving the rule forward, that is, standardizing
> PR title naming using GitHub Actions, which is a more efficient way.
> 
> So we'd like to start a vote on PIP 198: Standardize PR Naming Convention
> using GitHub Actions [3].
> 
> 
> This proposal contains:
> 
> - Why do this?
> 
> - How do this?
> 
> - Pre-discussions and other thoughts
> 
> Feel free to comment, thank you!
> 
> [1] https://lists.apache.org/thread/sk9ops3t94jmzc5tndk08y9khf7pj6so
> 
> [2] https://github.com/apache/pulsar/pulls
> 
> [3]
> https://docs.google.com/document/d/1sJlUNAHnYAbvu9UtEgCrn_oVTnVc1M5nHC19x1bFab4/edit?pli=1#
> 
> 
> Yu, Max, mangoGoForward
> 


Re: [Vote] PIP 198: Standardize PR Naming Convention using GitHub Actions

2022-08-09 Thread tison
Hi Yu,

To be clear, the candidate names for types and components are a bit long,
which may waste space that could carry meaningful information.

For example, Angular names feature as feat to save letters. GitHub only
shows the first 50 characters for PR title.

I'd like to confirm that the names of types and components are not being
voted on in this thread and will be settled later, when we review the patch.

Currently, I'd suggest:

feature -> feat
workflow -> ci
improve/cleanup -> chore
dependency -> dep
function -> fn
security -> sec
website -> site
...
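
A small sketch of how the suggestions above could be kept as a lookup table. The class name `TypeAbbreviations` is hypothetical, and reading "improve/cleanup" as two separate long names that both map to "chore" is an assumption. Names outside the list are returned unchanged.

```java
import java.util.Map;

// Hypothetical lookup table for the abbreviations suggested in this message.
final class TypeAbbreviations {
    private static final Map<String, String> SHORT = Map.of(
            "feature", "feat",
            "workflow", "ci",
            "improve", "chore",   // assumption: "improve/cleanup" covers both names
            "cleanup", "chore",
            "dependency", "dep",
            "function", "fn",
            "security", "sec",
            "website", "site");

    private TypeAbbreviations() {}

    // Returns the short form, or the input itself when no abbreviation is listed.
    static String shorten(String name) {
        return SHORT.getOrDefault(name, name);
    }
}
```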

Best,
tison.


On Tue, Aug 9, 2022 at 3:07 PM Qiang Huang wrote:

> +1 (non-binding) I like the idea of checking the PR title as a job.
> Good job, Yu.
>
> On Mon, Aug 8, 2022 at 10:07 PM tison wrote:
>
> > +1 (non-binding) to the proposal itself.
> >
> > Although, we should later move the standard to our website where the
> > whole project can easily contribute to and follow the general
> > contribution process - that is, send a pull request, review, and merge.
> > I regard current gdoc content as a temporary container for this content.
> >
> > If this proposal gets accepted, @Yu you can create an issue for the dev
> > doc part and ping me. I can offer my help to write so.
> >
> > Best,
> > tison.
> >
> >
> > On Mon, Aug 8, 2022 at 1:35 PM Zike Yang wrote:
> >
> > > +1 (non-binding)
> > >
> > > Thanks
> > > Zike Yang
> > >
> > > On Mon, Aug 8, 2022 at 1:06 PM Xiangying Meng wrote:
> > > >
> > > > +1(non-binding)
> > > >
> > > > yours sincerely,
> > > > xiangying Meng
> > > >
> > > > On Thu, Aug 4, 2022 at 4:13 PM Yu  wrote:
> > > >
> > > > > Hi team,
> > > > >
> > > > > It has been 4 months since we discussed the [Guideline] Pulsar PR
> > > > > Naming Convention [1].
> > > > >
> > > > > Nowadays, when reading the PR list [2], you’ll find more and more
> > > > > people follow and get used to this rule.
> > > > >
> > > > > It improves collaboration efficiency, that is great!
> > > > >
> > > > > This makes us think about moving the rule forward, that is,
> > > > > standardizing PR title naming using GitHub Actions, which is a more
> > > > > efficient way.
> > > > >
> > > > > So we'd like to start a vote on PIP 198: Standardize PR Naming
> > > > > Convention using GitHub Actions [3].
> > > > >
> > > > >
> > > > > This proposal contains:
> > > > >
> > > > > - Why do this?
> > > > >
> > > > > - How do this?
> > > > >
> > > > > - Pre-discussions and other thoughts
> > > > >
> > > > > Feel free to comment, thank you!
> > > > >
> > > > > [1] https://lists.apache.org/thread/sk9ops3t94jmzc5tndk08y9khf7pj6so
> > > > >
> > > > > [2] https://github.com/apache/pulsar/pulls
> > > > >
> > > > > [3]
> > > > > https://docs.google.com/document/d/1sJlUNAHnYAbvu9UtEgCrn_oVTnVc1M5nHC19x1bFab4/edit?pli=1#
> > > > >
> > > > >
> > > > > Yu, Max, mangoGoForward
> > > > >
> > >
> >
>
>
> --
> BR,
> Qiang Huang
>


Re: [Vote] PIP-192 New Pulsar Broker Load Balancer

2022-08-09 Thread Zixuan Liu
+1 (non-binding)

Thanks,
Zixuan

On Tue, Aug 9, 2022 at 3:12 PM Qiang Huang wrote:

> +1 (non-binding)
>
> On Fri, Aug 5, 2022 at 10:18 AM Kai Wang wrote:
>
> > +1 (non-binding)
> >
> > Thanks,
> > Kai
> >
> > On Tue, Aug 2, 2022 at 8:50 AM Heesung Sohn wrote:
> >
> > > Dear Pulsar Community,
> > >
> > > Please review and vote on this PIP.
> > >
> > > PIP link: https://github.com/apache/pulsar/issues/16691
> > >
> > > Thank you,
> > > -Heesung
> > >
> >
>
>
> --
> BR,
> Qiang Huang
>


RE: [DISCUSS] ARM Support for Pulsar 2.11 Docker Image

2022-08-09 Thread Alexander Preuss
Hi Michael,

Thank you for bringing up this topic.
I was just running into an issue that prevented me from using the standard
Pulsar image in Testcontainers and found this discussion.

In my opinion, refactoring the docker builds to allow us to use the ASF
infra is a great idea.
I'm also looping in Kay, as she might be able to provide more insights.

Best,
Alex

On 2022/07/09 07:18:31 Michael Marshall wrote:
> Hi Pulsar Community,
>
> I would like to see the 2.11 docker image ship with support to run on
> ARM architecture. The issue asking for this feature [0] has had a lot
> of traction.
>
> The Bookkeeper 4.15 upgrade was the last blocker, and since we
> upgraded to BK 4.15 in May, we should be able to upgrade the docker
> build to make it a multi-arch build.
>
> kezhenxu94 opened a PR [1] to upgrade our build process to include a
> multi-arch docker image build, but he is unable to finish the PR and
> has asked for someone else to pick up the work.
>
> Before we continue the work, does anyone have strong opinions on how
> we should update our docker image build? Dave indicated on a separate
> thread that we should revisit where the docker images are hosted, and
> Enrico indicated on the PR [2] that we might want to consider
> automating our docker image build so that the ASF Infra Docker hub bot
> builds our images. Once we have consensus on these topics, it should
> be straightforward to update the docker build process for the
> multi-arch build.
>
> In my opinion, we need to support a manual build option to be used by
> the integration tests (and probably by some users building modified
> versions of Pulsar). I also think it could be very convenient to have
> our image built by the ASF bot and hosted in the apache docker hub
> repo.
>
> Let me know what you think.
>
> Thanks,
> Michael
>
> [0] https://github.com/apache/pulsar/issues/12944
> [1] https://github.com/apache/pulsar/pull/14005
> [2] https://github.com/apache/pulsar/pull/14005#pullrequestreview-913331330
>


Re: [Vote] PIP-192 New Pulsar Broker Load Balancer

2022-08-09 Thread Qiang Huang
+1 (non-binding)

On Fri, Aug 5, 2022 at 10:18 AM Kai Wang wrote:

> +1 (non-binding)
>
> Thanks,
> Kai
>
> On Tue, Aug 2, 2022 at 8:50 AM Heesung Sohn wrote:
>
> > Dear Pulsar Community,
> >
> > Please review and vote on this PIP.
> >
> > PIP link: https://github.com/apache/pulsar/issues/16691
> >
> > Thank you,
> > -Heesung
> >
>


-- 
BR,
Qiang Huang


[GitHub] [pulsar-site] urfreespace merged pull request #159: feat: add lookup rest api page

2022-08-09 Thread GitBox


urfreespace merged PR #159:
URL: https://github.com/apache/pulsar-site/pull/159





Re: [Vote] PIP 198: Standardize PR Naming Convention using GitHub Actions

2022-08-09 Thread Qiang Huang
+1 (non-binding) I like the idea of checking the PR title as a job.
Good job, Yu.

On Mon, Aug 8, 2022 at 10:07 PM tison wrote:

> +1 (non-binding) to the proposal itself.
>
> Although, we should later move the standard to our website where the whole
> project can easily contribute to and follow the general contribution
> process - that is, send a pull request, review, and merge. I regard current
> gdoc content as a temporary container for this content.
>
> If this proposal gets accepted, @Yu you can create an issue for the dev doc
> part and ping me. I can offer my help to write so.
>
> Best,
> tison.
>
>
> On Mon, Aug 8, 2022 at 1:35 PM Zike Yang wrote:
>
> > +1 (non-binding)
> >
> > Thanks
> > Zike Yang
> >
> > On Mon, Aug 8, 2022 at 1:06 PM Xiangying Meng wrote:
> > >
> > > +1(non-binding)
> > >
> > > yours sincerely,
> > > xiangying Meng
> > >
> > > On Thu, Aug 4, 2022 at 4:13 PM Yu  wrote:
> > >
> > > > Hi team,
> > > >
> > > > It has been 4 months since we discussed the [Guideline] Pulsar PR
> > > > Naming Convention [1].
> > > >
> > > > Nowadays, when reading the PR list [2], you’ll find more and more
> > > > people follow and get used to this rule.
> > > >
> > > > It improves collaboration efficiency, that is great!
> > > >
> > > > This makes us think about moving the rule forward, that is,
> > > > standardizing PR title naming using GitHub Actions, which is a more
> > > > efficient way.
> > > >
> > > > So we'd like to start a vote on PIP 198: Standardize PR Naming
> > > > Convention using GitHub Actions [3].
> > > >
> > > >
> > > > This proposal contains:
> > > >
> > > > - Why do this?
> > > >
> > > > - How do this?
> > > >
> > > > - Pre-discussions and other thoughts
> > > >
> > > > Feel free to comment, thank you!
> > > >
> > > > [1] https://lists.apache.org/thread/sk9ops3t94jmzc5tndk08y9khf7pj6so
> > > >
> > > > [2] https://github.com/apache/pulsar/pulls
> > > >
> > > > [3]
> > > > https://docs.google.com/document/d/1sJlUNAHnYAbvu9UtEgCrn_oVTnVc1M5nHC19x1bFab4/edit?pli=1#
> > > >
> > > >
> > > > Yu, Max, mangoGoForward
> > > >
> >
>


-- 
BR,
Qiang Huang