[jira] [Created] (KAFKA-13286) Revisit Streams State Store and Serde Implementation

2021-09-09 Thread Guozhang Wang (Jira)
Guozhang Wang created KAFKA-13286:
-

 Summary: Revisit Streams State Store and Serde Implementation
 Key: KAFKA-13286
 URL: https://issues.apache.org/jira/browse/KAFKA-13286
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: Guozhang Wang


The Kafka Streams state store is built in hierarchical layers as metered -> cached 
-> logged -> [convert] -> raw stores (RocksDB, in-memory), and it leverages the 
built-in Serde libraries for serialization / deserialization. There are several 
inefficiencies in the current design:

* The API only supports serdes via byte arrays. This means we generate a lot 
of garbage and spend unnecessary time copying bytes, especially when working 
with windowed state stores that rely on composite keys. In many places in the 
code we extract parts of the composite key to deserialize either the 
timestamp or the message key from the state store key (e.g. the methods in 
WindowStoreUtils).
* The serde operations can happen at multiple layers of the state store 
hierarchy, which means we incur extra byte array copies as we move along 
doing serdes. For example, we do serdes in the metered layer, but then again in 
the cached layer with the cache functions, and also in the logged stores to 
generate the key/value bytes to send to Kafka.

To improve on this, we can consider adding support for serdes into/from 
ByteBuffers, which would allow us to reuse the underlying byte arrays and just 
pass around slices of the underlying buffers to avoid the unnecessary copying. 

1) More specifically, e.g. the serialize interface could be refactored to:

{code}
ByteBuffer serialize(String topic, T data, ByteBuffer buffer);
{code}

Where the serialized bytes would be appended to the ByteBuffer. When a series 
of serialize functions is called along the state store hierarchy, we then just 
need to make sure that whatever should be appended first to the ByteBuffer is 
serialized first. E.g. if the serialized bytes format of a 
WindowSchema is 

Then we would need to call the serializers as in:

{code}
serialize(key, serialize(leftRightBoolean, serialize(timestamp, buffer))); 
{code}
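As a concrete illustration of that append ordering, here is a minimal stand-alone sketch; the layout (timestamp, then left/right flag, then key) and all names are illustrative, not the actual WindowSchema format:

```java
import java.nio.ByteBuffer;

// Sketch of append-style serialization into one shared buffer: each "serializer"
// appends its bytes, so whatever must come first in the layout is appended first.
class WindowKeyAppender {
    static ByteBuffer append(ByteBuffer buffer, long timestamp, boolean leftRight, byte[] key) {
        buffer.putLong(timestamp);              // appended first: serialized first
        buffer.put((byte) (leftRight ? 1 : 0)); // then the left/right boolean
        buffer.put(key);                        // finally the raw message key bytes
        return buffer;
    }
}
```

Readers of the composite key can then work with slices of the same buffer instead of copying sub-arrays.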

2) In addition, we can consider having a pool of ByteBuffers representing a set 
of byte arrays that can be re-used. This can be captured as an intelligent 
{{ByteBufferSupplier}}, which provides:

{code}
ByteBuffer ByteBufferSupplier#allocate(long size)
{code}

Its implementation can choose to either create new byte arrays, or re-use 
existing ones in the pool; the gotcha, though, is that we usually may not know 
the serialized byte length for raw keys (think: in practice the keys would be 
in JSON/Avro etc.), and hence would not know what {{size}} to pass in for 
serialization; we may need to be conservative, or use trial and error, etc.

Of course callers then would be responsible for returning the used ByteBuffer 
back to the Supplier via

{code}
ByteBufferSupplier#deallocate(ByteBuffer buffer)
{code}
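A minimal sketch of such a pooled supplier (the class and method names are illustrative, not a proposed Kafka API, and the eviction/sizing policy is deliberately naive):

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative pooled supplier: allocate() reuses a pooled buffer when one is
// large enough, otherwise falls back to creating a fresh backing array;
// deallocate() returns a buffer to the pool for later reuse.
class PooledByteBufferSupplier {
    private final Deque<ByteBuffer> pool = new ArrayDeque<>();

    synchronized ByteBuffer allocate(int size) {
        ByteBuffer buffer = pool.pollFirst();
        if (buffer == null) {
            return ByteBuffer.allocate(size);  // pool empty: new backing array
        }
        if (buffer.capacity() < size) {
            pool.offerLast(buffer);            // too small: keep it pooled, make a new one
            return ByteBuffer.allocate(size);
        }
        buffer.clear();                        // reuse: reset position and limit
        return buffer;
    }

    synchronized void deallocate(ByteBuffer buffer) {
        pool.offerFirst(buffer);
    }
}
```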

3) With RocksDB's direct byte buffers (KAFKA-9168) we can optionally also 
allocate them from RocksDB directly so that using them for puts/gets does not 
go through the JNI, and hence is more efficient. The Supplier would then need 
to be careful to deallocate these direct byte buffers, since they would not be 
GC'ed by the JVM.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] KIP 771: KRaft brokers should not expose controller metrics

2021-09-07 Thread Guozhang Wang
Thanks Ryan,

Read the KIP and it makes sense. +1 as well.

On Tue, Sep 7, 2021 at 1:42 PM Colin McCabe  wrote:

> +1 (binding)
>
> thanks, Ryan
>
> best,
> Colin
>
> On Tue, Sep 7, 2021, at 09:47, Colin McCabe wrote:
> > Hi Ryan,
> >
> > Thanks for working on this. I think it is almost ready to go. However,
> > I left a comment about the wording of the KIP in the DISCUSS thread.
> >
> > best,
> > Colin
> >
> >
> > On Thu, Sep 2, 2021, at 13:20, Ryan Dielhenn wrote:
> > > Hello kafka devs,
> > >
> > > I would like to start a vote on KIP-771. This KIP proposes to not
> expose
> > > controller metrics on KRaft brokers since KRaft brokers are not
> controller
> > > eligible and will never have a non-zero value for the metric. Since
> > > exposing metrics that will always be zero is both unneeded and causes
> > > non-negligible performance impact it would be best to not move forward
> with
> > > KAFKA-13140: https://github.com/apache/kafka/pull/11133 and instead
> accept
> > > this KIP.
> > >
> > >
> > >
> > > Here is a link to the KIP which documents the behavior change from how
> > > controller metrics are exposed in a Kafka cluster using Zookeeper to
> how
> > > they are exposed in a Kafka cluster using KRaft.
> > > :
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP+771%3A+KRaft+brokers+should+not+expose+controller+metrics
> > >
> > > Here is a link to the discussion:
> > >
> https://lists.apache.org/thread.html/r74432034527fab13cc973ad5187ef5881a642500d77b0d275dd7f018%40%3Cdev.kafka.apache.org%3E
> > >
> > > Regards,
> > > Ryan Dielhenn
> > >
> >
>


-- 
-- Guozhang


Re: [VOTE] KIP-770: Replace "buffered.records.per.partition" with "input.buffer.max.bytes"

2021-09-07 Thread Guozhang Wang
Thanks Sagar, +1 from me.


Guozhang

On Sat, Sep 4, 2021 at 10:29 AM Sagar  wrote:

> Hi All,
>
> I would like to start a vote on the following KIP:
>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=186878390
>
> Thanks!
> Sagar.
>


-- 
-- Guozhang


Re: [VOTE] KIP-773 Differentiate consistently metric latency measured in millis and nanos

2021-09-03 Thread Guozhang Wang
Thanks Josep,

Took a look at the KIP, LGTM.

On Fri, Sep 3, 2021 at 11:25 AM Josep Prat 
wrote:

> Hi there,
>
> Since it's a rather small KIP, I'd like to start a vote for the KIP-773
> Differentiate consistently metric latency measured in millis and nanos.
> The KIP can be found at:
> https://cwiki.apache.org/confluence/x/ZwNACw
>
> Thanks in advance,
>
> --
>
> Josep Prat
>
> *Aiven Deutschland GmbH*
>
> Immanuelkirchstraße 26, 10405 Berlin
>
> Amtsgericht Charlottenburg, HRB 209739 B
>
> Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
>
> *m:* +491715557497
>
> *w:* aiven.io
>
> *e:* josep.p...@aiven.io
>


-- 
-- Guozhang


[jira] [Created] (KAFKA-13268) Add more integration tests for Table Table FK joins with repartitioning

2021-09-02 Thread Guozhang Wang (Jira)
Guozhang Wang created KAFKA-13268:
-

 Summary: Add more integration tests for Table Table FK joins with 
repartitioning
 Key: KAFKA-13268
 URL: https://issues.apache.org/jira/browse/KAFKA-13268
 Project: Kafka
  Issue Type: Improvement
  Components: streams, unit tests
Reporter: Guozhang Wang


We should extend the FK join multi-partition integration test with a 
Repartitioned for:
1) just the new partition count
2) a custom partitioner

This is to test whether there's a bug where the internal topics don't pick up a 
partitioner provided that way.
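For reference, the two Repartitioned variants could look roughly like this in the Streams DSL. This is a sketch only: the serde choices, partition count, and partitioning logic are assumptions, and the fragment is not runnable without kafka-streams on the classpath.

```java
// 1) just the new partition count (the count of 4 is illustrative)
Repartitioned<String, String> withPartitionCount =
    Repartitioned.<String, String>with(Serdes.String(), Serdes.String())
        .withNumberOfPartitions(4);

// 2) a custom partitioner (the hash-based logic here is illustrative)
Repartitioned<String, String> withCustomPartitioner =
    Repartitioned.<String, String>with(Serdes.String(), Serdes.String())
        .withStreamPartitioner((topic, key, value, numPartitions) ->
            Math.abs(key.hashCode()) % numPartitions);
```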





Re: [DISCUSS] KIP-770: Replace "buffered.records.per.partition" with "input.buffer.max.bytes"

2021-09-01 Thread Guozhang Wang
The state store cache size default value is 10MB today, which arguably is 
rather small. So I'm thinking maybe we default this config to 512MB.

Other than that, LGTM.

On Sat, Aug 28, 2021 at 11:34 AM Sagar  wrote:

> Thanks Guozhang and Sophie.
>
> Yeah a small default value would lower the throughput. I didn't quite
> realise it earlier. It's slightly hard to predict this value so I would
> guess around 1/2 GB to 1 GB? WDYT?
>
> Regarding the renaming of the config and the new metric, sure would include
> it in the KIP.
>
> Lastly, importance would also be added. I guess Medium should be ok.
>
> Thanks!
> Sagar.
>
>
> On Sat, Aug 28, 2021 at 10:42 AM Sophie Blee-Goldman
>  wrote:
>
> > 1) I agree that we should just distribute the bytes evenly, at least for
> > now. It's simpler to understand and
> > we can always change it later, plus it makes sense to keep this aligned
> > with how the cache works today
> >
> > 2) +1 to being conservative in the generous sense, it's just not
> something
> > we can predict with any degree
> > of accuracy and even if we could, the appropriate value is going to
> differ
> > wildly across applications and use
> > cases. We might want to just pick some multiple of the default cache
> size,
> > and maybe do some research on
> > other relevant defaults or sizes (default JVM heap, size of available
> > memory in common hosts eg EC2
> > instances, etc). We don't need to worry as much about erring on the side
> of
> > too big, since other configs like
> > the max.poll.records will help somewhat to keep it from exploding.
> >
> > 4) 100%, I always found the *cache.max.bytes.buffering* config name to be
> > incredibly confusing. Deprecating this in
> > favor of "*statestore.cache.max.bytes*" and aligning it to the new input
> > buffer config sounds good to me to include here.
> >
> > 5) The KIP should list all relevant public-facing changes, including
> > metadata like the config's "Importance". Personally
> > I would recommend Medium, or even High if we're really worried about the
> > default being wrong for a lot of users
> >
> > Thanks for the KIP, besides those few things that Guozhang brought up and
> > the config importance, everything SGTM
> >
> > -Sophie
> >
> > On Thu, Aug 26, 2021 at 2:41 PM Guozhang Wang 
> wrote:
> >
> > > 1) I meant for your proposed solution. I.e. to distribute the
> configured
> > > bytes among threads evenly.
> > >
> > > 2) I was actually thinking about making the default a large enough
> value
> > so
> > > that we would not introduce performance regression: thinking about a
> use
> > > case with many partitions and each record may be large, then
> effectively
> > we
> > > would only start pausing when the total bytes buffered is pretty large.
> > If
> > > we set the default value to small, we would be "more aggressive" on
> > pausing
> > > which may impact throughput.
> > >
> > > 3) Yes exactly, this would naturally be at the "partition-group" class
> > > since that represents the task's all input partitions.
> > >
> > > 4) This is just a bold thought, I'm interested to see other's thoughts.
> > >
> > >
> > > Guozhang
> > >
> > > On Mon, Aug 23, 2021 at 4:10 AM Sagar 
> wrote:
> > >
> > > > Thanks Guozhang.
> > > >
> > > > 1) Just for my confirmation, when you say we should proceed with the
> > even
> > > > distribution of bytes, are you referring to the Proposed Solution in
> > the
> > > > KIP or the option you had considered in the JIRA?
> > > > 2) Default value for the config is something that I missed. I agree
> we
> > > > can't have really large values as it might be detrimental to the
> > > > performance. Maybe, as a starting point, we assume that only 1 Stream
> > > Task
> > > > is running so what could be the ideal value in such a scenario?
> > Somewhere
> > > > around 10MB similar to the caching config?
> > > > 3) When you say,  *a task level metric indicating the current totally
> > > > aggregated metrics, * you mean the bytes aggregated at a task level?
> > > > 4) I am ok with the name change, but would like to know others'
> > thoughts.
> > > >
> > > > Thanks!
> > > > Sagar.
> > > >
> > > > On Sun, Aug 22, 2021 at 11:54 PM Guozhang Wang 
> > >

[jira] [Resolved] (KAFKA-13257) KafkaStreams Support For Latest RocksDB Version

2021-09-01 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-13257.
---
Resolution: Not A Problem

> KafkaStreams Support For Latest RocksDB Version
> ---
>
> Key: KAFKA-13257
> URL: https://issues.apache.org/jira/browse/KAFKA-13257
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.8.0
>Reporter: Alagukannan
>Priority: Major
> Attachments: hs_err_pid6.log
>
>
> Hi,
>  Can you please let us know if there is any plan for adding the latest 
> versions of RocksDB to Kafka Streams. If you're planning it, what's the 
> timeline we are looking at? If not planning to upgrade, what's the reason 
> behind it? Is there any significant impact from upgrading, like backward 
> compatibility etc.? This is a general query to learn about the RocksDB 
> upgrade and its impact on Streams applications.
> The main pain point behind asking for this upgrade is: we tried to build an 
> application with Kafka Streams 2.8.0 on an Alpine-based OS, where the Docker 
> base image is azul/zulu-openjdk-alpine:11.0.12-11.50.19-jre-headless. The 
> streams application worked fine until it interacted with the state 
> store (RocksDB). The JVM crashed with the following error:
>  #
>  # A fatal error has been detected by the Java Runtime Environment:
>  #
>  # SIGSEGV (0xb) at pc=0x7f9551951b27, pid=6, tid=207
>  #
>  # JRE version: OpenJDK Runtime Environment Zulu11.45+27-CA (11.0.10+9) 
> (build 11.0.10+9-LTS)
>  # Java VM: OpenJDK 64-Bit Server VM Zulu11.45+27-CA (11.0.10+9-LTS, mixed 
> mode, tiered, compressed oops, g1 gc, linux-amd64)
>  # Problematic frame:
>  # C [librocksdbjni15322693993163550519.so+0x271b27] 
> std::_Rb_tree, 
> std::less, std::allocator 
> >::_M_erase(std::_Rb_tree_node*)+0x27
> Then we found out RocksDB works well with glibc but not musl libc, whereas 
> Alpine supports only musl libc for native dependencies. Looking further into 
> RocksDB for a solution, we found that they have started supporting both glibc 
> and musl native libs from the 6.5.x versions.
>  But the latest Kafka Streams (2.8.0) ships RocksDB 5.18.x. This is the main 
> reason behind asking for the RocksDB upgrade in Kafka Streams.
> Have attached the PID log where JVM failures are happening.





[jira] [Resolved] (KAFKA-13175) The topic is marked for deletion, create topic with the same name throw exception topic already exists.

2021-09-01 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-13175.
---
Fix Version/s: 3.1.0
   Resolution: Fixed

> The topic is marked for deletion, create topic with the same name throw 
> exception topic already exists.
> ---
>
> Key: KAFKA-13175
> URL: https://issues.apache.org/jira/browse/KAFKA-13175
> Project: Kafka
>  Issue Type: Bug
>Reporter: yangshengwei
>Priority: Major
> Fix For: 3.1.0
>
> Attachments: kafka (2).jpg, zookeeper.jpg
>
>
> After a topic is deleted, the topic is marked for deletion, and creating a 
> topic with the same name throws an exception that the topic already exists. 
> It should instead throw an exception that the topic is marked for deletion. 
> Then I can choose to wait for the topic to be completely deleted, and if the 
> topic is still not deleted after a long time, check the reason why it is not 
> deleted.
>  
>  





Re: [VOTE] KIP-761: Add Total Blocked Time Metric to Streams

2021-08-31 Thread Guozhang Wang
Thanks for letting us know, Rohan.

On Tue, Aug 31, 2021 at 1:08 AM Rohan Desai  wrote:

> FYI I've updated the metric names in the KIP to the form ".*-time-ns-total"
> and clarified that the times being measured are in nanoseconds.
>
> On Wed, Jul 21, 2021 at 5:09 PM Rohan Desai 
> wrote:
>
> > Now that the discussion thread's been open for a few days, I'm calling
> for
> > a vote on
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-761%3A+Add+Total+Blocked+Time+Metric+to+Streams
> >
>


-- 
-- Guozhang


[jira] [Created] (KAFKA-13243) Differentiate metric latency measured in millis and nanos

2021-08-27 Thread Guozhang Wang (Jira)
Guozhang Wang created KAFKA-13243:
-

 Summary: Differentiate metric latency measured in millis and nanos
 Key: KAFKA-13243
 URL: https://issues.apache.org/jira/browse/KAFKA-13243
 Project: Kafka
  Issue Type: Improvement
  Components: metrics
Reporter: Guozhang Wang


Today most of the client latency metrics are measured in millis, and some in 
nanos. For those measured in nanos we usually differentiate them by adding a 
`-ns` suffix to the metric names, e.g. `io-wait-time-ns-avg` and 
`io-time-ns-avg`. But there are a few where we obviously forgot to follow this 
pattern, e.g. `io-wait-time-total`: it is inconsistent in that `avg` has the 
`-ns` suffix while `total` does not. I did a quick search and found just two of 
them:

* bufferpool-wait-time-total : bufferpool-wait-time-ns-total
* io-wait-time-total: io-wait-time-ns-total

We should rename them accordingly with the `-ns` suffix as well.





[jira] [Created] (KAFKA-13239) Use RocksDB.ingestExternalFile for restoration

2021-08-27 Thread Guozhang Wang (Jira)
Guozhang Wang created KAFKA-13239:
-

 Summary: Use RocksDB.ingestExternalFile for restoration
 Key: KAFKA-13239
 URL: https://issues.apache.org/jira/browse/KAFKA-13239
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: Guozhang Wang


Now that we are on a newer version of RocksDB, we can consider using the new

{code}
ingestExternalFile(final ColumnFamilyHandle columnFamilyHandle,
  final List<String> filePathList,
  final IngestExternalFileOptions ingestExternalFileOptions)
{code}

for restoring changelog into state stores. More specifically:

1) Use a larger default batch size in the restore consumer's polling behavior 
so that each poll returns as many records as possible.
2) For a single batch of records returned from a restore consumer poll call, 
first write them as a single SST file using the {{SstFileWriter}}. The existing 
{{DBOptions}} could be used to construct the {{EnvOptions}} and {{Options}} for 
the writer. Do not ingest the written file into the DB within each iteration 
yet.
3) At the end of the restoration, call {{RocksDB.ingestExternalFile}} with all 
the written files' paths as the parameter. The {{IngestExternalFileOptions}} 
would be specifically configured to allow key range overlap with the mem-table.
4) A specific note is that after the call in 3), heavy compaction may be 
executed by RocksDB in the background, and until it cools down, normal 
processing that tries to {{put}} new records into the store may see high 
stalls. To work around this we could use {{RocksDB.compactRange()}}, which 
blocks until the compaction is completed.
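A rough sketch of steps 2) through 4) against the RocksJava API. This assumes an already-open {{db}} and its {{Options}}; the file path, the single-batch loop, and the error handling are illustrative, the fragment needs rocksdbjni on the classpath, and note that {{SstFileWriter}} requires keys to be added in sorted order.

```java
// Step 2) sketch: write one poll batch into a single SST file
// (keys must be added to the SstFileWriter in sorted order).
try (EnvOptions envOptions = new EnvOptions();
     SstFileWriter writer = new SstFileWriter(envOptions, options)) {
    writer.open("/tmp/restore-batch-0.sst");   // illustrative path
    for (ConsumerRecord<byte[], byte[]> record : batch) {
        writer.put(record.key(), record.value());
    }
    writer.finish();
}

// Step 3) sketch: ingest all written files at the end of restoration,
// tolerating key-range overlap with the mem-table.
try (IngestExternalFileOptions ingestOptions =
         new IngestExternalFileOptions().setAllowBlockingFlush(true)) {
    db.ingestExternalFile(
        Collections.singletonList("/tmp/restore-batch-0.sst"), ingestOptions);
}

// Step 4) sketch: block until the resulting compaction settles
// before resuming normal processing.
db.compactRange();
```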






Re: [DISCUSS] KIP-770: Replace "buffered.records.per.partition" with "input.buffer.max.bytes"

2021-08-26 Thread Guozhang Wang
1) I meant for your proposed solution. I.e. to distribute the configured
bytes among threads evenly.

2) I was actually thinking about making the default a large enough value so
that we would not introduce performance regression: thinking about a use
case with many partitions and each record may be large, then effectively we
would only start pausing when the total bytes buffered is pretty large. If
we set the default value to small, we would be "more aggressive" on pausing
which may impact throughput.

3) Yes exactly, this would naturally be at the "partition-group" class
since that represents the task's all input partitions.

4) This is just a bold thought, I'm interested to see other's thoughts.


Guozhang

On Mon, Aug 23, 2021 at 4:10 AM Sagar  wrote:

> Thanks Guozhang.
>
> 1) Just for my confirmation, when you say we should proceed with the even
> distribution of bytes, are you referring to the Proposed Solution in the
> KIP or the option you had considered in the JIRA?
> 2) Default value for the config is something that I missed. I agree we
> can't have really large values as it might be detrimental to the
> performance. Maybe, as a starting point, we assume that only 1 Stream Task
> is running so what could be the ideal value in such a scenario? Somewhere
> around 10MB similar to the caching config?
> 3) When you say,  *a task level metric indicating the current totally
> aggregated metrics, * you mean the bytes aggregated at a task level?
> 4) I am ok with the name change, but would like to know others' thoughts.
>
> Thanks!
> Sagar.
>
> On Sun, Aug 22, 2021 at 11:54 PM Guozhang Wang  wrote:
>
> > Thanks Sagar for writing this PR.
> >
> > I think twice about the options that have been proposed in
> > https://issues.apache.org/jira/browse/KAFKA-13152, and feel that at the
> > moment it's simpler to just do the even distribution of the configured
> > total bytes. My rationale is that right now we have a static tasks ->
> > threads mapping, and hence each partition would only be fetched by a
> single
> > thread / consumer at a given time. If in the future we break that static
> > mapping into dynamic mapping, then we would not be able to do this even
> > distribution. Instead we would have other threads polling from consumer
> > only, and those threads would be responsible for checking the config and
> > pause non-empty partitions if it goes beyond the threshold. But since at
> > that time we would not change the config but just how it would be
> > implemented behind the scenes we would not need another KIP to change it.
> >
> > Some more comments:
> >
> > 1. We need to discuss a bit about the default value of this new config.
> > Personally I think we need to be a bit conservative with large values so
> > that it would not have any perf regression compared with old configs
> > especially with large topology and large number of partitions.
> > 2. I looked at the existing metrics, and do not have corresponding
> sensors.
> > How about also adding a task level metric indicating the current totally
> > aggregated metrics. The reason I do not suggest this metric on the
> > per-thread level is that in the future we may break the static mapping of
> > tasks -> threads.
> >
> > [optional] As an orthogonal thought, I'm thinking maybe we can rename the
> > other "*cache.max.bytes.buffering*" as "statestore.cache.max.bytes" (via
> > deprecation of course), piggy-backed in this KIP? Would like to hear
> > others' thoughts.
> >
> >
> > Guozhang
> >
> >
> >
> > On Sun, Aug 22, 2021 at 9:29 AM Sagar  wrote:
> >
> > > Hi All,
> > >
> > > I would like to start a discussion on the following KIP:
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=186878390
> > >
> > > Thanks!
> > > Sagar.
> > >
> >
> >
> > --
> > -- Guozhang
> >
>


-- 
-- Guozhang


Re: [DISCUSS] KIP-770: Replace "buffered.records.per.partition" with "input.buffer.max.bytes"

2021-08-22 Thread Guozhang Wang
Thanks Sagar for writing this PR.

I think twice about the options that have been proposed in
https://issues.apache.org/jira/browse/KAFKA-13152, and feel that at the
moment it's simpler to just do the even distribution of the configured
total bytes. My rationale is that right now we have a static tasks ->
threads mapping, and hence each partition would only be fetched by a single
thread / consumer at a given time. If in the future we break that static
mapping into dynamic mapping, then we would not be able to do this even
distribution. Instead we would have other threads polling from consumer
only, and those threads would be responsible for checking the config and
pause non-empty partitions if it goes beyond the threshold. But since at
that time we would not change the config but just how it would be
implemented behind the scenes we would not need another KIP to change it.

Some more comments:

1. We need to discuss a bit about the default value of this new config.
Personally I think we need to be a bit conservative with large values so
that it would not have any perf regression compared with old configs
especially with large topology and large number of partitions.
2. I looked at the existing metrics, and do not have corresponding sensors.
How about also adding a task level metric indicating the current totally
aggregated metrics. The reason I do not suggest this metric on the
per-thread level is that in the future we may break the static mapping of
tasks -> threads.

[optional] As an orthogonal thought, I'm thinking maybe we can rename the
other "*cache.max.bytes.buffering*" as "statestore.cache.max.bytes" (via
deprecation of course), piggy-backed in this KIP? Would like to hear
others' thoughts.


Guozhang



On Sun, Aug 22, 2021 at 9:29 AM Sagar  wrote:

> Hi All,
>
> I would like to start a discussion on the following KIP:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=186878390
>
> Thanks!
> Sagar.
>


-- 
-- Guozhang


Re: Making topic validation public

2021-08-17 Thread Guozhang Wang
Hi Boyang,

I feel okay to add a doc section in Kafka about which topic names are
valid; and maybe add such validation into Utils as well.
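For illustration, the rules in Topic.java boil down to something like the following stand-alone sketch (this is not the actual Kafka utility; the class and constant names are illustrative):

```java
import java.util.regex.Pattern;

// Sketch of Kafka's topic-name rules: non-empty, not "." or "..", at most
// 249 characters, and only ASCII alphanumerics plus '.', '_' and '-'.
class TopicNameValidator {
    private static final int MAX_NAME_LENGTH = 249;
    private static final Pattern LEGAL_CHARS = Pattern.compile("[a-zA-Z0-9._-]+");

    static boolean isValid(String name) {
        return !name.isEmpty()
            && !name.equals(".")
            && !name.equals("..")
            && name.length() <= MAX_NAME_LENGTH
            && LEGAL_CHARS.matcher(name).matches();
    }
}
```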

Guozhang

On Sun, Aug 15, 2021 at 12:57 PM Boyang Chen 
wrote:

> Hey there,
>
> Kafka has an internal topic validation logic
> <
> https://github.com/apache/kafka/blob/64b8e17827251174490678dd296ce2c1a79ff5ef/clients/src/main/java/org/apache/kafka/common/internals/Topic.java#L43
> >
> which
> is very useful and effective. Do you think it's possible to make it part of
> the public agreement, so that 3rd party integration with Kafka could
> validate the topic upfront instead of waiting until the application has
> been set up end-to-end?
>
> Boyang
>


-- 
-- Guozhang


Re: [VOTE] KIP-766: fetch/findSessions queries with open endpoints for SessionStore/WindowStore

2021-08-17 Thread Guozhang Wang
Thanks Luke! +1 as well.

On Tue, Aug 17, 2021 at 11:42 AM John Roesler  wrote:

> Thanks, Luke!
>
> I'm sorry I missed your discussion thread. The KIP looks
> good to me!
>
> I'm +1 (binding)
>
> Thanks,
> -John
>
> On Tue, 2021-08-17 at 16:40 +0800, Luke Chen wrote:
> > Hi all,
> > I'd like to start to vote for
> > *KIP-766: fetch/findSessions queries with open endpoints for
> > WindowStore/SessionStore*.
> >
> > This is a follow-up KIP for KIP-763: Range queries with open endpoints
> > <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-763%3A+Range+queries+with+open+endpoints
> >.
> > In KIP-763, we focused on *ReadOnlyKeyValueStore*, in this KIP, we'll
> focus
> > on *ReadOnlySessionStore* and *ReadOnlyWindowStore, *to have open
> endpoints
> > queries for SessionStore/WindowStore.
> >
> > The KIP can be found here:
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=186876596
> >
> > Thank you.
> > Luke
>
>
>

-- 
-- Guozhang


[jira] [Resolved] (KAFKA-13170) Flaky Test InternalTopicManagerTest.shouldRetryDeleteTopicWhenTopicUnknown

2021-08-10 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-13170.
---
Fix Version/s: 3.1.0
   Resolution: Fixed

> Flaky Test InternalTopicManagerTest.shouldRetryDeleteTopicWhenTopicUnknown
> --
>
> Key: KAFKA-13170
> URL: https://issues.apache.org/jira/browse/KAFKA-13170
> Project: Kafka
>  Issue Type: Bug
>  Components: streams, unit tests
>Reporter: A. Sophie Blee-Goldman
>    Assignee: Guozhang Wang
>Priority: Major
> Fix For: 3.1.0
>
>
> [https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-11176/2/testReport/org.apache.kafka.streams.processor.internals/InternalTopicManagerTest/Build___JDK_8_and_Scala_2_12___shouldRetryDeleteTopicWhenTopicUnknown_2/]
> {code:java}
> Stacktracejava.lang.AssertionError: unexpected exception type thrown; 
> expected: but 
> was:
>   at org.junit.Assert.assertThrows(Assert.java:1020)
>   at org.junit.Assert.assertThrows(Assert.java:981)
>   at 
> org.apache.kafka.streams.processor.internals.InternalTopicManagerTest.shouldRetryDeleteTopicWhenRetriableException(InternalTopicManagerTest.java:526)
>   at 
> org.apache.kafka.streams.processor.internals.InternalTopicManagerTest.shouldRetryDeleteTopicWhenTopicUnknown(InternalTopicManagerTest.java:497)
> {code}





[jira] [Created] (KAFKA-13172) Document in Streams 3.0 that due to RocksDB footer version in-flight downgrade is not supported

2021-08-05 Thread Guozhang Wang (Jira)
Guozhang Wang created KAFKA-13172:
-

 Summary: Document in Streams 3.0 that due to RocksDB footer 
version in-flight downgrade is not supported
 Key: KAFKA-13172
 URL: https://issues.apache.org/jira/browse/KAFKA-13172
 Project: Kafka
  Issue Type: Improvement
Reporter: Guozhang Wang
Assignee: Guozhang Wang








Re: Request Permissions to Contribute

2021-08-05 Thread Guozhang Wang
Hello Jordan,

I've added you to the permission list, you should be able to create wiki
pages now.

Guozhang

On Wed, Aug 4, 2021 at 2:27 PM Jordan Bull  wrote:

> Hi,
>
> I'd like to request permissions to contribute to Kafka to propose a KIP.
>
> Wiki ID: jbull
> Jira ID: jbull
>
> Thanks you,
> Jordan
>
>

-- 
-- Guozhang


[jira] [Resolved] (KAFKA-8683) Flakey test InternalTopicManagerTest @shouldNotCreateTopicIfExistsWithDifferentPartitions

2021-07-30 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-8683.
--
Resolution: Cannot Reproduce

Have not seen this flakiness recently and also cannot reproduce locally after 
10,000 runs, closing for now.

> Flakey test  InternalTopicManagerTest 
> @shouldNotCreateTopicIfExistsWithDifferentPartitions
> --
>
> Key: KAFKA-8683
> URL: https://issues.apache.org/jira/browse/KAFKA-8683
> Project: Kafka
>  Issue Type: Bug
>Reporter: Boyang Chen
>Priority: Major
>
> [https://builds.apache.org/job/kafka-pr-jdk11-scala2.13/414/consoleFull]
> org.apache.kafka.streams.processor.internals.InternalTopicManagerTest > 
> shouldNotCreateTopicIfExistsWithDifferentPartitions PASSED*00:05:46* ERROR: 
> Failed to write output for test null.Gradle Test Executor 5*00:05:46* 
> java.lang.NullPointerException: Cannot invoke method write() on null 
> object*00:05:46*   at 
> org.codehaus.groovy.runtime.NullObject.invokeMethod(NullObject.java:91)*00:05:46*
> at 
> org.codehaus.groovy.runtime.callsite.PogoMetaClassSite.call(PogoMetaClassSite.java:47)*00:05:46*
>  at 
> org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)*00:05:46*
>   at 
> org.codehaus.groovy.runtime.callsite.NullCallSite.call(NullCallSite.java:34)*00:05:46*
>at 
> org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)*00:05:46*
>   at java_io_FileOutputStream$write.call(Unknown Source)*00:05:46*
> at build_9s5vsq3vnws1928hdaummvzb1$_r





[jira] [Created] (KAFKA-13152) Replace "buffered.records.per.partition" with "input.buffer.max.bytes"

2021-07-30 Thread Guozhang Wang (Jira)
Guozhang Wang created KAFKA-13152:
-

 Summary: Replace "buffered.records.per.partition" with 
"input.buffer.max.bytes" 
 Key: KAFKA-13152
 URL: https://issues.apache.org/jira/browse/KAFKA-13152
 Project: Kafka
  Issue Type: Improvement
  Components: streams
    Reporter: Guozhang Wang


The current config "buffered.records.per.partition" controls the maximum number 
of records to buffer per partition; when it is exceeded we pause fetching from 
that partition. However this config has two issues:

* It's a per-partition config, so the total memory consumed depends on the 
dynamic number of partitions assigned.
* Record size could vary from case to case.

Hence it's hard to bound the memory usage of this buffering. We should consider 
deprecating that config in favor of a global one, e.g. "input.buffer.max.bytes", 
which controls how many bytes in total are allowed to be buffered. This is 
doable since we buffer the raw records in .
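As a back-of-the-envelope sketch of how such a global byte budget could be enforced, split evenly across stream threads as discussed on the KIP thread; all names and the 512 MB figure here are illustrative, not the actual implementation:

```java
// Illustrative byte-based bound: split the global budget evenly across stream
// threads, and pause a partition's fetching once its thread's buffered bytes
// exceed that thread's share.
class InputBufferBudget {
    static long perThreadBudget(long inputBufferMaxBytes, int numStreamThreads) {
        return inputBufferMaxBytes / numStreamThreads;
    }

    static boolean shouldPause(long bufferedBytes, long perThreadBudget) {
        return bufferedBytes > perThreadBudget;
    }
}
```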





[jira] [Resolved] (KAFKA-9858) CVE-2016-3189 Use-after-free vulnerability in bzip2recover in bzip2 1.0.6 allows remote attackers to cause a denial of service (crash) via a crafted bzip2 file, related

2021-07-29 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-9858.
--
Fix Version/s: 3.0.0
   Resolution: Fixed

> CVE-2016-3189  Use-after-free vulnerability in bzip2recover in bzip2 1.0.6 
> allows remote attackers to cause a denial of service (crash) via a crafted 
> bzip2 file, related to block ends set to before the start of the block.
> -
>
> Key: KAFKA-9858
> URL: https://issues.apache.org/jira/browse/KAFKA-9858
> Project: Kafka
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.2.2, 2.3.1, 2.4.1
>Reporter: sihuanx
>Priority: Major
> Fix For: 3.0.0
>
>
> I'm not sure whether  CVE-2016-3189 affects kafka 2.4.1  or not?  This 
> vulnerability  was related to rocksdbjni-5.18.3.jar  which is compiled with 
> *bzip2 .* 
> Is there any task or plan to fix it? 
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] KIP-761: Add Total Blocked Time Metric to Streams

2021-07-27 Thread Guozhang Wang
Hello Rohan,

Thanks for the KIP. As Bruno mentioned in the other thread, could you update
the "New Metrics" section so that 1) we have sub-titles for streams, producer,
and consumer metrics, just for clarification, and 2) "producer-id" etc. are
renamed to "client-id" to be consistent with the existing metrics.

Otherwise, I'm +1


Guozhang


On Mon, Jul 26, 2021 at 12:49 PM Leah Thomas 
wrote:

> Hey Rohan,
>
> Thanks for pushing this KIP through. I'm +1, non-binding.
>
> Leah
>
> On Wed, Jul 21, 2021 at 7:09 PM Rohan Desai 
> wrote:
>
> > Now that the discussion thread's been open for a few days, I'm calling
> for
> > a vote on
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-761%3A+Add+Total+Blocked+Time+Metric+to+Streams
> >
>


-- 
-- Guozhang


Re: [VOTE] KIP-763: Range queries with open endpoints

2021-07-27 Thread Guozhang Wang
Thank you, Patrick!

The KIP looks good to me, and I also agree with the pragmatic manner. +1
(binding)

Guozhang


On Thu, Jul 22, 2021 at 11:09 AM John Roesler  wrote:

> Thank you, Patrick,
>
> +1 (binding) from me as well!
>
> Thanks,
> -John
>
> On Thu, 2021-07-22 at 10:40 +0200, Bruno Cadonna wrote:
> > Hi Patrick,
> >
> > Thank you for the KIP!
> >
> > +1 (binding)
> >
> > Best,
> > Bruno
> >
> > On 22.07.21 03:47, Luke Chen wrote:
> > > Hi Patrick,
> > > I like this KIP!
> > >
> > > +1 (non-binding)
> > >
> > > Luke
> > >
> > > On Thu, Jul 22, 2021 at 7:04 AM Matthias J. Sax 
> wrote:
> > >
> > > > Thanks for the KIP.
> > > >
> > > > +1 (binding)
> > > >
> > > >
> > > > -Matthias
> > > >
> > > > On 7/21/21 1:18 PM, Patrick Stuedi wrote:
> > > > > Hi all,
> > > > >
> > > > > Thanks for the feedback on the KIP, I have updated the KIP and
> would like
> > > > > to start the voting.
> > > > >
> > > > > The KIP can be found here:
> > > > >
> > > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-763%3A+Range+queries+with+open+endpoints
> > > > >
> > > > > Please vote in this thread.
> > > > >
> > > > > Thanks!
> > > > > -Patrick
> > > > >
> > > >
> > >
>
>
>

-- 
-- Guozhang


Re: Contributor permission for assigning Jira ticket

2021-07-26 Thread Guozhang Wang
Hello Reggie,

It seems Sophie has granted you the contribution permission :)

Guozhang

On Mon, Jul 26, 2021 at 6:19 PM Reggie Hsu  wrote:

> I would like to work on KAFKA-13120
> , and I would like to
> ask for contributor permission.
> My Jira ID is Reggiehsu111, and my user name is Reggiehsu.
>
> Best regards,
> Reggie
>


-- 
-- Guozhang


[jira] [Resolved] (KAFKA-13008) Stream will stop processing data for a long time while waiting for the partition lag

2021-07-23 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-13008.
---
  Assignee: Guozhang Wang
Resolution: Fixed

> Stream will stop processing data for a long time while waiting for the 
> partition lag
> 
>
> Key: KAFKA-13008
> URL: https://issues.apache.org/jira/browse/KAFKA-13008
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.0.0
>Reporter: Luke Chen
>Assignee: Guozhang Wang
>Priority: Blocker
> Fix For: 3.0.0
>
> Attachments: image-2021-07-07-11-19-55-630.png
>
>
> In KIP-695, we improved the task idling mechanism by checking partition lag. 
> It's a good improvement for timestamp sync. But I found it will cause the 
> stream to stop processing data for a long time while waiting for the 
> partition metadata.
>  
> I've been investigating this case for a while, and figured out the issue 
> happens in the below situation (or a similar one):
>  # start 2 streams (each with 1 thread) to consume from a topicA (with 3 
> partitions: A-0, A-1, A-2)
>  # After 2 streams started, the partitions assignment are: (I skipped some 
> other processing related partitions for simplicity)
>  stream1-thread1: A-0, A-1 
>  stream2-thread1: A-2
>  # start processing some data, assume now, the position and high watermark is:
>  A-0: offset: 2, highWM: 2
>  A-1: offset: 2, highWM: 2
>  A-2: offset: 2, highWM: 2
>  # Now, stream3 joined, so trigger rebalance with this assignment:
>  stream1-thread1: A-0 
>  stream2-thread1: A-2
>  stream3-thread1: A-1
>  # Suddenly, stream3 left, so now, rebalance again, with the step 2 
> assignment:
>  stream1-thread1: A-0, *A-1* 
>  stream2-thread1: A-2
>  (note: after initialization, the  position of A-1 will be: position: null, 
> highWM: null)
>  # Now, note that, the partition A-1 used to get assigned to stream1-thread1, 
> and now, it's back. And also, assume the partition A-1 has slow input (ex: 1 
> record per 30 mins), and partition A-0 has fast input (ex: 10K records / 
> sec). So, now, the stream1-thread1 won't process any data until we got input 
> from partition A-1 (even if partition A-0 is buffered a lot, and we have 
> `{{max.task.idle.ms}}` set to 0).
>  
> The reason why the stream1-thread1 won't process any data is because we can't 
> get the lag of partition A-1. And why we can't get the lag? It's because
>  # In KIP-695, we use consumer's cache to get the partition lag, to avoid 
> remote call
>  # The lag for a partition will be cleared if the assignment in this round 
> doesn't have this partition. check 
> [here|https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/consumer/internals/SubscriptionState.java#L272].
>  So, in the above example, the metadata cache for partition A-1 will be 
> cleared in step 4, and re-initialized (to null) in step 5
>  # In KIP-227, we introduced a fetch session to have incremental fetch 
> request/response. That is, if the session existed, the client(consumer) will 
> get the update only when the fetched partition have update (ex: new data). 
> So, in the above case, the partition A-1 has slow input (ex: 1 record per 30 
> mins), it won't have an update until the next 30 mins, or until the fetch 
> session becomes inactive for (default) 2 mins and is evicted. In either case, 
> the metadata 
> won't be updated for a while.
>  
> In KIP-695, if we don't get the partition lag, we can't determine the 
> partition data status to do timestamp sync, so we'll keep waiting and not 
> processing any data. That's why this issue will happen.
>  
> *Proposed solution:*
>  # If we don't get the current lag for a partition, or the current lag > 0, 
> we start to wait for max.task.idle.ms, and reset the deadline when we get the 
> partition lag, like what we did in previous KIP-353
>  # Introduce a waiting time config when no partition lag, or partition lag 
> keeps > 0 (need KIP)
> [~vvcephei] [~guozhang] , any suggestions?
>  
> cc [~ableegoldman]  [~mjsax] , this is the root cause that in 
> [https://github.com/apache/kafka/pull/10736,] we discussed and thought 
> there's a data loss situation. FYI.
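The first proposed solution (start the max.task.idle.ms clock when the lag is unknown, instead of waiting indefinitely) can be sketched as a simple decision rule. This is an illustrative simplification, not the actual KIP-695 enforcement code in Streams:

```java
// Sketch of the proposed idling rule: an unknown lag no longer blocks
// processing forever; it only blocks until the idle deadline passes.
import java.util.OptionalLong;

public class IdlingDecision {
    // Returns true if the task may process its buffered data now.
    // lag: the cached consumer lag for the empty partition, if known.
    // deadlineMs: the time until which we are willing to keep idling
    // (now + max.task.idle.ms, reset whenever a fresh lag arrives).
    public static boolean readyToProcess(OptionalLong lag, long nowMs, long deadlineMs) {
        if (lag.isPresent() && lag.getAsLong() == 0) {
            return true;            // caught up: nothing to wait for
        }
        // Unknown lag, or lag > 0: wait, but only up to the deadline,
        // instead of blocking indefinitely on missing lag metadata.
        return nowMs >= deadlineMs;
    }

    public static void main(String[] args) {
        System.out.println(readyToProcess(OptionalLong.of(0), 100, 200));  // caught up
        System.out.println(readyToProcess(OptionalLong.empty(), 100, 200)); // still idling
        System.out.println(readyToProcess(OptionalLong.empty(), 250, 200)); // deadline passed
    }
}
```

With this rule, the A-1 scenario above would stall for at most max.task.idle.ms after the rebalance rather than for the full fetch-session lifetime.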



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] KIP-761: Add total blocked time metric to streams

2021-07-20 Thread Guozhang Wang
Thanks Rohan.

I remember now that we moved the round-trip PID's txn completion logic into
init-transaction and commit/abort-transaction. So I think we'd count time
as in StreamsProducer#initTransaction as well (admittedly it is in most
cases a one-time thing).

Other than that, I do not have any further comments.

Guozhang

On Tue, Jul 20, 2021 at 11:59 AM Rohan Desai 
wrote:

> > Similarly, I think "txn-commit-time-total" and
> "offset-commit-time-total" may better be inside producer and consumer
> clients respectively.
>
> I agree for offset-commit-time-total. For txn-commit-time-total I'm
> proposing we measure `StreamsProducer.commitTransaction`, which wraps
> multiple producer calls (sendOffsets, commitTransaction)
>
> > > For "txn-commit-time-total" specifically, besides producer.commitTxn.
> other txn-related calls may also be blocking, including
> producer.beginTxn/abortTxn, I saw you mentioned "txn-begin-time-total"
> later in the doc, but did not include it as a separate metric, and
> similarly, should we have a `txn-abort-time-total` as well? If yes, could
> you update the KIP page accordingly.
>
> `beginTransaction` is not blocking - I meant to remove that from that doc.
> I'll add something for abort.
>
> On Mon, Jul 19, 2021 at 11:55 PM Rohan Desai 
> wrote:
>
> > Thanks for the review Guozhang! responding to your feedback inline:
> >
> > > 1) I agree that the current ratio metrics is just "snapshot in point",
> > and
> > more flexible metrics that would allow reporters to calculate based on
> > window intervals are better. However, the current mechanism of the
> proposed
> > metrics assumes the thread->clients mapping as of today, where each
> thread
> > would own exclusively one main consumer, restore consumer, producer and
> an
> > admin client. But this mapping may be subject to change in the future.
> Have
> > you thought about how this metric can be extended when, e.g. the embedded
> > clients and stream threads are de-coupled?
> >
> > Of course this depends on how exactly we refactor the runtime - assuming
> > that we plan to factor out consumers into an "I/O" layer that is
> > responsible for receiving records and enqueuing them to be processed by
> > processing threads, then I think it should be reasonable to count the
> time
> > we spend blocked on this internal queue(s) as blocked. The main concern
> > there to me is that the I/O layer would be doing something expensive like
> > decompression that shouldn't be counted as "blocked". But if that really
> is
> > so expensive that it starts to throw off our ratios then it's probably
> > indicative of a larger problem that the "i/o layer" is a bottleneck and
> it
> > would be worth refactoring so that decompression (or insert other
> expensive
> > thing here) can also be done on the processing threads.
> >
> > > 2) [This and all below are minor comments] The "flush-time-total" may
> > better be a producer client metric, as "flush-wait-time-total", than a
> > streams metric, though the streams-level "total-blocked" can still
> leverage
> > it. Similarly, I think "txn-commit-time-total" and
> > "offset-commit-time-total" may better be inside producer and consumer
> > clients respectively.
> >
> > Good call - I'll update the KIP
> >
> > > 3) The doc was not very clear on how "thread-start-time" would be
> needed
> > when calculating streams utilization along with total-blocked time, could
> > you elaborate a bit more in the KIP?
> >
> > Yes, will do.
> >
> > > For "txn-commit-time-total" specifically, besides producer.commitTxn.
> > other txn-related calls may also be blocking, including
> > producer.beginTxn/abortTxn, I saw you mentioned "txn-begin-time-total"
> > later in the doc, but did not include it as a separate metric, and
> > similarly, should we have a `txn-abort-time-total` as well? If yes, could
> > you update the KIP page accordingly.
> >
> > Ack.
> >
> > On Mon, Jul 12, 2021 at 11:29 PM Rohan Desai 
> > wrote:
> >
> >> Hello All,
> >>
> >> I'd like to start a discussion on the KIP linked above which proposes
> >> some metrics that we would find useful to help measure whether a Kafka
> >> Streams application is saturated. The motivation section in the KIP goes
> >> into some more detail on why we think this is a useful addition to the
> >> metrics already implemented. Thanks in advance for your feedback!
> >>
> >> Best Regards,
> >>
> >> Rohan
> >>
> >> On Mon, Jul 12, 2021 at 12:00 PM Rohan Desai 
> >> wrote:
> >>
> >>>
> >>>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-761%3A+Add+Total+Blocked+Time+Metric+to+Streams
> >>>
> >>
>


-- 
-- Guozhang


Re: [DISCUSS] KIP-761: Add total blocked time metric to streams

2021-07-12 Thread Guozhang Wang
Hey Rohan,

Thanks for putting up the KIP. Maybe you can also briefly talk about the
context in the email message as well (they are very well explained inside
the KIP doc though).

I have a few meta and detailed thoughts after reading it.

1) I agree that the current ratio metrics are just a "snapshot in point", and
more flexible metrics that would allow reporters to calculate based on
window intervals are better. However, the current mechanism of the proposed
metrics assumes the thread->clients mapping as of today, where each thread
would own exclusively one main consumer, restore consumer, producer and an
admin client. But this mapping may be subject to change in the future. Have
you thought about how this metric can be extended when, e.g. the embedded
clients and stream threads are de-coupled?

2) [This and all below are minor comments] The "flush-time-total" may
better be a producer client metric, as "flush-wait-time-total", than a
streams metric, though the streams-level "total-blocked" can still leverage
it. Similarly, I think "txn-commit-time-total" and
"offset-commit-time-total" may better be inside producer and consumer
clients respectively.

3) The doc was not very clear on how "thread-start-time" would be needed
when calculating streams utilization along with total-blocked time, could
you elaborate a bit more in the KIP?

4) For "txn-commit-time-total" specifically, besides producer.commitTxn,
other txn-related calls may also be blocking, including
producer.beginTxn/abortTxn, I saw you mentioned "txn-begin-time-total"
later in the doc, but did not include it as a separate metric, and
similarly, should we have a `txn-abort-time-total` as well? If yes, could
you update the KIP page accordingly.

5) Not a suggestion, but just wanted to bring up that the producer related
metrics are a bit "coarsen" compared with the consumer/admin clients since
their IO mechanisms are a bit different: for producer the caller thread
does not do any IOs since it's all delegated to the background sender
thread, while for consumer/admin the caller thread would still need to do
some IOs, and hence the selector-level metrics would make sense. On top of
my head I cannot think of a better measuring mechanism for producers
either, especially for txn-related ones, we may need to experiment and see
if the generated ratio is relatively accurate and reasonable with the
reflected "block time".
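The ratio calculation the reporters would do over window intervals amounts to dividing the delta of the cumulative blocked-time metric by the delta of wall-clock time. A minimal sketch, assuming a cumulative blocked-time-total metric sampled at the window edges (the method name is made up):

```java
// Sketch: deriving a blocked-time ratio for a reporting window from two
// samples of a cumulative blocked-time metric (illustrative only).
public class BlockedRatio {
    // The samples are cumulative totals in nanoseconds; the ratio over the
    // window is delta(blocked time) / delta(wall-clock time).
    public static double blockedRatio(long blockedNsStart, long blockedNsEnd,
                                      long wallNsStart, long wallNsEnd) {
        return (double) (blockedNsEnd - blockedNsStart) / (wallNsEnd - wallNsStart);
    }

    public static void main(String[] args) {
        // Thread blocked 250ms out of a 1000ms window => 25% blocked,
        // i.e. roughly 75% utilization for that window.
        System.out.println(blockedRatio(1_000_000_000L, 1_250_000_000L,
                                        0L, 1_000_000_000L));
    }
}
```

Because the totals are monotonic, any reporter can pick its own window length, which is exactly the flexibility the point-in-time ratio metrics lack.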



Guozhang

On Mon, Jul 12, 2021 at 12:01 PM Rohan Desai 
wrote:

>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-761%3A+Add+Total+Blocked+Time+Metric+to+Streams
>


-- 
-- Guozhang


Re: [ANNOUNCE] New Kafka PMC Member: Konstantine Karantasis

2021-06-21 Thread Guozhang Wang
Congratulations Konstantine!

On Mon, Jun 21, 2021 at 9:37 AM Tom Bentley  wrote:

> Congratulations Konstantine!
>
> On Mon, Jun 21, 2021 at 5:33 PM David Jacot 
> wrote:
>
> > Congrats, Konstantine. Well deserved!
> >
> > Best,
> > David
> >
> > On Mon, Jun 21, 2021 at 6:14 PM Ramesh Krishnan  >
> > wrote:
> >
> > > Congrats Konstantine
> > >
> > > On Mon, 21 Jun 2021 at 8:58 PM, Mickael Maison 
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > It's my pleasure to announce that Konstantine Karantasis is now a
> > > > member of the Kafka PMC.
> > > >
> > > > Konstantine has been a Kafka committer since Feb 2020. He has
> remained
> > > > active in the community since becoming a committer.
> > > >
> > > > Congratulations Konstantine!
> > > >
> > > > Mickael, on behalf of the Apache Kafka PMC
> > > >
> > >
> >
>


-- 
-- Guozhang


Re: KAFKA-12889 pull request review request

2021-06-19 Thread Guozhang Wang
Hi Qiang,

Thanks for filing this report! It is a good find, and I've reviewed your PR.

On Thu, Jun 17, 2021 at 7:11 AM qiang liu  wrote:

> Hi Kafka developers,
> I have created a Jira, KAFKA-12889, and a pull request, 10818, to fix the
> log cleaner leaving empty segments about every 2^31 messages.
> It has been about two weeks since the PR was opened, but it has not
> received any review. I pinged some committers on GitHub but still got no
> response. So, can someone review this, or tell me what to do next to
> continue this contribution?
>
> best wishes and thanks.
>
> qiang
>


-- 
-- Guozhang


[jira] [Created] (KAFKA-12957) Refactor Streams Logical Plan Generation

2021-06-16 Thread Guozhang Wang (Jira)
Guozhang Wang created KAFKA-12957:
-

 Summary: Refactor Streams Logical Plan Generation
 Key: KAFKA-12957
 URL: https://issues.apache.org/jira/browse/KAFKA-12957
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: Guozhang Wang


There is a general issue with Streams' logical plan -> physical plan 
generation: the physical processor nodes are generated at the parsing phase 
rather than at the logical plan compilation phase. The former stage is agnostic 
to any user configurations while only the latter stage has access to them, and 
hence we should not generate physical processor nodes during the parsing phase 
(i.e. any code related to StreamsBuilder), but defer them to the logical plan 
phase (i.e. XXNode.writeToTopology). The current approach has several issues: 
for example, many physical processor instantiations require access to the 
configs, and hence we have to defer them to the `init` procedure of the node, 
which is scattered in many places from logical nodes to physical processors.

This would be a big refactoring of Streams' logical plan generation, but I 
think it would be worth getting this into a cleaner state.
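The deferral described above boils down to capturing a factory at parsing time and invoking it only once configs are known. A minimal sketch under that assumption (writeToTopology follows the ticket's naming; everything else is made up for illustration):

```java
// Sketch of deferring physical processor construction until the logical
// plan compilation phase, when user configs are available.
import java.util.function.Function;

public class DeferredNode {
    // Parsing phase (StreamsBuilder): capture only a factory, never the
    // physical processor itself, since configs are not yet known.
    private final Function<String, String> processorFactory;

    public DeferredNode(Function<String, String> processorFactory) {
        this.processorFactory = processorFactory;
    }

    // Compilation phase (writeToTopology): configs are known here, so the
    // physical processor can be built in one place instead of inside a
    // scattered `init` procedure.
    public String writeToTopology(String config) {
        return processorFactory.apply(config);
    }

    public static void main(String[] args) {
        DeferredNode node = new DeferredNode(cfg -> "processor(" + cfg + ")");
        System.out.println(node.writeToTopology("cache.max.bytes=1048576"));
    }
}
```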



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] KIP-752: Support --bootstrap-server in ReplicaVerificationTool

2021-06-14 Thread Guozhang Wang
If we believe this tool no longer works and there are other ways to achieve
the intended function, then we should remove it in the next release;
otherwise, I think this KIP is still worthwhile. In any case, we should not
leave a command-line tool that is neither maintained nor removed.
Guozhang

On Thu, Jun 10, 2021 at 10:05 PM Dongjin Lee  wrote:

> Hi Ismael,
>
> > I am not convinced this tool is actually useful, I haven't seen anyone
> using it in years.
>
> Sure, you may be right indeed. The `ReplicaVerificationTool` may not be so
> useful.[^0] However, I hope to propose another perspective.
>
> As long as this tool is provided with a launcher script in a distribution,
> its command-line parameters look so weird to the users since it breaks
> consistency. It is even worse with KIP-499
> <
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=123899170
> >[^1],
> which tries to unify the command line parameters and deprecate old ones -
> even the tools without launcher script (e.g., VerifiableLog4jAppender) now
> uses `--bootstrap-server` parameter. This situation is rather odd, isn't
> it?
>
> This improvement may not have a great value, but it may reduce awkwardness
> from the user's viewpoint.
>
> Best,
> Dongjin
>
> [^0]: With my personal experience, I used it to validate the replication
> when working with a client so sensitive to replication missing, like a
> Semiconductor manufacturing company.
> [^1]: Somewhat strange, two omitted tools from KIP-499
> <
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=123899170
> >
> all have their own launcher script.
>
> On Thu, Jun 10, 2021 at 2:02 PM Ismael Juma  wrote:
>
> > KAFKA-12600 was a general change, not related to this tool specifically.
> I
> > am not convinced this tool is actually useful, I haven't seen anyone
> using
> > it in years.
> >
> > Ismael
> >
> > On Wed, Jun 9, 2021 at 9:51 PM Dongjin Lee  wrote:
> >
> > > Hi Ismael,
> > >
> > > Before I submit this KIP, I reviewed some history. When KIP-499
> > > <
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-499+-+Unify+connection+name+flag+for+command+line+tool
> > > >
> > > tried to resolve the inconsistencies between the command line tools,
> two
> > > tools were omitted, probably by mistake.
> > >
> > > - KAFKA-12878: Support --bootstrap-server
> kafka-streams-application-reset
> > > 
> > > - KAFKA-12899: Support --bootstrap-server in ReplicaVerificationTool
> > >  (this one)
> > >
> > > And it seems like this tool is still working. The last update was
> > > KAFKA-12600  by
> you,
> > > which will also be included in this 3.0.0 release. It is why I
> determined
> > > that this tool is worth updating.
> > >
> > > Thanks,
> > > Dongjin
> > >
> > > On Thu, Jun 10, 2021 at 1:26 PM Ismael Juma  wrote:
> > >
> > > > Hi Dongjin,
> > > >
> > > > Does this tool still work? I recall that there were some doubts about
> > it
> > > > and that's why it wasn't updated previously.
> > > >
> > > > Ismael
> > > >
> > > > On Sat, Jun 5, 2021 at 2:38 PM Dongjin Lee 
> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I'd like to call for a vote on KIP-752: Support --bootstrap-server
> in
> > > > > ReplicaVerificationTool:
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-752%3A+Support+--bootstrap-server+in+ReplicaVerificationTool
> > > > >
> > > > > Best,
> > > > > Dongjin
> > > > >
> > > > > --
> > > > > *Dongjin Lee*
> > > > >
> > > > > *A hitchhiker in the mathematical world.*
> > > > >
> > > > >
> > > > >
> > > > > *github:  github.com/dongjinleekr
> > > > > keybase:
> > > > https://keybase.io/dongjinleekr
> > > > > linkedin:
> > > > kr.linkedin.com/in/dongjinleekr
> > > > > speakerdeck:
> > > > > speakerdeck.com/dongjin
> > > > > *
> > > > >
> > > >
> > >
> > >
> > > --
> > > *Dongjin Lee*
> > >
> > > *A hitchhiker in the mathematical world.*
> > >
> > >
> > >
> > > *github:  github.com/dongjinleekr
> > > keybase:
> > https://keybase.io/dongjinleekr
> > > linkedin:
> > kr.linkedin.com/in/dongjinleekr
> > > speakerdeck:
> > > speakerdeck.com/dongjin
> > > *
> > >
> >
>
>
> --
> *Dongjin Lee*
>
> *A hitchhiker in the mathematical world.*
>
>
>
> *github:  github.com/dongjinleekr
> keybase: https://keybase.io/dongjinleekr
> linkedin: kr.linkedin.com/in/dongjinleekr
> speakerdeck:
> speakerdeck.com/dongjin
> 

Re: Request for contributor permission

2021-06-14 Thread Guozhang Wang
Hello Nicolas,

I've added you to the contributors list. You should be able to
assign tickets to yourself now.

Guozhang

On Mon, Jun 14, 2021 at 7:24 AM Nicolas Guignard <
nicolas.guignar...@gmail.com> wrote:

> Hi,
>
> I am sending this email to ask to have the contributor's permission in
> order to be able to assign Jiras to me. My Jira username is: Nicolas
> Guignard. Is this all you need or do you need something else?
>
> Thanks in advance.
>
> Have a good day,
> Cheers,
> Nicolas Guignard
> --
> Software engineer
>


-- 
-- Guozhang


[jira] [Resolved] (KAFKA-10585) Kafka Streams should clean up the state store directory from cleanup

2021-06-11 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-10585.
---
Fix Version/s: 3.0.0
   Resolution: Fixed

> Kafka Streams should clean up the state store directory from cleanup
> 
>
> Key: KAFKA-10585
> URL: https://issues.apache.org/jira/browse/KAFKA-10585
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Rohan Desai
>Assignee: Dongjin Lee
>Priority: Minor
>  Labels: newbie++
> Fix For: 3.0.0
>
>
> Currently, `KafkaStreams.cleanup` cleans up all the task-level directories 
> and the global directory. However it doesn't clean up the enclosing state 
> store directory, though streams does create this directory when it 
> initializes the state for the streams app. Feels like it should remove this 
> directory when it cleans up.
> We notice this in ksql quite often, since every new query is a new streams 
> app. Over time, we see lots of state store directories left around for old 
> queries.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Request permission for KIP creation

2021-06-10 Thread Guozhang Wang
Hello Yi,

I've added you to the wiki space. Cheers,

Guozhang

On Thu, Jun 10, 2021 at 12:58 PM Yi Ding  wrote:

> Hi Team,
>
> Could you help grant me permission to create KIP on
>
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
> ?
>
> My info is:
> Wiki ID: Yi Ding
> Email: yd...@confluent.io
>
> Thank you!
> Yi
>


-- 
-- Guozhang


[jira] [Created] (KAFKA-12920) Consumer's cooperative sticky assignor need to clear generation / assignment data upon `onPartitionsLost`

2021-06-08 Thread Guozhang Wang (Jira)
Guozhang Wang created KAFKA-12920:
-

 Summary: Consumer's cooperative sticky assignor need to clear 
generation / assignment data upon `onPartitionsLost`
 Key: KAFKA-12920
 URL: https://issues.apache.org/jira/browse/KAFKA-12920
 Project: Kafka
  Issue Type: Improvement
Reporter: Guozhang Wang


Consumer's cooperative-sticky assignor does not track the owned partitions 
inside the assignor --- i.e. when it resets its state in the event of 
``onPartitionsLost``, the ``memberAssignment`` and ``generation`` inside the 
assignor are not cleared. This can cause a member to join with an empty 
generation on the protocol while its non-empty user data still encodes the old 
assignment (and hence passes the validation check on the broker side during 
JoinGroup), eventually causing a single partition to be assigned to multiple 
consumers within a generation.

We should let the assignor also clear its assignment/generation when 
``onPartitionsLost`` is triggered in order to avoid this scenario.

Note that 1) for the regular sticky assignor the generation would still have an 
older value, which would cause the previously owned partitions to be discarded 
during the assignment, and 2) for Streams' sticky assignor, its encoding would 
indeed be cleared along with ``onPartitionsLost``. Hence only Consumer's 
cooperative-sticky assignor has this issue to solve.
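The fix amounts to resetting both pieces of state together in the lost-partitions callback. A hypothetical sketch (field and method names are illustrative, not the actual AbstractStickyAssignor internals):

```java
// Sketch of clearing assignor-side state on onPartitionsLost, so the next
// JoinGroup carries neither a stale generation nor a stale owned-partition
// encoding in its user data.
import java.util.ArrayList;
import java.util.List;

public class CooperativeStickyState {
    static final int DEFAULT_GENERATION = -1;

    private List<String> memberAssignment = new ArrayList<>();
    private int generation = DEFAULT_GENERATION;

    public void onAssignment(List<String> partitions, int generation) {
        this.memberAssignment = new ArrayList<>(partitions);
        this.generation = generation;
    }

    // The proposed fix: reset both fields, matching what the protocol-level
    // state already does on lost partitions.
    public void onPartitionsLost() {
        memberAssignment = new ArrayList<>();
        generation = DEFAULT_GENERATION;
    }

    public static void main(String[] args) {
        CooperativeStickyState state = new CooperativeStickyState();
        state.onAssignment(List.of("A-0", "A-1"), 5);
        state.onPartitionsLost();
        System.out.println(state.memberAssignment.isEmpty()
                && state.generation == DEFAULT_GENERATION);
    }
}
```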



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-12918) Watch-Free Animated 'America: The Motion Picture' Trailer 2021 Full Hd Download

2021-06-08 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-12918.
---

> Watch-Free Animated 'America: The Motion Picture' Trailer 2021 Full Hd  
> Download
> 
>
> Key: KAFKA-12918
> URL: https://issues.apache.org/jira/browse/KAFKA-12918
> Project: Kafka
>  Issue Type: Bug
>Reporter: mushfiqur rahoman
>Priority: Major
>

Re: [VOTE] KIP-744: Migrate TaskMetadata and ThreadMetadata to an interface with internal implementation

2021-06-07 Thread Guozhang Wang
Thanks, that update LGTM. +1!

Guozhang

On Mon, Jun 7, 2021 at 9:45 AM Josep Prat 
wrote:

> Hi Guozhang,
> Let me know if the updated KIP meets your requests. And thanks again for
> your feedback!
>
> Thanks,
>
> On Sat, Jun 5, 2021 at 4:38 PM Josep Prat  wrote:
>
> > Done, KIP is now updated to reflect this.
> >
> > On Sat, Jun 5, 2021 at 4:29 PM Josep Prat  wrote:
> >
> >> Hi Guozhang,
> >>
> >> Thanks for your review, let's exclude the "localThreadsMetadata
> returning
> >> StreamsMetadata" change then. This way, as well, this KIP is completely
> >> consistent on making those Metadata classes interfaces with internal
> >> implementations.
> >> I'll update the KIP document right away.
> >>
> >> Thanks for replying on Saturday :)
> >>
> >> On Sat, Jun 5, 2021 at 4:22 PM Guozhang Wang 
> wrote:
> >>
> >>> Hi Josep,
> >>>
> >>> I think the most significant change would be, "localThreadsMetadata"
> >>> returns a StreamsMetadata instead of ThreadMetadata, and
> StreamsMetadata
> >>> would expose a new API to return a set of ThreadMetadata.
> >>>
> >>> All others (including the repackaging and splitting of interface /
> impl)
> >>> are minor indeed.
> >>>
> >>> After a second thought, I feel it is not fair to squeeze in this
> >>> significant change into your KIP, without the community having a
> >>> separating
> >>> discussion about it, so I think we can table it for now and only align
> on
> >>> the other minor things: 1) have a o.a.k.streams.StreamsMetadata
> interface
> >>> (along with an internal implementation class), 2) deprecate the
> >>> o.a.k.streams.state.StreamsMetadata class and also the corresponding
> >>> caller
> >>> of Streams that returns this class.
> >>>
> >>> Guozhang
> >>>
> >>> On Fri, Jun 4, 2021 at 4:13 PM Josep Prat  >
> >>> wrote:
> >>>
> >>> > Hi Guozhang,
> >>> > So if I understand correctly, it's only a couple of small changes
> that
> >>> need
> >>> > to be made to this KIP to be aligned with KAFKA-12370, right?
> >>> >
> >>> > I'm guessing that StreamsMetadata would not only moved to
> >>> o.a.k.streams but
> >>> > also be split with Interface + internal implementation, am I right?
> >>> >
> >>> >
> >>> > If that's the case, I could, most probably, update the KIP by
> Saturday
> >>> > afternoon CEST.
> >>> >
> >>> > Let me know if I understood you correctly.
> >>> >
> >>> > Thanks for the comments!
> >>> >
> >>> > ———
> >>> > Josep Prat
> >>> >
> >>> > Aiven Deutschland GmbH
> >>> >
> >>> > Immanuelkirchstraße 26, 10405 Berlin
> >>> >
> >>> > Amtsgericht Charlottenburg, HRB 209739 B
> >>> >
> >>> > Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> >>> >
> >>> > m: +491715557497
> >>> >
> >>> > w: aiven.io
> >>> >
> >>> > e: josep.p...@aiven.io
> >>> >
> >>> > On Sat, Jun 5, 2021, 00:11 Guozhang Wang  wrote:
> >>> >
> >>> > > Hello Josep,
> >>> > >
> >>> > > Thanks for the proposal! The write-up looks good to me in general.
> >>> I'm
> >>> > just
> >>> > > wondering if you feel comfortable to align this with another
> JIRA/KIP
> >>> > > further down the road:
> >>> > >
> >>> > > https://issues.apache.org/jira/browse/KAFKA-12370
> >>> > >
> >>> > > Which tries to clean up the metadata hierarchy and the callers as
> >>> > > StreamsMetadata -> ThreadMetadata -> TaskMetadata, and most Streams
> >>> APIs
> >>> > > return the top-level StreamsMetadata.
> >>> > >
> >>> > > It just have slight differences with the current proposal: 1)
> >>> instead of
> >>> > > returning a ThreadMetadata, "localThreadsMetadata" returns
> >>> > > a StreamsMetadata, and 2) the `StreamsMetadata` would also be moved
> >>> from
> >>> 

[jira] [Resolved] (KAFKA-10614) Group coordinator onElection/onResignation should guard against leader epoch

2021-06-07 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-10614.
---
Fix Version/s: 3.0.0
   Resolution: Fixed

> Group coordinator onElection/onResignation should guard against leader epoch
> 
>
> Key: KAFKA-10614
> URL: https://issues.apache.org/jira/browse/KAFKA-10614
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>    Reporter: Guozhang Wang
>Assignee: Tom Bentley
>Priority: Major
> Fix For: 3.0.0
>
>
> When a sequence of LeaderAndISR or StopReplica requests sent from different 
> controllers causes the group coordinator to elect / resign, we may re-order 
> the events due to a race condition. For example:
> 1) A first LeaderAndISR request is received from the old controller, asking 
> the broker to resign as the group coordinator.
> 2) A second LeaderAndISR request is received from the new controller, asking 
> the broker to elect as the group coordinator.
> 3) Although the threads handling requests 1) and 2) are synchronized on the 
> replica manager, their callback {{onLeadershipChange}} triggers 
> {{onElection/onResignation}}, which schedules the loading / unloading on 
> background threads that are not synchronized.
> 4) As a result, {{onElection}} may be triggered first and {{onResignation}} 
> afterwards, so the coordinator would not recognize itself as the coordinator 
> and hence would respond to any coordinator request with {{NOT_COORDINATOR}}.
> Here are two proposals off the top of my head:
> 1) Have the scheduled load / unload function keep the passed-in leader 
> epoch, and also materialize the epoch in memory. Then, when executing the 
> unloading, check against the leader epoch.
> 2) This may be a bit simpler: use a single background thread working on a 
> FIFO queue of loading / unloading jobs. Since the callers are synchronized 
> on the replica manager and order is preserved, the enqueued loading / 
> unloading jobs would be correctly ordered as well. In that case we would 
> avoid the reordering. 
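
Proposal 2 above (combined with the epoch guard from proposal 1) could be sketched roughly as follows. All class and method names here are hypothetical illustrations, not the actual coordinator code:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: a single-threaded executor gives FIFO ordering of load / unload
// jobs, and each job carries the leader epoch that scheduled it so a stale
// job (from an older controller) can be discarded when it finally runs.
public class CoordinatorEventQueue {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final AtomicInteger latestEpoch = new AtomicInteger(-1);

    public void schedule(int leaderEpoch, Runnable loadOrUnload) {
        // Materialize the highest epoch seen so far (proposal 1).
        latestEpoch.accumulateAndGet(leaderEpoch, Math::max);
        worker.submit(() -> {
            // Only run the job if no newer epoch has been observed since
            // it was enqueued; otherwise it is a stale re-ordered request.
            if (leaderEpoch >= latestEpoch.get()) {
                loadOrUnload.run();
            }
        });
    }

    public void shutdownAndWait() throws InterruptedException {
        worker.shutdown();
        worker.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        CoordinatorEventQueue queue = new CoordinatorEventQueue();
        StringBuilder log = new StringBuilder();
        // The new controller's election (epoch 2) is handled first; the old
        // controller's resignation (epoch 1) arrives late and is ignored.
        queue.schedule(2, () -> log.append("elect@2;"));
        queue.schedule(1, () -> log.append("resign@1;"));
        queue.shutdownAndWait();
        System.out.println(log); // elect@2;
    }
}
```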



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] KIP-744: Migrate TaskMetadata and ThreadMetadata to an interface with internal implementation

2021-06-05 Thread Guozhang Wang
Hi Josep,

I think the most significant change would be that "localThreadsMetadata"
returns a StreamsMetadata instead of ThreadMetadata, and StreamsMetadata
would expose a new API to return a set of ThreadMetadata.

All the others (including the repackaging and splitting of interface / impl)
are minor indeed.

On second thought, I feel it is not fair to squeeze this significant
change into your KIP without the community having a separate discussion
about it, so I think we can table it for now and only align on the other
minor things: 1) have an o.a.k.streams.StreamsMetadata interface (along
with an internal implementation class), and 2) deprecate the
o.a.k.streams.state.StreamsMetadata class and also the corresponding
callers of Streams that return this class.
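
The aligned hierarchy discussed above could look roughly like this; every name and package here is illustrative, not the actual KIP API:

```java
import java.util.Set;

// Hypothetical shape of StreamsMetadata -> ThreadMetadata -> TaskMetadata,
// with the top-level StreamsMetadata exposing its per-thread metadata.
interface TaskMetadata { String taskId(); }

interface ThreadMetadata {
    String threadName();
    Set<TaskMetadata> activeTasks();
}

interface StreamsMetadata {
    // The new API under discussion: return a set of ThreadMetadata.
    Set<ThreadMetadata> threadMetadata();
}

public class MetadataHierarchyDemo {
    public static void main(String[] args) {
        TaskMetadata task = () -> "0_1";
        ThreadMetadata thread = new ThreadMetadata() {
            public String threadName() { return "StreamThread-1"; }
            public Set<TaskMetadata> activeTasks() { return Set.of(task); }
        };
        // localThreadsMetadata() would now return this root object, from
        // which thread- and task-level metadata are reachable:
        StreamsMetadata streams = () -> Set.of(thread);
        System.out.println(streams.threadMetadata().size()); // 1
    }
}
```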

Guozhang

On Fri, Jun 4, 2021 at 4:13 PM Josep Prat 
wrote:

> Hi Guozhang,
> So if I understand correctly, it's only a couple of small changes that need
> to be made to this KIP to be aligned with KAFKA-12370, right?
>
> I'm guessing that StreamsMetadata would not only moved to o.a.k.streams but
> also be split with Interface + internal implementation, am I right?
>
>
> If that's the case, I could, most probably, update the KIP by Saturday
> afternoon CEST.
>
> Let me know if I understood you correctly.
>
> Thanks for the comments!
>
> ———
> Josep Prat
>
> Aiven Deutschland GmbH
>
> Immanuelkirchstraße 26, 10405 Berlin
>
> Amtsgericht Charlottenburg, HRB 209739 B
>
> Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
>
> m: +491715557497
>
> w: aiven.io
>
> e: josep.p...@aiven.io
>
> On Sat, Jun 5, 2021, 00:11 Guozhang Wang  wrote:
>
> > Hello Josep,
> >
> > Thanks for the proposal! The write-up looks good to me in general. I'm
> just
> > wondering if you feel comfortable to align this with another JIRA/KIP
> > further down the road:
> >
> > https://issues.apache.org/jira/browse/KAFKA-12370
> >
> > Which tries to clean up the metadata hierarchy and the callers as
> > StreamsMetadata -> ThreadMetadata -> TaskMetadata, and most Streams APIs
> > return the top-level StreamsMetadata.
> >
> > It just have slight differences with the current proposal: 1) instead of
> > returning a ThreadMetadata, "localThreadsMetadata" returns
> > a StreamsMetadata, and 2) the `StreamsMetadata` would also be moved from
> > o.a.k.streams.state to o.a.k.streams.
> >
> > What do you think about this? It's totally okay if you are not
> comfortable
> > changing or expanding the scope of this KIP, that's totally fine with me
> as
> > well, and we can just change again in the future if necessary --- just
> > trying to see if we can align the direction on the first shot here :)
> >
> >
> > Guozhang
> >
> > On Fri, Jun 4, 2021 at 1:51 AM Bruno Cadonna  wrote:
> >
> > > Thanks, Josep!
> > >
> > > +1 (binding)
> > >
> > > Bruno
> > >
> > > On 04.06.21 10:27, Josep Prat wrote:
> > > > Hi all,
> > > > I'd like to call for a vote on KIP-744: Migrate TaskMetadata and
> > > > ThreadMetadata to an interface with internal implementation
> > > > KIP page can be found here:
> > https://cwiki.apache.org/confluence/x/XIrOCg
> > > > Discussion thread can be found here:
> > > >
> > >
> >
> https://lists.apache.org/x/thread.html/r1d20fb6dbd6b01bb84cbb17e992f4d08308980dfc5f2e0a68d674413@%3Cdev.kafka.apache.org%3E
> > > >
> > > > As it was pointed out, hopefully this KIP can be approved before the
> > 3.0
> > > > deadline, as we can clean up some non naming compliant methods
> recently
> > > > introduced.
> > > >
> > > >
> > > > Please note that the scope of the KIP increased during the discussion
> > to
> > > > also include ThreadMetadata.
> > > >
> > > > Thank you,
> > > >
> > >
> >
> >
> > --
> > -- Guozhang
> >
>


-- 
-- Guozhang


Re: [VOTE] KIP-744: Migrate TaskMetadata and ThreadMetadata to an interface with internal implementation

2021-06-04 Thread Guozhang Wang
Hello Josep,

Thanks for the proposal! The write-up looks good to me in general. I'm just
wondering if you feel comfortable aligning this with another JIRA/KIP
further down the road:

https://issues.apache.org/jira/browse/KAFKA-12370

Which tries to clean up the metadata hierarchy and the callers as
StreamsMetadata -> ThreadMetadata -> TaskMetadata, and most Streams APIs
return the top-level StreamsMetadata.

It just has slight differences from the current proposal: 1) instead of
returning a ThreadMetadata, "localThreadsMetadata" returns
a StreamsMetadata, and 2) the `StreamsMetadata` would also be moved from
o.a.k.streams.state to o.a.k.streams.

What do you think about this? It's totally okay if you are not comfortable
changing or expanding the scope of this KIP, that's totally fine with me as
well, and we can just change again in the future if necessary --- just
trying to see if we can align the direction on the first shot here :)


Guozhang

On Fri, Jun 4, 2021 at 1:51 AM Bruno Cadonna  wrote:

> Thanks, Josep!
>
> +1 (binding)
>
> Bruno
>
> On 04.06.21 10:27, Josep Prat wrote:
> > Hi all,
> > I'd like to call for a vote on KIP-744: Migrate TaskMetadata and
> > ThreadMetadata to an interface with internal implementation
> > KIP page can be found here: https://cwiki.apache.org/confluence/x/XIrOCg
> > Discussion thread can be found here:
> >
> https://lists.apache.org/x/thread.html/r1d20fb6dbd6b01bb84cbb17e992f4d08308980dfc5f2e0a68d674413@%3Cdev.kafka.apache.org%3E
> >
> > As it was pointed out, hopefully this KIP can be approved before the 3.0
> > deadline, as we can clean up some non naming compliant methods recently
> > introduced.
> >
> >
> > Please note that the scope of the KIP increased during the discussion to
> > also include ThreadMetadata.
> >
> > Thank you,
> >
>


-- 
-- Guozhang


Re: [VOTE] KIP-726: Make the "cooperative-sticky, range" as the default assignor

2021-06-03 Thread Guozhang Wang
Thanks Luke, I'm +1 on this proposal.

Guozhang

On Wed, Jun 2, 2021 at 8:16 PM Luke Chen  wrote:

> Hi all,
> I'd like to call for a vote on KIP-726: Make the "cooperative-sticky,
> range" as the default assignor.
>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=177048248
>
> This KIP is still waiting for the prerequisite stories to get completed,
> but it should be soon. Hopefully this can be put into V3.0 since
> cooperative rebalancing is a major
> improvement to the life of a consumer group (and its operator).
>
> The discussion thread can be found here:
>
> https://lists.apache.org/thread.html/ref63417ea84a58c9068ea025b3ad38ca2cc340f5818ac07cd83452b7%40%3Cdev.kafka.apache.org%3E
>
> Thank you.
> Luke
>


-- 
-- Guozhang


[jira] [Created] (KAFKA-12887) Do not trigger user-customized ExceptionalHandler for RTE

2021-06-03 Thread Guozhang Wang (Jira)
Guozhang Wang created KAFKA-12887:
-

 Summary: Do not trigger user-customized ExceptionalHandler for RTE
 Key: KAFKA-12887
 URL: https://issues.apache.org/jira/browse/KAFKA-12887
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: Guozhang Wang


Today in StreamThread we have a try-catch block that captures all {{Throwable 
e}} and then triggers {{this.streamsUncaughtExceptionHandler.accept(e)}}. 
However, some possible RTEs, such as IllegalState/IllegalArgument exceptions, 
are usually caused by bugs. In such cases we should not let users decide what 
to do with these exceptions, but should let Streams itself enforce the 
decision; e.g. for IllegalState/IllegalArgument we should fail fast to 
surface the potential error.
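
A minimal sketch of the proposed routing; the names are hypothetical, and the real change would live in StreamThread's top-level catch block:

```java
// Sketch: route fatal, bug-indicating RTEs away from the user handler.
public class ExceptionRouting {
    public static String route(Throwable e) {
        // IllegalState/IllegalArgument usually indicate a bug in Streams
        // itself: fail fast instead of consulting the user handler.
        if (e instanceof IllegalStateException
                || e instanceof IllegalArgumentException) {
            return "FAIL_FAST";
        }
        // All other throwables still go to streamsUncaughtExceptionHandler.
        return "USER_HANDLER";
    }

    public static void main(String[] args) {
        System.out.println(route(new IllegalStateException("bug")));  // FAIL_FAST
        System.out.println(route(new RuntimeException("transient"))); // USER_HANDLER
    }
}
```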



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTING] KIP-693: Client-side Circuit Breaker for Partition Write Errors

2021-06-02 Thread Guozhang Wang
Thanks George,

I'm generally +1 on the proposed change here, would like to review the
detailed PR to see if there's any devils in the details.


Guozhang

On Tue, Jun 1, 2021 at 7:19 PM Guoqiang Shu  wrote:

> Dear All,
> We would like to get a vote on this proposal. The implementation is linked
> to the KIP, and we have ran this in our production setup for a while.
> https://issues.apache.org/jira/browse/KAFKA-12793
>
> Thanks in advance!
>
> //George//
>


-- 
-- Guozhang


Re: Permission to create KIPs

2021-05-26 Thread Guozhang Wang
Hello Josep,

I saw your id "josep.prat" is already added on the wiki, you should be able
to create KIPs now.

On Wed, May 26, 2021 at 10:09 AM Josep Prat 
wrote:

> Hi there,
>
> I would like to have permissions to create KIPs on the wiki.
> My username should be josep.prat.
>
> Thanks,
>
> ———
> Josep Prat
>
> Aiven Deutschland GmbH
>
> Immanuelkirchstraße 26, 10405 Berlin
>
> Amtsgericht Charlottenburg, HRB 209739 B
>
> Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
>
> m: +491715557497
>
> w: aiven.io
>
> e: josep.p...@aiven.io
>


-- 
-- Guozhang


Re: EXTERNAL: Re: [DISCUSS] Apache Kafka 3.0.0 release plan with new updated dates

2021-05-26 Thread Guozhang Wang
Hi Meiling,

Could you double check if the email was sent from the non-sandboxed "
m...@rockwellautomation.com" instead of "m...@rockwellautomation.com.invalid"?
I believe dev-unsubscr...@kafka.apache.org does work.

On Wed, May 26, 2021 at 3:32 PM Meiling He
 wrote:

> Can anyone instruct me how to unsubscribe from this mail list, please!!
> dev-unsubscr...@kafka.apache.org doesn’t work.
>
> 
> From: Ismael Juma 
> Sent: Wednesday, May 26, 2021 5:29:03 PM
> To: dev 
> Subject: EXTERNAL: Re: [DISCUSS] Apache Kafka 3.0.0 release plan with new
> updated dates
>
> [Use caution with links & attachments]
>
>
>
> Thanks Konstantine, +1 from me.
>
> Ismael
>
> On Wed, May 26, 2021 at 2:48 PM Konstantine Karantasis
>  wrote:
>
> > Hi all,
> >
> > Please find below the updated release plan for the Apache Kafka 3.0.0
> > release.
> >
> >
> https://urldefense.com/v3/__https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=177046466__;!!JhrIYaSK6lFZ!-k0g-tIlibi6VoGNY2-ZLKPCfAz2RJ2ICdx4IA4TfulZQIp7Gx7rhD9f0KyCN4Gh2VwyHw$
> >
> > New suggested dates for the release are as follows:
> >
> > KIP Freeze is 09 June 2021 (same date as in the initial plan)
> > Feature Freeze is 30 June 2021 (new date, extended by two weeks)
> > Code Freeze is 14 July 2021 (new date, extended by two weeks)
> >
> > At least two weeks of stabilization will follow Code Freeze.
> >
> > The release plan is up to date and currently includes all the approved
> KIPs
> > that are targeting 3.0.0.
> >
> > Please let me know if you have any objections with the recent extension
> of
> > Feature Freeze and Code Freeze or any other concerns.
> >
> > Regards,
> > Konstantine
> >
>


-- 
-- Guozhang


Re: [VOTE] KIP-741: Change default serde to be null

2021-05-24 Thread Guozhang Wang
+1, thanks!

On Mon, May 24, 2021 at 1:51 PM Leah Thomas 
wrote:

> Hi,
>
> I'd like to kick-off voting for KIP-741: Change default serde to be null.
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-741%3A+Change+default+serde+to+be+null
> >
> The
> discussion is linked on the KIP for context.
>
> Cheers,
> Leah
>


-- 
-- Guozhang


Re: [VOTE] KIP-743: Remove config value 0.10.0-2.4 of Streams built-in metrics version config

2021-05-21 Thread Guozhang Wang
+1, thanks!

On Fri, May 21, 2021 at 12:45 AM Bruno Cadonna  wrote:

> Hi,
>
> I'd like to start a vote on KIP-743 that proposes to remove config value
> 0.10.0-2.4 from Streams config built.in.metrics.version.
>
> https://cwiki.apache.org/confluence/x/uIfOCg
>
> Best,
> Bruno
>


-- 
-- Guozhang


Re: [DISCUSS] KIP-743: Deprecate config value 0.10.0-2.4 of Streams built-in metrics version config

2021-05-20 Thread Guozhang Wang
I'm also a bit leaning towards dropping the config value in 3.0 directly.

For the config itself, I'd rather keep it a bit longer in case we may
update the metrics again (there are a couple near term projects on the
roadmap including named topology, restoration threads etc).


Guozhang

On Thu, May 20, 2021 at 8:50 AM John Roesler  wrote:

> Thanks for opening this discussion, Bruno!
>
> I wonder if it would be ok to simply drop that config
> value in 3.0 instead of 4.0.
>
> Config values are a bit different than Java APIs, since many
> users would be using property files or string constants
> instead of referencing the "official" constants defined in
> StreamsConfig.java. As a result, we can expect the
> deprecation warning on the java constant to be only
> marginally effective.
>
> Plus, the config value "latest" has been the default
> already. I don't think it's a stretch for users of the
> "0.10.0-2.4" config value to have seen this coming and to
> have migrated to "latest" already.
>
> If you do agree that we can treat it as having been
> deprecated since 2.5, I'd further suggest that we no longer
> need the "built.in.metrics.version" config at all.
>
> WDYT?
> Thanks,
> John
>
> On Thu, 2021-05-20 at 13:57 +0200, Bruno Cadonna wrote:
> > Hi,
> >
> > I would like to propose KIP-743 to deprecate the old structures of the
> > built-in metrics in Streams.
> >
> > https://cwiki.apache.org/confluence/x/uIfOCg
> >
> > Best,
> > Bruno
>
>
>

-- 
-- Guozhang


Re: [VOTE] KIP-740: Use TaskId instead of String for the taskId field in TaskMetadata

2021-05-20 Thread Guozhang Wang
A quick note: since we changed the constructor of TaskMetadata as well in
the PR, we'd need to add that to the KIP wiki as well. Personally I think
it is okay to just replace the constructor as you did in the PR rather than
adding/deprecating --- I would even advocate for replacing the `taskId`
function with the new return type without introducing a new one with a
different name, but I know this is not favored by most people :).
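
The structured id under discussion might look roughly like this; it is an illustrative sketch, not the final API:

```java
// Sketch: a structured TaskId replacing the old "subtopology_partition"
// String returned by TaskMetadata#taskId().
public final class TaskId {
    private final int subtopology;
    private final int partition;

    public TaskId(final int subtopology, final int partition) {
        this.subtopology = subtopology;
        this.partition = partition;
    }

    public int subtopology() { return subtopology; }
    public int partition() { return partition; }

    // Preserves the old String form for logs and display.
    @Override public String toString() { return subtopology + "_" + partition; }

    public static void main(String[] args) {
        TaskId id = new TaskId(0, 2);
        // Callers could now read the parts directly instead of parsing:
        System.out.println(id + " -> subtopology " + id.subtopology()
            + ", partition " + id.partition());
    }
}
```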

On Wed, May 19, 2021 at 11:01 PM Guozhang Wang  wrote:

> Thanks Sophie, I like the current proposal better compared to adding a new
> TaskInfo class. +1 !
>
> Guozhang
>
> On Wed, May 19, 2021 at 4:58 PM Sophie Blee-Goldman
>  wrote:
>
>> Just a friendly ping to please check out the finalized proposal of the KIP
>> and (re)cast your votes
>>
>> Thanks!
>> Sophie
>>
>> On Sun, May 16, 2021 at 7:28 PM Sophie Blee-Goldman 
>> wrote:
>>
>> > Thanks John. I have moved the discussion over to a [DISCUSS] thread,
>> where
>> > it should have been taking place all
>> > along. I'll officially kick off the vote again, but since this KIP has
>> > been through a significant overhauled since it's initial
>> > proposal, the previous votes cast will be invalidated. Please make a
>> pass
>> > on the latest KIP and (re)cast your vote.
>> >
>> > If you have any concerns or comments beyond just small questions, please
>> > take them to the discussion thread.
>> >
>> > Thanks!
>> > Sophie
>> >
>> > On Fri, May 14, 2021 at 10:12 AM John Roesler 
>> wrote:
>> >
>> >> Thanks for these updates, Sophie,
>> >>
>> >> Unfortunately, I have some minor suggestions:
>> >>
>> >> 1. "Topic Group" is a vestigial term from the early days of
>> >> the codebase. We call a "topic group" a "subtopology" in the
>> >> public interface now (although "topic group" is still used
>> >> internally some places). For user-facing consistency, we
>> >> should also use "subtopologyId" in your proposal.
>> >>
>> >> 2. I'm wondering if it's really necessary to introduce this
>> >> interface at all. I think your motivation is to be able to
>> >> get the subtopologyId and partition via TaskMetadata, right?
>> >> Why not just add those methods to TaskMetadata? Stepping
>> >> back, the concept of metadata about an identifier is a bit
>> >> elaborate.
>> >>
>> >> Sorry for thrashing what you were hoping would be a quick,
>> >> uncontroversial KIP.
>> >>
>> >> Thanks for your consideration,
>> >> John
>> >>
>> >> On Thu, 2021-05-13 at 19:35 -0700, Sophie Blee-Goldman
>> >> wrote:
>> >> > One last update: we will not actually remove the existing
>> >> > o.a.k.streams.processor.TaskId class, but only
>> >> > deprecate it, along with any methods that returned it (ie the
>> getters on
>> >> > ProcessorContext and StateStoreContext)
>> >> >
>> >> > Internally, everything will still be converted to use the new
>> internal
>> >> > TaskId class, and public TaskIdMetadata interface,
>> >> > where appropriate.
>> >> >
>> >> >
>> >> >
>> >> > On Thu, May 13, 2021 at 6:42 PM Sophie Blee-Goldman <
>> >> sop...@confluent.io>
>> >> > wrote:
>> >> >
>> >> > > Thanks all. I updated the KIP slightly since there is some
>> ambiguity
>> >> > > around whether the existing TaskId class is
>> >> > > currently part of the public API or not. To settle the matter, I
>> have
>> >> > > introduced a new public TaskId interface that
>> >> > > exposes the metadata, and moved the existing TaskId class to the
>> >> internals
>> >> > > package. The KIP <https://cwiki.apache.org/confluence/x/vYTOCg>
>> has
>> >> been
>> >> > > updated
>> >> > > with the proposed API changes.
>> >> > >
>> >> > > @Guozhang Wang  : I decided to leave this
>> new
>> >> > > TaskId interface in o.a.k.streams.processor since that's where the
>> >> > > TaskMetadata class is, along with the other related metadata
>> classes
>> >> (eg
>> >> > > ThreadMetadata). I do agree it makes
>

Re: [VOTE] KIP-740: Use TaskId instead of String for the taskId field in TaskMetadata

2021-05-20 Thread Guozhang Wang
Thanks Sophie, I like the current proposal better compared to adding a new
TaskInfo class. +1 !

Guozhang

On Wed, May 19, 2021 at 4:58 PM Sophie Blee-Goldman
 wrote:

> Just a friendly ping to please check out the finalized proposal of the KIP
> and (re)cast your votes
>
> Thanks!
> Sophie
>
> On Sun, May 16, 2021 at 7:28 PM Sophie Blee-Goldman 
> wrote:
>
> > Thanks John. I have moved the discussion over to a [DISCUSS] thread,
> where
> > it should have been taking place all
> > along. I'll officially kick off the vote again, but since this KIP has
> > been through a significant overhauled since it's initial
> > proposal, the previous votes cast will be invalidated. Please make a pass
> > on the latest KIP and (re)cast your vote.
> >
> > If you have any concerns or comments beyond just small questions, please
> > take them to the discussion thread.
> >
> > Thanks!
> > Sophie
> >
> > On Fri, May 14, 2021 at 10:12 AM John Roesler 
> wrote:
> >
> >> Thanks for these updates, Sophie,
> >>
> >> Unfortunately, I have some minor suggestions:
> >>
> >> 1. "Topic Group" is a vestigial term from the early days of
> >> the codebase. We call a "topic group" a "subtopology" in the
> >> public interface now (although "topic group" is still used
> >> internally some places). For user-facing consistency, we
> >> should also use "subtopologyId" in your proposal.
> >>
> >> 2. I'm wondering if it's really necessary to introduce this
> >> interface at all. I think your motivation is to be able to
> >> get the subtopologyId and partition via TaskMetadata, right?
> >> Why not just add those methods to TaskMetadata? Stepping
> >> back, the concept of metadata about an identifier is a bit
> >> elaborate.
> >>
> >> Sorry for thrashing what you were hoping would be a quick,
> >> uncontroversial KIP.
> >>
> >> Thanks for your consideration,
> >> John
> >>
> >> On Thu, 2021-05-13 at 19:35 -0700, Sophie Blee-Goldman
> >> wrote:
> >> > One last update: we will not actually remove the existing
> >> > o.a.k.streams.processor.TaskId class, but only
> >> > deprecate it, along with any methods that returned it (ie the getters
> on
> >> > ProcessorContext and StateStoreContext)
> >> >
> >> > Internally, everything will still be converted to use the new internal
> >> > TaskId class, and public TaskIdMetadata interface,
> >> > where appropriate.
> >> >
> >> >
> >> >
> >> > On Thu, May 13, 2021 at 6:42 PM Sophie Blee-Goldman <
> >> sop...@confluent.io>
> >> > wrote:
> >> >
> >> > > Thanks all. I updated the KIP slightly since there is some ambiguity
> >> > > around whether the existing TaskId class is
> >> > > currently part of the public API or not. To settle the matter, I
> have
> >> > > introduced a new public TaskId interface that
> >> > > exposes the metadata, and moved the existing TaskId class to the
> >> internals
> >> > > package. The KIP <https://cwiki.apache.org/confluence/x/vYTOCg> has
> >> been
> >> > > updated
> >> > > with the proposed API changes.
> >> > >
> >> > > @Guozhang Wang  : I decided to leave this
> new
> >> > > TaskId interface in o.a.k.streams.processor since that's where the
> >> > > TaskMetadata class is, along with the other related metadata classes
> >> (eg
> >> > > ThreadMetadata). I do agree it makes
> >> > > more sense for them to be under o.a.k.streams, but I'd rather leave
> >> them
> >> > > together for now.
> >> > >
> >> > > Please let me know if there are any concerns, or you want to redact
> >> your
> >> > > vote :)
> >> > >
> >> > > -Sophie
> >> > >
> >> > > On Thu, May 13, 2021 at 3:11 PM Guozhang Wang 
> >> wrote:
> >> > >
> >> > > > +1
> >> > > >
> >> > > > On a hindsight, maybe TaskId should not really be in
> >> > > > `org.apache.kafka.streams.processor` but rather just in
> >> `o.a.k.streams`,
> >> > > > but maybe not worth pulling it up now :)
> >> > > >
> >> > > > Guozhang
> >>

Re: [DISCUSS] KIP-741: Change default serde to be null

2021-05-19 Thread Guozhang Wang
Thanks Sophie. I think not piggy-backing on TopologyException makes sense.

It just occurred to me that today we already have a similar situation even
with this config defaulting to Bytes, namely the other
`DEFAULT_WINDOWED_KEY/VALUE_SERDE_INNER_CLASS` configs, whose default is
actually null. Quickly checking the code here: when they are found undefined
at runtime we actually throw a `*ConfigException*`, not a StreamsException.
So for consistency we could just use that exception as well.
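
The consistency being proposed could be sketched as follows; this uses a plain RuntimeException as a stand-in for o.a.k.common.config.ConfigException, and all names are illustrative:

```java
// Sketch: with the default serde now null, resolution either uses the
// operator-level serde, falls back to the configured default, or fails
// with a config error that names the missing setting.
public class SerdeResolution {
    static <T> T resolveSerde(T operatorSerde, T configuredDefault, String configName) {
        if (operatorSerde != null) return operatorSerde;
        if (configuredDefault != null) return configuredDefault;
        // Stand-in for org.apache.kafka.common.config.ConfigException:
        throw new RuntimeException("Failed to resolve a serde: please pass one "
            + "in explicitly or set '" + configName + "' in the StreamsConfig");
    }

    public static void main(String[] args) {
        System.out.println(resolveSerde("OperatorSerde", "DefaultSerde", "default.key.serde"));
        System.out.println(resolveSerde(null, "DefaultSerde", "default.key.serde"));
    }
}
```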

Guozhang


On Wed, May 19, 2021 at 3:24 PM Sophie Blee-Goldman
 wrote:

> To be honest I'm not really a fan of reusing the TopologyException since it
> feels like
> a bit of a stretch from a user point of view to classify Serde
> misconfiguration as a
> topology issue.
>
> I personally think a StreamsException would be acceptable, but I would also
> propose
> to introduce a new type of exception, something like
> SerdeConfigurationException or
> so. We certainly don't want to end up like the Producer API with its 500
> different
> exceptions. Luckily Streams is nowhere near that yet, in my opinion, and
> problems
> with Serde configuration are so common and well-defined that a dedicated
> exception
> feels very appropriate.
>
> If there are any other instances in the codebase where we throw a
> StreamsException
> for a Serde-related issue, this could also be migrated to the new exception
> type (not
> necessarily all at once, but gradually after this KIP)
>
> Thoughts?
>
> On Wed, May 19, 2021 at 10:31 AM Guozhang Wang  wrote:
>
> > Leah, thanks for the KIP.
> >
> > It looks good to me overall, just following up on @br...@confluent.io
> >  's question about exception: what about using the
> > `TopologyException` class? I know that currently it is only thrown during
> > the topology parsing phase, not at the streams construction, but I feel
> we
> > can extend its scope to cover both topology building and streams object
> > (i.e. taking the topology and the config) construction time as well since
> > part of the construction is again to re-write / augment the topology.
> >
> > Guozhang
> >
> >
> > On Wed, May 19, 2021 at 8:44 AM Leah Thomas  >
> > wrote:
> >
> > > Hi Sophie,
> > >
> > > Thanks for catching that. These are existing methods inside of
> > > `StreamsConfig` that will return null (the new default) instead of byte
> > > array serde (the old default). Both `StreamsConfig` and
> > > `defaultKeySerde`/`defaultValueSerde` are public, so I assume these
> still
> > > count as part of the public API. I updated the KIP to include this
> > > information.
> > >
> > > Bruno - I was planning on including a specific message with the streams
> > > exception to indicate that either a serde needs to be passed in or a
> > > default needs to be set. I'm open to doing something more specific,
> > > perhaps something like a serde exception? WDYT? I was hoping that with
> > the
> > > message the streams exception would still give enough information for
> > users
> > > to debug the problem.
> > >
> > > Still hoping for a short discussion (; but thanks for the input so far!
> > >
> > > Leah
> > >
> > > On Wed, May 19, 2021 at 3:00 AM Bruno Cadonna 
> > wrote:
> > >
> > > > Hey Leah,
> > > >
> > > >  > what I think should be a small discussion
> > > >
> > > > Dangerous words, indeed! It seems like they trigger something in
> people
> > > ;-)
> > > >
> > > > Jokes apart!
> > > >
> > > > Did you consider throwing a more specific exception instead of a
> > > > StreamsException? Something that describes better the issue at hand.
> > > >
> > > > Best,
> > > > Bruno
> > > >
> > > >
> > > > On 19.05.21 01:19, Sophie Blee-Goldman wrote:
> > > > >>
> > > > >> what I think should be a small discussion
> > > > >
> > > > >
> > > > > Dangerous words :P
> > > > >
> > > > > I'm all for the proposal but I do have one question about something
> > in
> > > > the
> > > > > KIP. You list two methods called
> > > > > defaultKeySerde() and defaultValueSerde() but it's not clear to me
> > > where
> > > > > these are coming from. Are they
> > > > > new APIs you propose to add in this KIP? Are they existing methods
> in
> > > the
>

Re: [DISCUSS] KIP-741: Change default serde to be null

2021-05-19 Thread Guozhang Wang
Leah, thanks for the KIP.

It looks good to me overall, just following up on @br...@confluent.io
 's question about exception: what about using the
`TopologyException` class? I know that currently it is only thrown during
the topology parsing phase, not at the streams construction, but I feel we
can extend its scope to cover both topology building and streams object
(i.e. taking the topology and the config) construction time as well since
part of the construction is again to re-write / augment the topology.

Guozhang


On Wed, May 19, 2021 at 8:44 AM Leah Thomas 
wrote:

> Hi Sophie,
>
> Thanks for catching that. These are existing methods inside of
> `StreamsConfig` that will return null (the new default) instead of byte
> array serde (the old default). Both `StreamsConfig` and
> `defaultKeySerde`/`defaultValueSerde` are public, so I assume these still
> count as part of the public API. I updated the KIP to include this
> information.
>
> Bruno - I was planning on including a specific message with the streams
> exception to indicate that either a serde needs to be passed in or a
> default needs to be set. I'm open to doing something more specific,
> perhaps something like a serde exception? WDYT? I was hoping that with the
> message the streams exception would still give enough information for users
> to debug the problem.
>
> Still hoping for a short discussion (; but thanks for the input so far!
>
> Leah
>
> On Wed, May 19, 2021 at 3:00 AM Bruno Cadonna  wrote:
>
> > Hey Leah,
> >
> >  > what I think should be a small discussion
> >
> > Dangerous words, indeed! It seems like they trigger something in people
> ;-)
> >
> > Jokes apart!
> >
> > Did you consider throwing a more specific exception instead of a
> > StreamsException? Something that describes better the issue at hand.
> >
> > Best,
> > Bruno
> >
> >
> > On 19.05.21 01:19, Sophie Blee-Goldman wrote:
> > >>
> > >> what I think should be a small discussion
> > >
> > >
> > > Dangerous words :P
> > >
> > > I'm all for the proposal but I do have one question about something in
> > the
> > > KIP. You list two methods called
> > > defaultKeySerde() and defaultValueSerde() but it's not clear to me
> where
> > > these are coming from. Are they
> > > new APIs you propose to add in this KIP? Are they existing methods in
> the
> > > public API which will now return
> > > null, whereas they used to return the ByteArraySerde? If they're not a
> > > public API then you can remove them
> > > from the KIP, otherwise can you just update this section to clarify
> what
> > > class/file these belong to, etc?
> > >
> > > -Sophie
> > >
> > > On Tue, May 18, 2021 at 5:34 AM Leah Thomas
>  > >
> > > wrote:
> > >
> > >> Hi all,
> > >>
> > >> I'd like to kick-off what I think should be a small discussion for
> > KIP-741:
> > >> Change default serde to be null.
> > >>
> > >> The wiki is here:
> > >>
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-741%3A+Change+default+serde+to+be+null
> > >>
> > >>
> > >> Thanks,
> > >> Leah
> > >>
> > >
> >
>


-- 
-- Guozhang


[jira] [Created] (KAFKA-12812) Consider refactoring state store registration path

2021-05-18 Thread Guozhang Wang (Jira)
Guozhang Wang created KAFKA-12812:
-

 Summary: Consider refactoring state store registration path
 Key: KAFKA-12812
 URL: https://issues.apache.org/jira/browse/KAFKA-12812
 Project: Kafka
  Issue Type: Bug
  Components: streams
Reporter: Guozhang Wang


Today our state store registration call path within the stateManager (both 
local and global) is like this: 

{code}
for each store: store.init(store, context)
   -> context.register(root, callback)
   -> stateManager.registerStore(store, callback)
{code}

One can see that we have an awkward loop from the stateManager back to
itself, and we rely on users not forgetting to call context.register(root,
callback). We do this only to let users pass a customized callback
implementation to the stateManager.

What about a different path like this:

1) We add a new interface in StateStore, like `StateRestoreCallback
getCallback()`, that each impl class needs to provide.
2) We remove the `context.register(root, callback)` call; because of that,
we no longer need to pass `root` into store.init either.
3) The stateManager just calls `store.init(context)` (without the first
parameter), and then puts the store along with its restore callback into the
map, without the separate `registerStore` function.
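The three steps above can be sketched as follows. This is a toy model with hypothetical names (`StateManager.initAndRegister`, the simplified interfaces), not the actual Streams internals; it only shows how exposing the callback on the store removes the context.register() round trip.

```java
// Toy sketch of the proposed registration path for KAFKA-12812:
// the store exposes its own restore callback, so the stateManager can
// register it directly, with no call back into the stateManager.
import java.util.LinkedHashMap;
import java.util.Map;

public class RegistrationSketch {
    interface StateRestoreCallback {
        void restore(byte[] key, byte[] value);
    }

    interface StateStore {
        String name();
        void init(Object context);            // no more `root` parameter
        StateRestoreCallback getCallback();   // step 1: store provides its callback
    }

    static class StateManager {
        final Map<String, StateRestoreCallback> restoreCallbacks = new LinkedHashMap<>();

        // step 3: init and bookkeeping in one place, no separate registerStore()
        void initAndRegister(StateStore store, Object context) {
            store.init(context);
            restoreCallbacks.put(store.name(), store.getCallback());
        }
    }

    public static void main(String[] args) {
        StateStore store = new StateStore() {
            public String name() { return "my-store"; }
            public void init(Object context) { }
            public StateRestoreCallback getCallback() { return (k, v) -> { }; }
        };
        StateManager manager = new StateManager();
        manager.initAndRegister(store, new Object());
        if (!manager.restoreCallbacks.containsKey("my-store")) throw new AssertionError();
        System.out.println("ok");
    }
}
```

Note how a store implementation can no longer "forget" to register its callback: the stateManager pulls it via getCallback() rather than waiting for the store to push it.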



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] KIP-740: Replace the public TaskId class with an interface

2021-05-16 Thread Guozhang Wang
Sophie,

Thanks for the nice summary. Originally I was leaning towards option 2) since
changing this name to something else could be a pretty large conceptual change
for users, and I think the concern you raised is a good one as well: we cannot
reuse the `taskId()` function name anymore.

After pondering on it a bit, what about 1) keeping `TaskId` in the current
package as that's not our main concern anyways, 2) changing `TaskId` to an
abstract class and deprecate all util functions / fields that we do not
want to expose any longer, and 3) introducing a new class that extends the
o.a.k.streams.processor.TaskId for internal implementations?
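The shape of (2) and (3) could look roughly like the following. This is a standalone toy (the nested classes stand in for the real `o.a.k.streams.processor.TaskId` and an internal subclass; `persistentName` is a hypothetical internal utility), just to show how deprecated members stay on the public abstract class while internal-only helpers move off the public API.

```java
// Toy sketch of KIP-740 option (2)+(3): keep TaskId in place, make it an
// abstract class with its old members deprecated, and extend it internally.
public class TaskIdSketch {
    // Stand-in for the public o.a.k.streams.processor.TaskId
    public static abstract class TaskId {
        @Deprecated
        public final int topicGroupId;   // old public field, now deprecated
        public final int partition;

        protected TaskId(int subtopologyId, int partition) {
            this.topicGroupId = subtopologyId;
            this.partition = partition;
        }

        // new, clearer accessor using the "subtopology" terminology
        public int subtopologyId() { return topicGroupId; }
    }

    // Internal implementation extending the public abstract class;
    // internal-only utilities live here, off the public API.
    static class InternalTaskId extends TaskId {
        InternalTaskId(int subtopologyId, int partition) {
            super(subtopologyId, partition);
        }
        String persistentName() { return subtopologyId() + "_" + partition; }
    }

    public static void main(String[] args) {
        InternalTaskId id = new InternalTaskId(2, 7);
        if (id.subtopologyId() != 2 || id.partition != 7) throw new AssertionError();
        if (!id.persistentName().equals("2_7")) throw new AssertionError();
        System.out.println("ok");
    }
}
```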


Guozhang


On Sun, May 16, 2021 at 7:19 PM Sophie Blee-Goldman
 wrote:

> Sorry, one correction regarding the discussion on naming below: the other
> con for reusing TaskId as the interface name is that we would not be able
> to follow the standard getter naming conventions and introduce new taskId()
> APIs to replace the deprecated ones, as the deprecated methods have already
> claimed that name. So we would have to come up with a different name for
> these APIs; for example, taskInfo() would return the TaskId interface,
> which is a bit confusing. Or we could use getTaskId() temporarily until the
> deprecated methods can be removed, which goes against the established
> convention of no "get" in the getters. Another possibility is something
> like "taskIdInfo()" or "taskIdMetadata()" so that it's still mentioning
> taskId somewhere in the name.
>
> Thoughts? Given this awkwardness I'm somewhat leaning towards renaming the
> interface after all, as in the example above to "TaskInfo". Or we could
> take a similar approach and just build on the original TaskId name, calling
> it TaskIdInfo or something like that. Any thoughts? As always the KIP has
> been updated with what I am proposing, so you can take a look yourself.
>
> On Sun, May 16, 2021 at 7:07 PM Sophie Blee-Goldman 
> wrote:
>
> > I'm moving the discussion on KIP-740 to an actual [DISCUSS] thread since
> > it turned out to be more nuanced
> > and grew in scope since I initially proposed it. Thanks John for
> > the suggestions/questions on the [VOTE] thread,
> > I have copied them over and addressed them below.
> >
> > Before that, I just want to apologize to everyone for bringing this to a
> > vote immediately. I truly thought this would
> > be a trivial KIP that just fixed what seemed like a bug in the API, but
> it
> > slowly grew in scope as I wrangled with
> > the question of whether TaskId was already a public API. To clarify, it
> is
> > -- a TaskId is returned by the
> > Processor/StateStoreContext.taskId() methods, I just hadn't noticed those
> > APIs at the time of writing this KIP.
> > After finding those APIs and confirming that it was public, I began to
> > wonder  whether it really should be.
> >
> > 1. "Topic Group" is a vestigial term from the early days of
> >> the codebase. We call a "topic group" a "subtopology" in the
> >> public interface now (although "topic group" is still used
> >> internally some places). For user-facing consistency, we
> >> should also use "subtopologyId" in your proposal.
> >
> >
> > 100% agree, I remember being very confused by the term "topic group"
> when
> > I was first getting to know
> > the Streams code base, and wondering why it wasn't just called
> > "subtopology". This would be the perfect
> > opportunity to migrate to the more clear terminology where appropriate.
> > I've updated the KIP to reflect this.
> >
> > 2. I'm wondering if it's really necessary to introduce this
> >> interface at all. I think your motivation is to be able to
> >> get the subtopologyId and partition via TaskMetadata, right?
> >> Why not just add those methods to TaskMetadata? Stepping
> >> back, the concept of metadata about an identifier is a bit
> >> elaborate.
> >
> >
> > I think if the only issue we wanted to address was the TaskMetadata
> > returning a string representation of a
> > TaskId rather than the real thing, as was originally the case, then I
> > would agree with just  unfolding the API
> > to expose the TaskId fields directly. However while digging into the
> > question of whether TaskId was public, it
> > became clear that the class had some awkward features for a public API:
> > public fields with no getters, many
> > utility methods for internal functionality that had to be made public and
> > cluttered up the API, etc. So I believe
> > this KIP can and should take the opportunity to clean up the TaskId API
> > altogether, not just for one getter in
> > TaskMetadata. And we can't deprecate an API without adding a replacement
> > for users to migrate onto, since
> > the concept of a TaskId itself is still very much relevant. That's the
> > primary motivation behind introducing this
> > new interface. Of course technically this could be done by retaining the
> > existing TaskId class and just moving
> > the things that should not be public to the new internal class, but
> that's
> 

Re: [VOTE] KIP-740: Use TaskId instead of String for the taskId field in TaskMetadata

2021-05-13 Thread Guozhang Wang
+1

In hindsight, maybe TaskId should not really be in
`org.apache.kafka.streams.processor` but rather just in `o.a.k.streams`,
but maybe not worth pulling it up now :)

Guozhang

On Thu, May 13, 2021 at 1:58 PM Walker Carlson
 wrote:

> +1 from me! (non-binding)
>
> Walker
>
> On Thu, May 13, 2021 at 1:53 PM Sophie Blee-Goldman
>  wrote:
>
> > Hey all,
> >
> > I'm just going to take this KIP straight to a vote since it should be a
> > trivial and uncontroversial change. Of course please raise any concerns
> > should they come up, and I can take things to a DISCUSS thread.
> >
> > The KIP is a simple change to move from String to TaskId for the taskID
> > field of TaskMetadata.
> >
> > KIP-740: Use TaskId instead of String for the taskId field in
> TaskMetadata
> > <
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-740%3A+Use+TaskId+instead+of+String+for+the+taskId+field+in+TaskMetadata
> > >
> >
> > Cheers,
> > Sophie
> >
>


-- 
-- Guozhang


Re: [VOTE] KIP-730: Producer ID generation in KRaft mode

2021-05-06 Thread Guozhang Wang
LGTM! Thanks David.

On Thu, May 6, 2021 at 10:03 AM Ron Dagostino  wrote:

> Thanks again for the KIP, David.  +1 (non-binding) from me.
>
> Ron
>
> On Tue, May 4, 2021 at 11:21 AM David Arthur  wrote:
>
> > Hello everyone, I'd like to start the vote on KIP-730 which adds a new
> RPC
> > for producer ID generation in KRaft mode.
> >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-730%3A+Producer+ID+generation+in+KRaft+mode
> >
> >
> >
> > --
> > David Arthur
> >
>


-- 
-- Guozhang


[jira] [Resolved] (KAFKA-12683) Remove deprecated "UsePreviousTimeOnInvalidTimeStamp"

2021-05-01 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-12683.
---
Fix Version/s: 3.0.0
 Assignee: Guozhang Wang
   Resolution: Fixed

> Remove deprecated "UsePreviousTimeOnInvalidTimeStamp"
> -
>
> Key: KAFKA-12683
> URL: https://issues.apache.org/jira/browse/KAFKA-12683
> Project: Kafka
>  Issue Type: Sub-task
>    Reporter: Guozhang Wang
>Assignee: Guozhang Wang
>Priority: Major
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] KIP-735: Increase default consumer session timeout

2021-04-28 Thread Guozhang Wang
+1. Thanks Jason!

On Wed, Apr 28, 2021 at 12:50 PM Gwen Shapira 
wrote:

> I love this improvement.
>
> +1 (binding)
>
> On Wed, Apr 28, 2021 at 10:46 AM Jason Gustafson
> 
> wrote:
>
> > Hi All,
> >
> > I'd like to start a vote on KIP-735:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-735%3A+Increase+default+consumer+session+timeout
> > .
> > +1
> > from myself obviously
> >
> > -Jason
> >
>
>
> --
> Gwen Shapira
> Engineering Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter | blog
>


-- 
-- Guozhang


[jira] [Created] (KAFKA-12693) Consecutive rebalances with zombie instances may cause corrupted changelogs

2021-04-19 Thread Guozhang Wang (Jira)
Guozhang Wang created KAFKA-12693:
-

 Summary: Consecutive rebalances with zombie instances may cause 
corrupted changelogs
 Key: KAFKA-12693
 URL: https://issues.apache.org/jira/browse/KAFKA-12693
 Project: Kafka
  Issue Type: Bug
Reporter: Guozhang Wang


When an instance (or thread within an instance) of Kafka Streams has a soft
failure and the group coordinator triggers a rebalance, that instance would
temporarily become a "zombie writer". That is, this instance does not know
there's already a new rebalance and hence its partitions have been migrated
out, until it tries to commit, gets notified of the illegal-generation
error, and realizes it is the "zombie". During this period until the
commit, this zombie may still be writing data to the changelogs of the migrated
tasks, while the new owner has already taken over and is also writing to
those changelogs.

When EOS is enabled, this would not be a problem: when the zombie tries to
commit and gets notified that it's fenced, its zombie appends would be aborted.
With EOS disabled, though, such shared writes would be interleaved on the
changelogs, where a zombie append may arrive after the new writer's
append, effectively overwriting that new append.

Note that such interleaved writes do not necessarily cause corrupted data: as
long as the new producer keeps appending after the old zombie stops, and all the
corrupted keys are overwritten again by the new values, then it is fine.
However, consider consecutive rebalances where, right after the changelogs
are corrupted by zombie writers and before the new writer can overwrite them
again, the task gets migrated once more and needs to be restored from the
changelogs: the old values would be restored instead of the new values,
effectively causing data loss.

Although this should be a rare event, we should still fix it asap. One idea is
to have producers get a PID even under ALOS: that is, we set the transactional
id in the producer config but do not trigger any txn APIs; when there are
zombie producers, they would then be immediately fenced on appends and hence
there would be no interleaved appends. I think this may require a KIP still,
since today one has to call initTransactions() in order to register and get the
PID.
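To make the fencing argument concrete, here is a small simulation (no real Kafka involved; `Changelog` and its epoch field are invented for illustration) of why epoch-based producer fencing stops the interleaving: once the new writer's epoch is observed, any later append from the stale-epoch zombie is rejected instead of overwriting the new value.

```java
// Toy simulation of producer-epoch fencing on a changelog partition.
import java.util.ArrayList;
import java.util.List;

public class FencingSketch {
    static class Changelog {
        short latestEpoch = 0;
        final List<String> appends = new ArrayList<>();

        // Returns true if the append was accepted; appends carrying an
        // epoch older than the latest observed one are fenced.
        boolean append(short producerEpoch, String record) {
            if (producerEpoch < latestEpoch) {
                return false; // zombie fenced
            }
            latestEpoch = producerEpoch;
            appends.add(record);
            return true;
        }
    }

    public static void main(String[] args) {
        Changelog log = new Changelog();
        short zombieEpoch = 1;
        short newOwnerEpoch = 2; // bumped when the task migrated

        log.append(zombieEpoch, "old-value");     // before the rebalance: fine
        log.append(newOwnerEpoch, "new-value");   // new owner takes over
        boolean accepted = log.append(zombieEpoch, "stale-overwrite"); // zombie retries

        if (accepted) throw new AssertionError("zombie append should be fenced");
        if (!log.appends.get(log.appends.size() - 1).equals("new-value")) throw new AssertionError();
        System.out.println("ok");
    }
}
```

Without the epoch check, the third append would land after "new-value" and be the value restored on the next migration, which is exactly the data-loss scenario described above.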



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [ANNOUNCE] Apache Kafka 2.8.0

2021-04-19 Thread Guozhang Wang
This is great! Thanks to everyone who has contributed to the release.

On Mon, Apr 19, 2021 at 9:36 AM John Roesler  wrote:

> The Apache Kafka community is pleased to announce the
> release for Apache Kafka 2.8.0
>
> Kafka 2.8.0 includes a number of significant new features.
> Here is a summary of some notable changes:
>
> * Early access of replacing ZooKeeper with a self-managed
> quorum
> * Add Describe Cluster API
> * Support mutual TLS authentication on SASL_SSL listeners
> * JSON request/response debug logs
> * Limit broker connection creation rate
> * Topic identifiers
> * Expose task configurations in Connect REST API
> * Update Streams FSM to clarify ERROR state meaning
> * Extend StreamJoined to allow more store configs
> * More convenient TopologyTestDriver constructors
> * Introduce Kafka-Streams-specific uncaught exception
> handler
> * API to start and shut down Streams threads
> * Improve TimeWindowedDeserializer and TimeWindowedSerde to
> handle window size
> * Improve timeouts and retries in Kafka Streams
>
> All of the changes in this release can be found in the
> release notes:
> https://www.apache.org/dist/kafka/2.8.0/RELEASE_NOTES.html
>
>
> You can download the source and binary release (Scala 2.12
> and 2.13) from:
> https://kafka.apache.org/downloads#2.8.0
>
> 
> ---
>
>
> Apache Kafka is a distributed streaming platform with four
> core APIs:
>
>
> ** The Producer API allows an application to publish a
> stream of records to one or more Kafka topics.
>
> ** The Consumer API allows an application to subscribe to
> one or more topics and process the stream of records
> produced to them.
>
> ** The Streams API allows an application to act as a stream
> processor, consuming an input stream from one or more topics
> and producing an output stream to one or more output topics,
> effectively transforming the input streams to output
> streams.
>
> ** The Connector API allows building and running reusable
> producers or consumers that connect Kafka topics to existing
> applications or data systems. For example, a connector to a
> relational database might capture every change to a table.
>
>
> With these APIs, Kafka can be used for two broad classes of
> application:
>
> ** Building real-time streaming data pipelines that reliably
> get data between systems or applications.
>
> ** Building real-time streaming applications that transform
> or react to the streams of data.
>
>
> Apache Kafka is in use at large and small companies
> worldwide, including Capital One, Goldman Sachs, ING,
> LinkedIn, Netflix, Pinterest, Rabobank, Target, The New York
> Times, Uber, Yelp, and Zalando, among others.
>
> A big thank you for the following 128 contributors to this
> release!
>
> 17hao, abc863377, Adem Efe Gencer, Alexander Iskuskov, Alok
> Nikhil, Anastasia Vela, Andrew Lee, Andrey Bozhko, Andrey
> Falko, Andy Coates, Andy Wilkinson, Ankit Kumar, APaMio,
> Arjun Satish, ArunParthiban-ST, A. Sophie Blee-Goldman,
> Attila Sasvari, Benoit Maggi, bertber, bill, Bill Bejeck,
> Bob Barrett, Boyang Chen, Brajesh Kumar, Bruno Cadonna,
> Cheng Tan, Chia-Ping Tsai, Chris Egerton, CHUN-HAO TANG,
> Colin Patrick McCabe, Colin P. Mccabe, Cyrus Vafadari, David
> Arthur, David Jacot, David Mao, dengziming, Dhruvil Shah,
> Dima Reznik, Dongjoon Hyun, Dongxu Wang, Emre Hasegeli,
> feyman2016, fml2, Gardner Vickers, Geordie, Govinda Sakhare,
> Greg Harris, Guozhang Wang, Gwen Shapira, Hamza Slama,
> high.lee, huxi, Igor Soarez, Ilya Ganelin, Ismael Juma, Ivan
> Ponomarev, Ivan Yurchenko, jackyoh, James Cheng, James
> Yuzawa, Jason Gustafson, Jesse Gorzinski, Jim Galasyn, John
> Roesler, Jorge Esteban Quilcate Otoya, José Armando García
> Sancio, Julien Chanaud, Julien Jean Paul Sirocchi, Justine
> Olshan, Kengo Seki, Kowshik Prakasam, leah, Lee Dongjin,
> Levani Kokhreidze, Lev Zemlyanov, Liju John, Lincong Li,
> Lucas Bradstreet, Luke Chen, Manikumar Reddy, Marco Aurelio
> Lotz, mathieu, Matthew Wong, Matthias J. Sax, Matthias
> Merdes, Michael Bingham, Michael G. Noll, Mickael Maison,
> Montyleo, mowczare, Nikolay, Nikolay Izhikov, Ning Zhang,
> Nitesh Mor, Okada Haruki, panguncle, parafiend, Patrick
> Dignan, Prateek Agarwal, Prithvi, Rajini Sivaram, Raman
> Verma, Ramesh Krishnan M, Randall Hauch, Richard
> Fussenegger, Rohan, Rohit Deshpande, Ron Dagostino, Samuel
> Cantero, Sanket Fajage, Scott Hendricks, Shao Yang Hong,
> ssugar, Stanislav Kozlovski, Stanislav Vodetskyi, tang7526,
> Thorsten Hake, Tom Bentley, vamossagar12, Viktor Somogyi-
> Vass, voffcheg109, Walker Carlson, wenbingshen, wycc,
> xakassi, Xavier Léauté, Yilong Chang, zhangyue19921010
>
> We welcome your help and feedback. For more information on
> how to report problems, and to get involved, visit the
> project website at https://kafka.apache.org/
>
> Thank you!
>
>
> Regards,
>
> John Roesler
>
>

-- 
-- Guozhang


[jira] [Created] (KAFKA-12683) Remove deprecated "UsePreviousTimeOnInvalidTimeStamp"

2021-04-18 Thread Guozhang Wang (Jira)
Guozhang Wang created KAFKA-12683:
-

 Summary: Remove deprecated "UsePreviousTimeOnInvalidTimeStamp"
 Key: KAFKA-12683
 URL: https://issues.apache.org/jira/browse/KAFKA-12683
 Project: Kafka
  Issue Type: Sub-task
Reporter: Guozhang Wang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-12633) Remove deprecated "TopologyTestDriver#pipeInput / readOutput"

2021-04-18 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-12633.
---
Fix Version/s: 3.0.0
   Resolution: Fixed

> Remove deprecated "TopologyTestDriver#pipeInput / readOutput"
> -
>
> Key: KAFKA-12633
> URL: https://issues.apache.org/jira/browse/KAFKA-12633
> Project: Kafka
>  Issue Type: Sub-task
>    Reporter: Guozhang Wang
>Assignee: Guozhang Wang
>Priority: Major
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [ANNOUNCE] New Kafka PMC Member: Randall Hauch

2021-04-17 Thread Guozhang Wang
Congratulations Randall ! Well deserved.

Guozhang

On Fri, Apr 16, 2021 at 4:45 PM Matthias J. Sax  wrote:

> Hi,
>
> It's my pleasure to announce that Randall Hauch in now a member of the
> Kafka PMC.
>
> Randall has been a Kafka committer since Feb 2019. He has remained
> active in the community since becoming a committer.
>
>
>
> Congratulations Randall!
>
>  -Matthias, on behalf of Apache Kafka PMC
>


-- 
-- Guozhang


Re: [DISCUSS] KIP-726: Make the CooperativeStickyAssignor as the default assignor

2021-04-16 Thread Guozhang Wang
ignor indicates
> that it's safe to do so.
> (This only applies if multiple assignors are used, if the assignors are
> "cooperative-sticky" only then it
> will just start out and forever remain on COOPERATIVE, like in Streams)
>
> Since it's just the first rebalance, the choice of COOPERATIVE vs EAGER
> actually doesn't matter at
> all since the consumer won't own any partitions until it's joined the
> group. So we may as well continue
> the initial protocol selection strategy of "highest commonly supported
> protocol", but the point is that
> 3.0 applications will upgrade to COOPERATIVE as soon as they have any
> partitions. If you can think
> of a better way to phrase "New applications on 3.0 will enable cooperative
> rebalancing by default" then
> please let me know.
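The "highest commonly supported protocol" selection described above can be sketched as a toy model (this is not the actual ConsumerCoordinator code; protocol lists are hardcoded here for illustration): each configured assignor advertises the rebalance protocols it supports, and the consumer picks the highest protocol common to all of them.

```java
// Toy sketch of initial rebalance-protocol selection for KIP-726.
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ProtocolSelectionSketch {
    // protocols in ascending order of preference
    static final List<String> ORDER = Arrays.asList("EAGER", "COOPERATIVE");

    // pick the highest protocol supported by every configured assignor
    static String select(List<List<String>> supportedPerAssignor) {
        Set<String> common = new HashSet<>(ORDER);
        for (List<String> supported : supportedPerAssignor) {
            common.retainAll(supported);
        }
        for (int i = ORDER.size() - 1; i >= 0; i--) {
            if (common.contains(ORDER.get(i))) return ORDER.get(i);
        }
        throw new IllegalStateException("no common protocol");
    }

    public static void main(String[] args) {
        // default ["cooperative-sticky", "range"]: range only supports EAGER,
        // so the member starts on EAGER
        String mixed = select(Arrays.asList(
            Arrays.asList("EAGER", "COOPERATIVE"),   // cooperative-sticky
            Arrays.asList("EAGER")));                // range
        if (!mixed.equals("EAGER")) throw new AssertionError();

        // ["cooperative-sticky"] only: starts and stays on COOPERATIVE
        String single = select(Arrays.asList(
            Arrays.asList("EAGER", "COOPERATIVE")));
        if (!single.equals("COOPERATIVE")) throw new AssertionError();
        System.out.println("ok");
    }
}
```

This matches the discussion: with the dual-assignor default the first join resolves to EAGER, and the member only moves to COOPERATIVE once the group confirms it is safe.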
>
>
> Thanks for the response -- hope this makes sense so far, but I'm happy to
> elaborate any aspects of the
> proposal which aren't clear. I'll also update the ticket description
> for KAFKA-12477 with the latest.
>
>
> On Wed, Apr 14, 2021 at 12:03 PM Guozhang Wang  wrote:
>
> > Hello Sophie,
> >
> > Thanks for the detailed explanation, a few clarifying questions:
> >
> > 1) when the short-circuit is triggered, what would happen next? Would the
> > consumers switch back to EAGER, and try to re-join the group, and then
> upon
> > succeeding the next rebalance reset the flag to allow committing? Or
> would
> > we just fail the consumer immediately.
> >
> > 2) at the overview you mentioned "New applications on 3.0 will enable
> > cooperative rebalancing by default", but in the detailed description as
> > "With ["cooperative-sticky", "range”], the initial protocol will be EAGER
> > when the member first joins the group." which seems contradictory? If we
> > want to have cooperative behavior be the default, then with the
> > default ["cooperative-sticky", "range”] the member would start with
> > COOPERATIVE protocol right away.
> >
> >
> > Guozhang
> >
> >
> >
> > On Mon, Apr 12, 2021 at 5:19 AM Chris Egerton
>  > >
> > wrote:
> >
> > > Whoops, small correction--meant to say
> > > ConsumerRebalanceListener::onPartitionsLost, not
> > Consumer::onPartitionsLost
> > >
> > > On Mon, Apr 12, 2021 at 8:17 AM Chris Egerton 
> > wrote:
> > >
> > > > Hi Sophie,
> > > >
> > > > This sounds fantastic. I've made a note on KAFKA-12487 about being
> sure
> > > to
> > > > implement Consumer::onPartitionsLost to avoid unnecessary task
> failures
> > > on
> > > > consumer protocol downgrade, but besides that, I don't think things
> > could
> > > > get any smoother for Connect users or developers. The automatic
> > protocol
> > > > upgrade/downgrade behavior appears safe, intuitive, and pain-free.
> > > >
> > > > Really excited for this development and hoping we can see it come to
> > > > fruition in time for the 3.0 release!
> > > >
> > > > Cheers,
> > > >
> > > > Chris
> > > >
> > > > On Fri, Apr 9, 2021 at 2:43 PM Sophie Blee-Goldman
> > > >  wrote:
> > > >
> > > >> 1) Yes, all of the above will be part of KAFKA-12477 (not KIP-726)
> > > >>
> > > >> 2) No, KAFKA-12638 would be nice to have but I don't think it's
> > > >> appropriate
> > > >> to remove
> > > >> the default implementation of #onPartitionsLost in 3.0 since we
> never
> > > gave
> > > >> any indication
> > > >> yet that we intend to remove it
> > > >>
> > > >> 3) Yes, this would be similar to when a Consumer drops out of the
> > group.
> > > >> It's always been
> > > >> possible for a member to miss a rebalance and have its partition be
> > > >> reassigned to another
> > > >> member, during which time both members would claim to own said
> > > partition.
> > > >> But this is safe
> > > >> because the member who dropped out is blocked from committing
> offsets
> > on
> > > >> that partition.
> > > >>
> > > >> On Fri, Apr 9, 2021 at 2:46 AM Luke Chen  wrote:
> > > >>
> > > >> > Hi Sophie,
> > > >> > That sounds great to take care of each case I can think of.
> > > >> > Questions:
> > >

Re: [VOTE] KIP-732: Deprecate eos-alpha and replace eos-beta with eos-v2

2021-04-15 Thread Guozhang Wang
+1 as well (binding). Thanks Sophie!

On Thu, Apr 15, 2021 at 7:29 AM Bruno Cadonna  wrote:

> Sophie,
>
> Thank you for the KIP!
>
> +1 (binding)
>
> Best,
> Bruno
>
> On 15.04.21 01:59, Sophie Blee-Goldman wrote:
> > Hey all,
> >
> > I'd like to kick off the vote on KIP-732, to deprecate eos-alpha in Kafka
> > Streams and migrate away from the "eos-beta" terminology by replacing it
> > with "eos-v2" to shore up user confidence in this feature.
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-732%3A+Deprecate+eos-alpha+and+replace+eos-beta+with+eos-v2
> >
> > Please respond on the discussion thread if you have any late-breaking
> > concerns or questions.
> >
> > Thanks!
> > Sophie
> >
>


-- 
-- Guozhang


Re: [VOTE} KIP-733: change Kafka Streams default replication factor config

2021-04-15 Thread Guozhang Wang
+1 as well. Thanks!

On Wed, Apr 14, 2021 at 4:30 PM Bill Bejeck  wrote:

> Thanks for the KIP Matthias.
>
> +1 (binding)
>
> -Bill
>
> On Wed, Apr 14, 2021 at 7:06 PM Sophie Blee-Goldman
>  wrote:
>
> > Thanks Matthias. I'm +1 (binding)
> >
> > -Sophie
> >
> > On Wed, Apr 14, 2021 at 3:36 PM Matthias J. Sax 
> wrote:
> >
> > > Hi,
> > >
> > > Because this KIP is rather small, I would like to skip a dedicated
> > > discussion thread and call for a vote right away. If there are any
> > > concerns, we can just discuss on this vote thread:
> > >
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-733%3A+change+Kafka+Streams+default+replication+factor+config
> > >
> > > Note, that we actually backed this change via
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=113708722
> > > already.
> > >
> > > However, I felt it might be worth it to make this change more explicit as
> > > KIP-464 is rather old now.
> > >
> > > Quote from KIP-464:
> > >
> > > > To exploit this new feature in KafkaStreams, we update the default
> > value
> > > of Streams configuration parameter `replication.factor` from `1` to
> `-1`.
> > >
> > >
> > >
> > >
> > > -Matthias
> > >
> >
>


-- 
-- Guozhang


Re: [DISCUSS] KIP-726: Make the CooperativeStickyAssignor as the default assignor

2021-04-14 Thread Guozhang Wang
 which will be
> >> bubbled
> >> > > up through poll() after completing the rebalance. The member will
> then
> >> > > downgrade its protocol to EAGER for the next rebalance.
> >> > >
> >> > > Let me know what you think,
> >> > > Sophie
> >> > >
> >> > > On Fri, Apr 2, 2021 at 7:08 PM Luke Chen  wrote:
> >> > >
> >> > > > Hi Sophie,
> >> > > > Making the default to "cooperative-sticky, range" is a smart idea,
> >> to
> >> > > > ensure we can at least fall back to rangeAssignor if consumers are
> >> not
> >> > > > following our recommended upgrade path. I updated the KIP
> >> accordingly.
> >> > > >
> >> > > > Hi Chris,
> >> > > > No problem, I updated the KIP to include the change in Connect.
> >> > > >
> >> > > > Thank you very much.
> >> > > >
> >> > > > Luke
> >> > > >
> >> > > > On Thu, Apr 1, 2021 at 3:24 AM Chris Egerton
> >> >  >> > > >
> >> > > > wrote:
> >> > > >
> >> > > > > Hi all,
> >> > > > >
> >> > > > > @Sophie - I like the sound of the dual-protocol default. The
> >> smooth
> >> > > > upgrade
> >> > > > > path it permits sounds fantastic!
> >> > > > >
> >> > > > > @Luke - Do you think we can also include Connect in this KIP?
> >> Right
> >> > now
> >> > > > we
> >> > > > > don't set any custom partition assignment strategies for the
> >> consumer
> >> > > > > groups we bring up for sink tasks, and if we continue to just
> use
> >> the
> >> > > > > default, the assignment strategy for those consumer groups would
> >> > change
> >> > > > on
> >> > > > > Connect clusters once people upgrade to 3.0. I think this is
> fine
> >> > > > (assuming
> >> > > > > we can take care of
> >> > https://issues.apache.org/jira/browse/KAFKA-12487
> >> > > > > before then, which I'm fairly optimistic about), but it might be
> >> > worth
> >> > > a
> >> > > > > sentence or two in the KIP explaining that the change in default
> >> will
> >> > > > > intentionally propagate to Connect. And, if we think Connect
> >> should
> >> > be
> >> > > > left
> >> > > > > out of this change and stay on the range assignor instead, we
> >> should
> >> > > > > probably call that fact out in the KIP as well and state that
> >> Connect
> >> > > > will
> >> > > > > now override the default partition assignment strategy to be the
> >> > range
> >> > > > > assignor (assuming the user hasn't specified a value for
> >> > > > > consumer.partition.assignment.strategy in their worker config or
> >> for
> >> > > > > consumer.override.partition.assignment.strategy in their
> connector
> >> > > > config).
> >> > > > >
> >> > > > > Cheers,
> >> > > > >
> >> > > > > Chris
> >> > > > >
> >> > > > > On Wed, Mar 31, 2021 at 12:18 AM Sophie Blee-Goldman
> >> > > > >  wrote:
> >> > > > >
> >> > > > > > Ok I'm still fleshing out all the details of KAFKA-12477 but I
> >> > think
> >> > > we
> >> > > > > can
> >> > > > > > simplify some things a bit, and avoid
> >> > > > > > any kind of "fail-fast" which will require user intervention.
> In
> >> > > fact I
> >> > > > > > think we can avoid requiring the user to make
> >> > > > > > any changes at all for KIP-726, so we don't have to worry
> about
> >> > > whether
> >> > > > > > they actually read our documentation:
> >> > > > > >
> >> > > > > > Instead of making ["cooperative-sticky"] the default, we
> change
> >> the
> >> > > > > default
>

[jira] [Created] (KAFKA-12669) Add deleteRange to WindowStore / KeyValueStore interfaces

2021-04-14 Thread Guozhang Wang (Jira)
Guozhang Wang created KAFKA-12669:
-

 Summary: Add deleteRange to WindowStore / KeyValueStore interfaces
 Key: KAFKA-12669
 URL: https://issues.apache.org/jira/browse/KAFKA-12669
 Project: Kafka
  Issue Type: Bug
  Components: streams
Reporter: Guozhang Wang


We can consider adding such APIs where the underlying implementation classes
have better optimizations than deleting the keys one by one via get-and-delete.
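As a simple illustration of the gap between the two approaches (a toy in-memory store, not the Streams store interfaces; RocksDB, for comparison, ships a native DeleteRange operation): a sorted map can drop an entire key range in one subMap().clear() call, where the naive path needs a lookup and a delete per key.

```java
// Toy in-memory key-value store with a range delete, for KAFKA-12669.
import java.util.TreeMap;

public class DeleteRangeSketch {
    static final TreeMap<String, String> store = new TreeMap<>();

    // Proposed-style API: delete all keys in [from, to] inclusive,
    // delegated to the underlying structure instead of get-and-delete loops.
    static void deleteRange(String from, String to) {
        store.subMap(from, true, to, true).clear();
    }

    public static void main(String[] args) {
        store.put("a", "1");
        store.put("b", "2");
        store.put("c", "3");
        store.put("d", "4");
        deleteRange("b", "c");
        if (store.size() != 2) throw new AssertionError();
        if (!store.containsKey("a") || !store.containsKey("d")) throw new AssertionError();
        System.out.println("ok");
    }
}
```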



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] KIP-730: Producer ID generation in KRaft mode

2021-04-14 Thread Guozhang Wang
Hi David,

Just putting my paranoid hat here :) Could we name the req/resp name as
"AllocateProducerIds" instead of "AllocateProducerId"? Otherwise, LGTM!

Guozhang

On Thu, Apr 8, 2021 at 2:23 PM Ron Dagostino  wrote:

> Hi David.  I'm wondering if it might be a good idea to have the broker
> send information about the last block it successfully received when it
> requests a new block.  As the RPC stands right now it can't be
> idempotent -- it just tells the controller "provide me a new block,
> please".  One case where it might be useful for the RPC to be
> idempotent is if the broker never receives the response from the
> controller such that it asks again.  That would result in the burning
> of the block that the controller provided but that the broker never
> received.  Now, granted, the ID space is 64 bits, so we would have to
> make ~2^54 requests to burn the entire space, and that isn't going to
> happen.  So whether this is actually needed is questionable.  And it
> might not be worth it to write the controller side code to make it act
> idempotently even if we added the request field to make it possible.
> But I figured this is worth mentioning even if we explicitly decide to
> reject it.
>
> Ron
>
> On Thu, Apr 8, 2021 at 3:16 PM Ron Dagostino  wrote:
> >
> > Oh, I see.  Yes, my mistake -- I read it wrong.  You are right that
> > all we need in the metadata log is the latest value allocated.
> >
> > Ron
> >
> > On Thu, Apr 8, 2021 at 11:21 AM David Arthur  wrote:
> > >
> > > Ron -- I considered making the RPC response and record use the same (or
> > > very similar) fields, but the use case is slightly different. A broker
> > > handling the RPC needs to know the bounds of the block since it has no
> idea
> > > what the block size is. Also, the brokers will normally see
> non-contiguous
> > > blocks.
> > >
> > > For the metadata log, we can just keep track of the latest producer Id
> that
> > > was allocated. It's kind of like a high watermark for producer IDs.
> This
> > > actually saves us from needing an extra field in the record (the KIP
> has
> > > just ProducerIdEnd => int64 in the record).
> > >
> > > Does that make sense?
> > >
> > > On Wed, Apr 7, 2021 at 8:44 AM Ron Dagostino 
> wrote:
> > >
> > > > Thanks for the KIP, David.
> > > >
> > > > With the RPC returning a start and length, should the record in the
> > > > metadata log do the same thing for consistency and to save the byte
> > > > per record?
> > > >
> > > > Ron
> > > >
> > > >
> > > > On Tue, Apr 6, 2021 at 11:06 PM Ismael Juma 
> wrote:
> > > > >
> > > > > Great, thanks. Instead of calling it "bridge release", can we say
> 3.0?
> > > > >
> > > > > Ismael
> > > > >
> > > > > On Tue, Apr 6, 2021 at 7:48 PM David Arthur 
> wrote:
> > > > >
> > > > > > Thanks for the feedback, Ismael. Renaming the RPC and using
> start+len
> > > > > > instead of start+end sounds fine.
> > > > > >
> > > > > > And yes, the controller will allocate the IDs in ZK mode for the
> bridge
> > > > > > release.
> > > > > >
> > > > > > I'll update the KIP to reflect these points.
> > > > > >
> > > > > > Thanks!
> > > > > >
> > > > > > On Tue, Apr 6, 2021 at 7:30 PM Ismael Juma 
> wrote:
> > > > > >
> > > > > > > Sorry, one more question: the allocation of ids will be done
> by the
> > > > > > > controller even in ZK mode, right?
> > > > > > >
> > > > > > > Ismael
> > > > > > >
> > > > > > > On Tue, Apr 6, 2021 at 4:26 PM Ismael Juma 
> > > > wrote:
> > > > > > >
> > > > > > > > One additional comment: if you return the number of ids
> instead of
> > > > the
> > > > > > > end
> > > > > > > > range, you can use an int32.
> > > > > > > >
> > > > > > > > Ismael
> > > > > > > >
> > > > > > > > On Tue, Apr 6, 2021 at 4:25 PM Ismael Juma <
> ism...@juma.me.uk>
> > > > wrote:
> > > > > > > >
> > > > > > > >> Thanks for the KIP, David. Any reason not to rename
> > > > > > > >> AllocateProducerIdBlockRequest to
> AllocateProducerIdsRequest?
> > > > > > > >>
> > > > > > > >> Ismael
> > > > > > > >>
> > > > > > > >> On Tue, Apr 6, 2021 at 3:51 PM David Arthur <
> mum...@gmail.com>
> > > > wrote:
> > > > > > > >>
> > > > > > > >>> Hello everyone,
> > > > > > > >>>
> > > > > > > >>> I'd like to start the discussion for KIP-730
> > > > > > > >>>
> > > > > > > >>>
> > > > > > > >>>
> > > > > > >
> > > > > >
> > > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-730%3A+Producer+ID+generation+in+KRaft+mode
> > > > > > > >>>
> > > > > > > >>> This KIP proposes a new RPC for generating blocks of IDs
> for
> > > > > > > >>> transactional
> > > > > > > >>> and idempotent producers.
> > > > > > > >>>
> > > > > > > >>> Cheers,
> > > > > > > >>> David Arthur
> > > > > > > >>>
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > David Arthur
> > > > > >
> > > >
> > >
> > >
> > > --
> > > David Arthur
>


-- 
-- Guozhang
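The allocation scheme discussed in the thread above can be sketched roughly as follows (class and field names are illustrative assumptions, not the KIP-730 schema): the RPC response carries a start plus an int32 length, while the metadata log only needs to record the latest allocated ID, which acts like a high watermark for producer IDs.

```java
public class ProducerIdBlockAllocator {
    static final int BLOCK_SIZE = 1000; // block size is a controller-side detail

    // The only value that would need persisting to the metadata log:
    // the "high watermark" of allocated producer IDs.
    private long nextProducerId = 0L;

    // Simplified RPC response: block start plus an int32 length, as suggested
    // in the thread, rather than start plus end.
    record ProducerIdBlock(long start, int length) {}

    synchronized ProducerIdBlock allocateBlock() {
        long start = nextProducerId;
        nextProducerId += BLOCK_SIZE; // advance the high watermark
        return new ProducerIdBlock(start, BLOCK_SIZE);
    }

    public static void main(String[] args) {
        ProducerIdBlockAllocator allocator = new ProducerIdBlockAllocator();
        // Successive brokers (or retries) receive non-contiguous views of the space.
        System.out.println(allocator.allocateBlock());
        System.out.println(allocator.allocateBlock());
    }
}
```

Note that, as Ron points out above, a lost response burns a block; with 64-bit IDs and a block size like this, exhausting the space would take on the order of 2^54 requests, so the scheme tolerates that waste.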


Re: [VOTE] KIP-718: Make KTable Join on Foreign key unopinionated

2021-04-14 Thread Guozhang Wang
Thanks Marco, +1 from me too.


Guozhang

On Wed, Apr 14, 2021 at 8:15 AM John Roesler  wrote:

> I’m +1 (binding)
>
> Thanks, Marco!
> -John
>
> On Wed, Apr 14, 2021, at 07:59, Marco Aurélio Lotz wrote:
> > Hi all,
> >
> > I would like to start a vote on KIP-718, which proposes to make KTable
> join
> > on foreign key unopinionated in terms of persistence method (currently it
> > forces RocksDB usage for subscription store and no other option is
> > available).
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-718%3A+Make+KTable+Join+on+Foreign+key+unopinionated
> >
> > Many thanks,
> > Marco Lotz
> >
>


-- 
-- Guozhang


Re: [DISCUSS] KIP-732: Deprecate eos-alpha and replace eos-beta with eos-v2

2021-04-14 Thread Guozhang Wang
Thanks Sophie for writing the KIP! I'm +1 on the proposal.

On Wed, Apr 14, 2021 at 8:56 AM Ismael Juma  wrote:

> Thanks Sophie. This makes sense to me. One question: do we want to be a bit
> clearer about the removal plans? That is, can we say that the deprecated
> configs will be removed in 4.0 (instead of likely to be removed)? The
> implication would be that exactly-once would only work with 2.5+ while at
> least once would work with all versions. 4.0 is probably 1.5-2 years away,
> so this seems reasonable to me.
>
> Ismael
>
> On Tue, Apr 13, 2021 at 7:53 PM Sophie Blee-Goldman
>  wrote:
>
> > Hey all,
> >
> > I'd like to kick off discussion on a small KIP to move towards a unified
> > EOS and clean up the current options. Please give it a pass and let me
> know
> > what you think.
> >
> > KIP-732: Deprecate eos-alpha and replace eos-beta with eos-v2
> > <
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-732%3A+Deprecate+eos-alpha+and+replace+eos-beta+with+eos-v2
> > >
> > KAFKA-12574: Deprecate eos-alpha
> > 
> >
> > Thanks,
> > Sophie
> >
>


-- 
-- Guozhang


[jira] [Resolved] (KAFKA-12643) Kafka Streams 2.7 with Kafka Broker 2.6.x regression: bad timestamp in transform/process (this.context.schedule function)

2021-04-12 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-12643.
---
Resolution: Duplicate

Thanks for confirming!

> Kafka Streams 2.7 with Kafka Broker 2.6.x regression: bad timestamp in 
> transform/process (this.context.schedule function)
> -
>
> Key: KAFKA-12643
> URL: https://issues.apache.org/jira/browse/KAFKA-12643
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.7.0
>Reporter: David EVANO
>Priority: Major
> Attachments: Capture d’écran 2021-04-09 à 17.50.05.png
>
>
> During a transform() or a process() method:
> Define a scheduled task:
> this.context.schedule(Duration.ofSeconds(1), PunctuationType.WALL_CLOCK_TIME, 
> timestamp -> {...})
> Inside it, store.put(...) or context.forward(...) produces a record with an 
> invalid timestamp.
> For the forward, a workaround is to set the timestamp explicitly:
> context.forward(entry.key, entry.value.toString(), 
> To.all().withTimestamp(timestamp));
> But for the state.put(...) or state.delete(...) functions there is no workaround.
> Is it mandatory to have the Kafka broker version aligned with the Kafka 
> Streams version?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] KIP-693: Client-side Circuit Breaker for Partition Write Errors

2021-04-12 Thread Guozhang Wang
Hello Guoqiang,

This is another interesting ticket that may be also related to the issues
you observed and fixed in your production, if you used sticky partitioner
in producer clients:

https://issues.apache.org/jira/browse/KAFKA-10888


Guozhang


On Wed, Apr 7, 2021 at 11:00 AM Jun Rao  wrote:

> Hi, George,
>
> A few more comments on the KIP.
>
> 1. It would be useful to motivate the problem a bit more. For example, is
> the KIP trying to solve a transient broker problem (if so, for how long) or
> a permanent broker problem? It would also be useful to list some common
> causes that can slow the broker down.
>
> 2. It would be useful to discuss a bit more on the high level approach
> (e.g. in the rejected section). This KIP proposes to fix the issue on the
> client side by having a pluggable component to redirect the traffic to
> other brokers. One potential issue with this is that it requires all
> clients to opt in (assuming this is not the default) for the plugin to see
> the benefit. In some environments with a large number of clients,
> coordinating all those clients may not be easy. Another potential solution
> is to fix the issue on the server side. For example, if a broker is slow
> because it has noisy neighbors in a virtual environment, we could
> proactively bring down the broker and restart it somewhere else. This has
> the benefit that it requires less client side coordination.
>
> 3. Regarding how to detect broker slowness in the client. The proposal is
> based on the error in the produce response. Typically, if the broker is
> just slow, the only type of error the client gets is the timeout exception.
> Since the default timeout is 30 seconds, it may not be triggered all the
> time and it may be too late to reflect a broker side issue. I am wondering
> if there are other better indicators. For example, another potential option
> is to use the number of pending batches per partition (or broker) in the
> Accumulator. Intuitively, if a broker is slow, all partitions with the
> leader on it will gradually accumulate more batches.
>
> 4. It would be useful to have a solution that works with keyed messages so
> that they can still be distributed to the partition based on the hash of
> the key.
>
> Thanks,
>
> Jun
>
>
> On Wed, Mar 24, 2021 at 4:05 AM Guoqiang Shu 
> wrote:
>
> >
> > In our current proposal it can be configured via
> > producer.circuit.breaker.mute.retry.interval (defaulted to 10 mins), but
> > perhaps 'interval' is a confusing name.
> >
> > On 2021/03/23 00:45:23, Guozhang Wang  wrote:
> > > Thanks for the updated KIP! Some more comments inlined.
> > > >
> > > > I'm still not sure if, in your proposal, the muting length is a
> > > customizable value (and if yes, through which config) or it is always
> > hard
> > > coded as 10 minutes?
> > >
> > >
> > > > > Guozhang
> >
> >
>


-- 
-- Guozhang
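Jun's third point above, using the number of pending batches per broker as a slowness indicator, could be sketched as a simple counter the client maintains (the class name, threshold, and muting logic here are illustrative assumptions, not KIP-693's actual design):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical client-side circuit breaker: a broker whose count of pending
// (unacknowledged) batches exceeds a threshold is considered slow, and a
// pluggable partitioner could then route keyless records away from it.
class PartitionCircuitBreaker {
    private final int maxPendingBatches;
    private final Map<Integer, Integer> pendingByBroker = new HashMap<>();

    PartitionCircuitBreaker(int maxPendingBatches) {
        this.maxPendingBatches = maxPendingBatches;
    }

    void recordBatchQueued(int brokerId) {
        pendingByBroker.merge(brokerId, 1, Integer::sum);
    }

    void recordBatchAcked(int brokerId) {
        pendingByBroker.merge(brokerId, -1, Integer::sum);
    }

    // Unlike timeout errors (30s default), this signal accumulates gradually,
    // so it can flag a slow broker before requests actually fail.
    boolean isMuted(int brokerId) {
        return pendingByBroker.getOrDefault(brokerId, 0) > maxPendingBatches;
    }
}
```

Note this only helps keyless records; as Jun's fourth point says, keyed messages still need to reach the partition chosen by the key's hash, so muting cannot apply to them directly.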


Re: [ANNOUNCE] New Kafka PMC Member: Bill Bejeck

2021-04-12 Thread Guozhang Wang
Congratulations Bill !

Guozhang

On Wed, Apr 7, 2021 at 6:16 PM Matthias J. Sax  wrote:

> Hi,
>
> It's my pleasure to announce that Bill Bejeck in now a member of the
> Kafka PMC.
>
> Bill has been a Kafka committer since Feb 2019. He has remained
> active in the community since becoming a committer.
>
>
>
> Congratulations Bill!
>
>  -Matthias, on behalf of Apache Kafka PMC
>


-- 
-- Guozhang


Re: [VOTE] 2.6.2 RC1

2021-04-08 Thread Guozhang Wang
Looked over the javadocs and web docs again. Downloaded the jars and checked
that the license files are all updated now (thanks to John again!). +1

On Thu, Apr 8, 2021 at 1:48 PM Sophie Blee-Goldman
 wrote:

> Hello Kafka users, developers and client-developers,
>
> This is the second candidate for release of Apache Kafka 2.6.2.
>
> Apache Kafka 2.6.2 is a bugfix release and fixes 35 issues since the 2.6.1
> release. Please see the release notes for more information.
>
> Release notes for the 2.6.2 release:
> https://home.apache.org/~ableegoldman/kafka-2.6.2-rc1/RELEASE_NOTES.html
>
> *** Please download, test and vote by Tuesday, April 13th, 9am PST
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> https://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> https://home.apache.org/~ableegoldman/kafka-2.6.2-rc1/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/org/apache/kafka/
>
> * Javadoc:
> https://home.apache.org/~ableegoldman/kafka-2.6.2-rc1/javadoc/
>
> * Tag to be voted upon (off 2.6 branch) is the 2.6.2 tag:
> https://github.com/apache/kafka/releases/tag/2.6.2-rc1
>
> * Documentation:
> https://kafka.apache.org/26/documentation.html
>
> * Protocol:
> https://kafka.apache.org/26/protocol.html
>
> * Successful Jenkins builds for the 2.6 branch:
> Unit/integration tests:
> https://ci-builds.apache.org/job/Kafka/job/kafka-2.6-jdk8/114/
> System tests:
> https://jenkins.confluent.io/job/system-test-kafka-branch-builder/4458/
>
> /**
>
> Thanks,
> Sophie
>


-- 
-- Guozhang


[jira] [Created] (KAFKA-12633) Remove deprecated "TopologyTestDriver#pipeInput / readOutput"

2021-04-08 Thread Guozhang Wang (Jira)
Guozhang Wang created KAFKA-12633:
-

 Summary: Remove deprecated "TopologyTestDriver#pipeInput / 
readOutput"
 Key: KAFKA-12633
 URL: https://issues.apache.org/jira/browse/KAFKA-12633
 Project: Kafka
  Issue Type: Sub-task
Reporter: Guozhang Wang
Assignee: Guozhang Wang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-12630) Remove deprecated KafkaClientSupplier#getAdminClient

2021-04-08 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-12630.
---
Fix Version/s: 3.0.0
   Resolution: Fixed

> Remove deprecated KafkaClientSupplier#getAdminClient
> 
>
> Key: KAFKA-12630
> URL: https://issues.apache.org/jira/browse/KAFKA-12630
> Project: Kafka
>  Issue Type: Sub-task
>    Reporter: Guozhang Wang
>    Assignee: Guozhang Wang
>Priority: Major
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-12568) Remove deprecated "KStream#groupBy/join", "Joined#named" overloads

2021-04-07 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-12568.
---
Fix Version/s: 3.0.0
 Assignee: Guozhang Wang
   Resolution: Fixed

> Remove deprecated "KStream#groupBy/join", "Joined#named" overloads
> --
>
> Key: KAFKA-12568
> URL: https://issues.apache.org/jira/browse/KAFKA-12568
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
>Priority: Major
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[ANNOUNCE] New Committer: Bruno Cadonna

2021-04-07 Thread Guozhang Wang
Hello all,

I'm happy to announce that Bruno Cadonna has accepted his invitation to
become an Apache Kafka committer.

Bruno has been contributing to Kafka since Jan. 2019 and has made 99
commits and more than 80 PR reviews so far:

https://github.com/apache/kafka/commits?author=cadonna

He worked on a few key KIPs on Kafka Streams:

* KIP-471: Expose RocksDB Metrics in Kafka Streams
* KIP-607: Add Metrics to Kafka Streams to Report Properties of RocksDB
* KIP-662: Throw Exception when Source Topics of a Streams App are Deleted

Besides all the code contributions and reviews, he has also done a lot for
the community: multiple Kafka meetup talks in Berlin and Kafka Summit
talks, an introductory class on Kafka at Humboldt-Universität zu Berlin
seminars, and co-authoring a paper on Kafka's stream processing
semantics at this year's SIGMOD conference (
https://en.wikipedia.org/wiki/SIGMOD). Bruno has also been quite active on
Stack Overflow and the Apache Kafka mailing lists.

Please join me to congratulate Bruno for all the contributions!

-- Guozhang


[jira] [Created] (KAFKA-12630) Remove deprecated KafkaClientSupplier#getAdminClient

2021-04-07 Thread Guozhang Wang (Jira)
Guozhang Wang created KAFKA-12630:
-

 Summary: Remove deprecated KafkaClientSupplier#getAdminClient
 Key: KAFKA-12630
 URL: https://issues.apache.org/jira/browse/KAFKA-12630
 Project: Kafka
  Issue Type: Sub-task
Reporter: Guozhang Wang
Assignee: Guozhang Wang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-7785) Remove PartitionGrouper interface and it's config and move DefaultPartitionGrouper to internal package

2021-04-07 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-7785.
--
Resolution: Fixed

> Remove PartitionGrouper interface and it's config and move 
> DefaultPartitionGrouper to internal package
> --
>
> Key: KAFKA-7785
> URL: https://issues.apache.org/jira/browse/KAFKA-7785
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Affects Versions: 2.1.0
>Reporter: Jacek Laskowski
>Assignee: highluck
>Priority: Blocker
> Fix For: 3.0.0
>
>
> Since {{DefaultPartitionGrouper}} is only for the purpose of the internal 
> {{StreamsPartitionAssignor}} it would make sense to have it in the 
> {{org.apache.kafka.streams.processor.internals}} package.
> I would also vote to move {{PartitionGrouper.}}
> Via KAFKA-8927 we deprecated the `PartitionGrouper` interface in 2.4 release 
> – this allows us to remove the public interface and its corresponding config 
> in the next major release (ie, 3.0.0). `DefaultPartitionGrouper` was 
> implicitly deprecated via KAFKA-8927.
> Hence, we can move the interface as well as the default implementation into 
> an internal package (or maybe just remove the interface completely as there 
> are no plans to support multiple implementations atm).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] KIP-725: Streamlining configurations for TimeWindowedDeserializer.

2021-04-06 Thread Guozhang Wang
+1. Thanks!

On Tue, Apr 6, 2021 at 7:01 AM Leah Thomas 
wrote:

> Hi Sagar, +1 non-binding. Thanks again for doing this.
>
> Leah
>
> On Mon, Apr 5, 2021 at 9:40 PM John Roesler  wrote:
>
> > Thanks, Sagar!
> >
> > I’m +1 (binding)
> >
> > -John
> >
> > On Mon, Apr 5, 2021, at 21:35, Sophie Blee-Goldman wrote:
> > > Thanks for the KIP! +1 (binding) from me
> > >
> > > Cheers,
> > > Sophie
> > >
> > > On Mon, Apr 5, 2021 at 7:13 PM Sagar 
> wrote:
> > >
> > > > Hi All,
> > > >
> > > > I would like to start voting on the following KIP:
> > > >
> > > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=177047930
> > > >
> > > > Thanks!
> > > > Sagar.
> > > >
> > >
> >
>


-- 
-- Guozhang


Re: [VOTE] KIP-633: Drop 24 hour default of grace period in Streams

2021-04-06 Thread Guozhang Wang
+1. Thanks!

On Tue, Apr 6, 2021 at 7:00 AM Leah Thomas 
wrote:

> Thanks for picking this up, Sophie. +1 from me, non-binding.
>
> Leah
>
> On Mon, Apr 5, 2021 at 9:42 PM John Roesler  wrote:
>
> > Thanks, Sophie,
> >
> > I’m +1 (binding)
> >
> > -John
> >
> > On Mon, Apr 5, 2021, at 21:34, Sophie Blee-Goldman wrote:
> > > Hey all,
> > >
> > > I'd like to start the voting on KIP-633, to drop the awkward 24 hour
> > grace
> > > period and improve the API to raise visibility on an important concept
> in
> > > Kafka Streams: grace period and out-of-order data handling.
> > >
> > > Here's the KIP:
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-633%3A+Drop+24+hour+default+of+grace+period+in+Streams
> > > <
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-633%3A+Drop+24hr+default+grace+period
> > >
> > >
> > > Cheers,
> > > Sophie
> > >
> >
>


-- 
-- Guozhang


Re: [DISCUSS] KIP-718: Make KTable Join on Foreign key unopinionated

2021-04-05 Thread Guozhang Wang
Thanks Marco / John,

I think the arguments for not piggy-backing on the existing Materialized
makes sense; on the other hand, if we go this route, should we just use a
separate Materialized rather than an extended /
narrower-scoped MaterializedSubscription, since it seems we want to allow
users to fully customize this store?

Guozhang

On Thu, Apr 1, 2021 at 5:28 PM John Roesler  wrote:

> Thanks Marco,
>
> Sorry if I caused any trouble!
>
> I don’t remember what I was thinking before, but reasoning about it now,
> you might need the fine-grained choice if:
>
> 1. The number or size of records in each partition of both tables is
> small(ish), but the cardinality of the join is very high. Then you might
> want an in-memory table store, but an on-disk subscription store.
>
> 2. The number or size of records is very large, but the join cardinality
> is low. Then you might need an on-disk table store, but an in-memory
> subscription store.
>
> 3. You might want a different kind (or differently configured) store for
> the subscription store, since its access pattern is so different.
>
> If you buy these, it might be good to put the justification into the KIP.
>
> I’m in favor of the default you’ve proposed.
>
> Thanks,
> John
>
> On Mon, Mar 29, 2021, at 04:24, Marco Aurélio Lotz wrote:
> > Hi Guozhang,
> >
> > Apologies for the late answer. Originally that was my proposal - to
> > piggyback on the provided materialisation method (
> > https://issues.apache.org/jira/browse/KAFKA-10383).
> > John Roesler suggested to us to provide even further fine tuning on API
> > level parameters. Maybe we could see this as two sides of the same coin:
> >
> > - On the current API, we change it to piggyback on the materialization
> > method provided to the join store.
> > - We extend the API to allow a user to fine tune different
> materialization
> > methods for subscription and join store.
> >
> > What do you think?
> >
> > Cheers,
> > Marco
> >
> > On Thu, Mar 4, 2021 at 8:04 PM Guozhang Wang  wrote:
> >
> > > Thanks Marco,
> > >
> > > Just a quick thought: what if we reuse the existing Materialized
> object for
> > > both subscription and join stores, instead of introducing a new param /
> > > class?
> > >
> > > Guozhang
> > >
> > > On Tue, Mar 2, 2021 at 1:07 AM Marco Aurélio Lotz <
> cont...@marcolotz.com>
> > > wrote:
> > >
> > > > Hi folks,
> > > >
> > > > I would like to invite everyone to discuss further KIP-718:
> > > >
> > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-718%3A+Make+KTable+Join+on+Foreign+key+unopinionated
> > > >
> > > > I welcome all feedback on it.
> > > >
> > > > Kind Regards,
> > > > Marco Lotz
> > > >
> > >
> > >
> > > --
> > > -- Guozhang
> > >
> >
>


-- 
-- Guozhang


[jira] [Resolved] (KAFKA-9765) Could not add partitions to transaction due to errors

2021-04-02 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-9765.
--
Resolution: Duplicate

> Could not add partitions to transaction due to errors
> -
>
> Key: KAFKA-9765
> URL: https://issues.apache.org/jira/browse/KAFKA-9765
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, producer 
>Affects Versions: 2.3.1
>Reporter: Prashant Waykar
>Priority: Blocker
> Fix For: 2.4.2, 2.5.0
>
>
> I am following the producer with transactions example in 
> [https://kafka.apache.org/23/javadoc/org/apache/kafka/clients/producer/KafkaProducer.html,]
>  and on kafkaException, I use abortTransaction and retry. 
> I am seeing these exceptions. Has anyone experienced this before ? Please 
> suggest
> {code:java}
> // code placeholder
> java.util.concurrent.ExecutionException: 
> org.apache.kafka.common.KafkaException: Could not add partitions to 
> transaction due to errors: {nfvAlarmJob-0=UNKNOWN_SERVER_ERROR}
> at 
> org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:98)
> at 
> org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:67)
> at 
> org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:30)
> at 
> com.vmware.vchs.hybridity.messaging.kafka.KafkaProducerDelegate.publishMessageWithTransaction(KafkaProducerDelegate.java:197)
> at 
> com.vmware.vchs.hybridity.messaging.kafka.KafkaProducerDelegate.publish(KafkaProducerDelegate.java:164)
> at 
> com.vmware.vchs.hybridity.messaging.kafka.KafkaProducerDelegate.publish(KafkaProducerDelegate.java:158)
> at 
> com.vmware.vchs.hybridity.messaging.adapter.JobManagerJobPublisher.publish(JobManagerJobPublisher.java:140)
> at 
> com.vmware.vchs.hybridity.messaging.adapter.JobManager.queueJob(JobManager.java:1720)
> at 
> com.vmware.vchs.hybridity.messaging.adapter.JobManagementAdapter.queueJob(JobManagementAdapter.java:80)
> at 
> com.vmware.vchs.hybridity.messaging.adapter.JobManagementAdapter.queueJob(JobManagementAdapter.java:70)
> at 
> com.vmware.vchs.hybridity.messaging.adapter.JobManagedService.queueJob(JobManagedService.java:168)
> at 
> com.vmware.hybridity.nfvm.alarm.UpdateVcenterAlarmsJob.run(UpdateVcenterAlarmsJob.java:67)
> at 
> com.vmware.vchs.hybridity.messaging.LoggingJobWrapper.run(LoggingJobWrapper.java:41)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.kafka.common.KafkaException: Could not add partitions 
> to transaction due to errors: {nfvAlarmJob-0=UNKNOWN_SERVER_ERROR}
> at 
> org.apache.kafka.clients.producer.internals.TransactionManager$AddPartitionsToTxnHandler.handleResponse(TransactionManager.java:1230)
> at 
> org.apache.kafka.clients.producer.internals.TransactionManager$TxnRequestHandler.onComplete(TransactionManager.java:1069)
> at 
> org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)
> at 
> org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:561)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:553)
> at 
> org.apache.kafka.clients.producer.internals.Sender.maybeSendAndPollTransactionalRequest(Sender.java:425)
> at 
> org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:311)
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:244)
> ... 1 common frames omitted
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] KIP-633: Drop 24 hour default of grace period in Streams

2021-03-31 Thread Guozhang Wang
Hello Sophie,

I agree that the old 24-hour grace period should be updated, and I also
think now it is a better idea to make the grace period "mandatory" from the
API names since it is a very important concept and hence worth emphasizing
to users up front.

Guozhang

On Wed, Mar 31, 2021 at 1:58 PM John Roesler  wrote:

> Thanks for bringing this up, Sophie!
>
> This has indeed been a pain point for a lot of people.
>
> It's a really thorny issue with no obvious "right" solution.
> I think your proposal is a good one.
>
> Thanks,
> -John
>
> On Wed, 2021-03-31 at 13:28 -0700, Sophie Blee-Goldman
> wrote:
> > Hey all,
> >
> > It's finally time to reconsider the default grace period in Kafka
> Streams,
> > and hopefully save a lot of suppression users from the pain of figuring
> out
> > why their results don't show up until 24 hours later. Please check out
> the
> > proposal and let me know what you think.
> >
> > KIP:
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-633%3A+Drop+24+hour+default+of+grace+period+in+Streams
> > <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-633%3A+Drop+24hr+default+grace+period
> >
> >
> > JIRA: https://issues.apache.org/jira/browse/KAFKA-8613
> >
> > Cheers,
> > Sophie
>
>
>

-- 
-- Guozhang


Re: [DISCUSS] KIP-726: Make the CooperativeStickyAssignor as the default assignor

2021-03-30 Thread Guozhang Wang
Hi Sophie,

My question is more related to KAFKA-12477, but since your latest replies
are on this thread I figured we can follow-up on the same venue. Just so I
understand your latest comments above about the approach:

* I think we would need to persist this decision so that the group would
never go back to the eager protocol; this bit would be written to the
internal topic's assignment message. Is that correct?
* Maybe you can describe what would happen, after the group has decided to
move forward with the cooperative protocol, when:
1) a new member joins the group with the old version, and hence only
recognizes the eager protocol and executes it in its first
rebalance.
2) in addition to 1), the new member joins the group with the old version,
only recognizes the old subscription format, and is selected as the
leader.

Guozhang




On Mon, Mar 29, 2021 at 10:30 PM Luke Chen  wrote:

> Hi Sophie & Ismael,
> Thank you for your feedback.
> No problem, let's pause this KIP and wait for this improvement: KAFKA-12477
> .
>
> Stay tuned :)
>
> Thank you.
> Luke
>
> On Tue, Mar 30, 2021 at 3:14 AM Ismael Juma  wrote:
>
> > Hi Sophie,
> >
> > I didn't analyze the KIP in detail, but the two suggestions you mentioned
> > sound like great improvements.
> >
> > A bit more context: breaking changes for a widely used product like Kafka
> > are costly and hence why we try as hard as we can to avoid them. When it
> > comes to the brokers, they are often managed by a central group (or
> they're
> > in the Cloud), so they're a bit easier to manage. Even so, it's still
> > possible to upgrade from 0.8.x directly to 2.7 since all protocol
> versions
> > are still supported. When it comes to the basic clients (producer,
> > consumer, admin client), they're often embedded in applications so we
> have
> > to be even more conservative.
> >
> > Ismael
> >
> > On Mon, Mar 29, 2021 at 10:50 AM Sophie Blee-Goldman
> >  wrote:
> >
> > > Ismael,
> > >
> > > It seems like given 3.0 is a breaking release, we have to rely on users
> > > being aware of this and responsible
> > > enough to read the upgrade guide. Otherwise we could never ever make
> any
> > > breaking changes beyond just
> > > removing deprecated APIs or other compilation-breaking errors that
> would
> > be
> > > immediately visible, no?
> > >
> > > That said, obviously it's better to have a circuit-breaker that will
> fail
> > > fast in case of a user misconfiguration
> > > rather than silently corrupting the consumer group state -- eg for two
> > > consumers to overlap in their ownership
> > > of the same partition(s). We could definitely implement this, and now
> > that
> > > I think about it this might solve a
> > > related problem in KAFKA-12477
> > > . We just add a new
> > > field to the Assignment in which the group leader
> > > indicates whether it's on a recent enough version to understand
> > cooperative
> > > rebalancing. If an upgraded member
> > > joins the group, it'll only be allowed to start following the new
> > > rebalancing protocol after receiving the go-ahead
> > > from the group leader.
> > >
> > > If we do go ahead and add this new field in the Assignment then I'm
> > pretty
> > > confident we can reduce the number
> > > of required rolling bounces to just one with KAFKA-12477
> > > . In that case we
> > > should
> > > be in much better shape to
> > > feel good about changing the default to the CooperativeStickyAssignor.
> > How
> > > does that sound?
> > >
> > > To be clear, I'm not proposing we do this as part of KIP-726. Here's my
> > > take:
> > >
> > > Let's pause this KIP while I work on making these two improvements in
> > > KAFKA-12477 . Once
> I
> > > can
> > > confirm the
> > > short-circuit and single rolling bounce will be available for 3.0, I'll
> > > report back on this thread. Then we can move
> > > forward with this KIP again.
> > >
> > > Thoughts?
> > > Sophie
> > >
> > > On Mon, Mar 29, 2021 at 12:01 AM Luke Chen  wrote:
> > >
> > > > Hi Ismael,
> > > > Thanks for your good question. Answer them below:
> > > > *1. Are we saying that every consumer upgraded would have to follow
> the
> > > > complex path described in the KIP? *
> > > > --> We suggest that every consumer follow these 2 steps of rolling
> > > > upgrade.
> > > > And after KAFKA-12477 <
> > https://issues.apache.org/jira/browse/KAFKA-12477
> > > >
> > > > is completed, it can be reduced to 1 rolling upgrade.
> > > >
> > > > *2. what happens if they don't read the instructions and upgrade as
> > they
> > > > have in the past?*
> > > > --> The reason we want 2 steps of rolling upgrade is that we want to
> > > avoid
> > > > the situation where the leader is on old byte-code and only recognizes
> > > "eager",
> > > > but due to 

[jira] [Resolved] (KAFKA-12428) Add a last-heartbeat-seconds-ago metric to Kafka Consumer

2021-03-29 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-12428.
---
Resolution: Duplicate

> Add a last-heartbeat-seconds-ago metric to Kafka Consumer
> -
>
> Key: KAFKA-12428
> URL: https://issues.apache.org/jira/browse/KAFKA-12428
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer, streams
>    Reporter: Guozhang Wang
>Priority: Major
>  Labels: newbie++
>
> I have encountered several issues in the past where heartbeat requests are 
> not sent [1,2] (either in time, or ever), and today it is hard to determine 
> this from the logs. I think it would be better to add a 
> "last-heartbeat-seconds-ago" metric so that, when a rebalance is triggered, 
> we can immediately find out whether a missed heartbeat is the root cause.
> 1. https://issues.apache.org/jira/browse/KAFKA-10793
> 2. https://issues.apache.org/jira/browse/KAFKA-10827



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-7106) Remove segment/segmentInterval from Window definition

2021-03-28 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-7106.
--
Resolution: Fixed

> Remove segment/segmentInterval from Window definition
> -
>
> Key: KAFKA-7106
> URL: https://issues.apache.org/jira/browse/KAFKA-7106
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: John Roesler
>    Assignee: Guozhang Wang
>Priority: Blocker
>  Labels: needs-kip
> Fix For: 3.0.0
>
>
> Currently, Window configures segment and segmentInterval properties, but 
> these aren't truly properties of a window in general.
> Rather, they are properties of the particular implementation that we 
> currently have: a segmented store. Therefore, these properties should be 
> moved to configure only that implementation.
>  
> This may be related to KAFKA-4730, since an in-memory window store wouldn't 
> necessarily need to be segmented.





[jira] [Resolved] (KAFKA-12562) Remove deprecated-overloaded "KafkaStreams#metadataForKey" and "KafkaStreams#store"

2021-03-28 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-12562.
---
Resolution: Fixed

> Remove deprecated-overloaded "KafkaStreams#metadataForKey" and 
> "KafkaStreams#store"
> ---
>
> Key: KAFKA-12562
> URL: https://issues.apache.org/jira/browse/KAFKA-12562
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>    Reporter: Guozhang Wang
>Assignee: Guozhang Wang
>Priority: Minor
>






[jira] [Resolved] (KAFKA-12524) Remove deprecated WindowBytesStoreSupplier#segments

2021-03-27 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-12524.
---
Resolution: Fixed

> Remove deprecated WindowBytesStoreSupplier#segments
> ---
>
> Key: KAFKA-12524
> URL: https://issues.apache.org/jira/browse/KAFKA-12524
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>    Reporter: Guozhang Wang
>    Assignee: Guozhang Wang
>Priority: Major
>






[jira] [Created] (KAFKA-12568) Remove deprecated "KStream#groupBy/join", "Joined#named" overloads

2021-03-27 Thread Guozhang Wang (Jira)
Guozhang Wang created KAFKA-12568:
-

 Summary: Remove deprecated "KStream#groupBy/join", "Joined#named" 
overloads
 Key: KAFKA-12568
 URL: https://issues.apache.org/jira/browse/KAFKA-12568
 Project: Kafka
  Issue Type: Sub-task
    Reporter: Guozhang Wang








[jira] [Created] (KAFKA-12562) Remove deprecated-overloaded "KafkaStreams#metadataForKey" and "KafkaStreams#store"

2021-03-25 Thread Guozhang Wang (Jira)
Guozhang Wang created KAFKA-12562:
-

 Summary: Remove deprecated-overloaded 
"KafkaStreams#metadataForKey" and "KafkaStreams#store"
 Key: KAFKA-12562
 URL: https://issues.apache.org/jira/browse/KAFKA-12562
 Project: Kafka
  Issue Type: Sub-task
    Reporter: Guozhang Wang
    Assignee: Guozhang Wang








[jira] [Resolved] (KAFKA-12527) Remove deprecated "PartitionGrouper" interface

2021-03-25 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-12527.
---
Resolution: Duplicate

> Remove deprecated "PartitionGrouper" interface
> --
>
> Key: KAFKA-12527
> URL: https://issues.apache.org/jira/browse/KAFKA-12527
> Project: Kafka
>  Issue Type: Sub-task
>    Reporter: Guozhang Wang
>Assignee: Guozhang Wang
>Priority: Major
>






[jira] [Resolved] (KAFKA-12526) Remove deprecated long ms overloads

2021-03-25 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-12526.
---
Resolution: Duplicate

> Remove deprecated long ms overloads
> ---
>
> Key: KAFKA-12526
> URL: https://issues.apache.org/jira/browse/KAFKA-12526
> Project: Kafka
>  Issue Type: Sub-task
>    Reporter: Guozhang Wang
>Priority: Major
>






[jira] [Created] (KAFKA-12549) Allow state stores to opt-in transactional support

2021-03-24 Thread Guozhang Wang (Jira)
Guozhang Wang created KAFKA-12549:
-

 Summary: Allow state stores to opt-in transactional support
 Key: KAFKA-12549
 URL: https://issues.apache.org/jira/browse/KAFKA-12549
 Project: Kafka
  Issue Type: New Feature
  Components: streams
Reporter: Guozhang Wang


Right now Kafka Streams' EOS implementation does not make any assumptions about 
the state store's transactional support. Allowing the state stores to 
optionally provide transactional support can have multiple benefits. For 
example, if we add APIs such as {{beginTxn}}, {{commitTxn}} and {{abortTxn}} to 
the {{StateStore}} interface, then these APIs can be used under both ALOS and 
EOS such that:

* store.beginTxn
* store.put // during processing
* streams commit // either through the eos protocol or not
* store.commitTxn

This gives us the following benefits:
* Reduce duplicated records upon crashes for ALOS (note this is still not EOS, 
but a middle ground where uncommitted data within a state store would not be 
retained if store.commitTxn failed).
* No need to wipe the state store and re-bootstrap from scratch upon crashes 
for EOS. E.g., if a crash failure happened between the streams commit 
completing and store.commitTxn, we can instead just roll the transaction 
forward by replaying the changelog from the second-most-recent streams 
committed offset to the most recent committed offset.
* Remote stores that support transactions then do not need to support wiping 
(https://issues.apache.org/jira/browse/KAFKA-12475).
* We can fix the known issues of emit-on-change 
(https://cwiki.apache.org/confluence/display/KAFKA/KIP-557%3A+Add+emit+on+change+support+for+Kafka+Streams).
* We can support "query committed data only" for interactive queries (see below 
for reasons).

As for the implementation of these APIs, there are several options:
* The state store itself has native transaction features (e.g. RocksDB).
* Use an in-memory buffer for all puts within a transaction; upon 
`commitTxn` write the whole buffer as a batch to the underlying state store, or 
just drop the whole buffer upon aborting. For interactive queries, one can then 
optionally query the underlying store for committed data only.
* Use a separate store as a transient persistent buffer: upon `beginTxn` 
create a new empty transient store, and upon `commitTxn` merge that store into 
the underlying store. The same applies for interactive queries of 
committed-only data.
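The second implementation option (an in-memory buffer that is flushed to the underlying store as a batch on commit, or dropped on abort) could be sketched as follows. Only the beginTxn/commitTxn/abortTxn names come from the ticket; the class, the String keys, and the getCommitted helper are illustrative assumptions, not actual Kafka Streams APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a transactional wrapper over a plain key-value store: all puts
// inside a transaction land in txnBuffer; commitTxn flushes the buffer as a
// batch into the committed map, while abortTxn simply drops it. Interactive
// queries that should see committed data only read from `committed`.
class TxnBufferedStore {
    private final Map<String, byte[]> committed = new HashMap<>();
    private final Map<String, byte[]> txnBuffer = new HashMap<>();
    private boolean inTxn = false;

    void beginTxn() {
        if (inTxn) throw new IllegalStateException("transaction already open");
        inTxn = true;
    }

    void put(String key, byte[] value) {
        if (inTxn) {
            txnBuffer.put(key, value);   // staged; invisible to committed reads
        } else {
            committed.put(key, value);   // no open transaction: write through
        }
    }

    // "Query committed data only": never sees uncommitted writes.
    byte[] getCommitted(String key) {
        return committed.get(key);
    }

    void commitTxn() {
        if (!inTxn) throw new IllegalStateException("no open transaction");
        committed.putAll(txnBuffer);     // flush the whole buffer as one batch
        txnBuffer.clear();
        inTxn = false;
    }

    void abortTxn() {
        if (!inTxn) throw new IllegalStateException("no open transaction");
        txnBuffer.clear();               // uncommitted writes are dropped
        inTxn = false;
    }
}
```

In this sketch, a crash before commitTxn loses only the uncommitted buffer, which is what gives the ALOS "middle ground" described in the first benefit above.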





[jira] [Created] (KAFKA-12527) Remove deprecated "PartitionAssignor" interface

2021-03-23 Thread Guozhang Wang (Jira)
Guozhang Wang created KAFKA-12527:
-

 Summary: Remove deprecated "PartitionAssignor" interface
 Key: KAFKA-12527
 URL: https://issues.apache.org/jira/browse/KAFKA-12527
 Project: Kafka
  Issue Type: Sub-task
Reporter: Guozhang Wang








[jira] [Created] (KAFKA-12526) Remove deprecated long ms overloads

2021-03-23 Thread Guozhang Wang (Jira)
Guozhang Wang created KAFKA-12526:
-

 Summary: Remove deprecated long ms overloads
 Key: KAFKA-12526
 URL: https://issues.apache.org/jira/browse/KAFKA-12526
 Project: Kafka
  Issue Type: Sub-task
Reporter: Guozhang Wang








[jira] [Created] (KAFKA-12524) Remove deprecated WindowBytesStoreSupplier#segments

2021-03-22 Thread Guozhang Wang (Jira)
Guozhang Wang created KAFKA-12524:
-

 Summary: Remove deprecated WindowBytesStoreSupplier#segments
 Key: KAFKA-12524
 URL: https://issues.apache.org/jira/browse/KAFKA-12524
 Project: Kafka
  Issue Type: Sub-task
Reporter: Guozhang Wang








Re: [DISCUSS] KIP-693: Client-side Circuit Breaker for Partition Write Errors

2021-03-22 Thread Guozhang Wang
Thanks for the updated KIP! Some more comments inlined.

On Sun, Mar 7, 2021 at 6:43 PM Guoqiang Shu  wrote:

>
>
> Guozhang, many thanks for taking a look! Sorry for the late reply, we have
> iterated the prototype on our production setup with your question in mind
> and updated the KIP correspondingly. Below are the answers and the summary
> of the updates
>
> >
> > 1) Why does the default implementation have a criterion of when it is
> > enabled -- i.e. after a certain number of messages have been successfully
> > sent -- instead of always enabled? Also how would this mechanism be
> > customizable in user's own implementations? The current interface APIs
> seem
> > not allow such configurations.
> >
>
> [GS] The enabling condition is to prevent the undesirable effect of the
> 'ratio' when the volume is small, and we should allow the enabling condition
> to be customized in an implementation of the breaker. To that end, the
> ProducerCircuitBreaker interface now extends the Configurable interface, so
> it supports passing the initialization parameters of the KafkaProducer
> during instantiation. The updated KIP contains more details.
>
>
> > 2) When does the "onError" function trigger? Is that triggered when the
> > response returned contains an error code (but maybe a retriable one, so
> > that the batch of records would be re-enqueued) or when the batch of
> > records are finally going to be dropped on the floor? If it is based on
> the
> > first case, then depending on the timeout configuration we may not
> trigger
> > the breaker in time, is that the case? Also I'm wondering if we can allow
> > this function to be triggered on other events, such as inflight.requests
> > exceeding the threshold as well.
> >
>
> [GS] In our original prototype the 'onError' function triggers in the
> first case you described, but it only handles 'TimeoutException' and
> 'NetworkException'. Agreed, this may cause the muting to happen too late
> and, further, the upstream buffer to fill up. With more experiments, we
> updated the interface with two new callbacks:
> onSend(): passes the KafkaProducer's current network congestion to the
> plug-in interface before sending the message, allowing custom
> implementations to kick in based on network congestion (for example,
> inflight.requests).
> onComplete(): triggered either upon DeliveryTimeout/RequestTimeout or when
> the final result is processed by the sender after max retries is reached.
> As these two situations are complementary, we can prevent glitches (for
> example, when inflight.requests periodically falls below the threshold).
>
> > 3) How would users customize "how long would we mute a partition" and
> "how
> > many partitions of a topic in total could be muted at the same time" from
> > the interface?
> >
> >
>
> [GS] We believe the muting length is empirical, and in our team we use the
> heuristic of the recovery time of a single-node failure, hence the proposed
> 10-minute default.
> We believe it is reasonable to allow any number of partitions to be muted
> at the same time (i.e., no upper limit is needed). In the extreme case of a
> cluster-level failure, the user should be alerted to manually divert
> cluster-level traffic to other healthy clusters.
>
I'm still not sure if, in your proposal, the muting length is a
customizable value (and if yes, through which config) or whether it is always
hard-coded as 10 minutes?
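To make the per-partition muting under discussion concrete, here is a minimal sketch of a breaker with a configurable mute length. Only the onComplete callback name comes from the thread above; the class name, constructor parameter, and isMuted helper are assumptions for illustration, not the actual KIP-693 interface.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative breaker: a partition whose send ultimately fails is muted
// for a fixed, configurable duration, after which it automatically becomes
// eligible again. There is no cap on how many partitions may be muted.
class SimpleCircuitBreaker {
    private final long muteMs;                     // configurable mute length
    private final Map<Integer, Long> mutedUntil = new HashMap<>();

    SimpleCircuitBreaker(long muteMs) {
        this.muteMs = muteMs;                      // e.g. 10 minutes by default
    }

    // Called when a send finally fails (delivery timeout / retries exhausted).
    void onComplete(int partition, boolean failed, long nowMs) {
        if (failed) {
            mutedUntil.put(partition, nowMs + muteMs);
        }
    }

    // Checked before sending; a muted partition would be skipped by the partitioner.
    boolean isMuted(int partition, long nowMs) {
        Long until = mutedUntil.get(partition);
        if (until == null) {
            return false;
        }
        if (nowMs >= until) {
            mutedUntil.remove(partition);          // mute expired: unmute
            return false;
        }
        return true;
    }
}
```

Whether muteMs is user-configurable or fixed is exactly the open question above; the sketch simply shows where such a config would plug in.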


> > Guozhang
> >
> >
> > On Mon, Dec 14, 2020 at 5:42 PM Guoqiang Shu 
> wrote:
> >
> > >
> > > Hi Jun and Justin,
> > >
> > > Many thanks for taking a look at our proposal and for the pointer! We
> > > learned about the mechanism proposed to enhance StickyPartitioner. Both
> > > methods aim to exclude brokers with transient errors and prevent
> > > cluster-wide failure. The difference lies in the criteria used to tell
> > > if a broker is problematic: our KIP uses the error condition of the
> > > operation, while the heuristic in StickyPartitioner relies on the
> > > internal state max.in.flight.requests.per.connection.
> > >
> > > IMHO, using the final result of the write operation makes the behavior
> > > simpler to reason about. It covers all error scenarios and potentially
> > > supports all implementations of Partitioner. In contrast, using
> > > intermediate state may trigger action prematurely, for example, when
> > > max.in.flight.requests.per.connection reaches the threshold (due to a
> > > small linger.ms value) but the producer-side buffer is still in a
> > > healthy state. In addition, in sequential mode
> > > max.in.flight.requests.per.connection is set to 1 and therefore cannot
> > > be leveraged.
> > >
> > > Finally, as Justine pointed out, having AvailablePartitions reflect
> > > broker status (as enabled in our KIP) will benefit optimizations of the
> > > Partitioner in general, so the two classes of enhancements can coexist.
> > >
> > > Cheers,
> > > //George//
> > >
> > > On 2020/12/08 18:19:43, Justine Olshan  wrote:
> > > > Hi George,
> > > > 
