[jira] [Created] (KAFKA-13289) Bulk processing data through a join with kafka-streams results in `Skipping record for expired segment`

2021-09-09 Thread Matthew Sheppard (Jira)
Matthew Sheppard created KAFKA-13289:


 Summary: Bulk processing data through a join with kafka-streams 
results in `Skipping record for expired segment`
 Key: KAFKA-13289
 URL: https://issues.apache.org/jira/browse/KAFKA-13289
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 2.8.0
Reporter: Matthew Sheppard


When pushing bulk data through a kafka-streams app, I see it log the following 
message many times...

`WARN 
org.apache.kafka.streams.state.internals.AbstractRocksDBSegmentedBytesStore - 
Skipping record for expired segment.`

...and data which I expect to have been joined through a leftJoin step appears 
to be lost.

I've seen this in practice either when my application has been shut down for a 
while and then is brought back up, or when I've used something like the 
[app-reset-tool](https://docs.confluent.io/platform/current/streams/developer-guide/app-reset-tool.html)
 in an attempt to have the application reprocess past data.

I was able to reproduce this behaviour in isolation by generating 1000 messages 
to two topics spaced an hour apart (with the original timestamps in order), 
then having kafka streams select a key for them and try to leftJoin the two 
rekeyed streams.

Self contained source code for that reproduction is available at 
https://github.com/mattsheppard/ins14809/blob/main/src/test/java/ins14809/Ins14809Test.java

The actual kafka-streams topology in there looks like this.

```
final StreamsBuilder builder = new StreamsBuilder();
final KStream<String, String> leftStream = builder.stream(leftTopic);
final KStream<String, String> rightStream = builder.stream(rightTopic);

final KStream<String, String> rekeyedLeftStream = leftStream
        .selectKey((k, v) -> v.substring(0, v.indexOf(":")));

final KStream<String, String> rekeyedRightStream = rightStream
        .selectKey((k, v) -> v.substring(0, v.indexOf(":")));

JoinWindows joinWindow = JoinWindows.of(Duration.ofSeconds(5));

final KStream<String, String> joined = rekeyedLeftStream.leftJoin(
        rekeyedRightStream,
        (left, right) -> left + "/" + right,
        joinWindow
);
```

...and the eventual output I produce looks like this...

```
...
523 [523,left/null]
524 [524,left/null, 524,left/524,right]
525 [525,left/525,right]
526 [526,left/null]
527 [527,left/null]
528 [528,left/528,right]
529 [529,left/null]
530 [530,left/null]
531 [531,left/null, 531,left/531,right]
532 [532,left/null]
533 [533,left/null]
534 [534,left/null, 534,left/534,right]
535 [535,left/null]
536 [536,left/null]
537 [537,left/null, 537,left/537,right]
538 [538,left/null]
539 [539,left/null]
540 [540,left/null]
541 [541,left/null]
542 [542,left/null]
543 [543,left/null]
...
```

...whereas, given the input data, I expect to see every row end with the two 
values joined, rather than the right value being null.

Note that I understand it's expected that we initially get the left/null results 
for many keys, since that's the semantics of the kafka-streams left join, 
at least until 
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+Join+Semantics#KafkaStreamsJoinSemantics-ImprovedLeft/OuterStream-StreamJoin(v3.1.xandnewer)spurious

I've noticed that if I set a very large grace value on the join window the 
problem goes away, but since the input I provide is not out of order I did not 
expect to need to do that, and I'm wary of the resource requirements of doing so 
in practice on an application with a lot of volume.
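For illustration, a minimal sketch of that grace-period workaround (the durations are 
arbitrary examples, not recommendations):

```
import java.time.Duration;
import org.apache.kafka.streams.kstream.JoinWindows;

// Sketch of the workaround: widen the grace period so records that look "late"
// relative to the advanced stream time are still eligible for the join.
JoinWindows joinWindow = JoinWindows
        .of(Duration.ofSeconds(5))
        .grace(Duration.ofDays(7)); // arbitrary large value; trades state size for tolerance
```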

My suspicion is that when one partition is processed it pushes the stream time 
forward to the newest message in that partition, so that when the next partition 
is examined it is found to contain many records which are 'too old' compared to 
the stream time. 

I ran across 
https://kafkacommunity.blogspot.com/2020/02/re-skipping-record-for-expired-segment_88.html
 from a year and a half ago which seems to describe the same problem, but I'm 
hoping the self-contained reproduction might make the issue easier to tackle!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] 3.0.0 RC2

2021-09-09 Thread Konstantine Karantasis
Hi Bill,

I just added folder 30 to the kafka-site repo. Hadn't realized that this
separate manual step was part of the RC process and not the official
release (even though, strangely enough, I was expecting to be able
to read the docs online). I guess I needed a second nudge after Gary's
first comment on RC1 to see what was missing. I'll update the release doc
to make this clearer.

Should be accessible now. Please take another look.

Konstantine



On Fri, Sep 10, 2021 at 12:50 AM Bill Bejeck  wrote:

> Hi Konstantine,
>
> I've started to do the validation for the release and the link for docs
> doesn't work.
>
> Thanks,
> Bill
>
> On Wed, Sep 8, 2021 at 5:59 PM Konstantine Karantasis <
> kkaranta...@apache.org> wrote:
>
> > Hello again Kafka users, developers and client-developers,
> >
> > This is the third candidate for release of Apache Kafka 3.0.0.
> > It is a major release that includes many new features, including:
> >
> > * The deprecation of support for Java 8 and Scala 2.12.
> > * Kafka Raft support for snapshots of the metadata topic and other
> > improvements in the self-managed quorum.
> > * Deprecation of message formats v0 and v1.
> > * Stronger delivery guarantees for the Kafka producer enabled by default.
> > * Optimizations in OffsetFetch and FindCoordinator requests.
> > * More flexible Mirror Maker 2 configuration and deprecation of Mirror
> > Maker 1.
> > * Ability to restart a connector's tasks on a single call in Kafka
> Connect.
> > * Connector log contexts and connector client overrides are now enabled
> by
> > default.
> > * Enhanced semantics for timestamp synchronization in Kafka Streams.
> > * Revamped public API for Stream's TaskId.
> > * Default serde becomes null in Kafka Streams and several other
> > configuration changes.
> >
> > You may read and review a more detailed list of changes in the 3.0.0 blog
> > post draft here:
> >
> https://blogs.apache.org/preview/kafka/?previewEntry=what-s-new-in-apache6
> >
> > Release notes for the 3.0.0 release:
> > https://home.apache.org/~kkarantasis/kafka-3.0.0-rc2/RELEASE_NOTES.html
> >
> > *** Please download, test and vote by Tuesday, September 14, 2021 ***
> >
> > Kafka's KEYS file containing PGP keys we use to sign the release:
> > https://kafka.apache.org/KEYS
> >
> > * Release artifacts to be voted upon (source and binary):
> > https://home.apache.org/~kkarantasis/kafka-3.0.0-rc2/
> >
> > * Maven artifacts to be voted upon:
> > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> >
> > * Javadoc:
> > https://home.apache.org/~kkarantasis/kafka-3.0.0-rc2/javadoc/
> >
> > * Tag to be voted upon (off 3.0 branch) is the 3.0.0 tag:
> > https://github.com/apache/kafka/releases/tag/3.0.0-rc2
> >
> > * Documentation:
> > https://kafka.apache.org/30/documentation.html
> >
> > * Protocol:
> > https://kafka.apache.org/30/protocol.html
> >
> > * Successful Jenkins builds for the 3.0 branch:
> > Unit/integration tests:
> >
> >
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka/detail/3.0/129/
> > (1 flaky test failure)
> > System tests:
> > https://jenkins.confluent.io/job/system-test-kafka/job/3.0/67/
> > (1 flaky test failure)
> >
> > /**
> >
> > Thanks,
> > Konstantine
> >
>


[jira] [Created] (KAFKA-13288) Transaction find-hanging command with --broker-id excludes internal topics

2021-09-09 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-13288:
---

 Summary: Transaction find-hanging command with --broker-id 
excludes internal topics
 Key: KAFKA-13288
 URL: https://issues.apache.org/jira/browse/KAFKA-13288
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson
Assignee: Jason Gustafson


We use the vanilla `Admin.listTopics()` in this command if `--broker-id` is 
specified. By default, this excludes internal topics.
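
A minimal sketch of one possible fix, assuming an existing `admin` client and a calling 
method that may throw (illustrative only, not the actual patch):

{code:java}
import java.util.Set;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.ListTopicsOptions;

// Sketch: opt in to internal topics explicitly instead of relying on the
// default ListTopicsOptions, which filters them out.
ListTopicsOptions options = new ListTopicsOptions().listInternal(true);
Set<String> allTopics = admin.listTopics(options).names().get();
{code}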



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] KIP-767 Connect Latency Metrics

2021-09-09 Thread Jordan Bull
Hey Chris,

Thanks for jumping in! I have been using the consumer lag as an indicator
for some time, but when measured directly at the consumer, it will not
factor in the time that Connect actually spends transforming and sending
the messages. This is certainly useful for measuring if the connector is
keeping up, but doesn't really tell the story of how much delay Connect
introduces. We also perform a measurement of latency based on the
commit times, but as you mentioned this is often dominated by the commit
interval. This limits our ability to provide SLOs for sub-second latency, as
we use Connect for real-time delivery. Both approaches do allow a
measurement of a connector's ability to keep up with a given throughput, but
we've found neither allows us to measure real latency SLOs for Connect as a
real-time service.

- Jordan
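
For illustration only (not from the KIP), a rough sketch of the kind of per-record
measurement that neither consumer lag nor commit-based metrics capture, taken at the
point Connect hands records to the sink task. The class and method names below are
hypothetical:

```
import java.util.Collection;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

// Hypothetical sketch: measure how old each record is when it reaches the
// sink task's put(), i.e. before transformation/commit effects are visible.
public abstract class LatencyMeasuringSinkTask extends SinkTask {

    @Override
    public void put(Collection<SinkRecord> records) {
        long now = System.currentTimeMillis();
        for (SinkRecord record : records) {
            Long ts = record.timestamp();
            if (ts != null) {
                recordLatencyMs(now - ts); // hypothetical metrics hook
            }
        }
        writeToSink(records); // hypothetical delegate doing the actual write
    }

    protected abstract void recordLatencyMs(long latencyMs);

    protected abstract void writeToSink(Collection<SinkRecord> records);
}
```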

On Tue, Sep 7, 2021 at 2:43 PM Chris Egerton 
wrote:

> Hi Jordan,
>
> Thanks for the KIP. I'm curious about a possible alternative where the
> consumer lag for the source connector can be monitored instead of the
> newly-proposed metric in the KIP. Although sink tasks can't directly report
> the successful write of a record to the sink system, they are responsible
> for indirectly monitoring and communicating this in the form of the offsets
> returned from the SinkTask::preCommit method. This should mean that, for
> any well-behaved connector that returns accurate offsets from its preCommit
> method (including connectors that perform synchronous writes in
> SinkTask::put, which in most cases will not override the default behavior
> of the preCommit method and will allow the most up-to-date offsets read
> from each topic to be committed to the consumer), the consumer lag for the
> connector should be a decent way to monitor latency. Of course, it'll be at
> the mercy of the commit interval for the connector and whether the
> connector can successfully commit offsets with its consumer, but since that
> often dictates where tasks will resume from if restarted, there's still
> plenty of value in this metric.
>
> Cheers,
>
> Chris
>
> On Thu, Sep 2, 2021 at 7:03 PM Ryanne Dolan  wrote:
>
> > Thanks Jordan, this is a major blindspot today.
> >
> > Ryanne
> >
> >
> > On Wed, Sep 1, 2021, 6:03 PM Jordan Bull  wrote:
> >
> > > Hi all,
> > >
> > > I would like to start the discussion for KIP-767 involving adding
> latency
> > > metrics to Connect. The KIP can be found at
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-767%3A+Connect+Latency+Metrics
> > >
> > > Thanks,
> > > Jordan
> > >
> >
>


Re: [DISCUSS] KIP-770: Replace "buffered.records.per.partition" with "input.buffer.max.bytes"

2021-09-09 Thread Matthias J. Sax
Thanks for the KIP.

There was some discussion about adding a metric on the thread, but the
KIP does not contain anything about it. Did we drop this suggestion or
was the KIP not updated accordingly?


Nit:

> This would be a global config applicable per processing topology

Can we change this to `per Kafka Streams instance.`

Atm, a Stream instance executes a single topology, so it does not make
any effective difference right now. However, it seems better (more
logical) to bind the config to the instance (not the topology the
instance executes).


-Matthias
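
For readers following along, a rough sketch of how the names proposed in the KIP would
appear in an application's config. Purely illustrative: neither config name exists in a
released version at the time of this thread, and the values are arbitrary.

```
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

// Illustrative only: "input.buffer.max.bytes" and "statestore.cache.max.bytes"
// are the names proposed in KIP-770, not configs available in 2.8/3.0.
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put("input.buffer.max.bytes", 512 * 1024 * 1024L);    // proposed per-instance input buffer cap
props.put("statestore.cache.max.bytes", 10 * 1024 * 1024L); // proposed rename of cache.max.bytes.buffering
```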

On 9/2/21 6:08 AM, Sagar wrote:
> Thanks Guozhang and Luke.
> 
> I have updated the KIP with all the suggested changes.
> 
> Do you think we could start voting for this?
> 
> Thanks!
> Sagar.
> 
> On Thu, Sep 2, 2021 at 8:26 AM Luke Chen  wrote:
> 
>> Thanks for the KIP. Overall LGTM.
>>
>> Just one thought, if we "rename" the config directly as mentioned in the
>> KIP, would that break existing applications?
>> Should we deprecate the old one first, and make the old/new names co-exist
>> for some period of time?
>>
>> Public Interfaces
>>
>>- Adding a new config *input.buffer.max.bytes *applicable at a topology
>>level. The importance of this config would be *Medium*.
>>- Renaming *cache.max.bytes.buffering* to *statestore.cache.max.bytes*.
>>
>>
>>
>> Thank you.
>> Luke
>>
>> On Thu, Sep 2, 2021 at 1:50 AM Guozhang Wang  wrote:
>>
>>> Currently the state store cache size default value is 10MB today, which
>>> arguably is rather small. So I'm thinking maybe for this config default
>> to
>>> 512MB.
>>>
>>> Other than that, LGTM.
>>>
>>> On Sat, Aug 28, 2021 at 11:34 AM Sagar 
>> wrote:
>>>
 Thanks Guozhang and Sophie.

 Yeah a small default value would lower the throughput. I didn't quite
 realise it earlier. It's slightly hard to predict this value so I would
 guess around 1/2 GB to 1 GB? WDYT?

 Regarding the renaming of the config and the new metric, sure would
>>> include
 it in the KIP.

 Lastly, importance would also be added. I guess Medium should be ok.

 Thanks!
 Sagar.


 On Sat, Aug 28, 2021 at 10:42 AM Sophie Blee-Goldman
  wrote:

> 1) I agree that we should just distribute the bytes evenly, at least
>>> for
> now. It's simpler to understand and
> we can always change it later, plus it makes sense to keep this
>> aligned
> with how the cache works today
>
> 2) +1 to being conservative in the generous sense, it's just not
 something
> we can predict with any degree
> of accuracy and even if we could, the appropriate value is going to
 differ
> wildly across applications and use
> cases. We might want to just pick some multiple of the default cache
 size,
> and maybe do some research on
> other relevant defaults or sizes (default JVM heap, size of available
> memory in common hosts eg EC2
> instances, etc). We don't need to worry as much about erring on the
>>> side
 of
> too big, since other configs like
> the max.poll.records will help somewhat to keep it from exploding.
>
> 4) 100%, I always found the *cache.max.bytes.buffering* config name
>> to
>>> be
> incredibly confusing. Deprecating this in
> favor of "*statestore.cache.max.bytes*" and aligning it to the new
>>> input
> buffer config sounds good to me to include here.
>
> 5) The KIP should list all relevant public-facing changes, including
> metadata like the config's "Importance". Personally
> I would recommend Medium, or even High if we're really worried about
>>> the
> default being wrong for a lot of users
>
> Thanks for the KIP, besides those few things that Guozhang brought up
>>> and
> the config importance, everything SGTM
>
> -Sophie
>
> On Thu, Aug 26, 2021 at 2:41 PM Guozhang Wang 
 wrote:
>
>> 1) I meant for your proposed solution. I.e. to distribute the
 configured
>> bytes among threads evenly.
>>
>> 2) I was actually thinking about making the default a large enough
 value
> so
>> that we would not introduce performance regression: thinking about
>> a
 use
>> case with many partitions and each record may be large, then
 effectively
> we
>> would only start pausing when the total bytes buffered is pretty
>>> large.
> If
>> we set the default value to small, we would be "more aggressive" on
> pausing
>> which may impact throughput.
>>
>> 3) Yes exactly, this would naturally be at the "partition-group"
>>> class
>> since that represents the task's all input partitions.
>>
>> 4) This is just a bold thought, I'm interested to see other's
>>> thoughts.
>>
>>
>> Guozhang
>>
>> On Mon, Aug 23, 2021 at 4:10 AM Sagar 
 wrote:
>>
>>> Thanks Guozhang.
>>>
>>> 1) Just for my confirmation, when you say we should proceed

Re: [VOTE] 3.0.0 RC2

2021-09-09 Thread Bill Bejeck
Hi Konstantine,

I've started to do the validation for the release and the link for docs
doesn't work.

Thanks,
Bill

On Wed, Sep 8, 2021 at 5:59 PM Konstantine Karantasis <
kkaranta...@apache.org> wrote:

> Hello again Kafka users, developers and client-developers,
>
> This is the third candidate for release of Apache Kafka 3.0.0.
> It is a major release that includes many new features, including:
>
> * The deprecation of support for Java 8 and Scala 2.12.
> * Kafka Raft support for snapshots of the metadata topic and other
> improvements in the self-managed quorum.
> * Deprecation of message formats v0 and v1.
> * Stronger delivery guarantees for the Kafka producer enabled by default.
> * Optimizations in OffsetFetch and FindCoordinator requests.
> * More flexible Mirror Maker 2 configuration and deprecation of Mirror
> Maker 1.
> * Ability to restart a connector's tasks on a single call in Kafka Connect.
> * Connector log contexts and connector client overrides are now enabled by
> default.
> * Enhanced semantics for timestamp synchronization in Kafka Streams.
> * Revamped public API for Stream's TaskId.
> * Default serde becomes null in Kafka Streams and several other
> configuration changes.
>
> You may read and review a more detailed list of changes in the 3.0.0 blog
> post draft here:
> https://blogs.apache.org/preview/kafka/?previewEntry=what-s-new-in-apache6
>
> Release notes for the 3.0.0 release:
> https://home.apache.org/~kkarantasis/kafka-3.0.0-rc2/RELEASE_NOTES.html
>
> *** Please download, test and vote by Tuesday, September 14, 2021 ***
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> https://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> https://home.apache.org/~kkarantasis/kafka-3.0.0-rc2/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/org/apache/kafka/
>
> * Javadoc:
> https://home.apache.org/~kkarantasis/kafka-3.0.0-rc2/javadoc/
>
> * Tag to be voted upon (off 3.0 branch) is the 3.0.0 tag:
> https://github.com/apache/kafka/releases/tag/3.0.0-rc2
>
> * Documentation:
> https://kafka.apache.org/30/documentation.html
>
> * Protocol:
> https://kafka.apache.org/30/protocol.html
>
> * Successful Jenkins builds for the 3.0 branch:
> Unit/integration tests:
>
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka/detail/3.0/129/
> (1 flaky test failure)
> System tests:
> https://jenkins.confluent.io/job/system-test-kafka/job/3.0/67/
> (1 flaky test failure)
>
> /**
>
> Thanks,
> Konstantine
>


Build failed in Jenkins: Kafka » Kafka Branch Builder » trunk #463

2021-09-09 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 489453 lines...]
[2021-09-09T21:06:40.938Z] [INFO] Installing 
/home/jenkins/workspace/Kafka_kafka_trunk/streams/quickstart/java/pom.xml to 
/home/jenkins/.m2/repository/org/apache/kafka/streams-quickstart-java/3.1.0-SNAPSHOT/streams-quickstart-java-3.1.0-SNAPSHOT.pom
[2021-09-09T21:06:40.938Z] [INFO] 
[2021-09-09T21:06:40.938Z] [INFO] --- 
maven-archetype-plugin:2.2:update-local-catalog (default-update-local-catalog) 
@ streams-quickstart-java ---
[2021-09-09T21:06:40.938Z] [INFO] 

[2021-09-09T21:06:40.938Z] [INFO] Reactor Summary for Kafka Streams :: 
Quickstart 3.1.0-SNAPSHOT:
[2021-09-09T21:06:40.938Z] [INFO] 
[2021-09-09T21:06:40.938Z] [INFO] Kafka Streams :: Quickstart 
 SUCCESS [  1.895 s]
[2021-09-09T21:06:40.938Z] [INFO] streams-quickstart-java 
 SUCCESS [  0.998 s]
[2021-09-09T21:06:40.938Z] [INFO] 

[2021-09-09T21:06:40.938Z] [INFO] BUILD SUCCESS
[2021-09-09T21:06:40.938Z] [INFO] 

[2021-09-09T21:06:40.938Z] [INFO] Total time:  3.161 s
[2021-09-09T21:06:40.938Z] [INFO] Finished at: 2021-09-09T21:06:41Z
[2021-09-09T21:06:40.938Z] [INFO] 

[Pipeline] dir
[2021-09-09T21:06:41.461Z] Running in 
/home/jenkins/workspace/Kafka_kafka_trunk/streams/quickstart/test-streams-archetype
[Pipeline] {
[Pipeline] sh
[2021-09-09T21:06:43.805Z] + echo Y
[2021-09-09T21:06:43.805Z] + mvn archetype:generate -DarchetypeCatalog=local 
-DarchetypeGroupId=org.apache.kafka 
-DarchetypeArtifactId=streams-quickstart-java -DarchetypeVersion=3.1.0-SNAPSHOT 
-DgroupId=streams.examples -DartifactId=streams.examples -Dversion=0.1 
-Dpackage=myapps
[2021-09-09T21:06:44.750Z] [INFO] Scanning for projects...
[2021-09-09T21:06:45.696Z] [INFO] 
[2021-09-09T21:06:45.696Z] [INFO] --< 
org.apache.maven:standalone-pom >---
[2021-09-09T21:06:45.696Z] [INFO] Building Maven Stub Project (No POM) 1
[2021-09-09T21:06:45.696Z] [INFO] [ pom 
]-
[2021-09-09T21:06:45.696Z] [INFO] 
[2021-09-09T21:06:45.696Z] [INFO] >>> maven-archetype-plugin:3.2.0:generate 
(default-cli) > generate-sources @ standalone-pom >>>
[2021-09-09T21:06:45.696Z] [INFO] 
[2021-09-09T21:06:45.696Z] [INFO] <<< maven-archetype-plugin:3.2.0:generate 
(default-cli) < generate-sources @ standalone-pom <<<
[2021-09-09T21:06:45.696Z] [INFO] 
[2021-09-09T21:06:45.696Z] [INFO] 
[2021-09-09T21:06:45.696Z] [INFO] --- maven-archetype-plugin:3.2.0:generate 
(default-cli) @ standalone-pom ---
[2021-09-09T21:06:46.642Z] [INFO] Generating project in Interactive mode
[2021-09-09T21:06:46.642Z] [WARNING] Archetype not found in any catalog. 
Falling back to central repository.
[2021-09-09T21:06:46.642Z] [WARNING] Add a repository with id 'archetype' in 
your settings.xml if archetype's repository is elsewhere.
[2021-09-09T21:06:46.642Z] [INFO] Using property: groupId = streams.examples
[2021-09-09T21:06:46.642Z] [INFO] Using property: artifactId = streams.examples
[2021-09-09T21:06:46.642Z] [INFO] Using property: version = 0.1
[2021-09-09T21:06:46.642Z] [INFO] Using property: package = myapps
[2021-09-09T21:06:46.642Z] Confirm properties configuration:
[2021-09-09T21:06:46.642Z] groupId: streams.examples
[2021-09-09T21:06:46.642Z] artifactId: streams.examples
[2021-09-09T21:06:46.642Z] version: 0.1
[2021-09-09T21:06:46.642Z] package: myapps
[2021-09-09T21:06:46.642Z]  Y: : [INFO] 

[2021-09-09T21:06:46.642Z] [INFO] Using following parameters for creating 
project from Archetype: streams-quickstart-java:3.1.0-SNAPSHOT
[2021-09-09T21:06:46.642Z] [INFO] 

[2021-09-09T21:06:46.642Z] [INFO] Parameter: groupId, Value: streams.examples
[2021-09-09T21:06:46.642Z] [INFO] Parameter: artifactId, Value: streams.examples
[2021-09-09T21:06:46.642Z] [INFO] Parameter: version, Value: 0.1
[2021-09-09T21:06:46.642Z] [INFO] Parameter: package, Value: myapps
[2021-09-09T21:06:46.642Z] [INFO] Parameter: packageInPathFormat, Value: myapps
[2021-09-09T21:06:46.642Z] [INFO] Parameter: package, Value: myapps
[2021-09-09T21:06:46.642Z] [INFO] Parameter: version, Value: 0.1
[2021-09-09T21:06:46.642Z] [INFO] Parameter: groupId, Value: streams.examples
[2021-09-09T21:06:46.642Z] [INFO] Parameter: artifactId, Value: streams.examples
[2021-09-09T21:06:46.642Z] [INFO] Project created from Archetype in dir: 
/home/jenkins/workspace/Kafka_kafka_trunk/streams/quickstart/test-streams-archetype/stre

[jira] [Created] (KAFKA-13287) Upgrade RocksDB to 6.22.1.1

2021-09-09 Thread Bruno Cadonna (Jira)
Bruno Cadonna created KAFKA-13287:
-

 Summary: Upgrade RocksDB to 6.22.1.1
 Key: KAFKA-13287
 URL: https://issues.apache.org/jira/browse/KAFKA-13287
 Project: Kafka
  Issue Type: Task
  Components: streams
Reporter: Bruno Cadonna
 Attachments: compat_report.html

RocksDB 6.22.1.1 is source compatible with RocksDB 6.19.3, which Streams 
currently uses (see attached compatibility report).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13286) Revisit Streams State Store and Serde Implementation

2021-09-09 Thread Guozhang Wang (Jira)
Guozhang Wang created KAFKA-13286:
-

 Summary: Revisit Streams State Store and Serde Implementation
 Key: KAFKA-13286
 URL: https://issues.apache.org/jira/browse/KAFKA-13286
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: Guozhang Wang


Kafka Streams state stores are built in hierarchical layers as metered -> cached 
-> logged -> [convert] -> raw stores (rocksDB, in-memory), and they leverage the 
built-in Serde libraries for serialization / deserialization. There are several 
inefficiencies in the current design:

* The API only supports serde using byte arrays. This means we generate a lot 
of garbage and spend unnecessary time copying bytes, especially when working 
with windowed state stores that rely on composite keys. In many places in the 
code we have to extract parts of the composite key to deserialize either the 
timestamp or the message key from the state store key (e.g. the methods in 
WindowStoreUtils).
* The serde operation can happen on multiple layers of the state store 
hierarchy, which means we incur extra byte array copies as we move along doing 
serdes. For example, we do serde in the metered layer, then again in the cached 
layer with cache functions, and also in the logged stores for generating the 
key/value bytes to send to Kafka.

To improve on this, we can consider supporting serde into/from ByteBuffers, 
which would allow us to reuse the underlying byte arrays and just pass around 
slices of the underlying buffers to avoid the unnecessary copying. 

1) More specifically, e.g. the serialize interface could be refactored to:

{code}
ByteBuffer serialize(String topic, T data, ByteBuffer buffer);
{code}

Where the serialized bytes would be appended to the ByteBuffer. When a series 
of serialize functions are called alongside the state store hierarchies, we 
then just need to make sure that what should be appended first to the 
ByteBuffer is serialized first. E.g. if the serialized bytes format of a 
WindowSchema is 

Then we would need to call the serialize as in:

{code}
serialize(key, serialize(leftRightBoolean, serialize(timestamp, buffer))); 
{code}

2) In addition, we can consider having a pool of ByteBuffers representing a set 
of byte arrays that can be re-used. This can be captured as an intelligent 
{{ByteBufferSupplier}}, which provides:

{code}
ByteBuffer ByteBufferSupplier#allocate(long size)
{code}

Its implementation can choose to either create new byte arrays, or re-use 
existing ones in the pool; the gotcha though is that we usually may not know 
the serialized byte length for raw keys (think: in practice the keys would be 
in json/avro etc), and hence would not know how to pass in {{size}} for 
serialization, and hence may need to be conservative, or use trial and error etc.

Of course callers then would be responsible for returning the used ByteBuffer 
back to the Supplier via

{code}
ByteBufferSupplier#deallocate(ByteBuffer buffer)
{code}
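
For illustration, a minimal sketch of what such a pooling supplier could look like 
(hypothetical, and using an int size since java.nio.ByteBuffer allocates with int lengths):

{code}
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the pooling idea above: reuse a pooled buffer when a
// large enough one is available, otherwise allocate a fresh one.
public class SimpleByteBufferSupplier {

    private final Deque<ByteBuffer> pool = new ArrayDeque<>();

    public synchronized ByteBuffer allocate(final int size) {
        final ByteBuffer pooled = pool.pollFirst();
        if (pooled != null && pooled.capacity() >= size) {
            pooled.clear();
            return pooled;
        }
        if (pooled != null) {
            pool.addLast(pooled); // too small for this request; keep it for later callers
        }
        return ByteBuffer.allocate(size);
    }

    public synchronized void deallocate(final ByteBuffer buffer) {
        pool.addLast(buffer); // caller is done with the buffer; make it reusable
    }
}
{code}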

3) With RocksDB's direct byte-buffer (KAFKA-9168) we can optionally also 
allocate them from RocksDB directly so that using them for puts/gets would 
avoid extra copies across the JNI boundary, and hence is more efficient. The 
Supplier then would need to be careful to deallocate these direct byte-buffers 
since they would not be GC'ed by the JVM.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Jenkins build is unstable: Kafka » Kafka Branch Builder » trunk #462

2021-09-09 Thread Apache Jenkins Server
See 




[GitHub] [kafka-site] bbejeck merged pull request #371: Added select KS APAC & EU 2021 videos

2021-09-09 Thread GitBox


bbejeck merged pull request #371:
URL: https://github.com/apache/kafka-site/pull/371


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Resolved] (KAFKA-12988) Change RLMM add/updateRemoteLogSegmentMetadata and putRemotePartitionDeleteMetadata APIS asynchronous.

2021-09-09 Thread Jun Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao resolved KAFKA-12988.
-
Fix Version/s: 3.1.0
   Resolution: Fixed

merged the PR to trunk.

> Change RLMM add/updateRemoteLogSegmentMetadata and 
> putRemotePartitionDeleteMetadata APIS asynchronous.
> --
>
> Key: KAFKA-12988
> URL: https://issues.apache.org/jira/browse/KAFKA-12988
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Satish Duggana
>Assignee: Satish Duggana
>Priority: Major
> Fix For: 3.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-10542) Convert KTable maps to new PAPI

2021-09-09 Thread Jorge Esteban Quilcate Otoya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jorge Esteban Quilcate Otoya resolved KAFKA-10542.
--
Resolution: Fixed

https://github.com/apache/kafka/pull/11099

> Convert KTable maps to new PAPI
> ---
>
> Key: KAFKA-10542
> URL: https://issues.apache.org/jira/browse/KAFKA-10542
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: John Roesler
>Assignee: Jorge Esteban Quilcate Otoya
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-13201) Convert KTable suppress to new PAPI

2021-09-09 Thread Jorge Esteban Quilcate Otoya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jorge Esteban Quilcate Otoya resolved KAFKA-13201.
--
Resolution: Fixed

https://github.com/apache/kafka/pull/11213

> Convert KTable suppress to new PAPI
> ---
>
> Key: KAFKA-13201
> URL: https://issues.apache.org/jira/browse/KAFKA-13201
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Jorge Esteban Quilcate Otoya
>Assignee: Jorge Esteban Quilcate Otoya
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-13256) Possible NPE in ConfigDef when rendering (enriched) RST or HTML when documentation is not set/NULL

2021-09-09 Thread Mickael Maison (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mickael Maison resolved KAFKA-13256.

Fix Version/s: 3.1.0
   Resolution: Fixed

> Possible NPE in ConfigDef when rendering (enriched) RST or HTML when 
> documentation is not set/NULL
> --
>
> Key: KAFKA-13256
> URL: https://issues.apache.org/jira/browse/KAFKA-13256
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 2.8.0
>Reporter: René Kerner
>Assignee: René Kerner
>Priority: Major
> Fix For: 3.1.0
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> While working on Debezium I discovered the following issue:
> When Kafka's ConfigDef renders the HTML or RST documentation representation 
> of the config definition, it requires the `ConfigKey.documentation` member 
> variable to be a java.lang.String instance set to an actual value 
> different from NULL, otherwise an NPE happens:
> {code:java}
>  b.append(key.documentation.replaceAll("\n", ""));
> {code}
> {code:java}
>  for (String docLine : key.documentation.split("\n")) {
> {code}
>  
> When `documentation` is not set/NULL I suggest either setting a valid String 
> like "No documentation available" or skipping that config key.
>  
> I could provide a PR to fix this soon.
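
A minimal sketch of the null-safe guard suggested above (illustrative only, not the 
actual patch):

{code:java}
// Illustrative null-safe variants of the two lines quoted above: fall back to a
// placeholder string when a config key has no documentation.
String doc = key.documentation == null ? "No documentation available" : key.documentation;
b.append(doc.replaceAll("\n", ""));

for (String docLine : doc.split("\n")) {
    // render each documentation line as before
}
{code}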



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Jenkins build is still unstable: Kafka » Kafka Branch Builder » 2.8 #79

2021-09-09 Thread Apache Jenkins Server
See 




Re: [DISCUSS] KIP-774: Deprecate public access to Admin client's *Result constructors

2021-09-09 Thread Josep Prat
Hi Tom,

Thanks for your elaborate explanation. I agree with you, it doesn't seem
like the best option right now. Thanks for updating the KIP with this
rejected alternative.

Best,
———
Josep Prat

Aiven Deutschland GmbH

Immanuelkirchstraße 26, 10405 Berlin

Amtsgericht Charlottenburg, HRB 209739 B

Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen

m: +491715557497

w: aiven.io

e: josep.p...@aiven.io

On Thu, Sep 9, 2021, 17:37 Tom Bentley  wrote:

> Hi Josep,
>
> Thanks for the question! I did consider it briefly:
>
> * As you mention, it would also be quite a lot more code (interfaces +
> impls) for relatively small protection.
> * It wouldn't simplify life for people wanting to mock the Admin client. It
> would break code at runtime for people who (for whatever reason) are
> already using reflection to instantiate the package-private constructors.
> * I don't  _think_ it would be binary compatible for clients that were just
> invoking methods on Result instances (specifically, I _think_ the JVM
> rejects bytecode that uses invokevirtual on an interface, the byte code
> should be using invokeinterface instead). Maybe that's not a huge problem
> because people should not be substituting a Kafka 4.x jar for a 3.x one at
> runtime.
>
> So on balance I'm inclined to the simpler approach in the KIP.
>
> If we _did_ decide to switch to interface in 4.0, deprecating the
> constructors would be a first step anyway, so it would still make sense to
> deprecate now. The difference would be what happens for 4.0 (or 5.0...):
> Either changing the access modifier or introducing interfaces. Therefore
> it's something we could change our minds on before that major release, if
> there were other motivations for using interfaces.
>
> I've added this to the rejected alternatives.
>
> Kind regards,
>
> Tom
>
> On Thu, Sep 9, 2021 at 3:54 PM Josep Prat 
> wrote:
>
> > Hi Tom,
> > Thanks for the KIP,
> > I have one question. Have you considered converting those classes to
> > interfaces? This would also solve the problem. I must admit that the
> change
> > is way bigger this way, but I would argue it offers a cleaner distinction
> > and it offers more safety against mistakenly making methods public that
> > shouldn't be.
> >
> >
> > Thanks!
> > ———
> > Josep Prat
> >
> > Aiven Deutschland GmbH
> >
> > Immanuelkirchstraße 26, 10405 Berlin
> >
> > Amtsgericht Charlottenburg, HRB 209739 B
> >
> > Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> >
> > m: +491715557497
> >
> > w: aiven.io
> >
> > e: josep.p...@aiven.io
> >
> > On Thu, Sep 9, 2021, 16:25 Tom Bentley  wrote:
> >
> > > Hi,
> > >
> > > I've written a small KIP-774 that proposes to deprecate public access
> to
> > > the Admin client's *Result constructors:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-774%3A+Deprecate+public+access+to+Admin+client%27s+*Result+constructors
> > >
> > > I'd be grateful for any comments you may have.
> > >
> > > Kind regards,
> > >
> > > Tom
> > >
> >
>


Re: [DISCUSS] KIP-774: Deprecate public access to Admin client's *Result constructors

2021-09-09 Thread Tom Bentley
Hi Josep,

Thanks for the question! I did consider it briefly:

* As you mention, it would also be quite a lot more code (interfaces +
impls) for relatively small protection.
* It wouldn't simplify life for people wanting to mock the Admin client. It
would break code at runtime for people who (for whatever reason) are
already using reflection to instantiate the package-private constructors.
* I don't  _think_ it would be binary compatible for clients that were just
invoking methods on Result instances (specifically, I _think_ the JVM
rejects bytecode that uses invokevirtual on an interface, the byte code
should be using invokeinterface instead). Maybe that's not a huge problem
because people should not be substituting a Kafka 4.x jar for a 3.x one at
runtime.

So on balance I'm inclined to the simpler approach in the KIP.

If we _did_ decide to switch to interface in 4.0, deprecating the
constructors would be a first step anyway, so it would still make sense to
deprecate now. The difference would be what happens for 4.0 (or 5.0...):
Either changing the access modifier or introducing interfaces. Therefore
it's something we could change our minds on before that major release, if
there were other motivations for using interfaces.

I've added this to the rejected alternatives.

Kind regards,

Tom
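
For concreteness, a rough sketch of what the proposed deprecation could look like on one
of the affected Result classes. The class name and constructor below are illustrative,
not taken from the KIP:

```
import org.apache.kafka.common.KafkaFuture;

// Illustrative only: a hypothetical *Result class whose public constructor is
// deprecated ahead of being made non-public in a later major release.
public class SomeOperationResult {

    private final KafkaFuture<Void> future;

    /**
     * @deprecated Not intended for public use; mock the Admin interface instead.
     * This constructor is expected to become non-public in a future major release.
     */
    @Deprecated
    public SomeOperationResult(KafkaFuture<Void> future) {
        this.future = future;
    }

    public KafkaFuture<Void> all() {
        return future;
    }
}
```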

On Thu, Sep 9, 2021 at 3:54 PM Josep Prat 
wrote:

> Hi Tom,
> Thanks for the KIP,
> I have one question. Have you considered converting those classes to
> interfaces? This would also solve the problem. I must admit that the change
> is way bigger this way, but I would argue it offers a cleaner distinction
> and it offers more safety against mistakenly making methods public that
> shouldn't be.
>
>
> Thanks!
> ———
> Josep Prat
>
> Aiven Deutschland GmbH
>
> Immanuelkirchstraße 26, 10405 Berlin
>
> Amtsgericht Charlottenburg, HRB 209739 B
>
> Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
>
> m: +491715557497
>
> w: aiven.io
>
> e: josep.p...@aiven.io
>
> On Thu, Sep 9, 2021, 16:25 Tom Bentley  wrote:
>
> > Hi,
> >
> > I've written a small KIP-774 that proposes to deprecate public access to
> > the Admin client's *Result constructors:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-774%3A+Deprecate+public+access+to+Admin+client%27s+*Result+constructors
> >
> > I'd be grateful for any comments you may have.
> >
> > Kind regards,
> >
> > Tom
> >
>


Re: [DISCUSS] KIP-774: Deprecate public access to Admin client's *Result constructors

2021-09-09 Thread Josep Prat
Hi Tom,
Thanks for the KIP,
I have one question. Have you considered converting those classes to
interfaces? This would also solve the problem. I must admit that the change
is way bigger this way, but I would argue it offers a cleaner distinction
and it offers more safety against mistakenly making methods public that
shouldn't be.


Thanks!
———
Josep Prat

Aiven Deutschland GmbH

Immanuelkirchstraße 26, 10405 Berlin

Amtsgericht Charlottenburg, HRB 209739 B

Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen

m: +491715557497

w: aiven.io

e: josep.p...@aiven.io

On Thu, Sep 9, 2021, 16:25 Tom Bentley  wrote:

> Hi,
>
> I've written a small KIP-774 that proposes to deprecate public access to
> the Admin client's *Result constructors:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-774%3A+Deprecate+public+access+to+Admin+client%27s+*Result+constructors
>
> I'd be grateful for any comments you may have.
>
> Kind regards,
>
> Tom
>


[DISCUSS] KIP-774: Deprecate public access to Admin client's *Result constructors

2021-09-09 Thread Tom Bentley
Hi,

I've written a small KIP-774 that proposes to deprecate public access to
the Admin client's *Result constructors:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-774%3A+Deprecate+public+access+to+Admin+client%27s+*Result+constructors

I'd be grateful for any comments you may have.

Kind regards,

Tom


[jira] [Created] (KAFKA-13285) Use consistent access modifier for Admin clients Result classes

2021-09-09 Thread Tom Bentley (Jira)
Tom Bentley created KAFKA-13285:
---

 Summary: Use consistent access modifier for Admin clients Result 
classes
 Key: KAFKA-13285
 URL: https://issues.apache.org/jira/browse/KAFKA-13285
 Project: Kafka
  Issue Type: Task
  Components: admin
Affects Versions: 3.0.0
Reporter: Tom Bentley


The following classes in the Admin client have public constructors, while the 
rest have package-private constructors:
AlterClientQuotasResult
AlterUserScramCredentialsResult
DeleteRecordsResult
DescribeClientQuotasResult
DescribeConsumerGroupsResult
ListOffsetsResult

There should be consistency across all the Result classes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Build failed in Jenkins: Kafka » kafka-2.7-jdk8 #178

2021-09-09 Thread Apache Jenkins Server
See 


Changes:

[Konstantine Karantasis] MINOR: Remove unsupported rsync and ssh commands from 
release.py (#11309)


--
[...truncated 6.91 MB...]

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareKeyValueTimestampWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordWithExpectedRecordForCompareValueTimestamp 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordWithExpectedRecordForCompareValueTimestamp 
PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullExpectedRecordForCompareValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullExpectedRecordForCompareValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValueTimestampWithProducerRecord
 STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValueTimestampWithProducerRecord
 PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordForCompareKeyValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordForCompareKeyValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValueWithProducerRecord 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValueWithProducerRecord 
PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueAndTimestampIsEqualForCompareKeyValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueAndTimestampIsEqualForCompareKeyValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareKeyValueTimestampWithProducerRecord
 STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareKeyValueTimestampWithProducerRecord
 PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullReversForCompareKeyValueWithProducerRecord 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullReversForCompareKeyValueWithProducerRecord 
PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueIsEqualWithNullForCompareKeyValueWithProducerRecord 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueIsEqualWithNullForCompareKeyValueWithProducerRecord 
PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullForCompareKeyValueTimestampWithProducerRecord 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullForCompareKeyValueTimestampWithProducerRecord 
PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareKeyValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareKeyValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueAndTimestampIsEqualForCompareValueTimestampWithProducerRecord 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueAndTimestampIsEqualForCompareValueTimestampWithProducerRecord 
PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReverseForCompareValueTimestampWithProducerRecord
 STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReverseForCompareValueTimestampWithProducerRecord
 PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordsFromKeyValuePairs STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordsFromKeyValuePairs PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithNullKeyAndDefaultTimestamp 
STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithNullKeyAndDefaultTimestamp 
PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithDefaultTimestamp 
STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithDefaultTimestamp 
PASSED

org.apache.kafka.streams.test.Consum

Jenkins build is still unstable: Kafka » Kafka Branch Builder » 2.8 #78

2021-09-09 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: Kafka » Kafka Branch Builder » 3.0 #131

2021-09-09 Thread Apache Jenkins Server
See