[jira] [Created] (KAFKA-13062) refactor DeleteConsumerGroupsHandler

2021-07-12 Thread Luke Chen (Jira)
Luke Chen created KAFKA-13062:
-

 Summary: refactor DeleteConsumerGroupsHandler
 Key: KAFKA-13062
 URL: https://issues.apache.org/jira/browse/KAFKA-13062
 Project: Kafka
  Issue Type: Sub-task
Reporter: Luke Chen






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13063) refactor DescribeConsumerGroupsHandler

2021-07-12 Thread Luke Chen (Jira)
Luke Chen created KAFKA-13063:
-

 Summary: refactor DescribeConsumerGroupsHandler
 Key: KAFKA-13063
 URL: https://issues.apache.org/jira/browse/KAFKA-13063
 Project: Kafka
  Issue Type: Sub-task
Reporter: Luke Chen






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13064) refactor ListConsumerGroupOffsetsHandler

2021-07-12 Thread Luke Chen (Jira)
Luke Chen created KAFKA-13064:
-

 Summary: refactor ListConsumerGroupOffsetsHandler
 Key: KAFKA-13064
 URL: https://issues.apache.org/jira/browse/KAFKA-13064
 Project: Kafka
  Issue Type: Sub-task
Reporter: Luke Chen
Assignee: Luke Chen






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[VOTE] KIP-655: Windowed "Distinct" Operation for Kafka Streams

2021-07-12 Thread Ivan Ponomarev

Hello all!

I'd like to start the vote for KIP-655 which proposes the zero-arg 
distinct() method to be added to TimeWindowedKStream and 
SessionWindowedKStream.


https://cwiki.apache.org/confluence/display/KAFKA/KIP-655%3A+Windowed+Distinct+Operation+for+Kafka+Streams+API

Regards,

Ivan


[jira] [Created] (KAFKA-13065) Replace EasyMock with Mockito for BasicAuthSecurityRestExtensionTest

2021-07-12 Thread Chun-Hao Tang (Jira)
Chun-Hao Tang created KAFKA-13065:
-

 Summary: Replace EasyMock with Mockito for 
BasicAuthSecurityRestExtensionTest
 Key: KAFKA-13065
 URL: https://issues.apache.org/jira/browse/KAFKA-13065
 Project: Kafka
  Issue Type: Sub-task
Reporter: Chun-Hao Tang
Assignee: Chun-Hao Tang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13066) Replace EasyMock with Mockito for FileStreamSinkConnectorTest

2021-07-12 Thread Chun-Hao Tang (Jira)
Chun-Hao Tang created KAFKA-13066:
-

 Summary: Replace EasyMock with Mockito for 
FileStreamSinkConnectorTest
 Key: KAFKA-13066
 URL: https://issues.apache.org/jira/browse/KAFKA-13066
 Project: Kafka
  Issue Type: Sub-task
Reporter: Chun-Hao Tang
Assignee: Chun-Hao Tang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] KIP-655: Windowed "Distinct" Operation for KStream

2021-07-12 Thread Bruno Cadonna

Hi Ivan,

Thank you for the KIP!

Some aspects are not clear to me from the KIP and I have a proposal.

1. The KIP does not describe the criteria that define a duplicate. Could 
you add a definition of duplicate to the KIP?


2. The KIP does not describe what happens if distinct() is applied on a 
hopping window. On the DSL level, I do not see how you can avoid that 
users apply distinct() on a hopping window, i.e., you cannot avoid it at 
compile time, you need to check it at runtime and throw an exception. Is 
this correct or am I missing something?


3. I would also like to back a proposal by Sophie. She proposed to use 
deduplicate() instead of distinct(), since the other DSL operations are 
also verbs. I do not think that SQL and the Java Stream API are good 
arguments to not use a verb.


Best,
Bruno


On 10.07.21 19:11, John Roesler wrote:

Hi Ivan,

Sorry for the silence!

I have just re-read the proposal.

To summarize, you are now only proposing the zero-arg distinct() method to be 
added to TimeWindowedKStream and SessionWindowedKStream, right?

I’m in favor of this proposal.

Thanks,
John

On Sat, Jul 10, 2021, at 10:18, Ivan Ponomarev wrote:

Hello everyone,

I would like to remind you about KIP-655 and KIP-759 just in case they
got lost in your inbox.

Now the initial proposal is split into two independent and smaller ones,
so it must be easier to review them. Of course, if you have time.

Regards,

Ivan


24.06.2021 18:11, Ivan Ponomarev пишет:

Hello all,

I have rewritten the KIP-655 summarizing what was agreed upon during
this discussion (now the proposal is much simpler and less invasive).

I have also created KIP-759 (cancelRepartition operation) and started a
discussion for it.

Regards,

Ivan.



04.06.2021 8:15, Matthias J. Sax пишет:

Just skimmed over the thread -- first of all, I am glad that we could
merge KIP-418 and ship it :)

About the re-partitioning concerns, there are already two tickets for it:

   - https://issues.apache.org/jira/browse/KAFKA-4835
   - https://issues.apache.org/jira/browse/KAFKA-10844

Thus, it seems best to exclude this topic from this KIP, and do a
separate KIP for it (if necessary, we can "pause" this KIP until the
repartition KIP is done). It's a long standing "issue" and we should
resolve it in a general way I guess.

(Did not yet read all responses in detail, so keeping this comment
short.)


-Matthias

On 6/2/21 6:35 AM, John Roesler wrote:

Thanks, Ivan!

That sounds like a great plan to me. Two smaller KIPs are easier to
agree on than one big one.

I agree hopping and sliding windows will actually have a duplicating
effect. We can avoid adding distinct() to the sliding window
interface, but hopping windows are just a different parameterization
of epoch-aligned windows. It seems we can’t do much about that except
document the issue.

Thanks,
John

On Wed, May 26, 2021, at 10:14, Ivan Ponomarev wrote:

Hi John!

I think that your proposal is just fantastic, it simplifies things a
lot!

I also felt uncomfortable due to the fact that the proposed
`distinct()`
is not somewhere near `count()` and `reduce(..)`. But
`selectKey(..).groupByKey().windowedBy(..).distinct()` didn't look like
a correct option for me because of the issue with the unneeded
repartitioning.

The bold idea that we can just CANCEL the repartitioning didn't come to
my mind.

What seemed to me a single problem is in fact two unrelated problems:
`distinct` operation and cancelling the unneeded repartitioning.

   > what if we introduce a parameter to `selectKey()` that specifies
that
the caller asserts that the new key does _not_ change the data
partitioning?

I think a more elegant solution would be not to add a new parameter to
`selectKey` and all the other key-changing operations (`map`,
`transform`, `flatMap`, ...), but add a new operator
`KStream#cancelRepartitioning()` that resets `keyChangingOperation`
flag
for the upstream node. Of course, "use it only if you know what you're
doing" warning is to be added. Well, it's a topic for a separate KIP!

Concerning `distinct()`. If we use `XXXWindowedKStream` facilities,
then
changes to the API are minimally invasive: we're just adding
`distinct()` to TimeWindowedKStream and SessionWindowedKStream, and
that's all.

We can now define `distinct` as an operation that returns only the first
record that falls into a new window and filters out all the other
records that fall into an already existing window. BTW, we can mock the
behaviour of such an operation with `TopologyTestDriver` using
`reduce((l, r) -> STOP).filterNot((k, v) -> STOP.equals(v))`.  ;-)
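
For illustration, a rough sketch of that emulation in today's DSL (topic names, serdes, and the STOP sentinel are made up; this is only the workaround described above, not the proposed distinct()):

```java
import java.time.Duration;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.TimeWindows;

public class WindowedDistinctEmulation {

    // Arbitrary sentinel: every record that is NOT the first one in its window collapses to this.
    private static final String STOP = "__STOP__";

    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> input =
            builder.stream("events", Consumed.with(Serdes.String(), Serdes.String()));

        KStream<String, String> deduplicated = input
            .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
            .windowedBy(TimeWindows.of(Duration.ofSeconds(10)))
            .reduce((first, later) -> STOP)                        // later records in a window reduce to STOP
            .toStream()
            .filterNot((windowedKey, value) -> STOP.equals(value)) // keep only the first record per window
            .map((windowedKey, value) -> KeyValue.pair(windowedKey.key(), value));

        deduplicated.to("deduplicated-events", Produced.with(Serdes.String(), Serdes.String()));
    }
}
```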

Consider the following example (record times are in seconds):

//three bursts of variously ordered records
4, 5, 6
23, 22, 24
34, 33, 32
//'late arrivals'
7, 22, 35


1. 'Epoch-aligned deduplication' using tumbling windows:

.groupByKey().windowedBy(TimeWindows.of(Duration.ofSeconds(10))).distinct()


produces

(key@[0/1], 4)
(key@[2/3], 23)
(key@[3/4], 34)

-- t

Re: [DISCUSS] KIP-477: Add PATCH method for connector config in Connect REST API

2021-07-12 Thread Chris Egerton
Hi all,

I know it's been a while for this KIP, but in my personal experience the value
of a PATCH method in the REST API has actually increased over time, and I'd
love to have this option for quick, stop-the-bleeding remediation efforts.

One thought that comes to mind is that sometimes it may be useful to
explicitly remove properties from a connector configuration. We might
permit this by allowing users to specify null (the JSON literal, not a
string containing the characters "null") as the value for to-be-removed
properties.
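
For concreteness, a sketch of what such a request might look like against the endpoint the KIP proposes (the PATCH semantics are still only a proposal; the connector name and config keys below are invented):

```
curl -X PATCH -H "Content-Type: application/json" \
  http://localhost:8083/connectors/my-connector/config \
  -d '{
        "tasks.max": "4",
        "transforms": null
      }'
```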

I'd love to see this change if you're still interested in driving it, Ivan.
Hopefully we can give it the attention it deserves in the upcoming months!

Cheers,

Chris

On Fri, Jun 28, 2019 at 4:56 AM Ivan Yurchenko 
wrote:

> Thank you for your feedback Ryanne!
> These are all surely valid concerns and PATCH isn't really necessary or
> suitable for normal production configuration management. However, there are
> cases where quick patching of the configuration is useful, such as hot
> fixes of production or in development.
>
> Overall, the change itself is really tiny and if the cost-benefit balance
> is positive, I'd still like to drive it further.
>
> Ivan
>
> On Wed, 26 Jun 2019 at 17:45, Ryanne Dolan  wrote:
>
> > Ivan, I looked at adding PATCH a while ago as well. I decided not to
> pursue
> > the idea for a few reasons:
> >
> > 1) PATCH is still racy. For example, if you want to add a topic to the
> > "topics" property, you still need to read, modify, and write the existing
> > value. To handle this, you'd need to support atomic sub-document
> > operations, which I don't see happening.
> >
> > 2) A common pattern is to store your configurations in git or something,
> > and then apply them via PUT. Throw in some triggers or jenkins etc, and
> you
> > have a more robust solution than PATCH provides.
> >
> > 3) For properties that change a lot, it's possible to use an out-of-band
> > data source, e.g. Kafka or Zookeeper, and then have your Connector
> > subscribe to changes. I've done something like this to enable dynamic
> > reconfiguration of Connectors from command-line tools and dashboards
> > without involving the Connect REST API at all. Moreover, I've done so in
> an
> > atomic, non-racy way.
> >
> > So I don't think PATCH is strictly necessary nor sufficient for atomic
> > partial updates. That said, it doesn't hurt and I'm happy to support the
> > KIP.
> >
> > Ryanne
> >
> > On Tue, Jun 25, 2019 at 12:15 PM Ivan Yurchenko <
> ivan0yurche...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > Since Kafka 2.3 has just been released and more people may have time to
> > look
> > > at this now, I'd like to bump this discussion.
> > > Thanks.
> > >
> > > Ivan
> > >
> > >
> > > On Thu, 13 Jun 2019 at 17:20, Ivan Yurchenko  >
> > > wrote:
> > >
> > > > Hello,
> > > >
> > > > I'd like to start the discussion of KIP-477: Add PATCH method for
> > > > connector config in Connect REST API.
> > > >
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-477%3A+Add+PATCH+method+for+connector+config+in+Connect+REST+API
> > > >
> > > > There is also a draft PR: https://github.com/apache/kafka/pull/6934.
> > > >
> > > > Thank you.
> > > >
> > > > Ivan
> > > >
> > >
> >
>


[jira] [Created] (KAFKA-13067) KafkaMetadataLog should not use segment size smaller than KafkaRaftClient max batch size

2021-07-12 Thread David Arthur (Jira)
David Arthur created KAFKA-13067:


 Summary: KafkaMetadataLog should not use segment size smaller than 
KafkaRaftClient max batch size
 Key: KAFKA-13067
 URL: https://issues.apache.org/jira/browse/KAFKA-13067
 Project: Kafka
  Issue Type: Bug
Reporter: David Arthur






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[DISCUSS] KIP-761: Add total blocked time metric to streams

2021-07-12 Thread Rohan Desai
https://cwiki.apache.org/confluence/display/KAFKA/KIP-761%3A+Add+Total+Blocked+Time+Metric+to+Streams


[jira] [Resolved] (KAFKA-13003) KafkaBroker advertises socket port instead of the configured advertised port

2021-07-12 Thread Uwe Eisele (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Eisele resolved KAFKA-13003.

Resolution: Fixed

Pull Request #10935 has been merged.

> KafkaBroker advertises socket port instead of the configured advertised port
> 
>
> Key: KAFKA-13003
> URL: https://issues.apache.org/jira/browse/KAFKA-13003
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.8.0
>Reporter: Uwe Eisele
>Assignee: Uwe Eisele
>Priority: Critical
>  Labels: kip-500
> Fix For: 3.0.0
>
>
> In KRaft mode, Apache Kafka 2.8.0 advertises the socket port instead of 
> the configured advertised port.
> A broker with the following configuration
> {code:java}
> listeners=PUBLIC://0.0.0.0:19092,REPLICATION://0.0.0.0:9091
> advertised.listeners=PUBLIC://envoy-kafka-broker:9091,REPLICATION://kafka-broker1:9091
> {code}
> advertises _envoy-kafka-broker:19092_ on the _PUBLIC_ listener; however, I 
> would expect _envoy-kafka-broker:9091_ to be advertised. In ZooKeeper mode 
> it works as expected.
> In a deployment with an L4 proxy in front of the Kafka cluster, it is 
> important that the advertised port can be different from the actual socket 
> port.
> I tested it with a Docker-Compose setup which runs 3 Kafka brokers in KRaft 
> mode and an Envoy proxy in front of them. With Apache Kafka 2.8.0 it does not 
> work, because Kafka does not advertise the configured advertised port. For 
> more details see: 
> https://github.com/ueisele/kafka/tree/fix/kraft-advertisedlisteners-build/proxy-examples/proxyl4-kafkakraft-bug-2.8
> _Client -- 909[1-3] --> Envoy Proxy -- 19092 --> Kafka Broker [1-3]_
> || Envoy Host || Envoy Port || Kafka Broker || Kafka Port || Advertised Listener ||
> | envoy-kafka-broker | 9091 | kafka-broker1 | 19092 | envoy-kafka-broker:9091 |
> | envoy-kafka-broker | 9092 | kafka-broker2 | 19092 | envoy-kafka-broker:9092 |
> | envoy-kafka-broker | 9093 | kafka-broker3 | 19092 | envoy-kafka-broker:9093 |
> {code:bash}
> > docker-compose exec kafkacat kafkacat -b envoy-kafka-broker:9091 -L
> Metadata for all topics (from broker -1: envoy-kafka-broker:9091/bootstrap):
>  3 brokers:
>   broker 101 at envoy-kafka-broker:19092
>   broker 102 at envoy-kafka-broker:19092 (controller)
>   broker 103 at envoy-kafka-broker:19092
>  0 topics:
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] KIP-405 + KAFKA-7739 - Implementation of Tiered Storage Integration with Azure Storage

2021-07-12 Thread Sumant Tambe
Hi Israel,

LinkedIn is interested in evaluating KIP-405 for HDFS and S3 in the short
term and Azure Blob Storage in the long run. You may already know that LinkedIn
is migrating to Azure. We think
that Blobs will provide us with the optimal cost/availability/operability
trade-offs for Kafka in Azure.
What's the context behind your interest in KIP-405 and Azure Blobs? Do you
have any data/experience of using Azure blobs at scale?

Satish, any word on the KIP-405 and the RSM implementations?

Regards,
Sumant
Kafka Dev, Linkedin

On Mon, 15 Mar 2021 at 09:08, Satish Duggana 
wrote:

> Hi Israel,
> Thanks for your interest in tiered storage. As mentioned by Jun earlier, we
> decided not to have any implementations in Apache Kafka repo like Kafka
> connectors. We plan to have RSM implementations for HDFS, S3, GCP, and
> Azure storages in a separate repo. We will let you know once they are ready
> for review.
>
> Best,
> Satish.
>
> On Sat, 13 Mar 2021 at 01:27, Israel Ekpo  wrote:
>
> > Thanks @Jun for the prompt response.
> >
> > That's ok and I think it is a great strategy just like the Connect
> > ecosystem.
> >
> > However, I am still in search for repos and samples that demonstrate
> > implementation for the KIP.
> >
> > I will keep searching but was just wondering if there were sample
> > implementations for S3 or HDFS I could take a look at.
> >
> > Thanks.
> >
> > On Fri, Mar 12, 2021 at 2:19 PM Jun Rao 
> wrote:
> >
> > > Hi, Israel,
> > >
> > > Thanks for your interest. As part of KIP-405, we have made the decision
> > not
> > > to host any plugins for external remote storage directly in Apache
> Kafka.
> > > Those plugins could be hosted outside of Apache Kafka.
> > >
> > > Jun
> > >
> > > On Thu, Mar 11, 2021 at 5:15 PM Israel Ekpo 
> > wrote:
> > >
> > > > Thanks Satish, Sriharsha, Suresh and Ying for authoring this KIP and
> > > thanks
> > > > to everyone that participated in the review and discussion to take it
> > to
> > > > where it is today.
> > > >
> > > > I would like to contribute by working on integrating Azure Storage
> > (Blob
> > > > and ADLS) with Tiered Storage for this KIP
> > > >
> > > > I have created this issue to track this work
> > > > https://issues.apache.org/jira/browse/KAFKA-12458
> > > >
> > > > Are there any sample implementations for HDFS/S3 that I can reference
> > to
> > > > get started?
> > > >
> > > > When you have a moment, please share.
> > > >
> > > > Thanks.
> > > >
> > >
> >
>


Re: [DISCUSS] Apache Kafka 3.0.0 release plan with new updated dates

2021-07-12 Thread Konstantine Karantasis
Thanks for the update Levani,

KIP-708 is now on the list of postponed KIPs.

Konstantine

On Thu, Jul 1, 2021 at 10:48 PM Levani Kokhreidze 
wrote:

> Hi Konstantine,
>
> FYI, I don’t think we will be able to have KIP-708 ready on time.
> Feel free to remove it from the release plan.
>
> Best,
> Levani
>
> > On 1. Jul 2021, at 01:27, Konstantine Karantasis <
> konstant...@confluent.io.INVALID> wrote:
> >
> > Hi all,
> >
> > Today we have reached the Feature Freeze milestone for Apache Kafka 3.0.
> > Exciting!
> >
> > I'm going to allow for any pending changes to settle within the next
> couple
> > of days.
> > I trust that we all approve and merge adopted features and changes which
> we
> > consider to be in good shape for 3.0.
> >
> > Given the 4th of July holiday in the US, the 3.0 release branch will
> appear
> > sometime on Tuesday, July 6th.
> > Until then, please keep merging to trunk only the changes you intend to
> > include in Apache Kafka 3.0.
> >
> > Regards,
> > Konstantine
> >
> >
> > On Wed, Jun 30, 2021 at 3:25 PM Konstantine Karantasis <
> > konstant...@confluent.io> wrote:
> >
> >>
> >> Done. Thanks Luke!
> >>
> >> On Tue, Jun 29, 2021 at 6:39 PM Luke Chen  wrote:
> >>
> >>> Hi Konstantine,
> >>> We've decided that the KIP-726 will be released in V3.1, not V3.0.
> >>> KIP-726: Make the "cooperative-sticky, range" as the default assignor
> >>>
> >>> Could you please remove this KIP from the 3.0 release plan wiki page?
> >>>
> >>> Thank you.
> >>> Luke
> >>>
> >>> On Wed, Jun 30, 2021 at 8:23 AM Konstantine Karantasis
> >>>  wrote:
> >>>
>  Thanks for the update Colin.
>  They are now both in the release plan.
> 
>  Best,
>  Konstantine
> 
>  On Tue, Jun 29, 2021 at 2:55 PM Colin McCabe 
> >>> wrote:
> 
> > Hi Konstantine,
> >
> > Can you please add two KIPs to the 3.0 release plan wiki page?
> >
> > I'm thinking of:
> >KIP-630: Kafka Raft Snapshots
> >KIP-746: Revise KRaft Metadata Records
> >
> > These are marked as 3.0 on the KIP page but I guess we don't have
> >>> them on
> > the page yet.
> >
> > Many thanks.
> > Colin
> >
> >
> > On Tue, Jun 22, 2021, at 06:29, Josep Prat wrote:
> >> Hi there,
> >>
> >> As the feature freeze date is approaching, I just wanted to kindly
> >>> ask
> > for
> >> some reviews on the already submitted PR (
> >> https://github.com/apache/kafka/pull/10840) that implements the
>  approved
> >> KIP-744 (https://cwiki.apache.org/confluence/x/XIrOCg). The PR has
>  been
> >> ready for review for 2 weeks, and I simply want to make sure there
> >>> is
> >> enough time to address any possible changes that might be requested.
> >>
> >> Thanks in advance and sorry for any inconvenience caused,
> >> --
> >> Josep
> >> On Mon, Jun 21, 2021 at 11:54 PM Konstantine Karantasis
> >>  wrote:
> >>
> >>> Thanks for the update Bruno.
> >>> I've moved KIP-698 to the list of postponed KIPs in the plan.
> >>>
> >>> Konstantine
> >>>
> >>> On Mon, Jun 21, 2021 at 2:30 AM Bruno Cadonna  
> > wrote:
> >>>
>  Hi Konstantine,
> 
>  The implementation of
> 
>  KIP-698: Add Explicit User Initialization of Broker-side State
> >>> to
> > Kafka
>  Streams
> 
>  will not be ready for 3.0, so you can remove it from the list.
> 
>  Best,
>  Bruno
> 
>  On 15.06.21 07:33, Konstantine Karantasis wrote:
> > Done. Moved it into the table of Adopted KIPs targeting 3.0.0
> >>> and
> > to
> >>> the
> > release plan of course.
> > Thanks for catching this Israel.
> >
> > Best,
> > Konstantine
> >
> > On Mon, Jun 14, 2021 at 7:40 PM Israel Ekpo <
>  israele...@gmail.com>
>  wrote:
> >
> >> Konstantine,
> >>
> >> One of mine is missing from this list
> >>
> >> KIP-633: Drop 24 hour default of grace period in Streams
> >> Please could you include it?
> >>
> >> Voting has already concluded a long time ago
> >>
> >>
> >>
> >> On Mon, Jun 14, 2021 at 6:08 PM Konstantine Karantasis
> >>  wrote:
> >>
> >>> Hi all.
> >>>
> >>> KIP Freeze for the next major release of Apache Kafka was
>  reached
> >>> last
> >>> week.
> >>>
> >>> As of now, 36 KIPs have concluded their voting process and
> >>> have
> > been
> >>> adopted.
> >>> These KIPs are targeting 3.0 (unless it's noted otherwise in
>  the
>  release
> >>> plan) and their inclusion as new features will be finalized
>  right
> >>> after
> >>> Feature Freeze.
> >>>
> >>> At the high level, out of these 36 KIPs, 11 hav

[jira] [Reopened] (KAFKA-7760) Add broker configuration to set minimum value for segment.bytes and segment.ms

2021-07-12 Thread Badai Aqrandista (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Badai Aqrandista reopened KAFKA-7760:
-

Reopening issue and making this the main ticket for KIP-760

> Add broker configuration to set minimum value for segment.bytes and segment.ms
> --
>
> Key: KAFKA-7760
> URL: https://issues.apache.org/jira/browse/KAFKA-7760
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Badai Aqrandista
>Assignee: Badai Aqrandista
>Priority: Major
>  Labels: kip, newbie
>
> If someone sets segment.bytes or segment.ms at the topic level to a very small 
> value (e.g. segment.bytes=1000 or segment.ms=1000), Kafka will generate a 
> very high number of segment files. This can bring down the whole broker due 
> to hitting the maximum number of open files (for logs) or the maximum number of 
> mmap-ed files (for indexes).
> To prevent that from happening, I would like to suggest adding two new items 
> to the broker configuration:
>  * min.topic.segment.bytes, defaults to 1048576: The minimum value for 
> segment.bytes. When someone sets topic configuration segment.bytes to a value 
> lower than this, Kafka throws an error INVALID VALUE.
>  * min.topic.segment.ms, defaults to 360: The minimum value for 
> segment.ms. When someone sets topic configuration segment.ms to a value lower 
> than this, Kafka throws an error INVALID VALUE.
> Thanks
> Badai
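
For context, a sketch of how easily such values can be set with today's tooling (topic name invented); the broker-side minimums proposed above would reject requests like this:

```
# Today any user allowed to alter topic configs can do this (topic name invented):
bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name my-topic \
  --alter --add-config segment.ms=1000,segment.bytes=1000
# ...which rolls a new segment roughly every second, eventually exhausting
# open-file / mmap limits; the proposed min.topic.* settings would reject it.
```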



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-13040) Increase minimum value of segment.ms and segment.bytes

2021-07-12 Thread Badai Aqrandista (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Badai Aqrandista resolved KAFKA-13040.
--
Resolution: Duplicate

> Increase minimum value of segment.ms and segment.bytes
> --
>
> Key: KAFKA-13040
> URL: https://issues.apache.org/jira/browse/KAFKA-13040
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Badai Aqrandista
>Assignee: Badai Aqrandista
>Priority: Minor
>
> Raised for KIP-760 (linked).
> Many times, Kafka brokers in production crash with "Too many open files" 
> errors or "Out of memory" errors because some Kafka topics have a lot of 
> segment files as a result of small {{segment.ms}} or {{segment.bytes}}. These 
> two configurations can be set by any user who is authorized to create topics or 
> modify topic configurations.
> To prevent these two configurations from causing Kafka broker crashes, they 
> should have a sufficiently large minimum value.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Apache Kafka 3.0.0 release plan with new updated dates

2021-07-12 Thread Konstantine Karantasis
Hi all,

This is a reminder that Code Freeze for Apache Kafka 3.0 is coming up this
week and is set to take place by the end of day Wednesday, July 14th.

Currently in the project we have 22 blocker issues for 3.0, out of 41 total
tickets targeting 3.0.

You may find the list of open issues in the release plan page:
https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+3.0.0

Thanks for all the hard work so far and for reducing the number of open
issues in the recent days.
Please take another look and help us resolve all the blockers for this
upcoming major release.

Best,
Konstantine

On Mon, Jul 12, 2021 at 1:57 PM Konstantine Karantasis <
konstant...@confluent.io> wrote:

>
> Thanks for the update Levani,
>
> KIP-708 is now on the list of postponed KIPs.
>
> Konstantine
>
> On Thu, Jul 1, 2021 at 10:48 PM Levani Kokhreidze 
> wrote:
>
>> Hi Konstantine,
>>
>> FYI, I don’t think we will be able to have KIP-708 ready on time.
>> Feel free to remove it from the release plan.
>>
>> Best,
>> Levani
>>
>> > On 1. Jul 2021, at 01:27, Konstantine Karantasis <
>> konstant...@confluent.io.INVALID> wrote:
>> >
>> > Hi all,
>> >
>> > Today we have reached the Feature Freeze milestone for Apache Kafka 3.0.
>> > Exciting!
>> >
>> > I'm going to allow for any pending changes to settle within the next
>> couple
>> > of days.
>> > I trust that we all approve and merge adopted features and changes
>> which we
>> > consider to be in good shape for 3.0.
>> >
>> > Given the 4th of July holiday in the US, the 3.0 release branch will
>> appear
>> > sometime on Tuesday, July 6th.
>> > Until then, please keep merging to trunk only the changes you intend to
>> > include in Apache Kafka 3.0.
>> >
>> > Regards,
>> > Konstantine
>> >
>> >
>> > On Wed, Jun 30, 2021 at 3:25 PM Konstantine Karantasis <
>> > konstant...@confluent.io> wrote:
>> >
>> >>
>> >> Done. Thanks Luke!
>> >>
>> >> On Tue, Jun 29, 2021 at 6:39 PM Luke Chen  wrote:
>> >>
>> >>> Hi Konstantine,
>> >>> We've decided that the KIP-726 will be released in V3.1, not V3.0.
>> >>> KIP-726: Make the "cooperative-sticky, range" as the default assignor
>> >>>
>> >>> Could you please remove this KIP from the 3.0 release plan wiki page?
>> >>>
>> >>> Thank you.
>> >>> Luke
>> >>>
>> >>> On Wed, Jun 30, 2021 at 8:23 AM Konstantine Karantasis
>> >>>  wrote:
>> >>>
>>  Thanks for the update Colin.
>>  They are now both in the release plan.
>> 
>>  Best,
>>  Konstantine
>> 
>>  On Tue, Jun 29, 2021 at 2:55 PM Colin McCabe 
>> >>> wrote:
>> 
>> > Hi Konstantine,
>> >
>> > Can you please add two KIPs to the 3.0 release plan wiki page?
>> >
>> > I'm thinking of:
>> >KIP-630: Kafka Raft Snapshots
>> >KIP-746: Revise KRaft Metadata Records
>> >
>> > These are marked as 3.0 on the KIP page but I guess we don't have
>> >>> them on
>> > the page yet.
>> >
>> > Many thanks.
>> > Colin
>> >
>> >
>> > On Tue, Jun 22, 2021, at 06:29, Josep Prat wrote:
>> >> Hi there,
>> >>
>> >> As the feature freeze date is approaching, I just wanted to kindly
>> >>> ask
>> > for
>> >> some reviews on the already submitted PR (
>> >> https://github.com/apache/kafka/pull/10840) that implements the
>>  approved
>> >> KIP-744 (https://cwiki.apache.org/confluence/x/XIrOCg). The PR has
>>  been
>> >> ready for review for 2 weeks, and I simply want to make sure there
>> >>> is
>> >> enough time to address any possible changes that might be
>> requested.
>> >>
>> >> Thanks in advance and sorry for any inconvenience caused,
>> >> --
>> >> Josep
>> >> On Mon, Jun 21, 2021 at 11:54 PM Konstantine Karantasis
>> >>  wrote:
>> >>
>> >>> Thanks for the update Bruno.
>> >>> I've moved KIP-698 to the list of postponed KIPs in the plan.
>> >>>
>> >>> Konstantine
>> >>>
>> >>> On Mon, Jun 21, 2021 at 2:30 AM Bruno Cadonna > 
>> > wrote:
>> >>>
>>  Hi Konstantine,
>> 
>>  The implementation of
>> 
>>  KIP-698: Add Explicit User Initialization of Broker-side State
>> >>> to
>> > Kafka
>>  Streams
>> 
>>  will not be ready for 3.0, so you can remove it from the list.
>> 
>>  Best,
>>  Bruno
>> 
>>  On 15.06.21 07:33, Konstantine Karantasis wrote:
>> > Done. Moved it into the table of Adopted KIPs targeting 3.0.0
>> >>> and
>> > to
>> >>> the
>> > release plan of course.
>> > Thanks for catching this Israel.
>> >
>> > Best,
>> > Konstantine
>> >
>> > On Mon, Jun 14, 2021 at 7:40 PM Israel Ekpo <
>>  israele...@gmail.com>
>>  wrote:
>> >
>> >> Konstantine,
>> >>
>> >> One of mine is missing from this list
>> >>
>> >> KIP-633: Drop 24 hour default of grace period in Streams
>> >>

Re: [DISCUSS] KIP-655: Windowed "Distinct" Operation for KStream

2021-07-12 Thread John Roesler
Hi all,

Bruno raised some very good points. I’d like to chime in with additional 
context. 

1. Great point. We faced a similar problem defining KIP-557. For 557, we chose 
to use the serialized byte array instead of the equals() method, but I think 
the situation in KIP-655 is a bit different. I think it might make sense to use 
the equals() method here, but am curious what Ivan thinks.

2. I figured we'd do nothing. I thought Ivan was just saying that it doesn't 
make a ton of sense to use it, which I agree with, but it doesn't seem like 
that means we should prohibit it.

3. FWIW, I don't have a strong feeling either way.

Thanks,
-John

On Mon, Jul 12, 2021, at 09:14, Bruno Cadonna wrote:
> Hi Ivan,
> 
> Thank you for the KIP!
> 
> Some aspects are not clear to me from the KIP and I have a proposal.
> 
> 1. The KIP does not describe the criteria that define a duplicate. Could 
> you add a definition of duplicate to the KIP?
> 
> 2. The KIP does not describe what happens if distinct() is applied on a 
> hopping window. On the DSL level, I do not see how you can avoid that 
> users apply distinct() on a hopping window, i.e., you cannot avoid it at 
> compile time, you need to check it at runtime and throw an exception. Is 
> this correct or am I missing something?
> 
> 3. I would also like to back a proposal by Sophie. She proposed to use 
> deduplicate() instead of distinct(), since the other DSL operations are 
> also verbs. I do not think that SQL and the Java Stream API are good 
> arguments to not use a verb.
> 
> Best,
> Bruno
> 
> 
> On 10.07.21 19:11, John Roesler wrote:
> > Hi Ivan,
> > 
> > Sorry for the silence!
> > 
> > I have just re-read the proposal.
> > 
> > To summarize, you are now only proposing the zero-arg distinct() method to 
> > be added to TimeWindowedKStream and SessionWindowedKStream, right?
> > 
> > I’m in favor of this proposal.
> > 
> > Thanks,
> > John
> > 
> > On Sat, Jul 10, 2021, at 10:18, Ivan Ponomarev wrote:
> >> Hello everyone,
> >>
> >> I would like to remind you about KIP-655 and KIP-759 just in case they
> >> got lost in your inbox.
> >>
> >> Now the initial proposal is split into two independent and smaller ones,
> >> so it must be easier to review them. Of course, if you have time.
> >>
> >> Regards,
> >>
> >> Ivan
> >>
> >>
> >> 24.06.2021 18:11, Ivan Ponomarev пишет:
> >>> Hello all,
> >>>
> >>> I have rewritten the KIP-655 summarizing what was agreed upon during
> >>> this discussion (now the proposal is much simpler and less invasive).
> >>>
> >>> I have also created KIP-759 (cancelRepartition operation) and started a
> >>> discussion for it.
> >>>
> >>> Regards,
> >>>
> >>> Ivan.
> >>>
> >>>
> >>>
> >>> 04.06.2021 8:15, Matthias J. Sax пишет:
>  Just skimmed over the thread -- first of all, I am glad that we could
>  merge KIP-418 and ship it :)
> 
>  About the re-partitioning concerns, there are already two tickets for it:
> 
>     - https://issues.apache.org/jira/browse/KAFKA-4835
>     - https://issues.apache.org/jira/browse/KAFKA-10844
> 
>  Thus, it seems best to exclude this topic from this KIP, and do a
>  separate KIP for it (if necessary, we can "pause" this KIP until the
>  repartition KIP is done). It's a long standing "issue" and we should
>  resolve it in a general way I guess.
> 
>  (Did not yet ready all responses in detail yet, so keeping this comment
>  short.)
> 
> 
>  -Matthias
> 
>  On 6/2/21 6:35 AM, John Roesler wrote:
> > Thanks, Ivan!
> >
> > That sounds like a great plan to me. Two smaller KIPs are easier to
> > agree on than one big one.
> >
> > I agree hopping and sliding windows will actually have a duplicating
> > effect. We can avoid adding distinct() to the sliding window
> > interface, but hopping windows are just a different parameterization
> > of epoch-aligned windows. It seems we can’t do much about that except
> > document the issue.
> >
> > Thanks,
> > John
> >
> > On Wed, May 26, 2021, at 10:14, Ivan Ponomarev wrote:
> >> Hi John!
> >>
> >> I think that your proposal is just fantastic, it simplifies things a
> >> lot!
> >>
> >> I also felt uncomfortable due to the fact that the proposed
> >> `distinct()`
> >> is not somewhere near `count()` and `reduce(..)`. But
> >> `selectKey(..).groupByKey().windowedBy(..).distinct()` didn't look like
> >> a correct option for  me because of the issue with the unneeded
> >> repartitioning.
> >>
> >> The bold idea that we can just CANCEL the repartitioning didn't came to
> >> my mind.
> >>
> >> What seemed to me a single problem is in fact two unrelated problems:
> >> `distinct` operation and cancelling the unneeded repartitioning.
> >>
> >>    > what if we introduce a parameter to `selectKey()` that specifies
> >> that
> >> the caller asserts that the new key doe

[jira] [Created] (KAFKA-13068) Rename Log to UnifiedLog

2021-07-12 Thread Kowshik Prakasam (Jira)
Kowshik Prakasam created KAFKA-13068:


 Summary: Rename Log to UnifiedLog
 Key: KAFKA-13068
 URL: https://issues.apache.org/jira/browse/KAFKA-13068
 Project: Kafka
  Issue Type: Sub-task
Reporter: Kowshik Prakasam
Assignee: Kowshik Prakasam


Once KAFKA-12554 is completed, we can rename Log -> UnifiedLog as described in 
the doc:  
[https://docs.google.com/document/d/1dQJL4MCwqQJSPmZkVmVzshFZKuFy_bCPtubav4wBfHQ/edit#].



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-12360) Improve documentation of max.task.idle.ms (kafka-streams)

2021-07-12 Thread John Roesler (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Roesler resolved KAFKA-12360.
--
Resolution: Fixed

> Improve documentation of max.task.idle.ms (kafka-streams)
> -
>
> Key: KAFKA-12360
> URL: https://issues.apache.org/jira/browse/KAFKA-12360
> Project: Kafka
>  Issue Type: Sub-task
>  Components: docs, streams
>Reporter: Domenico Delle Side
>Assignee: John Roesler
>Priority: Minor
>  Labels: beginner, newbie, trivial
>
> _max.task.idle.ms_ is a handy way to pause processing in a *_kafka-streams_* 
> application. This is very useful when you need to join two topics that are 
> out of sync, i.e. when data in one topic may be produced _before_ you receive 
> the join information in the other topic.
> In the documentation, however, it is not specified that the value of 
> _max.task.idle.ms_ *must* be lower than _max.poll.interval.ms_, otherwise 
> you'll run into an endless rebalancing problem.
> I think it is better to clearly state this in the documentation.
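
For illustration, a minimal sketch of the constraint described above (the values are arbitrary examples, not recommendations):

```java
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.streams.StreamsConfig;

public class TaskIdleConfigExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Task idling must stay well below the consumer's max poll interval,
        // otherwise the member can miss its poll deadline and keep rebalancing.
        props.put(StreamsConfig.MAX_TASK_IDLE_MS_CONFIG, 5_000L);       // 5 seconds of idling
        props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 300_000); // 5 minutes (the default)
        // ... application.id, bootstrap.servers, etc. omitted
    }
}
```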



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-10091) Improve task idling

2021-07-12 Thread John Roesler (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Roesler resolved KAFKA-10091.
--
Resolution: Fixed

> Improve task idling
> ---
>
> Key: KAFKA-10091
> URL: https://issues.apache.org/jira/browse/KAFKA-10091
> Project: Kafka
>  Issue Type: Task
>  Components: streams
>Reporter: John Roesler
>Assignee: John Roesler
>Priority: Blocker
>  Labels: needs-kip
> Fix For: 3.0.0
>
>
> When Streams is processing a task with multiple inputs, each time it is ready 
> to process a record, it has to choose which input to process next. It always 
> takes from the input for which the next record has the least timestamp. The 
> result of this is that Streams processes data in timestamp order. However, if 
> the buffer for one of the inputs is empty, Streams doesn't know what 
> timestamp the next record for that input will be.
> Streams introduced a configuration "max.task.idle.ms" in KIP-353 to address 
> this issue.
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-353%3A+Improve+Kafka+Streams+Timestamp+Synchronization]
> The config allows Streams to wait some amount of time for data to arrive on 
> the empty input, so that it can make a timestamp-ordered decision about which 
> input to pull from next.
> However, this config can be hard to use reliably and efficiently, since what 
> we're really waiting for is the next poll that _would_ return data from the 
> empty input's partition, and this guarantee is a function of the poll 
> interval, the max poll interval, and the internal logic that governs when 
> Streams will poll again.
> The ideal case is you'd be able to guarantee at a minimum that _any_ amount 
> of idling would guarantee you poll data from the empty partition if there's 
> data to fetch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13069) Add magic number to DefaultKafkaPrincipalBuilder.KafkaPrincipalSerde

2021-07-12 Thread Ron Dagostino (Jira)
Ron Dagostino created KAFKA-13069:
-

 Summary: Add magic number to 
DefaultKafkaPrincipalBuilder.KafkaPrincipalSerde
 Key: KAFKA-13069
 URL: https://issues.apache.org/jira/browse/KAFKA-13069
 Project: Kafka
  Issue Type: Bug
Affects Versions: 2.8.0, 3.0.0
Reporter: Ron Dagostino
Assignee: Ron Dagostino
 Fix For: 3.0.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13070) LogManager shutdown races with periodic work scheduled by the instance

2021-07-12 Thread Kowshik Prakasam (Jira)
Kowshik Prakasam created KAFKA-13070:


 Summary: LogManager shutdown races with periodic work scheduled by 
the instance
 Key: KAFKA-13070
 URL: https://issues.apache.org/jira/browse/KAFKA-13070
 Project: Kafka
  Issue Type: Bug
Reporter: Kowshik Prakasam


In the LogManager shutdown sequence (in LogManager.shutdown()), we don't cancel 
the periodic work scheduled by it prior to shutdown. As a result, the periodic 
work could race with the shutdown sequence, causing unwanted side effects. 
This is reproducible by a unit test in LogManagerTest.

 

```

// set val maxLogAgeMs = 6 in the test

@Test
def testRetentionPeriodicWorkAfterShutdown(): Unit = {
  val log = logManager.getOrCreateLog(new TopicPartition(name, 0), topicId = None)
  val logFile = new File(logDir, name + "-0")
  assertTrue(logFile.exists)

  log.appendAsLeader(TestUtils.singletonRecords("test1".getBytes()), leaderEpoch = 0)
  log.updateHighWatermark(log.logEndOffset)

  logManager.shutdown()

  assertTrue(Files.exists(new File(logDir, LogLoader.CleanShutdownFile).toPath))

  time.sleep(maxLogAgeMs + logManager.InitialTaskDelayMs + logManager.retentionCheckMs + 1)

  logManager = null
}

```



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [kafka-site] ableegoldman merged pull request #361: KAFKA-12993: fix memory-mgmt.html formatting

2021-07-12 Thread GitBox


ableegoldman merged pull request #361:
URL: https://github.com/apache/kafka-site/pull/361


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Created] (KAFKA-13071) Deprecate and remove --authorizer option in kafka-acls.sh

2021-07-12 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-13071:
---

 Summary: Deprecate and remove --authorizer option in kafka-acls.sh
 Key: KAFKA-13071
 URL: https://issues.apache.org/jira/browse/KAFKA-13071
 Project: Kafka
  Issue Type: Improvement
Reporter: Jason Gustafson


Now that we have all of the ACL APIs implemented through the admin client, we 
should consider deprecating and removing support for the --authorizer flag in 
kafka-acls.sh.
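
For context, a sketch contrasting the two paths (principal, topic, and ZooKeeper address are invented): the first command uses the admin-client path that would remain; the second uses the direct --authorizer path this ticket proposes to deprecate:

```
# Admin-client based ACL management (the path that remains):
bin/kafka-acls.sh --bootstrap-server localhost:9092 --add \
  --allow-principal User:alice --operation Read --topic orders

# Direct-authorizer access (the path proposed for deprecation and removal):
bin/kafka-acls.sh --authorizer kafka.security.authorizer.AclAuthorizer \
  --authorizer-properties zookeeper.connect=localhost:2181 --add \
  --allow-principal User:alice --operation Read --topic orders
```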



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13072) refactor RemoveMembersFromConsumerGroupHandler

2021-07-12 Thread Luke Chen (Jira)
Luke Chen created KAFKA-13072:
-

 Summary: refactor RemoveMembersFromConsumerGroupHandler
 Key: KAFKA-13072
 URL: https://issues.apache.org/jira/browse/KAFKA-13072
 Project: Kafka
  Issue Type: Sub-task
Reporter: Luke Chen
Assignee: Luke Chen






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Jenkins build is unstable: Kafka » Kafka Branch Builder » trunk #311

2021-07-12 Thread Apache Jenkins Server
See 




Re: [DISCUSS] KIP-405 + KAFKA-7739 - Implementation of Tiered Storage Integration with Azure Storage

2021-07-12 Thread Satish Duggana
Hi Sumant,
We are tracking HDFS and S3 implementations as part of this project to
verify that the proposed APIs address file/object stores. As Jun
mentioned in the earlier mail thread, no RSM implementation will be
part of Apache Kafka. This is in line with how Kafka connectors are
developed. Devs/users can build their own implementations and host
them to share with others.

We have HDFS/S3 RSMs which can be treated as reference
implementations. These can be referenced to build your own
implementations if needed.

Thanks,
Satish.
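
For orientation, a sketch of how an external RSM plugin would be wired into a broker once such implementations are available (the class name and plugin path are invented; the property names follow KIP-405's proposed configuration and may still change):

```
# Hypothetical broker properties wiring in a third-party RSM plugin:
remote.log.storage.system.enable=true
remote.log.storage.manager.class.name=com.example.kafka.tiered.S3RemoteStorageManager
remote.log.storage.manager.class.path=/opt/kafka/plugins/tiered-storage/*
```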


On Tue, 13 Jul 2021 at 02:25, Sumant Tambe  wrote:
>
> Hi Israel,
>
> Linkedin is interested in evaluating KIP-405 for HDFS and S3 in the short
> term and Azure Blob Storage in the long run. You may already know that 
> LinkedIn is migrating to Azure. We think
> that Blobs will provide us with the optimal cost/availability/operability
> trade-offs for Kafka in Azure.
> What's the context behind your interest in KIP-405 and Azure Blobs? Do you
> have any data/experience of using Azure blobs at scale?
>
> Satish, any word on the KIP-405 and the RSM implementations?
>
> Regards,
> Sumant
> Kafka Dev, Linkedin
>
> On Mon, 15 Mar 2021 at 09:08, Satish Duggana 
> wrote:
>
> > Hi Israel,
> > Thanks for your interest in tiered storage. As mentioned by Jun earlier, we
> > decided not to have any implementations in Apache Kafka repo like Kafka
> > connectors. We plan to have RSM implementations for HDFS, S3, GCP, and
> > Azure storages in a separate repo. We will let you know once they are ready
> > for review.
> >
> > Best,
> > Satish.
> >
> > On Sat, 13 Mar 2021 at 01:27, Israel Ekpo  wrote:
> >
> > > Thanks @Jun for the prompt response.
> > >
> > > That's ok and I think it is a great strategy just like the Connect
> > > ecosystem.
> > >
> > > However, I am still in search for repos and samples that demonstrate
> > > implementation for the KIP.
> > >
> > > I will keep searching but was just wondering if there were sample
> > > implementations for S3 or HDFS I could take a look at.
> > >
> > > Thanks.
> > >
> > > On Fri, Mar 12, 2021 at 2:19 PM Jun Rao 
> > wrote:
> > >
> > > > Hi, Israel,
> > > >
> > > > Thanks for your interest. As part of KIP-405, we have made the decision
> > > not
> > > > to host any plugins for external remote storage directly in Apache
> > Kafka.
> > > > Those plugins could be hosted outside of Apache Kafka.
> > > >
> > > > Jun
> > > >
> > > > On Thu, Mar 11, 2021 at 5:15 PM Israel Ekpo 
> > > wrote:
> > > >
> > > > > Thanks Satish, Sriharsha, Suresh and Ying for authoring this KIP and
> > > > thanks
> > > > > to everyone that participated in the review and discussion to take it
> > > to
> > > > > where it is today.
> > > > >
> > > > > I would like to contribute by working on integrating Azure Storage
> > > (Blob
> > > > > and ADLS) with Tiered Storage for this KIP
> > > > >
> > > > > I have created this issue to track this work
> > > > > https://issues.apache.org/jira/browse/KAFKA-12458
> > > > >
> > > > > Are there any sample implementations for HDFS/S3 that I can reference
> > > to
> > > > > get started?
> > > > >
> > > > > When you have a moment, please share.
> > > > >
> > > > > Thanks.
> > > > >
> > > >
> > >
> >


[jira] [Created] (KAFKA-13073) Simulation test fails due to inconsistency in MockLog's implementation

2021-07-12 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-13073:
--

 Summary: Simulation test fails due to inconsistency in MockLog's 
implementation
 Key: KAFKA-13073
 URL: https://issues.apache.org/jira/browse/KAFKA-13073
 Project: Kafka
  Issue Type: Bug
  Components: controller, replication
Affects Versions: 3.0.0
Reporter: Jose Armando Garcia Sancio
Assignee: Jose Armando Garcia Sancio
 Fix For: 3.0.0


We are getting the following error on trunk
{code:java}
RaftEventSimulationTest > canRecoverAfterAllNodesKilled STANDARD_OUT
timestamp = 2021-07-12T16:26:55.663, 
RaftEventSimulationTest:canRecoverAfterAllNodesKilled =
  java.lang.RuntimeException:
Uncaught exception during poll of node 1
  |---jqwik---
tries = 25| # of calls to property
checks = 25   | # of not rejected calls
generation = RANDOMIZED   | parameters are randomly generated
after-failure = PREVIOUS_SEED | use the previous seed
when-fixed-seed = ALLOW   | fixing the random seed is allowed
edge-cases#mode = MIXIN   | edge cases are mixed in
edge-cases#total = 108| # of all combined edge cases
edge-cases#tried = 4  | # of edge cases tried in current run
seed = 8079861963960994566| random seed to reproduce generated values   
 Sample
--
  arg0: 4002
  arg1: 2
  arg2: 4{code}
I think there are a couple of issues here:
 # The {{ListenerContext}} for {{KafkaRaftClient}} uses the value returned by 
{{ReplicatedLog::startOffset()}} to determine the log start and when to load a 
snapshot, while the {{MockLog}} implementation uses {{logStartOffset}}, which 
could be a different value.
 # {{MockLog}} doesn't implement {{ReplicatedLog::maybeClean}} so the log start 
offset is always 0.
 # The snapshot id validation for {{MockLog}} and {{KafkaMetadataLog}}'s 
{{createNewSnapshot}} throws an exception when the snapshot id is less than the 
log start offset.

Solutions:

To fix the error quoted above we only need to fix bullet point 3, but I think we 
should fix all of the issues enumerated in this Jira.

For 1. we should change the {{MockLog}} implementation so that it uses 
{{startOffset}} both externally and internally.

For 2. I will file another issue to track this implementation.

For 3. I think this validation is too strict. I think it is safe to simply 
ignore any attempt by the state machine to create a snapshot with an id less 
than the log start offset. We should return an {{Optional.empty()}} when the 
snapshot id is less than the log start offset. This tells the user that it 
doesn't need to generate a snapshot for that offset.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13074) Implement mayClean for MockLog

2021-07-12 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-13074:
--

 Summary: Implement mayClean for MockLog
 Key: KAFKA-13074
 URL: https://issues.apache.org/jira/browse/KAFKA-13074
 Project: Kafka
  Issue Type: Bug
Reporter: Jose Armando Garcia Sancio


The current implementation of MockLog doesn't implement maybeClean. It is expected 
that MockLog has the same semantics as KafkaMetadataLog. This is assumed to be 
true by a few of the test suites, like the raft simulation and the Kafka raft 
client test context.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13075) Consolidate RocksDBStore and RocksDBKeyValueStoreTest

2021-07-12 Thread A. Sophie Blee-Goldman (Jira)
A. Sophie Blee-Goldman created KAFKA-13075:
--

 Summary: Consolidate RocksDBStore and RocksDBKeyValueStoreTest
 Key: KAFKA-13075
 URL: https://issues.apache.org/jira/browse/KAFKA-13075
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: A. Sophie Blee-Goldman


Looks like we have two different test classes covering pretty much the same 
thing: RocksDBStore. It seems like RocksDBKeyValueStoreTest was the original 
test class for RocksDBStore, but someone later added RocksDBStoreTest, most 
likely because they didn't notice the RocksDBKeyValueStoreTest which didn't 
follow the usual naming scheme for test classes. 

We should consolidate these two into a single file, ideally retaining the 
RocksDBStoreTest name since that conforms to the test naming pattern used 
throughout Streams (and so this same thing doesn't happen again). It should 
also extend AbstractKeyValueStoreTest like the RocksDBKeyValueStoreTest 
currently does so we continue to get the benefit of all the tests in there as 
well



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] KIP-761: Add total blocked time metric to streams

2021-07-12 Thread Guozhang Wang
Hey Rohan,

Thanks for putting up the KIP. Maybe you can also briefly talk about the
context in the email message as well (they are very well explained inside
the KIP doc though).

I have a few meta and detailed thoughts after reading it.

1) I agree that the current ratio metrics are just a "point-in-time snapshot", and
more flexible metrics that would allow reporters to calculate based on
window intervals are better.
metrics assumes the thread->clients mapping as of today, where each thread
would own exclusively one main consumer, restore consumer, producer and an
admin client. But this mapping may be subject to change in the future. Have
you thought about how this metric can be extended when, e.g. the embedded
clients and stream threads are de-coupled?

2) [This and all below are minor comments] The "flush-time-total" may
better be a producer client metric, as "flush-wait-time-total", than a
streams metric, though the streams-level "total-blocked" can still leverage
it. Similarly, I think "txn-commit-time-total" and
"offset-commit-time-total" may better be inside producer and consumer
clients respectively.

3) The doc was not very clear on why "thread-start-time" would be needed
when calculating streams utilization along with total blocked time; could
you elaborate a bit more in the KIP?
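
For illustration, one way a metrics layer might combine a blocked-time total with the thread start time to derive utilization over a reporting window (the inputs and formula here are assumptions based on the KIP draft, not its final definitions):

```java
public final class BlockedTimeRatio {

    /**
     * Sketch: derive "utilization" from two samples of the proposed blocked-time total.
     * blockedNsAtT1/blockedNsAtT2 are the metric values sampled at wall-clock times
     * t1Ms and t2Ms; threadStartMs is the thread's start time ("thread-start-time").
     */
    public static double utilization(double blockedNsAtT1, double blockedNsAtT2,
                                     long t1Ms, long t2Ms, long threadStartMs) {
        // The measurement window cannot begin before the thread existed; this is
        // where the thread start time matters for the first sample after startup.
        long windowStartMs = Math.max(t1Ms, threadStartMs);
        double windowNs = (t2Ms - windowStartMs) * 1_000_000.0;
        double blockedNs = blockedNsAtT2 - blockedNsAtT1;
        return 1.0 - (blockedNs / windowNs); // fraction of the window spent not blocked
    }
}
```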

4) For "txn-commit-time-total" specifically, besides producer.commitTxn,
other txn-related calls may also be blocking, including
producer.beginTxn/abortTxn, I saw you mentioned "txn-begin-time-total"
later in the doc, but did not include it as a separate metric, and
similarly, should we have a `txn-abort-time-total` as well? If yes, could
you update the KIP page accordingly.

5) Not a suggestion, but just wanted to bring up that the producer-related
metrics are a bit "coarser" compared with the consumer/admin clients since
their IO mechanisms are a bit different: for the producer the caller thread
does not do any IOs since it's all delegated to the background sender
thread, while for consumer/admin the caller thread would still need to do
some IOs, and hence the selector-level metrics would make sense. On top of
my head I cannot think of a better measuring mechanism for producers
either, especially for txn-related ones, we may need to experiment and see
if the generated ratio is relatively accurate and reasonable with the
reflected "block time".



Guozhang

On Mon, Jul 12, 2021 at 12:01 PM Rohan Desai 
wrote:

>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-761%3A+Add+Total+Blocked+Time+Metric+to+Streams
>


-- 
-- Guozhang


Re: [DISCUSS] KIP-761: Add total blocked time metric to streams

2021-07-12 Thread Rohan Desai
Hello All,

I'd like to start a discussion on the KIP linked above which proposes some
metrics that we would find useful to help measure whether a Kafka Streams
application is saturated. The motivation section in the KIP goes into some
more detail on why we think this is a useful addition to the metrics
already implemented. Thanks in advance for your feedback!

Best Regards,

Rohan

On Mon, Jul 12, 2021 at 12:00 PM Rohan Desai 
wrote:

>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-761%3A+Add+Total+Blocked+Time+Metric+to+Streams
>