Re: [DISCUSS] Apache Kafka 3.0.0 release plan with new updated dates

2021-06-14 Thread Konstantine Karantasis
Done. Moved it into the table of Adopted KIPs targeting 3.0.0 and to the
release plan of course.
Thanks for catching this Israel.

Best,
Konstantine

On Mon, Jun 14, 2021 at 7:40 PM Israel Ekpo  wrote:

> Konstantine,
>
> One of mine is missing from this list
>
> KIP-633: Drop 24 hour default of grace period in Streams
> Please could you include it?
>
> Voting has already concluded a long time ago
>
>
>
> On Mon, Jun 14, 2021 at 6:08 PM Konstantine Karantasis
>  wrote:
>
> > Hi all.
> >
> > KIP Freeze for the next major release of Apache Kafka was reached last
> > week.
> >
> > As of now, 36 KIPs have concluded their voting process and have been
> > adopted.
> > These KIPs are targeting 3.0 (unless it's noted otherwise in the release
> > plan) and their inclusion as new features will be finalized right after
> > Feature Freeze.
> >
> > At a high level, out of these 36 KIPs, 11 have been implemented already
> > and 25 are open or in progress.
> > Here is the full list of adopted KIPs:
> >
> > KIP-751: Drop support for Scala 2.12 in Kafka 4.0 (deprecate in 3.0)
> > KIP-750: Drop support for Java 8 in Kafka 4.0 (deprecate in 3.0)
> > KIP-746: Revise KRaft Metadata Records
> > KIP-745: Connect API to restart connector and tasks
> > KIP-744: Migrate TaskMetadata and ThreadMetadata to an interface with
> > internal implementation
> > KIP-743: Remove config value 0.10.0-2.4 of Streams built-in metrics
> version
> > config
> > KIP-741: Change default serde to be null
> > KIP-740: Clean up public API in TaskId and fix TaskMetadata#taskId()
> > KIP-738: Removal of Connect's internal converter properties
> > KIP-734: Improve AdminClient.listOffsets to return timestamp and offset
> for
> > the record with the largest timestamp
> > KIP-733: Change Kafka Streams default replication factor config
> > KIP-732: Deprecate eos-alpha and replace eos-beta with eos-v2
> > KIP-730: Producer ID generation in KRaft mode
> > KIP-726: Make the "cooperative-sticky, range" as the default assignor
> > KIP-725: Streamlining configurations for WindowedSerializer and
> > WindowedDeserializer.
> > KIP-724: Drop support for message formats v0 and v1
> > KIP-722: Enable connector client overrides by default
> > KIP-721: Enable connector log contexts in Connect Log4j configuration
> > KIP-720: Deprecate MirrorMaker v1
> > KIP-716: Allow configuring the location of the offset-syncs topic with
> > MirrorMaker2
> > KIP-715: Expose Committed offset in streams
> > KIP-709: Extend OffsetFetch requests to accept multiple group ids.
> > KIP-708: Rack awareness for Kafka Streams
> > KIP-707: The future of KafkaFuture
> > KIP-699: Update FindCoordinator to resolve multiple Coordinators at a
> time
> > KIP-698: Add Explicit User Initialization of Broker-side State to Kafka
> > Streams
> > KIP-695: Further Improve Kafka Streams Timestamp Synchronization
> > KIP-691: Enhance Transactional Producer Exception Handling
> > KIP-679: Producer will enable the strongest delivery guarantee by default
> > KIP-666: Add Instant-based methods to ReadOnlySessionStore
> > KIP-653: Upgrade log4j to log4j2
> > KIP-623: Add "internal-topics" option to streams application reset tool
> > KIP-622: Add currentSystemTimeMs and currentStreamTimeMs to
> > ProcessorContext
> > KIP-466: Add support for List serialization and deserialization
> > KIP-405: Kafka Tiered Storage
> > KIP-390: Support Compression Level
> >
> > If you notice that a KIP is missing from the list above and should be
> part
> > of the release plan for 3.0, please reply below.
> > The single source of truth remains the official release plan for 3.0,
> which
> > you may read at any time here:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+3.0.0
> >
> > Kind reminder that for all the adopted KIPs any required changes to the
> > documentation are also part of their respective feature.
> >
> > For the KIPs that are still in progress, please work closely with your
> > reviewers to make sure that the features are stable and land on time for
> > Feature Freeze.
> >
> > The next milestone for the Apache Kafka 3.0 release is Feature Freeze on
> > June 30th, 2021.
> >
> > Best regards,
> > Konstantine
> >
> > On Fri, Jun 4, 2021 at 3:47 PM Konstantine Karantasis <
> > konstant...@confluent.io> wrote:
> >
> > >
> > > Hi all,
> > >
> > > Just a quick reminder that KIP Freeze is next Wednesday, June 9th.
> > > A vote thread needs to be open for at least 72 hours, so to everyone
> that
> > > is working hard on proposals targeting 3.0.0, please make sure that
> your
> > > [VOTE] threads are started on time.
> > >
> > > Best,
> > > Konstantine
> > >
> > >
> > > On Wed, May 26, 2021 at 8:10 PM Israel Ekpo 
> > wrote:
> > >
> > >> +1 on the new schedule.
> > >>
> > >> On Wed, May 26, 2021 at 8:14 PM Sophie Blee-Goldman
> > >>  wrote:
> > >>
> > >> > Ah ok, thanks Konstantine. I won't bug you about every new KIP that
> > >> comes
> > >> > in between now and KIP Freeze :P
> > >> >
> >> > +1 on the scheduling changes as well

Re: [DISCUSS] KIP-747 Add support for basic aggregation APIs

2021-06-14 Thread Matthias J. Sax
Hi,

I think extending min/max to non-numeric types makes sense. Wondering
why we should require a `Comparator` or if we should require that the
types implement `Comparable` instead?

I also think that min/max should not change the value type. Using
`Long` for sum() makes sense though, and also to require a ``.


-Matthias

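The Comparable-based variant suggested here, combined with the "only forward a strictly greater value" default discussed further down in this thread, can be sketched as a plain reduction outside of Kafka Streams. This is an illustrative sketch of the semantics only, not the KIP-747 API; the helper names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class MaxSketch {
    // Hypothetical reduce step for max(): keep the previous aggregate unless
    // the new value is strictly greater, so compare() == 0 retains the
    // earlier value (the "emit fewer values" default from this thread).
    static <V extends Comparable<V>> V maxReduce(V agg, V next) {
        return next.compareTo(agg) > 0 ? next : agg;
    }

    static <V extends Comparable<V>> V max(List<V> values) {
        V agg = values.get(0);
        for (V v : values.subList(1, values.size())) {
            agg = maxReduce(agg, v);
        }
        return agg;
    }

    public static void main(String[] args) {
        // Works for any Comparable, not just numbers: Strings, dates, etc.
        System.out.println(max(Arrays.asList("apple", "pear", "banana"))); // pear
        System.out.println(max(Arrays.asList(3, 1, 2)));                   // 3
    }
}
```

Requiring `Comparable` rather than a `Comparator` keeps the value type unchanged and makes the method applicable to Strings and dates as well as numbers.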
On 6/8/21 5:00 PM, Mohan Parthasarathy wrote:
> Hi Alex,
> 
> On Tue, Jun 8, 2021 at 2:44 PM Alexandre Brasil 
> wrote:
> 
>>
>> My point here is that, when we're only interested in a max/min numeric
>> value, it doesn't
>> matter when we have repeated values, since we'd be only forwarding the
>> number downstream,
>> so I could disregard when the Comparator returns a zero value (meaning
>> equals) and min/max
>> would still be semantically correct. But when we're forwarding the original
>> object downstream
>> instead of its numeric property, it could mean different things
>> semantically depending on how
>> we handle the repeated values.
>>
>> As an example, if I were using max() on a stream of Biddings for products
>> in an auction, the
>> order of the biddings would probably influence the winner if two clients
>> send Biddings with the
>> same value. If we're only forwarding the Bidding value downstream (a double
>> value of 100, for
>> example), it doesn't matter how repeated values are handled, since the max
>> price for this
>> auction would still be 100.00, no matter what Bidding got selected in the
>> end. But if we're
>> forwarding the Biddings downstream instead, then it matters whether the
>> winning Bidding sent
>> downstream was originally posted by Client A or Client B.
>>
>> I'm not saying that an overloaded method to handle different options for
>> how repeated values
>> should be handled by min/max is mandatory, but it should be clear on the
>> methods' docs
>> what would happen when Comparator.compare() == 0. My preferred option for
>> the default
>> behaviour is to only forward a new value if it's smaller/bigger than the
>> previous min/max value
>> (ignoring compare() == 0), since it would emit fewer values downstream and
>> would be easier
>> to read ("I only send a value downstream if it's bigger/smaller than the
>> previously selected
>> value").
>>
> Thanks for the clarification. I like your suggestion unless someone feels
> that they want an option to control this (i.e., when compare() == 0, return
> the old value vs new value).
> 
> 
>>
>>> Not knowing the schema of the value (V) has its own set of problems. As I
>> have alluded to
>>> in the proposal, this is a little bit messy. We already have "reduce"
>> which can be used to
>>> implement sum (mapValues().reduce()).
>>> Thinking about it more, do you think "sum" would be useful ? One hacky
>> way to implement
>>> this is to inspect the type of the return when the "func" is called the
>> first time OR infer from
>>> the materialized or have an explicit initializer.
>>
>> I think it might be useful for some use cases, yes, but it would be tricky
>> to implement this in a
>> way that handles generic Numbers and keeps their original implementation
>> class. One
>> simplification you could take is fixating VR to be a Double, and then use
>> Number.doubleValue()
>> to compute the sum.
>>
> 
> Yeah, that would simplify quite a bit. I think you are suggesting this:
> 
> KTable<K, Double> sum(Function<V, Number> func)
> 
> 
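Fixing the result type to Double, as suggested above, could look roughly like the plain-Java fold below. This is a sketch of the idea only, not the actual Streams implementation, and the names are made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class SumSketch {
    // Hypothetical sum(): extract a Number from each value and accumulate as
    // double, sidestepping the problem of preserving the original Number
    // subclass of each extracted value.
    static <V> double sum(List<V> values, Function<V, Number> func) {
        double acc = 0.0;
        for (V v : values) {
            acc += func.apply(v).doubleValue();
        }
        return acc;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("kafka", "streams");
        System.out.println(sum(words, String::length)); // 12.0
    }
}
```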
>> What you said about using reduce() to compute a sum() is also true for
>> min() and max(). =) All
>> three methods in this KIP would be a syntactic sugar for what could
>> otherwise be implemented
>> using reduce/aggregate, but I see value in implementing them and
>> simplifying the adoption of
>> those use cases.
>>
>> Agreed. I seem to have forgotten the reason as to why I started this KIP
> :-). There is a long way to go.
> 
> -thanks
> Mohan
> 
> Best regards,
>> Alexandre
>>
>> On Sat, Jun 5, 2021 at 10:17 PM Mohan Parthasarathy 
>> wrote:
>>
>>> Hi Alex,
>>>
>>> Responses below.
>>>
>>> On Fri, Jun 4, 2021 at 9:27 AM Alexandre Brasil <
>>> alexandre.bra...@gmail.com>
>>> wrote:
>>>
 Hi Mohan,

 I like the idea of adding those methods to the API, but I'd like to
>> make
>>> a
 suggestion:

 Although the most used scenario for min() / max() might possibly be for
 numeric values, I think they could also be
 useful on other objects like Dates, LocalDates or Strings. Why limit
>> the
 API to Numbers only?


>>> There was no specific reason. Just addressing the common scenario. But I
>>> don't see why this can't be supported given your suggestion below.
>>>
>>> Extending on the above, couldn't we change the API to provide a
 Comparator<V> instead of the Function<V, Number>
 for those methods, and make them return a KTable<K, V> instead? Not
>> only
 would this approach not limit the
 usage of those methods to Numbers, but they'd also preserve the origin
>>> from
 the min/max value [1]. The extraction of
 a single (numeric?) value could be achieved by a subsequent
>> .mapValues().

Re: Adding @NotNull annotation to public APIs, KIP needed or not?

2021-06-14 Thread Matthias J. Sax
Personally, I think it might make sense to use annotations. And I agree,
that we should have proper null-checks in place anyway, so existing code
should not break.

But I don't feel strong about it either -- not sure if some people might
have concerns?

In the end, a KIP sounds appropriate though. Just my 2ct.


-Matthias


On 6/4/21 3:08 AM, Matthew de Detrich wrote:
> Hello everyone,
> 
> I was thinking of doing a PR which involved adding @NotNull annotations to
> various Kafka API's. Afaik the @NotNull annotation doesn't break binary
> compatibility however it can break source compatibility.
> 
> The point is that even though using @NotNull can break source
> compatibility, if it does (assuming that the @NotNull is added
> correctly) then you would have gotten a runtime NotNullException or null
> related error and hence your code would have been broken anyways.
> 
> So ultimately I guess the question is do I need to create a KIP to work on
> such a task? I think technically speaking you do need to create one however
> because of what I just said it may not be needed?
> 
> Regards
> 
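The "proper null-checks in place anyway" point above usually means failing fast with `Objects.requireNonNull`, which gives the same runtime behaviour whether or not an annotation is present; an annotation only adds compile-time/tooling visibility. A minimal illustration (the class and method names are hypothetical, not Kafka code):

```java
import java.util.Objects;

public class NullCheckSketch {
    // An annotation such as @NotNull only documents the contract for tools;
    // the explicit check below is what actually enforces it at runtime.
    static String normalize(String topic) {
        Objects.requireNonNull(topic, "topic must not be null");
        return topic.trim().toLowerCase();
    }

    public static void main(String[] args) {
        System.out.println(normalize("  My-Topic "));  // my-topic
        try {
            normalize(null);
        } catch (NullPointerException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Since the runtime check already throws for null today, adding the annotation on top should not change behaviour for callers that were passing valid arguments.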


Re: [DISCUSS] Apache Kafka 3.0.0 release plan with new updated dates

2021-06-14 Thread Israel Ekpo
Konstantine,

One of mine is missing from this list

KIP-633: Drop 24 hour default of grace period in Streams
Please could you include it?

Voting has already concluded a long time ago



On Mon, Jun 14, 2021 at 6:08 PM Konstantine Karantasis
 wrote:

> Hi all.
>
> KIP Freeze for the next major release of Apache Kafka was reached last
> week.
>
> As of now, 36 KIPs have concluded their voting process and have been
> adopted.
> These KIPs are targeting 3.0 (unless it's noted otherwise in the release
> plan) and their inclusion as new features will be finalized right after
> Feature Freeze.
>
> At a high level, out of these 36 KIPs, 11 have been implemented already
> and 25 are open or in progress.
> Here is the full list of adopted KIPs:
>
> KIP-751: Drop support for Scala 2.12 in Kafka 4.0 (deprecate in 3.0)
> KIP-750: Drop support for Java 8 in Kafka 4.0 (deprecate in 3.0)
> KIP-746: Revise KRaft Metadata Records
> KIP-745: Connect API to restart connector and tasks
> KIP-744: Migrate TaskMetadata and ThreadMetadata to an interface with
> internal implementation
> KIP-743: Remove config value 0.10.0-2.4 of Streams built-in metrics version
> config
> KIP-741: Change default serde to be null
> KIP-740: Clean up public API in TaskId and fix TaskMetadata#taskId()
> KIP-738: Removal of Connect's internal converter properties
> KIP-734: Improve AdminClient.listOffsets to return timestamp and offset for
> the record with the largest timestamp
> KIP-733: Change Kafka Streams default replication factor config
> KIP-732: Deprecate eos-alpha and replace eos-beta with eos-v2
> KIP-730: Producer ID generation in KRaft mode
> KIP-726: Make the "cooperative-sticky, range" as the default assignor
> KIP-725: Streamlining configurations for WindowedSerializer and
> WindowedDeserializer.
> KIP-724: Drop support for message formats v0 and v1
> KIP-722: Enable connector client overrides by default
> KIP-721: Enable connector log contexts in Connect Log4j configuration
> KIP-720: Deprecate MirrorMaker v1
> KIP-716: Allow configuring the location of the offset-syncs topic with
> MirrorMaker2
> KIP-715: Expose Committed offset in streams
> KIP-709: Extend OffsetFetch requests to accept multiple group ids.
> KIP-708: Rack awareness for Kafka Streams
> KIP-707: The future of KafkaFuture
> KIP-699: Update FindCoordinator to resolve multiple Coordinators at a time
> KIP-698: Add Explicit User Initialization of Broker-side State to Kafka
> Streams
> KIP-695: Further Improve Kafka Streams Timestamp Synchronization
> KIP-691: Enhance Transactional Producer Exception Handling
> KIP-679: Producer will enable the strongest delivery guarantee by default
> KIP-666: Add Instant-based methods to ReadOnlySessionStore
> KIP-653: Upgrade log4j to log4j2
> KIP-623: Add "internal-topics" option to streams application reset tool
> KIP-622: Add currentSystemTimeMs and currentStreamTimeMs to
> ProcessorContext
> KIP-466: Add support for List serialization and deserialization
> KIP-405: Kafka Tiered Storage
> KIP-390: Support Compression Level
>
> If you notice that a KIP is missing from the list above and should be part
> of the release plan for 3.0, please reply below.
> The single source of truth remains the official release plan for 3.0, which
> you may read at any time here:
>
> https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+3.0.0
>
> Kind reminder that for all the adopted KIPs any required changes to the
> documentation are also part of their respective feature.
>
> For the KIPs that are still in progress, please work closely with your
> reviewers to make sure that the features are stable and land on time for
> Feature Freeze.
>
> The next milestone for the Apache Kafka 3.0 release is Feature Freeze on
> June 30th, 2021.
>
> Best regards,
> Konstantine
>
> On Fri, Jun 4, 2021 at 3:47 PM Konstantine Karantasis <
> konstant...@confluent.io> wrote:
>
> >
> > Hi all,
> >
> > Just a quick reminder that KIP Freeze is next Wednesday, June 9th.
> > A vote thread needs to be open for at least 72 hours, so to everyone that
> > is working hard on proposals targeting 3.0.0, please make sure that your
> > [VOTE] threads are started on time.
> >
> > Best,
> > Konstantine
> >
> >
> > On Wed, May 26, 2021 at 8:10 PM Israel Ekpo 
> wrote:
> >
> >> +1 on the new schedule.
> >>
> >> On Wed, May 26, 2021 at 8:14 PM Sophie Blee-Goldman
> >>  wrote:
> >>
> >> > Ah ok, thanks Konstantine. I won't bug you about every new KIP that
> >> comes
> >> > in between now and KIP Freeze :P
> >> >
> >> > +1 on the scheduling changes as well
> >> >
> >> > On Wed, May 26, 2021 at 4:00 PM David Arthur 
> wrote:
> >> >
> >> > > The new schedule looks good to me, +1
> >> > >
> >> > > On Wed, May 26, 2021 at 6:29 PM Ismael Juma 
> >> wrote:
> >> > >
> >> > > > Thanks Konstantine, +1 from me.
> >> > > >
> >> > > > Ismael
> >> > > >
> >> > > > On Wed, May 26, 2021 at 2:48 PM Konstantine Karantasis
> >> > > >  wrote:
> >> > > >
> >> > > > > Hi all,
> >> > > > >
> >> > > 

Re: [VOTE] KIP-748: Add Broker Count Metrics

2021-06-14 Thread Israel Ekpo
Xavier, I just responded.

On Mon, Jun 14, 2021 at 1:58 PM Xavier Léauté 
wrote:

> I had one quick question in the discussion thread. Any chance we can
> provide some thought there?
>
> On Thu, Jun 10, 2021 at 9:46 AM Ryan Dielhenn
>  wrote:
>
> > Hello,
> >
> > I would like to start a vote on KIP-748: Add Broker Count Metrics.
> >
> > Here is the KIP:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-748:+Add+Broker+Count+Metrics
> >
> > Here is the discussion thread:
> >
> >
> https://lists.apache.org/thread.html/r308364dfbb3020e6151cef47237c28a4a540e187b8af84ddafec83af%40%3Cdev.kafka.apache.org%3E
> >
> > Please take a look and vote if you have a chance.
> >
> > Thank you,
> > Ryan
> >
>


Re: [DISCUSS] KIP-748: Add Broker Count Metrics

2021-06-14 Thread Israel Ekpo
Ryan,

I think you can add a section under Rejected Alternatives to elaborate on
why you feel combining the metrics for KRaft mode and legacy mode is not a
good idea. That could help clarify future questions such as the one raised
by Xavier.
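Whichever way the KIP lands, a broker-count metric is essentially a gauge over the controller's current view of registered brokers. A toy sketch of the two options being debated (none of these names are the real Kafka metrics API; this is purely illustrative):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntSupplier;

public class BrokerCountSketch {
    // Hypothetical registry: metric name -> gauge that reads the broker
    // count on demand.
    static final Map<String, IntSupplier> registry = new ConcurrentHashMap<>();

    static void register(String name, IntSupplier gauge) {
        registry.put(name, gauge);
    }

    public static void main(String[] args) {
        Set<Integer> registeredBrokers = new HashSet<>(Arrays.asList(0, 1, 2));
        // One option: a single RegisteredBrokerCount name that both the ZK
        // and KRaft controllers would publish, hiding the implementation;
        // the alternative is one distinct name per controller mode.
        register("RegisteredBrokerCount", registeredBrokers::size);
        System.out.println(registry.get("RegisteredBrokerCount").getAsInt()); // 3
    }
}
```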



On Mon, Jun 14, 2021 at 10:20 PM Israel Ekpo  wrote:

> Xavier,
>
> I think the reason for doing this is to make them independent so that it
> is easier to design and implement the tracking for legacy mode (with
> Zookeeper) and  KRaft Mode (without ZK)
>
> That is my assessment.
>
> On Mon, Jun 14, 2021 at 1:57 PM Xavier Léauté 
> wrote:
>
>> Any reason we need two different metrics for ZK and Quorum-based
>> controllers?
>> Wouldn't it make sense to have one metric that abstracts the
>> implementation
>> detail?
>>
>> On Mon, Jun 7, 2021 at 2:29 PM Ryan Dielhenn > .invalid>
>> wrote:
>>
>> > Hey Colin and David,
>> >
>> > I added another table for the ZK version of RegisteredBrokerCount.
>> >
>> > Best,
>> > Ryan Dielhenn
>> >
>> > On 2021/06/04 08:21:27, David Jacot 
>> wrote:
>> > > Hi Ryan,
>> > >
>> > > Thanks for the KIP.
>> > >
>> > > +1 for adding RegisteredBrokerCount to the ZK controller as well. This
>> > > would be really helpful.
>> > >
>> > > Best,
>> > > David
>> > >
>> > > On Fri, Jun 4, 2021 at 12:44 AM Colin McCabe 
>> wrote:
>> > >
>> > > > Hi Ryan,
>> > > >
>> > > > Thanks for the KIP. I think it would be good to provide the
>> > > > RegisteredBrokerCount metric for the ZK controller as well as for
>> the
>> > > > Quorum controller. Looks good aside from that!
>> > > >
>> > > > best,
>> > > > Colin
>> > > >
>> > > > On Thu, Jun 3, 2021, at 14:09, Ryan Dielhenn wrote:
>> > > > > Hey kafka-dev,
>> > > > >
>> > > > > I created KIP-748 as a proposal to add broker count metrics to the
>> > > > > Quorum Controller.
>> > > > >
>> > > > >
>> > > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-748%3A+Add+Broker+Count+Metrics#KIP748:AddBrokerCountMetrics
>> > > > >
>> > > > > Best,
>> > > > > Ryan Dielhenn
>> > > > >
>> > > >
>> > >
>> >
>>
>


Re: [DISCUSS] KIP-748: Add Broker Count Metrics

2021-06-14 Thread Israel Ekpo
Xavier,

I think the reason for doing this is to make them independent so that it is
easier to design and implement the tracking for legacy mode (with
Zookeeper) and  KRaft Mode (without ZK)

That is my assessment.

On Mon, Jun 14, 2021 at 1:57 PM Xavier Léauté 
wrote:

> Any reason we need two different metrics for ZK and Quorum-based
> controllers?
> Wouldn't it make sense to have one metric that abstracts the implementation
> detail?
>
> On Mon, Jun 7, 2021 at 2:29 PM Ryan Dielhenn  .invalid>
> wrote:
>
> > Hey Colin and David,
> >
> > I added another table for the ZK version of RegisteredBrokerCount.
> >
> > Best,
> > Ryan Dielhenn
> >
> > On 2021/06/04 08:21:27, David Jacot  wrote:
> > > Hi Ryan,
> > >
> > > Thanks for the KIP.
> > >
> > > +1 for adding RegisteredBrokerCount to the ZK controller as well. This
> > > would be really helpful.
> > >
> > > Best,
> > > David
> > >
> > > On Fri, Jun 4, 2021 at 12:44 AM Colin McCabe 
> wrote:
> > >
> > > > Hi Ryan,
> > > >
> > > > Thanks for the KIP. I think it would be good to provide the
> > > > RegisteredBrokerCount metric for the ZK controller as well as for the
> > > > Quorum controller. Looks good aside from that!
> > > >
> > > > best,
> > > > Colin
> > > >
> > > > On Thu, Jun 3, 2021, at 14:09, Ryan Dielhenn wrote:
> > > > > Hey kafka-dev,
> > > > >
> > > > > I created KIP-748 as a proposal to add broker count metrics to the
> > > > > Quorum Controller.
> > > > >
> > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-748%3A+Add+Broker+Count+Metrics#KIP748:AddBrokerCountMetrics
> > > > >
> > > > > Best,
> > > > > Ryan Dielhenn
> > > > >
> > > >
> > >
> >
>


Re: [VOTE] New Kafka PMC member: Konstantine Karantasis

2021-06-14 Thread Israel Ekpo
I figured but still wanted to congratulate him.

On Mon, Jun 14, 2021 at 5:12 PM Bill Bejeck  wrote:

> This email in the thread is not an announcement and was sent prematurely to
> the dev list.
>
> Bill
>
> On Mon, Jun 14, 2021 at 1:36 PM Bill Bejeck  wrote:
>
> > I'm a +1 as well
> >
> > Bill
> >
> >
> >>
> >>
>


Re: [DISCUSS] KIP-714: Client metrics and observability

2021-06-14 Thread Ryanne Dolan
Magnus, I think such a substantial change requires more motivation than is
currently provided. As I read it, the motivation boils down to this: you
want your clients to phone-home unless they opt-out. As stated in the KIP,
"there are plenty of existing solutions [...] to send metrics [...] to a
collector", so the opt-out appears to be the only motivation. Am I missing
something?

Ryanne

On Wed, Jun 2, 2021 at 7:46 AM Magnus Edenhill  wrote:

> Hey all,
>
> I'm proposing KIP-714 to add remote Client metrics and observability.
> This functionality will allow centralized monitoring and troubleshooting of
> clients and their internals.
>
> Please see
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability
>
> Looking forward to your feedback!
>
> Regards,
> Magnus
>


[jira] [Created] (KAFKA-12949) TestRaftServer's scala.MatchError: null on test-kraft-server-start.sh

2021-06-14 Thread Ignacio Acuna (Jira)
Ignacio Acuna created KAFKA-12949:
-

 Summary: TestRaftServer's scala.MatchError: null on 
test-kraft-server-start.sh
 Key: KAFKA-12949
 URL: https://issues.apache.org/jira/browse/KAFKA-12949
 Project: Kafka
  Issue Type: Bug
  Components: kraft
Reporter: Ignacio Acuna
Assignee: Ignacio Acuna


Encountered the following exception when trying to run the TestRaftServer:
{code:java}
bin/test-kraft-server-start.sh --config config/kraft.properties{code}
{code:java}
[2021-06-14 17:15:43,232] ERROR [raft-workload-generator]: Error due to 
(kafka.tools.TestRaftServer$RaftWorkloadGenerator)
 scala.MatchError: null
 at 
kafka.tools.TestRaftServer$RaftWorkloadGenerator.doWork(TestRaftServer.scala:220)
 at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)
 [2021-06-14 17:15:43,253] INFO [raft-workload-generator]: Stopped 
(kafka.tools.TestRaftServer$RaftWorkloadGenerator){code}
That happens on the following match:
{code:java}
eventQueue.poll(eventTimeoutMs, TimeUnit.MILLISECONDS) match {
  case HandleClaim(epoch) =>
    claimedEpoch = Some(epoch)
    throttler.reset()
    pendingAppends.clear()
    recordCount.set(0)

  case HandleResign =>
    claimedEpoch = None
    pendingAppends.clear()

  case HandleCommit(reader) =>
    try {
      while (reader.hasNext) {
        val batch = reader.next()
        claimedEpoch.foreach { leaderEpoch =>
          handleLeaderCommit(leaderEpoch, batch)
        }
      }
    } finally {
      reader.close()
    }

  case HandleSnapshot(reader) =>
    // Ignore snapshots; only interested in records appended by this leader
    reader.close()

  case Shutdown => // Ignore shutdown command
}
{code}
Full log attached. When eventQueue.poll returns null (i.e., when the deque is 
empty), there isn't a case to match it, so the thread gets stuck and stops 
processing events (raft-workload-generator).

Proposal:
 Add a case null to the match so the raft-workload-generator can continue.
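The failure mode generalizes beyond Scala: a timed poll on a queue returns null on timeout, and any dispatch on the result must handle that case. In Java terms (a sketch of the pattern, not the TestRaftServer code):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class PollSketch {
    // Dispatch that tolerates the null a timed poll() returns on timeout --
    // the equivalent of adding a null case to the Scala match above.
    static String dispatch(String event) {
        return (event == null) ? "timeout, keep looping" : "handle " + event;
    }

    public static void main(String[] args) throws InterruptedException {
        LinkedBlockingQueue<String> eventQueue = new LinkedBlockingQueue<>();
        // Nothing was enqueued, so this poll times out and yields null.
        String event = eventQueue.poll(10, TimeUnit.MILLISECONDS);
        System.out.println(dispatch(event)); // timeout, keep looping
    }
}
```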



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Apache Kafka 3.0.0 release plan with new updated dates

2021-06-14 Thread Konstantine Karantasis
Hi all.

KIP Freeze for the next major release of Apache Kafka was reached last
week.

As of now, 36 KIPs have concluded their voting process and have been
adopted.
These KIPs are targeting 3.0 (unless it's noted otherwise in the release
plan) and their inclusion as new features will be finalized right after
Feature Freeze.

At a high level, out of these 36 KIPs, 11 have been implemented already
and 25 are open or in progress.
Here is the full list of adopted KIPs:

KIP-751: Drop support for Scala 2.12 in Kafka 4.0 (deprecate in 3.0)
KIP-750: Drop support for Java 8 in Kafka 4.0 (deprecate in 3.0)
KIP-746: Revise KRaft Metadata Records
KIP-745: Connect API to restart connector and tasks
KIP-744: Migrate TaskMetadata and ThreadMetadata to an interface with
internal implementation
KIP-743: Remove config value 0.10.0-2.4 of Streams built-in metrics version
config
KIP-741: Change default serde to be null
KIP-740: Clean up public API in TaskId and fix TaskMetadata#taskId()
KIP-738: Removal of Connect's internal converter properties
KIP-734: Improve AdminClient.listOffsets to return timestamp and offset for
the record with the largest timestamp
KIP-733: Change Kafka Streams default replication factor config
KIP-732: Deprecate eos-alpha and replace eos-beta with eos-v2
KIP-730: Producer ID generation in KRaft mode
KIP-726: Make the "cooperative-sticky, range" as the default assignor
KIP-725: Streamlining configurations for WindowedSerializer and
WindowedDeserializer.
KIP-724: Drop support for message formats v0 and v1
KIP-722: Enable connector client overrides by default
KIP-721: Enable connector log contexts in Connect Log4j configuration
KIP-720: Deprecate MirrorMaker v1
KIP-716: Allow configuring the location of the offset-syncs topic with
MirrorMaker2
KIP-715: Expose Committed offset in streams
KIP-709: Extend OffsetFetch requests to accept multiple group ids.
KIP-708: Rack awareness for Kafka Streams
KIP-707: The future of KafkaFuture
KIP-699: Update FindCoordinator to resolve multiple Coordinators at a time
KIP-698: Add Explicit User Initialization of Broker-side State to Kafka
Streams
KIP-695: Further Improve Kafka Streams Timestamp Synchronization
KIP-691: Enhance Transactional Producer Exception Handling
KIP-679: Producer will enable the strongest delivery guarantee by default
KIP-666: Add Instant-based methods to ReadOnlySessionStore
KIP-653: Upgrade log4j to log4j2
KIP-623: Add "internal-topics" option to streams application reset tool
KIP-622: Add currentSystemTimeMs and currentStreamTimeMs to ProcessorContext
KIP-466: Add support for List serialization and deserialization
KIP-405: Kafka Tiered Storage
KIP-390: Support Compression Level

If you notice that a KIP is missing from the list above and should be part
of the release plan for 3.0, please reply below.
The single source of truth remains the official release plan for 3.0, which
you may read at any time here:

https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+3.0.0

Kind reminder that for all the adopted KIPs any required changes to the
documentation are also part of their respective feature.

For the KIPs that are still in progress, please work closely with your
reviewers to make sure that the features are stable and land on time for
Feature Freeze.

The next milestone for the Apache Kafka 3.0 release is Feature Freeze on
June 30th, 2021.

Best regards,
Konstantine

On Fri, Jun 4, 2021 at 3:47 PM Konstantine Karantasis <
konstant...@confluent.io> wrote:

>
> Hi all,
>
> Just a quick reminder that KIP Freeze is next Wednesday, June 9th.
> A vote thread needs to be open for at least 72 hours, so to everyone that
> is working hard on proposals targeting 3.0.0, please make sure that your
> [VOTE] threads are started on time.
>
> Best,
> Konstantine
>
>
> On Wed, May 26, 2021 at 8:10 PM Israel Ekpo  wrote:
>
>> +1 on the new schedule.
>>
>> On Wed, May 26, 2021 at 8:14 PM Sophie Blee-Goldman
>>  wrote:
>>
>> > Ah ok, thanks Konstantine. I won't bug you about every new KIP that
>> comes
>> > in between now and KIP Freeze :P
>> >
>> > +1 on the scheduling changes as well
>> >
>> > On Wed, May 26, 2021 at 4:00 PM David Arthur  wrote:
>> >
>> > > The new schedule looks good to me, +1
>> > >
>> > > On Wed, May 26, 2021 at 6:29 PM Ismael Juma 
>> wrote:
>> > >
>> > > > Thanks Konstantine, +1 from me.
>> > > >
>> > > > Ismael
>> > > >
>> > > > On Wed, May 26, 2021 at 2:48 PM Konstantine Karantasis
>> > > >  wrote:
>> > > >
>> > > > > Hi all,
>> > > > >
>> > > > > Please find below the updated release plan for the Apache Kafka
>> 3.0.0
>> > > > > release.
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=177046466
>> > > > >
>> > > > > New suggested dates for the release are as follows:
>> > > > >
>> > > > > KIP Freeze is 09 June 2021 (same date as in the initial plan)
>> > > > > Feature Freeze is 30 June 2021 (new date, extended by two weeks)
>> > > > > Code Freeze is 14 July 2021 (

Re: [VOTE] New Kafka PMC member: Konstantine Karantasis

2021-06-14 Thread Bill Bejeck
This email in the thread is not an announcement and was sent prematurely to
the dev list.

Bill

On Mon, Jun 14, 2021 at 1:36 PM Bill Bejeck  wrote:

> I'm a +1 as well
>
> Bill
>
>
>>
>>


[jira] [Created] (KAFKA-12948) NetworkClient.close(node) with node in connecting state makes NetworkClient unusable

2021-06-14 Thread Rajini Sivaram (Jira)
Rajini Sivaram created KAFKA-12948:
--

 Summary: NetworkClient.close(node) with node in connecting state 
makes NetworkClient unusable
 Key: KAFKA-12948
 URL: https://issues.apache.org/jira/browse/KAFKA-12948
 Project: Kafka
  Issue Type: Bug
  Components: network
Affects Versions: 2.7.1, 2.8.0
Reporter: Rajini Sivaram
Assignee: Rajini Sivaram
 Fix For: 2.7.2, 2.8.1


`NetworkClient.close(node)` closes the node and removes it from 
`ClusterConnectionStates.nodeState`, but not from 
`ClusterConnectionStates.connectingNodes`. Subsequent `NetworkClient.poll()` 
invocations throw IllegalStateException and this leaves the NetworkClient in an 
unusable state until the node is removed from connectingNodes or added to 
nodeState. We don't use `NetworkClient.close(node)` in clients, but we use it 
in clients started by brokers for replica fetcher and controller. Since brokers 
use NetworkClientUtils.isReady() before establishing connections and this 
invokes poll(), the NetworkClient never recovers.

Exception stack trace:
{code:java}
java.lang.IllegalStateException: No entry found for connection 0
at 
org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:409)
at 
org.apache.kafka.clients.ClusterConnectionStates.isConnectionSetupTimeout(ClusterConnectionStates.java:446)
at 
org.apache.kafka.clients.ClusterConnectionStates.lambda$nodesWithConnectionSetupTimeout$0(ClusterConnectionStates.java:458)
at 
java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:174)
at java.util.HashMap$KeySpliterator.forEachRemaining(HashMap.java:1553)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at 
java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at 
java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at 
org.apache.kafka.clients.ClusterConnectionStates.nodesWithConnectionSetupTimeout(ClusterConnectionStates.java:459)
at 
org.apache.kafka.clients.NetworkClient.handleTimedOutConnections(NetworkClient.java:807)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:564)
at 
org.apache.kafka.clients.NetworkClientUtils.isReady(NetworkClientUtils.java:42)
{code}
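
As an illustration only, the inconsistency can be modeled with two plain-JDK collections. The class and method names below are hypothetical stand-ins, not the actual `NetworkClient`/`ClusterConnectionStates` code, and `closeFixed` is just one plausible shape of the fix (removing the node from both collections):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Minimal model of the two collections involved; illustration only.
class ConnectionStatesModel {
    final Map<String, String> nodeState = new HashMap<>();
    final Set<String> connectingNodes = new HashSet<>();

    void connecting(String node) {
        nodeState.put(node, "CONNECTING");
        connectingNodes.add(node);
    }

    // Buggy close: removes the node from nodeState only, as in the report.
    void closeBuggy(String node) {
        nodeState.remove(node);
    }

    // Plausible fix: remove the node from both collections.
    void closeFixed(String node) {
        nodeState.remove(node);
        connectingNodes.remove(node);
    }

    // poll() walks connectingNodes and looks each node up in nodeState,
    // throwing when the entry is missing -- mirroring the stack trace above.
    void poll() {
        for (String node : connectingNodes) {
            if (!nodeState.containsKey(node)) {
                throw new IllegalStateException("No entry found for connection " + node);
            }
        }
    }
}

public class Kafka12948Sketch {
    public static void main(String[] args) {
        ConnectionStatesModel buggy = new ConnectionStatesModel();
        buggy.connecting("0");
        buggy.closeBuggy("0");
        try {
            buggy.poll(); // every subsequent poll() fails the same way
        } catch (IllegalStateException e) {
            System.out.println("buggy: " + e.getMessage());
        }

        ConnectionStatesModel fixed = new ConnectionStatesModel();
        fixed.connecting("0");
        fixed.closeFixed("0");
        fixed.poll(); // no exception once both collections agree
        System.out.println("fixed: poll ok");
    }
}
```

Because the stale entry is never cleaned up, every later poll() hits the same exception, matching the "never recovers" behavior described in the report.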



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] New Kafka PMC member: Konstantine Karantasis

2021-06-14 Thread Israel Ekpo
Congratulations Konstantine! Thank you for your service and contributions.

If I could vote, I would be a +1 as well :)



On Mon, Jun 14, 2021 at 1:36 PM Bill Bejeck  wrote:

> I'm a +1 as well
>
> Bill
>
> On Mon, Jun 14, 2021 at 1:33 PM Randall Hauch  wrote:
>
> > +1. He's been a very valuable contributor and committer.
> >
> > Randall
> >
> > On Sun, Jun 13, 2021 at 10:40 PM Ismael Juma  wrote:
> >
> > > I think it's worth pinging people once before closing the vote. I'm +1.
> > >
> > > Ismael
> > >
> > > On Thu, Jun 10, 2021 at 8:43 AM Mickael Maison 
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > It has been a month and nobody has voted so I'm closing this vote.
> > > >
> > > > Thanks
> > > >
> > > > On Mon, May 10, 2021 at 9:58 AM Mickael Maison 
> > > > wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > I'd like to start a vote for adding Konstantine Karantasis to the
> > Kafka
> > > > PMC.
> > > > >
> > > > > Konstantine has been a committer since February 2020 and has made 67
> commits
> > > in
> > > > total.
> > > > > https://github.com/apache/kafka/commits?author=kkonstantine
> > > > >
> > > > > He is one of the key contributors to Kafka Connect and has either
> > > > > authored or reviewed a significant chunk of all the changes made to
> > > > > Connect in the past couple of years.
> > > > >
> > > > > He has also presented a session about Connect at the Kafka Summit
> SF
> > > > 2019.
> > > > >
> > > > > Thanks
> > > >
> > >
> >
>


Re: [DISCUSS] KIP-714: Client metrics and observability

2021-06-14 Thread Travis Bischel
Apologies for this duplicate reply; I did not notice the success confirmation 
on the first submission.

On 2021/06/14 04:52:11, Travis Bischel  wrote: 
> Hi! I have a few thoughts on this KIP. First, I'd like to thank you for your
> work and writeup; it's clear that a lot of thought went into this and it's
> very thorough! However, I'm not convinced it's the right approach at a
> fundamental level.
> 
> Fundamentally, this KIP seems like somewhat of a solution to an organizational
> problem. Metrics are organizational concerns, not Kafka operator concerns.
> Clients should make it easy to plug in metrics (this is the approach I take in
> my own client), and organizations should have processes such that all clients
> gather and ship metrics how that organization desires. If an organization is
> set up correctly, there is no reason for metrics to be forwarded through 
> Kafka.
> This feels like a solution to an organization not properly setting up how
> processes ship metrics, and in some ways, it's an overbroad solution, and in
> other ways, it doesn't cover the entire problem.
> 
> From the perspective of Kafka operators, it is easy to see that this KIP is
> nice in that it just dictates what clients should support for metrics and that
> the metrics should ship through Kafka. But, from the perspective of an
> observability team, this workflow is basically hijacking the standard flow 
> that
> organizations may have. I would rather have applications collect metrics and
> ship them the same way every other application does. I'd rather not have to
> configure additional plugins within Kafka to take metrics and forward them.
> 
> More importantly, this KIP prescribes cardinality problems, requires that to
> officially support the KIP a client must support all relevant metrics within
> the KIP, and requires that a client cannot support other metrics unless those
> other metrics also go through a KIP process. It is difficult to imagine all of
> these metrics being relevant to every organization, and there is no way for an
> organization to filter what is relevant within the client. Instead, the
> filtering is pushed downwards, meaning more network IO and more CPU costs to
> filter what is irrelevant and aggregate what needs to be aggregated, and more
> time for an organization to set up whatever it is that will be doing this
> filtering and aggregating. Contrast this with a client that enables hooking in
> to capture numbers that are relevant within an org itself: the org can gather
> what they want, ship only what they want, and ship directly to the
> observability system they have already set up. As an aside, it may also be
> wise to avoid shipping metrics through Kafka about client interaction with
> Kafka, because if Kafka is having problems, then orgs lose insight into those
> problems. This would be like statuspage using itself for status on its own
> systems.
> 
> Another downside is that by dictating the important metrics, this KIP either
> has two choices: try to choose what is important to every org, and inevitably
> leave out something important to somebody else, or just add everything and let
> the orgs filter. This KIP mostly looks to go with the latter approach, meaning
> orgs will be shipping & filtering. With hooks, an org would be able to gather
> exactly what they want.
> 
> As well, I expect that org applications have metrics on the state of the
> applications outside of the Kafka client. Applications are already sending
> non-Kafka-client related metrics outbound to observability systems. If a Kafka
> client provided hooks, then users could just gather the additional relevant
> Kafka client metrics and ship those metrics the same way they do all of their
> other metrics. It feels a bit odd for a Kafka client to have its own separate
> way of forwarding metrics. Another benefit of hooks in clients is that
> organizations do not _have_ to set up additional plugins to forward metrics
> from Kafka. Hooks avoid extra organizational work.
> 
> The option that the KIP provides for users of clients to opt out of metrics 
> may
> avoid some of the above issues (by just disabling things at the user level),
> but that's not really great from the perspective of client authors, because 
> the
> existence of this KIP forces authors to either just not implement the KIP, or
> increase complexity within the KIP. Further, from an operator perspective, if 
> I
> would prefer clients to ship metrics through the systems they already have in
> place, now I have to expect that anything that uses librdkafka or the official
> Java client will be shipping me metrics that I have to deal with (since the 
> KIP
> is default enabled).
> 
> Lastly, I'm a little wary that this KIP may stem from a product goal of
> Confluent: since most everything uses librdkafka or the Java client, then by
> defaulting clients sending metrics, Confluent gets an easy way to provide
> metric panels for a nice cloud UI. If any client does no

Re: [VOTE] KIP-748: Add Broker Count Metrics

2021-06-14 Thread Xavier Léauté
I had one quick question in the discussion thread. Any chance you can
provide some thoughts there?

On Thu, Jun 10, 2021 at 9:46 AM Ryan Dielhenn
 wrote:

> Hello,
>
> I would like to start a vote on KIP-748: Add Broker Count Metrics.
>
> Here is the KIP:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-748:+Add+Broker+Count+Metrics
>
> Here is the discussion thread:
>
> https://lists.apache.org/thread.html/r308364dfbb3020e6151cef47237c28a4a540e187b8af84ddafec83af%40%3Cdev.kafka.apache.org%3E
>
> Please take a look and vote if you have a chance.
>
> Thank you,
> Ryan
>


Re: [DISCUSS] KIP-748: Add Broker Count Metrics

2021-06-14 Thread Xavier Léauté
Any reason we need two different metrics for ZK and Quorum based controllers?
Wouldn't it make sense to have one metric that abstracts the implementation
detail?

On Mon, Jun 7, 2021 at 2:29 PM Ryan Dielhenn 
wrote:

> Hey Colin and David,
>
> I added another table for the ZK version of RegisteredBrokerCount.
>
> Best,
> Ryan Dielhenn
>
> On 2021/06/04 08:21:27, David Jacot  wrote:
> > Hi Ryan,
> >
> > Thanks for the KIP.
> >
> > +1 for adding RegisteredBrokerCount to the ZK controller as well. This
> > would be really helpful.
> >
> > Best,
> > David
> >
> > On Fri, Jun 4, 2021 at 12:44 AM Colin McCabe  wrote:
> >
> > > Hi Ryan,
> > >
> > > Thanks for the KIP. I think it would be good to provide the
> > > RegisteredBrokerCount metric for the ZK controller as well as for the
> > > Quorum controller. Looks good aside from that!
> > >
> > > best,
> > > Colin
> > >
> > > On Thu, Jun 3, 2021, at 14:09, Ryan Dielhenn wrote:
> > > > Hey kafka-dev,
> > > >
> > > > I created KIP-748 as a proposal to add broker count metrics to the
> > > > Quorum Controller.
> > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-748%3A+Add+Broker+Count+Metrics#KIP748:AddBrokerCountMetrics
> > > >
> > > > Best,
> > > > Ryan Dielhenn
> > > >
> > >
> >
>


[jira] [Created] (KAFKA-12947) Replace EasyMock and PowerMock with Mockito for StreamsMetricsImplTest ...

2021-06-14 Thread YI-CHEN WANG (Jira)
YI-CHEN WANG created KAFKA-12947:


 Summary: Replace EasyMock and PowerMock with Mockito for 
StreamsMetricsImplTest ...
 Key: KAFKA-12947
 URL: https://issues.apache.org/jira/browse/KAFKA-12947
 Project: Kafka
  Issue Type: Sub-task
Reporter: YI-CHEN WANG
Assignee: YI-CHEN WANG


For KAFKA-7438



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] New Kafka PMC member: Konstantine Karantasis

2021-06-14 Thread Bill Bejeck
I'm a +1 as well

Bill

On Mon, Jun 14, 2021 at 1:33 PM Randall Hauch  wrote:

> +1. He's been a very valuable contributor and committer.
>
> Randall
>
> On Sun, Jun 13, 2021 at 10:40 PM Ismael Juma  wrote:
>
> > I think it's worth pinging people once before closing the vote. I'm +1.
> >
> > Ismael
> >
> > On Thu, Jun 10, 2021 at 8:43 AM Mickael Maison 
> > wrote:
> >
> > > Hi,
> > >
> > > It has been a month and nobody has voted so I'm closing this vote.
> > >
> > > Thanks
> > >
> > > On Mon, May 10, 2021 at 9:58 AM Mickael Maison 
> > > wrote:
> > > >
> > > > Hi,
> > > >
> > > > I'd like to start a vote for adding Konstantine Karantasis to the
> Kafka
> > > PMC.
> > > >
> > > > Konstantine has been a committer since February 2020 and has made 67 commits
> > in
> > > total.
> > > > https://github.com/apache/kafka/commits?author=kkonstantine
> > > >
> > > > He is one of the key contributors to Kafka Connect and has either
> > > > authored or reviewed a significant chunk of all the changes made to
> > > > Connect in the past couple of years.
> > > >
> > > > He has also presented a session about Connect at the Kafka Summit SF
> > > 2019.
> > > >
> > > > Thanks
> > >
> >
>


Re: [VOTE] KIP-716: Allow configuring the location of the offset-syncs topic with MirrorMaker2

2021-06-14 Thread Mickael Maison
I'm +1 (binding) too

I'm closing the vote with:
- 3 +1 (binding) votes from Tom, Konstantine and myself
- 3 +1 (non-binding) votes from Ryanne, Igor and Omnia

Thanks for the feedback and votes

On Mon, Jun 14, 2021 at 5:57 PM Omnia Ibrahim  wrote:
>
> +1 (non-binding) thanks
>
> On Mon, Jun 14, 2021 at 2:35 PM Igor Soarez  wrote:
>
> > Thanks for the KIP Mickael.
> >
> > Makes sense. +1 (non-binding)
> >
> > --
> > Igor
> >
> >


Re: [idea] Kafka topic metadata

2021-06-14 Thread Israel Ekpo
Sorry Ivan and Garmes,

I misunderstood the suggestion earlier. I think this will be a great idea
for a KIP.

https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals

You were referring to metadata for the actual topic and not its contents.

Sorry about that confusion.



On Mon, Jun 14, 2021 at 1:05 PM Ivan Yurchenko 
wrote:

> Hi,
>
> Having metadata for topics seems pretty useful. Currently, one has to use
> external storage for this (e.g. a database) and the question of keeping
> topic and metadata in sync exists: A topic is deleted, how to delete its
> metadata? How to deal with delete-then-recreate scenarios (well, we have
> topic IDs now)? Making Kafka self-sufficient in this aspect sounds good.
>
> Ivan
>
>
> On Mon, 14 Jun 2021 at 19:55, Israel Ekpo  wrote:
>
> > Garmes,
> >
> > I had similar questions in the past but @Matthias J. Sax
> >  pointed
> > me to this
> >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-244%3A+Add+Record+Header+support+to+Kafka+Streams+Processor+API
> >
> > With the headers, you can filter based on the header content and not just
> > the contents of the topic.
> >
> >
> >
> >
> > On Fri, Jun 11, 2021 at 9:04 AM Garmes Amine
>  > >
> > wrote:
> >
> > > Hello Apache Kafka community,
> > >
> > > In the direction of making Kafka more cloud friendly, do you think it
> > > makes sense to have key-value metadata in the topic definition?
> > > This would allow searching, filtering, selecting ... topics not only by
> > > name but also by metadata.
> > >
> > > Ex:
> > > topic name: test
> > > topic metadata:
> > >- namespace: space1 (will allow to have multi tenant Kafka cluster)
> > >- project: test
> > >- phase: dev
> > >- type: json
> > >- area: logging
> > >
> > >
> > > Best regards,
> > > Garmes
> >
>


Re: [idea] Kafka topic metadata

2021-06-14 Thread Ivan Yurchenko
Hi,

Having metadata for topics seems pretty useful. Currently, one has to use
external storage for this (e.g. a database) and the question of keeping
topic and metadata in sync exists: A topic is deleted, how to delete its
metadata? How to deal with delete-then-recreate scenarios (well, we have
topic IDs now)? Making Kafka self-sufficient in this aspect sounds good.

Ivan


On Mon, 14 Jun 2021 at 19:55, Israel Ekpo  wrote:

> Garmes,
>
> I had similar questions in the past but @Matthias J. Sax
>  pointed
> me to this
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-244%3A+Add+Record+Header+support+to+Kafka+Streams+Processor+API
>
> With the headers, you can filter based on the header content and not just
> the contents of the topic.
>
>
>
>
> On Fri, Jun 11, 2021 at 9:04 AM Garmes Amine  >
> wrote:
>
> > Hello Apache Kafka community,
> >
> > In the direction of making Kafka more cloud friendly, do you think it makes
> > sense to have key-value metadata in the topic definition?
> > This would allow searching, filtering, selecting ... topics not only by name
> > but also by metadata.
> >
> > Ex:
> > topic name: test
> > topic metadata:
> >- namespace: space1 (will allow to have multi tenant Kafka cluster)
> >- project: test
> >- phase: dev
> >- type: json
> >- area: logging
> >
> >
> > Best regards,
> > Garmes
>


Re: [VOTE] KIP-716: Allow configuring the location of the offset-syncs topic with MirrorMaker2

2021-06-14 Thread Omnia Ibrahim
+1 (non-binding) thanks

On Mon, Jun 14, 2021 at 2:35 PM Igor Soarez  wrote:

> Thanks for the KIP Mickael.
>
> Makes sense. +1 (non-binding)
>
> --
> Igor
>
>


Re: [idea] Kafka topic metadata

2021-06-14 Thread Israel Ekpo
Garmes,

I had similar questions in the past but @Matthias J. Sax
 pointed
me to this

https://cwiki.apache.org/confluence/display/KAFKA/KIP-244%3A+Add+Record+Header+support+to+Kafka+Streams+Processor+API

With the headers, you can filter based on the header content and not just
the contents of the topic.




On Fri, Jun 11, 2021 at 9:04 AM Garmes Amine 
wrote:

> Hello Apache Kafka community,
>
> In the direction of making Kafka more cloud friendly, do you think it makes
> sense to have key-value metadata in the topic definition?
> This would allow searching, filtering, selecting ... topics not only by name
> but also by metadata.
>
> Ex:
> topic name: test
> topic metadata:
>- namespace: space1 (will allow to have multi tenant Kafka cluster)
>- project: test
>- phase: dev
>- type: json
>- area: logging
>
>
> Best regards,
> Garmes
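
For illustration, the kind of metadata-based selection proposed in the quoted email could look like the sketch below. `TopicWithMeta` and the filtering logic are hypothetical, since Kafka has no built-in topic metadata today:

```java
import java.util.List;
import java.util.Map;

// Hypothetical model of a topic carrying key-value metadata; illustration only.
record TopicWithMeta(String name, Map<String, String> metadata) {}

public class TopicMetadataSketch {
    public static void main(String[] args) {
        List<TopicWithMeta> topics = List.of(
            new TopicWithMeta("test", Map.of(
                "namespace", "space1", "project", "test",
                "phase", "dev", "type", "json", "area", "logging")),
            new TopicWithMeta("orders", Map.of(
                "namespace", "space2", "phase", "prod")));

        // Select topics by metadata rather than by name.
        List<String> devTopics = topics.stream()
            .filter(t -> "dev".equals(t.metadata().get("phase")))
            .map(TopicWithMeta::name)
            .toList();

        System.out.println(devTopics); // [test]
    }
}
```

Record headers (KIP-244) allow similar filtering per record today; topic-level metadata would make this kind of selection possible without consuming anything.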


Re: [DISCUSS] KIP-690 Add additional configuration to control MirrorMaker 2 internal topics naming convention

2021-06-14 Thread Omnia Ibrahim
Hi folks, let me try to clarify some of your concerns and questions.

Mickael: Have you considered making names changeable via configurations?
>

Gwen: may be missing something, but we are looking at 3 new configs (one
> for each topic). And this rejected alternative is basically identical to
> what Connect already does (you can choose names for internal topics using
> configs).
>
These are valid points. The reasons why we should prefer an interface instead
(the current proposal uses the ReplicationPolicy interface, which already
exists in MM2) are:

1. The number of configurations that MM2 has. Right now MM2 has its own set
of configurations in addition to configurations for the admin, consumer and
producer clients and the Connect API. And in some use cases these
configurations could differ based on the herder.

Consider a use case where MM2 is used to mirror between a set of clusters run
by different teams that have different naming policies. If we use 3
configurations for internal topics in a use case like the one below, the
configuration will look like this. If the number of policies grows, the
amount of configuration can get unwieldy.

clusters = newCenterCluster, teamACluster, teamBCluster, ...

//newCenterCluster policy is .
//teamACluster naming policy is _ when move to
newCenterCluster it will be teamA._
//teamBCluster naming policy is . when move to
newCenterCluster it will be teamB._

//The goal is to move all topics from team-specific cluster to one new cluster
// where the org can unify resource management and naming conventions

replication.policy.class=MyCustomReplicationPolicy

teamACluster.heartbeat.topic=mm2_heartbeat_topic // created on source cluster
teamACluster->newCenterCluster.offsets-sync.topic=mm2_my_offset_sync_topic
//created on source cluster at the moment
teamACluster->newCenterCluster.checkpoints.topic=teamA.mm2_checkpoint_topic
//created on target cluster which newCenterCluster

teamBCluster.heartbeat.topic=mm2.heartbeat_topic // created on source cluster
teamBCluster->newCenterCluster.offsets-sync.topic=mm2.my_offset_sync_topic
//created on source cluster at the moment
teamBCluster->newCenterCluster.checkpoints.topic=teamB.mm2_checkpoint_topic
//created on target cluster which newCenterCluster

teamACluster.config.storage.topic=...
teamACluster.offset.storage.topic=...
teamACluster.status.storage.topic=...

teamBCluster.config.storage.topic=...
teamBCluster.offset.storage.topic=...
teamBCluster.status.storage.topic=...


teamACluster->newCenterCluster.enabled=true
teamBCluster->newCenterCluster.enabled=true

2. The other reason is what Mickael mentioned regards a future KIP to move
offset-syncs on the target cluster.

> Mickael: I'm about to open a KIP to specify where the offset-syncs topic
> is created by MM2. In restricted environments, we'd prefer MM2 to only have
> read access to the source cluster and have the offset-syncs on the target
> cluster. I think allowing to specify the cluster where to create that topic
> would be a natural extension of the interface you propose here.
>

In this case, where you want to achieve “read-only” on the source cluster,
using ReplicationPolicy to name the offset-syncs topic makes more sense, as
ReplicationPolicy holds the implementation of how topics will be named on the
target cluster where MM2 will have “write” access.
For example, the user can provide their own naming convention for the target
cluster as part of replication.policy.class=MyCustomReplicationPolicy, where
it formats the name of the replicated topic to be ... Now if we also have an
extra config for internal topics that will be created at the target with a
similar naming convention .., the user will need to add
offset-sync.topic=.. This feels like duplication to me, as we could achieve
it by re-using the logic from ReplicationPolicy.

So using the replication policy interface, we can define
MyCustomReplicationPolicy as follows:

public class MyCustomReplicationPolicy implements ReplicationPolicy {
    private final String separator = ".";

    // How to rename remote topics
    @Override
    public String formatRemoteTopic(String sourceClusterAlias, String topic) {
        return nameTopicOnTarget(sourceClusterAlias, topic, "mirrored");
    }

    @Override
    public String offsetSyncTopic(String clusterAlias) {
        // The offset-sync topic will be created on the target cluster, so it
        // needs to follow the naming convention of the target cluster
        return nameTopicOnTarget(clusterAlias, "mm2-offset-sync", "internal");
    }

    @Override
    public String checkpointsTopic(String clusterAlias) {
        // The checkpoints topic is created on the target cluster, so it needs
        // to follow the naming convention of the target cluster
        return nameTopicOnTarget(clusterAlias, "mm2-checkpoints", "internal");
    }

    private String nameTopicOnTarget(String prefix, String topic,
            String suffix) {
        return prefix + separator + topic + separator + suffix;
    }
}

and MM2 configs will be

newCenterClus

Re: [VOTE] KIP-752: Support --bootstrap-server in ReplicaVerificationTool

2021-06-14 Thread Guozhang Wang
If we believe this tool does not work anymore and there are other ways to
achieve the intended function, then we should remove it in the next
release; otherwise, I think this KIP is still worthwhile. In any case, we
should not leave a cmd tool unmaintained yet not removed either.

Guozhang

On Thu, Jun 10, 2021 at 10:05 PM Dongjin Lee  wrote:

> Hi Ismael,
>
> > I am not convinced this tool is actually useful, I haven't seen anyone
> using it in years.
>
> Sure, you may be right indeed. The `ReplicaVerificationTool` may not be so
> useful.[^0] However, I hope to propose another perspective.
>
> As long as this tool is provided with a launcher script in a distribution,
> its command-line parameters look so weird to the users since it breaks
> consistency. It is even worse with KIP-499
> <
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=123899170
> >[^1],
> which tries to unify the command line parameters and deprecate old ones -
> even the tools without launcher script (e.g., VerifiableLog4jAppender) now
> uses `--bootstrap-server` parameter. This situation is rather odd, isn't
> it?
>
> This improvement may not have a great value, but it may reduce awkwardness
> from the user's viewpoint.
>
> Best,
> Dongjin
>
> [^0]: In my personal experience, I used it to validate replication when
> working with a client highly sensitive to missed replication, such as a
> semiconductor manufacturing company.
> [^1]: Somewhat strange, two omitted tools from KIP-499
> <
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=123899170
> >
> all have their own launcher script.
>
> On Thu, Jun 10, 2021 at 2:02 PM Ismael Juma  wrote:
>
> > KAFKA-12600 was a general change, not related to this tool specifically.
> I
> > am not convinced this tool is actually useful, I haven't seen anyone
> using
> > it in years.
> >
> > Ismael
> >
> > On Wed, Jun 9, 2021 at 9:51 PM Dongjin Lee  wrote:
> >
> > > Hi Ismael,
> > >
> > > Before I submit this KIP, I reviewed some history. When KIP-499
> > > <
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-499+-+Unify+connection+name+flag+for+command+line+tool
> > > >
> > > tried to resolve the inconsistencies between the command line tools,
> two
> > > tools were omitted, probably by mistake.
> > >
> > > - KAFKA-12878: Support --bootstrap-server
> kafka-streams-application-reset
> > > 
> > > - KAFKA-12899: Support --bootstrap-server in ReplicaVerificationTool
> > >  (this one)
> > >
> > > And it seems like this tool is still working. The last update was
> > > KAFKA-12600  by
> you,
> > > which will also be included in this 3.0.0 release. It is why I
> determined
> > > that this tool is worth updating.
> > >
> > > Thanks,
> > > Dongjin
> > >
> > > On Thu, Jun 10, 2021 at 1:26 PM Ismael Juma  wrote:
> > >
> > > > Hi Dongjin,
> > > >
> > > > Does this tool still work? I recall that there were some doubts about
> > it
> > > > and that's why it wasn't updated previously.
> > > >
> > > > Ismael
> > > >
> > > > On Sat, Jun 5, 2021 at 2:38 PM Dongjin Lee 
> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I'd like to call for a vote on KIP-752: Support --bootstrap-server
> in
> > > > > ReplicaVerificationTool:
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-752%3A+Support+--bootstrap-server+in+ReplicaVerificationTool
> > > > >
> > > > > Best,
> > > > > Dongjin
> > > > >
> > > > > --
> > > > > *Dongjin Lee*
> > > > >
> > > > > *A hitchhiker in the mathematical world.*
> > > > >
> > > > >
> > > > >
> > > > > *github:  github.com/dongjinleekr
> > > > > keybase:
> > > > https://keybase.io/dongjinleekr
> > > > > linkedin:
> > > > kr.linkedin.com/in/dongjinleekr
> > > > > speakerdeck:
> > > > > speakerdeck.com/dongjin
> > > > > *
> > > > >
> > > >
> > >
> > >
> > >
> >
>
>
> 

Re: Request for contributor permission

2021-06-14 Thread Guozhang Wang
Hello Nicolas,

I've added you to the contributors list. You should be able to
assign tickets to yourself now.

Guozhang

On Mon, Jun 14, 2021 at 7:24 AM Nicolas Guignard <
nicolas.guignar...@gmail.com> wrote:

> Hi,
>
> I am sending this email to ask to have the contributor's permission in
> order to be able to assign Jiras to me. My Jira username is: Nicolas
> Guignard. Is this all you need or do you need something else?
>
> Thanks in advance.
>
> Have a good day,
> Cheers,
> Nicolas Guignard
> --
> Software engineer
>


-- 
-- Guozhang


Re: [DISCUSS] KIP-714: Client metrics and observability

2021-06-14 Thread Travis Bischel
Hi! I have a few thoughts on this KIP. First, I'd like to thank you for the 
writeup; clearly a lot of thought has gone into it and it is very thorough. 
However, I'm not convinced it's the right approach at a fundamental level.

Fundamentally, this KIP seems like somewhat of a solution to an organizational
problem. Metrics are organizational concerns, not Kafka operator concerns.
Clients should make it easy to plug in metrics (this is the approach I take in
my own client), and organizations should have processes such that all clients
gather and ship metrics how that organization desires. If an organization is
set up correctly, there is no reason for metrics to be forwarded through Kafka.
This feels like a solution to an organization not properly setting up how
processes ship metrics, and in some ways, it's an overbroad solution, and in
other ways, it doesn't cover the entire problem.

From the perspective of Kafka operators, it is easy to see that this KIP is
nice in that it just dictates what clients should support for metrics and that
the metrics should ship through Kafka. But, from the perspective of an
observability team, this workflow is basically hijacking the standard flow that
organizations may have. I would rather have applications collect metrics and
ship them the same way every other application does. I'd rather not have to
configure additional plugins within Kafka to take metrics and forward them.

More importantly, this KIP prescribes cardinality problems, requires that to
officially support the KIP a client must support all relevant metrics within
the KIP, and requires that a client cannot support other metrics unless those
other metrics also go through a KIP process. It is difficult to imagine all of
these metrics being relevant to every organization, and there is no way for an
organization to filter what is relevant within the client. Instead, the
filtering is pushed downwards, meaning more network IO and more CPU costs to
filter what is irrelevant and aggregate what needs to be aggregated, and more
time for an organization to set up whatever it is that will be doing this
filtering and aggregating. Contrast this with a client that enables hooking in
to capture numbers that are relevant within an org itself: the org can gather
what they want, ship only what they want, and ship directly to the
observability system they have already set up. As an aside, it may also be
wise to avoid shipping metrics through Kafka about client interaction with
Kafka, because if Kafka is having problems, then orgs lose insight into those
problems. This would be like statuspage using itself for status on its own
systems.

Another downside is that by dictating the important metrics, this KIP either
has two choices: try to choose what is important to every org, and inevitably
leave out something important to somebody else, or just add everything and let
the orgs filter. This KIP mostly looks to go with the latter approach, meaning
orgs will be shipping & filtering. With hooks, an org would be able to gather
exactly what they want.

As well, I expect that org applications have metrics on the state of the
applications outside of the Kafka client. Applications are already sending
non-Kafka-client related metrics outbound to observability systems. If a Kafka
client provided hooks, then users could just gather the additional relevant
Kafka client metrics and ship those metrics the same way they do all of their
other metrics. It feels a bit odd for a Kafka client to have its own separate
way of forwarding metrics. Another benefit of hooks in clients is that
organizations do not _have_ to set up additional plugins to forward metrics
from Kafka. Hooks avoid extra organizational work.
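
For what it's worth, the hook approach argued for above could look roughly like the following sketch. The `ClientMetricsHook` interface and its methods are hypothetical illustrations, not part of any existing client API:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical hook interface a client could expose; illustration only.
interface ClientMetricsHook {
    void onRecordSent(String topic, int sizeBytes);
    void onRequestLatency(String api, long latencyMs);
}

// An org-specific hook: it keeps only what this org cares about and would
// forward it through the org's existing telemetry pipeline.
class OrgMetricsHook implements ClientMetricsHook {
    final ConcurrentHashMap<String, LongAdder> bytesByTopic = new ConcurrentHashMap<>();

    @Override
    public void onRecordSent(String topic, int sizeBytes) {
        bytesByTopic.computeIfAbsent(topic, t -> new LongAdder()).add(sizeBytes);
    }

    @Override
    public void onRequestLatency(String api, long latencyMs) {
        // deliberately ignored: filtering happens at the source instead of
        // shipping everything and filtering downstream
    }
}

public class HookSketch {
    public static void main(String[] args) {
        OrgMetricsHook hook = new OrgMetricsHook();
        // the client would invoke the hook internally; simulate two sends
        hook.onRecordSent("orders", 100);
        hook.onRecordSent("orders", 50);
        System.out.println(hook.bytesByTopic.get("orders").sum()); // 150
    }
}
```

The point of the sketch is that the org decides what to record and where to ship it, rather than the client shipping a fixed metric set through Kafka.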

The option that the KIP provides for users of clients to opt out of metrics may
avoid some of the above issues (by just disabling things at the user level),
but that's not really great from the perspective of client authors, because the
existence of this KIP forces authors to either just not implement the KIP, or
increase complexity within the KIP. Further, from an operator perspective, if I
would prefer clients to ship metrics through the systems they already have in
place, now I have to expect that anything that uses librdkafka or the official
Java client will be shipping me metrics that I have to deal with (since the KIP
is default enabled).

Lastly, I'm a little wary that this KIP may stem from a product goal of
Confluent: since most everything uses librdkafka or the Java client, then by
defaulting clients sending metrics, Confluent gets an easy way to provide
metric panels for a nice cloud UI. If any client does not want to support these
metrics, and then a user wonders why these hypothetical panels have no metrics,
then Confluent can just reply "use a supported client".  Even if this
(potentially unlikely) scenario is true, then hooks would still be a great
alternative, because then Confluent could provide drop-in hooks for any client
and

Re: [DISCUSS] KIP-714: Client metrics and observability

2021-06-14 Thread Travis Bischel
Hi! I have a few thoughts on this KIP. First, I'd like to thank you for your 
work and writeup; it's clear that a lot of thought went into this and it's very 
thorough! However, I'm not convinced it's the right approach at a fundamental 
level.

Fundamentally, this KIP seems like somewhat of a solution to an organizational
problem. Metrics are organizational concerns, not Kafka operator concerns.
Clients should make it easy to plug in metrics (this is the approach I take in
my own client), and organizations should have processes such that all clients
gather and ship metrics how that organization desires. If an organization is
set up correctly, there is no reason for metrics to be forwarded through Kafka.
This feels like a solution to an organization not properly setting up how
processes ship metrics, and in some ways, it's an overbroad solution, and in
other ways, it doesn't cover the entire problem.

From the perspective of Kafka operators, it is easy to see that this KIP is
nice in that it just dictates what clients should support for metrics and that
the metrics should ship through Kafka. But, from the perspective of an
observability team, this workflow is basically hijacking the standard flow that
organizations may have. I would rather have applications collect metrics and
ship them the same way every other application does. I'd rather not have to
configure additional plugins within Kafka to take metrics and forward them.

More importantly, this KIP prescribes cardinality problems, requires that to
officially support the KIP a client must support all relevant metrics within
the KIP, and requires that a client cannot support other metrics unless those
other metrics also go through a KIP process. It is difficult to imagine all of
these metrics being relevant to every organization, and there is no way for an
organization to filter what is relevant within the client. Instead, the
filtering is pushed downwards, meaning more network IO and more CPU costs to
filter what is irrelevant and aggregate what needs to be aggregated, and more
time for an organization to set up whatever it is that will be doing this
filtering and aggregating. Contrast this with a client that enables hooking in
to capture numbers that are relevant within an org itself: the org can gather
what they want, ship only what they want, and ship directly to the
observability system they have already set up. As an aside, it may also be
wise to avoid shipping metrics through Kafka about client interaction with
Kafka, because if Kafka is having problems, then orgs lose insight into those
problems. This would be like statuspage using itself for status on its own
systems.

Another downside is that by dictating the important metrics, this KIP either
has two choices: try to choose what is important to every org, and inevitably
leave out something important to somebody else, or just add everything and let
the orgs filter. This KIP mostly looks to go with the latter approach, meaning
orgs will be shipping & filtering. With hooks, an org would be able to gather
exactly what they want.

As well, I expect that org applications have metrics on the state of the
applications outside of the Kafka client. Applications are already sending
non-Kafka-client related metrics outbound to observability systems. If a Kafka
client provided hooks, then users could just gather the additional relevant
Kafka client metrics and ship those metrics the same way they do all of their
other metrics. It feels a bit odd for a Kafka client to have its own separate
way of forwarding metrics. Another benefit of hooks in clients is that
organizations do not _have_ to set up additional plugins to forward metrics
from Kafka. Hooks avoid extra organizational work.
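To make the hook idea above concrete, here is a minimal, hypothetical sketch (Python used purely for illustration; the interface and method names are invented, not taken from any existing Kafka client): the client invokes a user-supplied hook as events occur, and the application aggregates and ships only what its own observability pipeline wants.

```python
class MetricsHook:
    """Hypothetical client-side hook: the client calls these as events occur."""
    def on_record_sent(self, topic: str, bytes_sent: int) -> None: ...
    def on_request_latency(self, broker: str, latency_ms: float) -> None: ...

class OrgMetricsForwarder(MetricsHook):
    """Example hook: this org aggregates bytes per topic and ignores latency."""
    def __init__(self):
        self.bytes_by_topic = {}

    def on_record_sent(self, topic, bytes_sent):
        self.bytes_by_topic[topic] = self.bytes_by_topic.get(topic, 0) + bytes_sent

    def on_request_latency(self, broker, latency_ms):
        pass  # this org chooses not to collect latency metrics at all

# The client would invoke the hook internally; simulated here:
hook = OrgMetricsForwarder()
hook.on_record_sent("orders", 120)
hook.on_record_sent("orders", 80)
print(hook.bytes_by_topic["orders"])  # 200
```

The point of the sketch is that filtering happens at the source: nothing irrelevant is ever serialized or shipped, in contrast with the ship-everything-then-filter flow the KIP implies.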

The option that the KIP provides for users of clients to opt out of metrics may
avoid some of the above issues (by just disabling things at the user level),
but that's not really great from the perspective of client authors, because the
existence of this KIP forces authors to either just not implement the KIP, or
increase complexity within the KIP. Further, from an operator perspective, if I
would prefer clients to ship metrics through the systems they already have in
place, now I have to expect that anything that uses librdkafka or the official
Java client will be shipping me metrics that I have to deal with (since the KIP
is default enabled).

Lastly, I'm a little wary that this KIP may stem from a product goal of
Confluent: since most everything uses librdkafka or the Java client, then by
defaulting clients sending metrics, Confluent gets an easy way to provide
metric panels for a nice cloud UI. If any client does not want to support these
metrics, and then a user wonders why these hypothetical panels have no metrics,
then Confluent can just reply "use a supported client".  Even if this
(potentially unlikely) scenario is true, then hooks would still be a great
alternative, because then Confluent could provide drop-in hooks for any client and

Request for contributor permission

2021-06-14 Thread Nicolas Guignard
Hi,

I am sending this email to ask for contributor permission so that I can
assign Jiras to myself. My Jira username is: Nicolas
Guignard. Is this all you need, or do you need something else?

Thanks in advance.

Have a good day,
Cheers,
Nicolas Guignard
-- 
Software engineer


Re: MM2 taking into consideration automatic topic creation property from original cluster

2021-06-14 Thread David Beech
Hi Omnia

Maybe - I'll look forward to reading the KIP when it's submitted.

When I researched the feature request from our client, I found KIP-158, which
added a bunch of topic auto-creation controls to the Kafka Connect
framework in version 2.6. It seemed to me that a convenient solution would
be to extend MM2 to use this rather than going directly to the AdminClient.

To Ryanne & Mickael's point about controlling this using ACLs, I don't like
this so much, because if MM2 constantly tries to create topics every so
often (the period defined by refresh.topics.interval.seconds), this will
add unnecessary overhead to the service and create a lot of noise in the
error and audit logs. I would prefer to be able to turn off topic
auto-creation completely.

For Kafka Streams internal topics, changelogs etc., it makes sense that
these are always auto-created. These are implementation details, and the
user may not even care or need to know that they exist (but if they aren't
there, strange issues will surely manifest).

Topics created by MM2 are "owned" by the users, not the MM2 service. So I
think there's precedent for users to be able to control auto-creation, just
like they can with Connect and with Kafka in general.

Dave
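For reference, the KIP-158 controls mentioned above look roughly like this (a sketch based on the Kafka Connect worker and source-connector configs introduced around 2.6; property names should be checked against the Connect documentation, and having MM2 honor anything similar is the proposal here, not current behaviour):

```properties
# Connect worker config: master switch for connector-requested topic creation
topic.creation.enable=false

# Per source connector, when creation is left enabled:
topic.creation.default.replication.factor=3
topic.creation.default.partitions=10
```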

On Mon, Jun 14, 2021 at 2:32 PM Omnia Ibrahim 
wrote:

> Hi,
> I am in the middle of writing a new KIP to introduce a new interface for
> topic updates/creation/description that will allow MM2 either to use the
> default behaviour, which uses the Kafka AdminClient to create/update topics, or
> to use a customised one that integrates with the user's ecosystem where
> they can call their Kafka resource management service to create a topic or
> update the configuration of the topic. Would this be a possible solution
> for your use case where you can provide a custom implementation that works
> with your ecosystem to create the topic?
>
> --
> Omnia
>
> On Mon, Jun 14, 2021 at 2:25 PM Igor Soarez  wrote:
>
> > Maybe it would be nice if it was possible to hook into or extend Admin
> > client interactions, to allow for custom logic supporting use cases such
> as
> > this.
> > Scenarios where topic/resource management is centralized are probably not
> > uncommon.
> >
> > --
> > Igor
> >
> > On Sat, Jun 12, 2021, at 9:42 AM, Matthew de Detrich wrote:
> > > Thanks Ryanne and Mickael
> > >
> > > I already suspected that something like this (i.e. current behaviour
> > being
> > > a deliberate design decision) was the case and I just wanted to confirm
> > my
> > > suspicions.
> > >
> > > I will relay this internally
> > >
> > > On Fri, 11 Jun 2021, 18:55 Mickael Maison, 
> > wrote:
> > >
> > > > Hi Matthew,
> > > >
> > > > If an administrator wants to control topic creation, they should use
> > > > ACLs to prevent users from creating topics. Relying on all applications
> > > > to not create topics is unlikely to succeed.
> > > >
> > > > Each refresh.topics.interval.seconds, MM2 checks if it needs to
> create
> > > > topics/partitions by comparing both clusters and also their previous
> > > > states. If your automation is able to create new topics on both
> > > > clusters, MM2 should detect them correctly and not attempt creating
> > > > any topics. If a topic does not exist on the target cluster, MM2 will
> > > > try creating it. If it fails, it will retry again at the next
> > > > refresh.topics.interval.seconds until the topic gets created.
> However,
> > > > it will also trigger a task reconfiguration each time which may have
> > > > an impact on your mirroring throughput.
> > > >
> > > > On Fri, Jun 11, 2021 at 4:25 PM Ryanne Dolan 
> > > > wrote:
> > > > >
> > > > > Matthew, I wonder what the expected behavior would be when
> > topic-creation
> > > > > is disabled and MM is asked to replicate a topic that doesn't exist
> > on
> > > > the
> > > > > target cluster? Maybe the task should fail at that point, or maybe
> it
> > > > > should replicate whatever it can?
> > > > >
> > > > > I think the current behavior is reasonable, esp considering
> precedent
> > > > from
> > > > > Connect and Streams, both of which actively create topics as
> needed.
> > > > >
> > > > > But I understand the motivation. Have they considered revoking
> topic
> > > > > creation permission using ACLs?
> > > > >
> > > > > Ryanne
> > > > >
> > > > > On Fri, Jun 11, 2021, 3:54 AM Matthew de Detrich
> > > > >  wrote:
> > > > >
> > > > > > Hello everyone,
> > > > > >
> > > > > > We have an interesting feature request from a client regarding
> > having
> > > > the
> > > > > > property of automatic topic creation to be reflected in a MM2.
> > > > Specifically
> > > > > > the current behaviour where if you have automatic topic creation
> > set to
> > > > > > false for the original Kafka cluster, MM2 configuration ignores
> > this
> > > > which
> > > > > > means that if Kafka clients send messages to the MM2 then topics
> > will
> > > > still
> > > > > > be automatically created on target cluster. The core problem here
> > for
> > > > the
> > > > > > 

Re: [VOTE] KIP-724: Drop support for message formats v0 and v1

2021-06-14 Thread Ismael Juma
Thanks for the votes and discussion. The KIP vote passes with:

5 binding +1s:
* Jason G
* Colin M
* Gwen S
* David J
* me

1 non-binding +1:
* Dongjin

Ismael

On Wed, Jun 9, 2021 at 11:28 AM Ismael Juma  wrote:

> Hi all,
>
> Consensus was reached in the discussion thread and part of what is
> proposed has to happen by 3.0, so starting the vote for KIP-724:
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-724%3A+Drop+support+for+message+formats+v0+and+v1
>
> If there are concerns or objections, feel free to point them out in this
> thread or the discuss thread.
>
> Ismael
>


Re: [VOTE] KIP-716: Allow configuring the location of the offset-syncs topic with MirrorMaker2

2021-06-14 Thread Igor Soarez
Thanks for the KIP Mickael.

Makes sense. +1 (non-binding)

--
Igor



Re: MM2 taking into consideration automatic topic creation property from original cluster

2021-06-14 Thread Omnia Ibrahim
Hi,
I am in the middle of writing a new KIP to introduce a new interface for
topic updates/creation/description that will allow MM2 either to use the
default behaviour, which uses the Kafka AdminClient to create/update topics, or
to use a customised one that integrates with the user's ecosystem where
they can call their Kafka resource management service to create a topic or
update the configuration of the topic. Would this be a possible solution
for your use case where you can provide a custom implementation that works
with your ecosystem to create the topic?
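A minimal sketch of the kind of pluggable interface described above (Python used only for illustration; every name here is hypothetical and the actual KIP may define something quite different): MM2 would call the interface, with the default implementation delegating to the AdminClient and a custom one calling the organization's own resource-management service.

```python
from abc import ABC, abstractmethod

class TopicManager(ABC):
    """Hypothetical pluggable interface: MM2 would call this instead of
    going straight to the Kafka AdminClient."""
    @abstractmethod
    def create_topic(self, name: str, partitions: int, configs: dict) -> None: ...

class AdminClientTopicManager(TopicManager):
    """Default behaviour: delegate to Kafka's AdminClient (stubbed here)."""
    def create_topic(self, name, partitions, configs):
        print(f"AdminClient: creating {name} with {partitions} partitions")

class ResourceServiceTopicManager(TopicManager):
    """Custom behaviour: route the request to the org's own tooling
    (e.g. open a ticket or call an internal provisioning API)."""
    def __init__(self):
        self.requested = []

    def create_topic(self, name, partitions, configs):
        self.requested.append(name)  # record the request instead of creating

# MM2 would be configured with one implementation or the other:
mgr = ResourceServiceTopicManager()
mgr.create_topic("mirrored.orders", 10, {})
print(mgr.requested)  # ['mirrored.orders']
```

This is one way such a design could satisfy the Terraform-managed use case in this thread: topic creation never bypasses the organization's provisioning flow.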

--
Omnia

On Mon, Jun 14, 2021 at 2:25 PM Igor Soarez  wrote:

> Maybe it would be nice if it was possible to hook into or extend Admin
> client interactions, to allow for custom logic supporting use cases such as
> this.
> Scenarios where topic/resource management is centralized are probably not
> uncommon.
>
> --
> Igor
>
> On Sat, Jun 12, 2021, at 9:42 AM, Matthew de Detrich wrote:
> > Thanks Ryanne and Mickael
> >
> > I already suspected that something like this (i.e. current behaviour
> being
> > a deliberate design decision) was the case and I just wanted to confirm
> my
> > suspicions.
> >
> > I will relay this internally
> >
> > On Fri, 11 Jun 2021, 18:55 Mickael Maison, 
> wrote:
> >
> > > Hi Matthew,
> > >
> > > If an administrator wants to control topic creation, they should use
> > > ACLs to prevent users from creating topics. Relying on all applications
> > > to not create topics is unlikely to succeed.
> > >
> > > Each refresh.topics.interval.seconds, MM2 checks if it needs to create
> > > topics/partitions by comparing both clusters and also their previous
> > > states. If your automation is able to create new topics on both
> > > clusters, MM2 should detect them correctly and not attempt creating
> > > any topics. If a topic does not exist on the target cluster, MM2 will
> > > try creating it. If it fails, it will retry again at the next
> > > refresh.topics.interval.seconds until the topic gets created. However,
> > > it will also trigger a task reconfiguration each time which may have
> > > an impact on your mirroring throughput.
> > >
> > > On Fri, Jun 11, 2021 at 4:25 PM Ryanne Dolan 
> > > wrote:
> > > >
> > > > Matthew, I wonder what the expected behavior would be when
> topic-creation
> > > > is disabled and MM is asked to replicate a topic that doesn't exist
> on
> > > the
> > > > target cluster? Maybe the task should fail at that point, or maybe it
> > > > should replicate whatever it can?
> > > >
> > > > I think the current behavior is reasonable, esp considering precedent
> > > from
> > > > Connect and Streams, both of which actively create topics as needed.
> > > >
> > > > But I understand the motivation. Have they considered revoking topic
> > > > creation permission using ACLs?
> > > >
> > > > Ryanne
> > > >
> > > > On Fri, Jun 11, 2021, 3:54 AM Matthew de Detrich
> > > >  wrote:
> > > >
> > > > > Hello everyone,
> > > > >
> > > > > We have an interesting feature request from a client regarding
> having
> > > the
> > > > > property of automatic topic creation to be reflected in a MM2.
> > > Specifically
> > > > > the current behaviour where if you have automatic topic creation
> set to
> > > > > false for the original Kafka cluster, MM2 configuration ignores
> this
> > > which
> > > > > means that if Kafka clients send messages to the MM2 then topics
> will
> > > still
> > > > > be automatically created on target cluster. The core problem here
> for
> > > the
> > > > > client is that our client wants to have complete control over how
> > > topics
> > > > > get created (i.e. with terraform setup scripts) and with the
> current
> > > > > behaviour this is not possible.
> > > > >
> > > > > Of course this poses other problems if we did want to change the
> > > behaviour
> > > > > as stated earlier, i.e. if you update the configuration for the
> > > original
> > > > > Kafka cluster then you get into open questions about how to reflect
> > > this
> > > > > configuration onto the mirror maker (this is why it's called
> "mirror").
> > > Is
> > > > > making MM2 reflect that flag a feature that makes sense in general or
> > > alternately is
> > > > > there another variation that makes more sense (i.e. having a
> separate
> > > > > specific property rather than reusing the current automatic topic
> > > creation
> > > > > one).
> > > > >
> > > > > There is a currently existing issue on this at
> > > > > https://issues.apache.org/jira/browse/KAFKA-12753
> > > > >
> > > > > @Ryanne @Mickael Since you guys are the main developers on MM/MM2
> what
> > > are
> > > > > your thoughts on this?
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Matthew de Detrich
> > > > >
> > > > > *Aiven Deutschland GmbH*
> > > > >
> > > > > Immanuelkirchstraße 26, 10405 Berlin
> > > > >
> > > > > Amtsgericht Charlottenburg, HRB 209739 B
> > > > >
> > > > > Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> > > > >
> > > > > *m:* +491603708037
> > > > >
> > > > > *w:* aiv

Re: MM2 taking into consideration automatic topic creation property from original cluster

2021-06-14 Thread Igor Soarez
Maybe it would be nice if it was possible to hook into or extend Admin client 
interactions, to allow for custom logic supporting use cases such as this.
Scenarios where topic/resource management is centralized are probably not 
uncommon.

--
Igor

On Sat, Jun 12, 2021, at 9:42 AM, Matthew de Detrich wrote:
> Thanks Ryanne and Mickael
> 
> I already suspected that something like this (i.e. current behaviour being
> a deliberate design decision) was the case and I just wanted to confirm my
> suspicions.
> 
> I will relay this internally
> 
> On Fri, 11 Jun 2021, 18:55 Mickael Maison,  wrote:
> 
> > Hi Matthew,
> >
> > If an administrator wants to control topic creation, they should use
> > ACLs to prevent users from creating topics. Relying on all applications
> > to not create topics is unlikely to succeed.
> >
> > Each refresh.topics.interval.seconds, MM2 checks if it needs to create
> > topics/partitions by comparing both clusters and also their previous
> > states. If your automation is able to create new topics on both
> > clusters, MM2 should detect them correctly and not attempt creating
> > any topics. If a topic does not exist on the target cluster, MM2 will
> > try creating it. If it fails, it will retry again at the next
> > refresh.topics.interval.seconds until the topic gets created. However,
> > it will also trigger a task reconfiguration each time which may have
> > an impact on your mirroring throughput.
> >
> > On Fri, Jun 11, 2021 at 4:25 PM Ryanne Dolan 
> > wrote:
> > >
> > > Matthew, I wonder what the expected behavior would be when topic-creation
> > > is disabled and MM is asked to replicate a topic that doesn't exist on
> > the
> > > target cluster? Maybe the task should fail at that point, or maybe it
> > > should replicate whatever it can?
> > >
> > > I think the current behavior is reasonable, esp considering precedent
> > from
> > > Connect and Streams, both of which actively create topics as needed.
> > >
> > > But I understand the motivation. Have they considered revoking topic
> > > creation permission using ACLs?
> > >
> > > Ryanne
> > >
> > > On Fri, Jun 11, 2021, 3:54 AM Matthew de Detrich
> > >  wrote:
> > >
> > > > Hello everyone,
> > > >
> > > > We have an interesting feature request from a client regarding having
> > the
> > > > property of automatic topic creation to be reflected in a MM2.
> > Specifically
> > > > the current behaviour where if you have automatic topic creation set to
> > > > false for the original Kafka cluster, MM2 configuration ignores this
> > which
> > > > means that if Kafka clients send messages to the MM2 then topics will
> > still
> > > > be automatically created on target cluster. The core problem here for
> > the
> > > > client is that our client wants to have complete control over how
> > topics
> > > > get created (i.e. with terraform setup scripts) and with the current
> > > > behaviour this is not possible.
> > > >
> > > > Of course this poses other problems if we did want to change the
> > behaviour
> > > > as stated earlier, i.e. if you update the configuration for the
> > original
> > > > Kafka cluster then you get into open questions about how to reflect
> > this
> > > > configuration onto the mirror maker (this is why it's called "mirror").
> > Is
> > > > making MM2 reflect that flag a feature that makes sense in general or
> > alternately is
> > > > there another variation that makes more sense (i.e. having a separate
> > > > specific property rather than reusing the current automatic topic
> > creation
> > > > one).
> > > >
> > > > There is a currently existing issue on this at
> > > > https://issues.apache.org/jira/browse/KAFKA-12753
> > > >
> > > > @Ryanne @Mickael Since you guys are the main developers on MM/MM2 what
> > are
> > > > your thoughts on this?
> > > >
> > > >
> > > > --
> > > >
> > > > Matthew de Detrich
> > > >
> > > > *Aiven Deutschland GmbH*
> > > >
> > > > Immanuelkirchstraße 26, 10405 Berlin
> > > >
> > > > Amtsgericht Charlottenburg, HRB 209739 B
> > > >
> > > > Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> > > >
> > > > *m:* +491603708037
> > > >
> > > > *w:* aiven.io *e:* matthew.dedetr...@aiven.io
> > > >
> >
> 


[jira] [Resolved] (KAFKA-12847) Dockerfile needed for kafka system tests needs changes

2021-06-14 Thread Abhijit Mane (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhijit Mane resolved KAFKA-12847.
--
  Reviewer: Chia-Ping Tsai
Resolution: Won't Fix

This issue is valid if the system tests are run as root. However, it won't be
fixed for now, except perhaps with a note in the README giving the relevant
details. Note that, as a non-root user, there are no problems.

Detailed discussion in issue comments.

> Dockerfile needed for kafka system tests needs changes
> --
>
> Key: KAFKA-12847
> URL: https://issues.apache.org/jira/browse/KAFKA-12847
> Project: Kafka
>  Issue Type: Bug
>  Components: system tests
>Affects Versions: 2.8.0, 2.7.1
> Environment: Issue tested in environments below but is independent of 
> h/w arch. or Linux flavor: -
> 1.) RHEL-8.3 on x86_64 
> 2.) RHEL-8.3 on IBM Power (ppc64le)
> 3.) apache/kafka branch tested: trunk (master)
>Reporter: Abhijit Mane
>Assignee: Abhijit Mane
>Priority: Major
>  Labels: easyfix
> Attachments: Dockerfile.upstream, 截圖 2021-06-05 上午1.53.17.png
>
>
> Hello,
> I tried the apache/kafka system tests as per the documentation:
> (https://github.com/apache/kafka/tree/trunk/tests#readme)
> =
>  PROBLEM
>  ~~
> 1.) As root user, clone kafka github repo and start "kafka system tests"
>  # git clone [https://github.com/apache/kafka.git]
>  # cd kafka
>  # ./gradlew clean systemTestLibs
>  # bash tests/docker/run_tests.sh
> 2.) Dockerfile issue - 
> [https://github.com/apache/kafka/blob/trunk/tests/docker/Dockerfile]
> This file has a UID entry as shown below:
>  ---
>  ARG UID="1000"
>  RUN useradd -u $UID ducker
>  // Error during docker build => useradd: UID 0 is not unique, root user id is 0
>  ---
>  I ran everything as root which means the built-in bash environment variable 
> 'UID' always
> resolves to 0 and can't be changed. Hence, the docker build fails. The issue 
> should be seen even if run as non-root.
> 3.) Next, as root, as per README, I ran: -
> server:/kafka> *bash tests/docker/run_tests.sh*
> The ducker tool builds the container images & switches to user '*ducker*' 
> inside the container
> & maps kafka root dir ('kafka') from host to '/opt/kafka-dev' in the 
> container.
> Ref: https://github.com/apache/kafka/blob/trunk/tests/docker/ducker-ak
> Ex: docker run -d -v "${kafka_dir}:/opt/kafka-dev" 
> This fails as the 'ducker' user has *no write permissions* to create files 
> under 'kafka' root dir. Hence, it needs to be made writeable.
> // chmod -R a+w kafka
>  – needed as the container runs as 'ducker' and needs write access, since the
> kafka root volume from the host is mapped into the container as
> "/opt/kafka-dev", where the 'ducker' user writes logs
>  =
> =
>  *FIXES needed*
>  ~
>  1.) Dockerfile - 
> [https://github.com/apache/kafka/blob/trunk/tests/docker/Dockerfile]
>  Change 'UID' to 'UID_DUCKER'.
> This won't conflict with the built-in bash env var UID, and the docker image
> build should succeed.
>  ---
>  ARG UID_DUCKER="1000"
>  RUN useradd -u $UID_DUCKER ducker
>  // No Error => no conflict with the built-in UID
>  ---
> 2.) README needs an update where we must ensure the kafka root dir from where 
> the tests 
>  are launched is writeable to allow the 'ducker' user to create results/logs.
>  # chmod -R a+w kafka
> With this, I was able to get the docker images built and system tests started 
> successfully.
>  =
> Also, I wonder whether or not upstream Dockerfile & System tests are part of 
> CI/CD and get tested for every PR. If so, this issue should have been caught.
>  
> *Question to kafka SME*
>  -
>  Do you believe this is a valid problem with the Dockerfile and the fix is 
> acceptable? 
>  Please let me know and I am happy to submit a PR with this fix.
> Thanks,
>  Abhijit



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12946) __consumer_offsets topic with very big partitions

2021-06-14 Thread Emi (Jira)
Emi created KAFKA-12946:
---

 Summary: __consumer_offsets topic with very big partitions
 Key: KAFKA-12946
 URL: https://issues.apache.org/jira/browse/KAFKA-12946
 Project: Kafka
  Issue Type: Bug
  Components: log cleaner
Affects Versions: 2.0.0
Reporter: Emi


I am using Kafka 2.0.0 with Java 8u191.
There is a partition of the __consumer_offsets topic that is 600 GB with 6000
segments older than 4 months. Other partitions of that topic are small:
20-30 MB.

There are 60 consumer groups, 90 topics and 100 partitions per topic. 

There are no errors in the logs. From the log cleaner's log, I can see that
this partition is never touched by the log cleaner thread for compaction; it
only adds new segments.
How is this possible?


There was another partition with the same problem, but after some months it was
compacted. Now there is only one partition with this problem, but it is bigger
and keeps growing.


My settings:
{{offsets.commit.required.acks = -1}}
{{offsets.commit.timeout.ms = 5000}}
{{offsets.load.buffer.size = 5242880}}
{{offsets.retention.check.interval.ms = 60}}
{{offsets.retention.minutes = 10080}}
{{offsets.topic.compression.codec = 0}}
{{offsets.topic.num.partitions = 50}}
{{offsets.topic.replication.factor = 3}}
{{offsets.topic.segment.bytes = 104857600}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)