[jira] [Created] (KAFKA-10125) The partition which is removing should be considered to be under reassignment

2020-06-08 Thread Chia-Ping Tsai (Jira)
Chia-Ping Tsai created KAFKA-10125:
--

 Summary: The partition which is removing should be considered to 
be under reassignment
 Key: KAFKA-10125
 URL: https://issues.apache.org/jira/browse/KAFKA-10125
 Project: Kafka
  Issue Type: Bug
Reporter: Chia-Ping Tsai
Assignee: Chia-Ping Tsai


When a reassignment is still in progress, a replica that is being either removed or added should be considered to be under reassignment. However, TopicCommand still prints a partition whose replicas are only being removed, i.e., it does not treat it as under reassignment.
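
For context, a minimal sketch (assuming a broker on localhost:9092) of how the Admin API exposes both the adding and the removing replica sets for an in-progress reassignment; a partition with a non-empty removing set should be treated as under reassignment just like one with a non-empty adding set:
{code}
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.PartitionReassignment;
import org.apache.kafka.common.TopicPartition;

public class UnderReassignmentCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            Map<TopicPartition, PartitionReassignment> reassignments =
                admin.listPartitionReassignments().reassignments().get();

            reassignments.forEach((tp, r) -> {
                // A partition should count as "under reassignment" when it still has
                // replicas being added OR removed, not only when replicas are being added.
                boolean underReassignment =
                    !r.addingReplicas().isEmpty() || !r.removingReplicas().isEmpty();
                System.out.printf("%s adding=%s removing=%s underReassignment=%s%n",
                    tp, r.addingReplicas(), r.removingReplicas(), underReassignment);
            });
        }
    }
}
{code}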



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-10124) ConsumerPerformance output wrong rebalance.time.ms

2020-06-08 Thread jiamei xie (Jira)
jiamei xie created KAFKA-10124:
--

 Summary:  ConsumerPerformance output wrong rebalance.time.ms 
 Key: KAFKA-10124
 URL: https://issues.apache.org/jira/browse/KAFKA-10124
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer
Reporter: jiamei xie
Assignee: jiamei xie


When running the consumer performance benchmark, negative values are reported for fetch.time.ms, fetch.MB.sec, and fetch.nMsg.sec, which must be wrong.
bin/kafka-consumer-perf-test.sh --topic test1 --bootstrap-server localhost:9092 --messages 10
start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec, rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec
2020-06-07 05:08:52:393, 2020-06-07 05:09:46:815, 19073.6132, 350.4762, 2133, 367500.8820, 1591477733263, -1591477678841, -0., -0.0126
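
The magnitude of the values hints at the cause: rebalance.time.ms is printed as an absolute epoch timestamp rather than a duration, and fetch.time.ms is derived as the total run time minus the rebalance time. A small sketch of that arithmetic (this is an assumption about the cause, not the actual ConsumerPerformance code):
{code}
public class NegativeFetchTime {
    public static void main(String[] args) {
        // Values taken from the benchmark output above (all in milliseconds).
        long totalTimeMs = 54_422L;                         // end.time - start.time (~54.4 s)
        long reportedRebalanceTimeMs = 1_591_477_733_263L;  // looks like an epoch timestamp, not a duration

        // fetch time = total time - rebalance time; an absolute timestamp in the rebalance
        // column drives it negative, and fetch.MB.sec / fetch.nMsg.sec (which divide by it)
        // turn negative as well.
        long fetchTimeMs = totalTimeMs - reportedRebalanceTimeMs;
        System.out.println(fetchTimeMs); // -1591477678841, matching the output above
    }
}
{code}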



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-10123) Regression resetting offsets in consumer when fetching from old broker

2020-06-08 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-10123:
---

 Summary: Regression resetting offsets in consumer when fetching 
from old broker
 Key: KAFKA-10123
 URL: https://issues.apache.org/jira/browse/KAFKA-10123
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson
Assignee: David Arthur
 Fix For: 2.6.0


We saw this error in system tests:
{code}
java.lang.NullPointerException
    at org.apache.kafka.clients.consumer.internals.Fetcher.prepareFetchRequests(Fetcher.java:)
    at org.apache.kafka.clients.consumer.internals.Fetcher.sendFetches(Fetcher.java:246)
    at org.apache.kafka.clients.consumer.KafkaConsumer.pollForFetches(KafkaConsumer.java:1296)
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1248)
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1216)
    at kafka.tools.ConsoleConsumer$ConsumerWrapper.receive(ConsoleConsumer.scala:437)
    at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:103)
    at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:77)
    at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:54)
    at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
{code}

The logs show that the consumer was in the middle of an offset reset when 
this happened. We changed the logic in KAFKA-9724 to include the following 
check:
{code}
NodeApiVersions nodeApiVersions = apiVersions.get(leaderAndEpoch.leader.get().idString());
if (nodeApiVersions == null || hasUsableOffsetForLeaderEpochVersion(nodeApiVersions)) {
    return assignedState(tp).maybeValidatePosition(leaderAndEpoch);
} else {
    // If the broker does not support a newer version of OffsetsForLeaderEpoch, we skip validation
    completeValidation(tp);
    return false;
}
{code}

The problem seems to be the shortcut call to `completeValidation`, which 
executes the following logic:
{code}
if (hasPosition()) {
    transitionState(FetchStates.FETCHING, () -> this.nextRetryTimeMs = null);
}
{code}

We should be protected by the call to `hasPosition` here, but in the case of 
the `AWAIT_RESET` state, we are incorrectly returning true. This causes us to 
enter the `FETCHING` state without a position, which ultimately leads to the 
NPE.
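
As a toy illustration only (all names are hypothetical and this is not necessarily the shape of the eventual fix), the hazard is a `hasPosition` check that answers true while the partition is still awaiting a reset, letting `completeValidation` move it to FETCHING without a position:
{code}
public class FetchStateSketch {
    enum FetchState { INITIALIZING, FETCHING, AWAIT_RESET, AWAIT_VALIDATION }

    static FetchState state = FetchState.AWAIT_RESET;
    static Long position = null; // no position yet: the reset is still in flight

    // Buggy variant: answers true for AWAIT_RESET even though position == null.
    // A stricter check (e.g. position != null) would keep the shortcut safe.
    static boolean hasPosition() {
        return state != FetchState.INITIALIZING;
    }

    static void completeValidation() {
        if (hasPosition()) {
            state = FetchState.FETCHING; // entered FETCHING without a position -> later NPE
        }
    }

    public static void main(String[] args) {
        completeValidation();
        System.out.println("state=" + state + ", position=" + position);
    }
}
{code}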



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Build failed in Jenkins: kafka-trunk-jdk8 #4624

2020-06-08 Thread Apache Jenkins Server
See 


Changes:

[github] KAFKA-10005: Decouple RestoreListener from RestoreCallback (#8676)

[github] MINOR: Remove unused isSticky assert out from tests only do

[github] KAFKA-10102: update ProcessorTopology instead of rebuilding it (#8803)

[github] HOTFIX: fix validity check in sticky assignor tests (#8815)

[github] KAFKA-10063; UnsupportedOperation when querying cleaner metrics after

[github] MINOR: Fix fetch session epoch comment in `FetchRequest.json` (#8802)


--
[...truncated 3.13 MB...]
org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnAllStoresNames[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPassRecordHeadersIntoSerializersAndDeserializers[Eos enabled = false] 
STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPassRecordHeadersIntoSerializersAndDeserializers[Eos enabled = false] 
PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessConsumerRecordList[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessConsumerRecordList[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUseSinkSpecificSerializers[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUseSinkSpecificSerializers[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldFlushStoreForFirstInput[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldFlushStoreForFirstInput[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessFromSourceThatMatchPattern[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessFromSourceThatMatchPattern[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldCaptureSinkTopicNamesIfWrittenInto[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldCaptureSinkTopicNamesIfWrittenInto[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUpdateStoreForNewKey[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUpdateStoreForNewKey[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldSendRecordViaCorrectSourceTopicDeprecated[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldSendRecordViaCorrectSourceTopicDeprecated[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPunctuateOnWallClockTime[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPunctuateOnWallClockTime[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > shouldSetRecordMetadata[Eos 
enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > shouldSetRecordMetadata[Eos 
enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldNotUpdateStoreForLargerValue[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldNotUpdateStoreForLargerValue[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnCorrectInMemoryStoreTypeOnly[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnCorrectInMemoryStoreTypeOnly[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > shouldThrowForMissingTime[Eos 
enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > shouldThrowForMissingTime[Eos 
enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldCaptureInternalTopicNamesIfWrittenInto[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldCaptureInternalTopicNamesIfWrittenInto[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPunctuateOnWallClockTimeDeprecated[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPunctuateOnWallClockTimeDeprecated[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessRecordForTopic[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessRecordForTopic[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldForwardRecordsFromSubtopologyToSubtopology[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldForwardRecordsFromSubtopologyToSubtopology[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldNotUpdateStoreForSmallerValue[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldNotUpdateStoreForSmallerValue[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDri

[jira] [Created] (KAFKA-10122) Consumer should allow heartbeat during rebalance as well

2020-06-08 Thread Guozhang Wang (Jira)
Guozhang Wang created KAFKA-10122:
-

 Summary: Consumer should allow heartbeat during rebalance as well
 Key: KAFKA-10122
 URL: https://issues.apache.org/jira/browse/KAFKA-10122
 Project: Kafka
  Issue Type: Improvement
Reporter: Guozhang Wang
Assignee: Guozhang Wang


Today we disable heartbeats if {{state != MemberState.STABLE}}, and if a rebalance fails we set the state to UNJOINED. With the old API {{poll(long)}} this is okay since we always try to complete the rebalance successfully within the same call, so we would not stay in UNJOINED or REBALANCING for very long.

But with the new {{poll(Duration)}} we may actually return while we are still in UNJOINED or REBALANCING, and it may take some time (smaller than max.poll.interval.ms but larger than session.timeout.ms) before the next poll call. Since the heartbeat is disabled during this period, we could be kicked out of the group by the coordinator.

The proposal I have is:

1) allow the heartbeat to be sent during REBALANCING as well.
2) when a join/sync response has a retriable error, do not set the state to UNJOINED but stay in REBALANCING.
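
A toy sketch of the proposed gate change (names hypothetical; this is not the actual AbstractCoordinator code):
{code}
public class HeartbeatGateSketch {
    enum MemberState { UNJOINED, REBALANCING, STABLE }

    // Today: heartbeats are only sent while STABLE.
    static boolean shouldHeartbeatToday(MemberState state) {
        return state == MemberState.STABLE;
    }

    // Proposal: also heartbeat while REBALANCING, so a consumer that returned from
    // poll(Duration) mid-rebalance is not kicked out by the coordinator before
    // session.timeout.ms elapses.
    static boolean shouldHeartbeatProposed(MemberState state) {
        return state == MemberState.STABLE || state == MemberState.REBALANCING;
    }

    public static void main(String[] args) {
        for (MemberState s : MemberState.values()) {
            System.out.printf("%s: today=%s proposed=%s%n",
                s, shouldHeartbeatToday(s), shouldHeartbeatProposed(s));
        }
    }
}
{code}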



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Jenkins build is back to normal : kafka-trunk-jdk14 #199

2020-06-08 Thread Apache Jenkins Server
See 




Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations

2020-06-08 Thread Colin McCabe
On Mon, Jun 8, 2020, at 14:41, Jun Rao wrote:
> Hi, Colin,
> 
> Thanks for the comment. You brought up several points.
> 
> 1. Should we set up a per user quota? To me, it does seem we need some sort
> of a quota. When the controller runs out of resources, ideally, we only
> want to penalize the bad behaving applications, instead of every
> application. To do that, we will need to know what each application is
> entitled to and the per user quota is intended to capture that.
> 
> 2. How easy is it to configure a quota? The following is how an admin
> typically sets up a quota in our existing systems. Pick a generous default
> per-user quota that works for most applications. For the few resource-intensive
> applications, customize a higher quota for them. Reserve enough resources
> in anticipation that a single (or a few) application will exceed the quota
> at a given time.
>

Hi Jun,

Thanks for the response.

Maybe I was too pessimistic about the ability of admins to configure a useful 
quota here.  I do agree that it would be nice to have the ability to set 
different quotas for different users, as you mentioned.

> 
> 3. How should the quota be defined? In the discussion thread, we debated
> between a usage based model vs a rate based model. Dave and Anna argued for
> the rate based model mostly because it's simpler to implement.
> 

I'm trying to think more about how this integrates with our plans for KIP-500.  
When we get rid of ZK, we will have to handle this in the controller itself, 
rather than in the AdminManager.  That implies we'll have to rewrite the code.  
Maybe this is worth it if we want this feature now, though.

Another wrinkle here is that as we discussed in KIP-590, controller operations 
will land on a random broker first, and only then be forwarded to the active 
controller.  This implies that either admission control should happen on all 
brokers (needing some kind of distributed quota scheme), or be done on the 
controller after we've already done the work of forwarding the message.  The 
second approach might not be that bad, but it would be nice to figure this out.

>
> 4. If a quota is exceeded, how is that enforced? My understanding of the
> KIP is that, if a quota is exceeded, the broker immediately sends back
> a QUOTA_VIOLATED error and a throttle time back to the client, and the
> client will wait for the throttle time before issuing the next request.
> This seems to be the same as the BUSY error code you mentioned.
>

Yes, I agree, it sounds like we're thinking along the same lines.  However, 
rather than QUOTA_VIOLATED, how about naming the error code BUSY?  Then the 
error text could indicate the quota that we violated.  This would be more 
generally useful as an error code and also avoid being confusingly similar to 
POLICY_VIOLATION.

best,
Colin

> 
> I will let David chime in more on that.
> 
> Thanks,
> 
> Jun
> 
> 
> 
> On Sun, Jun 7, 2020 at 2:30 PM Colin McCabe  wrote:
> 
> > Hi David,
> >
> > Thanks for the KIP.
> >
> > I thought about this for a while and I actually think this approach is not
> > quite right.  The problem that I see here is that using an explicitly set
> > quota here requires careful tuning by the cluster operator.  Even worse,
> > this tuning might be invalidated by changes in overall conditions or even
> > more efficient controller software.
> >
> > For example, if we empirically find that the controller can do 1000 topics
> > in a minute (or whatever), this tuning might actually be wrong if the next
> > version of the software can do 2000 topics in a minute because of
> > efficiency upgrades.  Or, the broker that the controller is located on
> > might be experiencing heavy load from its non-controller operations, and so
> > it can only do 500 topics in a minute during this period.
> >
> > So the system administrator gets a very obscure tunable (it's not clear to
> > a non-Kafka-developer what "controller mutations" are or why they should
> > care).  And even worse, they will have to significantly "sandbag" the value
> > that they set it to, so that even under the heaviest load and oldest
> > deployed version of the software, the controller can still function.  Even
> > worse, this new quota adds a lot of complexity to the controller.
> >
> > What we really want is backpressure when the controller is overloaded.  I
> > believe this is the alternative you discuss in "Rejected Alternatives"
> > under "Throttle the Execution instead of the Admission"  Your reason for
> > rejecting it is that the client error handling does not work well in this
> > case.  But actually, this is an artifact of our current implementation,
> > rather than a fundamental issue with backpressure.
> >
> > Consider the example of a CreateTopicsRequest.  The controller could
> > return a special error code if the load was too high, and take the create
> > topics event off the controller queue.  Let's call that error code BUSY.
> >  Additionally, the controller could immediate

[jira] [Resolved] (KAFKA-10063) UnsupportedOperation when querying cleaner metrics after shutdown

2020-06-08 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-10063.
-
Fix Version/s: 2.6.0
   Resolution: Fixed

> UnsupportedOperation when querying cleaner metrics after shutdown
> -
>
> Key: KAFKA-10063
> URL: https://issues.apache.org/jira/browse/KAFKA-10063
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Chia-Ping Tsai
>Priority: Major
> Fix For: 2.6.0
>
>
> We have a few log cleaner metrics which iterate the set of cleaners. For 
> example:
> {code}
>   newGauge("max-clean-time-secs", () => cleaners.iterator.map(_.lastStats.elapsedSecs).max.toInt)
> {code}
> It seems possible currently for LogCleaner metrics to get queried after 
> shutdown of the log cleaner, which clears the `cleaners` collection. This can 
> lead to the following error:
> {code}
> java.lang.UnsupportedOperationException: empty.max
>   at scala.collection.IterableOnceOps.max(IterableOnce.scala:952)
>   at scala.collection.IterableOnceOps.max$(IterableOnce.scala:950)
>   at scala.collection.AbstractIterator.max(Iterator.scala:1279)
>   at kafka.log.LogCleaner.kafka$log$LogCleaner$$$anonfun$new$9(LogCleaner.scala:132)
>   at kafka.log.LogCleaner$$anonfun$4.value(LogCleaner.scala:132)
>   at kafka.log.LogCleaner$$anonfun$4.value(LogCleaner.scala:132)
> {code}
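
A Java analogue of the defensive pattern (the actual gauge above is Scala, and the names here are hypothetical): fall back to a default when the collection can be empty instead of calling max on it unconditionally:
{code}
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

public class SafeMaxGauge {
    public static void main(String[] args) {
        // After shutdown the cleaners collection is cleared, so the gauge sees an empty list.
        List<Long> elapsedSecs = Collections.emptyList();

        // Guarded gauge: an empty collection yields 0 instead of throwing.
        Supplier<Integer> maxCleanTimeSecs = () ->
            (int) elapsedSecs.stream().mapToLong(Long::longValue).max().orElse(0L);

        System.out.println(maxCleanTimeSecs.get()); // 0, rather than an "empty.max" exception
    }
}
{code}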



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Jenkins build is back to normal : kafka-trunk-jdk11 #1551

2020-06-08 Thread Apache Jenkins Server
See 




[jira] [Resolved] (KAFKA-7608) A Kafka Streams DSL transform or process call should potentially trigger a repartition

2020-06-08 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-7608.

Resolution: Fixed

KIP-221 is merged. Closing this ticket.

> A Kafka Streams DSL transform or process call should potentially trigger a 
> repartition
> --
>
> Key: KAFKA-7608
> URL: https://issues.apache.org/jira/browse/KAFKA-7608
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 2.0.0
>Reporter: Andy Bryant
>Priority: Major
>
> Currently in Kafka Streams, if any DSL operation occurs that may modify the 
> keys of the record stream, the stream is flagged for repartitioning. 
> Currently this flag is checked prior to a stream join or an aggregation and 
> if set the stream is piped through a transient repartition topic. This 
> ensures messages with the same key are always co-located in the same 
> partition and hence same stream task and state store.
> The same mechanism should be used to trigger repartitioning prior to stream 
> {{transform}}, {{transformValues}} and {{process}} calls that specify one or 
> more state stores.
> Currently, without the forced repartitioning, for streams where the key has
> been modified there is no guarantee that the same keys will be processed by the
> same task, which is what you would expect when using a state store. Given that
> aggregations and joins already make this guarantee automatically, it seems
> inconsistent that {{transform}} and {{process}} do not provide the same
> guarantees.
> To achieve the same guarantees today, developers must manually pipe the
> stream through a topic to force the repartitioning. This works, but is
> sub-optimal since you don't get the handy optimisation where the repartition
> topic's contents are purged after use.
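
Since this was resolved via KIP-221, here is a minimal sketch (topic, store, and application names are hypothetical) of inserting an explicit repartition() ahead of a stateful transformValues() so that records sharing the new key land on the same task and state store:
{code}
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.ValueTransformerWithKey;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.Stores;

public class RepartitionBeforeTransform {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        builder.addStateStore(Stores.keyValueStoreBuilder(
            Stores.persistentKeyValueStore("counts"), Serdes.String(), Serdes.Long()));

        KStream<String, String> input = builder.stream("input-topic");

        input.selectKey((key, value) -> value)  // key change flags the stream for repartitioning
             .repartition()                     // explicit repartition before the stateful call
             .transformValues(() -> new ValueTransformerWithKey<String, String, Long>() {
                 private KeyValueStore<String, Long> counts;

                 @SuppressWarnings("unchecked")
                 @Override
                 public void init(ProcessorContext context) {
                     counts = (KeyValueStore<String, Long>) context.getStateStore("counts");
                 }

                 @Override
                 public Long transform(String key, String value) {
                     Long current = counts.get(key);
                     long updated = (current == null ? 0L : current) + 1L;
                     counts.put(key, updated);
                     return updated;
                 }

                 @Override
                 public void close() { }
             }, "counts")
             .to("output-topic", Produced.with(Serdes.String(), Serdes.Long()));

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "repartition-before-transform-demo");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        new KafkaStreams(builder.build(), props).start();
    }
}
{code}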



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations

2020-06-08 Thread Jun Rao
Hi, Colin,

Thanks for the comment. You brought up several points.

1. Should we set up a per user quota? To me, it does seem we need some sort
of a quota. When the controller runs out of resources, ideally, we only
want to penalize the bad behaving applications, instead of every
application. To do that, we will need to know what each application is
entitled to and the per user quota is intended to capture that.

2. How easy is it to configure a quota? The following is how an admin
typically sets up a quota in our existing systems. Pick a generous default
per-user quota that works for most applications. For the few resource-intensive
applications, customize a higher quota for them. Reserve enough resources
in anticipation that a single (or a few) application will exceed the quota
at a given time.

3. How should the quota be defined? In the discussion thread, we debated
between a usage based model vs a rate based model. Dave and Anna argued for
the rate based model mostly because it's simpler to implement.

4. If a quota is exceeded, how is that enforced? My understanding of the
KIP is that, if a quota is exceeded, the broker immediately sends back
a QUOTA_VIOLATED error and a throttle time back to the client, and the
client will wait for the throttle time before issuing the next request.
This seems to be the same as the BUSY error code you mentioned.

I will let David chime in more on that.

Thanks,

Jun



On Sun, Jun 7, 2020 at 2:30 PM Colin McCabe  wrote:

> Hi David,
>
> Thanks for the KIP.
>
> I thought about this for a while and I actually think this approach is not
> quite right.  The problem that I see here is that using an explicitly set
> quota here requires careful tuning by the cluster operator.  Even worse,
> this tuning might be invalidated by changes in overall conditions or even
> more efficient controller software.
>
> For example, if we empirically find that the controller can do 1000 topics
> in a minute (or whatever), this tuning might actually be wrong if the next
> version of the software can do 2000 topics in a minute because of
> efficiency upgrades.  Or, the broker that the controller is located on
> might be experiencing heavy load from its non-controller operations, and so
> it can only do 500 topics in a minute during this period.
>
> So the system administrator gets a very obscure tunable (it's not clear to
> a non-Kafka-developer what "controller mutations" are or why they should
> care).  And even worse, they will have to significantly "sandbag" the value
> that they set it to, so that even under the heaviest load and oldest
> deployed version of the software, the controller can still function.  Even
> worse, this new quota adds a lot of complexity to the controller.
>
> What we really want is backpressure when the controller is overloaded.  I
> believe this is the alternative you discuss in "Rejected Alternatives"
> under "Throttle the Execution instead of the Admission"  Your reason for
> rejecting it is that the client error handling does not work well in this
> case.  But actually, this is an artifact of our current implementation,
> rather than a fundamental issue with backpressure.
>
> Consider the example of a CreateTopicsRequest.  The controller could
> return a special error code if the load was too high, and take the create
> topics event off the controller queue.  Let's call that error code BUSY.
>  Additionally, the controller could immediately refuse new events if the
> queue had reached its maximum length, and simply return BUSY for that case
> as well.
>
> Basically, the way we handle RPC timeouts in the controller right now is
> not very good.  As you know, we time out the RPC, so the client gets
> TimeoutException, but then keep the event on the queue, so that it
> eventually gets executed!  There's no reason why we have to do that.  We
> could take the event off the queue if there is a timeout.  This would
> reduce load and mostly avoid the paradoxical situations you describe
> (getting TopicExistsException for a CreateTopicsRequest retry, etc.)
>
> I say "mostly" because there are still cases where retries could go astray
> (for example if we execute the topic creation but a network problem
> prevents the response from being sent to the client).  But this would still
> be a very big improvement over what we have now.
>
> Sorry for commenting so late on this but I got distracted by some other
> things.  I hope we can figure this one out-- I feel like there is a chance
> to significantly simplify this.
>
> best,
> Colin
>
>
> On Fri, May 29, 2020, at 07:57, David Jacot wrote:
> > Hi folks,
> >
> > I'd like to start the vote for KIP-599 which proposes a new quota to
> > throttle create topic, create partition, and delete topics operations to
> > protect the Kafka controller:
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-599%3A+Throttle+Create+Topic%2C+Create+Partition+and+Delete+Topic+Operations
> >
> > Please, let me know what you think.
> >
> > Chee

[jira] [Resolved] (KAFKA-10005) Decouple RestoreListener from RestoreCallback and not enable bulk loading for RocksDB

2020-06-08 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-10005.
---
Fix Version/s: 2.6.0
 Assignee: Guozhang Wang
   Resolution: Fixed

> Decouple RestoreListener from RestoreCallback and not enable bulk loading for 
> RocksDB
> -
>
> Key: KAFKA-10005
> URL: https://issues.apache.org/jira/browse/KAFKA-10005
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
>Priority: Major
> Fix For: 2.6.0
>
>
> In Kafka Streams we have two restoration callbacks:
> * RestoreCallback (BatchingRestoreCallback): specified per-store via 
> registration to specify the logic of applying a batch of records read from 
> the changelog to the store. Used for both updating standby tasks and 
> restoring active tasks.
> * RestoreListener: specified per-instance via `setRestoreListener`, to 
> specify the logic for `onRestoreStart / onRestoreEnd / onBatchRestored`.
> As we can see, these two callbacks serve quite different purposes; however,
> today we allow users to register a per-store RestoreCallback that also
> implements the RestoreListener. This odd mixing is actually motivated by
> Streams' internal usage to enable / disable bulk loading inside RocksDB. For
> users, however, it is less meaningful to make a callback also be a listener,
> since `onRestoreStart / End` has the storeName passed in, so users
> can just define different listening logic for different stores if needed.
> On the other hand, this mixing of the two callbacks forces Streams to check
> internally whether the passed-in per-store callback also implements the listener,
> and if so trigger its calls, which increases the complexity. Besides,
> toggling RocksDB bulk loading requires us to open / close / reopen /
> re-close the store 4 times during restoration, which could also be costly.
> Given that we have KIP-441 in place, I think we should consider different 
> ways other than toggle bulk loading during restoration for Streams (e.g. 
> using different threads for restoration).
> The proposal for this ticket is to completely decouple the listener from 
> callback -- i.e. we would not presume users passing in a callback function 
> that implements both RestoreCallback and RestoreListener, and also for 
> RocksDB we replace the bulk loading mechanism with other ways of 
> optimization: https://rockset.com/blog/optimizing-bulk-load-in-rocksdb/
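
To make the distinction concrete, a minimal sketch (topic and application names hypothetical) of the per-instance listener registered on its own, independent of any per-store restore callback:
{code}
import java.util.Properties;

import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.processor.StateRestoreListener;

public class RestoreListenerExample {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        builder.table("input-topic"); // any stateful topology gives the listener something to report

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "restore-listener-demo");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        KafkaStreams streams = new KafkaStreams(builder.build(), props);

        // Per-instance, purely observational listener; the storeName argument lets users
        // branch per store without mixing the listener into the per-store RestoreCallback.
        streams.setGlobalStateRestoreListener(new StateRestoreListener() {
            @Override
            public void onRestoreStart(TopicPartition tp, String store, long start, long end) {
                System.out.printf("Restore of %s (%s) starting: offsets %d..%d%n", store, tp, start, end);
            }

            @Override
            public void onBatchRestored(TopicPartition tp, String store, long batchEndOffset, long numRestored) {
                System.out.printf("Restored %d records of %s up to offset %d%n", numRestored, store, batchEndOffset);
            }

            @Override
            public void onRestoreEnd(TopicPartition tp, String store, long totalRestored) {
                System.out.printf("Restore of %s (%s) done: %d records restored%n", store, tp, totalRestored);
            }
        });

        streams.start();
    }
}
{code}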



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-8803) Stream will not start due to TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId

2020-06-08 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-8803.
--
Resolution: Fixed

> Stream will not start due to TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> 
>
> Key: KAFKA-8803
> URL: https://issues.apache.org/jira/browse/KAFKA-8803
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Raman Gupta
>Assignee: Guozhang Wang
>Priority: Major
> Fix For: 2.3.2, 2.4.2, 2.5.0
>
> Attachments: logs-20200311.txt.gz, logs-client-20200311.txt.gz, 
> logs.txt.gz, screenshot-1.png
>
>
> One streams app is consistently failing at startup with the following 
> exception:
> {code}
> 2019-08-14 17:02:29,568 ERROR --- [2ce1b-StreamThread-2] 
> org.apa.kaf.str.pro.int.StreamTask: task [0_36] Timeout 
> exception caught when initializing transactions for task 0_36. This might 
> happen if the broker is slow to respond, if the network connection to the 
> broker was interrupted, or if similar circumstances arise. You can increase 
> producer parameter `max.block.ms` to increase this timeout.
> org.apache.kafka.common.errors.TimeoutException: Timeout expired after 
> 60000milliseconds while awaiting InitProducerId
> {code}
> These same brokers are used by many other streams without any issue, 
> including some in the very same processes for the stream which consistently 
> throws this exception.
> *UPDATE 08/16:*
> The very first instance of this error is August 13th 2019, 17:03:36.754 and 
> it happened for 4 different streams. For 3 of these streams, the error only 
> happened once, and then the stream recovered. For the 4th stream, the error 
> has continued to happen, and continues to happen now.
> I looked up the broker logs for this time, and see that at August 13th 2019, 
> 16:47:43, two of four brokers started reporting messages like this, for 
> multiple partitions:
> [2019-08-13 20:47:43,658] INFO [ReplicaFetcher replicaId=3, leaderId=1, 
> fetcherId=0] Retrying leaderEpoch request for partition xxx-1 as the leader 
> reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
> The UNKNOWN_LEADER_EPOCH messages continued for some time, and then stopped, 
> here is a view of the count of these messages over time:
>  !screenshot-1.png! 
> However, as noted, the stream task timeout error continues to happen.
> I use the static consumer group protocol with Kafka 2.3.0 clients and 2.3.0 
> broker. The broker has a patch for KAFKA-8773.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[VOTE] KIP-590: Redirect Zookeeper Mutation Protocols to The Controller

2020-06-08 Thread Boyang Chen
Hey all,

I would like to start the vote for KIP-590:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-590%3A+Redirect+Zookeeper+Mutation+Protocols+to+The+Controller

Thanks!


Build failed in Jenkins: kafka-trunk-jdk14 #198

2020-06-08 Thread Apache Jenkins Server
See 


Changes:

[github] MINOR: fix HTML markup (#8823)


--
[...truncated 6.31 MB...]
org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessFromSourcesThatMatchMultiplePattern[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > shouldPopulateGlobalStore[Eos 
enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > shouldPopulateGlobalStore[Eos 
enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldThrowIfPersistentBuiltInStoreIsAccessedWithUntypedMethod[Eos enabled = 
false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldThrowIfPersistentBuiltInStoreIsAccessedWithUntypedMethod[Eos enabled = 
false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldAllowPrePopulatingStatesStoresWithCachingEnabled[Eos enabled = false] 
STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldAllowPrePopulatingStatesStoresWithCachingEnabled[Eos enabled = false] 
PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnCorrectPersistentStoreTypeOnly[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnCorrectPersistentStoreTypeOnly[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > shouldRespectTaskIdling[Eos 
enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > shouldRespectTaskIdling[Eos 
enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUseSourceSpecificDeserializers[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUseSourceSpecificDeserializers[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > shouldReturnAllStores[Eos 
enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > shouldReturnAllStores[Eos 
enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldNotCreateStateDirectoryForStatelessTopology[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldNotCreateStateDirectoryForStatelessTopology[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldApplyGlobalUpdatesCorrectlyInRecursiveTopologies[Eos enabled = false] 
STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldApplyGlobalUpdatesCorrectlyInRecursiveTopologies[Eos enabled = false] 
PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnAllStoresNames[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnAllStoresNames[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPassRecordHeadersIntoSerializersAndDeserializers[Eos enabled = false] 
STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPassRecordHeadersIntoSerializersAndDeserializers[Eos enabled = false] 
PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessConsumerRecordList[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessConsumerRecordList[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUseSinkSpecificSerializers[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUseSinkSpecificSerializers[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldFlushStoreForFirstInput[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldFlushStoreForFirstInput[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessFromSourceThatMatchPattern[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessFromSourceThatMatchPattern[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldCaptureSinkTopicNamesIfWrittenInto[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldCaptureSinkTopicNamesIfWrittenInto[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUpdateStoreForNewKey[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUpdateStoreForNewKey[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldSendRecordViaCorrectSourceTopicDeprecated[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldSendRecordViaCorrectSourceTopicDeprecated[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPunctuateOnWallClockTime[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPunctuateOnWallClockTime[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > shouldSetRecordMetadata[Eos 
enabled = false] STARTED

org.apach

Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations

2020-06-08 Thread Jun Rao
Hi, David,

Thanks for the updated KIP. Another minor comment below.

40. For the new `QUOTA_VIOLATED` error in the response to
CreateTopics/CreatePartitions/DeleteTopics, could you clarify
whether ThrottleTimeMs is set when the error code is set to QUOTA_VIOLATED?

Jun

On Mon, Jun 8, 2020 at 9:32 AM David Jacot  wrote:

> Hi Jun,
>
> 30. The rate is accumulated at the partition level. Let me clarify this in
> the KIP.
>
> Best,
> David
>
> On Sat, Jun 6, 2020 at 2:37 AM Anna Povzner  wrote:
>
> > Hi David,
> >
> > The KIP looks good to me. I am going to the voting thread...
> >
> > Hi Jun,
> >
> > Yes, exactly. That's a separate thing from this KIP, so working on the
> fix.
> >
> > Thanks,
> > Anna
> >
> > On Fri, Jun 5, 2020 at 4:36 PM Jun Rao  wrote:
> >
> > > Hi, Anna,
> > >
> > > Thanks for the comment. For the problem that you described, perhaps we
> > need
> > > to make the quota checking and recording more atomic?
> > >
> > > Hi, David,
> > >
> > > Thanks for the updated KIP.  Looks good to me now. Just one minor
> comment
> > > below.
> > >
> > > 30. controller_mutations_rate: For topic creation and deletion, is the
> > rate
> > > accumulated at the topic or partition level? It would be useful to make
> > it
> > > clear in the wiki.
> > >
> > > Jun
> > >
> > > On Fri, Jun 5, 2020 at 7:23 AM David Jacot 
> wrote:
> > >
> > > > Hi Anna and Jun,
> > > >
> > > > You are right. We should allocate up to the quota for each old
> sample.
> > > >
> > > > I have revamped the Throttling Algorithm section to better explain
> our
> > > > thought process and the token bucket inspiration.
> > > >
> > > > I have also added a chapter with few guidelines about how to define
> > > > the quota. There is no magic formula for this but I give few
> insights.
> > > > I don't have specific numbers that can be used out of the box so I
> > > > think that it is better to not put any for the time being. We can
> > always
> > > > complement later on in the documentation.
> > > >
> > > > Please, take a look and let me know what you think.
> > > >
> > > > Cheers,
> > > > David
> > > >
> > > > On Fri, Jun 5, 2020 at 8:37 AM Anna Povzner 
> wrote:
> > > >
> > > > > Hi David and Jun,
> > > > >
> > > > > I dug a bit deeper into the Rate implementation, and wanted to
> > confirm
> > > > that
> > > > > I do believe that the token bucket behavior is better for the
> reasons
> > > we
> > > > > already discussed but wanted to summarize. The main difference
> > between
> > > > Rate
> > > > > and token bucket is that the Rate implementation allows a burst by
> > > > > borrowing from the future, whereas a token bucket allows a burst by
> > > using
> > > > > accumulated tokens from the previous idle period. Using accumulated
> > > > tokens
> > > > > smoothes out the rate measurement in general. Configuring a large
> > burst
> > > > > requires configuring a large quota window, which causes long delays
> > for
> > > > > bursty workload, due to borrowing credits from the future. Perhaps
> it
> > > is
> > > > > useful to add a summary in the beginning of the Throttling
> Algorithm
> > > > > section?
> > > > >
> > > > > In my previous email, I mentioned the issue we observed with the
> > > > bandwidth
> > > > > quota, where a low quota (1MB/s per broker) was limiting bandwidth
> > > > visibly
> > > > > below the quota. I thought it was strictly the issue with the Rate
> > > > > implementation as well, but I found a root cause to be different
> but
> > > > > amplified by the Rate implementation (long throttle delays of
> > requests
> > > > in a
> > > > > burst). I will describe it here for completeness using the
> following
> > > > > example:
> > > > >
> > > > >-
> > > > >
> > > > >Quota = 1MB/s, default window size and number of samples
> > > > >-
> > > > >
> > > > >Suppose there are 6 connections (maximum 6 outstanding
> requests),
> > > and
> > > > >each produce request is 5MB. If all requests arrive in a burst,
> > the
> > > > > last 4
> > > > >requests (20MB over 10MB allowed in a window) may get the same
> > > > throttle
> > > > >time if they are processed concurrently. We record the rate
> under
> > > the
> > > > > lock,
> > > > >but then calculate throttle time separately after that. So, for
> > each
> > > > >request, the observed rate could be 3MB/s, and each request gets
> > > > > throttle
> > > > >delay = 20 seconds (instead of 5, 10, 15, 20 respectively). The
> > > delay
> > > > is
> > > > >longer than the total rate window, which results in lower
> > bandwidth
> > > > than
> > > > >the quota. Since all requests got the same delay, they will also
> > > > arrive
> > > > > in
> > > > >a burst, which may also result in longer delay than necessary.
> It
> > > > looks
> > > > >pretty easy to fix, so I will open a separate JIRA for it. This
> > can
> > > be
> > > > >additionally mitigated by token bucket behavior.
> > > > >
> > > > >
> > > > > For the algorithm "So instead of havin

[jira] [Resolved] (KAFKA-10106) log time taken to handle LeaderAndIsr request

2020-06-08 Thread Jun Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao resolved KAFKA-10106.
-
Fix Version/s: 2.7.0
 Assignee: NIKHIL
   Resolution: Fixed

Merged the PR to trunk.

> log time taken to handle LeaderAndIsr request 
> --
>
> Key: KAFKA-10106
> URL: https://issues.apache.org/jira/browse/KAFKA-10106
> Project: Kafka
>  Issue Type: Improvement
>Reporter: NIKHIL
>Assignee: NIKHIL
>Priority: Minor
> Fix For: 2.7.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> ReplicaManager#becomeLeaderOrFollower handles the LeaderAndIsr request, and
> StateChangeLogger logs when this request is handled; however, it can be useful
> to log when this call ends and record the time taken, which can help
> operationally.
> The proposal is for ReplicaManager#becomeLeaderOrFollower to start measuring the time
> before the `replicaStateChangeLock` is acquired and to log the elapsed time before
> the response is returned.
>  
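
A generic sketch of the proposed measurement (hypothetical stand-ins, not the actual ReplicaManager code): capture a timestamp before acquiring the lock and log the elapsed time just before returning:
{code}
public class TimedHandlerSketch {
    private static final Object replicaStateChangeLock = new Object(); // stand-in for the real lock

    static void becomeLeaderOrFollower() throws InterruptedException {
        long startNs = System.nanoTime();            // measure before the lock is acquired
        synchronized (replicaStateChangeLock) {
            Thread.sleep(25);                        // stand-in for handling the LeaderAndIsr request
        }
        long elapsedMs = (System.nanoTime() - startNs) / 1_000_000;
        System.out.println("Handled LeaderAndIsr request in " + elapsedMs + " ms"); // log before responding
    }

    public static void main(String[] args) throws InterruptedException {
        becomeLeaderOrFollower();
    }
}
{code}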



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-10121) Streams Task Assignment optimization design

2020-06-08 Thread John Roesler (Jira)
John Roesler created KAFKA-10121:


 Summary: Streams Task Assignment optimization design
 Key: KAFKA-10121
 URL: https://issues.apache.org/jira/browse/KAFKA-10121
 Project: Kafka
  Issue Type: Task
  Components: streams
Affects Versions: 2.6.0
Reporter: John Roesler


Beginning in Kafka 2.6.0, Streams has a new task assignment algorithm that 
reacts to cluster membership changes by starting out 100% sticky and warming up 
tasks in the background to eventually migrate to a 100% balanced assignment. 
See KIP-441 for the details.

However, in computing the final, 100% balanced, assignment, the assignor 
doesn't take into account the current ownership of the tasks. Thus, when 
instances are added or removed, the assignor is likely to migrate large numbers 
of tasks. This is mitigated by the fact that the migrations happen at a trickle 
over time in the background, but it's still better to avoid unnecessary 
migrations if possible. See the example below for details.

The solution seems to be to use some kind of optimization algorithm to find a 
100% balanced assignment that also has maximum overlap with the current 
assignment.

 

Example, with additional detail:

The main focus of the KIP-441 work was the migration mechanism that allows 
Streams to warm up state for new instances in the background while continuing 
to process tasks on the instances that previously owned them. Accordingly the 
assignment algorithm itself focuses on simplicity and guaranteed balance, not 
optimality.

There are three kinds of balance that all have to be met for Streams to be 100% 
balanced:
 # Active task balance: no member should have more active processing workload 
than any other
 # Stateful task balance: no member should have more stateful tasks (either 
active and stateful or standby) than any other
 # Task parallel balance: no member should have more tasks (partitions) for a 
single subtopology than another

(Note: in all these cases, an instance may actually have one more task than 
another, if the number of members doesn't evenly divide the number of tasks. 
For a simple case, consider if you have two members and only one task. It can 
only be assigned to one of the members, and the assignment is still as balanced 
as it could be.)

The current algorithm ensures all three kinds of balance thusly:
 # sort all members by name (to ensure assignment stability)
 # sort all tasks by subtopology first, then by partition. E.g., sorted like 
this: 0_0, 0_1, 0_2, 1_0, 1_1
 # for all tasks that are stateful, iterate over both tasks and members in 
sorted order, assigning each task t[i] to the member m[i % num_members]
 # for each standby replica we need to assign, continue looping over the sorted 
members, assigning each replica to the next member (assuming the member doesn't 
already have a replica of the task)
 # for each stateless task, assign an active replica to the member with the 
least number of tasks. Since the active assignment of the member with the least 
number of tasks should have at most 1 task less than any other member after 
step 3, the assignment after step 5 is still balanced.

To demonstrate how a more sophisticated algorithm could minimize migrations, 
consider the following simple assignment with two instances and six tasks:

m1: [0_0, 0_2, 0_4]

m2: [0_1, 0_3, 0_5]

Adding a new member causes four of the tasks to migrate:

m1: [0_0, 0_3]

m2: [0_1, 0_4]

m3: [0_2, 0_5]

However, the following assignment is equally balanced, and only two of the 
tasks need to migrate:

m1: [0_0, 0_2]

m2: [0_1, 0_3]

m3: [0_4, 0_5]
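
A toy sketch (not the actual KIP-441 assignor, and ignoring standbys, statefulness, and per-subtopology balance) of a "sticky first, then balance" assignment that reproduces the example above: only two tasks migrate when m3 joins:
{code}
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class StickyBalancedSketch {

    // Keep tasks on their previous owners up to a fair capacity, then round-robin the rest.
    static Map<String, List<String>> assign(Map<String, List<String>> current,
                                            List<String> members,
                                            List<String> tasks) {
        int capacity = (tasks.size() + members.size() - 1) / members.size(); // ceil(tasks / members)
        Map<String, List<String>> result = new LinkedHashMap<>();
        members.forEach(m -> result.put(m, new ArrayList<>()));
        Set<String> unassigned = new LinkedHashSet<>(tasks);

        // Pass 1: sticky placement, bounded by each member's fair capacity.
        current.forEach((member, owned) -> {
            if (!result.containsKey(member)) {
                return; // previous owner left the group
            }
            for (String task : owned) {
                if (unassigned.contains(task) && result.get(member).size() < capacity) {
                    result.get(member).add(task);
                    unassigned.remove(task);
                }
            }
        });

        // Pass 2: place the remaining tasks on the currently least-loaded member.
        for (String task : unassigned) {
            String target = result.entrySet().stream()
                .min((a, b) -> Integer.compare(a.getValue().size(), b.getValue().size()))
                .get()
                .getKey();
            result.get(target).add(task);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> current = new TreeMap<>();
        current.put("m1", List.of("0_0", "0_2", "0_4"));
        current.put("m2", List.of("0_1", "0_3", "0_5"));
        List<String> tasks = List.of("0_0", "0_1", "0_2", "0_3", "0_4", "0_5");

        // Prints {m1=[0_0, 0_2], m2=[0_1, 0_3], m3=[0_4, 0_5]}: only two tasks migrate.
        System.out.println(assign(current, List.of("m1", "m2", "m3"), tasks));
    }
}
{code}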



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Build failed in Jenkins: kafka-trunk-jdk11 #1550

2020-06-08 Thread Apache Jenkins Server
See 


Changes:

[github] MINOR: fix HTML markup (#8823)


--
[...truncated 1.50 MB...]

kafka.server.RequestQuotaTest > testResponseThrottleTime STARTED

kafka.server.RequestQuotaTest > testResponseThrottleTime PASSED

kafka.server.RequestQuotaTest > 
testResponseThrottleTimeWhenBothProduceAndRequestQuotasViolated STARTED

kafka.server.RequestQuotaTest > 
testResponseThrottleTimeWhenBothProduceAndRequestQuotasViolated PASSED

kafka.server.ReplicaAlterLogDirsThreadTest > 
shouldFetchLeaderEpochOnFirstFetchOnly STARTED

kafka.server.ReplicaAlterLogDirsThreadTest > 
shouldFetchLeaderEpochOnFirstFetchOnly PASSED

kafka.server.ReplicaAlterLogDirsThreadTest > issuesEpochRequestFromLocalReplica 
STARTED

kafka.server.ReplicaAlterLogDirsThreadTest > issuesEpochRequestFromLocalReplica 
PASSED

kafka.server.ReplicaAlterLogDirsThreadTest > 
shouldTruncateToInitialFetchOffsetIfReplicaReturnsUndefinedOffset STARTED

kafka.server.ReplicaAlterLogDirsThreadTest > 
shouldTruncateToInitialFetchOffsetIfReplicaReturnsUndefinedOffset PASSED

kafka.server.ReplicaAlterLogDirsThreadTest > shouldTruncateToReplicaOffset 
STARTED

kafka.server.ReplicaAlterLogDirsThreadTest > shouldTruncateToReplicaOffset 
PASSED

kafka.server.ReplicaAlterLogDirsThreadTest > shouldFetchOneReplicaAtATime 
STARTED

kafka.server.ReplicaAlterLogDirsThreadTest > shouldFetchOneReplicaAtATime PASSED

kafka.server.ReplicaAlterLogDirsThreadTest > 
shouldPollIndefinitelyIfReplicaNotAvailable STARTED

kafka.server.ReplicaAlterLogDirsThreadTest > 
shouldPollIndefinitelyIfReplicaNotAvailable PASSED

kafka.server.ReplicaAlterLogDirsThreadTest > 
shouldReplaceCurrentLogDirWhenCaughtUp STARTED

kafka.server.ReplicaAlterLogDirsThreadTest > 
shouldReplaceCurrentLogDirWhenCaughtUp PASSED

kafka.server.ReplicaAlterLogDirsThreadTest > 
shouldTruncateToEndOffsetOfLargestCommonEpoch STARTED

kafka.server.ReplicaAlterLogDirsThreadTest > 
shouldTruncateToEndOffsetOfLargestCommonEpoch PASSED

kafka.server.ReplicaAlterLogDirsThreadTest > 
shouldNotAddPartitionIfFutureLogIsNotDefined STARTED

kafka.server.ReplicaAlterLogDirsThreadTest > 
shouldNotAddPartitionIfFutureLogIsNotDefined PASSED

kafka.server.ReplicaAlterLogDirsThreadTest > 
fetchEpochsFromLeaderShouldHandleExceptionFromGetLocalReplica STARTED

kafka.server.ReplicaAlterLogDirsThreadTest > 
fetchEpochsFromLeaderShouldHandleExceptionFromGetLocalReplica PASSED

kafka.server.ReplicaAlterLogDirsThreadTest > 
shouldUpdateLeaderEpochAfterFencedEpochError STARTED

kafka.server.ReplicaAlterLogDirsThreadTest > 
shouldUpdateLeaderEpochAfterFencedEpochError PASSED

kafka.server.ReplicaAlterLogDirsThreadTest > 
shouldFetchNonDelayedAndNonTruncatingReplicas STARTED

kafka.server.ReplicaAlterLogDirsThreadTest > 
shouldFetchNonDelayedAndNonTruncatingReplicas PASSED

kafka.server.AdvertiseBrokerTest > testBrokerAdvertiseHostNameAndPortToZK 
STARTED

kafka.server.AdvertiseBrokerTest > testBrokerAdvertiseHostNameAndPortToZK PASSED

kafka.server.ClientQuotasRequestTest > testAlterClientQuotasBadClientId STARTED

kafka.server.ClientQuotasRequestTest > testAlterClientQuotasBadClientId PASSED

kafka.server.ClientQuotasRequestTest > testAlterClientQuotasBadConfigKey STARTED

kafka.server.ClientQuotasRequestTest > testAlterClientQuotasBadConfigKey PASSED

kafka.server.ClientQuotasRequestTest > testDescribeClientQuotasMatchPartial 
STARTED

kafka.server.ClientQuotasRequestTest > testDescribeClientQuotasMatchPartial 
PASSED

kafka.server.ClientQuotasRequestTest > testAlterClientQuotasRequestValidateOnly 
STARTED

kafka.server.DynamicBrokerReconfigurationTest > testAddRemoveSaslListeners 
PASSED

kafka.server.DynamicBrokerReconfigurationTest > testTrustStoreAlter STARTED

kafka.server.ClientQuotasRequestTest > testAlterClientQuotasRequestValidateOnly 
PASSED

kafka.server.ClientQuotasRequestTest > testAlterClientQuotasBadUser STARTED

kafka.server.ClientQuotasRequestTest > testAlterClientQuotasBadUser PASSED

kafka.server.ClientQuotasRequestTest > testAlterClientQuotasEmptyEntity STARTED

kafka.server.ClientQuotasRequestTest > testAlterClientQuotasEmptyEntity PASSED

kafka.server.ClientQuotasRequestTest > testClientQuotasSanitized STARTED

kafka.server.ClientQuotasRequestTest > testClientQuotasSanitized PASSED

kafka.server.ClientQuotasRequestTest > testAlterClientQuotasBadConfigValue 
STARTED

kafka.server.ClientQuotasRequestTest > testAlterClientQuotasBadConfigValue 
PASSED

kafka.server.ClientQuotasRequestTest > testClientQuotasUnsupportedEntityTypes 
STARTED

kafka.server.DynamicBrokerReconfigurationTest > testTrustStoreAlter PASSED

kafka.server.DynamicBrokerReconfigurationTest > 
testConfigDescribeUsingAdminClient STARTED

kafka.server.ClientQuotasRequestTest > testClientQuotasUnsupportedEntityTypes 
PASSED

kafka.server.ClientQuotasRequestTest > testAlterClientQuotasBadEntityType 
STARTED

kafka.server.ClientQuotasR

Re: Granting permission for Create KIP

2020-06-08 Thread Guozhang Wang
Done, added Rhett.Wang

On Mon, Jun 8, 2020 at 7:40 AM wang120445...@sina.com <
wang120445...@sina.com> wrote:

>
> Please grant permission for Create KIP to wiki ID: wang120445...@sina.com
>  Rhett.wang
>
>
> wang120445...@sina.com
>


-- 
-- Guozhang


Re: [DISCUSS] KIP-590: Redirect Zookeeper Mutation Protocols to The Controller

2020-06-08 Thread Boyang Chen
Hey there,

If no more question is raised, I will go ahead and start the vote shortly.

On Thu, Jun 4, 2020 at 12:39 PM Boyang Chen 
wrote:

> Hey there,
>
> bumping this thread for any further KIP-590 discussion, since it's been
> quiet for a while.
>
> Boyang
>
> On Thu, May 21, 2020 at 10:31 AM Boyang Chen 
> wrote:
>
>> Thanks David, I agree the wording here is not clear, and the fellow
>> broker should just send a new CreateTopicRequest in this case.
>>
>> In the meantime, we had some offline discussion about the Envelope API as
>> well. Although it provides certain privileges like data embedding and
>> principal embedding, it creates a security hole by letting a malicious user
>> impersonate any forwarding broker, thus pretending to be any admin user.
>> Passing the principal around also increases the vulnerability, compared
>> with other standard ways such as passing a verified token, but it is
>> unfortunately not fully supported with Kafka security.
>>
>> So for the security concerns, we are abandoning the Envelope approach and
>> fallback to just forward the raw admin requests. The authentication will
>> happen on the receiving broker side instead of on the controller, so that
>> we are able to strip off the principal fields and only include the
>> principal in header as optional field for audit logging purpose.
>> Furthermore, we shall propose adding a separate endpoint for
>> broker-controller communication which should be recommended to enable
>> secure connections so that a malicious client could not pretend to be a
>> broker and perform impersonation.
>>
>> Let me know your thoughts.
>>
>> Best,
>> Boyang
>>
>> On Tue, May 19, 2020 at 12:17 AM David Jacot  wrote:
>>
>>> Hi Boyang,
>>>
>>> I've got another question regarding the auto topic creation case. The KIP
>>> says: "Currently the target broker shall just utilize its own ZK client
>>> to
>>> create
>>> internal topics, which is disallowed in the bridge release. For above
>>> scenarios,
>>> non-controller broker shall just forward a CreateTopicRequest to the
>>> controller
>>> instead and let controller take care of the rest, while waiting for the
>>> response
>>> in the meantime." There will be no request to forward in this case,
>>> right?
>>> Instead,
>>> a CreateTopicsRequest is created and sent to the controller node.
>>>
>>> When the CreateTopicsRequest is sent as a side effect of the
>>> MetadataRequest,
>>> it would be good to know the principal and the clientId in the controller
>>> (quota,
>>> audit, etc.). Do you plan to use the Envelope API for this case as well
>>> or
>>> to call
>>> the regular API directly? Another way to phrase it would be: Shall the
>>> internal
>>> CreateTopicsRequest be sent with the original metadata (principal,
>>> clientId, etc.)
>>> of the MetadataRequest or as an admin request?
>>>
>>> Best,
>>> David
>>>
>>> On Fri, May 8, 2020 at 2:04 AM Guozhang Wang  wrote:
>>>
>>> > Just to adds a bit more FYI here related to the last question from
>>> David:
>>> > in KIP-595 while implementing the new requests we are also adding a
>>> > "KafkaNetworkChannel" which is used for brokers to send vote / fetch
>>> > records, so besides the discussion on listeners I think implementation
>>> wise
>>> > we can also consider consolidating a lot of those into the same
>>> call-trace
>>> > as well -- of course this is not related to public APIs so maybe just
>>> needs
>>> > to be coordinated among developments:
>>> >
>>> > 1. Broker -> Controller: ISR Change, Topic Creation, Admin Redirect
>>> > (KIP-497).
>>> > 2. Controller -> Broker: LeaderAndISR / MetadataUpdate; though these
>>> are
>>> > likely going to be deprecated post KIP-500.
>>> > 3. Txn Coordinator -> Broker: TxnMarker
>>> >
>>> >
>>> > Guozhang
>>> >
>>> > On Wed, May 6, 2020 at 8:58 PM Boyang Chen >> >
>>> > wrote:
>>> >
>>> > > Hey David,
>>> > >
>>> > > thanks for the feedbacks!
>>> > >
>>> > > On Wed, May 6, 2020 at 2:06 AM David Jacot 
>>> wrote:
>>> > >
>>> > > > Hi Boyang,
>>> > > >
>>> > > > While re-reading the KIP, I've got few small questions/comments:
>>> > > >
>>> > > > 1. When auto topic creation is enabled, brokers will send a
>>> > > > CreateTopicRequest
>>> > > > to the controller instead of writing to ZK directly. It means that
>>> > > > creation of these
>>> > > > topics are subject to be rejected with an error if a
>>> CreateTopicPolicy
>>> > is
>>> > > > used. Today,
>>> > > > it bypasses the policy entirely. I suppose that clusters allowing
>>> auto
>>> > > > topic creation
>>> > > > don't have a policy in place so it is not a big deal. I suggest to
>>> call
>>> > > > out explicitly the
>>> > > > limitation in the KIP though.
>>> > > >
>>> > > > That's a good idea, will add to the KIP.
>>> > >
>>> > >
>>> > > > 2. In the same vein as my first point. How do you plan to handle
>>> errors
>>> > > > when internal
>>> > > > topics are created by a broker? Do you plan to retry retryable
>>> errors
>>> > > > indefinitely?
>>> > > >
>>> > 

Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations

2020-06-08 Thread David Jacot
Hi Jun,

30. The rate is accumulated at the partition level. Let me clarify this in
the KIP.

Best,
David

On Sat, Jun 6, 2020 at 2:37 AM Anna Povzner  wrote:

> Hi David,
>
> The KIP looks good to me. I am going to the voting thread...
>
> Hi Jun,
>
> Yes, exactly. That's a separate thing from this KIP, so working on the fix.
>
> Thanks,
> Anna
>
> On Fri, Jun 5, 2020 at 4:36 PM Jun Rao  wrote:
>
> > Hi, Anna,
> >
> > Thanks for the comment. For the problem that you described, perhaps we
> need
> > to make the quota checking and recording more atomic?
> >
> > Hi, David,
> >
> > Thanks for the updated KIP.  Looks good to me now. Just one minor comment
> > below.
> >
> > 30. controller_mutations_rate: For topic creation and deletion, is the
> rate
> > accumulated at the topic or partition level? It would be useful to make
> it
> > clear in the wiki.
> >
> > Jun
> >
> > On Fri, Jun 5, 2020 at 7:23 AM David Jacot  wrote:
> >
> > > Hi Anna and Jun,
> > >
> > > You are right. We should allocate up to the quota for each old sample.
> > >
> > > I have revamped the Throttling Algorithm section to better explain our
> > > thought process and the token bucket inspiration.
> > >
> > > I have also added a chapter with a few guidelines about how to define
> > > the quota. There is no magic formula for this, but I give a few insights.
> > > I don't have specific numbers that can be used out of the box so I
> > > think that it is better to not put any for the time being. We can
> always
> > > complement later on in the documentation.
> > >
> > > Please, take a look and let me know what you think.
> > >
> > > Cheers,
> > > David
> > >
> > > On Fri, Jun 5, 2020 at 8:37 AM Anna Povzner  wrote:
> > >
> > > > Hi David and Jun,
> > > >
> > > > I dug a bit deeper into the Rate implementation, and wanted to
> > > > confirm that I do believe the token bucket behavior is better for
> > > > the reasons we already discussed, but wanted to summarize. The main difference
> between
> > > Rate
> > > > and token bucket is that the Rate implementation allows a burst by
> > > > borrowing from the future, whereas a token bucket allows a burst by
> > using
> > > > accumulated tokens from the previous idle period. Using accumulated
> > > tokens
> > > > smoothes out the rate measurement in general. Configuring a large
> burst
> > > > requires configuring a large quota window, which causes long delays
> for
> > > > bursty workloads, due to borrowing credits from the future. Perhaps it
> > is
> > > > useful to add a summary in the beginning of the Throttling Algorithm
> > > > section?
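
To make the contrast concrete, here is a minimal Java sketch of the two behaviors. It is illustrative only: this is not Kafka's Rate/Sensor code, and all class and method names are made up. The token bucket absorbs a burst using credit accumulated while idle (up to a burst cap), while the rate-style calculation pays for the same burst by delaying callers until the measured rate falls back under the quota.

{code:java}
/**
 * Illustrative sketch only -- not Kafka's quota implementation.
 */
public class TokenBucketSketch {

    private final double quotaPerSec;   // steady-state rate, e.g. bytes or mutations per second
    private final double burstCapacity; // maximum credit that can accumulate while idle
    private double tokens;              // current credit; may go negative once a burst exceeds it
    private long lastRefillMs;

    public TokenBucketSketch(double quotaPerSec, double burstCapacity, long nowMs) {
        this.quotaPerSec = quotaPerSec;
        this.burstCapacity = burstCapacity;
        this.tokens = burstCapacity;    // start with a full bucket
        this.lastRefillMs = nowMs;
    }

    // Accrue credit for the elapsed time, capped at the burst capacity.
    private void refill(long nowMs) {
        double elapsedSec = (nowMs - lastRefillMs) / 1000.0;
        tokens = Math.min(burstCapacity, tokens + elapsedSec * quotaPerSec);
        lastRefillMs = nowMs;
    }

    // Delay (in ms) the caller should be throttled for after charging 'cost' units.
    public long throttleTimeMs(double cost, long nowMs) {
        refill(nowMs);
        tokens -= cost;
        if (tokens >= 0) {
            return 0;                   // burst served entirely from accumulated credit
        }
        // Negative balance: wait until the bucket refills back to zero.
        return (long) Math.ceil(-tokens / quotaPerSec * 1000.0);
    }

    // Rate-style comparison: once the observed rate exceeds the quota over a
    // window, the delay needed to bring the average back under the quota is
    // (observedRate - quota) / quota * windowMs, i.e. the burst is repaid by
    // borrowing capacity from future windows.
    public static long rateStyleThrottleMs(double observedRate, double quota, long windowMs) {
        if (observedRate <= quota) {
            return 0;
        }
        return (long) (((observedRate - quota) / quota) * windowMs);
    }
}
{code}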
> > > >
> > > > In my previous email, I mentioned the issue we observed with the
> > > bandwidth
> > > > quota, where a low quota (1MB/s per broker) was limiting bandwidth
> > > visibly
> > > > below the quota. I thought it was strictly the issue with the Rate
> > > > implementation as well, but I found the root cause to be different but
> > > > amplified by the Rate implementation (long throttle delays of
> requests
> > > in a
> > > > burst). I will describe it here for completeness using the following
> > > > example:
> > > >
> > > >-
> > > >
> > > >Quota = 1MB/s, default window size and number of samples
> > > >-
> > > >
> > > >Suppose there are 6 connections (maximum 6 outstanding requests),
> > and
> > > >each produce request is 5MB. If all requests arrive in a burst,
> the
> > > > last 4
> > > >requests (20MB over 10MB allowed in a window) may get the same
> > > throttle
> > > >time if they are processed concurrently. We record the rate under
> > the
> > > > lock,
> > > >but then calculate throttle time separately after that. So, for
> each
> > > >request, the observed rate could be 3MB/s, and each request gets
> > > > throttle
> > > >delay = 20 seconds (instead of 5, 10, 15, 20 respectively). The
> > delay
> > > is
> > > >longer than the total rate window, which results in lower
> bandwidth
> > > than
> > > >the quota. Since all requests got the same delay, they will also
> > > arrive
> > > > in
> > > >a burst, which may also result in longer delay than necessary. It
> > > looks
> > > >pretty easy to fix, so I will open a separate JIRA for it. This
> can
> > be
> > > >additionally mitigated by token bucket behavior.
> > > >
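
Reconstructing the 20-second figure above with only the numbers in the example (quota 1 MB/s, a 10-second effective window, 30 MB arriving at once), and assuming the usual rate-based throttle formula:

  throttle delay = (observed rate - quota) / quota * window
                 = (3 MB/s - 1 MB/s) / (1 MB/s) * 10 s
                 = 20 s

Recording the requests one at a time instead would give observed rates of 1.5, 2.0, 2.5 and 3.0 MB/s for the last four requests, hence the 5, 10, 15 and 20 second delays mentioned above.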
> > > >
> > > > For the algorithm "So instead of having one sample equal to 560 in
> the
> > > last
> > > > window, we will have 100 samples equal to 5.6.", I agree with Jun. I
> > > would
> > > > allocate 5 to each old sample that is still in the overall window.
> It
> > > > would be a bit larger granularity than the pure token bucket (we
> lose 5
> > > > units / mutation once we move past the sample window), but it is
> better
> > > > than the long delay.
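
Spelling the allocation out with the numbers quoted above, and assuming they imply a quota of 5 mutations/sec with 100 one-second samples that were otherwise empty:

  spread evenly:        560 / 100 samples = 5.6 per sample
  allocate up to quota: 99 old samples * 5 = 495, leaving 560 - 495 = 65 in the newest sample

so most of the burst is charged against capacity that went unused in past samples, which is essentially the accumulated-credit behavior of a token bucket at the granularity of one sample.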
> > > >
> > > > Thanks,
> > > >
> > > > Anna
> > > >
> > > >
> > > > On Thu, Jun 4, 2020 at 6:33 PM Jun Rao  wrote:
> > > >
> > > > > Hi, David, Anna,
> > > > >
> > > > > Thanks for the discussion and the updated wiki

[DISCUSS] KIP-621: Deprecate and replace DescribeLogDirsResult.all() and .values()

2020-06-08 Thread Tom Bentley
Hi all,

I've opened a small KIP seeking to deprecate and replace a couple of
methods of DescribeLogDirsResult which refer to internal classes in their
return type.

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158862109

Please take a look if you have the time.

Kind regards,

Tom


[jira] [Created] (KAFKA-10120) DescribeLogDirsResult exposes internal classes

2020-06-08 Thread Tom Bentley (Jira)
Tom Bentley created KAFKA-10120:
---

 Summary: DescribeLogDirsResult exposes internal classes
 Key: KAFKA-10120
 URL: https://issues.apache.org/jira/browse/KAFKA-10120
 Project: Kafka
  Issue Type: Bug
Reporter: Tom Bentley
Assignee: Tom Bentley


DescribeLogDirsResult (returned by AdminClient#describeLogDirs(Collection)) 
exposes a number of internal types:
 * {{DescribeLogDirsResponse.LogDirInfo}}
 * {{DescribeLogDirsResponse.ReplicaInfo}}
 * {{Errors}}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] 2.5.1 Bug Fix Release

2020-06-08 Thread John Roesler
Hello again, all,

Just a quick status update: There are currently two tickets blocking the 2.5.1
release:
KAFKA-10049 KTable-KTable Foreign Key join throwing Serialization Exception 
KAFKA-9891  Invalid state store content after task migration with 
exactly_once and standby replicas 

KAFKA-10049 has an updated PR, which I will review today.

Once these blockers are resolved, I'll cut the RC0 for validation.

The full details are on:
https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+2.5.1

Thanks,
-John

On Fri, May 29, 2020, at 13:16, Bill Bejeck wrote:
> Thanks for volunteering John, +1.
> 
> On Fri, May 29, 2020 at 1:58 PM Ismael Juma  wrote:
> 
> > Thanks for volunteering! +1
> >
> > Ismael
> >
> > On Fri, May 29, 2020 at 8:56 AM John Roesler  wrote:
> >
> > > Hello all,
> > >
> > > I'd like to volunteer as release manager for the 2.5.1 bugfix release.
> > >
> > > Kafka 2.5.0 was released on 15 April 2020, and 40 issues have been fixed
> > > since then.
> > >
> > > The release plan is documented here:
> > > https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+2.5.1
> > >
> > > Thanks,
> > > -John
> > >
> >
>


[jira] [Created] (KAFKA-10119) StreamsResetter fails with TimeoutException for older Brokers

2020-06-08 Thread Bruno Cadonna (Jira)
Bruno Cadonna created KAFKA-10119:
-

 Summary: StreamsResetter fails with TimeoutException for older 
Brokers
 Key: KAFKA-10119
 URL: https://issues.apache.org/jira/browse/KAFKA-10119
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Affects Versions: 2.6.0
Reporter: Bruno Cadonna


Since some commit after 2d37c8c84 in Apache Kafka, the streams resetter 
consistently fails against brokers of version confluent-5.0.1.

The following exception is thrown:

{code:java}
org.apache.kafka.common.errors.TimeoutException: Timeout of 6ms expired 
before the position for partition test-0 could be determined
{code} 

which comes from this line within the {{StreamsResetter}} class:

{code:java}
System.out.println("Topic: " + p.topic() + " Partition: " + p.partition() + " 
Offset: " + client.position(p));
{code}

The exception is not thrown with brokers of version confluent-5.5.0. I have not 
tried brokers of other versions.

The bug can be reproduced with the following steps:
1. check out commit dc8f8ffd2ad from Apache Kafka
2. build with {{./gradlew clean -PscalaVersion=2.13 jar}}
3. start a confluent-5.0.1 broker. 
4. create a topic with {{bin/kafka-topics.sh --create --bootstrap-server 
localhost:9092 --replication-factor 1 --partitions 1 --topic test}}
5. start streams resetter with {{bin/kafka-streams-application-reset.sh 
--application-id test --bootstrap-servers localhost:9092 --input-topics test}}

The streams resetter will then output:
{code}
ERROR: org.apache.kafka.common.errors.TimeoutException: Timeout of 6ms 
expired before the position for partition test-0 could be determined
org.apache.kafka.common.errors.TimeoutException: Timeout of 6ms expired 
before the position for partition test-0 could be determined
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Granting permission for Create KIP

2020-06-08 Thread wang120445...@sina.com

Please grant permission for Create KIP to wiki ID: wang120445...@sina.com   
Rhett.wang


wang120445...@sina.com


Re: [VOTE] KIP-584: Versioning scheme for features

2020-06-08 Thread Kowshik Prakasam
Hi all,

I wanted to let you know that I have made the following minor changes to
the KIP-584 write-up. The purpose is to ensure the design is correct for a
few things which came up during implementation:

1. The feature version data type has been changed to int16 (instead of int64).
The reason is twofold:
a. Using int64 felt like overkill. Feature version bumps are infrequent
(since these bumps represent breaking changes), so int16 is big enough to
support version bumps of a particular feature.
b. The int16 data type aligns well with existing API versions data
type. Please see the file
'/clients/src/main/resources/common/message/ApiVersionsResponse.json'.

2. The finalized feature version epoch data type has been changed to int32
(instead of int64). The reason is that the epoch value is the ZK node
version, whose data type is int32.

3. Introduced a new 'status' field in the '/features' ZK node schema. The
purpose is to implement Colin's earlier point about the strategy for
transitioning from not having a /features znode to having one. An
explanation has been provided in the following section of the KIP detailing
the different cases:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP584:Versioningschemeforfeatures-FeatureZKnodestatus
.
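
For anyone following along in code, here is a rough Java sketch of what the three changes above imply for the shape of the finalized-features metadata. The names are hypothetical and only meant to illustrate the data types; the authoritative schema is in the KIP-584 write-up.

{code:java}
import java.util.Map;

public final class FinalizedFeaturesSketch {

    // Hypothetical representation of the new 'status' field on the '/features' ZK node.
    public enum FeaturesNodeStatus { ENABLED, DISABLED }

    public static final class VersionLevels {
        public final short minVersionLevel; // int16, per change #1
        public final short maxVersionLevel; // int16, per change #1

        public VersionLevels(short minVersionLevel, short maxVersionLevel) {
            this.minVersionLevel = minVersionLevel;
            this.maxVersionLevel = maxVersionLevel;
        }
    }

    public final int epoch;                           // int32, per change #2 (the ZK node version)
    public final FeaturesNodeStatus status;           // per change #3
    public final Map<String, VersionLevels> features; // feature name -> finalized version range

    public FinalizedFeaturesSketch(int epoch,
                                   FeaturesNodeStatus status,
                                   Map<String, VersionLevels> features) {
        this.epoch = epoch;
        this.status = status;
        this.features = features;
    }
}
{code}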

Please let me know if you have any questions or concerns.


Cheers,
Kowshik




On Tue, Apr 28, 2020 at 11:24 PM Kowshik Prakasam 
wrote:

> Hi all,
>
> This KIP vote has been open for ~12 days. The summary of the votes is that
> we have 3 binding votes (Colin, Guozhang, Jun), and 3 non-binding votes
> (David, Dhruvil, Boyang). Therefore, the KIP vote passes. I'll mark KIP as
> accepted and start working on the implementation.
>
> Thanks a lot!
>
>
> Cheers,
> Kowshik
>
> On Mon, Apr 27, 2020 at 12:15 PM Colin McCabe  wrote:
>
>> Thanks, Kowshik.  +1 (binding)
>>
>> best,
>> Colin
>>
>> On Sat, Apr 25, 2020, at 13:20, Kowshik Prakasam wrote:
>> > Hi Colin,
>> >
>> > Thanks for the explanation! I agree with you, and I have updated the
>> > KIP.
>> > Here is a link to relevant section:
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-584%3A+Versioning+scheme+for+features#KIP-584:Versioningschemeforfeatures-Controller:ZKnodebootstrapwithdefaultvalues
>> >
>> >
>> > Cheers,
>> > Kowshik
>> >
>> > On Fri, Apr 24, 2020 at 8:50 PM Colin McCabe 
>> wrote:
>> >
>> > > On Fri, Apr 24, 2020, at 00:01, Kowshik Prakasam wrote:
>> > > > (Kowshik): Great point! However, for case #1, I'm not sure why we
>> need to
>> > > > create a '/features' ZK node with disabled features. Instead, do
>> you see
>> > > > any drawback if we just do not create it? i.e. if IBP is less than
>> 2.6,
>> > > the
>> > > > controller treats the case as though the versioning system is
>> completely
>> > > > disabled, and would not create a non-existing '/features' node.
>> > >
>> > > Hi Kowshik,
>> > >
>> > > When the IBP is less than 2.6, but the software has been upgraded to a
>> > > state where it supports this KIP, that
>> > >  means the user is upgrading from an earlier version of the
>> software.  In
>> > > this case, we want to start with all the features disabled and allow
>> the
>> > > user to enable them when they are ready.
>> > >
>> > > Enabling all the possible features immediately after an upgrade could
>> be
>> > > harmful to the cluster.  On the other hand, for a new cluster, we do
>> want
>> > > to enable all the possible features immediately. I was proposing
>> this as a
>> > > way to distinguish the two cases (since the new cluster will never be
>> > > started with an old IBP).
>> > >
>> > > > Colin McCabe wrote:
>> > > > > And now, something a little bit bigger (sorry).  For finalized
>> > > features,
>> > > > > why do we need both min_version_level and max_version_level?
>> Assuming
>> > > that
>> > > > > we want all the brokers to be on the same feature version level,
>> we
>> > > really only care
>> > > > > about three numbers for each feature, right?  The minimum
>> supported
>> > > version
>> > > > > level, the maximum supported version level, and the current active
>> > > version level.
>> > > >
>> > > > > We don't actually want different brokers to be on different
>> versions of
>> > > > > the same feature, right?  So we can just have one number for
>> current
>> > > > > version level, rather than two.  At least that's what I was
>> thinking
>> > > -- let
>> > > > > me know if I missed something.
>> > > >
>> > > > (Kowshik): It is my understanding that the "current active version
>> > > > level" that you have mentioned is the "max_version_level". But we
>> > > > still maintain/publish both min and max version levels, because the
>> > > > detail about the min level is useful to external clients. This is
>> > > > described below.
>> > > >
>> > > > For any feature F, think of the closed range: [min_version_level,
>> > > > max_version_level] as the range of finalized versi

Re: [VOTE] KIP-578: Add configuration to limit number of partitions

2020-06-08 Thread Gokul Ramanan Subramanian
Hi. Can we resume the voting process for KIP-578? Thanks.

On Mon, Jun 1, 2020 at 11:09 AM Gokul Ramanan Subramanian <
gokul24...@gmail.com> wrote:

> Thanks Colin. Have updated the KIP per your recommendations. Let me know
> what you think.
>
> Thanks Harsha for the vote.
>
> On Wed, May 27, 2020 at 8:17 PM Colin McCabe  wrote:
>
>> Hi Gokul Ramanan Subramanian,
>>
>> Thanks for the KIP.
>>
>> Can you please modify the KIP to remove the reference to the deprecated
>> --zookeeper flag?  This is not how kafka-configs.sh is supposed to be used
>> in new versions of Kafka.  You get a warning message if you do use this
>> deprecated flag.  As described in KIP-604, we are removing the --zookeeper
>> flag in the Kafka 3.0 release.  It also causes problems when people use the
>> deprecated access mode -- for example, as you note in this KIP, it bypasses
>> resource limits such as the ones described here.
>>
>> Instead of WILL_EXCEED_PARTITION_LIMITS, how about
>> RESOURCE_LIMIT_REACHED?  Then the error string can contain the detailed
>> message about which resource limit was hit (per broker limit, per cluster
>> limit, whatever.)  It would also be good to spell out that
>> CreateTopicsPolicy plugins can also throw this exception, for consistency.
>>
>> I realize that 2 billion partitions seems like a very big number.
>> However, filesystems have had to transition to 64 bit inode numbers as time
>> has gone on.  There doesn't seem to be any performance reason why this
>> should be a 31 bit number, so let's just make these configurations longs,
>> not ints.
>>
>> best,
>> Colin
>>
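
As a side note on the long-vs-int point above, defining the limits as longs is straightforward with the broker's ConfigDef machinery. The config names and defaults below are placeholders for illustration, not necessarily what the KIP will adopt:

{code:java}
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.common.config.ConfigDef.Importance;
import org.apache.kafka.common.config.ConfigDef.Type;

public class PartitionLimitConfigSketch {
    // Placeholder config names, for illustration only.
    public static final String MAX_BROKER_PARTITIONS = "max.broker.partitions";
    public static final String MAX_PARTITIONS = "max.partitions";

    // Long.MAX_VALUE as the default effectively means "no limit".
    public static final ConfigDef CONFIG_DEF = new ConfigDef()
        .define(MAX_BROKER_PARTITIONS, Type.LONG, Long.MAX_VALUE,
                ConfigDef.Range.atLeast(1), Importance.MEDIUM,
                "Maximum number of partition replicas a single broker may host.")
        .define(MAX_PARTITIONS, Type.LONG, Long.MAX_VALUE,
                ConfigDef.Range.atLeast(1), Importance.MEDIUM,
                "Maximum number of partitions allowed in the cluster.");
}
{code}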
>>
>> On Wed, May 27, 2020, at 09:48, Harsha Chintalapani wrote:
>> > Thanks for the KIP Gokul. This will be really useful for our use cases
>> as
>> > well.
>> > +1 (binding).
>> >
>> > -Harsha
>> >
>> >
>> > On Tue, May 26, 2020 at 12:33 AM, Gokul Ramanan Subramanian <
>> > gokul24...@gmail.com> wrote:
>> >
>> > > Hi.
>> > >
>> > > Any votes for this?
>> > >
>> > > Thanks.
>> > >
>> > > On Tue, May 12, 2020 at 11:36 AM Gokul Ramanan Subramanian <
>> gokul2411s@
>> > > gmail.com> wrote:
>> > >
>> > > Hello,
>> > >
>> > > I'd like to initialize voting on KIP-578:
>> > > https://cwiki.apache.org/confluence/display/KAFKA/
>> > > KIP-578%3A+Add+configuration+to+limit+number+of+partitions
>> > > .
>> > >
>> > > Got some good feedback from Stanislav Kozlovski, Alexandre Dupriez
>> and Tom
>> > > Bentley on the discussion thread. I have addressed their comments. I
>> want
>> > > to thank them for their time.
>> > >
>> > > If there are any more concerns about the KIP, I am happy to discuss
>> them
>> > > further.
>> > >
>> > > Thanks.
>> > >
>> > >
>> >
>>
>


Jenkins build is back to normal : kafka-2.4-jdk8 #222

2020-06-08 Thread Apache Jenkins Server
See 




[GitHub] [kafka-site] showuon commented on pull request #268: MINOR: Make contact.html more clear

2020-06-08 Thread GitBox


showuon commented on pull request #268:
URL: https://github.com/apache/kafka-site/pull/268#issuecomment-640413373


   @mjsax @guozhangwang, could you please review this small PR? Thanks.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [kafka-site] showuon opened a new pull request #268: MINOR: Make contact.html more clear

2020-06-08 Thread GitBox


showuon opened a new pull request #268:
URL: https://github.com/apache/kafka-site/pull/268


   On our `contact us` page, we said:
   > To subscribe, send an email to users-subscr...@kafka.apache.org. Once 
subscribed, send your emails to us...@kafka.apache.org.
   
   which will confuse users into sending their **email address** to 
us...@kafka.apache.org, as the mailing thread below showed.
   
   
![image](https://user-images.githubusercontent.com/43372967/84001508-59ce3c80-a999-11ea-8703-3afee40a9310.png)
   
   Rephrased this sentence to make it clearer:
   
![image](https://user-images.githubusercontent.com/43372967/84001869-16280280-a99a-11ea-9c32-054ed745bc8a.png)
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org