Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations

2020-06-03 Thread Anna Povzner
Hi Jun and David,

Regarding token bucket vs. Rate behavior. We recently observed a couple of
cases where a bursty workload behavior would result in long-ish pauses in
between, resulting in lower overall bandwidth than the quota. I will need
to debug this a bit more to be 100% sure, but it does look like the case
described by David earlier in this thread. So, I agree with Jun -- I think
we should make all quota rate behavior consistent, and probably similar to
the token bucket one.

Looking at KIP-13, it doesn't describe rate calculation in enough detail,
but does mention window size. So, we could keep "window size" and "number
of samples" configs and change Rate implementation to be more similar to
token bucket:
* number of samples defines our burst size
* Change the behavior, which could be described as: if a burst happens
after an idle period, the burst would effectively spread evenly over the
(now - window) time period, where window is (number of samples - 1) *
window size. This basically describes a token bucket, while keeping the
current quota configs. I think we can even implement this by changing the
way we record the last sample or lastWindowMs.
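For concreteness, the token-bucket behavior being discussed can be sketched like this (a simplified illustration with made-up names, not Kafka's implementation):

```java
// Simplified token-bucket sketch: tokens refill at `rate` per second,
// up to a maximum burst `capacity`; a request spends tokens and the
// client is throttled until a negative balance refills to zero.
public class TokenBucket {
    private final double rate;      // tokens added per second (the quota)
    private final double capacity;  // maximum burst size
    private double tokens;
    private long lastMs;

    public TokenBucket(double rate, double capacity, long nowMs) {
        this.rate = rate;
        this.capacity = capacity;
        this.tokens = capacity; // start full: an idle client may burst
        this.lastMs = nowMs;
    }

    /** Records a request of the given cost; returns ms to throttle (0 if none). */
    public long record(double cost, long nowMs) {
        tokens = Math.min(capacity, tokens + (nowMs - lastMs) / 1000.0 * rate);
        lastMs = nowMs;
        tokens -= cost; // balance may go negative; throttle until it refills
        return tokens >= 0 ? 0 : (long) (-tokens / rate * 1000);
    }

    public static void main(String[] args) {
        // Toy numbers from the quoted discussion: rate 5/sec, burst 500.
        // A burst of 560 overflows by 60, i.e. a 12-second throttle.
        TokenBucket tb = new TokenBucket(5.0, 500.0, 0L);
        assert tb.record(560.0, 0L) == 12000L;
    }
}
```

Note how an idle period lets the balance refill to the full capacity, which is exactly the "burst spread over (now - window)" behavior described above.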

Jun, if we change the Rate calculation behavior in bandwidth and
request quotas, would we need a separate KIP? We shouldn't need one if we
keep the window size and number of samples configs, right?

Thanks,
Anna

On Wed, Jun 3, 2020 at 3:24 PM Jun Rao  wrote:

> Hi, David,
>
> Thanks for the reply.
>
> 11. To match the behavior in the Token bucket approach, I was thinking that
> requests that don't fit in the previous time windows will be accumulated in
> the current time window. So, the 60 extra requests will be accumulated in
> the latest window. Then, the client also has to wait for 12 more secs
> before throttling is removed. I agree that this is probably a better
> behavior and it's reasonable to change the existing behavior to this one.
>
> To me, it seems that sample_size * num_windows is the same as max burst
> balance. The latter seems a bit better to configure. The thing is that the
> existing quota system has already been used in quite a few places and if we
> want to change the configuration in the future, there is the migration
> cost. Given that, do you feel it's better to adopt the new token bucket
> terminology or just adopt the behavior somehow into our existing system? If
> it's the former, it would be useful to document this in the rejected
> section and add a future plan on migrating existing quota configurations.
>
> Jun
>
>
> On Tue, Jun 2, 2020 at 6:55 AM David Jacot  wrote:
>
> > Hi Jun,
> >
> > Thanks for your reply.
> >
> > 10. I think that both options are likely equivalent from an accuracy
> > point of view. If we put the implementation aside, conceptually, I am not
> > convinced by the usage-based approach because resources don't have a clear
> > owner in AK at the moment. A topic can be created by (Principal A, no
> > client id), then can be extended by (no principal, Client B), and finally
> > deleted by (Principal C, Client C). This does not sound right to me and I
> > fear that it is not going to be easy to grasp for our users.
> >
> > Regarding the naming, I do agree that we can make it more future proof.
> > I propose `controller_mutations_rate`. I think that differentiating the
> > mutations
> > from the reads is still a good thing for the future.
> >
> > 11. I am not convinced by your proposal for the following reasons:
> >
> > First, in my toy example, I used 101 windows and 7 * 80 requests. We
> > could effectively allocate 5 * 100 requests to the previous windows,
> > assuming that they are empty. What shall we do with the remaining 60
> > requests? Shall we allocate them to the current window or shall we divide
> > them among all the windows?
> >
> > Second, I don't think that we can safely change the behavior of all the
> > existing rates used, because it actually changes the computation of the
> > rate: values allocated to past windows would expire before they would
> > today.
> >
> > Overall, while trying to fit in the current rate, we are going to build a
> > slightly different version of the rate, which will be even more confusing
> > for users.
> >
> > Instead, I think that we should embrace the notion of burst as it could
> > also be applied to other quotas in the future. Users don't have to know
> > that we use the Token Bucket or a special rate inside at the end of the
> > day. It is an implementation detail.
> >
> > Users would be able to define:
> > - a rate R; and
> > - a maximum burst B.
> >
> > If we change the metrics to be as follows:
> > - the actual rate;
> > - the burst balance in %, 0 meaning that the user is throttled;
> > it remains detached from the algorithm.
> >
> > I personally prefer this over having to define a rate and a number of
> > windows while having to understand that the number of windows implicitly
> > defines the allowed burst. I think that it is clearer and easier to 

[jira] [Created] (KAFKA-10099) Kerberos authentication sets java authorizedId to authenticationId not authorizationId

2020-06-03 Thread Francois Fernando (Jira)
Francois Fernando created KAFKA-10099:
-

 Summary: Kerberos authentication sets java authorizedId to 
authenticationId not authorizationId
 Key: KAFKA-10099
 URL: https://issues.apache.org/jira/browse/KAFKA-10099
 Project: Kafka
  Issue Type: Bug
  Components: security
Affects Versions: 2.3.0
Reporter: Francois Fernando


The following authentication code in Kafka still puzzles me (Lines 67-74: 
https://github.com/apache/kafka/blob/3cdc78e6bb1f83973a14ce1550fe3874f7348b05/clients/src/main/java/org/apache/kafka/common/security/authenticator/SaslServerCallbackHandler.java).


{code:java}
private void handleAuthorizeCallback(AuthorizeCallback ac) {
    String authenticationID = ac.getAuthenticationID();
    String authorizationID = ac.getAuthorizationID();

    LOG.info("Successfully authenticated client: authenticationID={}; authorizationID={}.",
            authenticationID, authorizationID);

    ac.setAuthorized(true);
    ac.setAuthorizedID(authenticationID);
}
{code}

In a kafka cluster secured with Kerberos, using a kafka keytab with principal 
like `sys_read/reader.myorg.c...@myorg.corp` results in:

authenticationID = sys_r...@myorg.corp;
authorizationID = sys_read/reader.myorg.c...@myorg.corp

The last line of the above method sets the authorizedID to the 
authenticationID, not the authorizationID. From my understanding of Java 
security, the principal will become whatever is set as the authorizedID.

This means ACL definitions can't use the full principal string as the 
principal, because the authorizer will never see it.
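A minimal sketch of the change the report appears to suggest (setting the authorized ID from the authorization ID instead), using hypothetical principal names; this is an illustration, not a committed patch:

```java
import javax.security.sasl.AuthorizeCallback;

public class AuthzDemo {
    // Sketch of the suggested fix: use the authorization ID, not the
    // authentication ID, when setting the authorized ID.
    static void handleAuthorizeCallback(AuthorizeCallback ac) {
        ac.setAuthorized(true);
        ac.setAuthorizedID(ac.getAuthorizationID()); // was: ac.getAuthenticationID()
    }

    public static void main(String[] args) {
        // Hypothetical principals standing in for the Kerberos example above.
        AuthorizeCallback ac = new AuthorizeCallback(
                "alice@EXAMPLE.COM",                    // authenticationID (short form)
                "alice/host.example.com@EXAMPLE.COM");  // authorizationID (full principal)
        handleAuthorizeCallback(ac);
        // The full principal now survives for the authorizer to match ACLs against.
        assert ac.getAuthorizedID().equals("alice/host.example.com@EXAMPLE.COM");
    }
}
```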



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-9788) Sensor name collision for group and transaction coordinator load metrics

2020-06-03 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-9788.

Fix Version/s: 2.6.0
   Resolution: Fixed

> Sensor name collision for group and transaction coordinator load metrics
> 
>
> Key: KAFKA-9788
> URL: https://issues.apache.org/jira/browse/KAFKA-9788
> Project: Kafka
>  Issue Type: Bug
>Reporter: Bob Barrett
>Assignee: Bob Barrett
>Priority: Minor
> Fix For: 2.6.0
>
>
> Both the group coordinator and the transaction coordinator create a Sensor 
> object on startup to track the time it takes to load partitions, and both 
> name the Sensor "PartitionLoadTime":
> [https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/coordinator/transaction/TransactionStateManager.scala#L98]
> [https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/coordinator/group/GroupMetadataManager.scala#L92]
> However, Sensor names are meant to be unique. This name collision means that 
> there is actually only one underlying "PartitionLoadTime" Sensor per broker, 
> which is marked for each partition loaded by either coordinator. As a result, 
> the metrics for group and transaction partition loading are identical, each 
> based on the combination of both data sets. These sensors should be renamed 
> to allow distinguishing between the two coordinator types.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] KIP-614: Add Prefix Scan support for State Stores

2020-06-03 Thread John Roesler
Hi Sagar,

Thanks for the question, and sorry for muddying the water.

I meant the Bytes/byte[] thing as advice for how you all can solve your problem 
in the mean time, while we work through this KIP. I don’t think it’s relevant 
for the KIP itself.

I think the big issue here is what the type of the prefix should be in the 
method signature. Using the same type as the key makes sense some times, but 
not other times. I’m not sure what the best way around this might be. It might 
help if there are other typed key/value stores to compare APIs with.
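For readers following the thread, the Bytes/byte[] workaround John mentions (the KIP-213-style trick of deriving range-scan boundaries from a prefix) can be sketched as follows; this is my own illustration, not code from either KIP:

```java
import java.util.Arrays;

public class PrefixRange {
    /**
     * Exclusive upper bound for a range scan covering all keys that start
     * with {@code prefix}, under unsigned lexicographic byte ordering:
     * increment the last byte that is not 0xFF and truncate after it.
     */
    static byte[] upperBound(byte[] prefix) {
        byte[] bound = Arrays.copyOf(prefix, prefix.length);
        for (int i = bound.length - 1; i >= 0; i--) {
            if (bound[i] != (byte) 0xFF) {
                bound[i]++;
                return Arrays.copyOf(bound, i + 1);
            }
        }
        return null; // prefix is all 0xFF bytes: scan to the end of the store
    }

    public static void main(String[] args) {
        // range(prefix, upperBound(prefix)) then covers exactly the prefixed keys.
        assert Arrays.equals(upperBound(new byte[]{1, 2, 3}), new byte[]{1, 2, 4});
        assert Arrays.equals(upperBound(new byte[]{1, (byte) 0xFF}), new byte[]{2});
    }
}
```

This only works when the serialized key format preserves lexicographic ordering, which is the crux of the typing question raised above.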

Thanks,
John

On Mon, Jun 1, 2020, at 09:58, Sagar wrote:
> Hi John,
> 
> Just to add to my previous email (and sorry for the spam): if we consider
> using Bytes/byte[] and manually invoking the serdes, could you provide
> examples where the same Serde for keys won't work for the prefix types? As
> far as my understanding goes, prefix seek depends upon an ordering of the
> keys, like lexicographic. As long as the binary format is consistent for
> both the keys and the prefixes, would it not ensure the ability to search in
> that same ordering space? This is from my limited understanding, so any
> concrete examples would be helpful...
> 
> Also, you mentioned about the creation of dummy values to indicate prefix
> values, do you mean this line:
> 
> https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/kstream/internals/foreignkeyjoin/ForeignJoinSubscriptionProcessorSupplier.java#L91
> This is where the prefix key is built and used for searching.
> 
> Thanks!
> Sagar.
> 
> On Mon, Jun 1, 2020 at 11:42 AM Sagar  wrote:
> 
> > Hi John,
> >
> > Thank you. I think it makes sense to modify the KIP to add the
> > prefixScan() as part of the existing interfaces and add the new mixin
> > behaviour as Rejected alternatives. I am not very aware of other stores
> > apart from keyValueStore so is it fine if I keep it there for now?
> >
> > Regarding the type definition of the prefix types, I will try to think
> > about some alternatives and share them if I come up with any.
> >
> > Thanks!
> > Sagar.
> >
> >
> > On Sun, May 31, 2020 at 1:55 AM John Roesler  wrote:
> >
> >> Hi Sagar,
> >>
> >> Thanks for the response. Your use case makes sense to me; I figured it
> >> must be something like that.
> >>
> >> On a pragmatic level, in the near term, you might consider basically
> >> doing the same thing we did in KIP-213. If you swap out the store types for
> >> Byte/byte[] and “manually” invoke the serdes in your own logic, then you
> >> can use the same algorithm we did to derive the range scan boundaries from
> >> your desired prefix.
> >>
> >> For the actual KIP, it seems like we would need significant design
> >> improvements to be able to do any mixins, so I think we should favor
> >> proposing either to just add to the existing interfaces or to create brand
> >> new interfaces, as appropriate, for now. Given that prefix can be converted
> >> to a range query at a low level, I think we can probably explore adding
> >> prefix to the existing interfaces with a default implementation.
> >>
> >> It seems like that just leaves the question of how to define the type of
> >> the prefix. To be honest, I don’t have any great ideas here. Are you able
> >> to generate some creative solutions, Sagar?
> >>
> >> Thanks,
> >> John
> >>
> >> On Tue, May 26, 2020, at 06:42, Sagar wrote:
> >> > Hi John,
> >> >
> >> > Thanks for the detailed reply. I was a bit crammed with work last week,
> >> > so I couldn't respond earlier; apologies for that.
> >> >
> >> > First of all, thanks for the context that both you and Adam have
> >> > provided me on the issues faced previously. As I can clearly see, while
> >> > I was able to cut some corners while writing some test cases or
> >> > benchmarks, being able to stitch together a store with prefix scan into
> >> > an actual topology needs more work. I am sorry for the half-baked tests
> >> > that I wrote without realising it, and you have rightly put it when you
> >> > said these challenges aren't obvious up front.
> >> >
> >> > Now, coming back to the other points, I spent some time going through
> >> > KIP-213 and also some of the code snippets that are talked about in
> >> > that KIP. With the detailed explanation that you provided, it is now
> >> > obvious to me that keeping a generic type for keys like K won't work
> >> > out of the box, and hence a decision was made to use Bytes as the key
> >> > type.
> >> >
> >> > I just had another thought on this though. I looked at the range
> >> > function that was added in ReadOnlyKeyValueStore. While the key and the
> >> > value mentioned in that method are generic, internally almost all
> >> > queries end up querying using Bytes in some form or the other. I looked
> >> > at not just the RocksDB store but other stores like the InMemory store
> >> > or MemoryLRU, and this seems to be the pattern. I think this stems from
> >> > the fact that these stores, while 

[jira] [Created] (KAFKA-10098) Remove unnecessary escaping in regular expression in SaslAuthenticatorTest.java

2020-06-03 Thread Can Cecen (Jira)
Can Cecen created KAFKA-10098:
-

 Summary: Remove unnecessary escaping in regular expression in 
SaslAuthenticatorTest.java
 Key: KAFKA-10098
 URL: https://issues.apache.org/jira/browse/KAFKA-10098
 Project: Kafka
  Issue Type: Improvement
Reporter: Can Cecen


n.b. This is a newbie ticket designed to be an introduction to contributing for 
the assignee.

In line:
{code}
e.getMessage().matches(
".*\\<\\[" + expectedResponseTextRegex + 
"]>.*\\<\\[" + receivedResponseTextRegex + ".*?]>"));
{code}
'<' does not need to be escaped.
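As a quick illustration (my own example, not taken from the test), `<` matches literally in a Java regex without a backslash, while `[` opens a character class and must stay escaped:

```java
public class EscapeDemo {
    public static void main(String[] args) {
        String msg = "expected:<[foo]> but was:<[bar]>";
        // '<' and '>' are not regex metacharacters, so they need no escaping;
        // '[' must remain escaped to match a literal bracket.
        assert msg.matches(".*<\\[foo]>.*<\\[bar.*?]>");
        // The doubly-escaped form used in the test is equivalent, just noisier:
        assert msg.matches(".*\\<\\[foo]>.*\\<\\[bar.*?]>");
    }
}
```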





--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-10097) Avoid getting null map for task checkpoint

2020-06-03 Thread Boyang Chen (Jira)
Boyang Chen created KAFKA-10097:
---

 Summary: Avoid getting null map for task checkpoint
 Key: KAFKA-10097
 URL: https://issues.apache.org/jira/browse/KAFKA-10097
 Project: Kafka
  Issue Type: Improvement
Reporter: Boyang Chen


In StreamTask, we have logic to generate a checkpoint offset map to be 
materialized through StateManager#checkpoint. This map can be either an empty 
map or null: the former indicates that we should only pull down existing state 
store checkpoint data, while the latter indicates that there is no need to do a 
checkpoint at all, such as when we are suspending a task.

Having two similar special cases for checkpointing could lead to unexpected 
bugs; we should also think about separating the empty-checkpoint case from the 
passed-in-checkpoint case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-9161) Close gaps in Streams configs documentation

2020-06-03 Thread Jakob Homan (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan resolved KAFKA-9161.

Resolution: Fixed

> Close gaps in Streams configs documentation
> ---
>
> Key: KAFKA-9161
> URL: https://issues.apache.org/jira/browse/KAFKA-9161
> Project: Kafka
>  Issue Type: Bug
>  Components: documentation, streams
>Reporter: Sophie Blee-Goldman
>Assignee: Sophie Blee-Goldman
>Priority: Major
>  Labels: beginner, newbie, newbie++
>
> There are a number of Streams configs that aren't documented in the 
> "Configuring a Streams Application" section of the docs. As of 2.3 the 
> missing configs are:
>  # default.windowed.key.serde.inner ^
>  # default.windowed.value.serde.inner ^
>  # max.task.idle.ms
>  # rocksdb.config.setter. ^^
>  # topology.optimization
>  # -upgrade.from- fixed
>  ^ these configs are also missing the corresponding DOC string
>  ^^ this one actually does appear on that page, but instead of being included 
> in the list of Streams configs it is for some reason under  "Consumer and 
> Producer Configuration Parameters" ?
> There are also a few configs whose documented name is slightly incorrect, as 
> it is missing the "default" prefix that appears in the actual code. The 
> "missing-default" configs are:
>  # key.serde
>  # value.serde
>  # timestamp.extractor



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations

2020-06-03 Thread Jun Rao
Hi, David,

Thanks for the reply.

11. To match the behavior in the Token bucket approach, I was thinking that
requests that don't fit in the previous time windows will be accumulated in
the current time window. So, the 60 extra requests will be accumulated in
the latest window. Then, the client also has to wait for 12 more secs
before throttling is removed. I agree that this is probably a better
behavior and it's reasonable to change the existing behavior to this one.

To me, it seems that sample_size * num_windows is the same as max burst
balance. The latter seems a bit better to configure. The thing is that the
existing quota system has already been used in quite a few places and if we
want to change the configuration in the future, there is the migration
cost. Given that, do you feel it's better to adopt the new token bucket
terminology or just adopt the behavior somehow into our existing system? If
it's the former, it would be useful to document this in the rejected
section and add a future plan on migrating existing quota configurations.
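Spelling out the toy numbers from earlier in the thread (100 previous windows of sample size 5, which implies a rate of 5 requests/sec if we assume 1-second windows, and 7 * 80 = 560 requests arriving at once):

```java
public class ThrottleMath {
    public static void main(String[] args) {
        double ratePerSec = 5.0;          // quota: 5 requests/sec (toy value)
        double burst = 100 * 5.0;         // 100 empty previous windows absorb 500
        double arrived = 7 * 80.0;        // 560 requests arrive in one burst
        double extra = arrived - burst;   // 60 requests overflow into the latest window
        double throttleSecs = extra / ratePerSec; // client waits 12 more seconds
        assert extra == 60.0;
        assert throttleSecs == 12.0;
    }
}
```

The 1-second window size is my assumption for the arithmetic; the conclusion (60 extra requests, 12 seconds of throttling) matches the numbers quoted above.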

Jun


On Tue, Jun 2, 2020 at 6:55 AM David Jacot  wrote:

> Hi Jun,
>
> Thanks for your reply.
>
> 10. I think that both options are likely equivalent from an accuracy point
> of view. If we put the implementation aside, conceptually, I am not convinced
> by the usage-based approach because resources don't have a clear owner
> in AK at the moment. A topic can be created by (Principal A, no client id),
> then can be extended by (no principal, Client B), and finally deleted by
> (Principal C, Client C). This does not sound right to me and I fear that it
> is not going to be easy to grasp for our users.
>
> Regarding the naming, I do agree that we can make it more future proof.
> I propose `controller_mutations_rate`. I think that differentiating the
> mutations
> from the reads is still a good thing for the future.
>
> 11. I am not convinced by your proposal for the following reasons:
>
> First, in my toy example, I used 101 windows and 7 * 80 requests. We could
> effectively allocate 5 * 100 requests to the previous windows assuming that
> they are empty. What shall we do with the remaining 60 requests? Shall we
> allocate them to the current window or shall we divide them among all the
> windows?
>
> Second, I don't think that we can safely change the behavior of all the
> existing rates used, because it actually changes the computation of the
> rate: values allocated to past windows would expire before they would today.
>
> Overall, while trying to fit in the current rate, we are going to build a
> slightly different version of the rate, which will be even more confusing
> for users.
>
> Instead, I think that we should embrace the notion of burst as it could
> also be applied to other quotas in the future. Users don't have to know that
> we use the Token Bucket or a special rate inside at the end of the day. It
> is an implementation detail.
>
> Users would be able to define:
> - a rate R; and
> - a maximum burst B.
>
> If we change the metrics to be as follows:
> - the actual rate;
> - the burst balance in %, 0 meaning that the user is throttled;
> it remains detached from the algorithm.
>
> I personally prefer this over having to define a rate and a number of
> windows while having to understand that the number of windows implicitly
> defines the allowed burst. I think that it is clearer and easier to grasp.
> Don't you?
>
> Best,
> David
>
> On Fri, May 29, 2020 at 6:38 PM Jun Rao  wrote:
>
> > Hi, David, Anna,
> >
> > Thanks for the response. Sorry for the late reply.
> >
> > 10. Regarding exposing rate or usage as quota. Your argument is that
> > usage is not very accurate anyway and is harder to implement. So, let's
> > just be a bit loose and expose rate. I am sort of neutral on that. (1) It
> > seems to me that overall usage will be relatively more accurate than rate.
> > All the issues that Anna brought up seem to exist for rate too. Rate has
> > the additional problem that the cost of each request may not be uniform.
> > (2) In terms of implementation, a usage-based approach requires tracking
> > the user info through the life cycle of an operation. However, as you
> > mentioned, things like topic creation can generate additional
> > LeaderAndIsr/UpdateMetadata requests. Longer term, we probably want to
> > associate those costs to the user who initiated the operation. If we do
> > that, we sort of need to track the user for the full life cycle of the
> > processing of an operation anyway. (3) If you prefer rate strongly, I
> > don't have strong objections. However, I do feel that the new quota name
> > should be able to cover all controller related cost longer term. This KIP
> > currently only covers topic creation/deletion. It would not be ideal if in
> > the future, we have to add yet another type of quota for some other
> > controller related operations. The quota name in the KIP has partition
> > mutation. In the future, if we 

[jira] [Resolved] (KAFKA-10096) Remove unnecessary String.format call in VersionConditionalTest.java

2020-06-03 Thread Can Cecen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Can Cecen resolved KAFKA-10096.
---
Resolution: Won't Fix

Realized that String.format is used to create line separators. 

> Remove unnecessary String.format call in VersionConditionalTest.java
> 
>
> Key: KAFKA-10096
> URL: https://issues.apache.org/jira/browse/KAFKA-10096
> Project: Kafka
>  Issue Type: Improvement
>  Components: unit tests
>Reporter: Can Cecen
>Assignee: Can Cecen
>Priority: Trivial
>  Labels: newbie
>
> n.b. This is a newbie ticket designed to be an introduction to contributing 
> for the assignee.
> Since there is no format specified to String.format, we can remove it and 
> just append the line.
>  {code}
> static void assertEquals(CodeBuffer buffer, String... lines) throws Exception 
> {
> StringWriter stringWriter = new StringWriter();
> buffer.write(stringWriter);
> StringBuilder expectedStringBuilder = new StringBuilder();
> for (String line : lines) {
> expectedStringBuilder.append(String.format(line));
> }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-10096) Remove unnecessary String.format call in VersionConditionalTest.java

2020-06-03 Thread Can Cecen (Jira)
Can Cecen created KAFKA-10096:
-

 Summary: Remove unnecessary String.format call in 
VersionConditionalTest.java
 Key: KAFKA-10096
 URL: https://issues.apache.org/jira/browse/KAFKA-10096
 Project: Kafka
  Issue Type: Improvement
  Components: unit tests
Reporter: Can Cecen


n.b. This is a newbie ticket designed to be an introduction to contributing for 
the assignee.

Since there is no format specified to String.format, we can remove it and just 
append the line.
{code}
static void assertEquals(CodeBuffer buffer, String... lines) throws Exception {
    StringWriter stringWriter = new StringWriter();
    buffer.write(stringWriter);
    StringBuilder expectedStringBuilder = new StringBuilder();
    for (String line : lines) {
        expectedStringBuilder.append(String.format(line));
    }
}
{code}
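As the resolution of this ticket notes, `String.format` here is not redundant: even with no extra arguments it still expands format specifiers such as `%n` into the platform line separator. A small illustration:

```java
public class FormatDemo {
    public static void main(String[] args) {
        // String.format with no extra arguments still interprets format
        // specifiers; %n becomes the platform line separator.
        String formatted = String.format("line one%n");
        assert formatted.equals("line one" + System.lineSeparator());
        // A plain append would keep the literal "%n" instead:
        assert "line one%n".endsWith("%n");
    }
}
```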



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-10094) In MirrorSourceConnector replace two-step assignment with single call

2020-06-03 Thread Jakob Homan (Jira)
Jakob Homan created KAFKA-10094:
---

 Summary: In MirrorSourceConnector replace two-step assignment with 
single call
 Key: KAFKA-10094
 URL: https://issues.apache.org/jira/browse/KAFKA-10094
 Project: Kafka
  Issue Type: Improvement
  Components: mirrormaker
Reporter: Jakob Homan
Assignee: Mandar Tillu


n.b. This is a newbie ticket designed to be an introduction to contributing for 
the assignee.

 

In MirrorSourceConnector::refreshTopicPartitions we have places where we create 
a new HashSet and then addAll to the set.  We can replace both with a direct 
call to the copy constructor.

 
{code:java}
void refreshTopicPartitions()
        throws InterruptedException, ExecutionException {
    knownSourceTopicPartitions = findSourceTopicPartitions();
    knownTargetTopicPartitions = findTargetTopicPartitions();
    List<TopicPartition> upstreamTargetTopicPartitions = knownTargetTopicPartitions.stream()
        .map(x -> new TopicPartition(replicationPolicy.upstreamTopic(x.topic()), x.partition()))
        .collect(Collectors.toList());

    Set<TopicPartition> newTopicPartitions = new HashSet<>();
    newTopicPartitions.addAll(knownSourceTopicPartitions);
    newTopicPartitions.removeAll(upstreamTargetTopicPartitions);
    Set<TopicPartition> deadTopicPartitions = new HashSet<>();
    deadTopicPartitions.addAll(upstreamTargetTopicPartitions);{code}
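The suggested simplification, sketched with simplified types (an illustration of the idea, not the merged patch):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CopyCtorDemo {
    public static void main(String[] args) {
        List<String> known = List.of("t1-0", "t1-1");
        List<String> upstream = List.of("t1-1");

        // Before: two steps, create an empty set and then addAll.
        Set<String> twoStep = new HashSet<>();
        twoStep.addAll(known);
        twoStep.removeAll(upstream);

        // After: the copy constructor creates and fills the set in one call.
        Set<String> oneStep = new HashSet<>(known);
        oneStep.removeAll(upstream);

        assert oneStep.equals(twoStep);
        assert oneStep.equals(Set.of("t1-0"));
    }
}
```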
 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-10093) Replace iteration with call to addAll in Utils

2020-06-03 Thread Jakob Homan (Jira)
Jakob Homan created KAFKA-10093:
---

 Summary: Replace iteration with call to addAll in Utils
 Key: KAFKA-10093
 URL: https://issues.apache.org/jira/browse/KAFKA-10093
 Project: Kafka
  Issue Type: Improvement
  Components: clients
Reporter: Jakob Homan
Assignee: Can Cecen


n.b. This is a newbie ticket designed to be an introduction to contributing for 
the assignee.

In clients/src/main/java/org/apache/kafka/common/utils/Utils.java we're 
currently using iteration to add all the elements from one collection into 
another. We can replace this with a call to Arrays.asList() and 
Collections.addAll().

{code}
/**
 * Creates a set
 * @param elems the elements
 * @param <T> the type of element
 * @return Set
 */
@SafeVarargs
public static <T> Set<T> mkSet(T... elems) {
    Set<T> result = new HashSet<>((int) (elems.length / 0.75) + 1);
    for (T elem : elems)
        result.add(elem);
    return result;
}

/**
 * Creates a sorted set
 * @param elems the elements
 * @param <T> the type of element, must be comparable
 * @return SortedSet
 */
@SafeVarargs
public static <T extends Comparable<T>> SortedSet<T> mkSortedSet(T... elems) {
    SortedSet<T> result = new TreeSet<>();
    for (T elem : elems)
        result.add(elem);
    return result;
}{code}
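The suggested replacement might look roughly like this (a sketch; the actual patch may differ):

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class MkSetDemo {
    // Varargs-to-set without manual iteration: Collections.addAll copies
    // the array contents into the set in one call.
    @SafeVarargs
    public static <T> Set<T> mkSet(T... elems) {
        Set<T> result = new HashSet<>((int) (elems.length / 0.75) + 1);
        Collections.addAll(result, elems);
        return result;
    }

    public static void main(String[] args) {
        assert mkSet(1, 2, 2, 3).equals(Set.of(1, 2, 3));
    }
}
```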

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-10092) Remove unnecessary enum modifier in NioEchoServer

2020-06-03 Thread Jakob Homan (Jira)
Jakob Homan created KAFKA-10092:
---

 Summary: Remove unnecessary enum modifier in NioEchoServer
 Key: KAFKA-10092
 URL: https://issues.apache.org/jira/browse/KAFKA-10092
 Project: Kafka
  Issue Type: Improvement
Reporter: Jakob Homan
Assignee: Andrey Falko


n.b. This is a newbie ticket designed to be an introduction to contributing for 
the assignee.  

In NioEchoServer the enum has its constructor declared as private, which is 
[redundant|https://docs.oracle.com/javase/tutorial/java/javaOO/enum.html].  We 
can remove this.  

{code}
public class NioEchoServer extends Thread {
    public enum MetricType {
        TOTAL, RATE, AVG, MAX;

        private final String metricNameSuffix;

        private MetricType() {
            metricNameSuffix = "-" + name().toLowerCase(Locale.ROOT);
        }
    }
}
{code}
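The cleanup simply drops the modifier, since enum constructors are implicitly private. A standalone sketch (the `suffix()` accessor is my addition for demonstration):

```java
import java.util.Locale;

public class EnumDemo {
    public enum MetricType {
        TOTAL, RATE, AVG, MAX;

        private final String metricNameSuffix;

        // No access modifier needed: enum constructors are implicitly private,
        // so writing "private" here is redundant.
        MetricType() {
            metricNameSuffix = "-" + name().toLowerCase(Locale.ROOT);
        }

        public String suffix() {
            return metricNameSuffix;
        }
    }

    public static void main(String[] args) {
        assert MetricType.RATE.suffix().equals("-rate");
    }
}
```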



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-10095) In LogCleanerManagerTest replace get().nonEmpty call with contains

2020-06-03 Thread Jakob Homan (Jira)
Jakob Homan created KAFKA-10095:
---

 Summary: In LogCleanerManagerTest replace get().nonEmpty call with 
contains
 Key: KAFKA-10095
 URL: https://issues.apache.org/jira/browse/KAFKA-10095
 Project: Kafka
  Issue Type: Improvement
  Components: log cleaner, unit tests
Reporter: Jakob Homan
Assignee: Sarah Gonsalves


n.b. This is a newbie ticket designed to be an introduction to contributing for 
the assignee.

In kafka.log.LogCleanerManagerTest we have two calls to 
.get(something).nonEmpty, which is equivalent to .contains(something).  We 
should simplify these calls.

 {code}cleanerManager.doneCleaning(topicPartition, log.dir, 1)
assertTrue(cleanerManager.cleaningState(topicPartition).isEmpty)

assertTrue(cleanerManager.allCleanerCheckpoints.get(topicPartition).nonEmpty)

cleanerManager.setCleaningState(topicPartition, LogCleaningAborted)
cleanerManager.doneCleaning(topicPartition, log.dir, 1)
assertEquals(LogCleaningPaused(1), 
cleanerManager.cleaningState(topicPartition).get)

assertTrue(cleanerManager.allCleanerCheckpoints.get(topicPartition).nonEmpty){code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Build failed in Jenkins: kafka-2.6-jdk8 #15

2020-06-03 Thread Apache Jenkins Server
See 


Changes:

[jason] KAFKA-10080; Fix race condition on txn completion which can cause

[rajinisivaram] KAFKA-10089 The stale ssl engine factory is not closed after 
reconfigure


--
[...truncated 3.13 MB...]
org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecord STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecord PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateNullKeyConsumerRecord STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateNullKeyConsumerRecord PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordWithOtherTopicName STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordWithOtherTopicName PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > shouldAdvanceTime 
STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > shouldAdvanceTime 
PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithKeyValuePairs STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithKeyValuePairs PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithKeyValuePairsAndCustomTimestamps
 STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithKeyValuePairsAndCustomTimestamps
 PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithKeyValuePairs 
STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithKeyValuePairs PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordWithOtherTopicNameAndTimestamp STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordWithOtherTopicNameAndTimestamp PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithKeyValuePairsAndCustomTimestamps
 STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithKeyValuePairsAndCustomTimestamps
 PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicName STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicName PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordsFromKeyValuePairsWithCustomTimestampAndIncrementsAndNotAdvanceTime
 STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordsFromKeyValuePairsWithCustomTimestampAndIncrementsAndNotAdvanceTime
 PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateNullKeyConsumerRecordWithTimestampWithTimestamp STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateNullKeyConsumerRecordWithTimestampWithTimestamp PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithNullKeyAndDefaultTimestamp
 STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithNullKeyAndDefaultTimestamp
 PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordsFromKeyValuePairsWithTimestampAndIncrements STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordsFromKeyValuePairsWithTimestampAndIncrements PASSED

org.apache.kafka.streams.test.TestRecordTest > testConsumerRecord STARTED

org.apache.kafka.streams.test.TestRecordTest > testConsumerRecord PASSED

org.apache.kafka.streams.test.TestRecordTest > testToString STARTED

org.apache.kafka.streams.test.TestRecordTest > testToString PASSED

org.apache.kafka.streams.test.TestRecordTest > testInvalidRecords STARTED

org.apache.kafka.streams.test.TestRecordTest > testInvalidRecords PASSED

org.apache.kafka.streams.test.TestRecordTest > testPartialConstructorEquals 
STARTED

org.apache.kafka.streams.test.TestRecordTest > testPartialConstructorEquals 
PASSED

org.apache.kafka.streams.test.TestRecordTest > testMultiFieldMatcher STARTED

org.apache.kafka.streams.test.TestRecordTest > testMultiFieldMatcher PASSED

org.apache.kafka.streams.test.TestRecordTest > testFields STARTED

org.apache.kafka.streams.test.TestRecordTest > testFields PASSED

org.apache.kafka.streams.test.TestRecordTest > testProducerRecord STARTED

org.apache.kafka.streams.test.TestRecordTest > testProducerRecord PASSED


[jira] [Created] (KAFKA-10091) Improve task idling

2020-06-03 Thread John Roesler (Jira)
John Roesler created KAFKA-10091:


 Summary: Improve task idling
 Key: KAFKA-10091
 URL: https://issues.apache.org/jira/browse/KAFKA-10091
 Project: Kafka
  Issue Type: Task
  Components: streams
Reporter: John Roesler


When Streams is processing a task with multiple inputs, each time it is ready 
to process a record, it has to choose which input to process next. It always 
takes from the input for which the next record has the least timestamp. The 
result of this is that Streams processes data in timestamp order. However, if 
the buffer for one of the inputs is empty, Streams doesn't know what timestamp 
the next record for that input will be.

Streams introduced a configuration "max.task.idle.ms" in KIP-353 to address 
this issue.

[https://cwiki.apache.org/confluence/display/KAFKA/KIP-353%3A+Improve+Kafka+Streams+Timestamp+Synchronization]

The config allows Streams to wait some amount of time for data to arrive on the 
empty input, so that it can make a timestamp-ordered decision about which input 
to pull from next.

However, this config can be hard to use reliably and efficiently, since what 
we're really waiting for is the next poll that _would_ return data from the 
empty input's partition, and this guarantee is a function of the poll interval, 
the max poll interval, and the internal logic that governs when Streams will 
poll again.

Ideally, we would be able to guarantee at a minimum that _any_ amount of 
idling results in polling the empty input's partition, so that we fetch its 
data if there is any to fetch.
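The selection logic described above can be sketched as follows. This is an illustrative model, not the actual Streams internals; all names here are made up for clarity:

```java
import java.util.Deque;
import java.util.List;

// Illustrative sketch of timestamp-ordered input selection with idling.
// Among non-empty inputs, pick the one whose head record has the smallest
// timestamp. If any input buffer is empty, its next timestamp is unknown,
// so idle (up to max.task.idle.ms) before risking an out-of-order choice.
public class InputChooser {
    /** Returns the index of the input to process next, or -1 to keep idling. */
    public static int choose(List<Deque<Long>> inputs, long idledMs, long maxTaskIdleMs) {
        boolean anyEmpty = false;
        int best = -1;
        long bestTs = Long.MAX_VALUE;
        for (int i = 0; i < inputs.size(); i++) {
            Deque<Long> buffer = inputs.get(i);
            if (buffer.isEmpty()) {
                anyEmpty = true;                 // unknown next timestamp
            } else if (buffer.peekFirst() < bestTs) {
                bestTs = buffer.peekFirst();
                best = i;
            }
        }
        // Wait out the idle budget before proceeding without the empty input.
        if (anyEmpty && idledMs < maxTaskIdleMs) {
            return -1;
        }
        return best;
    }
}
```

The problem the ticket describes is that spending the idle budget only helps if a poll that could return data for the empty partition actually happens within it.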



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Please I Want to Subscribe

2020-06-03 Thread Ricardo Ferreira

Please I Want to Subscribe



Re: [VOTE] KIP-545 support automated consumer offset sync across clusters in MM 2.0

2020-06-03 Thread Ryanne Dolan
Looks like we've got the required votes! Congrats Ning!

Please note this KIP was proposed and written by Ning Zhang, not myself,
but we've collaborated on the PR, testing, and community outreach.

Now let's work to get the PR merged!

Ryanne

On Mon, May 25, 2020 at 1:47 PM Bill Bejeck  wrote:

> Thanks for the clear well written KIP.
>
> This seems like a useful feature to me.
>
> +1(binding)
>
> -Bill
>
> On Sun, May 24, 2020 at 12:50 PM John Roesler  wrote:
>
> > Thanks for the KIP, Ryanne!
> >
> > It looks like a good feature to me, and the KIP looks well motivated and
> > well designed.
> >
> > I’m +1 (binding)
> >
> > -John
> >
> >
> > On Thu, May 21, 2020, at 10:49, Harsha Ch wrote:
> > > +1 (binding). Good addition to MM 2.
> > >
> > > -Harsha
> > >
> > > On Thu, May 21, 2020 at 8:36 AM, Manikumar <manikumar.re...@gmail.com>
> > > wrote:
> > > >
> > > > +1 (binding).
> > > >
> > > > Thanks for the KIP.
> > > >
> > > > On Thu, May 21, 2020 at 9:49 AM Maulin Vasavada
> > > > <maulin.vasav...@gmail.com> wrote:
> > > >>
> > > >> Thank you for the KIP. I sincerely hope we get enough votes on this
> > > >> KIP. I was thinking of similar changes while working on DR
> > > >> capabilities; offsets are the Achilles' heel, and this KIP addresses
> > > >> it.
> > > >>
> > > >> On Mon, May 18, 2020 at 6:10 PM Maulin Vasavada
> > > >> <maulin.vasav...@gmail.com> wrote:
> > > >>>
> > > >>> +1 (non-binding)
> > > >>>
> > > >>> On Mon, May 18, 2020 at 9:41 AM Ryanne Dolan
> > > >>> <ryannedo...@gmail.com> wrote:
> > > >>>>
> > > >>>> Bump. Looks like we've got 6 non-binding votes and 1 binding.
> > > >>>>
> > > >>>> On Thu, Feb 20, 2020 at 11:25 AM Ning Zhang
> > > >>>> <ning2008w...@gmail.com> wrote:
> > > >>>>>
> > > >>>>> Hello committers,
> > > >>>>>
> > > >>>>> I am the author of the KIP-545 and if we still miss votes from
> > > >>>>> the committers, please review the KIP and vote for it, so that
> > > >>>>> the corresponding PR will be reviewed soon.
> > > >>>>>
> > > >>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-545%3A+support+automated+consumer+offset+sync+across+clusters+in+MM+2.0
> > > >>>>>
> > > >>>>> Thank you
> > > >>>>>
> > > >>>>> On 2020/02/06 17:05:41, Edoardo Comar <edoardli...@gmail.com>
> > > >>>>> wrote:
> > > >>>>>>
> > > >>>>>> +1 (non-binding)
> > > >>>>>> thanks for the KIP !
> > > >>>>>>
> > > >>>>>> On Tue, 14 Jan 2020 at 13:57, Navinder Brar
> > > >>>>>> <navinder_b...@yahoo.com.invalid> wrote:
> > > >>>>>>>
> > > >>>>>>> +1 (non-binding)
> > > >>>>>>> Navinder
> > > >>>>>>>
> > > >>>>>>> On Tuesday, 14 January, 2020, 07:24:02 pm IST, Ryanne Dolan
> > > >>>>>>> <ryannedo...@gmail.com> wrote:
> > > >>>>>>>
> > > >>>>>>> Bump. We've got 4 non-binding and one binding vote.
> > > >>>>>>>
> > > >>>>>>> Ryanne
> > > >>>>>>>
> > > >>>>>>> On Fri, Dec 13, 2019, 1:44 AM Tom Bentley
> > > >>>>>>> <tbent...@redhat.com> wrote:
> > > >>>>>>>>
> > > >>>>>>>> +1 (non-binding)
> > > >>>>>>>>
> > > >>>>>>>> On Thu, Dec 12, 2019 at 6:33 PM Andrew Schofield
> > > >>>>>>>> <andrew_schofi...@live.com> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>> +1 (non-binding)
> > > >>>>>>>>>
> > > >>>>>>>>> On 12/12/2019, 14:20, "Mickael Maison" <
> 

Re: [VOTE] KIP-601: Configurable socket connection timeout in NetworkClient

2020-06-03 Thread Colin McCabe
Hi Cheng,

Thanks for working on this.  Looks good.

How about "socket.connection.setup.timeout.ms" and 
"socket.connection.setup.timeout.max.ms" (not connections with an S)?

+1 (binding)

best,
Colin


On Wed, Jun 3, 2020, at 06:24, Rajini Sivaram wrote:
> Hi Cheng,
> 
> Thanks for the updates, looks good.
> 
> +1 (binding)
> 
> Regards,
> 
> Rajini
> 
> On Wed, Jun 3, 2020 at 8:53 AM Cheng Tan  wrote:
> 
> > Dear Rajini,
> >
> > Thanks for the feedback.
> >
> > 1)
> > Because "request.timeout.ms" only affects in-flight requests, after the
> > API NetworkClient.ready() is invoked, the connection won't get closed after
> > "request.timeout.ms” hits. Before
> > a) the SocketChannel is connected
> > b) ssl handshake finished
> > c) authentication has finished (sasl)
> > clients cannot invoke NetworkClient.send() to send any request, which
> > means no in-flight request targeting to the connection will be added.
> >
> >
> > 2)
> > I think a default value of 127 seconds makes sense, which matches the timeout
> > indirectly specified by the default value of "tcp.syn.retries". I've added
> > this to the KIP proposal.
> >
> >
> > 3)
> > Every time the timeout hits, the timeout value of the next connection try
> > will increase.
> >
> > The timeout will hit iff a connection stays in the `connecting` state
> > longer than the timeout value, as indicated by
> > ClusterConnectionStates.NodeConnectionState. The connection state of a node
> > will change iff `SelectionKey.OP_CONNECT` is detected by
> > `nioSelector.Select()`. The connection state may transit from `connecting`
> > to
> >
> > a) `disconnected` when SocketChannel.finishConnect() throws
> > IOException.
> > b) `connected` when SocketChannel.finishConnect() return TRUE.
> >
> > In other words, the timeout will hit and increase iff the interested
> > SelectionKey.OP_CONNECT doesn't trigger before the timeout arrives, which
> > means, for example, network congestion, failure of the ARP request, packet
> > filtering, routing error, or a silent discard may happen. (I didn’t read
> > the Java NIO source code. Please correct me the case when OP_CONNECT won’t
> > get triggered if I’m wrong)
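A rough sketch of the exponential timeout growth discussed here (illustrative, not the actual NetworkClient code; the proposal also adds random jitter, omitted below, which is why individual attempts land between 8s and 12s). The 127s figure comes from the kernel's SYN retransmission schedule: with tcp_syn_retries = 6, retransmits fire after 1 + 2 + 4 + 8 + 16 + 32 + 64 = 127 seconds:

```java
// Illustrative sketch of the exponential connection-setup timeout. Each
// consecutive failure doubles the timeout for the next attempt, capped at
// the max. Config names follow the KIP; the arithmetic is the sketch's own.
public class ConnectionSetupTimeout {
    static final long INITIAL_MS = 10_000;  // socket.connection.setup.timeout.ms
    static final long MAX_MS = 127_000;     // socket.connection.setup.timeout.max.ms

    /** Timeout to use for the attempt after `failures` consecutive timeouts. */
    public static long timeoutMs(int failures) {
        long t = INITIAL_MS * (1L << Math.min(failures, 30)); // clamp the shift
        return Math.min(t, MAX_MS);
    }
}
```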
> >
> >
> > 4)
> >
> > A) Connection timeout dominates both request timeout and API timeout
> >
> > When connection timeout hits, the connection will be closed. The client
> > will be notified either by the responses constructed by NetworkClient or
> > the callbacks attached to the request. As a result, the request failure
> > will be handled before either connection timeout or API timeout arrives.
> >
> >
> > B) Neither request timeout or API timeout dominates connection timeout
> >
> > i) Request timeout: Because request timeout only affects in-flight
> > requests, after the API NetworkClient.ready() is invoked, the connection
> > won't get closed after "request.timeout.ms" hits. Before
> > 1. the SocketChannel is connected
> > 2. the SSL handshake has finished
> > 3. authentication has finished (SASL)
> > clients won't be able to invoke NetworkClient.send() to send any
> > request, which means no in-flight request targeting the connection will
> > be added.
> >
> > ii) API timeout: In AdminClient, the API timeout acts by putting a smaller
> > and smaller timeout value on the chain of requests in the same API. After
> > the API timeout hits, the retry logic won't close any connection. In the
> > consumer, the API timeout acts as a whole by putting a limit on the code
> > block's execution time. The retry logic won't close any connection either.
> >
> >
> > Conclusion:
> >
> > Thanks again for the detailed feedback; I always enjoy it. I've added
> > the above discussion to the KIP proposal. Please let me know
> > what you think.
> >
> >
> > Best, - Cheng Tan
> >
> >
> > > On Jun 2, 2020, at 3:01 AM, Rajini Sivaram 
> > wrote:
> > >
> > > Hi Cheng,
> > >
> > > Not sure if the discussion should move back to the DISCUSS thread. I
> > have a
> > > few questions:
> > >
> > > 1) The KIP motivation says that in some cases `request.timeout.ms`
> > > doesn't time out connections properly and as a result it takes 127s to
> > > detect a connection failure. This sounds like a bug rather than a
> > > limitation of the current approach. Can you explain the scenarios where
> > > this occurs?
> > >
> > > 2) I think the current proposal is to use a non-exponential 10s
> > > connection timeout as default with the option to use an exponential
> > > timeout. So connection timeouts for every connection attempt will be
> > > between 8s and 12s by default. Is that correct? Should we use a default
> > > max timeout to enable exponential timeout by default, since 8s seems
> > > rather small?
> > >
> > > 3) What is the scope of `failures` used to determine connection timeout
> > > with exponential timeouts? Will we always use 10s followed by 20s every
> > > time a connection is attempted?
> > >
> > > 4) It will be good if we can include two flows 

Build failed in Jenkins: kafka-2.6-jdk8 #14

2020-06-03 Thread Apache Jenkins Server
See 


Changes:

[vvcephei] KAFKA-10084: Fix EosTestDriver end offset (#8785)


--
[...truncated 3.13 MB...]

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateNullKeyConsumerRecordWithOtherTopicNameAndTimestampWithTimetamp 
STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateNullKeyConsumerRecordWithOtherTopicNameAndTimestampWithTimetamp 
PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordWithTimestamp STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordWithTimestamp PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullHeaders STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullHeaders PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithDefaultTimestamp STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithDefaultTimestamp PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicName STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicName PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithNullKey STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithNullKey PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecord STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecord PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateNullKeyConsumerRecord STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateNullKeyConsumerRecord PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordWithOtherTopicName STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordWithOtherTopicName PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > shouldAdvanceTime 
STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > shouldAdvanceTime 
PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithKeyValuePairs STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithKeyValuePairs PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithKeyValuePairsAndCustomTimestamps
 STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithKeyValuePairsAndCustomTimestamps
 PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithKeyValuePairs 
STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithKeyValuePairs PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordWithOtherTopicNameAndTimestamp STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordWithOtherTopicNameAndTimestamp PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithKeyValuePairsAndCustomTimestamps
 STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithKeyValuePairsAndCustomTimestamps
 PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicName STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicName PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordsFromKeyValuePairsWithCustomTimestampAndIncrementsAndNotAdvanceTime
 STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordsFromKeyValuePairsWithCustomTimestampAndIncrementsAndNotAdvanceTime
 PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateNullKeyConsumerRecordWithTimestampWithTimestamp STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateNullKeyConsumerRecordWithTimestampWithTimestamp PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithNullKeyAndDefaultTimestamp
 STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 

[jira] [Resolved] (KAFKA-10089) The stale ssl engine factory is not closed after reconfigure

2020-06-03 Thread Rajini Sivaram (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajini Sivaram resolved KAFKA-10089.

Fix Version/s: 2.6.0
 Reviewer: Rajini Sivaram
   Resolution: Fixed

> The stale ssl engine factory is not closed after reconfigure
> 
>
> Key: KAFKA-10089
> URL: https://issues.apache.org/jira/browse/KAFKA-10089
> Project: Kafka
>  Issue Type: Bug
>Reporter: Chia-Ping Tsai
>Assignee: Chia-Ping Tsai
>Priority: Minor
> Fix For: 2.6.0
>
>
> {code}
> @Override
> public void reconfigure(Map<String, ?> newConfigs) throws KafkaException {
>     SslEngineFactory newSslEngineFactory = createNewSslEngineFactory(newConfigs);
>     if (newSslEngineFactory != this.sslEngineFactory) {
>         this.sslEngineFactory = newSslEngineFactory; // we should close the older one
>         log.info("Created new {} SSL engine builder with keystore {} truststore {}",
>             mode, newSslEngineFactory.keystore(), newSslEngineFactory.truststore());
>     }
> }
> {code}
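The fix is simply to close the factory being replaced. A generic sketch of the swap-and-close pattern (illustrative types, not Kafka's actual SslFactory code):

```java
// Generic sketch of the fix: when reconfiguration produces a new factory,
// close the one it replaces so its resources are released. Types here are
// illustrative, not Kafka's.
public class ReconfigurableHolder {
    public interface EngineFactory extends AutoCloseable {
        @Override
        void close(); // narrowed to avoid checked exceptions in the sketch
    }

    private EngineFactory current;

    public ReconfigurableHolder(EngineFactory initial) {
        this.current = initial;
    }

    public void reconfigure(EngineFactory replacement) {
        if (replacement != current) {
            EngineFactory old = current;
            current = replacement;
            old.close(); // the step missing in the buggy code above
        }
    }
}
```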



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-9777) Purgatory locking bug can lead to hanging transaction

2020-06-03 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-9777.

Fix Version/s: 2.6.0
   Resolution: Fixed

> Purgatory locking bug can lead to hanging transaction
> -
>
> Key: KAFKA-9777
> URL: https://issues.apache.org/jira/browse/KAFKA-9777
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 1.1.1, 2.0.1, 2.1.1, 2.2.2, 2.3.1, 2.4.1
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Critical
> Fix For: 2.6.0
>
>
> Once a transaction reaches the `PrepareCommit` or `PrepareAbort` state, the 
> transaction coordinator must send markers to all partitions included in the 
> transaction. After all markers have been sent, then the transaction 
> transitions to the corresponding completed state. Until this transition 
> occurs, no additional progress can be made by the producer.
> The transaction coordinator uses a purgatory to track completion of the 
> markers that need to be sent. Once all markers have been written, then the 
> `DelayedTxnMarker` task becomes completable. We depend on its completion in 
> order to transition to the completed state.
> Related to KAFKA-8334, there is a bug in the locking protocol which is used 
> to check completion of the `DelayedTxnMarker` task. The purgatory attempts to 
> provide a "happens before" contract for task completion with 
> `checkAndComplete`. Basically if a task is completed before calling 
> `checkAndComplete`, then it should be given an opportunity to complete as 
> long as there is sufficient time remaining before expiration. 
> The bug in the locking protocol is that it expects that the operation lock is 
> exclusive to the operation. See here: 
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/DelayedOperation.scala#L114.
>  The logic assumes that if the lock cannot be acquired, then the other holder 
> of the lock must be attempting completion of the same delayed operation. If 
> that is not the case, then the "happens before" contract is broken and a task 
> may not get completed until expiration even if it has been satisfied.
> In the case of `DelayedTxnMarker`, the lock in use is the read side of a 
> read-write lock which is used for partition loading: 
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/coordinator/transaction/TransactionMarkerChannelManager.scala#L264.
>  In fact, if the lock cannot be acquired, it means that it is being held in 
> order to complete some loading operation, in which case it will definitely 
> not attempt completion of the delayed operation. If this happens to occur on 
> the last call to `checkAndComplete` after all markers have been written, then 
> the transition to the completed state will never occur.
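The flawed check can be sketched as follows (heavily simplified from DelayedOperation; illustrative, not the actual code). The `tryLock` shortcut is only safe under the assumption that whoever holds the lock is also attempting completion of this same operation; if the lock is shared with unrelated work, the final completion check can be silently dropped:

```java
import java.util.concurrent.locks.Lock;

// Sketch of the flawed completion protocol described above. If tryLock
// fails, the code assumes the current lock holder will complete this
// operation. That assumption breaks when the lock is shared with unrelated
// work (e.g. the read side of a partition-loading read-write lock).
public class DelayedOpSketch {
    private final Lock lock;
    private boolean satisfied;
    private boolean completed;

    public DelayedOpSketch(Lock lock) { this.lock = lock; }

    public void satisfy() { satisfied = true; }

    /** Returns true if the operation was completed by this call. */
    public boolean maybeTryComplete() {
        if (!lock.tryLock()) {
            // Broken "happens before": the holder may never complete us.
            return false;
        }
        try {
            if (satisfied && !completed) {
                completed = true;
                return true;
            }
            return false;
        } finally {
            lock.unlock();
        }
    }

    public boolean isCompleted() { return completed; }
}
```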



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-10080) IllegalStateException after duplicate CompleteCommit append to transaction log

2020-06-03 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-10080.
-
Fix Version/s: 2.6.0
   Resolution: Fixed

> IllegalStateException after duplicate CompleteCommit append to transaction log
> --
>
> Key: KAFKA-10080
> URL: https://issues.apache.org/jira/browse/KAFKA-10080
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
> Fix For: 2.6.0
>
>
> We noticed this exception in the logs:
> {code}
> java.lang.IllegalStateException: TransactionalId foo completing transaction 
> state transition while it does not have a pending state   
>  
> at 
> kafka.coordinator.transaction.TransactionMetadata.$anonfun$completeTransitionTo$1(TransactionMetadata.scala:357)
> at 
> kafka.coordinator.transaction.TransactionMetadata.completeTransitionTo(TransactionMetadata.scala:353)
> at 
> kafka.coordinator.transaction.TransactionStateManager.$anonfun$appendTransactionToLog$3(TransactionStateManager.scala:595)
>   
>  
> at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
> at 
> kafka.coordinator.transaction.TransactionMetadata.inLock(TransactionMetadata.scala:188)
> at 
> kafka.coordinator.transaction.TransactionStateManager.$anonfun$appendTransactionToLog$15$adapted(TransactionStateManager.scala:587)
>   
> 
> at kafka.server.DelayedProduce.onComplete(DelayedProduce.scala:126)
> at 
> kafka.server.DelayedOperation.forceComplete(DelayedOperation.scala:70)
> at kafka.server.DelayedProduce.tryComplete(DelayedProduce.scala:107)
> at 
> kafka.server.DelayedOperation.maybeTryComplete(DelayedOperation.scala:121)
> at 
> kafka.server.DelayedOperationPurgatory$Watchers.tryCompleteWatched(DelayedOperation.scala:378)
> at 
> kafka.server.DelayedOperationPurgatory.checkAndComplete(DelayedOperation.scala:280)
> at 
> kafka.cluster.DelayedOperations.checkAndCompleteAll(Partition.scala:122)
> at 
> kafka.cluster.Partition.tryCompleteDelayedRequests(Partition.scala:1023)
> at 
> kafka.cluster.Partition.updateFollowerFetchState(Partition.scala:740)
> {code}
> After inspection, we found that there were two CompleteCommit entries in the 
> transaction state log which explains the failed transition. Indeed the logic 
> for writing the CompleteCommit message does seem prone to race conditions.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations

2020-06-03 Thread Tom Bentley
Hi David,

One quick question about the implementation (I don't think it's spelled out
in the KIP): Presumably if you make, for example, a create topics request
with validate only it will check for quota violation, but not count towards
quota violation, right?

Many thanks,

Tom

On Wed, Jun 3, 2020 at 3:51 PM Rajini Sivaram 
wrote:

> Hi David,
>
> 2) sorry, that was my mistake.
>
> Regards,
>
> Rajini
>
>
> On Wed, Jun 3, 2020 at 3:08 PM David Jacot  wrote:
>
> > Hi Rajini,
> >
> > Thanks for your prompt response.
> > 1) Good catch, fixed.
> > 2) The retry mechanism will be in the client so a new field is not
> > required in the requests.
> >
> > Regards,
> > David
> >
> > On Wed, Jun 3, 2020 at 2:43 PM Rajini Sivaram 
> > wrote:
> >
> > > Hi David,
> > >
> > > Thanks for the updates, looks good. Just a couple of minor comments:
> > > 1) There is a typo in "*The channel will be mutated as well when
> > > `throttle_time_ms > 0`." * Should be *muted*?
> > > 2) Since the three requests will need a new field for `
> > > *retryQuotaViolatedException*`, we should perhaps add that change to
> the
> > > KIP.
> > >
> > > Regards,
> > >
> > > Rajini
> > >
> > > On Wed, Jun 3, 2020 at 1:02 PM David Jacot 
> wrote:
> > >
> > > > Hi all,
> > > >
> > > > I have updated the KIP based on our recent discussions. I have mainly
> > > > changed the following points:
> > > > * I have renamed the quota as suggested by Jun.
> > > > * I have changed the metrics to be "token bucket" agnostic. The idea
> > > >   is to report the burst and the rate per principal/clientid.
> > > > * I have removed the `retry.quota.violation.exception` configuration
> > > >   and replaced it with options in the respective methods' options.
> > > >
> > > > Now, the public interfaces are not tied to the algorithm that we use
> > > > internally to throttle the requests but keep the notion of burst. I
> > > > hope that this will help to alleviate the token bucket vs. rate
> > > > discussions.
> > > >
> > > > Please, have a look and let me know your thoughts.
> > > >
> > > > Bests,
> > > > David
> > > >
> > > >
> > > > On Wed, Jun 3, 2020 at 10:17 AM David Jacot 
> > wrote:
> > > >
> > > > > Hi Rajini,
> > > > >
> > > > > Thanks for your feedback. Please find my answers below:
> > > > >
> > > > > 1) Our main goal is to protect the controller from the extreme
> > > > > users (DDoS). We want to protect it from large requests or
> > > > > repetitive requests coming from a single user. That user could be
> > > > > used by multiple apps as you pointed out, which makes it even
> > > > > worse. For instance, a buggy application could continuously create
> > > > > and delete topics in a tight loop.
> > > > >
> > > > > The idea of the burst is to still allow creating or deleting topics
> > > > > in batch, because this is how applications tend to do it. However,
> > > > > we want to keep the batch size under control with the burst. The
> > > > > burst does not allow requests of arbitrary size: topics are
> > > > > accepted until the burst is exceeded, and all the others are
> > > > > rejected. Ideally, regular, well-behaved applications should never
> > > > > or rarely be throttled.
> > > > >
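The burst behavior described here is essentially a token bucket: tokens accumulate at the quota rate up to the burst size, and each topic or partition mutation consumes one. A minimal sketch (illustrative only, not the broker implementation):

```java
// Minimal token-bucket sketch of the burst behavior described above
// (illustrative only). Tokens refill at `rate` per second up to `burst`;
// each mutation costs one token; requests beyond the burst are rejected.
public class MutationQuota {
    private final double rate;   // tokens per second (the quota)
    private final double burst;  // maximum accumulated tokens
    private double tokens;
    private long lastMs;

    public MutationQuota(double rate, double burst, long nowMs) {
        this.rate = rate;
        this.burst = burst;
        this.tokens = burst;     // start full: an initial batch is admitted
        this.lastMs = nowMs;
    }

    /** Tries to admit `n` mutations at time `nowMs`. */
    public boolean tryAcquire(int n, long nowMs) {
        tokens = Math.min(burst, tokens + rate * (nowMs - lastMs) / 1000.0);
        lastMs = nowMs;
        if (tokens >= n) {
            tokens -= n;
            return true;
        }
        return false;
    }
}
```

An idle principal accumulates up to `burst` tokens, so a well-behaved application creating its topics in one batch is admitted, while a tight create/delete loop quickly exhausts the bucket.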
> > > > > 2) No, there is no explicit bound on the maximum throttle time.
> > > > > Having a maximum is not straightforward here because the throttling
> > > > > time depends on the actual size of the request.
> > > > >
> > > > > 3) That's a very good question that I haven't thought of. I was
> > > > > thinking about doing it for the new quota only. I think that your
> > > > > suggestion of having a per-method argument makes sense. I will
> > > > > update the KIP.
> > > > >
> > > > > 4) Indeed, it is outdated text. Let me update that.
> > > > >
> > > > > Regards,
> > > > > David
> > > > >
> > > > > On Wed, Jun 3, 2020 at 12:01 AM Rajini Sivaram <
> > > rajinisiva...@gmail.com>
> > > > > wrote:
> > > > >
> > > > >> Hi David,
> > > > >>
> > > > >> Thanks for the KIP. A few questions below:
> > > > >>
> > > > >> 1) The KIP says: *`Typically, applications tend to send one
> > > > >> request to create all the topics that they need`*. What would the
> > > > >> point of throttling be in this case? If there was a user quota for
> > > > >> the principal used by that application, wouldn't we just allow the
> > > > >> request due to the burst value? Is the KIP specifically for the
> > > > >> case where multiple apps with the same user principal are started
> > > > >> all at once?
> > > > >>
> > > > >> 2) Will there be a bound on the maximum throttle time?
> > > > >>
> > > > >> 3) Will *retry.quota.violation.exception* have any impact on
> > > > >> request quota throttling or is it limited to the three requests
> > > > >> where the new quota is applied? 

[jira] [Resolved] (KAFKA-8338) Improve consumer offset expiration logic to take subscription into account

2020-06-03 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-8338.

Resolution: Duplicate

This has been done as part of KAFKA-8730 (KIP-496).

> Improve consumer offset expiration logic to take subscription into account
> --
>
> Key: KAFKA-8338
> URL: https://issues.apache.org/jira/browse/KAFKA-8338
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Gwen Shapira
>Assignee: huxihx
>Priority: Major
>
> Currently, we expire consumer offsets for a group after the group is 
> considered gone.
> There is a case where the consumer group still exists, but is now subscribed 
> to different topics. In that case, the offsets of the old topics will never 
> expire and if lag is monitored, the monitors will show ever-growing lag on 
> those topics. 
> We need to improve the logic to expire the consumer offsets if the consumer 
> group didn't subscribe to specific topics/partitions for enough time.
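The improved rule could be sketched as follows (illustrative, not the actual group coordinator code): an offset becomes expirable once its topic has been out of the group's subscription for longer than the retention period, even while the group itself stays alive:

```java
import java.util.Map;
import java.util.Set;

// Sketch of the improved expiration rule described above (illustrative).
// Offsets for topics still in the group's subscription never expire here;
// offsets for unsubscribed topics expire after the retention period.
public class OffsetExpiry {
    public static Set<String> expirable(Map<String, Long> lastSubscribedMs,
                                        Set<String> currentSubscription,
                                        long nowMs, long retentionMs) {
        Set<String> out = new java.util.HashSet<>();
        for (Map.Entry<String, Long> e : lastSubscribedMs.entrySet()) {
            if (!currentSubscription.contains(e.getKey())
                    && nowMs - e.getValue() > retentionMs) {
                out.add(e.getKey());
            }
        }
        return out;
    }
}
```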



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Build failed in Jenkins: kafka-trunk-jdk11 #1530

2020-06-03 Thread Apache Jenkins Server
See 


Changes:

[github] KAFKA-10084: Fix EosTestDriver end offset (#8785)


--
Started by an SCM change
Running as SYSTEM
[EnvInject] - Loading node environment variables.
Building remotely on H30 (ubuntu) in workspace 

[WS-CLEANUP] Deleting project workspace...
[WS-CLEANUP] Deferred wipeout is used...
[WS-CLEANUP] Done
No credentials specified
Cloning the remote Git repository
Cloning repository https://github.com/apache/kafka.git
 > git init  # timeout=10
Fetching upstream changes from https://github.com/apache/kafka.git
 > git --version # timeout=10
 > git fetch --tags --progress -- https://github.com/apache/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
 > git config remote.origin.url https://github.com/apache/kafka.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # 
 > timeout=10
 > git config remote.origin.url https://github.com/apache/kafka.git # timeout=10
Fetching upstream changes from https://github.com/apache/kafka.git
 > git fetch --tags --progress -- https://github.com/apache/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
 > git rev-parse refs/remotes/origin/trunk^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/trunk^{commit} # timeout=10
Checking out Revision cb88be45ebc400069c03ad8b9a2332da0aecc7b8 
(refs/remotes/origin/trunk)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f cb88be45ebc400069c03ad8b9a2332da0aecc7b8
Commit message: "KAFKA-10084: Fix EosTestDriver end offset (#8785)"
 > git rev-list --no-walk 35a0692ce1ac456ce8155afc1f0838f4895140ab # timeout=10
ERROR: No tool found matching GRADLE_4_10_2_HOME
Setting GRADLE_4_10_3_HOME=/home/jenkins/tools/gradle/4.10.3
[kafka-trunk-jdk11] $ /bin/bash -xe /tmp/jenkins5902558659320439060.sh
+ rm -rf 
ERROR: No tool found matching GRADLE_4_10_2_HOME
Setting GRADLE_4_10_3_HOME=/home/jenkins/tools/gradle/4.10.3
[kafka-trunk-jdk11] $ /bin/bash -xe /tmp/jenkins4850027431033726557.sh
+ export GRADLE_OPTS=-Xmx1024m
+ GRADLE_OPTS=-Xmx1024m
+ ./gradlew --no-daemon --continue -PmaxParallelForks=2 
-PtestLoggingEvents=started,passed,skipped,failed -PxmlFindBugsReport=true 
clean test -PscalaVersion=2.12
[0.078s][warning][os,thread] Failed to start thread - pthread_create failed 
(EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
Error occurred during initialization of VM
java.lang.OutOfMemoryError: unable to create native thread: possibly out of 
memory or process/resource limits reached
Build step 'Execute shell' marked build as failure
Recording test results
ERROR: No tool found matching GRADLE_4_10_2_HOME
Setting GRADLE_4_10_3_HOME=/home/jenkins/tools/gradle/4.10.3
ERROR: Step 'Publish JUnit test result report' failed: No test report files 
were found. Configuration error?
ERROR: No tool found matching GRADLE_4_10_2_HOME
Setting GRADLE_4_10_3_HOME=/home/jenkins/tools/gradle/4.10.3
Not sending mail to unregistered user nore...@github.com


Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations

2020-06-03 Thread Rajini Sivaram
+1 (binding)

Thanks for the KIP, David!

Regards,

Rajini


On Sun, May 31, 2020 at 3:29 AM Gwen Shapira  wrote:

> +1 (binding)
>
> Looks great. Thank you for the in-depth design and discussion.
>
> On Fri, May 29, 2020 at 7:58 AM David Jacot  wrote:
>
> > Hi folks,
> >
> > I'd like to start the vote for KIP-599 which proposes a new quota to
> > throttle create topic, create partition, and delete topics operations to
> > protect the Kafka controller:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-599%3A+Throttle+Create+Topic%2C+Create+Partition+and+Delete+Topic+Operations
> >
> > Please, let me know what you think.
> >
> > Cheers,
> > David
> >
>
>
> --
> Gwen Shapira
> Engineering Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter | blog
>


Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations

2020-06-03 Thread Rajini Sivaram
Hi David,

2) sorry, that was my mistake.

Regards,

Rajini


On Wed, Jun 3, 2020 at 3:08 PM David Jacot  wrote:

> Hi Rajini,
>
> Thanks for your prompt response.
> 1) Good catch, fixed.
> 2) The retry mechanism will be in the client so a new field is not
> required in the requests.
>
> Regards,
> David
>
> On Wed, Jun 3, 2020 at 2:43 PM Rajini Sivaram 
> wrote:
>
> > Hi David,
> >
> > Thanks for the updates, looks good. Just a couple of minor comments:
> > 1) There is a typo in "*The channel will be mutated as well when
> > `throttle_time_ms > 0`." * Should be *muted*?
> > 2) Since the three requests will need a new field for `
> > *retryQuotaViolatedException*`, we should perhaps add that change to the
> > KIP.
> >
> > Regards,
> >
> > Rajini
> >
> > On Wed, Jun 3, 2020 at 1:02 PM David Jacot  wrote:
> >
> > > Hi all,
> > >
> > > I have updated the KIP based on our recent discussions. I have mainly
> > > changed the
> > > following points:
> > > * I have renamed the quota as suggested by Jun.
> > > * I have changed the metrics to be "token bucket" agnostic. The idea is
> > to
> > > report the
> > > burst and the rate per principal/clientid.
> > > * I have removed the `retry.quota.violation.exception` configuration
> and
> > > replaced it
> > > with options in the respective methods' options.
> > >
> > > Now, the public interfaces are not tied to the algorithm that we use
> > > internally to throttle
> > > the requests but keep the notion of burst. I hope that this will help
> to
> > > alleviate the
> > > token bucket vs rate discussions.
> > >
> > > Please, have a look and let me know your thoughts.
> > >
> > > Bests,
> > > David
> > >
> > >
> > > On Wed, Jun 3, 2020 at 10:17 AM David Jacot 
> wrote:
> > >
> > > > Hi Rajini,
> > > >
> > > > Thanks for your feedback. Please find my answers below:
> > > >
> > > > 1) Our main goal is to protect the controller from the extreme users
> > > > (DDoS). We want
> > > > to protect it from large requests or repetitive requests coming from
> a
> > > > single user.
> > > > That user could be used by multiple apps as you pointed out which
> makes
> > > it
> > > > even
> > > > worse. For instance, a buggy application could continuously create
> and
> > > > delete
> > > > topics in a tight loop.
> > > >
> > > > The idea of the burst is to still allow creating or deleting topics
> in
> > > > batch because
> > > > this is how applications tend to do it. However, we want to keep the
> > > batch
> > > > size
> > > > under control with the burst. The burst does not allow requests of
> any
> > > > size. Topics
> > > > are accepted until the burst is passed. All the others are rejected.
> > > > Ideally, regular
> > > > and well-behaved applications should never or rarely be throttled.
> > > >
> > > > 2) No, there is no explicit bound on the maximum throttle time.
> Having
> > a
> > > > maximum
> > > > is not straightforward here because the throttling time depends on the
> > actual
> > > > size of
> > > > the request.
> > > >
> > > > 3) That's a very good question that I haven't thought of. I was
> > thinking
> > > > about doing
> > > > it for the new quota only. I think that your suggestion of having a
> per
> > > > method argument
> > > > makes sense. I will update the KIP.
> > > >
> > > > 4) Indeed, it is outdated text. Let me update that.
> > > >
> > > > Regards,
> > > > David
> > > >
> > > > On Wed, Jun 3, 2020 at 12:01 AM Rajini Sivaram <
> > rajinisiva...@gmail.com>
> > > > wrote:
> > > >
> > > >> Hi David,
> > > >>
> > > >> Thanks for the KIP. A few questions below:
> > > >>
> > > >> 1) The KIP says: *`Typically, applications tend to send one request
> to
> > > >> create all the topics that they need`*. What would the point of
> > > throttling
> > > >> be in this case? If there was a user quota for the principal used by
> > > that
> > > >> application, wouldn't we just allow the request due to the burst
> > value?
> > > Is
> > > >> the KIP specifically for the case where multiple apps with the same
> > user
> > > >> principal are started all at once?
> > > >>
> > > >> 2) Will there be a bound on the maximum throttle time?
> > > >>
> > > >> 3) Will *retry.quota.violation.exception* have any impact on request
> > > quota
> > > >> throttling or is it limited to the three requests where the new
> quota
> > is
> > > >> applied? If it is only for the new quota, perhaps it would be better
> > to
> > > >> add
> > > >> an option to the relevant requests rather than use an admin client
> > > config.
> > > >>
> > > >> 4) Is this outdated text, wasn't sure what this refers to : *While
> the
> > > >> configuration could be set by broker, we envision to have it set
> only
> > > has
> > > >> a
> > > >> cluster wide default for two reasons.*
> > > >>
> > > >> Regards,
> > > >>
> > > >> Rajini
> > > >>
> > > >>
> > > >> On Tue, Jun 2, 2020 at 2:55 PM David Jacot 
> > wrote:
> > > >>
> > > >> > Hi Jun,
> > > >> >
> > > >> > Thanks for your reply.
> > > >> >
> > > >> > 10. I think that 

Re: [VOTE] KIP-589: Add API to update Replica state in Controller

2020-06-03 Thread David Arthur
 The vote for this KIP passes with the following results:

* Three binding +1 votes from Colin, Guozhang, and Jason
* Two non-binding +1 votes from Jose and Boyang
* No +0 or -1 votes

Thanks, everyone!
-David

On Tue, Jun 2, 2020 at 8:56 PM Jason Gustafson  wrote:

> +1 I agree with Guozhang that broker epoch will need a separate discussion.
>
> Thanks!
> Jason
>
> On Thu, May 28, 2020 at 9:34 AM Guozhang Wang  wrote:
>
> > David, thanks for the KIP. I'm +1 on it as well.
> >
> > One note is that in post-ZK world, we would need a different way to get
> > broker epoch since it is updated as ZKversion today. I believe we would
> > have this discussion in a different KIP though.
> >
> >
> > Guozhang
> >
> > On Wed, May 27, 2020 at 8:26 PM Colin McCabe  wrote:
> >
> > > Thanks, David.  +1 (binding).
> > >
> > > cheers,
> > > Colin
> > >
> > > On Wed, May 27, 2020, at 18:21, David Arthur wrote:
> > > > Colin, thanks for the feedback. Good points. I've updated the KIP
> with
> > > your
> > > > suggestions.
> > > >
> > > > -David
> > > >
> > > > On Wed, May 27, 2020 at 4:05 PM Colin McCabe 
> > wrote:
> > > >
> > > > > Hi David,
> > > > >
> > > > > Thanks for the KIP!
> > > > >
> > > > > The KIP refers to "the KIP-500 bridge release (version 2.6.0 as of
> > the
> > > > > time of this proposal)".  This is out of date-- the bridge release
> > > will be
> > > > > one of the 3.x releases.  We should either update this to 3.0, or
> > > perhaps
> > > > > just take out the reference to a specific version, since it's not
> > > necessary
> > > > > to understand the rest of the KIP.
> > > > >
> > > > > > ... and potentially could replace the existing controlled
> shutdown
> > > RPC.
> > > > > Since this RPC
> > > > > > is somewhat generic, it could also be used to mark a replica as
> > > "online"
> > > > > following some
> > > > > > kind of log dir recovery procedure (out of scope for this
> > proposal).
> > > > >
> > > > > I think it would be good to move this part into the "Future Work"
> > > section.
> > > > >
> > > > > > The Reason field is an optional textual description of why the
> > event
> > > is
> > > > > being sent
> > > > >
> > > > > Since we implemented optional fields in KIP-482, describing this
> > field
> > > as
> > > > > "optional" might be confusing.  Probably better to avoid describing
> > it
> > > that
> > > > > way, unless it's a tagged field.
> > > > >
> > > > > > - If no Topic is given, it is implied that all topics on this
> > broker
> > > are
> > > > > being indicated
> > > > > > - If a Topic and no partitions are given, it is implied that all
> > > > > partitions of this topic are being indicated
> > > > >
> > > > > I would prefer to leave out these "shortcuts" since they seem
> likely
> > to
> > > > > lead to confusion and bugs.
> > > > >
> > > > > For example, suppose that  the controller has just created a new
> > > partition
> > > > > for topic "foo" and put it on broker 3.  But then, before broker 3
> > > gets the
> > > > > LeaderAndIsrRequest from the controller, broker 3 get a bad log
> > > directory.
> > > > > So it sends an AlterReplicaStateRequest to the controller
> specifying
> > > topic
> > > > > foo and leaving out the partition list (using the first
> "shortcut".)
> > > The
> > > > > new partition will get marked as offline even though it hasn't even
> > > been
> > > > > created, much less assigned to the bad log directory.
> > > > >
> > > > > Since log directory failures are rare, spelling out the full set of
> > > > > affected partitions when one happens doesn't seem like that much
> of a
> > > > > burden.  This is also consistent with what we currently do.  In
> fact,
> > > it's
> > > > > much more efficient than what we currently do, since with KIP-589,
> we
> > > won't
> > > > > have to enumerate partitions that aren't on the failed log
> directory.
> > > > >
> > > > > In the future work section: If we eventually want to replace
> > > > > ControlledShutdownRequest with this RPC, we'll need some additional
> > > > > functionality.  Specifically, we'll need the ability to tell the
> > > controller
> > > > > to stop putting new partitions on the broker that sent the request.
> > > That
> > > > > could be done with a separate request or possibly additional flags
> on
> > > this
> > > > > request.  In any case, we don't have to solve that problem now.
> > > > >
> > > > > Thanks again for the KIP... great to see this moving forward.
> > > > >
> > > > > regards,
> > > > > Colin
> > > > >
> > > > >
> > > > > On Wed, May 20, 2020, at 12:22, David Arthur wrote:
> > > > > > Hello, all. I'd like to start the vote for KIP-589 which proposes
> > to
> > > add
> > > > > a
> > > > > > new AlterReplicaState RPC.
> > > > > >
> > > > > >
> > > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-589+Add+API+to+update+Replica+state+in+Controller
> > > > > >
> > > > > > Cheers,
> > > > > > David
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > -David
> > > >
> > >
> >
> >
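
Colin's objection to the "no partitions given" shortcut can be made concrete: the broker should report exactly the replicas hosted on the failed log directory, so a partition the broker has not yet learned about is never marked offline. A minimal sketch, with hypothetical names and types:

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Illustrative sketch of explicit replica enumeration for an
// AlterReplicaState request; not the actual broker code.
public class AlterReplicaStateSketch {
    // Report exactly the replicas hosted on the failed log dir. A newly
    // assigned partition that was never written to this dir is simply not
    // in the map, so it cannot be reported offline by accident.
    public static List<String> replicasToReport(Map<String, List<String>> replicasByDir,
                                                String failedDir) {
        return replicasByDir.getOrDefault(failedDir, Collections.emptyList());
    }
}
```

With the shortcut, the same failure would instead be expressed as "topic foo, all partitions", which silently includes partitions created between the failure and the request.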

Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations

2020-06-03 Thread David Jacot
Hi Rajini,

Thanks for your prompt response.
1) Good catch, fixed.
2) The retry mechanism will be in the client so a new field is not
required in the requests.

Regards,
David

On Wed, Jun 3, 2020 at 2:43 PM Rajini Sivaram 
wrote:

> Hi David,
>
> Thanks for the updates, looks good. Just a couple of minor comments:
> 1) There is a typo in "*The channel will be mutated as well when
> `throttle_time_ms > 0`." * Should be *muted*?
> 2) Since the three requests will need a new field for `
> *retryQuotaViolatedException*`, we should perhaps add that change to the
> KIP.
>
> Regards,
>
> Rajini
>
> On Wed, Jun 3, 2020 at 1:02 PM David Jacot  wrote:
>
> > Hi all,
> >
> > I have updated the KIP based on our recent discussions. I have mainly
> > changed the
> > following points:
> > * I have renamed the quota as suggested by Jun.
> > * I have changed the metrics to be "token bucket" agnostic. The idea is
> to
> > report the
> > burst and the rate per principal/clientid.
> > * I have removed the `retry.quota.violation.exception` configuration and
> > replaced it
> > with options in the respective methods' options.
> >
> > Now, the public interfaces are not tied to the algorithm that we use
> > internally to throttle
> > the requests but keep the notion of burst. I hope that this will help to
> > alleviate the
> > token bucket vs rate discussions.
> >
> > Please, have a look and let me know your thoughts.
> >
> > Bests,
> > David
> >
> >
> > On Wed, Jun 3, 2020 at 10:17 AM David Jacot  wrote:
> >
> > > Hi Rajini,
> > >
> > > Thanks for your feedback. Please find my answers below:
> > >
> > > 1) Our main goal is to protect the controller from the extreme users
> > > (DDoS). We want
> > > to protect it from large requests or repetitive requests coming from a
> > > single user.
> > > That user could be used by multiple apps as you pointed out which makes
> > it
> > > even
> > > worse. For instance, a buggy application could continuously create and
> > > delete
> > > topics in a tight loop.
> > >
> > > The idea of the burst is to still allow creating or deleting topics in
> > > batch because
> > > this is how applications tend to do it. However, we want to keep the
> > batch
> > > size
> > > under control with the burst. The burst does not allow requests of any
> > > size. Topics
> > > are accepted until the burst is passed. All the others are rejected.
> > > Ideally, regular
> > > and well-behaved applications should never or rarely be throttled.
> > >
> > > 2) No, there is no explicit bound on the maximum throttle time. Having
> a
> > > maximum
> > > is not straightforward here because the throttling time depends on the
> actual
> > > size of
> > > the request.
> > >
> > > 3) That's a very good question that I haven't thought of. I was
> thinking
> > > about doing
> > > it for the new quota only. I think that your suggestion of having a per
> > > method argument
> > > makes sense. I will update the KIP.
> > >
> > > 4) Indeed, it is outdated text. Let me update that.
> > >
> > > Regards,
> > > David
> > >
> > > On Wed, Jun 3, 2020 at 12:01 AM Rajini Sivaram <
> rajinisiva...@gmail.com>
> > > wrote:
> > >
> > >> Hi David,
> > >>
> > >> Thanks for the KIP. A few questions below:
> > >>
> > >> 1) The KIP says: *`Typically, applications tend to send one request to
> > >> create all the topics that they need`*. What would the point of
> > throttling
> > >> be in this case? If there was a user quota for the principal used by
> > that
> > >> application, wouldn't we just allow the request due to the burst
> value?
> > Is
> > >> the KIP specifically for the case where multiple apps with the same
> user
> > >> principal are started all at once?
> > >>
> > >> 2) Will there be a bound on the maximum throttle time?
> > >>
> > >> 3) Will *retry.quota.violation.exception* have any impact on request
> > quota
> > >> throttling or is it limited to the three requests where the new quota
> is
> > >> applied? If it is only for the new quota, perhaps it would be better
> to
> > >> add
> > >> an option to the relevant requests rather than use an admin client
> > config.
> > >>
> > >> 4) Is this outdated text, wasn't sure what this refers to : *While the
> > >> configuration could be set by broker, we envision to have it set only
> > has
> > >> a
> > >> cluster wide default for two reasons.*
> > >>
> > >> Regards,
> > >>
> > >> Rajini
> > >>
> > >>
> > >> On Tue, Jun 2, 2020 at 2:55 PM David Jacot 
> wrote:
> > >>
> > >> > Hi Jun,
> > >> >
> > >> > Thanks for your reply.
> > >> >
> > >> > 10. I think that both options are likely equivalent from an accuracy
> > >> point
> > >> > of
> > >> > view. If we put the implementation aside, conceptually, I am not
> > >> convinced
> > >> > by the used based approach because resources don't have a clear
> owner
> > >> > in AK at the moment. A topic can be created by (Principal A, no
> client
> > >> id),
> > >> > then can be extended by (no principal, Client B), and finally
> deleted
> > by
> > >> > 
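
The per-method option that replaces the proposed `retry.quota.violation.exception` admin-client config might look roughly like the following standalone sketch. The real option class would live in the admin client alongside the other `*Options` classes; the name `retryOnQuotaViolation` is taken from the discussion rather than a finalized API:

```java
// Hypothetical sketch of a per-call option controlling whether the admin
// client retries internally on quota violation; not the finalized API.
public class CreateTopicsOptionsSketch {
    private boolean retryOnQuotaViolation = true; // retry by default

    // Fluent setter, mirroring the style of the existing admin *Options classes.
    public CreateTopicsOptionsSketch retryOnQuotaViolation(boolean retry) {
        this.retryOnQuotaViolation = retry;
        return this;
    }

    public boolean shouldRetryOnQuotaViolation() {
        return retryOnQuotaViolation;
    }
}
```

Scoping the choice to each call avoids the problem Rajini raised: a single client-wide config would otherwise have to decide its interaction with every kind of quota throttling at once.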

Re: [DISCUSS] KIP-619 Deprecate ConsumerConfig#addDeserializerToConfig(Properties, Deserializer, Deserializer) and ProducerConfig#addSerializerToConfig(Properties, Serializer, Serializer)

2020-06-03 Thread Chia-Ping Tsai
When I created the KIP, the next number was 619 and not sure why the number is 
out of sync.

At any rate, I will update the KIP number :_

On 2020/06/03 05:06:39, Cheng Tan  wrote: 
> Hi Chia, 
> 
> Hope you are doing well. I already took KIP-619 as my KIP identification 
> number. Could you change your KIP id? Thank you.
> 
> Best, - Cheng
> 
> > On May 31, 2020, at 8:08 PM, Chia-Ping Tsai  wrote:
> > 
> > hi All,
> > 
> > This KIP plans to deprecate two unused methods without replacement.
> > 
> > All suggestions are welcome!
> > 
> > KIP: 
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=155749118
> > ISSUE: https://issues.apache.org/jira/browse/KAFKA-10044
> > 
> > ---
> > Chia-Ping
> 
> 


Re: [VOTE] KIP-601: Configurable socket connection timeout in NetworkClient

2020-06-03 Thread Rajini Sivaram
Hi Cheng,

Thanks for the updates, looks good.

+1 (binding)

Regards,

Rajini

On Wed, Jun 3, 2020 at 8:53 AM Cheng Tan  wrote:

> Dear Rajini,
>
> Thanks for the feedback.
>
> 1)
> Because "request.timeout.ms" only affects in-flight requests, after the
> API NetworkClient.ready() is invoked, the connection won't get closed after
> "request.timeout.ms" hits. Before
> a) the SocketChannel is connected
> b) ssl handshake finished
> c) authentication has finished (sasl)
> clients cannot invoke NetworkClient.send() to send any request, which
> means no in-flight request targeting the connection will be added.
>
>
> 2)
> I think a default value of 127 seconds makes sense, which meets the timeout
> indirectly specified by the default value of “tcp.syn.retries”. I’ve added
> this into the KIP proposal.
>
>
> 3)
> Every time the timeout hits, the timeout value of the next connection try
> will increase.
>
> The timeout will hit iff a connection stays at the `connecting` state
> longer than the timeout value, as indicated by
> ClusterConnectionStates.NodeConnectionState. The connection state of a node
> will change iff `SelectionKey.OP_CONNECT` is detected by
> `nioSelector.Select()`. The connection state may transit from `connecting`
> to
>
> a) `disconnected` when SocketChannel.finishConnect() throws
> IOException.
> b) `connected` when SocketChannel.finishConnect() return TRUE.
>
> In other words, the timeout will hit and increase iff the interested
> SelectionKey.OP_CONNECT doesn't trigger before the timeout arrives, which
> means, for example, network congestion, failure of the ARP request, packet
> filtering, routing error, or a silent discard may happen. (I didn’t read
> the Java NIO source code. Please correct me the case when OP_CONNECT won’t
> get triggered if I’m wrong)
>
>
> 4)
>
> A) Connection timeout dominates both request timeout and API timeout
>
> When connection timeout hits, the connection will be closed. The client
> will be notified either by the responses constructed by NetworkClient or
> the callbacks attached to the request. As a result, the request failure
> will be handled before either connection timeout or API timeout arrives.
>
>
> B) Neither request timeout or API timeout dominates connection timeout
>
> i) Request timeout: Because request timeout only affects in-flight
> requests, after the API NetworkClient.ready() is invoked, the connection
> won't get closed after "request.timeout.ms" hits. Before
> 1. the SocketChannel is connected
> 2. SSL handshake finished
> 3. authentication has finished (SASL)
> , clients won't be able to invoke NetworkClient.send() to send any
> request, which means no in-flight request targeting the connection will
> be added.
>
> ii) API timeout: In AdminClient, API timeout acts by putting a smaller and
> smaller timeout value to the chain of requests in the same API. After the API
> timeout hits, the retry logic won't close any connection. In the consumer, API
> timeout acts as a whole by limiting the execution time of the code block.
> The retry logic won't close any connection either.
>
>
> Conclusion:
>
> Thanks again for the long feedback; I always enjoy it. I’ve added
> the above discussion to the KIP proposal. Please let me know
> what you think.
>
>
> Best, - Cheng Tan
>
>
> > On Jun 2, 2020, at 3:01 AM, Rajini Sivaram 
> wrote:
> >
> > Hi Cheng,
> >
> > Not sure if the discussion should move back to the DISCUSS thread. I
> have a
> > few questions:
> >
> > 1) The KIP motivation says that in some cases `request.timeout.ms`
> doesn't
> > timeout connections properly and as a result it takes 127s to detect a
> > connection failure. This sounds like a bug rather than a limitation of
> the
> > current approach. Can you explain the scenarios where this occurs?
> >
> > 2) I think the current proposal is to use non-exponential 10s connection
> > timeout as default with the option to use exponential timeout. So
> > connection timeouts for every connection attempt will be between 8s and
> 12s
> > by default. Is that correct? Should we use a default max timeout to
> enable
> > exponential timeout by default since 8s seems rather small?
> >
> > 3) What is the scope of `failures` used to determine connection timeout
> > with exponential timeouts? Will we always use 10s followed by 20s every
> > time a connection is attempted?
> >
> > 4) It will be good if we can include two flows with the relationship
> > between various timeouts in the KIP. One with a fixed node like a typical
> > produce/consume request to the leader and another that uses
> > `leastLoadedNode` like a metadata request. Having the comparison between
> > the current and proposed behaviour w.r.t all configurable timeouts (the
> two
> > new connection timeouts, request timeout, api timeout etc.) will be
> useful.
> >
> > Regards,
> >
> > Rajini
> >
>


Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations

2020-06-03 Thread Rajini Sivaram
Hi David,

Thanks for the updates, looks good. Just a couple of minor comments:
1) There is a typo in "*The channel will be mutated as well when
`throttle_time_ms > 0`." * Should be *muted*?
2) Since the three requests will need a new field for `
*retryQuotaViolatedException*`, we should perhaps add that change to the
KIP.

Regards,

Rajini

On Wed, Jun 3, 2020 at 1:02 PM David Jacot  wrote:

> Hi all,
>
> I have updated the KIP based on our recent discussions. I have mainly
> changed the
> following points:
> * I have renamed the quota as suggested by Jun.
> * I have changed the metrics to be "token bucket" agnostic. The idea is to
> report the
> burst and the rate per principal/clientid.
> * I have removed the `retry.quota.violation.exception` configuration and
> replaced it
> with options in the respective methods' options.
>
> Now, the public interfaces are not tied to the algorithm that we use
> internally to throttle
> the requests but keep the notion of burst. I hope that this will help to
> alleviate the
> token bucket vs rate discussions.
>
> Please, have a look and let me know your thoughts.
>
> Bests,
> David
>
>
> On Wed, Jun 3, 2020 at 10:17 AM David Jacot  wrote:
>
> > Hi Rajini,
> >
> > Thanks for your feedback. Please find my answers below:
> >
> > 1) Our main goal is to protect the controller from the extreme users
> > (DDoS). We want
> > to protect it from large requests or repetitive requests coming from a
> > single user.
> > That user could be used by multiple apps as you pointed out which makes
> it
> > even
> > worse. For instance, a buggy application could continuously create and
> > delete
> > topics in a tight loop.
> >
> > The idea of the burst is to still allow creating or deleting topics in
> > batch because
> > this is how applications tend to do it. However, we want to keep the
> batch
> > size
> > under control with the burst. The burst does not allow requests of any
> > size. Topics
> > are accepted until the burst is passed. All the others are rejected.
> > Ideally, regular
> > and well-behaved applications should never or rarely be throttled.
> >
> > 2) No, there is no explicit bound on the maximum throttle time. Having a
> > maximum
> > is not straightforward here because the throttling time depends on the actual
> > size of
> > the request.
> >
> > 3) That's a very good question that I haven't thought of. I was thinking
> > about doing
> > it for the new quota only. I think that your suggestion of having a per
> > method argument
> > makes sense. I will update the KIP.
> >
> > 4) Indeed, it is outdated text. Let me update that.
> >
> > Regards,
> > David
> >
> > On Wed, Jun 3, 2020 at 12:01 AM Rajini Sivaram 
> > wrote:
> >
> >> Hi David,
> >>
> >> Thanks for the KIP. A few questions below:
> >>
> >> 1) The KIP says: *`Typically, applications tend to send one request to
> >> create all the topics that they need`*. What would the point of
> throttling
> >> be in this case? If there was a user quota for the principal used by
> that
> >> application, wouldn't we just allow the request due to the burst value?
> Is
> >> the KIP specifically for the case where multiple apps with the same user
> >> principal are started all at once?
> >>
> >> 2) Will there be a bound on the maximum throttle time?
> >>
> >> 3) Will *retry.quota.violation.exception* have any impact on request
> quota
> >> throttling or is it limited to the three requests where the new quota is
> >> applied? If it is only for the new quota, perhaps it would be better to
> >> add
> >> an option to the relevant requests rather than use an admin client
> config.
> >>
> >> 4) Is this outdated text, wasn't sure what this refers to : *While the
> >> configuration could be set by broker, we envision to have it set only
> has
> >> a
> >> cluster wide default for two reasons.*
> >>
> >> Regards,
> >>
> >> Rajini
> >>
> >>
> >> On Tue, Jun 2, 2020 at 2:55 PM David Jacot  wrote:
> >>
> >> > Hi Jun,
> >> >
> >> > Thanks for your reply.
> >> >
> >> > 10. I think that both options are likely equivalent from an accuracy
> >> point
> >> > of
> >> > view. If we put the implementation aside, conceptually, I am not
> >> convinced
> >> > by the used based approach because resources don't have a clear owner
> >> > in AK at the moment. A topic can be created by (Principal A, no client
> >> id),
> >> > then can be extended by (no principal, Client B), and finally deleted
> by
> >> > (Principal C, Client C). This does not sound right to me and I fear
> >> that it
> >> > is not going to be easy to grasp for our users.
> >> >
> >> > Regarding the naming, I do agree that we can make it more future
> proof.
> >> > I propose `controller_mutations_rate`. I think that differentiating
> the
> >> > mutations
> >> > from the reads is still a good thing for the future.
> >> >
> >> > 11. I am not convinced by your proposal for the following reasons:
> >> >
> >> > First, in my toy example, I used 101 windows and 7 * 80 requests. We
> >> could
> >> > 

Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations

2020-06-03 Thread David Jacot
Hi all,

I have updated the KIP based on our recent discussions. I have mainly
changed the
following points:
* I have renamed the quota as suggested by Jun.
* I have changed the metrics to be "token bucket" agnostic. The idea is to
report the
burst and the rate per principal/clientid.
* I have removed the `retry.quota.violation.exception` configuration and
replaced it
with options in the respective methods' options.

Now, the public interfaces are not tied to the algorithm that we use
internally to throttle
the requests but keep the notion of burst. I hope that this will help to
alleviate the
token bucket vs rate discussions.

Please, have a look and let me know your thoughts.

Bests,
David


On Wed, Jun 3, 2020 at 10:17 AM David Jacot  wrote:

> Hi Rajini,
>
> Thanks for your feedback. Please find my answers below:
>
> 1) Our main goal is to protect the controller from the extreme users
> (DDoS). We want
> to protect it from large requests or repetitive requests coming from a
> single user.
> That user could be used by multiple apps as you pointed out which makes it
> even
> worse. For instance, a buggy application could continuously create and
> delete
> topics in a tight loop.
>
> The idea of the burst is to still allow creating or deleting topics in
> batch because
> this is how applications tend to do it. However, we want to keep the batch
> size
> under control with the burst. The burst does not allow requests of any
> size. Topics
> are accepted until the burst is passed. All the others are rejected.
> Ideally, regular
> and well-behaved applications should never or rarely be throttled.
>
> 2) No, there is no explicit bound on the maximum throttle time. Having a
> maximum
> is not straightforward here because the throttling time depends on the actual
> size of
> the request.
>
> 3) That's a very good question that I haven't thought of. I was thinking
> about doing
> it for the new quota only. I think that your suggestion of having a per
> method argument
> makes sense. I will update the KIP.
>
> 4) Indeed, it is outdated text. Let me update that.
>
> Regards,
> David
>
> On Wed, Jun 3, 2020 at 12:01 AM Rajini Sivaram 
> wrote:
>
>> Hi David,
>>
>> Thanks for the KIP. A few questions below:
>>
>> 1) The KIP says: *`Typically, applications tend to send one request to
>> create all the topics that they need`*. What would the point of throttling
>> be in this case? If there was a user quota for the principal used by that
>> application, wouldn't we just allow the request due to the burst value? Is
>> the KIP specifically for the case where multiple apps with the same user
>> principal are started all at once?
>>
>> 2) Will there be a bound on the maximum throttle time?
>>
>> 3) Will *retry.quota.violation.exception* have any impact on request quota
>> throttling or is it limited to the three requests where the new quota is
>> applied? If it is only for the new quota, perhaps it would be better to
>> add
>> an option to the relevant requests rather than use an admin client config.
>>
>> 4) Is this outdated text, wasn't sure what this refers to : *While the
>> configuration could be set by broker, we envision to have it set only has
>> a
>> cluster wide default for two reasons.*
>>
>> Regards,
>>
>> Rajini
>>
>>
>> On Tue, Jun 2, 2020 at 2:55 PM David Jacot  wrote:
>>
>> > Hi Jun,
>> >
>> > Thanks for your reply.
>> >
>> > 10. I think that both options are likely equivalent from an accuracy
>> point
>> > of
>> > view. If we put the implementation aside, conceptually, I am not
>> convinced
>> > by the usage-based approach because resources don't have a clear owner
>> > in AK at the moment. A topic can be created by (Principal A, no client
>> id),
>> > then can be extended by (no principal, Client B), and finally deleted by
>> > (Principal C, Client C). This does not sound right to me and I fear
>> that it
>> > is not going to be easy to grasp for our users.
>> >
>> > Regarding the naming, I do agree that we can make it more future proof.
>> > I propose `controller_mutations_rate`. I think that differentiating the
>> > mutations
>> > from the reads is still a good thing for the future.
>> >
>> > 11. I am not convinced by your proposal for the following reasons:
>> >
>> > First, in my toy example, I used 101 windows and 7 * 80 requests. We
>> could
>> > effectively allocate 5 * 100 requests to the previous windows assuming
>> that
>> > they are empty. What shall we do with the remaining 60 requests? Shall
>> we
>> > allocate them to the current window or shall we divide it among all the
>> > windows?
>> >
>> > Second, I don't think that we can safely change the behavior of all the
>> > existing
>> > rates used because it actually changes the computation of the rate as
>> > values
>> > allocated to past windows would expire before they would today.
>> >
>> > Overall, while trying to fit in the current rate, we are going to build
>> a
>> > slightly
>> > different version of the rate which will be even more confusing for users.

Build failed in Jenkins: kafka-trunk-jdk14 #164

2020-06-03 Thread Apache Jenkins Server
See 


Changes:

[github] KAFKA-10083: fix failed


--
[...truncated 2.63 MB...]
unit.kafka.cluster.AssignmentStateTest.testPartitionAssignmentStatus[7] failed, 
log available in 
/home/jenkins/jenkins-slave/workspace/kafka-trunk-jdk14/core/build/reports/testOutput/unit.kafka.cluster.AssignmentStateTest.testPartitionAssignmentStatus[7].test.stdout

unit.kafka.cluster.AssignmentStateTest > testPartitionAssignmentStatus[7] FAILED
[2007.091s][warning][os,thread] Failed to start thread - pthread_create failed 
(EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
java.lang.OutOfMemoryError: unable to create native thread: possibly out of 
memory or process/resource limits reached
at java.base/java.lang.Thread.start0(Native Method)
at java.base/java.lang.Thread.start(Thread.java:799)
at 
java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:937)
at 
java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1343)
at 
java.base/java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:118)
at kafka.log.LogManager.$anonfun$shutdown$9(LogManager.scala:461)
at 
scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:273)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike.map(TraversableLike.scala:273)
at scala.collection.TraversableLike.map$(TraversableLike.scala:266)
at scala.collection.immutable.List.map(List.scala:298)
at kafka.log.LogManager.$anonfun$shutdown$4(LogManager.scala:461)
at 
kafka.log.LogManager.$anonfun$shutdown$4$adapted(LogManager.scala:444)
at scala.collection.immutable.List.foreach(List.scala:392)
at kafka.log.LogManager.shutdown(LogManager.scala:444)
at 
unit.kafka.cluster.AbstractPartitionTest.tearDown(AbstractPartitionTest.scala:91)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:564)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at 
org.junit.internal.runners.statements.RunAfters.invokeMethod(RunAfters.java:46)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
at org.junit.runners.Suite.runChild(Suite.java:128)
at org.junit.runners.Suite.runChild(Suite.java:27)
at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
[2007.092s][warning][os,thread] Failed to start thread - pthread_create failed 
(EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
at 

[jira] [Created] (KAFKA-10090) Misleading warnings: The configuration was supplied but isn't a known config

2020-06-03 Thread Robert Wruck (Jira)
Robert Wruck created KAFKA-10090:


 Summary: Misleading warnings: The configuration was supplied but 
isn't a known config
 Key: KAFKA-10090
 URL: https://issues.apache.org/jira/browse/KAFKA-10090
 Project: Kafka
  Issue Type: Improvement
  Components: clients
Affects Versions: 2.5.0
Reporter: Robert Wruck


In our setup (using Spring cloud stream Kafka binder), we see log messages like:

 

{{The configuration 'ssl.keystore.password' was supplied but isn't a known 
config}}

 

logged by org.apache.kafka.clients.admin.AdminClientConfig. The Kafka binder 
actually uses SSL and security.protocol is set to SSL.

Looking through the code, a few things seem odd:
 * The log message says "isn't a known config" but that's not true. It is 
*known*, i.e. defined in ConfigDef, but not *used*.
 * The method for detecting whether a config is actually *used* is not 
complete. ChannelBuilders.channelBuilderConfigs(), for example, extracts the 
configs to pass to the created channel builder using *new 
HashMap<>(config.values())*, so *get()* on the copy no longer marks a config as used.
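
A minimal illustration of that second point, with a hypothetical class standing
in for Kafka's AbstractConfig: reads through the tracking config mark a key as
used, while reads through a plain copy of the values do not, so the key still
shows up as "unused" and triggers the warning:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Tracks which keys were read via get(); unused() is the kind of set a
// "supplied but isn't a known config" warning would be derived from.
public class TrackingConfig {
    private final Map<String, Object> values;
    private final Set<String> used = new HashSet<>();

    public TrackingConfig(Map<String, Object> values) {
        this.values = values;
    }

    public Object get(String key) {
        used.add(key);      // reading through the config marks the key as used
        return values.get(key);
    }

    public Map<String, Object> values() {
        return values;      // raw map: reads from a copy of it are invisible
    }

    public Set<String> unused() {
        Set<String> u = new HashSet<>(values.keySet());
        u.removeAll(used);
        return u;
    }
}
```

Reading `ssl.keystore.password` through `new HashMap<>(config.values())` leaves
the key in `unused()` even though it was actually consumed.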

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Build failed in Jenkins: kafka-2.4-jdk8 #219

2020-06-03 Thread Apache Jenkins Server
See 


Changes:

[wangguoz] KAFKA-10083: fix failed


--
[...truncated 2.76 MB...]
org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullExpectedRecordForCompareValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullExpectedRecordForCompareValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValueTimestampWithProducerRecord
 STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValueTimestampWithProducerRecord
 PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordForCompareKeyValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordForCompareKeyValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValueWithProducerRecord 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValueWithProducerRecord 
PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueAndTimestampIsEqualForCompareKeyValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueAndTimestampIsEqualForCompareKeyValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareKeyValueTimestampWithProducerRecord
 STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareKeyValueTimestampWithProducerRecord
 PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullReversForCompareKeyValueWithProducerRecord 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullReversForCompareKeyValueWithProducerRecord 
PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueIsEqualWithNullForCompareKeyValueWithProducerRecord 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueIsEqualWithNullForCompareKeyValueWithProducerRecord 
PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullForCompareKeyValueTimestampWithProducerRecord 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullForCompareKeyValueTimestampWithProducerRecord 
PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareKeyValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareKeyValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueAndTimestampIsEqualForCompareValueTimestampWithProducerRecord 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueAndTimestampIsEqualForCompareValueTimestampWithProducerRecord 
PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReverseForCompareValueTimestampWithProducerRecord
 STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReverseForCompareValueTimestampWithProducerRecord
 PASSED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > shouldReturnIsOpen 
STARTED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > shouldReturnIsOpen 
PASSED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > 
shouldDeleteAndReturnPlainValue STARTED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > 
shouldDeleteAndReturnPlainValue PASSED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > shouldReturnName 
STARTED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > shouldReturnName 
PASSED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > 
shouldPutWithUnknownTimestamp STARTED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > 
shouldPutWithUnknownTimestamp PASSED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > 
shouldPutAllWithUnknownTimestamp STARTED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > 
shouldPutAllWithUnknownTimestamp PASSED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > 
shouldReturnIsPersistent STARTED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > 
shouldReturnIsPersistent PASSED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > 
shouldPutIfAbsentWithUnknownTimestamp STARTED


Jenkins build is back to normal : kafka-2.5-jdk8 #142

2020-06-03 Thread Apache Jenkins Server
See 




Jenkins build is back to normal : kafka-trunk-jdk11 #1529

2020-06-03 Thread Apache Jenkins Server
See 




[jira] [Created] (KAFKA-10089) The stale ssl engine factory is not closed after reconfigure

2020-06-03 Thread Chia-Ping Tsai (Jira)
Chia-Ping Tsai created KAFKA-10089:
--

 Summary: The stale ssl engine factory is not closed after 
reconfigure
 Key: KAFKA-10089
 URL: https://issues.apache.org/jira/browse/KAFKA-10089
 Project: Kafka
  Issue Type: Bug
Reporter: Chia-Ping Tsai
Assignee: Chia-Ping Tsai


{code}
@Override
public void reconfigure(Map<String, ?> newConfigs) throws KafkaException {
    SslEngineFactory newSslEngineFactory = createNewSslEngineFactory(newConfigs);
    if (newSslEngineFactory != this.sslEngineFactory) {
        this.sslEngineFactory = newSslEngineFactory; // we should close the older one
        log.info("Created new {} SSL engine builder with keystore {} truststore {}", mode,
                newSslEngineFactory.keystore(), newSslEngineFactory.truststore());
    }
}
{code}
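
A sketch of the fix the report implies, using hypothetical stand-ins rather
than the actual SslFactory classes: keep a reference to the stale factory and
close it after the swap.

```java
// Hypothetical stand-ins for SslFactory/SslEngineFactory illustrating the
// missing step: the stale factory is closed before being dropped.
public class Reconfigurer {
    interface EngineFactory extends AutoCloseable {
        @Override
        void close();    // narrowed: no checked exception
    }

    private EngineFactory factory;

    public Reconfigurer(EngineFactory initial) {
        this.factory = initial;
    }

    public void reconfigure(EngineFactory newFactory) {
        if (newFactory != this.factory) {
            EngineFactory stale = this.factory;
            this.factory = newFactory;
            if (stale != null) {
                stale.close();   // the step missing in the snippet above
            }
        }
    }
}
```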



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Jenkins build is back to normal : kafka-2.6-jdk8 #13

2020-06-03 Thread Apache Jenkins Server
See 




Re: KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations

2020-06-03 Thread David Jacot
Hi Rajini,

Thanks for your feedback. Please find my answers below:

1) Our main goal is to protect the controller from the extreme users
(DDoS). We want
to protect it from large requests or repetitive requests coming from a
single user.
That user could be shared by multiple apps, as you pointed out, which makes it
even
worse. For instance, a buggy application could continuously create and
delete
topics in a tight loop.

The idea of the burst is to still allow creating or deleting topics in
batch because
this is how applications tend to do it. However, we want to keep the batch
size
under control with the burst. The burst does not allow requests of arbitrary
size. Topics
are accepted until the burst is exhausted; all the others are rejected.
Ideally, regular
and well-behaved applications should never or rarely be throttled.

2) No, there is no explicit bound on the maximum throttle time. Having a
maximum
is not straightforward here because the throttling time depends on the actual
size of
the request.

3) That's a very good question that I haven't thought of. I was thinking
about doing
it for the new quota only. I think that your suggestion of having a per
method argument
makes sense. I will update the KIP.

4) Indeed, it is outdated text. Let me update that.

Regards,
David

On Wed, Jun 3, 2020 at 12:01 AM Rajini Sivaram 
wrote:

> Hi David,
>
> Thanks for the KIP. A few questions below:
>
> 1) The KIP says: *`Typically, applications tend to send one request to
> create all the topics that they need`*. What would the point of throttling
> be in this case? If there was a user quota for the principal used by that
> application, wouldn't we just allow the request due to the burst value? Is
> the KIP specifically for the case where multiple apps with the same user
> principal are started all at once?
>
> 2) Will there be a bound on the maximum throttle time?
>
> 3) Will *retry.quota.violation.exception* have any impact on request quota
> throttling or is it limited to the three requests where the new quota is
> applied? If it is only for the new quota, perhaps it would be better to add
> an option to the relevant requests rather than use an admin client config.
>
> 4) Is this outdated text, wasn't sure what this refers to : *While the
> configuration could be set by broker, we envision to have it set only has a
> cluster wide default for two reasons.*
>
> Regards,
>
> Rajini
>
>
> On Tue, Jun 2, 2020 at 2:55 PM David Jacot  wrote:
>
> > Hi Jun,
> >
> > Thanks for your reply.
> >
> > 10. I think that both options are likely equivalent from an accuracy
> point
> > of
> > view. If we put the implementation aside, conceptually, I am not
> convinced
> > by the usage-based approach because resources don't have a clear owner
> > in AK at the moment. A topic can be created by (Principal A, no client
> id),
> > then can be extended by (no principal, Client B), and finally deleted by
> > (Principal C, Client C). This does not sound right to me and I fear that
> it
> > is not going to be easy to grasp for our users.
> >
> > Regarding the naming, I do agree that we can make it more future proof.
> > I propose `controller_mutations_rate`. I think that differentiating the
> > mutations
> > from the reads is still a good thing for the future.
> >
> > 11. I am not convinced by your proposal for the following reasons:
> >
> > First, in my toy example, I used 101 windows and 7 * 80 requests. We
> could
> > effectively allocate 5 * 100 requests to the previous windows assuming
> that
> > they are empty. What shall we do with the remaining 60 requests? Shall we
> > allocate them to the current window or shall we divide it among all the
> > windows?
> >
> > Second, I don't think that we can safely change the behavior of all the
> > existing
> > rates used because it actually changes the computation of the rate as
> > values
> > allocated to past windows would expire before they would today.
> >
> > Overall, while trying to fit in the current rate, we are going to build a
> > slightly
> > different version of the rate which will be even more confusing for
> users.
> >
> > Instead, I think that we should embrace the notion of burst as it could
> > also
> > be applied to other quotas in the future. Users don't have to know that
> we
> > use the Token Bucket or a special rate inside at the end of the day. It
> is
> > an
> > implementation detail.
> >
> > Users would be able to define:
> > - a rate R; and
> > - a maximum burst B.
> >
> > If we change the metrics to be as follows:
> > - the actual rate;
> > - the burst balance in %, 0 meaning that the user is throttled;
> > they remain detached from the algorithm.
> >
> > I personally prefer this over having to define a rate and a number of
> > windows
> > while having to understand that the number of windows implicitly defines
> > the
> > allowed burst. I think that it is clearer and easier to grasp. Don't you?
> >
> > Best,
> > David
> >
> > On Fri, May 29, 2020 at 6:38 PM Jun Rao  wrote:
> >
> > > 

Re: [VOTE] KIP-601: Configurable socket connection timeout in NetworkClient

2020-06-03 Thread Cheng Tan
Dear Rajini,

Thanks for the feedback. 

1)
Because "request.timeout.ms" only affects in-flight requests, the connection 
won't get closed when "request.timeout.ms" hits after the API 
NetworkClient.ready() is invoked. Before
a) the SocketChannel is connected,
b) the SSL handshake has finished, and
c) authentication has finished (SASL),
clients cannot invoke NetworkClient.send() to send any request, which means no 
in-flight request targeting the connection will be added.


2) 
I think a default value of 127 seconds makes sense, which matches the timeout 
indirectly specified by the default value of "tcp.syn.retries". I've added this 
to the KIP proposal.


3) 
Every time the timeout hits, the timeout value of the next connection attempt 
will increase.

The timeout will hit iff a connection stays in the `connecting` state longer 
than the timeout value, as indicated by 
ClusterConnectionStates.NodeConnectionState. The connection state of a node 
will change iff `SelectionKey.OP_CONNECT` is detected by 
`nioSelector.select()`. The connection state may transition from `connecting` to

a) `disconnected` when SocketChannel.finishConnect() throws an IOException, or
b) `connected` when SocketChannel.finishConnect() returns true.

In other words, the timeout will hit and increase iff the registered 
SelectionKey.OP_CONNECT doesn't trigger before the timeout arrives, which 
means, for example, network congestion, failure of the ARP request, packet 
filtering, a routing error, or a silent discard may have happened. (I didn't 
read the Java NIO source code. Please correct me on the cases where OP_CONNECT 
won't get triggered if I'm wrong.)
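
The per-attempt growth described above can be sketched as follows (hypothetical
class and jitter scheme; KIP-601 leaves the exact formula to the
implementation):

```java
import java.util.Random;

// Exponential connection-setup timeout: the base doubles per consecutive
// failure, with +/-20% jitter, capped at a maximum (e.g. 127s).
public class ConnectionSetupTimeout {
    private final long baseMs;
    private final long maxMs;
    private final Random random;

    public ConnectionSetupTimeout(long baseMs, long maxMs, Random random) {
        this.baseMs = baseMs;
        this.maxMs = maxMs;
        this.random = random;
    }

    /** Timeout for the attempt after `failures` consecutive timeouts. */
    public long timeoutMs(int failures) {
        double exp = baseMs * Math.pow(2, failures);
        double jitter = 0.8 + 0.4 * random.nextDouble();  // factor in 0.8..1.2
        return (long) Math.min(maxMs, exp * jitter);
    }
}
```

With a 10s base and a 127s cap, successive attempts would time out after
roughly 10s, 20s, 40s, and so on until the cap is reached.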


4) 

A) Connection timeout dominates both request timeout and API timeout

When the connection timeout hits, the connection will be closed. The client will 
be notified either by the responses constructed by NetworkClient or by the 
callbacks attached to the request. As a result, the request failure will be 
handled before either the request timeout or the API timeout arrives.


B) Neither request timeout nor API timeout dominates connection timeout

i) Request timeout: Because the request timeout only affects in-flight requests, 
the connection won't get closed when "request.timeout.ms" hits after the API 
NetworkClient.ready() is invoked. Before
1. the SocketChannel is connected,
2. the SSL handshake has finished, and
3. authentication has finished (SASL),
clients won't be able to invoke NetworkClient.send() to send any request, 
which means no in-flight request targeting the connection will be added.

ii) API timeout: In AdminClient, the API timeout acts by putting a smaller and 
smaller timeout value on each request in the chain issued by the same API call. 
After the API timeout hits, the retry logic won't close any connection. In the 
consumer, the API timeout acts as a whole by putting a limit on the executing 
time of the code block. The retry logic won't close any connection either.


Conclusion: 

Thanks again for the detailed feedback; I always enjoy it. I've added the 
discussion above to the KIP proposal. Please let me know what you think.


Best, - Cheng Tan


> On Jun 2, 2020, at 3:01 AM, Rajini Sivaram  wrote:
> 
> Hi Cheng,
> 
> Not sure if the discussion should move back to the DISCUSS thread. I have a
> few questions:
> 
> 1) The KIP motivation says that in some cases `request.timeout.ms` doesn't
> timeout connections properly and as a result it takes 127s to detect a
> connection failure. This sounds like a bug rather than a limitation of the
> current approach. Can you explain the scenarios where this occurs?
> 
> 2) I think the current proposal is to use non-exponential 10s connection
> timeout as default with the option to use exponential timeout. So
> connection timeouts for every connection attempt will be between 8s and 12s
> by default. Is that correct? Should we use a default max timeout to enable
> exponential timeout by default since 8s seems rather small?
> 
> 3) What is the scope of `failures` used to determine connection timeout
> with exponential timeouts? Will we always use 10s followed by 20s every
> time a connection is attempted?
> 
> 4) It will be good if we can include two flows with the relationship
> between various timeouts in the KIP. One with a fixed node like a typical
> produce/consume request to the leader and another that uses
> `leastLoadedNode` like a metadata request. Having the comparison between
> the current and proposed behaviour w.r.t all configurable timeouts (the two
> new connection timeouts, request timeout, api timeout etc.) will be useful.
> 
> Regards,
> 
> Rajini
> 


[jira] [Resolved] (KAFKA-9320) Enable TLSv1.3 by default and disable some of the older protocols

2020-06-03 Thread Nikolay Izhikov (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nikolay Izhikov resolved KAFKA-9320.

Resolution: Fixed

Fixed with the 
https://github.com/apache/kafka/commit/8b22b8159673bfe22d8ac5dcd4e4312d4f2c863c

> Enable TLSv1.3 by default and disable some of the older protocols
> -
>
> Key: KAFKA-9320
> URL: https://issues.apache.org/jira/browse/KAFKA-9320
> Project: Kafka
>  Issue Type: New Feature
>  Components: security
>Reporter: Rajini Sivaram
>Assignee: Nikolay Izhikov
>Priority: Major
>  Labels: needs-kip
> Attachments: report.txt
>
>
> KAFKA-7251 added support for TLSv1.3. We should include this in the list of 
> protocols that are enabled by default. We should also disable some of the 
> older protocols that are not secure. This change requires a KIP.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)