Build failed in Jenkins: kafka-2.3-jdk8 #160

2020-01-06 Thread Apache Jenkins Server
See 


Changes:

[wangguoz] HOTFIX: fix system test race condition (#7836)


--
[...truncated 2.94 MB...]
kafka.log.LogValidatorTest > testCompressedV1 STARTED

kafka.log.LogValidatorTest > testCompressedV1 PASSED

kafka.log.LogValidatorTest > testCompressedV2 STARTED

kafka.log.LogValidatorTest > testCompressedV2 PASSED

kafka.log.LogValidatorTest > testDownConversionOfIdempotentRecordsNotPermitted 
STARTED

kafka.log.LogValidatorTest > testDownConversionOfIdempotentRecordsNotPermitted 
PASSED

kafka.log.LogValidatorTest > 
testOffsetAssignmentAfterUpConversionV0ToV2NonCompressed STARTED

kafka.log.LogValidatorTest > 
testOffsetAssignmentAfterUpConversionV0ToV2NonCompressed PASSED

kafka.log.LogValidatorTest > testAbsoluteOffsetAssignmentCompressed STARTED

kafka.log.LogValidatorTest > testAbsoluteOffsetAssignmentCompressed PASSED

kafka.log.LogValidatorTest > testLogAppendTimeWithRecompressionV1 STARTED

kafka.log.LogValidatorTest > testLogAppendTimeWithRecompressionV1 PASSED

kafka.log.LogValidatorTest > testLogAppendTimeWithRecompressionV2 STARTED

kafka.log.LogValidatorTest > testLogAppendTimeWithRecompressionV2 PASSED

kafka.log.LogValidatorTest > testCreateTimeUpConversionV0ToV1 STARTED

kafka.log.LogValidatorTest > testCreateTimeUpConversionV0ToV1 PASSED

kafka.log.LogValidatorTest > testCreateTimeUpConversionV0ToV2 STARTED

kafka.log.LogValidatorTest > testCreateTimeUpConversionV0ToV2 PASSED

kafka.log.LogValidatorTest > testCreateTimeUpConversionV1ToV2 STARTED

kafka.log.LogValidatorTest > testCreateTimeUpConversionV1ToV2 PASSED

kafka.log.LogValidatorTest > 
testOffsetAssignmentAfterDownConversionV2ToV0Compressed STARTED

kafka.log.LogValidatorTest > 
testOffsetAssignmentAfterDownConversionV2ToV0Compressed PASSED

kafka.log.LogValidatorTest > testZStdCompressedWithUnavailableIBPVersion STARTED

kafka.log.LogValidatorTest > testZStdCompressedWithUnavailableIBPVersion PASSED

kafka.log.LogValidatorTest > 
testOffsetAssignmentAfterUpConversionV1ToV2Compressed STARTED

kafka.log.LogValidatorTest > 
testOffsetAssignmentAfterUpConversionV1ToV2Compressed PASSED

kafka.log.LogValidatorTest > 
testOffsetAssignmentAfterUpConversionV0ToV1NonCompressed STARTED

kafka.log.LogValidatorTest > 
testOffsetAssignmentAfterUpConversionV0ToV1NonCompressed PASSED

kafka.log.LogValidatorTest > 
testDownConversionOfTransactionalRecordsNotPermitted STARTED

kafka.log.LogValidatorTest > 
testDownConversionOfTransactionalRecordsNotPermitted PASSED

kafka.log.LogValidatorTest > 
testOffsetAssignmentAfterUpConversionV0ToV1Compressed STARTED

kafka.log.LogValidatorTest > 
testOffsetAssignmentAfterUpConversionV0ToV1Compressed PASSED

kafka.log.LogValidatorTest > testRelativeOffsetAssignmentNonCompressedV1 STARTED

kafka.log.LogValidatorTest > testRelativeOffsetAssignmentNonCompressedV1 PASSED

kafka.log.LogValidatorTest > testRelativeOffsetAssignmentNonCompressedV2 STARTED

kafka.log.LogValidatorTest > testRelativeOffsetAssignmentNonCompressedV2 PASSED

kafka.log.LogValidatorTest > testControlRecordsNotAllowedFromClients STARTED

kafka.log.LogValidatorTest > testControlRecordsNotAllowedFromClients PASSED

kafka.log.LogValidatorTest > testRelativeOffsetAssignmentCompressedV1 STARTED

kafka.log.LogValidatorTest > testRelativeOffsetAssignmentCompressedV1 PASSED

kafka.log.LogValidatorTest > testRelativeOffsetAssignmentCompressedV2 STARTED

kafka.log.LogValidatorTest > testRelativeOffsetAssignmentCompressedV2 PASSED

kafka.log.LogValidatorTest > 
testOffsetAssignmentAfterDownConversionV2ToV1NonCompressed STARTED

kafka.log.LogValidatorTest > 
testOffsetAssignmentAfterDownConversionV2ToV1NonCompressed PASSED

kafka.log.LogValidatorTest > testLogAppendTimeNonCompressedV1 STARTED

kafka.log.LogValidatorTest > testLogAppendTimeNonCompressedV1 PASSED

kafka.log.LogValidatorTest > 
testOffsetAssignmentAfterDownConversionV2ToV0NonCompressed STARTED

kafka.log.LogValidatorTest > 
testOffsetAssignmentAfterDownConversionV2ToV0NonCompressed PASSED

kafka.log.LogValidatorTest > testControlRecordsNotCompressed STARTED

kafka.log.LogValidatorTest > testControlRecordsNotCompressed PASSED

kafka.log.LogValidatorTest > testInvalidCreateTimeNonCompressedV1 STARTED

kafka.log.LogValidatorTest > testInvalidCreateTimeNonCompressedV1 PASSED

kafka.log.LogValidatorTest > testInvalidCreateTimeNonCompressedV2 STARTED

kafka.log.LogValidatorTest > testInvalidCreateTimeNonCompressedV2 PASSED

kafka.log.LogValidatorTest > testCompressedBatchWithoutRecordsNotAllowed STARTED

kafka.log.LogValidatorTest > testCompressedBatchWithoutRecordsNotAllowed PASSED

kafka.log.LogValidatorTest > testInvalidInnerMagicVersion STARTED

kafka.log.LogValidatorTest > testInvalidInnerMagicVersion PASSED

kafka.log.LogValidatorTest > testInvalidOffsetRangeAndRecordCount STARTED

kafka.log.LogValidatorTest > testInvalidOffsetRangeAndRecordCount PASSED

kafka.log

[jira] [Created] (KAFKA-9374) Worker can be disabled by blocked connectors

2020-01-06 Thread Chris Egerton (Jira)
Chris Egerton created KAFKA-9374:


 Summary: Worker can be disabled by blocked connectors
 Key: KAFKA-9374
 URL: https://issues.apache.org/jira/browse/KAFKA-9374
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Affects Versions: 2.3.1, 2.4.0, 2.2.2, 2.2.1, 2.3.0, 2.1.1, 2.2.0, 2.1.0, 
2.0.1, 2.0.0, 1.1.1, 1.1.0, 1.0.2, 1.0.1, 1.0.0
Reporter: Chris Egerton
Assignee: Chris Egerton


If a connector hangs during any of its {{initialize}}, {{start}}, {{stop}}, 
{{taskConfigs}}, {{taskClass}}, {{version}}, {{config}}, or {{validate}} 
methods, the worker will be disabled for some types of requests thereafter, 
including connector creation, connector reconfiguration, and connector deletion.
 This only occurs in distributed mode and is due to the threading model used by 
the 
[DistributedHerder|https://github.com/apache/kafka/blob/03f763df8a8d9482d8c099806336f00cf2521465/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java]
 class.

 

One potential solution could be to treat connectors that fail to start, stop, 
etc. in time similarly to tasks that fail to stop within the [task graceful 
shutdown timeout 
period|https://github.com/apache/kafka/blob/03f763df8a8d9482d8c099806336f00cf2521465/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerConfig.java#L121-L126]
 by handling all connector interactions on a separate thread, waiting for them 
to complete within a timeout, and abandoning the thread (and transitioning the 
connector to the {{FAILED}} state, if it has been created at all) if that 
timeout expires.
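
A minimal sketch of that approach, assuming a hypothetical helper inside the herder; the class name, timeout handling, and FAILED transition hook are illustrative, not the actual Connect code:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ConnectorCallWithTimeout {
    private final ExecutorService connectorExecutor = Executors.newCachedThreadPool();

    /**
     * Runs a connector interaction (initialize, start, stop, taskConfigs, ...) on a
     * separate thread and abandons it if it does not finish in time, so the herder's
     * main thread is never blocked indefinitely by a misbehaving connector.
     */
    public void call(Runnable connectorCall, long timeoutMs) {
        Future<?> future = connectorExecutor.submit(connectorCall);
        try {
            future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);   // interrupt, then abandon the thread
            // here the herder would transition the connector to FAILED,
            // if the connector was ever created at all
        } catch (InterruptedException | ExecutionException e) {
            // surface the failure to the caller / status listener
        }
    }
}
```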



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Build failed in Jenkins: kafka-2.4-jdk8 #121

2020-01-06 Thread Apache Jenkins Server
See 


Changes:

[bill] MINOR: Fixes for adding new DSL naming page (#7899)


--
[...truncated 5.48 MB...]
org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordForCompareValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordForCompareValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentForCompareKeyValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentForCompareKeyValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentForCompareKeyValueTimestampWithProducerRecord STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentForCompareKeyValueTimestampWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueIsEqualWithNullForCompareValueWithProducerRecord STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueIsEqualWithNullForCompareValueWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReverseForCompareValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReverseForCompareValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReverseForCompareValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReverseForCompareValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfTimestampIsDifferentForCompareValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfTimestampIsDifferentForCompareValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueAndTimestampIsEqualWithNullForCompareKeyValueTimestampWithProducerRecord
 STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueAndTimestampIsEqualWithNullForCompareKeyValueTimestampWithProducerRecord
 PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareKeyValueWithProducerRecord STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareKeyValueWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareKeyValueTimestampWithProducerRecord 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareKeyValueTimestampWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordWithExpectedRecordForCompareValueTimestamp 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordWithExpectedRecordForCompareValueTimestamp 
PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullExpectedRecordForCompareValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullExpectedRecordForCompareValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValueTimestampWithProducerRecord
 STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValueTimestampWithProducerRecord
 PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordForCompareKeyValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordForCompareKeyValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValueWithProducerRecord 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValueWithProducerRecord 
PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueAndTimestampIsEqualForCompareKeyValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueAndTimestampIsEqualForCompareKeyValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareKeyValueTimestampWithProducerRecord
 STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareKeyValueTimestampWithProducerRecord
 PASSED

org.apache.kafka.streams.tes

Re: [DISCUSS] KIP-553: Enable TLSv1.3 by default and disable all protocols except [TLSV1.2, TLSV1.3]

2020-01-06 Thread Николай Ижиков
Hello, Rajini.

Can you please clarify what should be done?
I can try to do tests by myself.
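
For reference, a minimal sketch of the client-side settings this KIP would change by default; the ssl.* property names are the existing Kafka configs, and the values shown are the proposed defaults, which today have to be set explicitly (TLSv1.3 also requires JDK 11+):

```java
import java.util.Properties;

public class Tls13ClientConfig {
    public static Properties sslProps() {
        Properties props = new Properties();
        props.put("security.protocol", "SSL");
        // Today these values must be set explicitly (and TLSv1.3 needs JDK 11+);
        // the KIP proposes making them the defaults.
        props.put("ssl.enabled.protocols", "TLSv1.2,TLSv1.3");
        props.put("ssl.protocol", "TLSv1.3");
        return props;
    }
}
```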

> On 6 Jan 2020, at 21:29, Rajini Sivaram wrote:
> 
> Hi Brajesh.
> 
> No one is working on this yet, but will follow up with the Confluent tools
> team to see when this can be done.
> 
> On Mon, Jan 6, 2020 at 3:29 PM Brajesh Kumar  wrote:
> 
>> Hello Rajini,
>> 
>> What is the plan to run system tests using JDK 11? Is someone working on
>> this?
>> 
>> On Mon, Jan 6, 2020 at 3:00 PM Rajini Sivaram 
>> wrote:
>> 
>>> Hi Nikolay,
>>> 
>>> We can leave the KIP open and restart the discussion once system tests
>> are
>>> running.
>>> 
>>> Thanks,
>>> 
>>> Rajini
>>> 
>>> On Mon, Jan 6, 2020 at 2:46 PM Николай Ижиков 
>> wrote:
>>> 
 Hello, Rajini.
 
 Thanks, for the feedback.
 
 Should I mark this KIP as declined?
 Or just wait for the system tests results?
 
> On 6 Jan 2020, at 17:26, Rajini Sivaram wrote:
> 
> Hi Nikolay,
> 
> Thanks for the KIP. We currently run system tests using JDK 8 and
>> hence
 we
> don't yet have full system test results with TLS 1.3 which requires
>> JDK
 11.
> We should wait until that is done before enabling TLS1.3 by default.
> 
> Regards,
> 
> Rajini
> 
> 
> On Mon, Dec 30, 2019 at 5:36 AM Николай Ижиков 
 wrote:
> 
>> Hello, Team.
>> 
>> Any feedback on this KIP?
>> Do we need this in Kafka?
>> 
>>> On 24 Dec 2019, at 18:28, Nikolay Izhikov wrote:
>>> 
>>> Hello,
>>> 
>>> I'd like to start a discussion of KIP.
>>> Its goal is to enable TLSv1.3 and disable obsolete versions by
>>> default.
>>> 
>>> 
>> 
 
>>> 
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=142641956
>>> 
>>> Your comments and suggestions are welcome.
>>> 
>> 
>> 
 
 
>>> 
>> 
>> 
>> --
>> Regards,
>> Brajesh Kumar
>> 



Jenkins build is back to normal : kafka-trunk-jdk11 #1055

2020-01-06 Thread Apache Jenkins Server
See 




[jira] [Created] (KAFKA-9373) Improve shutdown performance via lazy accessing the offset and time indices.

2020-01-06 Thread Adem Efe Gencer (Jira)
Adem Efe Gencer created KAFKA-9373:
--

 Summary: Improve shutdown performance via lazy accessing the 
offset and time indices.
 Key: KAFKA-9373
 URL: https://issues.apache.org/jira/browse/KAFKA-9373
 Project: Kafka
  Issue Type: Bug
  Components: log
Affects Versions: 2.3.1, 2.4.0, 2.3.0
Reporter: Adem Efe Gencer
Assignee: Adem Efe Gencer
 Fix For: 2.3.1, 2.4.0, 2.3.0


KAFKA-7283 enabled lazy mmap on index files by initializing indices on demand 
rather than performing costly disk/memory operations for all indices on broker 
startup. This helped reduce broker startup time. However, segment indices are still 
created when segments are closed, regardless of whether they ever needed to be 
created.
 
Ideally we should:
 * Improve shutdown performance via lazy accessing the offset and time indices (a 
minimal sketch of the idea follows below).
 * Eliminate redundant disk accesses and memory-mapped operations while deleting 
or renaming files that back segment indices.
 * Prevent illegal accesses to the underlying indices of a closed segment, which 
would lead to memory leaks due to recreation of the underlying memory-mapped 
objects.
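
A minimal sketch of the lazy-access idea (the real broker code is Scala; the wrapper below is purely illustrative):

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

/**
 * Creates the underlying index only on first real access, so closing, deleting or
 * renaming a segment that was never read does not have to materialize (and then
 * tear down) a memory-mapped index file.
 */
public class LazyIndex<T> {
    private final Supplier<T> loader;
    private T index;           // null until first access
    private boolean closed;

    public LazyIndex(Supplier<T> loader) {
        this.loader = loader;
    }

    public synchronized T get() {
        if (closed)
            throw new IllegalStateException("Index accessed after close");
        if (index == null)
            index = loader.get();   // the mmap happens here, on demand
        return index;
    }

    /** Close without forcing creation: if the index was never used, skip the mmap entirely. */
    public synchronized void close(Consumer<T> closer) {
        closed = true;
        if (index != null)
            closer.accept(index);
    }
}
```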



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9372) Add producer config to make topicExpiry configurable

2020-01-06 Thread Jiao Zhang (Jira)
Jiao Zhang created KAFKA-9372:
-

 Summary: Add producer config to make topicExpiry configurable
 Key: KAFKA-9372
 URL: https://issues.apache.org/jira/browse/KAFKA-9372
 Project: Kafka
  Issue Type: Improvement
  Components: producer 
Affects Versions: 1.1.0
Reporter: Jiao Zhang


Sometimes we got the error "org.apache.kafka.common.errors.TimeoutException: Failed 
to update metadata after 1000 ms" on the producer side. We investigated and found 
that:
 # our producer produced messages at a really low rate, with intervals of more than 
10 minutes;
 # by default, the producer expires topics after TOPIC_EXPIRY_MS. Once a topic has 
expired, if no data is produced before the next metadata update (automatically 
triggered by metadata.max.age.ms), the partition entries for the topic disappear 
from the metadata cache. As a result, almost every produce call has to fetch 
metadata first, which can end with a timeout.

To solve this, we propose adding a new producer config, metadata.topic.expiry, to 
make topicExpiry configurable. Topic expiry is useful only when the producer is 
long-lived and produces to a varying set of topics. In the case where producers are 
bound to a single topic or a few fixed topics, there is no need to expire topics at 
all.
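
A minimal sketch of how such a setting might be used, assuming the new config ends up being a millisecond value named metadata.topic.expiry.ms (both the name and the semantics are this proposal's, not an existing producer option):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

public class LowRateProducerConfig {
    public static Properties props() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
        // Existing knob: how often metadata is force-refreshed even without errors.
        props.put(ProducerConfig.METADATA_MAX_AGE_CONFIG, 300000);
        // Proposed (hypothetical) knob from this ticket: effectively disable topic
        // expiry for producers that only ever write to a small, fixed set of topics.
        props.put("metadata.topic.expiry.ms", Long.MAX_VALUE);
        return props;
    }
}
```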



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9371) Disk space is not released after Kafka clears data due to retention settings

2020-01-06 Thread Arda Savran (Jira)
Arda Savran created KAFKA-9371:
--

 Summary: Disk space is not released after Kafka clears data due to 
retention settings
 Key: KAFKA-9371
 URL: https://issues.apache.org/jira/browse/KAFKA-9371
 Project: Kafka
  Issue Type: Bug
  Components: core, streams
Affects Versions: 2.2.0
 Environment: CentOS 7.7
Reporter: Arda Savran


We set a retention time of 15 minutes on our topics. Kafka appears to be deleting 
the messages as configured, but the disk space is not released: "df" shows 30 GB 
for kafka-logs instead of the expected size, which should be about 1 GB.

 {{/usr/sbin/lsof | grep deleted}}

The output shows a number of files under kafka-logs that have been deleted but are 
still consuming space.
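
This is standard filesystem behavior rather than anything Kafka-specific: an unlinked file keeps occupying disk blocks until the last open handle to it is closed. A minimal, self-contained illustration (not Kafka code):

```java
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class DeletedButOpen {
    public static void main(String[] args) throws Exception {
        Path path = Paths.get("/tmp/segment.log");
        Files.write(path, new byte[1024 * 1024]);                  // 1 MB on disk
        RandomAccessFile handle = new RandomAccessFile(path.toFile(), "r");
        Files.delete(path);                                        // unlinked, but...
        // ...the blocks are released only when the last open handle goes away,
        // which is why lsof keeps showing "(deleted)" entries that use space.
        handle.close();
    }
}
```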

Is this a known issue? Is there a setting that I can apply to the Kafka broker?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] KIP-551: Expose disk read and write metrics

2020-01-06 Thread Colin McCabe
On Tue, Dec 10, 2019, at 11:10, Magnus Edenhill wrote:
> Hi Colin,
> 

Hi Magnus,

Thanks for taking a look.

> aren't those counters (ever increasing), rather than gauges (fluctuating)?

Since this is in the Kafka broker, we're using Yammer.  This might be 
confusing, but Yammer's concept of a "counter" is not actually monotonic.  It 
can decrease as well as increase.

In general Yammer counters require you to call inc(amount) or dec(amount) on 
them.  This doesn't match up with what we need to do here, which is to 
(essentially) make a callback into the kernel by reading from /proc.

The counter/gauge dichotomy doesn't affect the JMX (I think?), so it's really kind 
of an implementation detail.
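
For illustration, a minimal sketch of what that could look like as a Yammer Gauge whose value() callback reads /proc; the metric class name and the exact /proc parsing are assumptions, not the KIP's implementation:

```java
import com.yammer.metrics.core.Gauge;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

/**
 * Reports the cumulative number of bytes this process has caused to be read from
 * storage, by parsing the "read_bytes" line of /proc/self/io on each metric poll.
 */
public class DiskReadBytesGauge extends Gauge<Long> {
    @Override
    public Long value() {
        try {
            for (String line : Files.readAllLines(Paths.get("/proc/self/io"))) {
                if (line.startsWith("read_bytes:")) {
                    return Long.parseLong(line.split("\\s+")[1]);
                }
            }
        } catch (IOException e) {
            // /proc not available (e.g. non-Linux)
        }
        return -1L;
    }
}
```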

> 
> You also mention CPU usage as a side note, you could use getrusage(2)'s
> ru_utime (user) and ru_stime (sys)
> to allow the broker to monitor its own CPU usage.
> 

Interesting idea.  It might be better to save that for a future KIP, though, to 
avoid scope creep.

best,
Colin

> /Magnus
> 
> Den tis 10 dec. 2019 kl 19:33 skrev Colin McCabe :
> 
> > Hi all,
> >
> > I wrote KIP about adding support for exposing disk read and write
> > metrics.  Check it out here:
> >
> > https://cwiki.apache.org/confluence/x/sotSC
> >
> > best,
> > Colin
> >
>


Re: [VOTE] KIP-526: Reduce Producer Metadata Lookups for Large Number of Topics

2020-01-06 Thread Colin McCabe
On Mon, Jan 6, 2020, at 14:40, Brian Byrne wrote:
> So the performance of a metadata RPC that occurs once every 10
> seconds should not be measured strictly in CPU cost, but rather the effect
> on the 95-99%. The larger the request is, the more opportunity there is to
> put a burst stress on the producer and broker, and the larger the response
> payload to push through the control plane socket. Maybe that's not at 5k
> topics, but there are groups that are 10k+ topics and pushing further.

KAFKA-7019 made reading the metadata lock-free.  There is no a priori reason to 
prefer lots of small requests to a few big requests (within reason!)  In fact, 
it's quite the opposite: when we make lots of small requests, it uses more 
network bandwidth than when we make a few big ones.  There are a few reasons 
for this: the request and response headers have a fixed overhead, one big array 
takes less space when serialized than several small ones, etc.  There is also 
TCP and IP overhead, etc.

The broker can only push a few tens of thousands of metadata requests a second, 
due to the overhead of message processing.  This is why almost all of the admin 
commands support batching.  So if you need to create 1,000 topics, you make one 
request, not 1,000 requests, for example.
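
As an aside, a minimal sketch of that batching pattern with the Java AdminClient; the topic names, partition counts, and broker address are made up:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class BatchedTopicCreation {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            List<NewTopic> topics = new ArrayList<>();
            for (int i = 0; i < 1000; i++) {
                topics.add(new NewTopic("topic-" + i, 10, (short) 3));
            }
            // One CreateTopics request carrying all 1,000 topics, not 1,000 requests.
            admin.createTopics(topics).all().get();
        }
    }
}
```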

It's definitely reasonable to limit the number of topics made per metadata 
request.  But the reason for this is not improving performance, but preventing 
certain bad corner cases that happen when RPCs get too big.  For example, one 
problem that can happen when a metadata response gets too big is that the 
client could time out before it finishes reading the response.  Or if the 
response got way too big, it could even exceed the maximum response size.

So I think the limit should be pretty high here.  We might also consider 
putting the limit in terms of number of partitions rather than number of 
topics, since that's what really matters here (this is harder to implement, I 
realize...)  If I had to put a rough number on it, I'd say we don't want more 
than like 50 MB of response data.  This is vaguely in line with how we do fetch 
responses as well (although I think the limit there is higher).

We should also keep in mind that anyone with a wildcard subscription is making 
full metadata requests, which will return back information about every topic in 
the system.

> 
> There's definitely weight to the metadata RPCs. Looking at a previous
> local, non-loaded test I ran, I calculate about 2 microseconds per
> partition latency to the producer. At 10,000 topics with 100 partitions
> each, that's a full 2-second bubble in the best case. I can rerun a more
> targeted performance test, but I feel that's missing the point.
> 

If the metadata is fetched in the background, there should be no impact on 
producer latency, right?

It would be good to talk more about the importance of background metadata 
fetching in the KIP.  The fact that we don't do this is actually a big problem 
with the current implementation.  As I understand it, when the metadata gets 
too old, we slam on the brakes and wait for a metadata fetch to complete, 
rather than starting the metadata fetch BEFORE we need it.  It's just bad 
scheduling.
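
To make the scheduling point concrete, a minimal sketch of "start the fetch before we need it"; the class, the refresh lead time, and the callbacks are illustrative assumptions, not the actual producer code:

```java
/**
 * Decides, on each poll, whether to kick off a background metadata refresh well
 * before the cached metadata actually expires, so a send never has to block on it.
 */
public class ProactiveMetadataRefresh {
    private final long maxAgeMs;          // e.g. metadata.max.age.ms
    private final long refreshLeadTimeMs; // illustrative: how early to start
    private long lastRefreshMs;
    private boolean refreshInFlight;

    public ProactiveMetadataRefresh(long maxAgeMs, long refreshLeadTimeMs) {
        this.maxAgeMs = maxAgeMs;
        this.refreshLeadTimeMs = refreshLeadTimeMs;
    }

    public boolean shouldStartRefresh(long nowMs) {
        long age = nowMs - lastRefreshMs;
        return !refreshInFlight && age >= maxAgeMs - refreshLeadTimeMs;
    }

    public void onRefreshStarted() { refreshInFlight = true; }

    public void onRefreshCompleted(long nowMs) {
        refreshInFlight = false;
        lastRefreshMs = nowMs;
    }
}
```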

best,
Colin

> 
> Brian
> 
> On Mon, Jan 6, 2020 at 1:31 PM Colin McCabe  wrote:
> 
> > On Mon, Jan 6, 2020, at 13:07, Brian Byrne wrote:
> > > Hi Colin,
> > >
> > > Thanks again for the feedback!
> > >
> > > On Mon, Jan 6, 2020 at 12:07 PM Colin McCabe  wrote:
> > >
> > > > Metadata requests don't (always) go to the controller, right?  We
> > should
> > > > fix the wording in this section.
> > > >
> > >
> > > You're correct, s/controller/broker(s)/.
> > >
> > > I feel like "Proposed Changes" should come before "Public Interfaces"
> > > > here.  The new configuration won't make sense to the reader until he
> > or she
> > > > has read the "changes" section.  Also, it's not clear from the name
> > that
> > > > "metadata evict" refers to a span of time.  What do you think about "
> > > > metadata.eviction.period.ms" as a configuration name?
> > > >
> > >
> > > Sure, makes sense. Updated order and config name.
> > >
> > >
> > > > Where is the "10 seconds" coming from here?  The current default for
> > > > metadata.max.age.ms is 5 * 60 * 1000 ms, which implies that we want to
> > > > refresh every 5 minutes.  Definitely not every 10 seconds.
> > > >
> > >
> > > The 10 seconds is another arbitrary value to prevent a large number of
> > RPCs
> > > if the target batch size were fixed at 20. For example, if there's 5,000
> > > topics with a 5-minute interval, then instead of issuing an RPC every
> > > 1.2 seconds with batch size of 20, it would issue an RPC every 10 seconds
> > > with batch size of 167.
> > >
> >
> > Hmm.  This will lead to many more RPCs compared to the current situation
> > of issuing an RPC every 5 minutes with 5,000 topics, right?  See below for
> > more discussion.
> >
> > >
> > >
> > > > Stepping back a little bit, it s

Build failed in Jenkins: kafka-trunk-jdk11 #1054

2020-01-06 Thread Apache Jenkins Server
See 


Changes:

[github] MINOR: Propagate LogContext to channel builders and SASL authenticator


--
[...truncated 2.79 MB...]
org.apache.kafka.streams.TestTopicsTest > testRecordsToList STARTED

org.apache.kafka.streams.TestTopicsTest > testRecordsToList PASSED

org.apache.kafka.streams.TestTopicsTest > testKeyValueListDuration STARTED

org.apache.kafka.streams.TestTopicsTest > testKeyValueListDuration PASSED

org.apache.kafka.streams.TestTopicsTest > testInputToString STARTED

org.apache.kafka.streams.TestTopicsTest > testInputToString PASSED

org.apache.kafka.streams.TestTopicsTest > testTimestamp STARTED

org.apache.kafka.streams.TestTopicsTest > testTimestamp PASSED

org.apache.kafka.streams.TestTopicsTest > testWithHeaders STARTED

org.apache.kafka.streams.TestTopicsTest > testWithHeaders PASSED

org.apache.kafka.streams.TestTopicsTest > testKeyValue STARTED

org.apache.kafka.streams.TestTopicsTest > testKeyValue PASSED

org.apache.kafka.streams.TestTopicsTest > 
shouldNotAllowToCreateTopicWithNullTopicName STARTED

org.apache.kafka.streams.TestTopicsTest > 
shouldNotAllowToCreateTopicWithNullTopicName PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordsFromKeyValuePairs STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordsFromKeyValuePairs PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithNullKeyAndDefaultTimestamp 
STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithNullKeyAndDefaultTimestamp 
PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithDefaultTimestamp 
STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithDefaultTimestamp 
PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithNullKey STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithNullKey PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateNullKeyConsumerRecordWithOtherTopicNameAndTimestampWithTimetamp 
STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateNullKeyConsumerRecordWithOtherTopicNameAndTimestampWithTimetamp 
PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordWithTimestamp STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordWithTimestamp PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullHeaders STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullHeaders PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithDefaultTimestamp STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithDefaultTimestamp PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicName STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicName PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithNullKey STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithNullKey PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecord STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecord PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateNullKeyConsumerRecord STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateNullKeyConsumerRecord PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordWithOtherTopicName STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordWithOtherTopicName PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > shouldAdvanceTime 
STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > shouldAdvanceTime 
PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithKeyValuePairs STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithKeyValuePairs PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithKeyValuePairsA

Re: [VOTE] KIP-526: Reduce Producer Metadata Lookups for Large Number of Topics

2020-01-06 Thread Brian Byrne
Hello all,

Does anyone else have opinions on the issue of RPC frequency? Would it be
better to remove the fetching of non-urgent topics altogether, so that the
refreshes are contained in a larger batch?

Thanks,
Brian


On Mon, Jan 6, 2020 at 2:40 PM Brian Byrne  wrote:

>
> So the performance of a metadata RPC that occurs once every 10
> seconds should not be measured strictly in CPU cost, but rather the effect
> on the 95-99%. The larger the request is, the more opportunity there is to
> put a burst stress on the producer and broker, and the larger the response
> payload to push through the control plane socket. Maybe that's not at 5k
> topics, but there are groups that are 10k+ topics and pushing further.
>
> There's definitely weight to the metadata RPCs. Looking at a previous
> local, non-loaded test I ran, I calculate about 2 microseconds per
> partition latency to the producer. At 10,000 topics with 100 partitions
> each, that's a full 2-second bubble in the best case. I can rerun a more
> targeted performance test, but I feel that's missing the point.
>
> If you're arguing that the batch sizes are too small or time interval too
> short, that's fine, and I'll be happy to increase them. However, there is a
> breaking point to justify refreshing a subset of a producer's topics. Even
> if producers outnumber brokers by 10x, the fixed overhead cost of an RPC
> that occurs every second is probably not worth scraping for performance,
> but worth sanitizing.
>
> Brian
>
> On Mon, Jan 6, 2020 at 1:31 PM Colin McCabe  wrote:
>
>> On Mon, Jan 6, 2020, at 13:07, Brian Byrne wrote:
>> > Hi Colin,
>> >
>> > Thanks again for the feedback!
>> >
>> > On Mon, Jan 6, 2020 at 12:07 PM Colin McCabe 
>> wrote:
>> >
>> > > Metadata requests don't (always) go to the controller, right?  We
>> should
>> > > fix the wording in this section.
>> > >
>> >
>> > You're correct, s/controller/broker(s)/.
>> >
>> > I feel like "Proposed Changes" should come before "Public Interfaces"
>> > > here.  The new configuration won't make sense to the reader until he
>> or she
>> > > has read the "changes" section.  Also, it's not clear from the name
>> that
>> > > "metadata evict" refers to a span of time.  What do you think about "
>> > > metadata.eviction.period.ms" as a configuration name?
>> > >
>> >
>> > Sure, makes sense. Updated order and config name.
>> >
>> >
>> > > Where is the "10 seconds" coming from here?  The current default for
>> > > metadata.max.age.ms is 5 * 60 * 1000 ms, which implies that we want
>> to
>> > > refresh every 5 minutes.  Definitely not every 10 seconds.
>> > >
>> >
>> > The 10 seconds is another arbitrary value to prevent a large number of
>> RPCs
>> > if the target batch size were fixed at 20. For example, if there's 5,000
>> > topics with a 5-minute interval, then instead of issuing an RPC every
>> > 1.2 seconds with batch size of 20, it would issue an RPC every 10
>> seconds
>> > with batch size of 167.
>> >
>>
>> Hmm.  This will lead to many more RPCs compared to the current situation
>> of issuing an RPC every 5 minutes with 5,000 topics, right?  See below for
>> more discussion.
>>
>> >
>> >
>> > > Stepping back a little bit, it seems like the big problem you
>> identified
>> > > is the O(N^2) behavior of producing to X, then Y, then Z, etc. etc.
>> where
>> > > each new produce to a fresh topic triggers a metadata request with
>> all the
>> > > preceding topics included.
>> > >
>> > > Of course we need to send out a metadata request before producing to
>> X,
>> > > then Y, then Z, but that request could just specify X, or just
>> specify Y,
>> > > etc. etc.  In other words, we could decouple decouple the routine
>> metadata
>> > > fetch which happens on a 5 minute timer from the need to fetch
>> metadata for
>> > > something specific right now.
>> > >
>> > > I guess my question is, do we really need to allow routine metadata
>> > > fetches to "piggyback" on the emergency metadata fetches?  It adds a
>> lot of
>> > > complexity, and we don't have any benchmarks proving that it's better.
>> > > Also, as I understand it, whether we piggyback or not, the number of
>> > > metadata fetches is the same, right?
>> > >
>> >
>> > So it's possible to do as you suggest, but I would argue that it'd be
>> more
>> > complex with how the code is structured and wouldn't add any extra
>> > complexity. The derived metadata class effectively respond to a
>> > notification that a metadata RPC is going to be issued, where they
>> return
>> > the metadata request structure with topics to refresh, which is
>> decoupled
>> > from what generated the event (new topic, stale metadata, periodic
>> refresh
>> > alarm). There is also a strict implementation detail that only one
>> metadata
>> > request can be outstanding at any time, which lends itself to
>> consolidate
>> > complexity in the base metadata class and have the derived use the
>> "listen
>> > for next update" model.
>> >
>> > By maintaining an ordered list 

Jenkins build is back to normal : kafka-trunk-jdk11 #1053

2020-01-06 Thread Apache Jenkins Server
See 




Re: [VOTE] KIP-526: Reduce Producer Metadata Lookups for Large Number of Topics

2020-01-06 Thread Brian Byrne
So the performance of a metadata RPC that occurs once every 10
seconds should not be measured strictly in CPU cost, but rather the effect
on the 95-99%. The larger the request is, the more opportunity there is to
put a burst stress on the producer and broker, and the larger the response
payload to push through the control plane socket. Maybe that's not at 5k
topics, but there are groups that are 10k+ topics and pushing further.

There's definitely weight to the metadata RPCs. Looking at a previous
local, non-loaded test I ran, I calculate about 2 microseconds per
partition latency to the producer. At 10,000 topics with 100 partitions
each, that's a full 2-second bubble in the best case. I can rerun a more
targeted performance test, but I feel that's missing the point.

If you're arguing that the batch sizes are too small or time interval too
short, that's fine, and I'll be happy to increase them. However, there is a
breaking point to justify refreshing a subset of a producer's topics. Even
if producers outnumber brokers by 10x, the fixed overhead cost of an RPC
that occurs every second is probably not worth scraping for performance,
but worth sanitizing.

Brian

On Mon, Jan 6, 2020 at 1:31 PM Colin McCabe  wrote:

> On Mon, Jan 6, 2020, at 13:07, Brian Byrne wrote:
> > Hi Colin,
> >
> > Thanks again for the feedback!
> >
> > On Mon, Jan 6, 2020 at 12:07 PM Colin McCabe  wrote:
> >
> > > Metadata requests don't (always) go to the controller, right?  We
> should
> > > fix the wording in this section.
> > >
> >
> > You're correct, s/controller/broker(s)/.
> >
> > I feel like "Proposed Changes" should come before "Public Interfaces"
> > > here.  The new configuration won't make sense to the reader until he
> or she
> > > has read the "changes" section.  Also, it's not clear from the name
> that
> > > "metadata evict" refers to a span of time.  What do you think about "
> > > metadata.eviction.period.ms" as a configuration name?
> > >
> >
> > Sure, makes sense. Updated order and config name.
> >
> >
> > > Where is the "10 seconds" coming from here?  The current default for
> > > metadata.max.age.ms is 5 * 60 * 1000 ms, which implies that we want to
> > > refresh every 5 minutes.  Definitely not every 10 seconds.
> > >
> >
> > The 10 seconds is another arbitrary value to prevent a large number of
> RPCs
> > if the target batch size were fixed at 20. For example, if there's 5,000
> > topics with a 5-minute interval, then instead of issuing an RPC every
> > 1.2 seconds with batch size of 20, it would issue an RPC every 10 seconds
> > with batch size of 167.
> >
>
> Hmm.  This will lead to many more RPCs compared to the current situation
> of issuing an RPC every 5 minutes with 5,000 topics, right?  See below for
> more discussion.
>
> >
> >
> > > Stepping back a little bit, it seems like the big problem you
> identified
> > > is the O(N^2) behavior of producing to X, then Y, then Z, etc. etc.
> where
> > > each new produce to a fresh topic triggers a metadata request with all
> the
> > > preceding topics included.
> > >
> > > Of course we need to send out a metadata request before producing to X,
> > > then Y, then Z, but that request could just specify X, or just specify
> Y,
> > > etc. etc.  In other words, we could decouple decouple the routine
> metadata
> > > fetch which happens on a 5 minute timer from the need to fetch
> metadata for
> > > something specific right now.
> > >
> > > I guess my question is, do we really need to allow routine metadata
> > > fetches to "piggyback" on the emergency metadata fetches?  It adds a
> lot of
> > > complexity, and we don't have any benchmarks proving that it's better.
> > > Also, as I understand it, whether we piggyback or not, the number of
> > > metadata fetches is the same, right?
> > >
> >
> > So it's possible to do as you suggest, but I would argue that it'd be
> more
> > complex with how the code is structured and wouldn't add any extra
> > complexity. The derived metadata class effectively respond to a
> > notification that a metadata RPC is going to be issued, where they return
> > the metadata request structure with topics to refresh, which is decoupled
> > from what generated the event (new topic, stale metadata, periodic
> refresh
> > alarm). There is also a strict implementation detail that only one
> metadata
> > request can be outstanding at any time, which lends itself to consolidate
> > complexity in the base metadata class and have the derived use the
> "listen
> > for next update" model.
> >
> > By maintaining an ordered list of topics by their last metadata refresh
> > time (0 for new topics), it's a matter of pulling from the front of the
> > list to see which topics should be included in the next update. Always
> > include all urgent topics, then include non-urgent (i.e. need to be
> > refreshed soon-ish) up to the target batch size.
> >
> > The number of metadata fetches could be reduced. Assuming a target batch
> > size of 20, a new topic

Re: [DISCUSS] KIP-515: Enable ZK client to use the new TLS supported authentication

2020-01-06 Thread Colin McCabe
On Fri, Dec 27, 2019, at 10:48, Ron Dagostino wrote:
> Hi everyone.  I would like to make the following changes to the KIP.
> 
> MOTIVATION:
> Include a statement that it will be difficult in the short term to
> deprecate direct Zookeeper communication in kafka-configs.{sh, bat} (which
> invoke kafka.admin.ConfigCommand) because bootstrapping a Kafka cluster
> with encrypted passwords in Zookeeper is an explicitly-supported use case;
> therefore it is in scope to be able to securely configure the CLI tools
> that still leverage non-deprecated direct Zookeeper communication for TLS
> (the other 2 tools are kafka-reassign-partitions.{sh, bat} and
> zookeeper-security-migration.sh).

Hi Ron,

Thanks for the KIP.

About deprecations:

* zookeeper-security-migration: clearly, deprecating ZK access in this one 
would not make sense, since it would defeat the whole point of the tool :)

* kafka-reassign-partitions: ZK access should be deprecated here.  KIP-455 
implementation has dragged a bit, but this should get done soon.  Certainly 
before the next release.

* kafka-configs: I think ZK access should be deprecated here as well.  I agree 
there is a super-special bootstrapping case here, but that should have its own 
tool, not use kafka-configs.

I will post a separate KIP for this, though.

> 
> GOALS:
> Support the secure configuration of TLS-encrypted communication between
> Zookeeper and:
>   a) Kafka brokers
>   b) The three CLI tools mentioned above that still support direct,
> non-deprecated communication to Zookeeper
> It is explicitly out-of-scope to deprecate any direct Zookeeper
> communication in CLI tools as part of this KIP; such work will occur in
> future KIPs instead.
> 
> PUBLIC INTERFACES:
> 1) The following new broker configurations will be recognized.
>   zookeeper.client.secure (default value = false, for backwards
> compatibility)
>   zookeeper.clientCnxnSocket
>   zookeeper.ssl.keyStore.location
>   zookeeper.ssl.keyStore.password
>   zookeeper.ssl.trustStore.location
>   zookeeper.ssl.trustStore.password
> It will be an error for any of the last 5 values to be left unspecified if
> zookeeper.client.secure is explicitly set to true.
> 
> 2) In addition, the kafka.security.authorizer.AclAuthorizer class supports
> the ability to connect to a different Zookeeper instance than the one the
> brokers use.  We therefore also add the following optional configs, which
> override the corresponding ones from above when present:
>   authorizer.zookeeper.client.secure
>   authorizer.zookeeper.clientCnxnSocket
>   authorizer.zookeeper.ssl.keyStore.location
>   authorizer.zookeeper.ssl.keyStore.password
>   authorizer.zookeeper.ssl.trustStore.location
>   authorizer.zookeeper.ssl.trustStore.password
> 
> 3) The three CLI tools mentioned above will support a new --zk-tls-config-file
> option.  The following
> properties will be recognized in that file, and unrecognized properties
> will be ignored to allow the possibility of pointing zk-tls-config-file at
> the broker's config file.
>   zookeeper.client.secure (default value = false)
>   zookeeper.clientCnxnSocket
>   zookeeper.ssl.keyStore.location
>   zookeeper.ssl.keyStore.password
>   zookeeper.ssl.trustStore.location
>   zookeeper.ssl.trustStore.password
> It will be an error for any of the last 5 values to be left unspecified if
> zookeeper.client.secure is explicitly set to true.

Do we really need a --zk-tls-config-file flag?  It seems like this could just 
appear in a configuration file, which all of these tools already accept (I 
think).

best,
Colin
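
(For reference, a minimal sketch of what the broker-side settings proposed in (1) above could look like when assembled programmatically; the keystore paths, passwords, and the Netty client socket class name are illustrative placeholders.)

```java
import java.util.Properties;

public class ZkTlsBrokerConfig {
    public static Properties props() {
        Properties props = new Properties();
        props.put("zookeeper.client.secure", "true");
        props.put("zookeeper.clientCnxnSocket", "org.apache.zookeeper.ClientCnxnSocketNetty");
        props.put("zookeeper.ssl.keyStore.location", "/path/to/zk-keystore.jks");
        props.put("zookeeper.ssl.keyStore.password", "keystore-password");
        props.put("zookeeper.ssl.trustStore.location", "/path/to/zk-truststore.jks");
        props.put("zookeeper.ssl.trustStore.password", "truststore-password");
        return props;
    }
}
```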


> 
> Ron
> 
> On Mon, Dec 23, 2019 at 3:03 PM Ron Dagostino  wrote:
> 
> > Hi everyone.  Let's get this discussion going again now that Kafka 2.4
> > with Zookeeper 3.5.6 is out.
> >
> > First, regarding the KIP number, the other KIP that was using this number
> > moved to KIP 534, so KIP 515 remains the correct number for this
> > discussion.  I've updated the Kafka Improvement Proposal page to list this
> > KIP in the 515 slot, so we're all set there.
> >
> > Regarding the actual issues under discussion, I think there are some
> > things we should clarify.
> >
> > 1) It is possible to use TLS connectivity to Zookeeper from Apache Kafka
> > 2.4 -- the problem is that configuration information has to be passed via
> > system properties as "-D" command line options on the java invocation of
> > the broker, and those are not secure (anyone with access to the box can see
> > the command line used to invoke the broker); the configuration includes
> > sensitive information (e.g. a keystore password), so we need a secure
> > mechanism for passing the configuration values.  I believe the real
> > motivation for this KIP is to harden the configuration mechanism for
> > Zookeeper TLS connectivity.
> >
> > 2) I believe the list of CLI tools that continue to use direct Zookeeper
> > connectivity in a non-deprecated fashion is:
> >   a) zookeeper-security-migration.sh/kafka.admin.ZkSecurityMigrator
> >  

Re: [VOTE] KIP-526: Reduce Producer Metadata Lookups for Large Number of Topics

2020-01-06 Thread Colin McCabe
On Mon, Jan 6, 2020, at 13:07, Brian Byrne wrote:
> Hi Colin,
> 
> Thanks again for the feedback!
> 
> On Mon, Jan 6, 2020 at 12:07 PM Colin McCabe  wrote:
> 
> > Metadata requests don't (always) go to the controller, right?  We should
> > fix the wording in this section.
> >
> 
> You're correct, s/controller/broker(s)/.
> 
> I feel like "Proposed Changes" should come before "Public Interfaces"
> > here.  The new configuration won't make sense to the reader until he or she
> > has read the "changes" section.  Also, it's not clear from the name that
> > "metadata evict" refers to a span of time.  What do you think about "
> > metadata.eviction.period.ms" as a configuration name?
> >
> 
> Sure, makes sense. Updated order and config name.
> 
> 
> > Where is the "10 seconds" coming from here?  The current default for
> > metadata.max.age.ms is 5 * 60 * 1000 ms, which implies that we want to
> > refresh every 5 minutes.  Definitely not every 10 seconds.
> >
> 
> The 10 seconds is another arbitrary value to prevent a large number of RPCs
> if the target batch size were fixed at 20. For example, if there's 5,000
> topics with a 5-minute interval, then instead of issuing an RPC every
> 1.2 seconds with batch size of 20, it would issue an RPC every 10 seconds
> with batch size of 167.
> 

Hmm.  This will lead to many more RPCs compared to the current situation of 
issuing an RPC every 5 minutes with 5,000 topics, right?  See below for more 
discussion.

> 
> 
> > Stepping back a little bit, it seems like the big problem you identified
> > is the O(N^2) behavior of producing to X, then Y, then Z, etc. etc. where
> > each new produce to a fresh topic triggers a metadata request with all the
> > preceding topics included.
> >
> > Of course we need to send out a metadata request before producing to X,
> > then Y, then Z, but that request could just specify X, or just specify Y,
> > etc. etc.  In other words, we could decouple decouple the routine metadata
> > fetch which happens on a 5 minute timer from the need to fetch metadata for
> > something specific right now.
> >
> > I guess my question is, do we really need to allow routine metadata
> > fetches to "piggyback" on the emergency metadata fetches?  It adds a lot of
> > complexity, and we don't have any benchmarks proving that it's better.
> > Also, as I understand it, whether we piggyback or not, the number of
> > metadata fetches is the same, right?
> >
> 
> So it's possible to do as you suggest, but I would argue that it'd be more
> complex with how the code is structured and wouldn't add any extra
> complexity. The derived metadata class effectively respond to a
> notification that a metadata RPC is going to be issued, where they return
> the metadata request structure with topics to refresh, which is decoupled
> from what generated the event (new topic, stale metadata, periodic refresh
> alarm). There is also a strict implementation detail that only one metadata
> request can be outstanding at any time, which lends itself to consolidate
> complexity in the base metadata class and have the derived use the "listen
> for next update" model.
> 
> By maintaining an ordered list of topics by their last metadata refresh
> time (0 for new topics), it's a matter of pulling from the front of the
> list to see which topics should be included in the next update. Always
> include all urgent topics, then include non-urgent (i.e. need to be
> refreshed soon-ish) up to the target batch size.
> 
> The number of metadata fetches could be reduced. Assuming a target batch
> size of 20, a new topic might also refresh 19 other "refresh soon" topics
> in the same RPC, as opposed to those 19 being handled in a subsequent RPC.
> 
> Although to counter the above, the batching/piggybacking logic isn't
> necessarily about reducing the total number of RPCs, but rather to
> distribute the load more evenly over time. For example, if a large number
> of topics need to be refreshed at the approximate same time (common for
> startups cases that hit a large number of topics), the updates are more
> evenly distributed to avoid a flood.

It wouldn't be a flood in the current case, right?  It would just be a single 
metadata request for a lot of topics. 

Let's compare the two cases.  In the current scenario, we have 1 metadata 
request every 5 minutes.  This request is for 5,000 topics (let's say).  In the 
new scenario, we have a request every 10 seconds for 167 topics each.

Which do you think will be more expensive?  I think the second scenario 
certainly will because of the overhead of 30x as many requests send over the 
wire.  Metadata accesses are now lockless, so the big metadata request just 
isn't that much of a problem.  I bet if you benchmark it, sending back metadata 
for 167 topics won't be that much cheaper than sending back metadata for 5k.  
Certainly not 30x cheaper.  There will eventually be a point where we need to 
split metadata requests, but it's definitely not at 5,000 topi

[jira] [Created] (KAFKA-9370) Return UNKNOWN_TOPIC_OR_PARTITION if topic deletion is in progress

2020-01-06 Thread Vikas Singh (Jira)
Vikas Singh created KAFKA-9370:
--

 Summary: Return UNKNOWN_TOPIC_OR_PARTITION if topic deletion is in 
progress
 Key: KAFKA-9370
 URL: https://issues.apache.org/jira/browse/KAFKA-9370
 Project: Kafka
  Issue Type: Bug
Reporter: Vikas Singh


`KafkaApis::handleCreatePartitionsRequest` returns `INVALID_TOPIC_EXCEPTION` if 
the topic is getting deleted. Change it to return `UNKNOWN_TOPIC_OR_PARTITION` 
instead. After the delete topic API returns, the client should see the topic as 
deleted. The fact that we are processing the deletion in the background shouldn't 
have any impact.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] KIP-526: Reduce Producer Metadata Lookups for Large Number of Topics

2020-01-06 Thread Brian Byrne
Hi Colin,

Thanks again for the feedback!

On Mon, Jan 6, 2020 at 12:07 PM Colin McCabe  wrote:

> Metadata requests don't (always) go to the controller, right?  We should
> fix the wording in this section.
>

You're correct, s/controller/broker(s)/.

I feel like "Proposed Changes" should come before "Public Interfaces"
> here.  The new configuration won't make sense to the reader until he or she
> has read the "changes" section.  Also, it's not clear from the name that
> "metadata evict" refers to a span of time.  What do you think about "
> metadata.eviction.period.ms" as a configuration name?
>

Sure, makes sense. Updated order and config name.


> Where is the "10 seconds" coming from here?  The current default for
> metadata.max.age.ms is 5 * 60 * 1000 ms, which implies that we want to
> refresh every 5 minutes.  Definitely not every 10 seconds.
>

The 10 seconds is another arbitrary value to prevent a large number of RPCs
if the target batch size were fixed at 20. For example, if there's 5,000
topics with a 5-minute interval, then instead of issuing an RPC every
1.2 seconds with batch size of 20, it would issue an RPC every 10 seconds
with batch size of 167.



> Stepping back a little bit, it seems like the big problem you identified
> is the O(N^2) behavior of producing to X, then Y, then Z, etc. etc. where
> each new produce to a fresh topic triggers a metadata request with all the
> preceding topics included.
>
> Of course we need to send out a metadata request before producing to X,
> then Y, then Z, but that request could just specify X, or just specify Y,
> etc. etc.  In other words, we could decouple decouple the routine metadata
> fetch which happens on a 5 minute timer from the need to fetch metadata for
> something specific right now.
>
> I guess my question is, do we really need to allow routine metadata
> fetches to "piggyback" on the emergency metadata fetches?  It adds a lot of
> complexity, and we don't have any benchmarks proving that it's better.
> Also, as I understand it, whether we piggyback or not, the number of
> metadata fetches is the same, right?
>

So it's possible to do as you suggest, but I would argue that it'd be more
complex with how the code is structured and wouldn't add any extra
complexity. The derived metadata class effectively respond to a
notification that a metadata RPC is going to be issued, where they return
the metadata request structure with topics to refresh, which is decoupled
from what generated the event (new topic, stale metadata, periodic refresh
alarm). There is also a strict implementation detail that only one metadata
request can be outstanding at any time, which lends itself to consolidate
complexity in the base metadata class and have the derived use the "listen
for next update" model.

By maintaining an ordered list of topics by their last metadata refresh
time (0 for new topics), it's a matter of pulling from the front of the
list to see which topics should be included in the next update. Always
include all urgent topics, then include non-urgent (i.e. need to be
refreshed soon-ish) up to the target batch size.
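
A minimal sketch of that bookkeeping, purely illustrative (the urgency rule and the
target batch size are assumptions, and this is not the actual producer code):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Keeps topics ordered by last metadata refresh time (0 for brand-new topics) and
 * builds the topic list for the next metadata request: all urgent topics first,
 * then the stalest non-urgent ones up to the target batch size.
 */
public class MetadataRefreshQueue {
    private final Map<String, Long> lastRefreshMs = new HashMap<>();

    public void topicAdded(String topic) { lastRefreshMs.putIfAbsent(topic, 0L); }

    public void topicRefreshed(String topic, long nowMs) { lastRefreshMs.put(topic, nowMs); }

    public List<String> nextRequestTopics(long nowMs, long maxAgeMs, int targetBatchSize) {
        List<Map.Entry<String, Long>> byAge = new ArrayList<>(lastRefreshMs.entrySet());
        byAge.sort(Map.Entry.comparingByValue());      // stalest (and new) topics first
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : byAge) {
            boolean urgent = e.getValue() == 0L || nowMs - e.getValue() >= maxAgeMs;
            if (urgent || result.size() < targetBatchSize)
                result.add(e.getKey());
        }
        return result;
    }
}
```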

The number of metadata fetches could be reduced. Assuming a target batch
size of 20, a new topic might also refresh 19 other "refresh soon" topics
in the same RPC, as opposed to those 19 being handled in a subsequent RPC.

Although to counter the above, the batching/piggybacking logic isn't
necessarily about reducing the total number of RPCs, but rather to
distribute the load more evenly over time. For example, if a large number
of topics need to be refreshed at the approximate same time (common for
startups cases that hit a large number of topics), the updates are more
evenly distributed to avoid a flood.

Brian



> On Mon, Jan 6, 2020, at 10:26, Lucas Bradstreet wrote:
> > +1 (non-binding)
> >
> > On Thu, Jan 2, 2020 at 11:15 AM Brian Byrne  wrote:
> >
> > > Hello all,
> > >
> > > After further discussion and improvements, I'd like to reinstate the
> voting
> > > process.
> > >
> > > The updated KIP:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-526
> > > %3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
> > > <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-526%3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
> >
> > >
> > > The continued discussion:
> > >
> > >
> https://lists.apache.org/thread.html/b2f8f830ef04587144cf0840c7d4811bbf0a14f3c459723dbc5acf9e@%3Cdev.kafka.apache.org%3E
> > >
> > > I'd be happy to address any further comments/feedback.
> > >
> > > Thanks,
> > > Brian
> > >
> > > On Mon, Dec 9, 2019 at 11:02 PM Guozhang Wang 
> wrote:
> > >
> > > > With the concluded summary on the other discussion thread, I'm +1 on
> the
> > > > proposal.
> > > >
> > > > Thanks Brian!
> > > >
> > > > On Tue, Nov 19, 2019 at 8:00 PM deng ziming <
> dengziming1...@gmail.com>
> > > > wrote:
> > > >
> > > > > >
> > > > > > For new (uncached) topics, one problem here is that we don't know
> > > w

[jira] [Created] (KAFKA-9369) Allow Consumers and Producers to Connect with User-Agent String

2020-01-06 Thread David Mollitor (Jira)
David Mollitor created KAFKA-9369:
-

 Summary: Allow Consumers and Producers to Connect with User-Agent 
String
 Key: KAFKA-9369
 URL: https://issues.apache.org/jira/browse/KAFKA-9369
 Project: Kafka
  Issue Type: Improvement
Reporter: David Mollitor


Given the ad hoc nature of consumers and producers in Kafka, it can be difficult 
to track where connections to brokers and partitions are coming from.

 

Please allow consumers and producers to pass an optional _user-agent_ string 
during the connection process so that they can quickly and accurately be 
identified.  For example, if I am performing an upgrade on my consumers, I want 
to be able to see that no consumers with an older version number of the 
consuming software still exist, or so that if I see an application configured to 
consume from the wrong consumer group, it can quickly be identified and 
removed.

 

[https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/User-Agent]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Jenkins build is back to normal : kafka-trunk-jdk8 #4132

2020-01-06 Thread Apache Jenkins Server
See 




Re: Kafka 2.4.0 & Mirror Maker 2.0 Error

2020-01-06 Thread Ryanne Dolan
I just downloaded the 2.4.0 release tarball and didn't run into any issues.
Peter, Jamie, can one of you file a jira ticket if you are still seeing
this? Thanks!

Ryanne

On Fri, Dec 27, 2019 at 12:04 PM Ryanne Dolan  wrote:

> Thanks Peter, I'll take a look.
>
> Ryanne
>
> On Fri, Dec 27, 2019, 7:48 AM Péter Sinóros-Szabó
>  wrote:
>
>> Hi,
>>
>> I see the same.
>> I just downloaded the Kafka zip and I run:
>>
>> ~/kafka-2.4.0-rc3$ ./bin/connect-mirror-maker.sh
>> config/connect-mirror-maker.properties
>>
>> Peter
>>
>> On Mon, 16 Dec 2019 at 17:14, Ryanne Dolan  wrote:
>>
>> > Hey Jamie, are you running the MM2 connectors on an existing Connect
>> > cluster, or with the connet-mirror-maker.sh driver? Given your question
>> > about plugin.path I'm guessing the former. Is the Connect cluster
>> running
>> > 2.4.0 as well? The jars should land in the Connect runtime without any
>> need
>> > to modify the plugin.path or copy jars around.
>> >
>> > Ryanne
>> >
>> > On Mon, Dec 16, 2019, 6:23 AM Jamie 
>> wrote:
>> >
>> > > Hi All,
>> > > I'm trying to set up mirror maker 2.0 with Kafka 2.4.0 however, I'm
>> > > receiving the following errors on startup:
>> > > ERROR Plugin class loader for connector
>> > > 'org.apache.kafka.connect.mirror.MirrorSourceConnector' was not found.
>> > > Returning:
>> > >
>> org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader@187eb9a8
>> > > (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
>> > > ERROR Plugin class loader for connector
>> > > 'org.apache.kafka.connect.mirror.MirrorHeartbeatConnector' was not
>> > > found. Returning:
>> > >
>> org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader@187eb9a8
>> > > (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
>> > > ERROR Plugin class loader for connector
>> > > 'org.apache.kafka.connect.mirror.MirrorCheckpointConnector' was not
>> > > found. Returning:
>> > >
>> org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader@187eb9a8
>> > > (org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader)
>> > >
>> > > I've checked the jar file containing these class file is in the class
>> > > path.
>> > > Is there anything I need to add to plugin.path for the connect
>> properties
>> > > when running mirror maker?
>> > > Many Thanks,
>> > > Jamie
>> >
>>
>>
>> --
>>  - Sini
>>
>


Re: [VOTE] KIP-526: Reduce Producer Metadata Lookups for Large Number of Topics

2020-01-06 Thread Colin McCabe
Hi Brian,

Thanks for continuing to work on this.  It looks good overall.

It's probably better to keep the discussion on this thread rather than going 
back and forth between this and the DISCUSS thread.

The KIP states:

 > For (1), an RPC is generated every time an uncached topic's metadata must 
 > be fetched. During periods when a large number of uncached topics are 
 > processed (e.g. producer startup), a large number of RPCs may be sent out 
 > to the controller in a short period of time

Metadata requests don't (always) go to the controller, right?  We should fix 
the wording in this section.

I feel like "Proposed Changes" should come before "Public Interfaces" here.  
The new configuration won't make sense to the reader until he or she has read 
the "changes" section.  Also, it's not clear from the name that "metadata 
evict" refers to a span of time.  What do you think about 
"metadata.eviction.period.ms" as a configuration name?

 > Let metadataRefreshSecs = metadata.max.age.ms / 1000
>
 > Set topicsPerSec =  / metadataRefreshSecs
>
 > Set targetMetadataFetchSize = Math.max(topicsPerSec * 10, 20)
>
 > Rationale: this sets the target size to approximate a metadata refresh 
 > at least every 10 seconds, while also maintaining a reasonable batch size 
 > of '20' for setups with a lower number of topics. '20' has no significance 
 > other than it's a small-but-appropriate trade-off between RPC metadata 
 > response size and necessary RPC frequency.

Where is the "10 seconds" coming from here?  The current default for 
metadata.max.age.ms is 5 * 60 * 1000 ms, which implies that we want to refresh 
every 5 minutes.  Definitely not every 10 seconds.
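
To make the quoted sizing concrete, here is the formula exactly as written in
the KIP excerpt above, as a small sketch (the cached-topic count is an assumed
input, since that term is elided in the quote):

{code:java}
public class TargetFetchSizeSketch {
    // Mirrors the quoted formula; '10' and '20' are the KIP's constants.
    static int targetMetadataFetchSize(long metadataMaxAgeMs, int cachedTopicCount) {
        long metadataRefreshSecs = metadataMaxAgeMs / 1000;
        double topicsPerSec = (double) cachedTopicCount / metadataRefreshSecs;
        return (int) Math.max(topicsPerSec * 10, 20);
    }

    public static void main(String[] args) {
        // With the default metadata.max.age.ms of 5 minutes and 3,000 cached topics:
        // topicsPerSec = 3000 / 300 = 10, so the target size is max(100, 20) = 100.
        System.out.println(targetMetadataFetchSize(5 * 60 * 1000, 3000));
    }
}
{code}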

Stepping back a little bit, it seems like the big problem you identified is the 
O(N^2) behavior of producing to X, then Y, then Z, etc. etc. where each new 
produce to a fresh topic triggers a metadata request with all the preceding 
topics included.  

Of course we need to send out a metadata request before producing to X, then Y, 
then Z, but that request could just specify X, or just specify Y, etc. etc.  In 
other words, we could decouple the routine metadata fetch which 
happens on a 5 minute timer from the need to fetch metadata for something 
specific right now.

I guess my question is, do we really need to allow routine metadata fetches to 
"piggyback" on the emergency metadata fetches?  It adds a lot of complexity, 
and we don't have any benchmarks proving that it's better.  Also, as I 
understand it, whether we piggyback or not, the number of metadata fetches is 
the same, right?

best,
Colin


On Mon, Jan 6, 2020, at 10:26, Lucas Bradstreet wrote:
> +1 (non-binding)
> 
> On Thu, Jan 2, 2020 at 11:15 AM Brian Byrne  wrote:
> 
> > Hello all,
> >
> > After further discussion and improvements, I'd like to reinstate the voting
> > process.
> >
> > The updated KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-526
> > %3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
> > 
> >
> > The continued discussion:
> >
> > https://lists.apache.org/thread.html/b2f8f830ef04587144cf0840c7d4811bbf0a14f3c459723dbc5acf9e@%3Cdev.kafka.apache.org%3E
> >
> > I'd be happy to address any further comments/feedback.
> >
> > Thanks,
> > Brian
> >
> > On Mon, Dec 9, 2019 at 11:02 PM Guozhang Wang  wrote:
> >
> > > With the concluded summary on the other discussion thread, I'm +1 on the
> > > proposal.
> > >
> > > Thanks Brian!
> > >
> > > On Tue, Nov 19, 2019 at 8:00 PM deng ziming 
> > > wrote:
> > >
> > > > >
> > > > > For new (uncached) topics, one problem here is that we don't know
> > which
> > > > > partition to map a record to in the event that it has a key or custom
> > > > > partitioner, so the RecordAccumulator wouldn't know which
> > batch/broker
> > > it
> > > > > belongs. We'd need an intermediate record queue that subsequently
> > moved
> > > > the
> > > > > records into RecordAccumulators once metadata resolution was
> > complete.
> > > > For
> > > > > known topics, we don't currently block at all in waitOnMetadata.
> > > > >
> > > >
> > > > You are right, I forget this fact, and the intermediate record queue
> > will
> > > > help, but I have some questions
> > > >
> > > > if we add an intermediate record queue in KafkaProducer, when should we
> > > > move the records into RecordAccumulators?
> > > > only NetworkClient is aware of the MetadataResponse, here is the
> > > > hierarchical structure of the related classes:
> > > > KafkaProducer
> > > > Accumulator
> > > > Sender
> > > > NetworkClient
> > > > metadataUpdater.handleCompletedMetadataResponse
> > > >
> > > > so
> > > > 1. we should also add a metadataUpdater to KafkaProducer?
> > > > 2. if the topic really does not exists? the intermediate record queue
> > > will
> > > > become too large?
> > > > 3. and should we `block`

[jira] [Created] (KAFKA-9368) Preserve stream-time across rebalances/restarts

2020-01-06 Thread Sophie Blee-Goldman (Jira)
Sophie Blee-Goldman created KAFKA-9368:
--

 Summary: Preserve stream-time across rebalances/restarts
 Key: KAFKA-9368
 URL: https://issues.apache.org/jira/browse/KAFKA-9368
 Project: Kafka
  Issue Type: Bug
  Components: streams
Reporter: Sophie Blee-Goldman


Stream-time is used to make decisions about processing out-of-order records or 
dropping them if they are late (i.e., timestamp < stream-time - grace-period). 
This is currently tracked on a per-processor basis, such that each node has its 
own local view of stream-time based on the maximum timestamp it has processed.

During rebalances and restarts, stream-time is initialized as UNKNOWN (i.e., -1) 
for all processors in tasks that are newly created (or migrated). In effect, we 
forget the current stream-time in this case, which may lead to non-deterministic 
behavior: if we stop processing right before a late record, that record would 
have been dropped had we kept processing, but is not dropped after the 
rebalance/restart. Let's look at an example with a grace period of 5ms for a 
tumbling window of 5ms, and the following records (timestamps in parentheses):
{code:java}
r1(0) r2(5) r3(11) r4(2){code}
In the example, stream-time advances as 0, 5, 11, 11, and thus record `r4` is 
dropped as late because 2 < 6 = 11 - 5. However, if we stop processing or 
rebalance after processing `r3` but before processing `r4`, we would 
reinitialize stream-time as -1 and thus would process `r4` on restart/after the 
rebalance. The problem is that stream-time then advances differently from a 
global point of view: 0, 5, 11, 2.
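
A minimal, self-contained sketch of the behavior described above (this is not 
the actual Streams code; it only models the stream-time/grace check with a grace 
period of 5ms):

{code:java}
public class StreamTimeSketch {
    static final long UNKNOWN = -1L;
    static final long GRACE_MS = 5L;

    long streamTime = UNKNOWN;

    // Returns true if the record is processed, false if it is dropped as late.
    boolean process(long timestamp) {
        if (streamTime != UNKNOWN && timestamp < streamTime - GRACE_MS) {
            return false;                                // late record: dropped
        }
        streamTime = Math.max(streamTime, timestamp);    // advance the local stream-time
        return true;
    }

    public static void main(String[] args) {
        StreamTimeSketch task = new StreamTimeSketch();
        for (long ts : new long[]{0, 5, 11}) {
            task.process(ts);                            // stream-time advances 0, 5, 11
        }
        System.out.println(task.process(2));             // false: 2 < 11 - 5, so r4 is dropped

        // After a rebalance/restart before r4, stream-time is re-initialized to UNKNOWN,
        // so the very same record is now accepted: a non-deterministic outcome.
        StreamTimeSketch restartedTask = new StreamTimeSketch();
        System.out.println(restartedTask.process(2));    // true: r4 is processed
    }
}
{code}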

Of course, this is a corner case: if we stopped processing one record earlier 
-- i.e., after processing `r2` but before processing `r3` -- stream-time would 
advance correctly from a global point of view. 

Note that in previous versions the maximum partition-time was actually used as 
stream-time. This changed in 2.3 due to KAFKA-7895/[PR 
6278|https://github.com/apache/kafka/pull/6278], and could potentially change 
yet again in future versions (cf. KAFKA-8769). Partition-time is actually 
preserved as of 2.4 thanks to KAFKA-7994.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Build failed in Jenkins: kafka-trunk-jdk11 #1052

2020-01-06 Thread Apache Jenkins Server
See 


Changes:

[github] MINOR: Fix failing test case in TransactionLogTest (#7895)


--
[...truncated 2.79 MB...]
org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicName STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicName PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithNullKey STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithNullKey PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecord STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecord PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateNullKeyConsumerRecord STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateNullKeyConsumerRecord PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordWithOtherTopicName STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordWithOtherTopicName PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > shouldAdvanceTime 
STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > shouldAdvanceTime 
PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithKeyValuePairs STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithKeyValuePairs PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithKeyValuePairsAndCustomTimestamps
 STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithKeyValuePairsAndCustomTimestamps
 PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithKeyValuePairs 
STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithKeyValuePairs PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordWithOtherTopicNameAndTimestamp STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordWithOtherTopicNameAndTimestamp PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithKeyValuePairsAndCustomTimestamps
 STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithKeyValuePairsAndCustomTimestamps
 PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicName STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicName PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordsFromKeyValuePairsWithCustomTimestampAndIncrementsAndNotAdvanceTime
 STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordsFromKeyValuePairsWithCustomTimestampAndIncrementsAndNotAdvanceTime
 PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateNullKeyConsumerRecordWithTimestampWithTimestamp STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateNullKeyConsumerRecordWithTimestampWithTimestamp PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithNullKeyAndDefaultTimestamp
 STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithNullKeyAndDefaultTimestamp
 PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordsFromKeyValuePairsWithTimestampAndIncrements STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordsFromKeyValuePairsWithTimestampAndIncrements PASSED

org.apache.kafka.streams.MockTimeTest > shouldGetNanosAsMillis STARTED

org.apache.kafka.streams.MockTimeTest > shouldGetNanosAsMillis PASSED

org.apache.kafka.streams.MockTimeTest > shouldSetStartTime STARTED

org.apache.kafka.streams.MockTimeTest > shouldSetStartTime PASSED

org.apache.kafka.streams.MockTimeTest > shouldNotAllowNegativeSleep STARTED

org.apache.kafka.streams.MockTimeTest > shouldNotAllowNegativeSleep PASSED

org.apache.kafka.streams.MockTimeTest > shouldAdvanceTimeOnSleep STARTED

org.apache.kafka.streams.MockTimeTest > shouldAdvanceTimeOnSleep PASSED

org.apache.kafka.streams.internals.KeyValueStoreFacadeTest > shoul

Re: [DISCUSS] KIP-553: Enable TLSv1.3 by default and disable all protocols except [TLSV1.2, TLSV1.3]

2020-01-06 Thread Rajini Sivaram
Hi Brajesh.

No one is working on this yet, but will follow up with the Confluent tools
team to see when this can be done.

On Mon, Jan 6, 2020 at 3:29 PM Brajesh Kumar  wrote:

> Hello Rajini,
>
> What is the plan to run system tests using JDK 11? Is someone working on
> this?
>
> On Mon, Jan 6, 2020 at 3:00 PM Rajini Sivaram 
> wrote:
>
> > Hi Nikolay,
> >
> > We can leave the KIP open and restart the discussion once system tests
> are
> > running.
> >
> > Thanks,
> >
> > Rajini
> >
> > On Mon, Jan 6, 2020 at 2:46 PM Николай Ижиков 
> wrote:
> >
> > > Hello, Rajini.
> > >
> > > Thanks, for the feedback.
> > >
> > > Should I mark this KIP as declined?
> > > Or just wait for the system tests results?
> > >
> > > > 6 янв. 2020 г., в 17:26, Rajini Sivaram 
> > > написал(а):
> > > >
> > > > Hi Nikolay,
> > > >
> > > > Thanks for the KIP. We currently run system tests using JDK 8 and
> hence
> > > we
> > > > don't yet have full system test results with TLS 1.3 which requires
> JDK
> > > 11.
> > > > We should wait until that is done before enabling TLS1.3 by default.
> > > >
> > > > Regards,
> > > >
> > > > Rajini
> > > >
> > > >
> > > > On Mon, Dec 30, 2019 at 5:36 AM Николай Ижиков 
> > > wrote:
> > > >
> > > >> Hello, Team.
> > > >>
> > > >> Any feedback on this KIP?
> > > >> Do we need this in Kafka?
> > > >>
> > > >>> 24 дек. 2019 г., в 18:28, Nikolay Izhikov 
> > > >> написал(а):
> > > >>>
> > > >>> Hello,
> > > >>>
> > > >>> I'd like to start a discussion of KIP.
> > > >>> Its goal is to enable TLSv1.3 and disable obsolete versions by
> > default.
> > > >>>
> > > >>>
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=142641956
> > > >>>
> > > >>> Your comments and suggestions are welcome.
> > > >>>
> > > >>
> > > >>
> > >
> > >
> >
>
>
> --
> Regards,
> Brajesh Kumar
>


Re: [VOTE] KIP-526: Reduce Producer Metadata Lookups for Large Number of Topics

2020-01-06 Thread Lucas Bradstreet
+1 (non-binding)

On Thu, Jan 2, 2020 at 11:15 AM Brian Byrne  wrote:

> Hello all,
>
> After further discussion and improvements, I'd like to reinstate the voting
> process.
>
> The updated KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-526
> %3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
> 
>
> The continued discussion:
>
> https://lists.apache.org/thread.html/b2f8f830ef04587144cf0840c7d4811bbf0a14f3c459723dbc5acf9e@%3Cdev.kafka.apache.org%3E
>
> I'd be happy to address any further comments/feedback.
>
> Thanks,
> Brian
>
> On Mon, Dec 9, 2019 at 11:02 PM Guozhang Wang  wrote:
>
> > With the concluded summary on the other discussion thread, I'm +1 on the
> > proposal.
> >
> > Thanks Brian!
> >
> > On Tue, Nov 19, 2019 at 8:00 PM deng ziming 
> > wrote:
> >
> > > >
> > > > For new (uncached) topics, one problem here is that we don't know
> which
> > > > partition to map a record to in the event that it has a key or custom
> > > > partitioner, so the RecordAccumulator wouldn't know which
> batch/broker
> > it
> > > > belongs. We'd need an intermediate record queue that subsequently
> moved
> > > the
> > > > records into RecordAccumulators once metadata resolution was
> complete.
> > > For
> > > > known topics, we don't currently block at all in waitOnMetadata.
> > > >
> > >
> > > You are right, I forget this fact, and the intermediate record queue
> will
> > > help, but I have some questions
> > >
> > > if we add an intermediate record queue in KafkaProducer, when should we
> > > move the records into RecordAccumulators?
> > > only NetworkClient is aware of the MetadataResponse, here is the
> > > hierarchical structure of the related classes:
> > > KafkaProducer
> > > Accumulator
> > > Sender
> > > NetworkClient
> > > metadataUpdater.handleCompletedMetadataResponse
> > >
> > > so
> > > 1. we should also add a metadataUpdater to KafkaProducer?
> > > 2. if the topic really does not exists? the intermediate record queue
> > will
> > > become too large?
> > > 3. and should we `block` when the intermediate record queue is too
> large?
> > > and this will again bring the blocking problem?
> > >
> > >
> > >
> > > On Wed, Nov 20, 2019 at 12:40 AM Brian Byrne 
> > wrote:
> > >
> > > > Hi Deng,
> > > >
> > > > Thanks for the feedback.
> > > >
> > > > On Mon, Nov 18, 2019 at 6:56 PM deng ziming <
> dengziming1...@gmail.com>
> > > > wrote:
> > > >
> > > > > hi, I reviewed the current code, the ProduceMetadata maintains an
> > > expiry
> > > > > threshold for every topic, every time when we write to a topic we
> > will
> > > > set
> > > > > the expiry time to -1 to indicate it should be updated, this does
> > work
> > > to
> > > > > reduce the size of the topic working set, but the producer will
> > > continue
> > > > > fetching metadata for these topics in every metadata request for
> the
> > > full
> > > > > expiry duration.
> > > > >
> > > >
> > > > Indeed, you are correct, I terribly misread the code here.
> Fortunately
> > > this
> > > > was only a minor optimization in the KIP that's no longer necessary.
> > > >
> > > >
> > > > and we can improve the situation by 2 means:
> > > > > 1. we maintain a refresh threshold for every topic which is for
> > > > example
> > > > > 0.8 * expiry_threshold, and when we send `MetadataRequest` to
> brokers
> > > we
> > > > > just request unknownLeaderTopics + unknownPartitionTopics + topics
> > > > > reach refresh threshold.
> > > > >
> > > >
> > > > Right, this is similar to what I suggested, with a larger window on
> the
> > > > "staleness" that permits for batching to an appropriate size (except
> if
> > > > there's any unknown topics, you'd want to issue the request
> > immediately).
> > > >
> > > >
> > > >
> > > > > 2. we don't invoke KafkaProducer#waitOnMetadata when we call
> > > > > KafkaProducer#send because of we just send data to
> RecordAccumulator,
> > > and
> > > > > before we send data to brokers we will invoke
> > > RecordAccumulator#ready(),
> > > > so
> > > > > we can only invoke waitOnMetadata to block when (number topics
> > > > > reach refresh threshold)>(number of all known topics)*0.2.
> > > > >
> > > >
> > > > For new (uncached) topics, one problem here is that we don't know
> which
> > > > partition to map a record to in the event that it has a key or custom
> > > > partitioner, so the RecordAccumulator wouldn't know which
> batch/broker
> > it
> > > > belongs. We'd need an intermediate record queue that subsequently
> moved
> > > the
> > > > records into RecordAccumulators once metadata resolution was
> complete.
> > > For
> > > > known topics, we don't currently block at all in waitOnMetadata.
> > > >
> > > > The last major point of minimizing producer startup metadata RPCs may
> > > still
> > > > need to be improved, but this would be a large improvement on the
> > cu

[jira] [Resolved] (KAFKA-9345) Deploy Mirror Maker 2.0 and dynamically modify the topic whitelist through REST API

2020-01-06 Thread Ryanne Dolan (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryanne Dolan resolved KAFKA-9345.
-
Resolution: Information Provided

> Deploy Mirror Maker 2.0 and dynamically modify the topic whitelist through 
> REST API
> ---
>
> Key: KAFKA-9345
> URL: https://issues.apache.org/jira/browse/KAFKA-9345
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect, mirrormaker
>Affects Versions: 2.4.0
> Environment: runtime env:
> source-cluster:  kafka 2.2.1
> target-cluster:   kafka 2.2.1
> Mirror Maker 2.0 : kafka 2.4.0 
>Reporter: yzhou
>Assignee: yzhou
>Priority: Minor
>
> 1. Which is the best way to deploy mirror maker 2.0?  (a dedicated mm2 
> cluster or running mm2 in a connect cluster) . Could you tell me the 
> difference between them?
> 2. According to the blog or wiki 
> ([https://blog.cloudera.com/a-look-inside-kafka-mirrormaker-2/] , 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0#KIP-382:MirrorMaker2.0-Config,ACLSync]
>   , 
> [https://github.com/apache/kafka/blob/cae2a5e1f0779a0889f6cb43b523ebc8a812f4c2/connect/mirror/README.md]
>     ). Mirror Maker 2.0 supports dynamic modification of the topic whitelist, 
> but I cannot figure out how to do it. Could you tell me how to solve this 
> problem?
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Build failed in Jenkins: kafka-trunk-jdk11 #1051

2020-01-06 Thread Apache Jenkins Server
See 


Changes:

[manikumar.reddy] MINOR: Remove compilation warnings (#7888)


--
[...truncated 5.65 MB...]
org.apache.kafka.streams.TopologyTestDriverTest > 
shouldThrowPatternNotValidForTopicNameException[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUseSourceSpecificDeserializersDeprecated[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUseSourceSpecificDeserializersDeprecated[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPunctuateIfEvenTimeAdvances[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPunctuateIfEvenTimeAdvances[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > shouldInitProcessor[Eos 
enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > shouldInitProcessor[Eos 
enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldThrowForUnknownTopic[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldThrowForUnknownTopic[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPunctuateOnStreamsTime[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPunctuateOnStreamsTime[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldThrowIfInMemoryBuiltInStoreIsAccessedWithUntypedMethod[Eos enabled = 
false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldThrowIfInMemoryBuiltInStoreIsAccessedWithUntypedMethod[Eos enabled = 
false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessFromSourcesThatMatchMultiplePattern[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessFromSourcesThatMatchMultiplePattern[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > shouldPopulateGlobalStore[Eos 
enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > shouldPopulateGlobalStore[Eos 
enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldThrowIfPersistentBuiltInStoreIsAccessedWithUntypedMethod[Eos enabled = 
false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldThrowIfPersistentBuiltInStoreIsAccessedWithUntypedMethod[Eos enabled = 
false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldAllowPrePopulatingStatesStoresWithCachingEnabled[Eos enabled = false] 
STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldAllowPrePopulatingStatesStoresWithCachingEnabled[Eos enabled = false] 
PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnCorrectPersistentStoreTypeOnly[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnCorrectPersistentStoreTypeOnly[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUseSourceSpecificDeserializers[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUseSourceSpecificDeserializers[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > shouldReturnAllStores[Eos 
enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > shouldReturnAllStores[Eos 
enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldNotCreateStateDirectoryForStatelessTopology[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldNotCreateStateDirectoryForStatelessTopology[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnAllStoresNames[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnAllStoresNames[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessConsumerRecordList[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessConsumerRecordList[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUseSinkSpecificSerializers[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUseSinkSpecificSerializers[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldFlushStoreForFirstInput[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldFlushStoreForFirstInput[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessFromSourceThatMatchPattern[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessFromSourceThatMatchPattern[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUpdateStoreForNewKey[Eos enabled = false] STARTED

org.apache.kafk

[jira] [Created] (KAFKA-9367) CRC failure

2020-01-06 Thread Shivangi Singh (Jira)
Shivangi Singh created KAFKA-9367:
-

 Summary: CRC failure 
 Key: KAFKA-9367
 URL: https://issues.apache.org/jira/browse/KAFKA-9367
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 2.0.0
Reporter: Shivangi Singh


We have a 14-node Kafka (2.0.0) cluster.

In our case 

*Leader* : *Broker Id* : 1003 *Ip*: 10.84.198.238
*Replica* : *Broker Id* : 1014 *Ip*: 10.22.2.74

A request was sent from the replica to the leader, and the leader (10.84.198.238) 
logged the following exception:


var/log/kafka/server.log.2019-12-26-00:[2019-12-26 00:13:04,386] ERROR Closing 
socket for 10.84.198.238:6667-10.22.2.74:53118-121025 because of error 
(kafka.network.Processor)
/var/log/kafka/server.log.2019-12-26-00-org.apache.kafka.common.errors.InvalidRequestException:
 Error getting request for apiKey: FETCH, apiVersion: 8, connectionId: 
10.84.198.238:6667-10.22.2.74:53118-121025, listenerName: 
ListenerName(PLAINTEXT), principal: User:ANONYMOUS
/var/log/kafka/server.log.2019-12-26-00-Caused by: 
org.apache.kafka.common.protocol.types.SchemaException: *Error reading field 
'forgotten_topics_data':* Error reading array of size 23668, only 69 bytes 
available
/var/log/kafka/server.log.2019-12-26-00- at 
org.apache.kafka.common.protocol.types.Schema.read(Schema.java:77)
/var/log/kafka/server.log.2019-12-26-00- at 
org.apache.kafka.common.protocol.ApiKeys.parseRequest(ApiKeys.java:290)
/var/log/kafka/server.log.2019-12-26-00- at 
org.apache.kafka.common.requests.RequestContext.parseRequest(RequestContext.java:63)



In response to this, the replica (10.22.2.74) logged the following:

 

[2019-12-26 00:13:04,390] WARN [ReplicaFetcher replicaId=1014, leaderId=1003, 
fetcherId=0] Error in response for fetch request (type=FetchRequest, 
replicaId=1014, maxWait=500, minBytes=1, maxBytes=10485760, 
fetchData={_topic_name_=(offset=50344687, logStartOffset=24957467, 
maxBytes=1048576)}, isolationLevel=READ_UNCOMMITTED, toForget=, 
metadata=(sessionId=1747349875, epoch=183382033)) 
(kafka.server.ReplicaFetcherThread)
java.io.IOException: Connection to 1003 was disconnected before the response 
was read
at 
org.apache.kafka.clients.NetworkClientUtils.sendAndReceive(NetworkClientUtils.java:97)
at 
kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:96)
at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:240)
at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:43)
at 
kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:149)
at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:114)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)



After this, broker 1003 had the following exception:


/var/log/kafka/server.log.2019-12-26-00:[2019-12-26 00:16:37,828] ERROR 
[ReplicaFetcher replicaId=1003, leaderId=1014, fetcherId=0] Found invalid 
messages during fetch for partition _topic_name_ offset 91200983 
(kafka.server.ReplicaFetcherThread)
/var/log/kafka/server.log.2019-12-26-00-*org.apache.kafka.common.record.InvalidRecordException:
 Record is corrupt (stored crc = 1460037823, computed crc = 114378201)*
/var/log/kafka/server.log.2019-12-26-00:[2019-12-26 00:16:40,690] ERROR Closing 
socket for 10.84.198.238:6667-10.22.2.74:49850-740543 because of error 
(kafka.network.Processor)
/var/log/kafka/server.log.2019-12-26-00-org.apache.kafka.common.errors.InvalidRequestException:
 Error getting request for apiKey: FETCH, apiVersion: 8, connectionId: 
10.84.198.238:6667-10.22.2.74:49850-740543, listenerName: 
ListenerName(PLAINTEXT), principal: User:ANONYMOUS


Could you help us with the above issue?

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Build failed in Jenkins: kafka-trunk-jdk8 #4131

2020-01-06 Thread Apache Jenkins Server
See 


Changes:

[manikumar.reddy] MINOR: Remove compilation warnings (#7888)


--
[...truncated 5.63 MB...]

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldNotUpdateStoreForSmallerValue[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldCreateStateDirectoryForStatefulTopology[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldCreateStateDirectoryForStatefulTopology[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPunctuateIfWallClockTimeAdvances[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPunctuateIfWallClockTimeAdvances[Eos enabled = false] PASSED

org.apache.kafka.streams.MockTimeTest > shouldGetNanosAsMillis STARTED

org.apache.kafka.streams.MockTimeTest > shouldGetNanosAsMillis PASSED

org.apache.kafka.streams.MockTimeTest > shouldSetStartTime STARTED

org.apache.kafka.streams.MockTimeTest > shouldSetStartTime PASSED

org.apache.kafka.streams.MockTimeTest > shouldNotAllowNegativeSleep STARTED

org.apache.kafka.streams.MockTimeTest > shouldNotAllowNegativeSleep PASSED

org.apache.kafka.streams.MockTimeTest > shouldAdvanceTimeOnSleep STARTED

org.apache.kafka.streams.MockTimeTest > shouldAdvanceTimeOnSleep PASSED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > shouldReturnIsOpen 
STARTED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > shouldReturnIsOpen 
PASSED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > shouldReturnName 
STARTED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > shouldReturnName 
PASSED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > 
shouldPutWithUnknownTimestamp STARTED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > 
shouldPutWithUnknownTimestamp PASSED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > 
shouldPutWindowStartTimestampWithUnknownTimestamp STARTED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > 
shouldPutWindowStartTimestampWithUnknownTimestamp PASSED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > 
shouldReturnIsPersistent STARTED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > 
shouldReturnIsPersistent PASSED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > shouldForwardClose 
STARTED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > shouldForwardClose 
PASSED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > shouldForwardFlush 
STARTED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > shouldForwardFlush 
PASSED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > shouldForwardInit 
STARTED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > shouldForwardInit 
PASSED

org.apache.kafka.streams.test.TestRecordTest > testConsumerRecord STARTED

org.apache.kafka.streams.test.TestRecordTest > testConsumerRecord PASSED

org.apache.kafka.streams.test.TestRecordTest > testToString STARTED

org.apache.kafka.streams.test.TestRecordTest > testToString PASSED

org.apache.kafka.streams.test.TestRecordTest > testInvalidRecords STARTED

org.apache.kafka.streams.test.TestRecordTest > testInvalidRecords PASSED

org.apache.kafka.streams.test.TestRecordTest > testPartialConstructorEquals 
STARTED

org.apache.kafka.streams.test.TestRecordTest > testPartialConstructorEquals 
PASSED

org.apache.kafka.streams.test.TestRecordTest > testMultiFieldMatcher STARTED

org.apache.kafka.streams.test.TestRecordTest > testMultiFieldMatcher PASSED

org.apache.kafka.streams.test.TestRecordTest > testFields STARTED

org.apache.kafka.streams.test.TestRecordTest > testFields PASSED

org.apache.kafka.streams.test.TestRecordTest > testProducerRecord STARTED

org.apache.kafka.streams.test.TestRecordTest > testProducerRecord PASSED

org.apache.kafka.streams.test.TestRecordTest > testEqualsAndHashCode STARTED

org.apache.kafka.streams.test.TestRecordTest > testEqualsAndHashCode PASSED

> Task :streams:upgrade-system-tests-0100:processTestResources NO-SOURCE
> Task :streams:upgrade-system-tests-0100:testClasses
> Task :streams:upgrade-system-tests-0100:checkstyleTest
> Task :streams:upgrade-system-tests-0100:spotbugsMain NO-SOURCE
> Task :streams:upgrade-system-tests-0100:test
> Task :streams:upgrade-system-tests-0101:compileJava NO-SOURCE
> Task :streams:upgrade-system-tests-0101:processResources NO-SOURCE
> Task :streams:upgrade-system-tests-0101:classes UP-TO-DATE
> Task :streams:upgrade-system-tests-0101:checkstyleMain NO-SOURCE
> Task :streams:upgrade-system-tests-0101:compileTestJava
> Task :streams:upgrade-system-tests-0101:processTestResources NO-SOURCE
> Task :streams:upgrade-system-tests-0101:testClasses
> Task :streams:upgrade-system-tests-0101:checkstyleTest
> Task :streams:upgrade-system-tests-0101:spotbugsMain NO-SOURCE
> Task :streams:upgrade-syst

Re: [DISCUSS] KIP-553: Enable TLSv1.3 by default and disable all protocols except [TLSV1.2, TLSV1.3]

2020-01-06 Thread Brajesh Kumar
Hello Rajini,

What is the plan to run system tests using JDK 11? Is someone working on
this?

On Mon, Jan 6, 2020 at 3:00 PM Rajini Sivaram 
wrote:

> Hi Nikolay,
>
> We can leave the KIP open and restart the discussion once system tests are
> running.
>
> Thanks,
>
> Rajini
>
> On Mon, Jan 6, 2020 at 2:46 PM Николай Ижиков  wrote:
>
> > Hello, Rajini.
> >
> > Thanks, for the feedback.
> >
> > Should I mark this KIP as declined?
> > Or just wait for the system tests results?
> >
> > > 6 янв. 2020 г., в 17:26, Rajini Sivaram 
> > написал(а):
> > >
> > > Hi Nikolay,
> > >
> > > Thanks for the KIP. We currently run system tests using JDK 8 and hence
> > we
> > > don't yet have full system test results with TLS 1.3 which requires JDK
> > 11.
> > > We should wait until that is done before enabling TLS1.3 by default.
> > >
> > > Regards,
> > >
> > > Rajini
> > >
> > >
> > > On Mon, Dec 30, 2019 at 5:36 AM Николай Ижиков 
> > wrote:
> > >
> > >> Hello, Team.
> > >>
> > >> Any feedback on this KIP?
> > >> Do we need this in Kafka?
> > >>
> > >>> 24 дек. 2019 г., в 18:28, Nikolay Izhikov 
> > >> написал(а):
> > >>>
> > >>> Hello,
> > >>>
> > >>> I'd like to start a discussion of KIP.
> > >>> Its goal is to enable TLSv1.3 and disable obsolete versions by
> default.
> > >>>
> > >>>
> > >>
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=142641956
> > >>>
> > >>> Your comments and suggestions are welcome.
> > >>>
> > >>
> > >>
> >
> >
>


-- 
Regards,
Brajesh Kumar


Re: [DISCUSS] KIP-553: Enable TLSv1.3 by default and disable all protocols except [TLSV1.2, TLSV1.3]

2020-01-06 Thread Rajini Sivaram
Hi Nikolay,

We can leave the KIP open and restart the discussion once system tests are
running.

Thanks,

Rajini

On Mon, Jan 6, 2020 at 2:46 PM Николай Ижиков  wrote:

> Hello, Rajini.
>
> Thanks, for the feedback.
>
> Should I mark this KIP as declined?
> Or just wait for the system tests results?
>
> > 6 янв. 2020 г., в 17:26, Rajini Sivaram 
> написал(а):
> >
> > Hi Nikolay,
> >
> > Thanks for the KIP. We currently run system tests using JDK 8 and hence
> we
> > don't yet have full system test results with TLS 1.3 which requires JDK
> 11.
> > We should wait until that is done before enabling TLS1.3 by default.
> >
> > Regards,
> >
> > Rajini
> >
> >
> > On Mon, Dec 30, 2019 at 5:36 AM Николай Ижиков 
> wrote:
> >
> >> Hello, Team.
> >>
> >> Any feedback on this KIP?
> >> Do we need this in Kafka?
> >>
> >>> 24 дек. 2019 г., в 18:28, Nikolay Izhikov 
> >> написал(а):
> >>>
> >>> Hello,
> >>>
> >>> I'd like to start a discussion of KIP.
> >>> Its goal is to enable TLSv1.3 and disable obsolete versions by default.
> >>>
> >>>
> >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=142641956
> >>>
> >>> Your comments and suggestions are welcome.
> >>>
> >>
> >>
>
>


Re: [DISCUSS] KIP-553: Enable TLSv1.3 by default and disable all protocols except [TLSV1.2, TLSV1.3]

2020-01-06 Thread Николай Ижиков
Hello, Rajini.

Thanks, for the feedback.

Should I mark this KIP as declined?
Or just wait for the system tests results?

> 6 янв. 2020 г., в 17:26, Rajini Sivaram  написал(а):
> 
> Hi Nikolay,
> 
> Thanks for the KIP. We currently run system tests using JDK 8 and hence we
> don't yet have full system test results with TLS 1.3 which requires JDK 11.
> We should wait until that is done before enabling TLS1.3 by default.
> 
> Regards,
> 
> Rajini
> 
> 
> On Mon, Dec 30, 2019 at 5:36 AM Николай Ижиков  wrote:
> 
>> Hello, Team.
>> 
>> Any feedback on this KIP?
>> Do we need this in Kafka?
>> 
>>> 24 дек. 2019 г., в 18:28, Nikolay Izhikov 
>> написал(а):
>>> 
>>> Hello,
>>> 
>>> I'd like to start a discussion of KIP.
>>> Its goal is to enable TLSv1.3 and disable obsolete versions by default.
>>> 
>>> 
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=142641956
>>> 
>>> Your comments and suggestions are welcome.
>>> 
>> 
>> 



Re: [DISCUSS] KIP-553: Enable TLSv1.3 by default and disable all protocols except [TLSV1.2, TLSV1.3]

2020-01-06 Thread Rajini Sivaram
Hi Nikolay,

Thanks for the KIP. We currently run system tests using JDK 8 and hence we
don't yet have full system test results with TLS 1.3 which requires JDK 11.
We should wait until that is done before enabling TLS1.3 by default.
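
For context, a cluster that wants TLSv1.3 before the default changes can in
principle opt in explicitly on a TLSv1.3-capable JVM (e.g. JDK 11); whether that
works end-to-end is exactly what the pending system-test runs need to confirm.
A minimal client-side sketch of the relevant settings (paths and password are
placeholders):

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.CommonClientConfigs;
import org.apache.kafka.common.config.SslConfigs;

public class Tls13OptInSketch {
    static Properties sslProps() {
        Properties props = new Properties();
        props.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SSL");
        // What this KIP proposes as the new default; today it must be set explicitly.
        props.put(SslConfigs.SSL_ENABLED_PROTOCOLS_CONFIG, "TLSv1.2,TLSv1.3");
        props.put(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG, "/path/to/truststore.jks");
        props.put(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG, "changeit");
        return props;
    }
}
{code}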

Regards,

Rajini


On Mon, Dec 30, 2019 at 5:36 AM Николай Ижиков  wrote:

> Hello, Team.
>
> Any feedback on this KIP?
> Do we need this in Kafka?
>
> > 24 дек. 2019 г., в 18:28, Nikolay Izhikov 
> написал(а):
> >
> > Hello,
> >
> > I'd like to start a discussion of KIP.
> > Its goal is to enable TLSv1.3 and disable obsolete versions by default.
> >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=142641956
> >
> > Your comments and suggestions are welcome.
> >
>
>


Re: [VOTE] KIP-526: Reduce Producer Metadata Lookups for Large Number of Topics

2020-01-06 Thread Stanislav Kozlovski
+1 (non-binding)

Thanks for the KIP, Brian!

On Thu, Jan 2, 2020 at 7:15 PM Brian Byrne  wrote:

> Hello all,
>
> After further discussion and improvements, I'd like to reinstate the voting
> process.
>
> The updated KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-526
> %3A+Reduce+Producer+Metadata+Lookups+for+Large+Number+of+Topics
> 
>
> The continued discussion:
>
> https://lists.apache.org/thread.html/b2f8f830ef04587144cf0840c7d4811bbf0a14f3c459723dbc5acf9e@%3Cdev.kafka.apache.org%3E
>
> I'd be happy to address any further comments/feedback.
>
> Thanks,
> Brian
>
> On Mon, Dec 9, 2019 at 11:02 PM Guozhang Wang  wrote:
>
> > With the concluded summary on the other discussion thread, I'm +1 on the
> > proposal.
> >
> > Thanks Brian!
> >
> > On Tue, Nov 19, 2019 at 8:00 PM deng ziming 
> > wrote:
> >
> > > >
> > > > For new (uncached) topics, one problem here is that we don't know
> which
> > > > partition to map a record to in the event that it has a key or custom
> > > > partitioner, so the RecordAccumulator wouldn't know which
> batch/broker
> > it
> > > > belongs. We'd need an intermediate record queue that subsequently
> moved
> > > the
> > > > records into RecordAccumulators once metadata resolution was
> complete.
> > > For
> > > > known topics, we don't currently block at all in waitOnMetadata.
> > > >
> > >
> > > You are right, I forget this fact, and the intermediate record queue
> will
> > > help, but I have some questions
> > >
> > > if we add an intermediate record queue in KafkaProducer, when should we
> > > move the records into RecordAccumulators?
> > > only NetworkClient is aware of the MetadataResponse, here is the
> > > hierarchical structure of the related classes:
> > > KafkaProducer
> > > Accumulator
> > > Sender
> > > NetworkClient
> > > metadataUpdater.handleCompletedMetadataResponse
> > >
> > > so
> > > 1. we should also add a metadataUpdater to KafkaProducer?
> > > 2. if the topic really does not exists? the intermediate record queue
> > will
> > > become too large?
> > > 3. and should we `block` when the intermediate record queue is too
> large?
> > > and this will again bring the blocking problem?
> > >
> > >
> > >
> > > On Wed, Nov 20, 2019 at 12:40 AM Brian Byrne 
> > wrote:
> > >
> > > > Hi Deng,
> > > >
> > > > Thanks for the feedback.
> > > >
> > > > On Mon, Nov 18, 2019 at 6:56 PM deng ziming <
> dengziming1...@gmail.com>
> > > > wrote:
> > > >
> > > > > hi, I reviewed the current code, the ProduceMetadata maintains an
> > > expiry
> > > > > threshold for every topic, every time when we write to a topic we
> > will
> > > > set
> > > > > the expiry time to -1 to indicate it should be updated, this does
> > work
> > > to
> > > > > reduce the size of the topic working set, but the producer will
> > > continue
> > > > > fetching metadata for these topics in every metadata request for
> the
> > > full
> > > > > expiry duration.
> > > > >
> > > >
> > > > Indeed, you are correct, I terribly misread the code here.
> Fortunately
> > > this
> > > > was only a minor optimization in the KIP that's no longer necessary.
> > > >
> > > >
> > > > and we can improve the situation by 2 means:
> > > > > 1. we maintain a refresh threshold for every topic which is for
> > > > example
> > > > > 0.8 * expiry_threshold, and when we send `MetadataRequest` to
> brokers
> > > we
> > > > > just request unknownLeaderTopics + unknownPartitionTopics + topics
> > > > > reach refresh threshold.
> > > > >
> > > >
> > > > Right, this is similar to what I suggested, with a larger window on
> the
> > > > "staleness" that permits for batching to an appropriate size (except
> if
> > > > there's any unknown topics, you'd want to issue the request
> > immediately).
> > > >
> > > >
> > > >
> > > > > 2. we don't invoke KafkaProducer#waitOnMetadata when we call
> > > > > KafkaProducer#send because of we just send data to
> RecordAccumulator,
> > > and
> > > > > before we send data to brokers we will invoke
> > > RecordAccumulator#ready(),
> > > > so
> > > > > we can only invoke waitOnMetadata to block when (number topics
> > > > > reach refresh threshold)>(number of all known topics)*0.2.
> > > > >
> > > >
> > > > For new (uncached) topics, one problem here is that we don't know
> which
> > > > partition to map a record to in the event that it has a key or custom
> > > > partitioner, so the RecordAccumulator wouldn't know which
> batch/broker
> > it
> > > > belongs. We'd need an intermediate record queue that subsequently
> moved
> > > the
> > > > records into RecordAccumulators once metadata resolution was
> complete.
> > > For
> > > > known topics, we don't currently block at all in waitOnMetadata.
> > > >
> > > > The last major point of minimizing producer startup metadata RPCs may
> > > still
> > > > need to be improved, but this would be a larg

Re: [DISCUSS] KIP-552: Add interface to handle unused config

2020-01-06 Thread Stanislav Kozlovski
Hey Artur,

Perhaps changing the log level to DEBUG is the simplest approach.

I wonder if other people know what the motivation behind the WARN log was?
I'm struggling to think of a scenario where I'd like to see unused config
values printed at anything above DEBUG.
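
In the meantime, anyone who needs relief today can raise the level of the
per-config-class logger through their logging backend. A minimal sketch,
assuming a log4j 1.x binding (note this also silences the useful INFO dump of
all configs, since both share the same logger, as John's notes quoted below
point out):

{code:java}
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

public class SilenceUnusedConfigWarnings {
    public static void main(String[] args) {
        // Raising this logger to ERROR hides both the "unused config" WARN lines
        // and the INFO dump of all configs for the producer.
        Logger.getLogger("org.apache.kafka.clients.producer.ProducerConfig").setLevel(Level.ERROR);
        // ... construct the KafkaProducer afterwards ...
    }
}
{code}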

Best,
Stanislav

On Mon, Dec 30, 2019 at 12:52 PM Artur Burtsev  wrote:

> Hi,
>
> Indeed changing the log level for the whole AbstractConfig is not an
> option, because logAll is extremely useful.
>
> Grouping warnings into one (with only the count of unused configs) will not be
> a good option for us either. It will still be pretty noisy. Imagine we
> have 32 partitions and scale the application up to 32 instances: then
> we still have 32 warnings per application (instead of 96 now), while we
> would like to have 0 warnings because we are perfectly aware of using
> schema.registry.url and it's totally fine, and we don't have to be
> warned every time we start the application. Now imagine we use more
> than one consumer per application; that adds another
> multiplication factor to these grouped warnings, and we still have a
> lot of them. So I would say grouping doesn't help much.
>
> I think adding an extra logger like
> "org.apache.kafka.clients.producer.ProducerConfig.unused" could be
> another good option. That would leave the existing interface untouched
> and give everyone an option to mute irrelevant warnings.
>
> To summarize, I can still see 3 options, with their pros and cons
> discussed in the thread:
> 1) extra config with interface to handle unused
> 2) change unused warn to debug
> 3) add extra logger for unused
>
> Please let me know what you think.
>
> Thanks,
> Artur
>
> On Mon, Dec 30, 2019 at 11:07 AM Stanislav Kozlovski
>  wrote:
> >
> > Hi all,
> >
> > Would printing all the unused configurations in one line, versus N lines,
> > be more helpful? I know that it would greatly reduce the verbosity in log
> > visualization tools like Kibana while still allowing us to see all the
> > relevant information without the need for an explicit action (e.g
> > changing the log level)
> >
> > Best,
> > Stanislav
> >
> > On Sat, Dec 28, 2019 at 3:13 PM John Roesler 
> wrote:
> >
> > > Hi Artur,
> > >
> > > That’s a good point.
> > >
> > > One thing you can do is log a summary at WARN level, like “27
> > > configurations were ignored. Ignored configurations are logged at DEBUG
> > > level.”
> > >
> > > I looked into the code a little, and these log messages are generated
> in
> > > AbstractConfig (logAll and logUnused). They both use the logger
> associated
> > > with the relevant config class (StreamsConfig, ProducerConfig, etc.).
> The
> > > list of all configs is logged at INFO level, and the list of unused
> configs
> > > is logged at WARN level. This means that it's not possible to silence
> the
> > > unused config messages while still logging the list of all configs. You
> > > could only silence both by setting (for example) ProducerConfig logger
> to
> > > ERROR or OFF.
> > >
> > > If it's desirable to be able to toggle them independently, then you can
> > > create a separate logger for unused configs, named something like
> > > "org.apache.kafka.clients.producer.ProducerConfig.unused". Then, you
> can
> > > leave the log at WARN, so it would continue to be printed by default,
> and
> > > anyone could disable it by setting
> > > "org.apache.kafka.clients.producer.ProducerConfig.unused" to ERROR or
> OFF,
> > > without disturbing the rest of the config log messages.
> > >
> > > It's simpler without the extra logger, but you also get less control.
> Do
> > > you think the extra control is necessary, versus printing a summary at
> WARN
> > > level?
> > > -John
> > >
> > >
> > > On Fri, Dec 27, 2019, at 04:26, Artur Burtsev wrote:
> > > > Hi,
> > > >
> > > > Indeed, changing the log level to debug would be the easiest, and I think
> > > > that would be a good solution. If no one objects, I'm ready to move
> > > > forward with this approach and submit an MR.
> > > >
> > > > The only minor thing I have – having it at debug log level might make
> > > > it a bit less friendly for developers, especially for those who just
> > > > do the first steps in Kafka. For example, if you misspelled the
> > > > property name and trying to understand why things don't do what you
> > > > expect. Having a warning might save some time in this case. Other
> than
> > > > that I cannot see any reasons to have warnings there.
> > > >
> > > > Thanks,
> > > > Artur
> > > >
> > > > On Thu, Dec 26, 2019 at 10:01 PM John Roesler 
> > > wrote:
> > > > >
> > > > > Thanks for the KIP, Artur!
> > > > >
> > > > > For reference, here is the kip:
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-552%3A+Add+interface+to+handle+unused+config
> > > > >
> > > > > I agree, these warnings are kind of a nuisance. Would it be
> feasible
> > > just to leverage log4j in some way to make it easy to filter these
> > > messages? For example, we could move those warnings to debug level, or