Re: [DISCUSS] KIP-354 Time-based log compaction policy

2018-11-06 Thread xiongqi wu
Dong,

Thanks for the comments. I have updated the KIP accordingly.

Below are my replies to your questions:

1. We only calculate this metric for log compactions that are triggered by the
max compaction lag, so we only collect non-negative values. The log cleaner runs
continuously, backing off for some time when there is no work to do.
The max is taken across all log cleaner threads in their latest run, not the
historical max. This is similar to the existing metric "max-clean-time-secs". I
have now mentioned in the KIP that this metric comes from each thread.
Users can look at the historical data to track how the delay changes over time
(similar to other log cleaner metrics).

Another way of defining this metric is "compaction_finish_time -
earliest_timestamp_of_first_uncompacted_segment", so it would not be defined
relative to the max compaction lag. However, the max compaction lag may vary for
different topics, and this definition doesn't really tell how soon a compaction
request is fulfilled after the max compaction lag is exceeded. What do you think?
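For illustration only, here is a rough, self-contained Java sketch of the
per-thread computation described above (all names are made up for the example;
this is not the actual LogCleaner code):

// Hypothetical illustration of the max-compaction-delay value recorded by one
// cleaner thread; not the actual kafka.log.LogCleaner implementation.
public class MaxCompactionDelayExample {

    // Delay for the log a thread just compacted, collected only when that
    // compaction was triggered by max.compaction.lag.ms, so it is never negative.
    static long compactionDelayMs(long nowMs, long earliestUncompactedTimestampMs,
                                  long maxCompactionLagMs) {
        long dueAtMs = earliestUncompactedTimestampMs + maxCompactionLagMs;
        return Math.max(0L, nowMs - dueAtMs);
    }

    public static void main(String[] args) {
        // Example: the oldest un-compacted record is 36h old and the lag limit is
        // 24h, so this cleaning finished 12h past its deadline.
        long now = System.currentTimeMillis();
        long delayMs = compactionDelayMs(now, now - 36L * 3600 * 1000, 24L * 3600 * 1000);
        System.out.println("compaction delay (ms) = " + delayMs);
        // The reported metric is the max of these values across all cleaner threads,
        // taken from their latest run only (similar to max-clean-time-secs).
    }
}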

2. This metric is intended to track whether the latest logs compacted were
selected because of the max compaction lag.
The metric is updated on each log cleaner run. If there are two log cleaner
threads and they both worked on log partitions selected by "max compaction lag"
in their last run, the value of this metric will be 2.
The previous metric doesn't provide this information when there is more than
one log cleaner thread.
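As a rough sketch of this second metric (again with made-up names, not the real
implementation), the gauge simply counts, across the cleaner threads' latest
runs, how many cleanings were triggered by the max-lag rule:

import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of num-logs-compacted-by-max-compaction-lag.
public class MaxLagCompactionCountExample {

    // One flag per cleaner thread: was the log cleaned in its most recent run
    // selected because of max.compaction.lag.ms?
    static long numLogsCompactedByMaxCompactionLag(List<Boolean> latestRunTriggeredByMaxLag) {
        return latestRunTriggeredByMaxLag.stream().filter(b -> b).count();
    }

    public static void main(String[] args) {
        // Two cleaner threads whose latest runs were both driven by the max lag
        // rule give a metric value of 2, matching the example above.
        System.out.println(numLogsCompactedByMaxCompactionLag(Arrays.asList(true, true)));
    }
}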

3. I meant to say that a partition is required to be picked up by log compaction
after this max lag. The actual compaction finish time may still vary, since the
log cleaner may take time to finish compacting this partition, or it may work on
other partitions first.
"Guarantee" may be misleading; I have updated the KIP.

4. It is determined based on the cleaner checkpoint file. This KIP doesn't
change how the broker determines the un-compacted segments.
5. Done.
6. Why should we make this feature depend on the message timestamp?
"segment.largestTimestamp - maxSegmentMs" is a reasonable estimate for detecting
a violation of the max compaction lag, and this estimate is only needed if the
first segment of a log partition is un-compacted.
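To make the estimate concrete, here is a small Java sketch (hypothetical names,
not the broker code) of how "segment.largestTimestamp - maxSegmentMs" bounds the
age of the first un-compacted segment and flags a max-lag violation:

// Hypothetical illustration of the estimate discussed above.
public class EarliestTimestampEstimateExample {

    // Estimate of when the earliest record in the first un-compacted segment was
    // produced: the segment's largest timestamp minus the maximum time a segment
    // can stay active (segment.ms).
    static long estimatedEarliestTimestampMs(long segmentLargestTimestampMs, long maxSegmentMs) {
        return segmentLargestTimestampMs - maxSegmentMs;
    }

    // The log must be picked up for compaction once that estimate is older than
    // max.compaction.lag.ms.
    static boolean mustCompact(long nowMs, long segmentLargestTimestampMs,
                               long maxSegmentMs, long maxCompactionLagMs) {
        return nowMs - estimatedEarliestTimestampMs(segmentLargestTimestampMs, maxSegmentMs)
                > maxCompactionLagMs;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // Largest timestamp in the segment is 30h old and segment.ms = 6h, so the
        // earliest record is estimated at ~36h old; with a 24h lag limit the log is due.
        System.out.println(mustCompact(now, now - 30L * 3600 * 1000,
                                       6L * 3600 * 1000, 24L * 3600 * 1000));
    }
}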
7. I removed the unrelated part and specifically mentioned that the added
metric "num-logs-compacted-by-max-compaction-lag" can be used to measure this
performance impact.

Xiongqi (Wesley) Wu


On Tue, Nov 6, 2018 at 6:50 PM Dong Lin  wrote:

> Hey Xiongqi,
>
> Thanks for the update. A few more comments below
>
> 1) According to the definition of
> kafka.log:type=LogCleaner,name=max-compaction-delay, it seems that the
> metric value will be a large negative number if max.compaction.lag.ms is
> MAX_LONG. Would this be a problem? Also, it seems weird that the value of
> the metric is defined w.r.t. how often the log cleaner is run.
>
> 2) Not sure if we need the metric num-logs-compacted-by-max-compaction-lag
> in addition to max-compaction-delay. It seems that the operator can just use
> max-compaction-delay to determine whether the max.compaction.lag is
> properly enforced in a quantitative manner. Also, the metric name
> `num-logs-compacted-by-max-compaction-lag` is inconsistent with its
> intended meaning, i.e. the number of logs that need to be compacted due to
> max.compaction.lag but not yet compacted. So it is probably simpler to just
> remove this metric.
>
> 3) The KIP currently says that "a message record has a guaranteed
> upper-bound in time to become mandatory for compaction". The word
> "guarantee" may be misleading because the message may still not be
> compacted within max.compaction.lag after its creation. Could you clarify
> the exact semantics of the max.compaction.lag.ms in the Public Interface
> section?
>
> 4) The KIP's proposed change will estimate earliest message timestamp for
> un-compacted log segments. Can you explain how broker determines whether a
> segment has been compacted after the broker is restarted?
>
> 5) 2.b in the Proposed Change section provides two ways to get the timestamp. To
> make the KIP easier to read for future reference, could we just mention the
> method that we plan to use and move the other solution to the rejected
> alternatives section?
>
> 6) Based on the discussion (i.e. point 2 in the previous email), it is said
> that we can assume all messages have a timestamp and that the feature added in
> this KIP can be skipped for those messages which do not have a timestamp. So
> do we still need to use "segment.largestTimestamp - maxSegmentMs" in
> Proposed Change section 2.a?
>
> 7) Based on the discussion (i.e. point 8 in the previous email), if this
> KIP requires users to monitor certain existing metrics for the performance
> impact added in this KIP, can we list the metrics in the KIP for users'
> convenience?
>
>
> Thanks,
> Dong
>
> On Mon, Oct 29, 2018 at 3:16 PM xiongqi wu  wrote:
>
> > Hi Dong,
> > I have updated the KIP to address your comments.
> > One correction to previous Email:
> > after offline discussion with Dong,  we decide to use MAX_LONG as default
> > 

Re: [DISCUSS] KIP-345: Reduce multiple consumer rebalances by specifying member id

2018-11-06 Thread Boyang Chen
Thanks Matthias for bringing this awesome proposal up! I shall take a deeper 
look and make a comparison between the two proposals.


Meanwhile, for the scale-down case specifically in stateful streaming, we could 
actually introduce a new status called "learner", where the newly started hosts 
first try to catch up with the assigned task progress before triggering the 
rebalance, so that we don't see a sudden dip in progress. However, this is built 
on top of the success of KIP-345.



From: Matthias J. Sax 
Sent: Wednesday, November 7, 2018 7:02 AM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-345: Reduce multiple consumer rebalances by 
specifying member id

Hey,

there was quite a pause on this KIP discussion, and in the meantime a
new design for incremental cooperative rebalancing was suggested:

https://cwiki.apache.org/confluence/display/KAFKA/Incremental+Cooperative+Rebalancing%3A+Support+and+Policies


We should make sure that the proposal and this KIP align to each other.
Thoughts?


-Matthias

On 11/5/18 7:31 PM, Boyang Chen wrote:
> Hey Mike,
>
>
> thanks for the feedback, the two questions are very thoughtful!
>
>
>> 1) I am a little confused about the distinction for the leader. If the 
>> consumer node that was assigned leader does a bounce (goes down and quickly 
>> comes up) to update application code, will a rebalance be triggered? I do 
>> not think a bounce of the leader should trigger a rebalance.
>
> For Q1 my intention was to minimize the change within one KIP, since the 
> leader rejoining case could be addressed separately.
>
>
>> 2) The timeout for shrink up makes a lot of sense and allows us to gracefully 
>> increase the number of nodes in the cluster. I think we need to support 
>> graceful shrink down as well. If I set the registration timeout to 5 minutes 
>> to handle rolling restarts or intermittent failures without shuffling 
>> state, I don't want to wait 5 minutes in order for the group to rebalance if 
>> I am intentionally removing a node from the cluster. I am not sure the best 
>> way to do this. One idea I had was adding the ability for a CLI or Admin 
>> API to force a rebalance of the group. This would allow for an admin to 
>> trigger the rebalance manually without waiting the entire registration 
>> timeout on shrink down. What do you think?
>
> For 2), my understanding is that the scale-down case is better addressed 
> by a CLI tool than by code logic, since only by human evaluation can we 
> decide whether it is the "right timing" -- the time when all the scaling 
> down consumers are offline -- to kick in a rebalance. Unless we introduce 
> another term on the coordinator which indicates the target consumer group size, 
> the broker will find it hard to decide when to start a rebalance. So far I prefer 
> to hold off on the implementation for that, but I agree we could discuss whether we 
> want to introduce an admin API in this KIP or a separate one.
>
>
> Thanks again for the proposed ideas!
>
>
> Boyang
>
> 
> From: Mike Freyberger 
> Sent: Monday, November 5, 2018 6:13 AM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-345: Reduce multiple consumer rebalances by 
> specifying member id
>
> Boyang,
>
> Thanks for updating the KIP. It's shaping up well. Two things:
>
> 1) I am a little confused about the distinction for the leader. If the 
> consumer node that was assigned leader does a bounce (goes down and quickly 
> comes up) to update application code, will a rebalance be triggered? I do not 
> think a bounce of the leader should trigger a rebalance.
>
> 2) The timeout for shrink up makes a lot of sense and allows us to gracefully 
> increase the number of nodes in the cluster. I think we need to support 
> graceful shrink down as well. If I set the registration timeout to 5 minutes 
> to handle rolling restarts or intermittent failures without shuffling state, 
> I don't want to wait 5 minutes in order for the group to rebalance if I am 
> intentionally removing a node from the cluster. I am not sure the best way to 
> do this. One idea I had was adding the ability for a CLI or Admin API to 
> force a rebalance of the group. This would allow for an admin to trigger the 
> rebalance manually without waiting the entire registration timeout on shrink 
> down. What do you think?
>
> Mike
>
> On 10/30/18, 1:55 AM, "Boyang Chen"  wrote:
>
> Btw, I updated KIP 345 based on my understanding. Feel free to take 
> another round of look:
>
> 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-345%3A+Introduce+static+membership+protocol+to+reduce+consumer+rebalances

Build failed in Jenkins: kafka-trunk-jdk8 #3184

2018-11-06 Thread Apache Jenkins Server
See 


Changes:

[junrao]  KAFKA-7537: Avoid sending full UpdateMetadataRequest to existing

--
[...truncated 2.73 MB...]
org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueAndTimestampIsEqualForCompareValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueAndTimestampIsEqualForCompareValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullExpectedRecordForCompareKeyValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullExpectedRecordForCompareKeyValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareValueTimestampWithProducerRecord STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareValueTimestampWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullExpectedRecordForCompareKeyValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullExpectedRecordForCompareKeyValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareValueWithProducerRecord STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareValueWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueIsEqualForCompareValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueIsEqualForCompareValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordWithExpectedRecordForCompareKeyValueTimestamp 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordWithExpectedRecordForCompareKeyValueTimestamp 
PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentForCompareKeyValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentForCompareKeyValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordWithExpectedRecordForCompareKeyValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordWithExpectedRecordForCompareKeyValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullForCompareKeyValueWithProducerRecord STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullForCompareKeyValueWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueAndTimestampIsEqualForCompareKeyValueTimestampWithProducerRecord
 STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueAndTimestampIsEqualForCompareKeyValueTimestampWithProducerRecord
 PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueIsEqualWithNullForCompareValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueIsEqualWithNullForCompareValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueAndTimestampIsEqualWithNullForCompareValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueAndTimestampIsEqualWithNullForCompareValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareValueTimestampWithProducerRecord 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareValueTimestampWithProducerRecord 
PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareValueWithProducerRecord STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareValueWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullReversForCompareKeyValueTimestampWithProducerRecord
 STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullReversForCompareKeyValueTimestampWithProducerRecord
 PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullReversForCompareKeyValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullReversForCompareKeyValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueIsEqualForCompareKeyValueWithProducerRecord STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 

Build failed in Jenkins: kafka-2.1-jdk8 #51

2018-11-06 Thread Apache Jenkins Server
See 


Changes:

[jason] MINOR: Modify Connect service's startup timeout to be passed via the

[lindong28] KAFKA-7481; Add upgrade/downgrade notes for 2.1.x

--
[...truncated 897.77 KB...]

kafka.server.KafkaConfigTest > testCaseInsensitiveListenerProtocol STARTED

kafka.server.KafkaConfigTest > testCaseInsensitiveListenerProtocol PASSED

kafka.server.KafkaConfigTest > testListenerAndAdvertisedListenerNames STARTED

kafka.server.KafkaConfigTest > testListenerAndAdvertisedListenerNames PASSED

kafka.server.KafkaConfigTest > testNonroutableAdvertisedListeners STARTED

kafka.server.KafkaConfigTest > testNonroutableAdvertisedListeners PASSED

kafka.server.KafkaConfigTest > 
testInterBrokerListenerNameAndSecurityProtocolSet STARTED

kafka.server.KafkaConfigTest > 
testInterBrokerListenerNameAndSecurityProtocolSet PASSED

kafka.server.KafkaConfigTest > testFromPropsInvalid STARTED

kafka.server.KafkaConfigTest > testFromPropsInvalid PASSED

kafka.server.KafkaConfigTest > testInvalidCompressionType STARTED

kafka.server.KafkaConfigTest > testInvalidCompressionType PASSED

kafka.server.KafkaConfigTest > testAdvertiseHostNameDefault STARTED

kafka.server.KafkaConfigTest > testAdvertiseHostNameDefault PASSED

kafka.server.KafkaConfigTest > testLogRetentionTimeMinutesProvided STARTED

kafka.server.KafkaConfigTest > testLogRetentionTimeMinutesProvided PASSED

kafka.server.KafkaConfigTest > testValidCompressionType STARTED

kafka.server.KafkaConfigTest > testValidCompressionType PASSED

kafka.server.KafkaConfigTest > testUncleanElectionInvalid STARTED

kafka.server.KafkaConfigTest > testUncleanElectionInvalid PASSED

kafka.server.KafkaConfigTest > testListenerNamesWithAdvertisedListenerUnset 
STARTED

kafka.server.KafkaConfigTest > testListenerNamesWithAdvertisedListenerUnset 
PASSED

kafka.server.KafkaConfigTest > testLogRetentionTimeBothMinutesAndMsProvided 
STARTED

kafka.server.KafkaConfigTest > testLogRetentionTimeBothMinutesAndMsProvided 
PASSED

kafka.server.KafkaConfigTest > testLogRollTimeMsProvided STARTED

kafka.server.KafkaConfigTest > testLogRollTimeMsProvided PASSED

kafka.server.KafkaConfigTest > testUncleanLeaderElectionDefault STARTED

kafka.server.KafkaConfigTest > testUncleanLeaderElectionDefault PASSED

kafka.server.KafkaConfigTest > testInvalidAdvertisedListenersProtocol STARTED

kafka.server.KafkaConfigTest > testInvalidAdvertisedListenersProtocol PASSED

kafka.server.KafkaConfigTest > testUncleanElectionEnabled STARTED

kafka.server.KafkaConfigTest > testUncleanElectionEnabled PASSED

kafka.server.KafkaConfigTest > testInterBrokerVersionMessageFormatCompatibility 
STARTED

kafka.server.KafkaConfigTest > testInterBrokerVersionMessageFormatCompatibility 
PASSED

kafka.server.KafkaConfigTest > testAdvertisePortDefault STARTED

kafka.server.KafkaConfigTest > testAdvertisePortDefault PASSED

kafka.server.KafkaConfigTest > testVersionConfiguration STARTED

kafka.server.KafkaConfigTest > testVersionConfiguration PASSED

kafka.server.KafkaConfigTest > testEqualAdvertisedListenersProtocol STARTED

kafka.server.KafkaConfigTest > testEqualAdvertisedListenersProtocol PASSED

kafka.server.ListOffsetsRequestTest > testListOffsetsErrorCodes STARTED

kafka.server.ListOffsetsRequestTest > testListOffsetsErrorCodes PASSED

kafka.server.ListOffsetsRequestTest > testCurrentEpochValidation STARTED

kafka.server.ListOffsetsRequestTest > testCurrentEpochValidation PASSED

kafka.server.CreateTopicsRequestTest > testValidCreateTopicsRequests STARTED

kafka.server.CreateTopicsRequestTest > testValidCreateTopicsRequests PASSED

kafka.server.CreateTopicsRequestTest > testErrorCreateTopicsRequests STARTED

kafka.server.CreateTopicsRequestTest > testErrorCreateTopicsRequests PASSED

kafka.server.CreateTopicsRequestTest > testInvalidCreateTopicsRequests STARTED

kafka.server.CreateTopicsRequestTest > testInvalidCreateTopicsRequests PASSED

kafka.server.CreateTopicsRequestTest > testNotController STARTED

kafka.server.CreateTopicsRequestTest > testNotController PASSED

kafka.server.CreateTopicsRequestWithPolicyTest > testValidCreateTopicsRequests 
STARTED

kafka.server.CreateTopicsRequestWithPolicyTest > testValidCreateTopicsRequests 
PASSED

kafka.server.CreateTopicsRequestWithPolicyTest > testErrorCreateTopicsRequests 
STARTED

kafka.server.CreateTopicsRequestWithPolicyTest > testErrorCreateTopicsRequests 
PASSED

kafka.server.FetchRequestDownConversionConfigTest > 
testV1FetchWithDownConversionDisabled STARTED

kafka.server.FetchRequestDownConversionConfigTest > 
testV1FetchWithDownConversionDisabled PASSED

kafka.server.FetchRequestDownConversionConfigTest > testV1FetchFromReplica 
STARTED

kafka.server.FetchRequestDownConversionConfigTest > testV1FetchFromReplica 
PASSED

kafka.server.FetchRequestDownConversionConfigTest > 
testLatestFetchWithDownConversionDisabled STARTED


Jenkins build is back to normal : kafka-trunk-jdk11 #81

2018-11-06 Thread Apache Jenkins Server
See 




Re: [DISCUSS] KIP-354 Time-based log compaction policy

2018-11-06 Thread Dong Lin
Hey Xiongqi,

Thanks for the update. A few more comments below

1) According to the definition of
kafka.log:type=LogCleaner,name=max-compaction-delay, it seems that the
metric value will be a large negative number if max.compaction.lag.ms is
MAX_LONG. Would this be a problem? Also, it seems weird that the value of
the metric is defined w.r.t. how often the log cleaner is run.

2) Not sure if we need the metric num-logs-compacted-by-max-compaction-lag
in addition to max-compaction-delay. It seems that the operator can just use
max-compaction-delay to determine whether the max.compaction.lag is
properly enforced in a quantitative manner. Also, the metric name
`num-logs-compacted-by-max-compaction-lag` is inconsistent with its
intended meaning, i.e. the number of logs that need to be compacted due to
max.compaction.lag but not yet compacted. So it is probably simpler to just
remove this metric.

3) The KIP currently says that "a message record has a guaranteed
upper-bound in time to become mandatory for compaction". The word
"guarantee" may be misleading because the message may still not be
compacted within max.compaction.lag after its creation. Could you clarify
the exact semantics of the max.compaction.lag.ms in the Public Interface
section?

4) The KIP's proposed change will estimate earliest message timestamp for
un-compacted log segments. Can you explain how broker determines whether a
segment has been compacted after the broker is restarted?

5) 2.b in the Proposed Change section provides two ways to get the timestamp. To
make the KIP easier to read for future reference, could we just mention the
method that we plan to use and move the other solution to the rejected
alternatives section?

6) Based on the discussion (i.e. point 2 in the previous email), it is said
that we can assume all messages have a timestamp and that the feature added in
this KIP can be skipped for those messages which do not have a timestamp. So
do we still need to use "segment.largestTimestamp - maxSegmentMs" in
Proposed Change section 2.a?

7) Based on the discussion (i.e. point 8 in the previous email), if this
KIP requires users to monitor certain existing metrics for the performance
impact added in this KIP, can we list the metrics in the KIP for users'
convenience?


Thanks,
Dong

On Mon, Oct 29, 2018 at 3:16 PM xiongqi wu  wrote:

> Hi Dong,
> I have updated the KIP to address your comments.
> One correction to previous Email:
> after offline discussion with Dong,  we decide to use MAX_LONG as default
> value for max.compaction.lag.ms.
>
>
> Xiongqi (Wesley) Wu
>
>
> On Mon, Oct 29, 2018 at 12:15 PM xiongqi wu  wrote:
>
> > Hi Dong,
> >
> > Thank you for your comment.  See my inline comments.
> > I will update the KIP shortly.
> >
> > Xiongqi (Wesley) Wu
> >
> >
> > On Sun, Oct 28, 2018 at 9:17 PM Dong Lin  wrote:
> >
> >> Hey Xiongqi,
> >>
> >> Sorry for late reply. I have some comments below:
> >>
> >> 1) As discussed earlier in the email list, if the topic is configured
> with
> >> both deletion and compaction, in some cases messages produced a long
> time
> >> ago can not be deleted based on time. This is a valid use-case because
> we
> >> actually have topic which is configured with both deletion and
> compaction
> >> policy. And we should enforce the semantics for both policies. Solution A
> >> sounds good. We do not need interface change (e.g. extra config) to
> >> enforce
> >> solution A. All we need is to update implementation so that when broker
> >> compacts a topic, if the message has timestamp (which is the common
> case),
> >> messages that are too old (based on the time-based retention config)
> will
> >> be discarded. Since this is a valid issue and it is also related to the
> >> guarantee of when a message can be deleted, can we include the solution
> of
> >> this problem in the KIP?
> >>
> > ==  This makes sense.  We can use a similar approach to increase the log
> > start offset.
> >
> >>
> >> 2) It is probably OK to assume that all messages have a timestamp. The
> >> per-message timestamp was introduced into Kafka 0.10.0 with KIP-31 and
> >> KIP-32 as of Feb 2016. Kafka 0.10.0 or earlier versions are no longer
> >> supported. Also, since the use-case for this feature is primarily for
> >> GDPR,
> >> we can assume that client library has already been upgraded to support
> >> SSL,
> >> which feature is added after KIP-31 and KIP-32.
> >>
> >>  =>  Ok. We can use message timestamp to delete expired records
> > if both compaction and retention are enabled.
> >
> >
> > 3) In Proposed Change section 2.a, it is said that
> segment.largestTimestamp
> >> - maxSegmentMs can be used to determine the timestamp of the earliest
> >> message. Would it be simpler to just use the create time of the file to
> >> determine the time?
> >>
> >> >  Linux/Java doesn't provide an API for file creation time, because
> > some filesystem types don't provide file creation time.
> >
> >
> >> 4) The KIP suggests to use must-clean-ratio to select the 

[jira] [Resolved] (KAFKA-7313) StopReplicaRequest should attempt to remove future replica for the partition only if future replica exists

2018-11-06 Thread Dong Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin resolved KAFKA-7313.
-
Resolution: Fixed

> StopReplicaRequest should attempt to remove future replica for the partition 
> only if future replica exists
> --
>
> Key: KAFKA-7313
> URL: https://issues.apache.org/jira/browse/KAFKA-7313
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Dong Lin
>Assignee: Dong Lin
>Priority: Major
> Fix For: 2.0.1, 2.1.0
>
>
> This patch fixes two issues:
> 1) Currently, if a broker receives StopReplicaRequests with delete=true for the 
> same offline replica more than once, the first StopReplicaRequest will return 
> KafkaStorageException and the second StopReplicaRequest will return 
> ReplicaNotAvailableException. This is because the first StopReplicaRequest 
> removes the mapping (tp -> ReplicaManager.OfflinePartition) from 
> ReplicaManager.allPartitions before returning KafkaStorageException, so the 
> second StopReplicaRequest will not find this partition as offline.
> This result appears to be inconsistent. And since the replica is already 
> offline and the broker will not be able to delete files for this replica, the 
> StopReplicaRequest should fail without making any change and the broker should 
> still remember that this replica is offline. 
> 2) Currently, if a broker receives a StopReplicaRequest with delete=true, the 
> broker will attempt to remove the future replica for the partition, which will 
> cause KafkaStorageException in the StopReplicaResponse if this partition does 
> not have a future replica. It is problematic to always return 
> KafkaStorageException in the response if the future replica does not exist.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: kafka-trunk-jdk8 #3183

2018-11-06 Thread Apache Jenkins Server
See 


Changes:

[jason] MINOR: Modify Connect service's startup timeout to be passed via the

[lindong28] KAFKA-7481; Add upgrade/downgrade notes for 2.1.x

--
[...truncated 2.73 MB...]
org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueIsEqualForCompareKeyValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordForCompareKeyValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordForCompareKeyValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueIsEqualWithNullForCompareKeyValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueIsEqualWithNullForCompareKeyValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueAndTimestampIsEqualForCompareValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueAndTimestampIsEqualForCompareValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullExpectedRecordForCompareKeyValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullExpectedRecordForCompareKeyValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareValueTimestampWithProducerRecord STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareValueTimestampWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullExpectedRecordForCompareKeyValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullExpectedRecordForCompareKeyValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareValueWithProducerRecord STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareValueWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueIsEqualForCompareValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueIsEqualForCompareValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordWithExpectedRecordForCompareKeyValueTimestamp 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordWithExpectedRecordForCompareKeyValueTimestamp 
PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentForCompareKeyValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentForCompareKeyValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordWithExpectedRecordForCompareKeyValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordWithExpectedRecordForCompareKeyValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullForCompareKeyValueWithProducerRecord STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullForCompareKeyValueWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueAndTimestampIsEqualForCompareKeyValueTimestampWithProducerRecord
 STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueAndTimestampIsEqualForCompareKeyValueTimestampWithProducerRecord
 PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueIsEqualWithNullForCompareValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueIsEqualWithNullForCompareValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueAndTimestampIsEqualWithNullForCompareValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueAndTimestampIsEqualWithNullForCompareValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareValueTimestampWithProducerRecord 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareValueTimestampWithProducerRecord 
PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareValueWithProducerRecord STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareValueWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 

Build failed in Jenkins: kafka-2.0-jdk8 #184

2018-11-06 Thread Apache Jenkins Server
See 


Changes:

[jason] MINOR: Modify Connect service's startup timeout to be passed via the

--
[...truncated 2.50 MB...]

org.apache.kafka.streams.TopologyTest > shouldNotAddNullStateStoreSupplier 
PASSED

org.apache.kafka.streams.TopologyTest > 
shouldNotAllowToAddStateStoreToNonExistingProcessor STARTED

org.apache.kafka.streams.TopologyTest > 
shouldNotAllowToAddStateStoreToNonExistingProcessor PASSED

org.apache.kafka.streams.TopologyTest > 
sourceAndProcessorShouldHaveSingleSubtopology STARTED

org.apache.kafka.streams.TopologyTest > 
sourceAndProcessorShouldHaveSingleSubtopology PASSED

org.apache.kafka.streams.TopologyTest > 
tableNamedMaterializedCountShouldPreserveTopologyStructure STARTED

org.apache.kafka.streams.TopologyTest > 
tableNamedMaterializedCountShouldPreserveTopologyStructure PASSED

org.apache.kafka.streams.TopologyTest > 
shouldNotAllowNullStoreNameWhenConnectingProcessorAndStateStores STARTED

org.apache.kafka.streams.TopologyTest > 
shouldNotAllowNullStoreNameWhenConnectingProcessorAndStateStores PASSED

org.apache.kafka.streams.TopologyTest > 
shouldNotAllowNullNameWhenAddingSourceWithTopic STARTED

org.apache.kafka.streams.TopologyTest > 
shouldNotAllowNullNameWhenAddingSourceWithTopic PASSED

org.apache.kafka.streams.TopologyTest > 
sourceAndProcessorWithStateShouldHaveSingleSubtopology STARTED

org.apache.kafka.streams.TopologyTest > 
sourceAndProcessorWithStateShouldHaveSingleSubtopology PASSED

org.apache.kafka.streams.TopologyTest > shouldNotAllowNullTopicWhenAddingSink 
STARTED

org.apache.kafka.streams.TopologyTest > shouldNotAllowNullTopicWhenAddingSink 
PASSED

org.apache.kafka.streams.TopologyTest > 
shouldNotAllowToAddGlobalStoreWithSourceNameEqualsProcessorName STARTED

org.apache.kafka.streams.TopologyTest > 
shouldNotAllowToAddGlobalStoreWithSourceNameEqualsProcessorName PASSED

org.apache.kafka.streams.TopologyTest > 
singleSourceWithListOfTopicsShouldHaveSingleSubtopology STARTED

org.apache.kafka.streams.TopologyTest > 
singleSourceWithListOfTopicsShouldHaveSingleSubtopology PASSED

org.apache.kafka.streams.TopologyTest > 
sourceWithMultipleProcessorsShouldHaveSingleSubtopology STARTED

org.apache.kafka.streams.TopologyTest > 
sourceWithMultipleProcessorsShouldHaveSingleSubtopology PASSED

org.apache.kafka.streams.TopologyTest > 
shouldNotAllowZeroStoreNameWhenConnectingProcessorAndStateStores STARTED

org.apache.kafka.streams.TopologyTest > 
shouldNotAllowZeroStoreNameWhenConnectingProcessorAndStateStores PASSED

org.apache.kafka.streams.TopologyTest > 
kTableNamedMaterializedMapValuesShouldPreserveTopologyStructure STARTED

org.apache.kafka.streams.TopologyTest > 
kTableNamedMaterializedMapValuesShouldPreserveTopologyStructure PASSED

org.apache.kafka.streams.TopologyTest > 
timeWindowZeroArgCountShouldPreserveTopologyStructure STARTED

org.apache.kafka.streams.TopologyTest > 
timeWindowZeroArgCountShouldPreserveTopologyStructure PASSED

org.apache.kafka.streams.TopologyTest > 
multipleSourcesWithProcessorsShouldHaveDistinctSubtopologies STARTED

org.apache.kafka.streams.TopologyTest > 
multipleSourcesWithProcessorsShouldHaveDistinctSubtopologies PASSED

org.apache.kafka.streams.TopologyTest > shouldThrowOnUnassignedStateStoreAccess 
STARTED

org.apache.kafka.streams.TopologyTest > shouldThrowOnUnassignedStateStoreAccess 
PASSED

org.apache.kafka.streams.TopologyTest > 
shouldNotAllowNullProcessorNameWhenConnectingProcessorAndStateStores STARTED

org.apache.kafka.streams.TopologyTest > 
shouldNotAllowNullProcessorNameWhenConnectingProcessorAndStateStores PASSED

org.apache.kafka.streams.TopologyTest > 
kTableAnonymousMaterializedMapValuesShouldPreserveTopologyStructure STARTED

org.apache.kafka.streams.TopologyTest > 
kTableAnonymousMaterializedMapValuesShouldPreserveTopologyStructure PASSED

org.apache.kafka.streams.TopologyTest > 
sessionWindowZeroArgCountShouldPreserveTopologyStructure STARTED

org.apache.kafka.streams.TopologyTest > 
sessionWindowZeroArgCountShouldPreserveTopologyStructure PASSED

org.apache.kafka.streams.TopologyTest > 
tableAnonymousMaterializedCountShouldPreserveTopologyStructure STARTED

org.apache.kafka.streams.TopologyTest > 
tableAnonymousMaterializedCountShouldPreserveTopologyStructure PASSED

org.apache.kafka.streams.TopologyTest > shouldDescribeGlobalStoreTopology 
STARTED

org.apache.kafka.streams.TopologyTest > shouldDescribeGlobalStoreTopology PASSED

org.apache.kafka.streams.TopologyTest > 
kTableNonMaterializedMapValuesShouldPreserveTopologyStructure STARTED

org.apache.kafka.streams.TopologyTest > 
kTableNonMaterializedMapValuesShouldPreserveTopologyStructure PASSED

org.apache.kafka.streams.TopologyTest > 
kGroupedStreamAnonymousMaterializedCountShouldPreserveTopologyStructure STARTED

org.apache.kafka.streams.TopologyTest > 
kGroupedStreamAnonymousMaterializedCountShouldPreserveTopologyStructure 

Re: [VOTE] KIP-354 Time-based log compaction policy

2018-11-06 Thread xiongqi wu
bump
Xiongqi (Wesley) Wu


On Thu, Sep 27, 2018 at 4:20 PM xiongqi wu  wrote:

>
> Thanks Eno, Brett, Dong, Guozhang, Colin,  and Xiaohe for feedback.
> Can I have more feedback or VOTE on this KIP?
>
>
> Xiongqi (Wesley) Wu
>
>
> On Wed, Sep 19, 2018 at 10:52 AM xiongqi wu  wrote:
>
>> Any other votes or comments?
>>
>> Xiongqi (Wesley) Wu
>>
>>
>> On Tue, Sep 11, 2018 at 11:45 AM xiongqi wu  wrote:
>>
>>> Yes, more votes and code review.
>>>
>>> Xiongqi (Wesley) Wu
>>>
>>>
>>> On Mon, Sep 10, 2018 at 11:37 PM Brett Rann 
>>> wrote:
>>>
 +1 (non binding) from me on 0 then, and on the KIP.

 Where do we go from here? More votes?

 On Tue, Sep 11, 2018 at 5:34 AM Colin McCabe 
 wrote:

 > On Mon, Sep 10, 2018, at 11:44, xiongqi wu wrote:
 > > Thank you for comments. I will use '0' for now.
 > >
 > > If we create topics through admin client in the future, we might
 perform
 > > some useful checks. (but the assumption is all brokers in the same
 > cluster
 > > have the same default configuration values, otherwise, it might still be
 > > tricky to do such a cross-validation check.)
 >
 > This isn't something that we might do in the future-- this is
 something we
 > are doing now. We already have Create Topic policies which are
 enforced by
 > the broker. Check KIP-108 and KIP-170 for details. This is one of the
 > motivations for getting rid of direct ZK access-- making sure that
 these
 > policies are applied.
 >
 > I agree that having different configurations on different brokers can
 be
 > confusing and frustrating . That's why more configurations are being
 made
 > dynamic using KIP-226. Dynamic configurations are stored centrally in
 ZK,
 > so they are the same on all brokers (modulo propagation delays). In
 any
 > case, this is a general issue, not specific to "create topics".
 >
 > cheers,
 > Colin
 >
 >
 > >
 > >
 > > Xiongqi (Wesley) Wu
 > >
 > >
 > > On Mon, Sep 10, 2018 at 11:15 AM Colin McCabe 
 > wrote:
 > >
 > > > I don't have a strong opinion. But I think we should probably be
 > > > consistent with how segment.ms works, and just use 0.
 > > >
 > > > best,
 > > > Colin
 > > >
 > > >
 > > > On Wed, Sep 5, 2018, at 21:19, Brett Rann wrote:
 > > > > OK thanks for that clarification. I see why you're uncomfortable
 > with 0
 > > > now.
 > > > >
 > > > > I'm not really fussed. I just prefer consistency in
 configuration
 > > > options.
 > > > >
 > > > > Personally I lean towards treating 0 and 1 similarly in that
 > scenario,
 > > > > because it favours the person thinking about setting the
 > configurations,
 > > > > and a person doesn't care about a 1ms edge case especially when
 the
 > > > context
 > > > > is the true minimum is tied to the log cleaner cadence.
 > > > >
 > > > > Introducing 0 to mean "disabled" because there is some
 uniqueness in
 > > > > segment.ms not being able to be set to 0, reduces configuration
 > > > consistency
 > > > > in favour of capturing a MS gap in an edge case that nobody
 would
 > ever
 > > > > notice. For someone to understand why everywhere else -1 is
 used to
 > > > > disable, but here 0 is used, they would need to learn about
 > segment.ms
 > > > > having a 1ms minimum and then after learning would think "who
 cares
 > about
 > > > > 1ms?" in this context. I would anyway :)
 > > > >
 > > > > my 2c anyway. Will again defer to majority. Curious which way
 Colin
 > falls
 > > > > now.
 > > > >
 > > > > Don't want to spend more time on this though, It's well into
 > > > bikeshedding
 > > > > territory now. :)
 > > > >
 > > > >
 > > > >
 > > > > On Thu, Sep 6, 2018 at 1:31 PM xiongqi wu 
 > wrote:
 > > > >
 > > > > > I want to honor the minimum value of segment.ms (which is
 1ms) to
 > > > force
 > > > > > roll an active segment.
 > > > > > So if we set "max.compaction.lag.ms" to any value > 0, the
 minimum of
 > > > > > max.compaction.lag.ms and segment.ms will be used to seal an
 > active
 > > > > > segment.
 > > > > >
 > > > > > If we set max.compaction.lag.ms to 0, the current
 implementation
 > will
 > > > > > treat it as disabled.
 > > > > >
 > > > > > It is a little bit weird to treat max.compaction.lag=0 the
 same as
 > > > > > max.compaction.lag=1.
 > > > > >
 > > > > > There might be a reason why we set the minimum of segment.ms
 to 1,
 > > > and I
 > > > > > don't want to break this assumption.
 > > > > >
 > > > > >
 > > > > >
 > > > > > Xiongqi (Wesley) Wu
 > > > > >
 > > > > >
 > > > > > On Wed, Sep 5, 2018 at 7:54 PM Brett Rann
 

Re: Apache Kafka blog on more partitions support

2018-11-06 Thread Jun Rao
The blog is now published to
https://blogs.apache.org/kafka/entry/apache-kafka-supports-more-partitions

Thanks,

Jun

On Fri, Nov 2, 2018 at 2:36 PM, Jun Rao  wrote:

> Hi, Mayuresh,
>
> Most of the controlled shutdown time is in leader election. The controller
> currently doesn't wait for LeaderAndIsrRequest to be sent out before
> responding to the controlled shutdown request.
>
> Thanks,
>
> Jun
>
> On Fri, Nov 2, 2018 at 1:58 PM, Mayuresh Gharat <
> gharatmayures...@gmail.com> wrote:
>
>> Thanks Jun for sharing this. Looks nice !
>>
>> Do we intend to shed light on how much time is required, on average, for
>> new leader election? Also, would it be good to add "if the controller waits
>> for the LeaderAndIsrResponses before sending shutDown_OK to the shutting
>> down broker"?
>>
>> Thanks,
>>
>> Mayuresh
>>
>> On Fri, Nov 2, 2018 at 12:07 PM  wrote:
>>
>> > Thanks Jun for sharing the post.
>> > Minor Nit: Date says  December 16, 2019.
>> >
>> > Did this test measure the replication effects on the overall cluster
>> > health and performance?
>> > It looks like we are suggesting that with 200k partitions and 4k per broker,
>> > the max size of a cluster should be around 50 brokers?
>> >
>> > Thanks,
>> > Harsha
>> > On Nov 2, 2018, 11:50 AM -0700, Jun Rao , wrote:
>> > > Hi, Everyone,
>> > >
>> > > The follow is the preview of a blog on Kafka supporting more
>> partitions.
>> > >
>> > > https://drive.google.com/file/d/122TK0oCoforc2cBWfW_yaEBjTMoX6yMt
>> > >
>> > > Please let me know if you have any comments by Tuesday.
>> > >
>> > > Thanks,
>> > >
>> > > Jun
>> >
>>
>>
>> --
>> -Regards,
>> Mayuresh R. Gharat
>> (862) 250-7125
>>
>
>


Build failed in Jenkins: kafka-trunk-jdk11 #80

2018-11-06 Thread Apache Jenkins Server
See 


Changes:

[jason] MINOR: Modify Connect service's startup timeout to be passed via the

[lindong28] KAFKA-7481; Add upgrade/downgrade notes for 2.1.x

--
[...truncated 2.33 MB...]

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullExpectedRecordForCompareKeyValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareValueWithProducerRecord STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareValueWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueIsEqualForCompareValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueIsEqualForCompareValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordWithExpectedRecordForCompareKeyValueTimestamp 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordWithExpectedRecordForCompareKeyValueTimestamp 
PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentForCompareKeyValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentForCompareKeyValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordWithExpectedRecordForCompareKeyValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordWithExpectedRecordForCompareKeyValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullForCompareKeyValueWithProducerRecord STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullForCompareKeyValueWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueAndTimestampIsEqualForCompareKeyValueTimestampWithProducerRecord
 STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueAndTimestampIsEqualForCompareKeyValueTimestampWithProducerRecord
 PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueIsEqualWithNullForCompareValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueIsEqualWithNullForCompareValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueAndTimestampIsEqualWithNullForCompareValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueAndTimestampIsEqualWithNullForCompareValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareValueTimestampWithProducerRecord 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareValueTimestampWithProducerRecord 
PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareValueWithProducerRecord STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareValueWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullReversForCompareKeyValueTimestampWithProducerRecord
 STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullReversForCompareKeyValueTimestampWithProducerRecord
 PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullReversForCompareKeyValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullReversForCompareKeyValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueIsEqualForCompareKeyValueWithProducerRecord STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueIsEqualForCompareKeyValueWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareKeyValueWithProducerRecord STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareKeyValueWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordForCompareValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordForCompareValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 

Jenkins build is back to normal : kafka-2.1-jdk8 #50

2018-11-06 Thread Apache Jenkins Server
See 




Re: [DISCUSS] How hard is it to separate the logic layer and the storage layer of Kafka broker?

2018-11-06 Thread Jun Rao
Hi, Yulin,

I assume the performance issue that you mentioned is for writes instead of
reads. By default, Kafka flushes data to disks asynchronously in batches.
So, even when there are multiple files to write, the batching can amortize
the HDD seek overhead. It would be useful to understand the number of I/Os
and the size of each I/O in your environment.
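
For reference, the broker settings that control when Kafka forces a flush are
log.flush.interval.messages and log.flush.interval.ms; by default they are
effectively unset, so the data is left in the OS page cache and flushed
asynchronously, as described above. A minimal, purely illustrative Java snippet
setting them explicitly (not a recommendation, just the knobs involved):

import java.util.Properties;

// Illustrative only: explicit flush settings an operator could tune while
// investigating write I/O patterns. By default Kafka leaves flushing to the OS.
public class FlushConfigExample {
    public static void main(String[] args) {
        Properties brokerProps = new Properties();
        brokerProps.put("log.flush.interval.messages", "10000"); // force flush after N messages
        brokerProps.put("log.flush.interval.ms", "1000");        // or after this many milliseconds
        brokerProps.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}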

Thanks,

Jun

On Fri, Nov 2, 2018 at 9:50 PM, Yuanjin Lin  wrote:

> Colin, Thanks for the meaningful reply!
>
> We are 100% sure those HDDs are the bottleneck. Almost 90% of the alerts are
> about HDDs. I am the guy who has to deal with them. The common scenario would be
> 100-400 partitions per HDD (2TB size). Due to some historical reasons,
> developers in my company tend to put everything into Kafka, because it makes them
> feel safe. Although we have over 2K servers for Kafka, I still receive
> alerts every day.
>
> If the modification I proposed is not too hard, I will begin to do it next
> month.
>
>
> On Sat, Nov 3, 2018 at 1:36 AM Colin McCabe  wrote:
>
> > On Fri, Nov 2, 2018, at 03:14, Yuanjin Lin wrote:
> > > Hi all,
> > >
> > > I am a software engineer from Zhihu.com. Kafka is so great and used
> > heavily
> > > in Zhihu. There are probably over 2K Kafka brokers in total.
> > >
> > > However, we are suffering from the problem that the performance degrades
> > > rapidly when the number of topics increases (sadly, we are using HDDs).
> >
> > Hi Yuanjin,
> >
> > How many partitions are you trying to create?
> >
> > Do you have benchmarks confirming that disk I/O is your bottleneck?
> There
> > are a few cases where large numbers of partitions may impose CPU and
> > garbage collection burdens.  The patch on
> > https://github.com/apache/kafka/pull/5206 illustrates one of them.
> >
> > > We are considering separating the logic layer and the storage layer of the
> > > Kafka broker, like Apache Pulsar.
> > >
> > > After the modification, a server may have several Kafka brokers and more
> > > topics. Those brokers would all connect to a single storage engine via RPC. The
> > > storage engine can do the load balancing work easily, and avoid creating too
> > > many files, which hurts HDDs.
> > >
> > > Is it hard? I think replacing the stuff in `Kafka.Log` would be enough,
> > > right?
> >
> > It would help to know what the problem is here.  If the problem is a
> large
> > number of files, then maybe the simplest approach would be creating fewer
> > files.  You don't need to introduce a new layer of servers in order to do
> > that.  You could use something like RocksDB to store messages and
> indices,
> > or create your own file format which combined together things which were
> > previously separate.  For example, we could combine the timeindex and
> index
> > files.
> >
> > As I understand it, Pulsar made the decision to combine together data
> from
> > multiple partitions in a single file.  Sometimes a very large number of
> > partitions.  This is great for writing, but not so good if you want to
> read
> > historical data from a single topic.
> >
> > regards,
> > Colin
> >
>


[jira] [Resolved] (KAFKA-7537) Only include live brokers in the UpdateMetadataRequest sent to existing brokers if there is no change in the partition states

2018-11-06 Thread Jun Rao (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao resolved KAFKA-7537.

   Resolution: Fixed
Fix Version/s: 2.2.0

Merged the PR to trunk.

> Only include live brokers in the UpdateMetadataRequest sent to existing 
> brokers if there is no change in the partition states
> -
>
> Key: KAFKA-7537
> URL: https://issues.apache.org/jira/browse/KAFKA-7537
> Project: Kafka
>  Issue Type: Improvement
>  Components: controller
>Reporter: Zhanxiang (Patrick) Huang
>Assignee: Zhanxiang (Patrick) Huang
>Priority: Major
> Fix For: 2.2.0
>
>
> Currently, when brokers join/leave the cluster without any partition state 
> changes, the controller will send out UpdateMetadataRequests containing the 
> states of all partitions to all brokers. But for existing brokers in the 
> cluster, the metadata diff between controller and the broker should only be 
> the "live_brokers" info. Only the brokers with empty metadata cache need the 
> full UpdateMetadataRequest. Sending the full UpdateMetadataRequest to all 
> brokers can place nonnegligible memory pressure on the controller side.
> Let's say in total we have N brokers, M partitions in the cluster and we want 
> to add 1 brand new broker in the cluster. With RF=2, the memory footprint per 
> partition in the UpdateMetadataRequest is ~200 Bytes. In the current 
> controller implementation, if each of the N RequestSendThreads serializes and 
> sends out the UpdateMetadataRequest at roughly the same time (which is very 
> likely the case), we will end up using *(N+1)*M*200B*. In a large kafka 
> cluster, we can have:
> {noformat}
> N=99
> M=100k
> Memory usage to send out UpdateMetadataRequest to all brokers:
> 100 * 100K * 200B = 2G
> However, we only need to send out full UpdateMetadataRequest to the newly 
> added broker. We only need to include live broker ids (4B * 100 brokers) in 
> the UpdateMetadataRequest sent to the existing 99 brokers. So the amount of 
> data that is actually needed will be:
> 1 * 100K * 200B + 99 * (100 * 4B) = ~21M
> We can potentially reduce the memory footprint by 2G / 21M = ~95x, as well as 
> the data transferred over the network.{noformat}
>  
> This issue kind of hurts the scalability of a kafka cluster. KIP-380 and 
> KAFKA-7186 also help to further reduce the controller memory footprint.
>  
> In terms of implementation, we can keep some in-memory state in the 
> controller side to differentiate existing brokers and uninitialized brokers 
> (e.g. brand new brokers) so that if there is no change in partition states, 
> we only send out live brokers info to existing brokers.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7602) Improve usage of @see tag in Streams JavaDocs

2018-11-06 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created KAFKA-7602:
--

 Summary: Improve usage of @see tag in Streams JavaDocs
 Key: KAFKA-7602
 URL: https://issues.apache.org/jira/browse/KAFKA-7602
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: Matthias J. Sax


As discussed on this PR 
[https://github.com/apache/kafka/pull/5273/files/bd8410ed3d5be9ca89e963687aa05e953d712b62..e4e3eed141447baf1c70ff15e2dc0df4e9a33f12#r223510489]
 we extensively use `@see` tags in Streams API Java docs.

This ticket is about revisiting all public JavaDocs (KStream, KTable, 
KGroupedStream, KGroupedTable, etc.) and defining and documenting (in the wiki) a 
coherent strategy for the usage of the `@see` tag, with the goal of guiding users 
on how to use the API and of not using `@see` too often, to avoid confusion.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Jenkins build is back to normal : kafka-trunk-jdk8 #3182

2018-11-06 Thread Apache Jenkins Server
See 




Re: [DISCUSS] KIP-362: Dynamic Session Window Support

2018-11-06 Thread Matthias J. Sax
Any update on this KIP?

On 9/20/18 3:30 PM, Matthias J. Sax wrote:
> Thanks for following up. Very nice examples!
> 
> I think, that the window definition for Flink is semantically
> questionable. If there is only a single record, why is the window
> defined as [ts, ts+gap]? To me, this definition is not sound and seems
> to be arbitrary. To define the windows as [ts-gap,ts+gap] as you mention
> would be semantically more useful -- still, I think that defining the
> window as [ts,ts] as we do currently in Kafka Streams is semantically
> the best.
> 
> I have the impression that Flink only defines them differently because
> it solves the issues in the implementation. (I.e., an implementation
> detail leaks into the semantics, which is usually not desired.)
> 
> However, I believe that we could change the implementation accordingly.
> We could store the windowed keys as [ts-gap,ts+gap] (or [ts,ts+gap]) in
> RocksDB, but at API level we return [ts,ts]. This way, we can still find
> all windows we need and provide the same deterministic behavior and keep
> the current window boundaries on the semantic level (there is no need to
> store the window start and/or end time). With this technique, we can
> also implement dynamic session gaps. I think, we would need to store the
> used "gap" for each window, too. But again, this would be an
> implementation detail.
> 
> Let's see what others think.
> 
> One tricky question we would need to address is, how we can be backward
> compatible. I am currently working on KIP-258 that should help to
> address this backward compatibility issue though.
> 
> 
> -Matthias
> 
> 
> 
> On 9/19/18 5:17 PM, Lei Chen wrote:
>> Thanks Matthias. That makes sense.
>>
>> You're right that symmetric merge is necessary to ensure consistency. On
>> the other hand, I kinda feel it defeats the purpose of dynamic gap, which
>> is to update the gap from the old value to the new value. The symmetric merge
>> always honors the larger gap in both directions, rather than honoring the gap
>> carried by the record with the larger timestamp. I wasn't able to find any semantic
>> definitions w.r.t. this particular aspect online, but I spent some time
>> looking into other streaming engines like Apache Flink.
>>
>> Apache Flink defines the window differently: it uses (start time, start
>> time + gap).
>>
>> so our previous example (10, 10), (19,5),(15,3) in Flink's case will be:
>> [10,20]
>> [19,24] => merged to [10,24]
>> [15,18] => merged to [10,24]
>>
>> while the example (15,3),(19,5),(10,10) will be
>> [15,18]
>> [19,24] => no merge
>> [10,20] => merged to [10,24]
>>
>> However, since it only records the gap in the future direction, not the past, a late
>> record might not trigger any merge where in symmetric merge it would:
>> (7,2),(10, 10), (19,5),(15,3)
>> [7,9]
>> [10,20]
>> [19,24] => merged to [10,24]
>> [15,18] => merged to [10,24]
>> So at the end, two windows [7,9] and [10,24] remain.
>>
>> As you can see, in Flink the gap semantics lean toward the interpretation that
>> a gap carried by one record only affects how this record merges with future
>> records. E.g., a later event (T2, G2) will only be merged with (T1, G1) if
>> T2 is less than T1+G1, but not when T1 is less than T2 - G2. Let's call
>> this the "forward-merge" way of handling it. I just went through some source
>> code, so if my understanding of Flink's implementation is incorrect,
>> please correct me.
>>
>> On the other hand, if we want to do symmetric merge in Kafka Streams, we
>> can change the window definition to [start time - gap, start time + gap].
>> This way the example (7,2),(10, 10), (19,5),(15,3) will be
>> [5,9]
>> [0,20] => merged to [0,20]
>> [14,24] => merged to [0,24]
>> [12,18] => merged to [0,24]
>>
>> (19,5),(15,3),(7,2),(10, 10) will generate the same result:
>> [14,24]
>> [12,18] => merged to [12,24]
>> [5,9] => no merge
>> [0,20] => merged to [0,24]
>>
>> Note that symmetric merge would require us to change the way Kafka
>> Streams fetches windows now: instead of fetching the range from timestamp-gap to
>> timestamp+gap, we will need to fetch all windows that are not expired yet.
>> On the other hand, I'm not sure how this will impact the current logic of
>> how a window is considered closed, because the window doesn't carry the end
>> timestamp anymore, but end timestamp + gap.
>>
>> So do you think the forward-merge approach used by Flink makes more sense
>> in Kafka Streams, or does symmetric merge make more sense? Both of them seem
>> to me to give deterministic results.
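
To make the two rules concrete, below is a minimal, hypothetical Java sketch (neither Kafka Streams nor Flink code) of the two merge predicates being compared, where a session is represented by its record timestamp and the dynamic gap carried by that record.

// Hypothetical illustration of the two merge semantics discussed above.
final class Session {
    final long ts;   // record timestamp
    final long gap;  // dynamic gap carried by the record

    Session(long ts, long gap) {
        this.ts = ts;
        this.gap = gap;
    }

    // "Forward-merge" (Flink-like): only the earlier record's gap can bridge
    // the two sessions.
    static boolean forwardMerge(Session earlier, Session later) {
        return later.ts <= earlier.ts + earlier.gap;
    }

    // Symmetric merge: either record's gap can bridge the two sessions,
    // i.e. the intervals [ts - gap, ts + gap] overlap.
    static boolean symmetricMerge(Session a, Session b) {
        return a.ts - a.gap <= b.ts + b.gap
                && b.ts - b.gap <= a.ts + a.gap;
    }
}

With the late-record example above, forwardMerge(new Session(7, 2), new Session(10, 10)) is false (the [7,9] window stays separate), while symmetricMerge on the same pair is true (it folds into the merged window), which matches the two different outcomes described.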
>>
>> BTW I'll add the use case into original KIP.
>>
>> Lei
>>
>>
>> On Tue, Sep 11, 2018 at 5:45 PM Matthias J. Sax 
>> wrote:
>>
>>> Thanks for explaining your understanding. And thanks for providing more
>>> details about the use-case. Maybe you can add this to the KIP?
>>>
>>>
>>> First, one general comment. I guess that my and Guozhang's understanding
>>> about gap/close/gracePeriod is the same as yours -- we might not have
>>> used the terms precisely correctly in the previous email.
>>>

Re: [DISCUSS]KIP-216: IQ should throw different exceptions for different errors

2018-11-06 Thread Matthias J. Sax
Hey Vito,

I saw that you updated your PR, but did not reply to my last comments.
Any thoughts?


-Matthias

On 10/19/18 10:34 AM, Matthias J. Sax wrote:
> Glad to have you back Vito :)
> 
> Some follow up thoughts:
> 
>  - the current `InvalidStateStoreException` is documented as being
> sometimes retryable. From the JavaDocs:
> 
>> These exceptions may be transient [...] Hence, it is valid to backoff and 
>> retry when handling this exception.
> 
> I am wondering what the semantic impact/change is, if we introduce
> `RetryableStateStoreException` and `FatalStateStoreException` that both
> inherit from it. While I like the introduction of both from a high level
> point of view, I just want to make sure it's semantically sound and
> backward compatible. At the moment, I think it's fine, but I want to point it out
> such that everybody can think about this, too, so we can verify that
> it's a naturally evolving API change.
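
As a rough sketch of the split being discussed here (the two subclass names are the KIP's proposal, not shipped API, and the retry helper is purely hypothetical):

import org.apache.kafka.streams.errors.InvalidStateStoreException;

// Sketch of the proposed hierarchy: both subclasses stay assignable to the
// existing InvalidStateStoreException, so current catch blocks keep working.
class RetryableStateStoreException extends InvalidStateStoreException {
    RetryableStateStoreException(final String message) { super(message); }
}

class FatalStateStoreException extends InvalidStateStoreException {
    FatalStateStoreException(final String message) { super(message); }
}

// Caller-side pattern the split enables: back off and retry on transient
// errors, give up immediately on fatal ones.
class StoreQueryHelper {
    static <T> T queryWithRetry(final java.util.function.Supplier<T> query)
            throws InterruptedException {
        while (true) {
            try {
                return query.get();
            } catch (final RetryableStateStoreException e) {
                Thread.sleep(100L); // store is rebalancing or still restoring
            } catch (final FatalStateStoreException e) {
                throw e;            // instance is PENDING_SHUTDOWN or DEAD
            }
        }
    }
}

A store that is temporarily unavailable because it is migrating or still restoring would then surface as the retryable flavor, while querying a KafkaStreams instance that is shut down would surface as the fatal one.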
> 
>  - StateStoreClosedException:
> 
>> will be wrapped to StateStoreMigratedException or 
>> StateStoreNotAvailableException later.
> 
> Can you clarify the cases (ie, when will it be wrapped with the one or
> the other)?
> 
>  - StateStoreIsEmptyException:
> 
> I don't understand the semantics of this exception. Maybe it's a naming
> issue?
> 
>> will be wrapped to StateStoreMigratedException or 
>> StateStoreNotAvailableException later.
> 
> Also, can you clarify the cases (ie, when will it be wrapped with the
> one or the other)?
> 
> 
> I am also wondering if we should introduce a fatal exception
> `UnknownStateStoreException` to tell users that they passed in an unknown
> store name.
> 
> 
> 
> -Matthias
> 
> 
> 
> On 10/17/18 8:14 PM, vito jeng wrote:
>> Just open a PR for further discussion:
>> https://github.com/apache/kafka/pull/5814
>>
>> Any suggestion is welcome.
>> Thanks!
>>
>> ---
>> Vito
>>
>>
>> On Thu, Oct 11, 2018 at 12:14 AM vito jeng  wrote:
>>
>>> Hi John,
>>>
>>> Thanks for reviewing the KIP.
>>>
 I didn't follow the addition of a new method to the QueryableStoreType
 interface. Can you elaborate why this is necessary to support the new
 exception types?
>>>
>>> To support the new exception types, I would check stream state in the
>>> following classes:
>>>   - CompositeReadOnlyKeyValueStore class
>>>   - CompositeReadOnlySessionStore class
>>>   - CompositeReadOnlyWindowStore class
>>>   - DelegatingPeekingKeyValueIterator class
>>>
>>> It is also necessary to keep backward compatibility, so I plan to pass the
>>> streams instance to the QueryableStoreType instance when KafkaStreams#store()
>>> is invoked. It looks like the simplest way, I think.
>>>
>>> That is why I added a new method to the QueryableStoreType interface. I
>>> understand that we should try to avoid adding public API methods. However,
>>> at the moment I have no better idea.
>>>
>>> Any thoughts?
>>>
>>>
 Also, looking over your KIP again, it seems valuable to introduce
 "retriable store exception" and "fatal store exception" marker interfaces
 that the various exceptions can mix in. It would be nice from a usability
 perspective to be able to just log and retry on any "retriable" exception
 and log and shutdown on any fatal exception.
>>>
>>> I agree that this is valuable to the user.
>>> I'll update the KIP.
>>>
>>>
>>> Thanks
>>>
>>>
>>> ---
>>> Vito
>>>
>>>
>>> On Tue, Oct 9, 2018 at 2:30 AM John Roesler  wrote:
>>>
 Hi Vito,

 I'm glad to hear you're well again!

 I didn't follow the addition of a new method to the QueryableStoreType
 interface. Can you elaborate why this is necessary to support the new
 exception types?

 Also, looking over your KIP again, it seems valuable to introduce
 "retriable store exception" and "fatal store exception" marker interfaces
 that the various exceptions can mix in. It would be nice from a usability
 perspective to be able to just log and retry on any "retriable" exception
 and log and shutdown on any fatal exception.

 Thanks,
 -John

 On Fri, Oct 5, 2018 at 11:47 AM Guozhang Wang  wrote:

> Thanks for the explanation, that makes sense.
>
>
> Guozhang
>
>
> On Mon, Jun 25, 2018 at 2:28 PM, Matthias J. Sax 
> wrote:
>
>> The scenario I had in mind was that KS is started in one thread while a
>> second thread has a reference to the object to issue queries.
>>
>> If a query is issued before the "main thread" started KS, and the "query
>> thread" knows that it will eventually get started, it can retry. On the
>> other hand, if KS is in state PENDING_SHUTDOWN or DEAD, it is impossible
>> to issue any query against it now or in the future and thus the error is
>> not retryable.
>>
>>
>> -Matthias
>>
>> On 6/25/18 10:15 AM, Guozhang Wang wrote:
>>> I'm wondering if StreamThreadNotStarted could be merged into
>>> StreamThreadNotRunning, 

Build failed in Jenkins: kafka-2.0-jdk8 #183

2018-11-06 Thread Apache Jenkins Server
See 


Changes:

[lindong28] KAFKA-7313; StopReplicaRequest should attempt to remove future 
replica

--
[...truncated 887.49 KB...]
kafka.controller.PartitionStateMachineTest > 
testInvalidNonexistentPartitionToOfflinePartitionTransition PASSED

kafka.controller.PartitionStateMachineTest > 
testOnlinePartitionToOnlineTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testOnlinePartitionToOnlineTransition PASSED

kafka.controller.PartitionStateMachineTest > 
testNewPartitionToOnlinePartitionTransitionZkUtilsExceptionFromCreateStates 
STARTED

kafka.controller.PartitionStateMachineTest > 
testNewPartitionToOnlinePartitionTransitionZkUtilsExceptionFromCreateStates 
PASSED

kafka.controller.PartitionStateMachineTest > 
testInvalidNewPartitionToNonexistentPartitionTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testInvalidNewPartitionToNonexistentPartitionTransition PASSED

kafka.controller.PartitionStateMachineTest > 
testNewPartitionToOnlinePartitionTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testNewPartitionToOnlinePartitionTransition PASSED

kafka.controller.PartitionStateMachineTest > 
testInvalidOnlinePartitionToNewPartitionTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testInvalidOnlinePartitionToNewPartitionTransition PASSED

kafka.controller.PartitionStateMachineTest > 
testOfflinePartitionToOnlinePartitionTransitionErrorCodeFromStateLookup STARTED

kafka.controller.PartitionStateMachineTest > 
testOfflinePartitionToOnlinePartitionTransitionErrorCodeFromStateLookup PASSED

kafka.controller.PartitionStateMachineTest > 
testOnlinePartitionToOnlineTransitionForControlledShutdown STARTED

kafka.controller.PartitionStateMachineTest > 
testOnlinePartitionToOnlineTransitionForControlledShutdown PASSED

kafka.controller.PartitionStateMachineTest > 
testOfflinePartitionToOnlinePartitionTransitionZkUtilsExceptionFromStateLookup 
STARTED

kafka.controller.PartitionStateMachineTest > 
testOfflinePartitionToOnlinePartitionTransitionZkUtilsExceptionFromStateLookup 
PASSED

kafka.controller.PartitionStateMachineTest > 
testInvalidOnlinePartitionToNonexistentPartitionTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testInvalidOnlinePartitionToNonexistentPartitionTransition PASSED

kafka.controller.PartitionStateMachineTest > 
testInvalidOfflinePartitionToNewPartitionTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testInvalidOfflinePartitionToNewPartitionTransition PASSED

kafka.controller.PartitionStateMachineTest > 
testOfflinePartitionToOnlinePartitionTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testOfflinePartitionToOnlinePartitionTransition PASSED

kafka.controller.ControllerEventManagerTest > testEventThatThrowsException 
STARTED

kafka.controller.ControllerEventManagerTest > testEventThatThrowsException 
PASSED

kafka.controller.ControllerEventManagerTest > testSuccessfulEvent STARTED

kafka.controller.ControllerEventManagerTest > testSuccessfulEvent PASSED

kafka.controller.ControllerFailoverTest > testHandleIllegalStateException 
STARTED

kafka.controller.ControllerFailoverTest > testHandleIllegalStateException PASSED

kafka.network.SocketServerTest > testGracefulClose STARTED

kafka.network.SocketServerTest > testGracefulClose PASSED

kafka.network.SocketServerTest > 
testSendActionResponseWithThrottledChannelWhereThrottlingAlreadyDone STARTED

kafka.network.SocketServerTest > 
testSendActionResponseWithThrottledChannelWhereThrottlingAlreadyDone PASSED

kafka.network.SocketServerTest > controlThrowable STARTED

kafka.network.SocketServerTest > controlThrowable PASSED

kafka.network.SocketServerTest > testRequestMetricsAfterStop STARTED

kafka.network.SocketServerTest > testRequestMetricsAfterStop PASSED

kafka.network.SocketServerTest > testConnectionIdReuse STARTED

kafka.network.SocketServerTest > testConnectionIdReuse PASSED

kafka.network.SocketServerTest > testClientDisconnectionUpdatesRequestMetrics 
STARTED

kafka.network.SocketServerTest > testClientDisconnectionUpdatesRequestMetrics 
PASSED

kafka.network.SocketServerTest > testProcessorMetricsTags STARTED

kafka.network.SocketServerTest > testProcessorMetricsTags PASSED

kafka.network.SocketServerTest > testMaxConnectionsPerIp STARTED

kafka.network.SocketServerTest > testMaxConnectionsPerIp PASSED

kafka.network.SocketServerTest > testConnectionId STARTED

kafka.network.SocketServerTest > testConnectionId PASSED

kafka.network.SocketServerTest > 
testBrokerSendAfterChannelClosedUpdatesRequestMetrics STARTED

kafka.network.SocketServerTest > 
testBrokerSendAfterChannelClosedUpdatesRequestMetrics PASSED

kafka.network.SocketServerTest > testNoOpAction STARTED

kafka.network.SocketServerTest > testNoOpAction PASSED

kafka.network.SocketServerTest > simpleRequest STARTED

kafka.network.SocketServerTest > 

Re: [DISCUSS] KIP-345: Reduce multiple consumer rebalances by specifying member id

2018-11-06 Thread Matthias J. Sax
Hey,

there was quite a pause in this KIP discussion, and in the meantime a
new design for incremental cooperative rebalancing was suggested:

https://cwiki.apache.org/confluence/display/KAFKA/Incremental+Cooperative+Rebalancing%3A+Support+and+Policies


We should make sure that the proposal and this KIP align to each other.
Thoughts?


-Matthias

On 11/5/18 7:31 PM, Boyang Chen wrote:
> Hey Mike,
> 
> 
> thanks for the feedback, the two question are very thoughtful!
> 
> 
>> 1) I am a little confused about the distinction for the leader. If the 
>> consumer node that was assigned leader does a bounce (goes down and quickly 
> comes up) to update application code, will a rebalance be triggered? I do 
> not think a bounce of the leader should trigger a rebalance.
> 
> For Q1 my intention was to minimize the change within one KIP, since the 
> leader rejoining case could be addressed separately.
> 
> 
>> 2) The timeout for shrink up makes a lot of sense and allows to gracefully 
>> increase the number of nodes in the cluster. I think we need to support 
>> graceful shrink down as well. If I set the registration timeout to 5 minutes 
> to handle rolling restarts or intermittent failures without shuffling 
>> state, I don't want to wait 5 minutes in order for the group to rebalance if 
>> I am intentionally removing a node from the cluster. I am not sure the best 
> way to do this. One idea I had was adding the ability for a CLI or Admin 
>> API to force a rebalance of the group. This would allow for an admin to 
>> trigger the rebalance manually without waiting the entire registration 
> timeout on shrink down. What do you think?
> 
> For 2), my understanding is that the scaling-down case is better addressed by 
> a CLI tool than by code logic, since only by human evaluation can we decide 
> whether it is the "right time" -- the time when all the scaling-down consumers 
> are offline -- to kick off a rebalance. Unless we introduce another term on the 
> coordinator which indicates the target consumer group size, the broker will find 
> it hard to decide when to start a rebalance. So far I prefer to hold off on the 
> implementation for that, but I agree we could discuss whether we want to 
> introduce an admin API in this KIP or a separate one.
> 
> 
> Thanks again for the proposed ideas!
> 
> 
> Boyang
> 
> 
> From: Mike Freyberger 
> Sent: Monday, November 5, 2018 6:13 AM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-345: Reduce multiple consumer rebalances by 
> specifying member id
> 
> Boyang,
> 
> Thanks for updating the KIP. It's shaping up well. Two things:
> 
> 1) I am a little confused about the distinction for the leader. If the 
> consumer node that was assigned leader does a bounce (goes down and quickly 
> comes up) to update application code, will a rebalance be triggered? I do not 
> think a bounce of the leader should trigger a rebalance.
> 
> 2) The timeout for shrink up makes a lot of sense and allows to gracefully 
> increase the number of nodes in the cluster. I think we need to support 
> graceful shrink down as well. If I set the registration timeout to 5 minutes 
> to handle rolling restarts or intermittent failures without shuffling state, 
> I don't want to wait 5 minutes in order for the group to rebalance if I am 
> intentionally removing a node from the cluster. I am not sure the best way to 
> do this. One idea I had was adding the ability for a CLI or Admin API to 
> force a rebalance of the group. This would allow for an admin to trigger the 
> rebalance manually without waiting the entire registration timeout on shrink 
> down. What do you think?
> 
> Mike
> 
> On 10/30/18, 1:55 AM, "Boyang Chen"  wrote:
> 
> Btw, I updated KIP 345 based on my understanding. Feel free to take 
> another round of look:
> 
> 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-345%3A+Introduce+static+membership+protocol+to+reduce+consumer+rebalances
> KIP-345: Introduce static membership protocol to reduce 
> ...
> cwiki.apache.org
> For stateful applications, one of the biggest performance bottleneck is the 
> state shuffling. In Kafka consumer, there is a concept called "rebalance" 
> which means that for given M partitions and N consumers in one consumer 
> group, Kafka will try to balance the load between consumers and ideally have 
> ...

[jira] [Reopened] (KAFKA-7481) Consider options for safer upgrade of offset commit value schema

2018-11-06 Thread Dong Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin reopened KAFKA-7481:
-

> Consider options for safer upgrade of offset commit value schema
> 
>
> Key: KAFKA-7481
> URL: https://issues.apache.org/jira/browse/KAFKA-7481
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Blocker
> Fix For: 2.1.0
>
>
> KIP-211 and KIP-320 add new versions of the offset commit value schema. The 
> use of the new schema version is controlled by the 
> `inter.broker.protocol.version` configuration.  Once the new inter-broker 
> version is in use, it is not possible to downgrade since the older brokers 
> will not be able to parse the new schema. 
> The options at the moment are the following:
> 1. Do nothing. Users can try the new version and keep 
> `inter.broker.protocol.version` locked to the old release. Downgrade will 
> still be possible, but users will not be able to test new capabilities which 
> depend on inter-broker protocol changes.
> 2. Instead of using `inter.broker.protocol.version`, we could use 
> `message.format.version`. This would basically extend the use of this config 
> to apply to all persistent formats. The advantage is that it allows users to 
> upgrade the broker and begin using the new inter-broker protocol while still 
> allowing downgrade. But features which depend on the persistent format could 
> not be tested.
> Any other options?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-7481) Consider options for safer upgrade of offset commit value schema

2018-11-06 Thread Dong Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin resolved KAFKA-7481.
-
Resolution: Fixed

> Consider options for safer upgrade of offset commit value schema
> 
>
> Key: KAFKA-7481
> URL: https://issues.apache.org/jira/browse/KAFKA-7481
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Blocker
> Fix For: 2.1.0
>
>
> KIP-211 and KIP-320 add new versions of the offset commit value schema. The 
> use of the new schema version is controlled by the 
> `inter.broker.protocol.version` configuration.  Once the new inter-broker 
> version is in use, it is not possible to downgrade since the older brokers 
> will not be able to parse the new schema. 
> The options at the moment are the following:
> 1. Do nothing. Users can try the new version and keep 
> `inter.broker.protocol.version` locked to the old release. Downgrade will 
> still be possible, but users will not be able to test new capabilities which 
> depend on inter-broker protocol changes.
> 2. Instead of using `inter.broker.protocol.version`, we could use 
> `message.format.version`. This would basically extend the use of this config 
> to apply to all persistent formats. The advantage is that it allows users to 
> upgrade the broker and begin using the new inter-broker protocol while still 
> allowing downgrade. But features which depend on the persistent format could 
> not be tested.
> Any other options?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-7559) ConnectStandaloneFileTest system tests do not pass

2018-11-06 Thread Dong Lin (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin resolved KAFKA-7559.
-
Resolution: Fixed

> ConnectStandaloneFileTest system tests do not pass
> --
>
> Key: KAFKA-7559
> URL: https://issues.apache.org/jira/browse/KAFKA-7559
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 2.1.0
>Reporter: Stanislav Kozlovski
>Assignee: Randall Hauch
>Priority: Major
> Fix For: 2.0.1, 2.1.0
>
>
> Both tests `test_skip_and_log_to_dlq` and `test_file_source_and_sink` under 
> `kafkatest.tests.connect.connect_test.ConnectStandaloneFileTest` fail with 
> error messages similar to:
> "TimeoutError: Kafka Connect failed to start on node: ducker@ducker04 in 
> condition mode: LISTEN"



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Tentative release date for KAFKA-7280 and its possible workaround

2018-11-06 Thread sachin upadhyay
Hi, I am using the Kafka 1.1.0 Java client and my brokers are also on the same 
version. I am running into the same issue as reported in [KAFKA-7280] 
ConcurrentModificationException in FetchSessionHandler in heartbeat thread - 
ASF JIRA.


The fix versions for this are not publicly available yet. What are the 
plausible workarounds for now? Once the heartbeat thread crashes, will it be 
restarted on the next poll() or only on the subscribe() call?

Thanks,
-Sachin


Re: [VOTE] 2.1.0 RC0

2018-11-06 Thread Dong Lin
Hey Satish,

Yes! We will have another RC to include e.g.
https://github.com/apache/kafka/pull/5857.

Thanks,
Dong

On Mon, Nov 5, 2018 at 8:14 PM Satish Duggana 
wrote:

> Hi Dong,
> Is there a RC1 planned with configs documentation fixes and
> https://github.com/apache/kafka/pull/5857 ?
>
> Thanks,
> Satish.
> On Thu, Nov 1, 2018 at 4:05 PM Jakub Scholz  wrote:
> >
> > +1 (non-binding) ... I used the staged binaries and checked it with
> > different clients.
> >
> > On Wed, Oct 24, 2018 at 10:17 AM Dong Lin  wrote:
> >
> > > Hello Kafka users, developers and client-developers,
> > >
> > > This is the first candidate for feature release of Apache Kafka 2.1.0.
> > >
> > > This is a major version release of Apache Kafka. It includes 28 new
> KIPs
> > > and
> > >
> > > critical bug fixes. Please see the Kafka 2.1.0 release plan for more
> > > details:
> > >
> > > *
> > >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=91554044*
> > > <
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=91554044
> > > >
> > >
> > > Here are a few notable highlights:
> > >
> > > - Java 11 support
> > > - Support for Zstandard, which achieves compression comparable to gzip
> with
> > > higher compression and especially decompression speeds (KIP-110)
> > > - Avoid expiring committed offsets for active consumer group (KIP-211)
> > > - Provide Intuitive User Timeouts in The Producer (KIP-91)
> > > - Kafka's replication protocol now supports improved fencing of
> zombies.
> > > Previously, under certain rare conditions, if a broker became
> partitioned
> > > from Zookeeper but not the rest of the cluster, then the logs of
> replicated
> > > partitions could diverge and cause data loss in the worst case
> (KIP-320)
> > > - Streams API improvements (KIP-319, KIP-321, KIP-330, KIP-353,
> KIP-356)
> > > - Admin script and admin client API improvements to simplify admin
> > > operation (KIP-231, KIP-308, KIP-322, KIP-324, KIP-338, KIP-340)
> > > - DNS handling improvements (KIP-235, KIP-302)
> > >
> > > Release notes for the 2.1.0 release:
> > > http://home.apache.org/~lindong/kafka-2.1.0-rc0/RELEASE_NOTES.html
> > >
> > > *** Please download, test and vote ***
> > >
> > > * Kafka's KEYS file containing PGP keys we use to sign the release:
> > > http://kafka.apache.org/KEYS
> > >
> > > * Release artifacts to be voted upon (source and binary):
> > > http://home.apache.org/~lindong/kafka-2.1.0-rc0/
> > >
> > > * Maven artifacts to be voted upon:
> > > https://repository.apache.org/content/groups/staging/
> > >
> > > * Javadoc:
> > > http://home.apache.org/~lindong/kafka-2.1.0-rc0/javadoc/
> > >
> > > * Tag to be voted upon (off 2.1 branch) is the 2.1.0-rc0 tag:
> > > https://github.com/apache/kafka/tree/2.1.0-rc0
> > >
> > > * Documentation:
> > > *http://kafka.apache.org/21/documentation.html*
> > > 
> > >
> > > * Protocol:
> > > http://kafka.apache.org/21/protocol.html
> > >
> > > * Successful Jenkins builds for the 2.1 branch:
> > > Unit/integration tests: *
> https://builds.apache.org/job/kafka-2.1-jdk8/38/
> > > *
> > >
> > > Please test and verify the release artifacts and submit a vote for
> this RC,
> > > or report any issues so we can fix them and get a new RC out ASAP.
> Although
> > > this release vote requires PMC votes to pass, testing, votes, and bug
> > > reports are valuable and appreciated from everyone.
> > >
> > > Cheers,
> > > Dong
> > >
>


[jira] [Resolved] (KAFKA-7595) Kafka Streams: KTable to KTable join introduces duplicates in downstream KTable

2018-11-06 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-7595.

Resolution: Not A Bug

> Kafka Streams: KTable to KTable join introduces duplicates in downstream 
> KTable
> 
>
> Key: KAFKA-7595
> URL: https://issues.apache.org/jira/browse/KAFKA-7595
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.0.0
>Reporter: Vik Gamov
>Priority: Major
>
> When performing a KTable to KTable join after aggregation, there are duplicates in 
> the resulting KTable.
> 1. caching disabled, no materialized => duplicates
> {{streamsConfiguration.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 
> 0);}}
> {{KTable ratingCounts = ratingsById.count();}}
> {{KTable ratingSums = ratingsById.reduce((v1, v2) -> v1 + v2);}}
> {{KTable ratingAverage = ratingSums.join(ratingCounts,}}
> {{ (sum, count) -> sum / count.doubleValue());}}
> 2. caching disabled, materialized => duplicate
> {{streamsConfiguration.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 
> 0);}}{{KTable ratingCounts = ratingsById.count();}}
> {{KTable ratingSums = ratingsById.reduce((v1, v2) -> v1 + v2);}}
> {{KTable ratingAverage = ratingSums.join(ratingCounts,}}
> {{ (sum, count) -> sum / count.doubleValue(),}}
> {{ Materialized.as("average-ratings"));}}
> 3. caching enabled, materialized => all good
> {{// Enable record cache of size 10 MB.}}
> {{streamsConfiguration.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 10 
> * 1024 * 1024L);}}
> {{// Set commit interval to 1 second.}}
> {{streamsConfiguration.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 
> 1000);}}{{KTable ratingCounts = ratingsById.count();}}
> {{KTable ratingSums = ratingsById.reduce((v1, v2) -> v1 + v2);}}
> {{KTable ratingAverage = ratingSums.join(ratingCounts,}}
> {{ (sum, count) -> sum / count.doubleValue(),}}
> {{ Materialized.as("average-ratings"));}}
>  
> Demo app 
> [https://github.com/tlberglund/streams-movie-demo/blob/master/streams/src/main/java/io/confluent/demo/StreamsDemo.java#L107]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7601) Handle message format downgrades during upgrade of message format version

2018-11-06 Thread Jason Gustafson (JIRA)
Jason Gustafson created KAFKA-7601:
--

 Summary: Handle message format downgrades during upgrade of 
message format version
 Key: KAFKA-7601
 URL: https://issues.apache.org/jira/browse/KAFKA-7601
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson


During an upgrade of the message format, there is a short time during which the 
configured message format version is not consistent across all replicas of a 
partition. As long as all brokers are on the same version, this typically does 
not cause any problems. Followers will take whatever message format is used by 
the leader. However, it is possible for leadership to change several times 
between brokers which support the new format and those which support the old 
format. This can cause the version used in the log to flap between the 
different formats until the upgrade is complete. 

For example, suppose broker 1 has been updated to use v2 and broker 2 is still 
on v1. When broker 1 is the leader, all new messages will be written in the v2 
format. When broker 2 is leader, v1 will be used. If there is any instability 
in the cluster or if completion of the update is delayed, then the log will be 
seen to switch back and forth between v1 and v2. Once the update is completed 
and broker 1 begins using v2, then the message format will stabilize and 
everything will generally be ok.

Downgrades of the message format are problematic, even if they are just 
temporary. There are basically two issues:

1. We use the configured message format version to tell whether down-conversion 
is needed. We assume that this is always the maximum version used in the 
log, but that assumption fails in the case of a downgrade. In the worst case, 
old clients will see the new format and likely fail.

2. The logic we use for finding the truncation offset during the become 
follower transition does not handle flapping between message formats. When the 
new format is used by the leader, then the epoch cache will be updated 
correctly. When the old format is in use, the epoch cache won't be updated. 
This can lead to an incorrect result to OffsetsForLeaderEpoch queries.

For the second point, the specific case we observed is something like this. 
Broker 1 is the leader of epoch 0 and writes some messages to the log using the 
v2 message format. Broker 2 then becomes the leader for epoch 1 and writes some 
messages in the v2 format. On broker 2, the last entry in the epoch cache is 
epoch 0. No entry is written for epoch 1 because it uses the old format. When 
broker 1 became a follower, it send an OffsetsForLeaderEpoch query to broker 2 
for epoch 0. Since epoch 0 was the last entry in the cache, the log end offset 
was returned. This resulted in localized log divergence.

There are a few options to fix this problem. From a high level, we can either 
be stricter about preventing downgrades of the message format, or we can add 
additional logic to make downgrades safe. 

(Disallow downgrades): As an example of the first approach, the leader could 
always use the maximum of the last version written to the log and the 
configured message format version. 

(Allow downgrades): If we want to allow downgrades, then it makes sense to 
invalidate and remove all entries in the epoch cache following the message 
format downgrade. We would also need a solution for the problem of detecting 
when down-conversion is needed for a fetch request. One option I've been 
thinking about is enforcing the invariant that each segment uses only one 
message format version. Whenever the message format changes, we need to roll a 
new segment. Then we can simply remember which format is in use by each segment 
to tell whether down-conversion is needed for a given fetch request.
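
As a minimal illustration of the two directions above (hypothetical helper only, not the broker's actual code; "magic" stands for the message format version byte):

// Illustrative only: hypothetical helper showing the two rules side by side.
final class MessageFormatRules {
    // Option 1 (disallow downgrades): the leader writes with the maximum of the
    // configured format version and the last version already written to the log.
    static byte leaderWriteMagic(byte configuredMagic, byte lastMagicInLog) {
        return (byte) Math.max(configuredMagic, lastMagicInLog);
    }

    // Option 2 (allow downgrades): if each segment records the single format it
    // uses (rolling a new segment on every format change), down-conversion for a
    // fetch is decided per segment rather than from the configured version.
    static boolean needsDownConversion(byte segmentMagic, byte fetcherMaxSupportedMagic) {
        return segmentMagic > fetcherMaxSupportedMagic;
    }
}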




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [VOTE] KIP-386: Standardize on Min/Avg/Max metrics' default values

2018-11-06 Thread Harsha Chintalapani
+1 (binding)

-Harsha
On Nov 6, 2018, 8:04 AM -0800, Kevin Lu , wrote:
> +1 (non-binding)
>
> Regards,
> Kevin
>
> On Tue, Nov 6, 2018 at 1:01 AM Stanislav Kozlovski 
> wrote:
>
> > Hey everybody,
> >
> > I'm starting a vote thread on KIP-386: Standardize on Min/Avg/Max metrics'
> > default values
> >  > >
> > In short, after the discussion thread
> > <
> > https://lists.apache.org/thread.html/78b501fb296b05ee7da601dba1558f4b9014fbb50f663bf14e5aec67@%3Cdev.kafka.apache.org%3E
> > > ,
> > we decided to have all min/avg/max metrics output `NaN` as a default value.
> >
> > --
> > Best,
> > Stanislav
> >


Re: [VOTE] KIP-386: Standardize on Min/Avg/Max metrics' default values

2018-11-06 Thread Kevin Lu
+1 (non-binding)

Regards,
Kevin

On Tue, Nov 6, 2018 at 1:01 AM Stanislav Kozlovski 
wrote:

> Hey everybody,
>
> I'm starting a vote thread on KIP-386: Standardize on Min/Avg/Max metrics'
> default values
>  >
> In short, after the discussion thread
> <
> https://lists.apache.org/thread.html/78b501fb296b05ee7da601dba1558f4b9014fbb50f663bf14e5aec67@%3Cdev.kafka.apache.org%3E
> >,
> we decided to have all min/avg/max metrics output `NaN` as a default value.
>
> --
> Best,
> Stanislav
>


[jira] [Created] (KAFKA-7600) Provide capability to rename cluster id

2018-11-06 Thread Yeva Byzek (JIRA)
Yeva Byzek created KAFKA-7600:
-

 Summary: Provide capability to rename cluster id
 Key: KAFKA-7600
 URL: https://issues.apache.org/jira/browse/KAFKA-7600
 Project: Kafka
  Issue Type: Improvement
  Components: core
Reporter: Yeva Byzek


Enhancement suggestion: ability to configure the cluster id that is displayed 
in ZK {{/cluster/id}} to be something more human readable like {{datacenter1}} 
instead of random characters like {{YLD3M3faTFG7ftEvoDGn5Q}}.

Value add: downstream clients that use the cluster id can present users with a 
more meaningful cluster identifier.

Other relevant links: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-78%3A+Cluster+Id



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7599) Trogdor - Allow configuration for not throttling Benchmark Workers and expose messages per second

2018-11-06 Thread Stanislav Kozlovski (JIRA)
Stanislav Kozlovski created KAFKA-7599:
--

 Summary: Trogdor - Allow configuration for not throttling 
Benchmark Workers and expose messages per second
 Key: KAFKA-7599
 URL: https://issues.apache.org/jira/browse/KAFKA-7599
 Project: Kafka
  Issue Type: Improvement
Reporter: Stanislav Kozlovski
Assignee: Stanislav Kozlovski


In Trogdor, the ConsumeBench, ProduceBench and RoundTrip workers all take in an 
argument called "targetMessagesPerSec". That argument works as an upper bound 
on the number of messages that can be consumed/produced per second in that 
worker.

It is useful to support infinite messages per second. Currently, if the 
`targetMessagesPerSec` field is not present in the request, the RoundTripWorker 
will raise an exception, whereas the ConsumeBench and ProduceBench workers will 
work as if they had `targetMessagesPerSec=10`.

I propose we allow for unbounded `targetMessagesPerSec` if the field is not 
present.
Further, it would be very useful if some of these workers showed the 
`messagesPerSecond` they have been producing/consuming at. 
Even now, giving the worker a `targetMessagesPerSec` does not guarantee that 
the worker will reach the needed `targetMessagesPerSec`. There is no easy way 
of knowing how the worker performed - you have to subtract the status fields 
`startedMs` and `doneMs` to get the total duration of the task, convert it to 
seconds, and then divide the `maxMessages` field by that.
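
For example, the rate currently has to be derived by hand from the task status, roughly like this (sketch only, using the status fields named above):

// Back-of-the-envelope throughput from a finished Trogdor task status (sketch).
final class TrogdorThroughput {
    static double messagesPerSecond(long startedMs, long doneMs, long maxMessages) {
        double durationSeconds = (doneMs - startedMs) / 1000.0;
        return maxMessages / durationSeconds;   // messages per second
    }

    public static void main(String[] args) {
        // e.g. 1,000,000 messages over 50 seconds -> 20,000 msg/s
        System.out.println(messagesPerSecond(0L, 50_000L, 1_000_000L));
    }
}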



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7598) SIGSEGV on scala library Set

2018-11-06 Thread Antoine Tran (JIRA)
Antoine Tran created KAFKA-7598:
---

 Summary: SIGSEGV on scala library Set
 Key: KAFKA-7598
 URL: https://issues.apache.org/jira/browse/KAFKA-7598
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 1.0.0
 Environment: Docker CentOs image 7.3.1611 upgraded to 7.4.1708 
(https://hub.docker.com/r/library/centos/tags/)
java-1.8.0-openjdk-1.8.0.161-0.b14.el7_4.x86_64
Reporter: Antoine Tran


We had a crash that appears randomly, with a SIGSEGV in the Scala Set library:

 

[2018-09-24 20:52:04,568] INFO Updated PartitionLeaderEpoch. New: {epoch:0, offset:0}, Current: {epoch:-1, offset-1} for Partition: fac---MTI2RTM-rtmForecast2d-RCFD_2017100_0260-4. Cache now contains 0 entries. (kafka.server.epoch.LeaderEpochFileCache)
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x7fdb3998c814, pid=1, tid=0x7fd9a4588700
#
# JRE version: OpenJDK Runtime Environment (8.0_161-b14) (build 1.8.0_161-b14)
# Java VM: OpenJDK 64-Bit Server VM (25.161-b14 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# J 6249 C2 scala.collection.immutable.Set$EmptySet$.$minus(Ljava/lang/Object;)Lscala/collection/generic/Subtractable; (6 bytes) @ 0x7fdb3998c814 [0x7fdb3998c7e0+0x34]
#
# Core dump written. Default location: //core or core.1
#
# An error report file with more information is saved as:
# //hs_err_pid1.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#

[error occurred during error reporting , id 0xb]

I couldn't get the core dump for now; I asked our team for it next time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7597) Trogdor - Support transactions in ProduceBenchWorker

2018-11-06 Thread Stanislav Kozlovski (JIRA)
Stanislav Kozlovski created KAFKA-7597:
--

 Summary: Trogdor - Support transactions in ProduceBenchWorker
 Key: KAFKA-7597
 URL: https://issues.apache.org/jira/browse/KAFKA-7597
 Project: Kafka
  Issue Type: Improvement
Reporter: Stanislav Kozlovski
Assignee: Stanislav Kozlovski


The `ProduceBenchWorker` allows you to schedule a benchmark of a Kafka Producer.
It would prove useful if we supported transactions in this producer, so as to 
allow benchmarks with transactions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: kafka-trunk-jdk8 #3181

2018-11-06 Thread Apache Jenkins Server
See 


Changes:

[ismael] KAFKA-7559: Correct standalone system tests to use the correct external

--
[...truncated 2.73 MB...]
org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareValueTimestampWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullExpectedRecordForCompareKeyValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullExpectedRecordForCompareKeyValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareValueWithProducerRecord STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareValueWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueIsEqualForCompareValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueIsEqualForCompareValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordWithExpectedRecordForCompareKeyValueTimestamp 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordWithExpectedRecordForCompareKeyValueTimestamp 
PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentForCompareKeyValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentForCompareKeyValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordWithExpectedRecordForCompareKeyValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordWithExpectedRecordForCompareKeyValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullForCompareKeyValueWithProducerRecord STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullForCompareKeyValueWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueAndTimestampIsEqualForCompareKeyValueTimestampWithProducerRecord
 STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueAndTimestampIsEqualForCompareKeyValueTimestampWithProducerRecord
 PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueIsEqualWithNullForCompareValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueIsEqualWithNullForCompareValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueAndTimestampIsEqualWithNullForCompareValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueAndTimestampIsEqualWithNullForCompareValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareValueTimestampWithProducerRecord 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareValueTimestampWithProducerRecord 
PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareValueWithProducerRecord STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareValueWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullReversForCompareKeyValueTimestampWithProducerRecord
 STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullReversForCompareKeyValueTimestampWithProducerRecord
 PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullReversForCompareKeyValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullReversForCompareKeyValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueIsEqualForCompareKeyValueWithProducerRecord STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueIsEqualForCompareKeyValueWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareKeyValueWithProducerRecord STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareKeyValueWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 

Jenkins build is back to normal : kafka-trunk-jdk11 #78

2018-11-06 Thread Apache Jenkins Server
See 




Build failed in Jenkins: kafka-2.1-jdk8 #49

2018-11-06 Thread Apache Jenkins Server
See 


Changes:

[ismael] KAFKA-7559: Correct standalone system tests to use the correct external

--
[...truncated 428.22 KB...]

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldRespondWithInvalidRequestAddPartitionsToTransactionWhenTransactionalIdIsEmpty
 STARTED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldRespondWithInvalidRequestAddPartitionsToTransactionWhenTransactionalIdIsEmpty
 PASSED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldAppendPrepareAbortToLogOnEndTxnWhenStatusIsOngoingAndResultIsAbort STARTED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldAppendPrepareAbortToLogOnEndTxnWhenStatusIsOngoingAndResultIsAbort PASSED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldAcceptInitPidAndReturnNextPidWhenTransactionalIdIsNull STARTED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldAcceptInitPidAndReturnNextPidWhenTransactionalIdIsNull PASSED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldRemoveTransactionsForPartitionOnEmigration STARTED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldRemoveTransactionsForPartitionOnEmigration PASSED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldWaitForCommitToCompleteOnHandleInitPidAndExistingTransactionInPrepareCommitState
 STARTED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldWaitForCommitToCompleteOnHandleInitPidAndExistingTransactionInPrepareCommitState
 PASSED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldAbortExpiredTransactionsInOngoingStateAndBumpEpoch STARTED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldAbortExpiredTransactionsInOngoingStateAndBumpEpoch PASSED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldReturnInvalidTxnRequestOnEndTxnRequestWhenStatusIsCompleteCommitAndResultIsNotCommit
 STARTED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldReturnInvalidTxnRequestOnEndTxnRequestWhenStatusIsCompleteCommitAndResultIsNotCommit
 PASSED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldReturnOkOnEndTxnWhenStatusIsCompleteCommitAndResultIsCommit STARTED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldReturnOkOnEndTxnWhenStatusIsCompleteCommitAndResultIsCommit PASSED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldRespondWithConcurrentTransactionsOnAddPartitionsWhenStateIsPrepareCommit 
STARTED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldRespondWithConcurrentTransactionsOnAddPartitionsWhenStateIsPrepareCommit 
PASSED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldIncrementEpochAndUpdateMetadataOnHandleInitPidWhenExistingCompleteTransaction
 STARTED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldIncrementEpochAndUpdateMetadataOnHandleInitPidWhenExistingCompleteTransaction
 PASSED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldGenerateNewProducerIdIfEpochsExhausted STARTED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldGenerateNewProducerIdIfEpochsExhausted PASSED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldRespondWithNotCoordinatorOnInitPidWhenNotCoordinator STARTED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldRespondWithNotCoordinatorOnInitPidWhenNotCoordinator PASSED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldRespondWithConcurrentTransactionOnAddPartitionsWhenStateIsPrepareAbort 
STARTED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldRespondWithConcurrentTransactionOnAddPartitionsWhenStateIsPrepareAbort 
PASSED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldInitPidWithEpochZeroForNewTransactionalId STARTED

kafka.coordinator.transaction.TransactionCoordinatorTest > 
shouldInitPidWithEpochZeroForNewTransactionalId PASSED

kafka.network.SocketServerTest > testGracefulClose STARTED

kafka.network.SocketServerTest > testGracefulClose PASSED

kafka.network.SocketServerTest > 
testSendActionResponseWithThrottledChannelWhereThrottlingAlreadyDone STARTED

kafka.network.SocketServerTest > 
testSendActionResponseWithThrottledChannelWhereThrottlingAlreadyDone PASSED

kafka.network.SocketServerTest > controlThrowable STARTED

kafka.network.SocketServerTest > controlThrowable PASSED

kafka.network.SocketServerTest > testRequestMetricsAfterStop STARTED

kafka.network.SocketServerTest > testRequestMetricsAfterStop PASSED

kafka.network.SocketServerTest > testConnectionIdReuse STARTED

kafka.network.SocketServerTest > testConnectionIdReuse PASSED

kafka.network.SocketServerTest > testClientDisconnectionUpdatesRequestMetrics 
STARTED

kafka.network.SocketServerTest > testClientDisconnectionUpdatesRequestMetrics 
PASSED

[VOTE] KIP-386: Standardize on Min/Avg/Max metrics' default values

2018-11-06 Thread Stanislav Kozlovski
Hey everybody,

I'm starting a vote thread on KIP-386: Standardize on Min/Avg/Max metrics'
default values

In short, after the discussion thread
,
we decided to have all min/avg/max metrics output `NaN` as a default value.
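
For clarity, the agreed behaviour is simply that a min/avg/max metric which has not recorded any value yet reports Double.NaN; a toy illustration (not Kafka's actual metrics classes):

// Toy illustration of the agreed default (not the real Kafka metrics code).
final class MaxGauge {
    private double max = Double.NaN;   // default before any value is recorded

    void record(double value) {
        max = Double.isNaN(max) ? value : Math.max(max, value);
    }

    double value() {
        return max;   // NaN until the first record() call
    }
}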

-- 
Best,
Stanislav


Build failed in Jenkins: kafka-2.0-jdk8 #182

2018-11-06 Thread Apache Jenkins Server
See 


Changes:

[ismael] KAFKA-7559: Correct standalone system tests to use the correct external

--
[...truncated 435.72 KB...]
kafka.controller.PartitionStateMachineTest > 
testInvalidNonexistentPartitionToOfflinePartitionTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testInvalidNonexistentPartitionToOfflinePartitionTransition PASSED

kafka.controller.PartitionStateMachineTest > 
testOnlinePartitionToOnlineTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testOnlinePartitionToOnlineTransition PASSED

kafka.controller.PartitionStateMachineTest > 
testNewPartitionToOnlinePartitionTransitionZkUtilsExceptionFromCreateStates 
STARTED

kafka.controller.PartitionStateMachineTest > 
testNewPartitionToOnlinePartitionTransitionZkUtilsExceptionFromCreateStates 
PASSED

kafka.controller.PartitionStateMachineTest > 
testInvalidNewPartitionToNonexistentPartitionTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testInvalidNewPartitionToNonexistentPartitionTransition PASSED

kafka.controller.PartitionStateMachineTest > 
testNewPartitionToOnlinePartitionTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testNewPartitionToOnlinePartitionTransition PASSED

kafka.controller.PartitionStateMachineTest > 
testInvalidOnlinePartitionToNewPartitionTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testInvalidOnlinePartitionToNewPartitionTransition PASSED

kafka.controller.PartitionStateMachineTest > 
testOfflinePartitionToOnlinePartitionTransitionErrorCodeFromStateLookup STARTED

kafka.controller.PartitionStateMachineTest > 
testOfflinePartitionToOnlinePartitionTransitionErrorCodeFromStateLookup PASSED

kafka.controller.PartitionStateMachineTest > 
testOnlinePartitionToOnlineTransitionForControlledShutdown STARTED

kafka.controller.PartitionStateMachineTest > 
testOnlinePartitionToOnlineTransitionForControlledShutdown PASSED

kafka.controller.PartitionStateMachineTest > 
testOfflinePartitionToOnlinePartitionTransitionZkUtilsExceptionFromStateLookup 
STARTED

kafka.controller.PartitionStateMachineTest > 
testOfflinePartitionToOnlinePartitionTransitionZkUtilsExceptionFromStateLookup 
PASSED

kafka.controller.PartitionStateMachineTest > 
testInvalidOnlinePartitionToNonexistentPartitionTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testInvalidOnlinePartitionToNonexistentPartitionTransition PASSED

kafka.controller.PartitionStateMachineTest > 
testInvalidOfflinePartitionToNewPartitionTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testInvalidOfflinePartitionToNewPartitionTransition PASSED

kafka.controller.PartitionStateMachineTest > 
testOfflinePartitionToOnlinePartitionTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testOfflinePartitionToOnlinePartitionTransition PASSED

kafka.controller.ControllerEventManagerTest > testEventThatThrowsException 
STARTED

kafka.controller.ControllerEventManagerTest > testEventThatThrowsException 
PASSED

kafka.controller.ControllerEventManagerTest > testSuccessfulEvent STARTED

kafka.controller.ControllerEventManagerTest > testSuccessfulEvent PASSED

kafka.controller.ControllerFailoverTest > testHandleIllegalStateException 
STARTED

kafka.controller.ControllerFailoverTest > testHandleIllegalStateException PASSED

kafka.network.SocketServerTest > testGracefulClose STARTED

kafka.network.SocketServerTest > testGracefulClose PASSED

kafka.network.SocketServerTest > 
testSendActionResponseWithThrottledChannelWhereThrottlingAlreadyDone STARTED

kafka.network.SocketServerTest > 
testSendActionResponseWithThrottledChannelWhereThrottlingAlreadyDone PASSED

kafka.network.SocketServerTest > controlThrowable STARTED

kafka.network.SocketServerTest > controlThrowable PASSED

kafka.network.SocketServerTest > testRequestMetricsAfterStop STARTED

kafka.network.SocketServerTest > testRequestMetricsAfterStop PASSED

kafka.network.SocketServerTest > testConnectionIdReuse STARTED

kafka.network.SocketServerTest > testConnectionIdReuse PASSED

kafka.network.SocketServerTest > testClientDisconnectionUpdatesRequestMetrics 
STARTED

kafka.network.SocketServerTest > testClientDisconnectionUpdatesRequestMetrics 
PASSED

kafka.network.SocketServerTest > testProcessorMetricsTags STARTED

kafka.network.SocketServerTest > testProcessorMetricsTags PASSED

kafka.network.SocketServerTest > testMaxConnectionsPerIp STARTED

kafka.network.SocketServerTest > testMaxConnectionsPerIp PASSED

kafka.network.SocketServerTest > testConnectionId STARTED

kafka.network.SocketServerTest > testConnectionId PASSED

kafka.network.SocketServerTest > 
testBrokerSendAfterChannelClosedUpdatesRequestMetrics STARTED

kafka.network.SocketServerTest > 
testBrokerSendAfterChannelClosedUpdatesRequestMetrics PASSED

kafka.network.SocketServerTest > testNoOpAction STARTED

kafka.network.SocketServerTest >