[jira] [Created] (KAFKA-9660) KAFKA-1 build a kafka-exporter by java

2020-03-04 Thread francis lee (Jira)
francis lee created KAFKA-9660:
--

 Summary:  KAFKA-1 build a kafka-exporter by java
 Key: KAFKA-9660
 URL: https://issues.apache.org/jira/browse/KAFKA-9660
 Project: Kafka
  Issue Type: Improvement
  Components: admin, metrics
Affects Versions: 2.0.0, 1.1.0, 0.10.2.0
 Environment: java8+
Reporter: francis lee






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


KIP-575: build a Kafka-Exporter by Java

2020-03-04 Thread ??????
Hi all,
KIP-575
Kafka is an excellent MQ running on the JVM, but there is no official JVM-based
exporter. For a better future of the Kafka ecosystem, Apache needs a formal
exporter, along the lines of https://github.com/apache/kafka-exporter.
I wrote one for work and hope to donate it to Apache. There are many metrics in
JMX; which ones are exported can be configured in the exporter config.
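Kafka brokers expose these metrics as JMX MBeans, and an exporter reads them through the javax.management API. A minimal self-contained sketch of polling one MBean attribute (it queries the JVM's own platform memory MBean for illustration; a real exporter would instead connect to the broker's JMX port and query the kafka.* domains):

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

public class JmxPollSketch {
    public static void main(String[] args) throws Exception {
        // A real exporter would use JMXConnectorFactory to reach the broker's
        // JMX port; here we query the local platform MBeanServer instead.
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("java.lang:type=Memory");
        CompositeData heap =
                (CompositeData) server.getAttribute(name, "HeapMemoryUsage");
        long used = (Long) heap.get("used");
        // An exporter would render this as a metric sample, e.g.
        // jvm_memory_heap_used <value>
        System.out.println("heap used bytes: " + used);
    }
}
```

The same getAttribute call works against any of the kafka.server / kafka.network MBeans once connected to the broker's JMX endpoint.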


If you are interested in it, join me!

Some of the exported metrics are listed here:
kafka_AddPartitionsToTxn_50thPercentile
kafka_AddPartitionsToTxn_95thPercentile
kafka_AddPartitionsToTxn_999thPercentile
kafka_AddPartitionsToTxn_99thPercentile
kafka_AddPartitionsToTxn_Count
kafka_AddPartitionsToTxn_Max
kafka_AddPartitionsToTxn_Mean
kafka_AddPartitionsToTxn_MeanRate
kafka_AddPartitionsToTxn_Min
kafka_AddPartitionsToTxn_OneMinuteRate
kafka_AddPartitionsToTxn_StdDev
kafka_BrokerTopicMetrics_BytesInPerSec_Count
kafka_BrokerTopicMetrics_BytesInPerSec_MeanRate
kafka_BrokerTopicMetrics_BytesInPerSec_OneMinuteRate
kafka_BrokerTopicMetrics_BytesOutPerSec_Count
kafka_BrokerTopicMetrics_BytesOutPerSec_MeanRate
kafka_BrokerTopicMetrics_BytesOutPerSec_OneMinuteRate
kafka_BrokerTopicMetrics_BytesRejectedPerSec_Count
kafka_BrokerTopicMetrics_BytesRejectedPerSec_MeanRate
kafka_BrokerTopicMetrics_BytesRejectedPerSec_OneMinuteRate
kafka_BrokerTopicMetrics_FailedFetchRequestsPerSec_Count
kafka_BrokerTopicMetrics_FailedFetchRequestsPerSec_MeanRate
kafka_BrokerTopicMetrics_FailedFetchRequestsPerSec_OneMinuteRate
kafka_BrokerTopicMetrics_FailedProduceRequestsPerSec_Count
kafka_BrokerTopicMetrics_FailedProduceRequestsPerSec_MeanRate
kafka_BrokerTopicMetrics_FailedProduceRequestsPerSec_OneMinuteRate
kafka_BrokerTopicMetrics_MessagesInPerSec_Count
kafka_BrokerTopicMetrics_MessagesInPerSec_MeanRate
kafka_BrokerTopicMetrics_MessagesInPerSec_OneMinuteRate
kafka_BrokerTopicMetrics_ProduceMessageConversionsPerSec_Count
kafka_BrokerTopicMetrics_ProduceMessageConversionsPerSec_MeanRate
kafka_BrokerTopicMetrics_ProduceMessageConversionsPerSec_OneMinuteRate
kafka_BrokerTopicMetrics_ReplicationBytesInPerSec_Count
kafka_BrokerTopicMetrics_ReplicationBytesInPerSec_MeanRate
kafka_BrokerTopicMetrics_ReplicationBytesInPerSec_OneMinuteRate
kafka_BrokerTopicMetrics_ReplicationBytesOutPerSec_Count
kafka_BrokerTopicMetrics_ReplicationBytesOutPerSec_MeanRate
kafka_BrokerTopicMetrics_ReplicationBytesOutPerSec_OneMinuteRate
kafka_BrokerTopicMetrics_TotalFetchRequestsPerSec_Count
kafka_BrokerTopicMetrics_TotalFetchRequestsPerSec_MeanRate
kafka_BrokerTopicMetrics_TotalFetchRequestsPerSec_OneMinuteRate
kafka_BrokerTopicMetrics_TotalProduceRequestsPerSec_Count
kafka_BrokerTopicMetrics_TotalProduceRequestsPerSec_MeanRate
kafka_BrokerTopicMetrics_TotalProduceRequestsPerSec_OneMinuteRate
kafka_BytesInPerSec_Count
kafka_BytesInPerSec_FifteenMinuteRate
kafka_BytesInPerSec_FiveMinuteRate
kafka_BytesInPerSec_MeanRate
kafka_BytesInPerSec_OneMinuteRate
kafka_BytesOutPerSec_Count
kafka_BytesOutPerSec_FifteenMinuteRate
kafka_BytesOutPerSec_FiveMinuteRate
kafka_BytesOutPerSec_MeanRate
kafka_BytesOutPerSec_OneMinuteRate
kafka_BytesRejectedPerSec_Count
kafka_BytesRejectedPerSec_FifteenMinuteRate
kafka_BytesRejectedPerSec_FiveMinuteRate
kafka_BytesRejectedPerSec_MeanRate
kafka_BytesRejectedPerSec_OneMinuteRate
kafka_CreatePartitions_50thPercentile
kafka_CreatePartitions_95thPercentile
kafka_CreatePartitions_999thPercentile
kafka_CreatePartitions_99thPercentile
kafka_CreatePartitions_Count
kafka_CreatePartitions_Max
kafka_CreatePartitions_Mean
kafka_CreatePartitions_MeanRate
kafka_CreatePartitions_Min
kafka_CreatePartitions_OneMinuteRate
kafka_CreatePartitions_StdDev
kafka_CreateTopics_50thPercentile
kafka_CreateTopics_95thPercentile
kafka_CreateTopics_999thPercentile
kafka_CreateTopics_99thPercentile
kafka_CreateTopics_Count
kafka_CreateTopics_Max
kafka_CreateTopics_Mean
kafka_CreateTopics_MeanRate
kafka_CreateTopics_Min
kafka_CreateTopics_OneMinuteRate
kafka_CreateTopics_StdDev
kafka_DeleteGroups_50thPercentile
kafka_DeleteGroups_95thPercentile
kafka_DeleteGroups_999thPercentile
kafka_DeleteGroups_99thPercentile
kafka_DeleteGroups_Count
kafka_DeleteGroups_Max
kafka_DeleteGroups_Mean
kafka_DeleteGroups_Min
kafka_DeleteGroups_StdDev
kafka_DeleteTopics_50thPercentile
kafka_DeleteTopics_95thPercentile
kafka_DeleteTopics_999thPercentile
kafka_DeleteTopics_99thPercentile
kafka_DeleteTopics_Count
kafka_DeleteTopics_Max
kafka_DeleteTopics_Mean
kafka_DeleteTopics_MeanRate
kafka_DeleteTopics_Min
kafka_DeleteTopics_OneMinuteRate
kafka_DeleteTopics_StdDev
kafka_DescribeGroups_50thPercentile
kafka_DescribeGroups_95thPercentile
kafka_DescribeGroups_999thPercentile
kafka_DescribeGroups_99thPercentile
kafka_DescribeGroups_Count
kafka_DescribeGroups_Max
kafka_DescribeGroups_Mean
kafka_DescribeGroups_MeanRate
kafka_DescribeGroups_Min
kafka_DescribeGroups_OneMinuteRate
kafka_DescribeGroups_StdDev
kafka_FailedFetchRequestsPerSec_Count
kafka_FailedFetchRequestsPerSec_FifteenMinuteRate
kafka_F

[Kafka Config] Setting multiple timestamp.column.name

2020-03-04 Thread Hung X. Pham
Hi guys,
Sorry if this mail bothers you. I want to set up a topic whose source connector
listens on two timestamp columns, consuming data whenever either one changes.
Example: timestamp.column.name = createddate or modifieddate
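For what it's worth, recent versions of the Confluent kafka-connect-jdbc source connector accept a comma-separated list of columns in timestamp.column.name (older versions take a single column, so check your connector version's documentation). A hypothetical config sketch, assuming that connector and illustrative names throughout:

```
# Hypothetical JDBC source connector config (all names illustrative)
name=my-jdbc-source
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
connection.url=jdbc:postgresql://db-host:5432/mydb
mode=timestamp
# Comma-separated list; change detection spans both columns.
timestamp.column.name=createddate,modifieddate
table.whitelist=orders
topic.prefix=my-
```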

Thanks for your help; glad to hear from you soon.

Thanks,
Hung Pham
Application Developer | Electronic Transaction Consultants Corporation (ETC)
1600 N. Collins Boulevard, Suite 4000, Richardson, TX 75080
(o) 214.615.2320



[jira] [Created] (KAFKA-9659) Kafka Streams / Consumer fails on "fatal exception: group.instance.id gets fenced"

2020-03-04 Thread Rohan Desai (Jira)
Rohan Desai created KAFKA-9659:
--

 Summary: Kafka Streams / Consumer fails on "fatal exception: 
group.instance.id gets fenced"
 Key: KAFKA-9659
 URL: https://issues.apache.org/jira/browse/KAFKA-9659
 Project: Kafka
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Rohan Desai


I'm running a KSQL query, which underneath is built into a Kafka Streams 
application. The application has been running without issue for a few days, 
until today, when all the streams threads exited with:

 

```

[ERROR] 2020-03-05 00:57:58,776 
[_confluent-ksql-pksqlc-xm6g1query_CSAS_RATINGS_WITH_USER_AVERAGE_5-39e8046a-b6e6-44fd-8d6d-37cff78649bf-StreamThread-2]
 org.apache.kafka.clients.consumer.internals.AbstractCoordinator handle - 
[Consumer instanceId=ksql-1-2, 
clientId=_confluent-ksql-pksqlc-xm6g1query_CSAS_RATINGS_WITH_USER_AVERAGE_5-39e8046a-b6e6-44fd-8d6d-37cff78649bf-StreamThread-2-consumer,
 groupId=_confluent-ksql-pksqlc-xm6g1query_CSAS_RATINGS_WITH_USER_AVERAGE_5] 
Received fatal exception: group.instance.id gets fenced
[ERROR] 2020-03-05 00:57:58,776 
[_confluent-ksql-pksqlc-xm6g1query_CSAS_RATINGS_WITH_USER_AVERAGE_5-39e8046a-b6e6-44fd-8d6d-37cff78649bf-StreamThread-2]
 org.apache.kafka.clients.consumer.internals.AbstractCoordinator onFailure - 
[Consumer instanceId=ksql-1-2, 
clientId=_confluent-ksql-pksqlc-xm6g1query_CSAS_RATINGS_WITH_USER_AVERAGE_5-39e8046a-b6e6-44fd-8d6d-37cff78649bf-StreamThread-2-consumer,
 groupId=_confluent-ksql-pksqlc-xm6g1query_CSAS_RATINGS_WITH_USER_AVERAGE_5] 
Caught fenced group.instance.id Optional[ksql-1-2] error in heartbeat thread
[ERROR] 2020-03-05 00:57:58,776 
[_confluent-ksql-pksqlc-xm6g1query_CSAS_RATINGS_WITH_USER_AVERAGE_5-39e8046a-b6e6-44fd-8d6d-37cff78649bf-StreamThread-2]
 org.apache.kafka.streams.processor.internals.StreamThread run - stream-thread 
[_confluent-ksql-pksqlc-xm6g1query_CSAS_RATINGS_WITH_USER_AVERAGE_5-39e8046a-b6e6-44fd-8d6d-37cff78649bf-StreamThread-2]
 Encountered the following unexpected Kafka exception during processing, this 
usually indicate Streams internal errors:
org.apache.kafka.common.errors.FencedInstanceIdException: The broker rejected 
this static consumer since another consumer with the same group.instance.id has 
registered with a different member.id.
[INFO] 2020-03-05 00:57:58,776 
[_confluent-ksql-pksqlc-xm6g1query_CSAS_RATINGS_WITH_USER_AVERAGE_5-39e8046a-b6e6-44fd-8d6d-37cff78649bf-StreamThread-2]
 org.apache.kafka.streams.processor.internals.StreamThread setState - 
stream-thread 
[_confluent-ksql-pksqlc-xm6g1query_CSAS_RATINGS_WITH_USER_AVERAGE_5-39e8046a-b6e6-44fd-8d6d-37cff78649bf-StreamThread-2]
 State transition from RUNNING to PENDING_SHUTDOWN

```
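For context: the consumer in these logs uses static group membership (KIP-345), enabled by setting group.instance.id. A FencedInstanceIdException means the broker saw a second consumer register the same instance id with a different member.id. A minimal configuration sketch (plain property strings; all values hypothetical):

```java
import java.util.Properties;

public class StaticMembershipSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092"); // hypothetical address
        props.put("group.id", "my-consumer-group");
        // Static membership: must be unique per consumer instance in the group;
        // reusing an id from a still-registered instance triggers fencing.
        props.put("group.instance.id", "consumer-1");
        // Static members are not evicted until session.timeout.ms expires,
        // so restarts within this window avoid a rebalance.
        props.put("session.timeout.ms", "30000");
        System.out.println(props.getProperty("group.instance.id"));
    }
}
```

These Properties would then be passed to the KafkaConsumer constructor as usual.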

 

This event coincided with a broker (broker 2) having some downtime (as measured 
by a healthchecking service which periodically pings it with a 
produce/consume). 

 

I've attached the KSQL and Kafka Streams logs to this ticket. Here's a summary 
for one of the streams threads (instance id `ksql-1-2`):

 

Around 00:56:36 the coordinator fails over from b11 to b2:

```

[INFO] 2020-03-05 00:56:36,258 
[_confluent-ksql-pksqlc-xm6g1query_CTAS_RATINGS_BY_USER_0-c1df9747-f353-47f1-82fd-30b97c20d038-StreamThread-2]
 org.apache.kafka.clients.consumer.internals.AbstractCoordinator handle - 
[Consumer instanceId=ksql-1-2, 
clientId=_confluent-ksql-pksqlc-xm6g1query_CTAS_RATINGS_BY_USER_0-c1df9747-f353-47f1-82fd-30b97c20d038-StreamThread-2-consumer,
 groupId=_confluent-ksql-pksqlc-xm6g1query_CTAS_RATINGS_BY_USER_0] Attempt to 
heartbeat failed since coordinator 
b11-pkc-lzxjz.us-west-2.aws.devel.cpdev.cloud:9092 (id: 2147483636 rack: null) 
is either not started or not valid.
[INFO] 2020-03-05 00:56:36,258 
[_confluent-ksql-pksqlc-xm6g1query_CTAS_RATINGS_BY_USER_0-c1df9747-f353-47f1-82fd-30b97c20d038-StreamThread-2]
 org.apache.kafka.clients.consumer.internals.AbstractCoordinator 
markCoordinatorUnknown - [Consumer instanceId=ksql-1-2, 
clientId=_confluent-ksql-pksqlc-xm6g1query_CTAS_RATINGS_BY_USER_0-c1df9747-f353-47f1-82fd-30b97c20d038-StreamThread-2-consumer,
 groupId=_confluent-ksql-pksqlc-xm6g1query_CTAS_RATINGS_BY_USER_0] Group 
coordinator b11-pkc-lzxjz.us-west-2.aws.devel.cpdev.cloud:9092 (id: 2147483636 
rack: null) is unavailable or invalid, will attempt rediscovery
[INFO] 2020-03-05 00:56:36,270 
[_confluent-ksql-pksqlc-xm6g1query_CTAS_RATINGS_BY_USER_0-c1df9747-f353-47f1-82fd-30b97c20d038-StreamThread-2]
 org.apache.kafka.clients.consumer.internals.AbstractCoordinator onSuccess - 
[Consumer instanceId=ksql-1-2, 
clientId=_confluent-ksql-pksqlc-xm6g1query_CTAS_RATINGS_BY_USER_0-c1df9747-f353-47f1-82fd-30b97c20d038-StreamThread-2-consumer,
 groupId=_confluent-ksql-pksqlc-xm6g1query_CTAS_RATINGS_BY_USER_0] Discovered 
group coordinator b2-pkc-lzxjz.us-west-2.aws.devel.cpdev.cloud:9092 (id: 
2147483645 rack: null)

```

 

A few seconds later, offset commits start 

Build failed in Jenkins: kafka-trunk-jdk11 #1212

2020-03-04 Thread Apache Jenkins Server
See 


Changes:

[github] MINOR: Add an extra check in StreamThreadTest (#8214)


--
[...truncated 5.87 MB...]
org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPunctuateOnStreamsTime[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldThrowIfInMemoryBuiltInStoreIsAccessedWithUntypedMethod[Eos enabled = 
false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldThrowIfInMemoryBuiltInStoreIsAccessedWithUntypedMethod[Eos enabled = 
false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessFromSourcesThatMatchMultiplePattern[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessFromSourcesThatMatchMultiplePattern[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > shouldPopulateGlobalStore[Eos 
enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > shouldPopulateGlobalStore[Eos 
enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldThrowIfPersistentBuiltInStoreIsAccessedWithUntypedMethod[Eos enabled = 
false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldThrowIfPersistentBuiltInStoreIsAccessedWithUntypedMethod[Eos enabled = 
false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldAllowPrePopulatingStatesStoresWithCachingEnabled[Eos enabled = false] 
STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldAllowPrePopulatingStatesStoresWithCachingEnabled[Eos enabled = false] 
PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnCorrectPersistentStoreTypeOnly[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnCorrectPersistentStoreTypeOnly[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > shouldRespectTaskIdling[Eos 
enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > shouldRespectTaskIdling[Eos 
enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUseSourceSpecificDeserializers[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUseSourceSpecificDeserializers[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > shouldReturnAllStores[Eos 
enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > shouldReturnAllStores[Eos 
enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldNotCreateStateDirectoryForStatelessTopology[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldNotCreateStateDirectoryForStatelessTopology[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldApplyGlobalUpdatesCorrectlyInRecursiveTopologies[Eos enabled = false] 
STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldApplyGlobalUpdatesCorrectlyInRecursiveTopologies[Eos enabled = false] 
PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnAllStoresNames[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnAllStoresNames[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessConsumerRecordList[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessConsumerRecordList[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUseSinkSpecificSerializers[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUseSinkSpecificSerializers[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldFlushStoreForFirstInput[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldFlushStoreForFirstInput[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessFromSourceThatMatchPattern[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessFromSourceThatMatchPattern[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUpdateStoreForNewKey[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUpdateStoreForNewKey[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldSendRecordViaCorrectSourceTopicDeprecated[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldSendRecordViaCorrectSourceTopicDeprecated[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPunctuateOnWallClockTime[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPunctuateOnWallClockTime[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > shouldSetRecordMetadata[Eos 
ena

[jira] [Created] (KAFKA-9658) Removing default user quota doesn't take effect until broker restart

2020-03-04 Thread Anna Povzner (Jira)
Anna Povzner created KAFKA-9658:
---

 Summary: Removing default user quota doesn't take effect until 
broker restart
 Key: KAFKA-9658
 URL: https://issues.apache.org/jira/browse/KAFKA-9658
 Project: Kafka
  Issue Type: Bug
Affects Versions: 2.3.1, 2.4.0, 2.2.2, 2.1.1, 2.0.1
Reporter: Anna Povzner
Assignee: Anna Povzner


To reproduce (for any quota type: produce, consume, and request):

Example with consumer quota, assuming no user/client quotas are set initially.
1. Set the default user consumer quota:

./kafka-configs.sh --zookeeper  --alter --add-config 
'consumer_byte_rate=1' --entity-type users --entity-default

2. Send some consume load for some user, say user1.

3. Remove the default user consumer quota:
./kafka-configs.sh --zookeeper  --alter --delete-config 
'consumer_byte_rate' --entity-type users --entity-default

Result: --describe (as below) correctly reports that there is no quota, but the 
quota bound in ClientQuotaManager.metrics does not get updated for users that 
were sending load, so the broker keeps throttling their requests with the 
previously set quota.
/opt/confluent/bin/kafka-configs.sh --zookeeper   --describe 
--entity-type users --entity-default





[jira] [Created] (KAFKA-9657) Add configurable throw on unsupported protocol

2020-03-04 Thread Boyang Chen (Jira)
Boyang Chen created KAFKA-9657:
--

 Summary: Add configurable throw on unsupported protocol
 Key: KAFKA-9657
 URL: https://issues.apache.org/jira/browse/KAFKA-9657
 Project: Kafka
  Issue Type: Sub-task
Reporter: Boyang Chen


Right now Streams cannot handle the case where the brokers are downgraded, and 
thus could potentially violate the EOS requirement. We could add an (internal) 
config to the consumer or producer that crashes the client when the broker it 
connects to is unexpectedly on an older version, to prevent this case from 
causing correctness problems.
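The idea could be sketched roughly as below (the flag, method, and exception are all hypothetical illustrations; the real config would be internal to the client implementations):

```java
public class UnsupportedProtocolSketch {
    // Hypothetical internal client flag: when true, an older-than-expected
    // broker is a fatal error instead of a silent capability downgrade.
    static boolean checkBrokerVersion(int brokerMaxVersion,
                                      int requiredVersion,
                                      boolean failOnUnsupported) {
        if (brokerMaxVersion >= requiredVersion) {
            return true; // protocol supported, proceed with EOS semantics
        }
        if (failOnUnsupported) {
            // Crash rather than risk violating exactly-once guarantees.
            throw new IllegalStateException("Broker supports only version "
                    + brokerMaxVersion + ", need " + requiredVersion);
        }
        return false; // caller falls back to non-EOS behavior
    }

    public static void main(String[] args) {
        boolean ok = checkBrokerVersion(3, 2, true);
        boolean degraded = !checkBrokerVersion(1, 2, false);
        System.out.println(ok + " " + degraded);
    }
}
```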





Re: [DISCUSS] KIP-519: Make SSL context/engine configuration extensible

2020-03-04 Thread Maulin Vasavada
Hi Rajini

I made changes suggested by you on
https://github.com/maulin-vasavada/kafka/pull/4. Please check.

In particular, I had a challenge in the 'SslFactory#configure()' method: the
first time it is called, I need to know which configs to add without having the
actual SslEngineFactory implementation. I think it is best to just copy all
configs with the "ssl." prefix. Can you please review
https://github.com/maulin-vasavada/kafka/pull/4/files#diff-1e3432211fdbb7b2e2b44b5d8838a40bR90
in particular?

Clement, I missed your point about Mode in the previous post, but even after
I realized what you are suggesting, my response would be the same as
before :)

Thanks
Maulin


On Wed, Feb 5, 2020 at 8:40 PM Maulin Vasavada 
wrote:

> Hi Rajini
>
> Will accommodate your comments.
>
> Clement, while SSLContext factories are common, in this particular case,
> we need SSLEngine object because Kafka is using SSLEngine (as mentioned
> much before in this email thread, the SSLContext acts as factory for
> getting SSLEngine, SSLSocketFactory or SSLServerSocketFactory and Kafka
> chooses SSLEngine to be used for SSL Connections). The 'Mode' challenge
> doesn't go away easily since Kafka is using the "same" classes for Client
> side and Server side. Other stacks don't see this challenge because their
> libraries are either client-centric or server-centric, not both at the same
> time. I would suggest you do a deeper dive into the sample pull request and
> build the code to get a better idea about it. I don't find it
> strange to have 'Mode' argument in this context of Kafka. Kafka is not
> doing anything out of norm here since ultimately it is using JSSE spec for
> SSL Connection.
>
> I will try to respond to the code comments in a couple of weeks since I am
> out for a few weeks. Will keep you guys posted.
>
> Thanks
> Maulin
>
>
> On Wed, Feb 5, 2020 at 9:50 PM Pellerin, Clement 
> wrote:
>
>> Many of these points came up before.
>>
>> I had great hope when Maulin suggested the custom factory could
>> return an SSLContext instead of SSLEngine.  SSLContext factories are
>> common,
>> whereas I have never seen an SSLEngine factory being used before.
>> He must have hit the same problem I had with the Mode.
>>
>> If the Mode can be removed, can we find a way to return an SSLContext now?
>> What is so special about Kafka that it needs to hardcode the Mode, when
>> everyone else works with the SSLContext and ignores the mode they don't use?
>>
>> -Original Message-
>> From: Rajini Sivaram [mailto:rajinisiva...@gmail.com]
>> Sent: Wednesday, February 5, 2020 10:03 AM
>> To: dev
>> Subject: Re: [DISCUSS] KIP-519: Make SSL context/engine configuration
>> extensible
>>
>> One more point:
>> 5) We should also add a method to SslEngineFactory that returns
>> `Set<String> reconfigurableConfigs()`
>>
>> On Wed, Feb 5, 2020 at 1:50 PM Rajini Sivaram 
>> wrote:
>>
>> > Hi Maulin,
>> >
>> > Thanks for the updates. A few comments below:
>> >
>> > 1) SslEngineFactory is currently in the internal package
>> > org.apache.kafka.common.security.ssl. We should move it to the public
>> > package org.apache.kafka.common.security.auth.
>> > 2) SslEngineFactory should extend Configurable and Closeable. We should
>> > expect the implementation class to have a default constructor and invoke
>> > configure() to be consistent with other pluggable classes.
>> > 3) SslEngineFactory.createSslEngine uses the internal enum `Mode`. It
>> > would be better to add two methods instead:
>> >
>> >
>> >- SSLEngine createClientSslEngine(String peerHost, int peerPort,
>> String endpointIdentification);
>> >- SSLEngine createServerSslEngine(String peerHost, int peerPort);
>> >
>> > 4) Don't think we need a method on SslEngineFactory to return configs.
>> It seems like an odd thing to do in a public Configurable API and is
>> inconsistent with other APIs. We can track configs in the internal
>> SslFactory class instead.
>>
>
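Pulling together Rajini's points 1-3 and 5, the proposed public interface might look roughly like the sketch below, with a trivial demo implementation over the JVM's default SSLContext. This is only an illustration of the shape under discussion, not the final KIP-519 API:

```java
import java.io.Closeable;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.Set;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLParameters;

public class SslEngineFactorySketch {
    // Sketch of the public interface: Configurable-style configure(),
    // Closeable, mode-free creation methods, and reconfigurable configs.
    interface SslEngineFactory extends Closeable {
        void configure(Map<String, ?> configs);
        SSLEngine createClientSslEngine(String peerHost, int peerPort,
                                        String endpointIdentification);
        SSLEngine createServerSslEngine(String peerHost, int peerPort);
        Set<String> reconfigurableConfigs();
    }

    // Trivial implementation over the JVM default SSLContext; illustration only.
    static class DefaultFactory implements SslEngineFactory {
        private SSLContext context;
        public void configure(Map<String, ?> configs) {
            try {
                context = SSLContext.getDefault();
            } catch (NoSuchAlgorithmException e) {
                throw new RuntimeException(e);
            }
        }
        public SSLEngine createClientSslEngine(String host, int port,
                                               String endpointIdentification) {
            SSLEngine engine = context.createSSLEngine(host, port);
            engine.setUseClientMode(true); // mode chosen by the method, no Mode enum
            if (endpointIdentification != null) {
                SSLParameters params = engine.getSSLParameters();
                params.setEndpointIdentificationAlgorithm(endpointIdentification);
                engine.setSSLParameters(params);
            }
            return engine;
        }
        public SSLEngine createServerSslEngine(String host, int port) {
            SSLEngine engine = context.createSSLEngine(host, port);
            engine.setUseClientMode(false);
            return engine;
        }
        public Set<String> reconfigurableConfigs() { return Set.of(); }
        public void close() {}
    }

    public static void main(String[] args) {
        DefaultFactory f = new DefaultFactory();
        f.configure(Map.of());
        boolean clientMode =
                f.createClientSslEngine("localhost", 9093, "HTTPS").getUseClientMode();
        System.out.println("client mode: " + clientMode);
    }
}
```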


[jira] [Created] (KAFKA-9656) TxnOffsetCommit should not return COORDINATOR_LOADING error for old request versions

2020-03-04 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-9656:
--

 Summary: TxnOffsetCommit should not return COORDINATOR_LOADING 
error for old request versions
 Key: KAFKA-9656
 URL: https://issues.apache.org/jira/browse/KAFKA-9656
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson


In KAFKA-7296, we fixed a bug that caused the producer to enter a fatal state 
when the COORDINATOR_LOADING_IN_PROGRESS error was received. The impact of this 
bug in Streams was that the application would crash. Generally we want users to 
upgrade to a later client version, but in some cases, this takes a long time. I 
am suggesting here that we revert this behavior for older versions of 
TxnOffsetCommit. For versions older than 2 (which was introduced in 2.3), 
rather than returning COORDINATOR_LOADING_IN_PROGRESS, we can return 
COORDINATOR_NOT_AVAILABLE.
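The proposed version gate could be sketched as follows (hypothetical names; the real change would live in the broker's TxnOffsetCommit handling):

```java
public class TxnOffsetCommitErrorSketch {
    enum Error { COORDINATOR_LOAD_IN_PROGRESS, COORDINATOR_NOT_AVAILABLE }

    // Old clients (request version < 2) treated COORDINATOR_LOAD_IN_PROGRESS
    // as fatal (KAFKA-7296), so for them map the loading state to the
    // retriable COORDINATOR_NOT_AVAILABLE instead.
    static Error loadingErrorFor(short requestVersion) {
        return requestVersion >= 2
                ? Error.COORDINATOR_LOAD_IN_PROGRESS
                : Error.COORDINATOR_NOT_AVAILABLE;
    }

    public static void main(String[] args) {
        Error oldClient = loadingErrorFor((short) 1);
        Error newClient = loadingErrorFor((short) 2);
        System.out.println(oldClient + " / " + newClient);
    }
}
```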





[jira] [Resolved] (KAFKA-9240) Flaky test kafka.admin.ReassignPartitionsClusterTest.shouldMoveSinglePartitionWithinBroker

2020-03-04 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-9240.

Resolution: Duplicate

Closing this as duplicate.

> Flaky test 
> kafka.admin.ReassignPartitionsClusterTest.shouldMoveSinglePartitionWithinBroker
> --
>
> Key: KAFKA-9240
> URL: https://issues.apache.org/jira/browse/KAFKA-9240
> Project: Kafka
>  Issue Type: Bug
>Reporter: Levani Kokhreidze
>Priority: Major
>  Labels: flaky-test
>
> {code:java}
> Error Messageorg.scalatest.exceptions.TestFailedException: Partition should 
> have been moved to the expected log 
> directoryStacktraceorg.scalatest.exceptions.TestFailedException: Partition 
> should have been moved to the expected log directory at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530) at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529) 
> at 
> org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1389) 
> at org.scalatest.Assertions.fail(Assertions.scala:1091) at 
> org.scalatest.Assertions.fail$(Assertions.scala:1087) at 
> org.scalatest.Assertions$.fail(Assertions.scala:1389) at 
> kafka.admin.ReassignPartitionsClusterTest.shouldMoveSinglePartitionWithinBroker(ReassignPartitionsClusterTest.scala:176)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.base/java.lang.reflect.Method.invoke(Method.java:566) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
> at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305) at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>  at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:365) at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>  at org.junit.runners.ParentRunner$4.run(ParentRunner.java:330) at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:78) at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:328) at 
> org.junit.runners.ParentRunner.access$100(ParentRunner.java:65) at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292) at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
> at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305) at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:412) at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
>  at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
>  at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
>  at 
> org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
>  at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>  at jdk.internal.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.base/java.lang.reflect.Method.invoke(Method.java:566) at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
>  at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>  at 
> org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33)
>  at 
> org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94)
>  at com.sun.proxy.$Proxy2.processTestClass(Unknown Source) at 
> org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
>  at jdk.internal.reflect.GeneratedMethodA

[jira] [Created] (KAFKA-9655) Flaky Test SaslPlainSslEndToEndAuthorizationTest#testNoConsumeWithoutDescribeAclViaSubscribe

2020-03-04 Thread Matthias J. Sax (Jira)
Matthias J. Sax created KAFKA-9655:
--

 Summary: Flaky Test 
SaslPlainSslEndToEndAuthorizationTest#testNoConsumeWithoutDescribeAclViaSubscribe
 Key: KAFKA-9655
 URL: https://issues.apache.org/jira/browse/KAFKA-9655
 Project: Kafka
  Issue Type: Bug
  Components: core, security, unit tests
Reporter: Matthias J. Sax


[https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/983/testReport/junit/kafka.api/SaslPlainSslEndToEndAuthorizationTest/testNoConsumeWithoutDescribeAclViaSubscribe/]
{quote}
h3. Error Message
org.apache.kafka.common.errors.TopicAuthorizationException: Not authorized to 
access topics: [topic2]
h3. Stacktrace
org.apache.kafka.common.errors.TopicAuthorizationException: Not authorized to 
access topics: [topic2]
h3. Standard Output
Adding ACLs for resource `ResourcePattern(resourceType=CLUSTER, 
name=kafka-cluster, patternType=LITERAL)`: (principal=User:admin, host=*, 
operation=CLUSTER_ACTION, permissionType=ALLOW) Current ACLs for resource 
`ResourcePattern(resourceType=CLUSTER, name=kafka-cluster, 
patternType=LITERAL)`: (principal=User:admin, host=*, operation=CLUSTER_ACTION, 
permissionType=ALLOW) Adding ACLs for resource 
`ResourcePattern(resourceType=TOPIC, name=*, patternType=LITERAL)`: 
(principal=User:admin, host=*, operation=READ, permissionType=ALLOW) Current 
ACLs for resource `ResourcePattern(resourceType=TOPIC, name=*, 
patternType=LITERAL)`: (principal=User:admin, host=*, operation=READ, 
permissionType=ALLOW) [2020-03-04 19:38:33,457] ERROR [ReplicaFetcher 
replicaId=1, leaderId=2, fetcherId=0] Error for partition __consumer_offsets-0 at offset 0 (kafka.server.ReplicaFetcherThread:76)
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
[2020-03-04 19:38:33,464] ERROR [ReplicaFetcher replicaId=0, leaderId=2, fetcherId=0] Error for partition __consumer_offsets-0 at offset 0 (kafka.server.ReplicaFetcherThread:76)
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
[2020-03-04 19:38:33,633] ERROR [ReplicaFetcher replicaId=0, leaderId=2, fetcherId=0] Error for partition e2etopic-0 at offset 0 (kafka.server.ReplicaFetcherThread:76)
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
[2020-03-04 19:38:33,634] ERROR [ReplicaFetcher replicaId=1, leaderId=2, fetcherId=0] Error for partition e2etopic-0 at offset 0 (kafka.server.ReplicaFetcherThread:76)
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
Adding ACLs for resource `ResourcePattern(resourceType=CLUSTER, name=kafka-cluster, patternType=LITERAL)`:
(principal=User:admin, host=*, operation=CLUSTER_ACTION, permissionType=ALLOW)
Current ACLs for resource `ResourcePattern(resourceType=CLUSTER, name=kafka-cluster, patternType=LITERAL)`:
(principal=User:admin, host=*, operation=CLUSTER_ACTION, permissionType=ALLOW)
Adding ACLs for resource `ResourcePattern(resourceType=TOPIC, name=*, patternType=LITERAL)`:
(principal=User:admin, host=*, operation=READ, permissionType=ALLOW)
Current ACLs for resource `ResourcePattern(resourceType=TOPIC, name=*, patternType=LITERAL)`:
(principal=User:admin, host=*, operation=READ, permissionType=ALLOW)
[2020-03-04 19:38:38,435] ERROR [ReplicaFetcher replicaId=1, leaderId=2, fetcherId=0] Error for partition __consumer_offsets-0 at offset 0 (kafka.server.ReplicaFetcherThread:76)
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
[2020-03-04 19:38:38,441] ERROR [ReplicaFetcher replicaId=0, leaderId=2, fetcherId=0] Error for partition __consumer_offsets-0 at offset 0 (kafka.server.ReplicaFetcherThread:76)
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
Adding ACLs for resource `ResourcePattern(resourceType=TOPIC, name=e2etopic, patternType=LITERAL)`:
(principal=User:user, host=*, operation=DESCRIBE, permissionType=ALLOW)
(principal=User:user, host=*, operation=WRITE, permissionType=ALLOW)
(principal=User:user, host=*, operation=CREATE, permissionType=ALLOW)
Current ACLs for resource `ResourcePattern(resourceType=TOPIC, name=e2etopic, patternType=LITERAL)`:
(principal=User:user, host=*, operation=DESCRIBE, permissionType=ALLOW)
(principal=User:user, host=*, operation=WRITE, permissionType=ALLOW)
(principal=User:user, host=*, operation=CREATE, permissionType=ALLOW)
Adding ACLs for resource `ResourcePattern(resourceType=TOPIC, name=e2etopic, patternType=LITERAL)`:
(principal=User:user, host=*, operation=DESCRIBE, permissionType=ALLOW)
(principal=User:user, host=*, operation=READ, permissionType=ALLOW)
Adding ACLs for resource `ResourcePattern(resourceType=GROUP, name=group, patte

Re: [VOTE] KIP-557: Add emit on change support for Kafka Streams

2020-03-04 Thread Guozhang Wang
Regarding the metric name, I was actually trying to be consistent with the
node-level `suppression-emit` as I feel this one's characteristics are
closer to that. If other folks feel it's better to align with the task-level
"dropped-records" I think I can be convinced too.


Guozhang

On Wed, Mar 4, 2020 at 12:09 AM Bruno Cadonna  wrote:

> Hi all,
>
> may I make a non-binding proposal for the metric name? I would prefer
> "skipped-idempotent-updates" to be consistent with the
> "dropped-records".
>
> Best,
> Bruno
>
> On Tue, Mar 3, 2020 at 11:57 PM Richard Yu 
> wrote:
> >
> > Hi all,
> >
> > Thanks for the discussion!
> >
> > @Guozhang, I will make the corresponding changes to the KIP (i.e.
> renaming
> > the sensor and adding some notes).
> > With the current state of things, we are very close. Just need that one
> > last binding vote.
> >
> > @Matthias J. Sax   It would be ideal if we can
> also
> > get your last two cents on this as well.
> > Other than that, we are good.
> >
> > Best,
> > Richard
> >
> >
> > On Tue, Mar 3, 2020 at 10:46 AM Guozhang Wang 
> wrote:
> >
> > > Hi Bruno, John:
> > >
> > > 1) That makes sense. If we consider them to be node-specific metrics
> that
> > > only applies to a subset of built-in processor nodes that are
> irrelevant to
> > > alert-relevant metrics (just like suppression-emit (rate | total)),
> they'd
> > > better be per-node instead of per-task and we would not associate such
> > > events with warning. With that in mind, I'd suggest we consider
> renaming
> > > the metric without the `dropped` keyword to distinguish it with the
> > > per-task level sensor. How about "idempotent-update-skip (rate |
> total)"?
> > >
> > > Also a minor suggestion: we should clarify in the KIP / javadocs which
> > > built-in processor nodes would have this metric while others don't.
> > >
> > > 2) About stream time tracking, there are multiple known issues that we
> > > should close to improve our consistency semantics:
> > >
> > >  a. preserve stream time of active tasks across rebalances where they
> may
> > > be migrated. This is what KAFKA-9368
> > >  meant for.
> > >  b. preserve stream time of standby tasks to be aligned with the active
> > > tasks, via the changelog topics.
> > >
> > > And what I'm more concerning is b) here. For example: let's say we
> have a
> > > topology of `source -> A -> repartition -> B` where both A and B have
> > > states along with changelogs, and both of them have standbys. If a
> record
> > > is piped from the source and completed traversed through the topology,
> we
> > > need to make sure that the stream time inferred across:
> > >
> > > * active task A (inferred from the source record),
> > > * active task B (inferred from the derived record from repartition
> topic),
> > > * standby task A (inferred from the changelog topic of A's store),
> > > * standby task B (inferred from the changelog topic of B's store)
> > >
> > > are consistent (note I'm not saying they should be "exactly the same",
> but
> > > consistent, meaning that they may have different values but as long as
> that
> > > does not impact the time-based queries, it is fine). The main
> motivation is
> > > that on IQ, where both active and standby tasks could be accessed, we
> can
> > > eventually improve our consistency guarantee to have 1)
> read-your-write, 2)
> > > consistency across stores, etc.
> > >
> > > I agree with John's assessment in the previous email, and just to
> clarify
> > > more concretely what I'm thinking.
> > >
> > >
> > > Guozhang
> > >
> > >
> > > On Tue, Mar 3, 2020 at 9:03 AM John Roesler 
> wrote:
> > >
> > > > Thanks, Guozhang and Bruno!
> > > >
> > > > 2)
> > > > I had a similar though to both of you about the metrics, but I
> ultimately
> > > > came out with a conclusion like Bruno's. These aren't dropped invalid
> > > > records, they're intentionally dropped, valid, but unnecessary,
> updates.
> > > > A "warning" for this case definitely seems wrong, and I'd also not
> > > > recommend
> > > > counting these events along with "dropped-records", because those are
> > > > all dropped invalid records, e.g., late or null-keyed or couldn't be
> > > > deserialized.
> > > >
> > > > Like Bruno pointed out, an operator should be concerned to see
> > > > non-zero "dropped-records", and would then consult the logs for
> warnings.
> > > > But that same person should be happy to see
> "dropped-idempotent-updates"
> > > > increasing, since it means they're saving time and money. Maybe the
> name
> > > > of the metric could be different, but I couldn't think of a better
> one.
> > > > OTOH,
> > > > maybe it just stands out to us because we recently discussed those
> other
> > > > metrics in KIP-444?
> > > >
> > > > 1)
> > > > Maybe we should discuss this point more. It seems like we should
> maintain
> > > > an invariant that the following three objects always have exactly the
> > > same
> > > > state (modulo flush boundarie
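
For context, the emit-on-change behavior being voted on in KIP-557 can be
sketched as: forward an update only when the new value differs from the stored
one, and count skips in a metric. The following is an illustrative sketch, not
Kafka Streams code; all names (including the counter) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Conceptual sketch of emit-on-change: an update whose value equals the
// previously stored value is an idempotent update and is not forwarded.
class EmitOnChangeStore {
    private final Map<String, String> store = new HashMap<>();
    private long skippedIdempotentUpdates = 0; // hypothetical sensor

    // Returns true when the update should be forwarded downstream.
    public boolean put(String key, String value) {
        String old = store.put(key, value);
        if (Objects.equals(old, value)) {
            skippedIdempotentUpdates++; // idempotent update: suppress emit
            return false;
        }
        return true;
    }

    public long skipped() {
        return skippedIdempotentUpdates;
    }
}
```

Under this sketch, repeated identical updates leave the downstream topology
untouched while the counter records how many emits were saved.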

kafka dev subscription

2020-03-04 Thread Shubham Bhindwal
kafka dev com


subscribe kafka dev list

2020-03-04 Thread Shubham Bhindwal
subscription to dev


[jira] [Created] (KAFKA-9654) ReplicaAlterLogDirsThread can't be created again if the previous ReplicaAlterLogDirsThread encounters a leader epoch error

2020-03-04 Thread Chia-Ping Tsai (Jira)
Chia-Ping Tsai created KAFKA-9654:
-

 Summary: ReplicaAlterLogDirsThread can't be created again if the 
previous ReplicaAlterLogDirsThread encounters a leader epoch error
 Key: KAFKA-9654
 URL: https://issues.apache.org/jira/browse/KAFKA-9654
 Project: Kafka
  Issue Type: Bug
Reporter: Chia-Ping Tsai
Assignee: Chia-Ping Tsai



ReplicaManager creates a ReplicaAlterLogDirsThread only if a new future log is 
created. If the previous ReplicaAlterLogDirsThread encounters an error while 
moving data, the target partition is moved to "failedPartitions" and the 
ReplicaAlterLogDirsThread goes idle due to empty partitions. The future log 
still exists, so we CAN'T create another ReplicaAlterLogDirsThread to handle 
the partition, nor update the partitions of the idle 
ReplicaAlterLogDirsThread.

ReplicaManager should call ReplicaAlterLogDirsManager#addFetcherForPartitions 
even if there is already a future log, since we can create a new 
ReplicaAlterLogDirsThread to handle the new partitions, or update the 
partitions of the existing ReplicaAlterLogDirsThread to make it busy again.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9653) Duplicate tasks on workers after rebalance

2020-03-04 Thread Agam Brahma (Jira)
Agam Brahma created KAFKA-9653:
--

 Summary: Duplicate tasks on workers after rebalance
 Key: KAFKA-9653
 URL: https://issues.apache.org/jira/browse/KAFKA-9653
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Affects Versions: 2.4.0, 2.3.2
Reporter: Agam Brahma


Verified the following
 * observed issue goes away when `connect.protocol` is switched from 
`compatible` to `eager`
 * Debug logs show `WorkerSourceTask` on two different nodes referencing the 
same task-id
 * Debug logs show the node referring to the task as part of both 
`Configured assignments` and `Lost assignments`



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9652) Throttle time metric needs to be updated for KIP-219

2020-03-04 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-9652:
--

 Summary: Throttle time metric needs to be updated for KIP-219
 Key: KAFKA-9652
 URL: https://issues.apache.org/jira/browse/KAFKA-9652
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson


KIP-219 changed the throttling logic so that responses are returned 
immediately. The logic for updating the throttle time in `RequestChannel` 
appears to have not been updated to reflect this change and instead reflects 
the old behavior where the timing is based on the time between remote 
completion and response completion. This means the metric will pretty much 
always show negligible throttling.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9651) Divide by zero in DefaultPartitioner

2020-03-04 Thread Tom Bentley (Jira)
Tom Bentley created KAFKA-9651:
--

 Summary: Divide by zero in DefaultPartitioner
 Key: KAFKA-9651
 URL: https://issues.apache.org/jira/browse/KAFKA-9651
 Project: Kafka
  Issue Type: Bug
Reporter: Tom Bentley
Assignee: Tom Bentley


The following exception was observed in a Kafka Streams application running on 
Kafka 2.3:

java.lang.ArithmeticException: / by zero
at 
org.apache.kafka.clients.producer.internals.DefaultPartitioner.partition(DefaultPartitioner.java:69)
at 
org.apache.kafka.streams.processor.internals.DefaultStreamPartitioner.partition(DefaultStreamPartitioner.java:39)
at 
org.apache.kafka.streams.processor.internals.StreamsMetadataState.getStreamsMetadataForKey(StreamsMetadataState.java:255)
at 
org.apache.kafka.streams.processor.internals.StreamsMetadataState.getMetadataWithKey(StreamsMetadataState.java:155)
at 
org.apache.kafka.streams.KafkaStreams.metadataForKey(KafkaStreams.java:1019)

The cause is that the {{Cluster}} returns an empty list from 
{{partitionsForTopic(topic)}} and the size is then used as a divisor.

The same pattern of using the number of partitions as a divisor is used in 
other implementations of {{Partitioner}} and also in {{StickyPartitionCache}}, 
so presumably they're also prone to this problem when the {{Cluster}} lacks 
information about a topic. 
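
The empty-partition-list failure mode above can be avoided by checking the
list size before dividing. A minimal illustrative sketch, not Kafka's actual
DefaultPartitioner code; the class, method, and sentinel return value are
hypothetical:

```java
import java.util.List;

// Sketch of a partitioner guard: return a sentinel instead of dividing by
// the size of an empty partition list.
class SafePartitioner {
    // Returns a partition index for the key hash, or -1 when the cluster
    // has no partition metadata for the topic (instead of dividing by zero).
    public static int partition(int keyHash, List<Integer> partitions) {
        if (partitions.isEmpty()) {
            return -1; // caller must handle "metadata not available"
        }
        // Math.floorMod avoids negative indices for negative hashes.
        return Math.floorMod(keyHash, partitions.size());
    }
}
```

The caller then treats -1 as "metadata not yet available" rather than letting
an ArithmeticException propagate out of metadata queries.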



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


US Department of Veterans Affairs Apache Kafka Product Inquiry

2020-03-04 Thread Laster, Caleb L. (Prosphere)
 Greetings,

I am reaching out to you from the Department of Veterans Affairs (VA) where I 
am part of the team that reviews various information-based products from an 
information security perspective for use within VA. I am reviewing information 
regarding Apache Kafka and have a few questions listed below; please respond to 
the best of your ability so that I may use your answers to reach a final 
determination.



  1.  Can you please provide associated User Manual/Installation Guide of your 
technology?
  2.  Are there any software components needed for this technology?
  3.  What kind of Licensing is needed, if any?  Is it Freeware?  Open Source?  
Is there a free trial period?
  4.  What Operating Systems are supported? Please list server/client.
  5.  Does the technology connect to other devices or other hardware systems 
(i.e. servers)? If yes, could you please provide the server type and 
connection details (i.e. Microsoft SQL Server, MySQL, IIS, etc.)?
  6.  To what extent does this technology use a FIPS 140-2 validated 
cryptographic module, and what is the certification number?
  7.  Are there any Runtime Dependencies? (software only)
  8.  Are there any Companion Technologies associated with this product?
  9.  What are the Version Numbers and Major Release Dates for this product?
  10. *VERY IMPORTANT* Does the technology store any data, and if so, how does 
it store data? Is data stored locally or in a database? What information/data 
is stored? Can you please explain the flow of data (i.e. how data is sent to 
storage and stored) and the database details (i.e. the type of database)? Does 
it support data encryption? What type of encryption?
  11. Is this technology available for on-premise deployment?
  12. Does this technology use a cloud? If yes, What Cloud Service Provider 
(CSP) agreements have been set for this product to be used securely through the 
cloud?
  13. Is there a Voluntary Product Accessibility Template (VPAT) program in 
place to assess Section 508 compliance?


Best regards,

Caleb Laster AWS CCP, Security+, ITIL (Contractor)
Security Analyst
Solution Delivery (SD)
IT Operations and Services (ITOPS), Office of Information and Technology (OIT)
Office: 202-382-9309  Monday-Friday



[jira] [Created] (KAFKA-9650) Include human readable quantities for default config docs

2020-03-04 Thread Tom Bentley (Jira)
Tom Bentley created KAFKA-9650:
--

 Summary: Include human readable quantities for default config docs
 Key: KAFKA-9650
 URL: https://issues.apache.org/jira/browse/KAFKA-9650
 Project: Kafka
  Issue Type: Improvement
  Components: docs
Reporter: Tom Bentley
Assignee: Tom Bentley


The Kafka config docs include default values for quantities in milliseconds and 
bytes, for example {{log.segment.bytes}} has default: {{1073741824}}. Many 
readers won't know that that's 1GiB, so will have to work it out. It would make 
the docs more readable if we included the quantity in the appropriate unit in 
parenthesis after the actual default value, like this:

default: 1073741824 (=1GiB)

Similarly for values in milliseconds. 
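
A docs generator could derive the parenthesized quantity mechanically. The
following is a hypothetical sketch of such a helper, not part of Kafka's
actual docs tooling:

```java
// Renders a byte default with a human-readable quantity appended, e.g.
// "1073741824 (=1GiB)". The annotation is added only when the value is an
// exact multiple of a binary unit, so the rendered default stays precise.
class HumanReadable {
    private static final String[] UNITS = {"B", "KiB", "MiB", "GiB", "TiB"};

    public static String render(long bytes) {
        long v = bytes;
        int unit = 0;
        while (unit < UNITS.length - 1 && v >= 1024 && v % 1024 == 0) {
            v /= 1024;
            unit++;
        }
        if (unit == 0) {
            return Long.toString(bytes); // no clean unit; leave as-is
        }
        return bytes + " (=" + v + UNITS[unit] + ")";
    }
}
```

For example, `render(1073741824L)` yields `1073741824 (=1GiB)`, while a value
like 500 is left unannotated.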



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Jenkins build is back to normal : kafka-trunk-jdk11 #1211

2020-03-04 Thread Apache Jenkins Server
See 




[jira] [Created] (KAFKA-9649) Remove/Warn on use of TimeWindowedSerde with no specified window size

2020-03-04 Thread Jira
Sören Henning created KAFKA-9649:


 Summary: Remove/Warn on use of TimeWindowedSerde with no specified 
window size
 Key: KAFKA-9649
 URL: https://issues.apache.org/jira/browse/KAFKA-9649
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: Sören Henning


The API of the 
[{{TimeWindowedSerde}}|https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/kstream/WindowedSerdes.java]
 promotes its construction without specifying a window size:
{noformat}
public TimeWindowedSerde(final Serde inner)
{noformat}
While code using this constructor looks absolutely clean, it leads to fatal 
errors at runtime, which turned out to be very hard to discover.

The reason for these errors can be found in the construction of the 
[{{TimeWindowedDeserializer}}|https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/kstream/TimeWindowedDeserializer.java],
 which is created via:
{noformat}
// TODO: fix this part as last bits of KAFKA-4468
public TimeWindowedDeserializer(final Deserializer inner) {
  this(inner, Long.MAX_VALUE);
}
{noformat}
The TODO comment suggests that this issue is (or at least was) already known.

We suggest either removing the {{TimeWindowedSerde(final Serde inner)}} 
constructor or at least warning when it is used (if required for backwards 
compatibility). The ideal solution of course would be to get the window size 
from some externally provided context; however, I expect this to be difficult 
to realize. The same applies to the {{TimeWindowedDeserializer(final 
Deserializer inner)}} constructor.

A further minor suggestion in this context: as most Kafka Streams time 
declarations now use {{Duration}}s instead of long-encoded milliseconds, I 
suggest also allowing window sizes to be specified with a {{Duration}}.
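
The suggested fix can be sketched as a constructor that rejects an unspecified
window size and accepts a Duration. This is an illustrative stand-in, not
Kafka's actual TimeWindowedDeserializer; the class name and validation rule
are hypothetical.

```java
import java.time.Duration;

// Sketch of the "require an explicit window size" constructor pattern the
// ticket proposes, with the Duration overload from the minor suggestion.
class WindowedDeserializer {
    private final long windowSizeMs;

    public WindowedDeserializer(long windowSizeMs) {
        // Reject the silent Long.MAX_VALUE default that hides the bug.
        if (windowSizeMs <= 0 || windowSizeMs == Long.MAX_VALUE) {
            throw new IllegalArgumentException(
                "window size must be set explicitly; got " + windowSizeMs);
        }
        this.windowSizeMs = windowSizeMs;
    }

    public WindowedDeserializer(Duration windowSize) {
        this(windowSize.toMillis());
    }

    // The window end is derivable only because the size is known.
    public long windowEnd(long windowStartMs) {
        return windowStartMs + windowSizeMs;
    }
}
```

With this shape, the fatal runtime errors become an immediate, descriptive
construction-time failure instead.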



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9648) kafka server should resize backlog when create serversocket

2020-03-04 Thread li xiangyuan (Jira)
li xiangyuan created KAFKA-9648:
---

 Summary: kafka server should resize backlog when create 
serversocket
 Key: KAFKA-9648
 URL: https://issues.apache.org/jira/browse/KAFKA-9648
 Project: Kafka
  Issue Type: Improvement
  Components: core
Affects Versions: 0.10.0.1
Reporter: li xiangyuan


I have described a mysterious problem before 
(https://issues.apache.org/jira/browse/KAFKA-9211). In that issue I found 
that the Kafka server will trigger TCP congestion control under certain 
conditions. We finally found the root cause.

When a Kafka server restarts for any reason and then executes a preferred 
replica leader election, lots of replica leaderships are handed back to it 
and a cluster metadata update is triggered. All clients then establish 
connections to this server. At that moment many TCP connection requests are 
waiting in the TCP SYN queue, and then in the accept queue.

Kafka creates its server socket in SocketServer.scala:

{code:java}
serverChannel.socket.bind(socketAddress);{code}
This method has a second parameter, "backlog"; min(backlog, 
tcp_max_syn_backlog) decides the queue length. Because Kafka doesn't set it, 
the default value of 50 is used.

If this queue is full and tcp_syncookies = 0, new connection requests will be 
rejected. If tcp_syncookies = 1, the TCP syncookie mechanism is triggered. 
This mechanism lets Linux handle more TCP SYN requests, but it loses several 
TCP options, including "wscale", the one that allows a TCP connection to send 
many more bytes per packet. Because syncookies were triggered, wscale is lost 
and that TCP connection will handle network traffic very slowly, forever, 
until it is closed and another TCP connection is established.

So after a preferred replica election is executed, lots of new TCP 
connections are established without wscale set, and much of the network 
traffic to this server runs at a very slow speed.

I'm not sure whether newer Linux versions have resolved this problem, but 
Kafka should also set the backlog to a larger value. We have now changed it 
to 512 and everything seems to be OK.
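
The proposed change amounts to passing an explicit backlog to bind(). Below is
a standalone sketch using the plain JDK NIO API, not Kafka's SocketServer; the
value 512 mirrors the reporter's deployment, and a real fix would likely make
it configurable.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;

// Sketch: open a server socket with an explicit accept-queue backlog
// instead of relying on the JDK default of 50.
class BacklogBind {
    public static ServerSocketChannel open(InetSocketAddress addr, int backlog)
            throws IOException {
        ServerSocketChannel ch = ServerSocketChannel.open();
        // The second argument caps the accept-queue length together with
        // the kernel's somaxconn / tcp_max_syn_backlog settings.
        ch.socket().bind(addr, backlog);
        return ch;
    }
}
```

Binding to port 0 with `open(new InetSocketAddress("127.0.0.1", 0), 512)`
picks an ephemeral port, which is convenient for testing the helper.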

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9647) Add ability to suppress until window end (not close)

2020-03-04 Thread Jira
Sören Henning created KAFKA-9647:


 Summary: Add ability to  suppress until window end (not close)
 Key: KAFKA-9647
 URL: https://issues.apache.org/jira/browse/KAFKA-9647
 Project: Kafka
  Issue Type: Wish
  Components: streams
Reporter: Sören Henning


*Preface:* This feature request originates from a [recently asked question on 
Stack 
Overflow|https://stackoverflow.com/questions/60005630/kafka-streams-suppress-until-window-end-not-close],
 for which Matthias J. Sax suggested to create a feature request.

*Feature Request:* In addition to suppressing updates to a windowed KTable 
until a window closes, we suggest to only suppress "early" results. By early 
results we mean results computed before the window ends, but not those results 
occurring during the grace period. Thus, this suppress option would suppress 
all aggregation results with timestamp < window end, but forward all records 
with timestamp >= window end and timestamp < window close.

*Use Case:* For an exemplary use case, we refer to John Roesler's [blog post on 
the initial introduction of the suppress 
operator|https://www.confluent.io/blog/kafka-streams-take-on-watermarks-and-triggers/].
 The post argues that in the case of alerting, not every intermediate 
aggregation result should trigger an alert message, but only the "final" 
result. Otherwise, a "follow-up email telling people to ignore the first 
message" might become required if the final result would not cause an alert 
but intermediate results would. Kafka Streams' current solution for this use 
case is a suppress operation that only forwards the final result, i.e., the 
last result before no further updates can occur. This is when the grace period 
of a window has passed (the window closes).

However, ideally we would like to set the grace period as large as possible to 
allow for very late-arriving messages, which in turn would lead to very late 
alerts. On the other hand, such late-arriving messages are rare in practice, 
and normally the order of events corresponds largely to the order of messages. 
Thus, a reasonable option would be to suppress aggregation results only until 
the window ends (i.e., stream time > window end) and then forward this "most 
likely final" result. For the alerting use case, this means an alert is 
triggered when we are relatively certain that the recorded data requires an 
alert. Then only the "seldom" case of a late update which would change our 
decision would require the "follow-up email telling people to ignore the first 
message". Such rare "corrections" should be acceptable for many use cases.

*Further extension:* In addition to suppressing all updates until the window 
ends and afterwards forwarding all updates, a further extension would be to 
only forward late records every x seconds. Maybe the existing 
`Suppressed.untilTimeLimit( .. )` could be reused for this.
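
The proposed policy can be stated as a simple predicate over stream time:
suppress results observed before the window end, forward results between
window end and window close (the grace period). This is a hypothetical
sketch of the decision rule, not a Kafka Streams API:

```java
// Sketch of the "suppress until window end (not close)" rule from this
// ticket: a result is forwarded once stream time reaches the window end,
// and late updates within the grace period are also forwarded.
class WindowEndSuppression {
    // streamTime: currently observed stream time
    // windowEndMs: end of the window
    // windowCloseMs: window end plus the grace period
    public static boolean shouldForward(long streamTime, long windowEndMs,
                                        long windowCloseMs) {
        return streamTime >= windowEndMs && streamTime < windowCloseMs;
    }
}
```

Results arriving at or after window close are outside the grace period and
would be dropped by the window itself rather than by this predicate.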



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9646) kafka consumer cause high cpu usage

2020-03-04 Thread li xiangyuan (Jira)
li xiangyuan created KAFKA-9646:
---

 Summary: kafka consumer cause high cpu usage
 Key: KAFKA-9646
 URL: https://issues.apache.org/jira/browse/KAFKA-9646
 Project: Kafka
  Issue Type: Improvement
  Components: clients
Affects Versions: 2.3.0
 Environment: centos-7 3.10.0-957.21.3.el7.x86_64

Reporter: li xiangyuan
 Attachments: 0.10.0.1.svg, 2.4.0.svg, cpu_use

Recently we upgraded our Kafka servers from 0.10.0.1 to 2.3.0 successfully, 
and because Kafka supports fetching records from the closest broker since 
2.4.0, we decided to upgrade our clients from 0.10.0.1 to 2.4.0 directly.

After the upgrade, we found some applications use much more CPU than before. 
The worst one went up from 45% to 70%, so we had to roll back that 
application.

We profiled this application in a test environment (each run executing for 6 
minutes) and obtained CPU flame graphs for the two kafka-clients versions. I 
have uploaded these files.

We found that after upgrading to 2.4.0, Selector.selectNow causes the highest 
CPU usage. This application subscribes to 20 topics, each with 6 consumer 
threads, and 19 of the topics have a low produce rate (less than 1 message 
per minute). We set fetch.max.wait.ms to 5000; CPU usage decreased a little 
but is still high.

Then I wrote a test application that subscribes to 1 topic with 120 consumer 
threads. With the 2.4.0 client, CPU usage is about 40%; with 0.10.0.1, CPU 
usage is less than 10%.

Then I tried 2.4.0 and modified org.apache.kafka.common.network.Selector. The 
old code is:
{code:java}
if (timeoutMs == 0L)
    return this.nioSelector.selectNow();
else
    return this.nioSelector.select(timeoutMs);{code}
changed to
{code:java}
if (timeoutMs == 0) {
    timeoutMs = 1;
}
return this.nioSelector.select(timeoutMs);
{code}
After this change CPU usage dropped to about 20%. I have uploaded the CPU 
usage picture.

I'm wondering why Selector.selectNow causes high CPU usage. Maybe the 2.4.0 
client performs too many useless selects? Or does Linux have a performance 
issue when many threads use selectNow concurrently?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Subject: [VOTE] 2.4.1 RC0

2020-03-04 Thread Eno Thereska
Hi Bill,

I built from source and ran unit and integration tests. They passed.
There was a large number of skipped tests, but I'm assuming that is
intentional.

Cheers
Eno

On Tue, Mar 3, 2020 at 8:42 PM Eric Lalonde  wrote:
>
> Hi,
>
> I ran:
> $  https://github.com/elalonde/kafka/blob/master/bin/verify-kafka-rc.sh 
>  2.4.1 
> https://home.apache.org/~bbejeck/kafka-2.4.1-rc0 
> 
>
> All checksums and signatures are good and all unit and integration tests that 
> were executed passed successfully.
>
> - Eric
>
> > On Mar 2, 2020, at 6:39 PM, Bill Bejeck  wrote:
> >
> > Hello Kafka users, developers and client-developers,
> >
> > This is the first candidate for release of Apache Kafka 2.4.1.
> >
> > This is a bug fix release and it includes fixes and improvements from 38
> > JIRAs, including a few critical bugs.
> >
> > Release notes for the 2.4.1 release:
> > https://home.apache.org/~bbejeck/kafka-2.4.1-rc0/RELEASE_NOTES.html
> >
> > *Please download, test and vote by Thursday, March 5, 9 am PT*
> >
> > Kafka's KEYS file containing PGP keys we use to sign the release:
> > https://kafka.apache.org/KEYS
> >
> > * Release artifacts to be voted upon (source and binary):
> > https://home.apache.org/~bbejeck/kafka-2.4.1-rc0/
> >
> > * Maven artifacts to be voted upon:
> > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> >
> > * Javadoc:
> > https://home.apache.org/~bbejeck/kafka-2.4.1-rc0/javadoc/
> >
> > * Tag to be voted upon (off 2.4 branch) is the 2.4.1 tag:
> > https://github.com/apache/kafka/releases/tag/2.4.1-rc0
> >
> > * Documentation:
> > https://kafka.apache.org/24/documentation.html
> >
> > * Protocol:
> > https://kafka.apache.org/24/protocol.html
> >
> > * Successful Jenkins builds for the 2.4 branch:
> > Unit/integration tests: Links to successful unit/integration test build to
> > follow
> > System tests:
> > https://jenkins.confluent.io/job/system-test-kafka/job/2.4/152/
> >
> >
> > Thanks,
> > Bill Bejeck
>


Jenkins build is back to normal : kafka-trunk-jdk8 #4292

2020-03-04 Thread Apache Jenkins Server
See 




[jira] [Resolved] (KAFKA-9632) Transient test failure: PartitionLockTest.testAppendReplicaFetchWithUpdateIsr

2020-03-04 Thread Rajini Sivaram (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajini Sivaram resolved KAFKA-9632.
---
Fix Version/s: 2.6.0
 Reviewer: Manikumar
   Resolution: Fixed

> Transient test failure: PartitionLockTest.testAppendReplicaFetchWithUpdateIsr
> -
>
> Key: KAFKA-9632
> URL: https://issues.apache.org/jira/browse/KAFKA-9632
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.5.0
>Reporter: Rajini Sivaram
>Assignee: Rajini Sivaram
>Priority: Major
> Fix For: 2.6.0
>
>
> When running this test with _numRecordsPerProducer=500_, the test fails 
> intermittently. The test uses MockTime and runs concurrent log operations. 
> This can cause issues when attempting to roll a segment since Log and 
> MockScheduler don't work well together. MockScheduler currently runs tasks 
> while holding the MockScheduler lock. This can cause a deadlock if a thread 
> attempts to schedule a task while holding a lock which is also acquired 
> within a scheduled task.
> The issue in this test occurs when these two operations happen concurrently:
> 1) LogManager.cleanupLogs is a scheduled task that acquires Log lock. When 
> run with MockScheduler, the thread holds MockScheduler lock and then attempts 
> to acquire Log lock.
> 2) Partition.appendLogsToLeader holds Log lock and attempts to acquire 
> MockScheduler lock in order to schedule a roll().
> Since locking order is reversed in 1) and 2), this causes a deadlock.
> The test itself can be easily fixed by avoiding roll() in the test. But it 
> will be good to fix MockScheduler to enable it to be used in this case.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Build failed in Jenkins: kafka-trunk-jdk11 #1210

2020-03-04 Thread Apache Jenkins Server
See 


Changes:

[github] KAFKA-9618: Directory deletion failure leading to error task RocksDB


--
[...truncated 2.90 MB...]
org.apache.kafka.streams.internals.WindowStoreFacadeTest > shouldForwardFlush 
PASSED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > shouldForwardInit 
STARTED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > shouldForwardInit 
PASSED

org.apache.kafka.streams.TestTopicsTest > testNonUsedOutputTopic STARTED

org.apache.kafka.streams.TestTopicsTest > testNonUsedOutputTopic PASSED

org.apache.kafka.streams.TestTopicsTest > testEmptyTopic STARTED

org.apache.kafka.streams.TestTopicsTest > testEmptyTopic PASSED

org.apache.kafka.streams.TestTopicsTest > testStartTimestamp STARTED

org.apache.kafka.streams.TestTopicsTest > testStartTimestamp PASSED

org.apache.kafka.streams.TestTopicsTest > testNegativeAdvance STARTED

org.apache.kafka.streams.TestTopicsTest > testNegativeAdvance PASSED

org.apache.kafka.streams.TestTopicsTest > shouldNotAllowToCreateWithNullDriver 
STARTED

org.apache.kafka.streams.TestTopicsTest > shouldNotAllowToCreateWithNullDriver 
PASSED

org.apache.kafka.streams.TestTopicsTest > testDuration STARTED

org.apache.kafka.streams.TestTopicsTest > testDuration PASSED

org.apache.kafka.streams.TestTopicsTest > testOutputToString STARTED

org.apache.kafka.streams.TestTopicsTest > testOutputToString PASSED

org.apache.kafka.streams.TestTopicsTest > testValue STARTED

org.apache.kafka.streams.TestTopicsTest > testValue PASSED

org.apache.kafka.streams.TestTopicsTest > testTimestampAutoAdvance STARTED

org.apache.kafka.streams.TestTopicsTest > testTimestampAutoAdvance PASSED

org.apache.kafka.streams.TestTopicsTest > testOutputWrongSerde STARTED

org.apache.kafka.streams.TestTopicsTest > testOutputWrongSerde PASSED

org.apache.kafka.streams.TestTopicsTest > 
shouldNotAllowToCreateOutputTopicWithNullTopicName STARTED

org.apache.kafka.streams.TestTopicsTest > 
shouldNotAllowToCreateOutputTopicWithNullTopicName PASSED

org.apache.kafka.streams.TestTopicsTest > testWrongSerde STARTED

org.apache.kafka.streams.TestTopicsTest > testWrongSerde PASSED

org.apache.kafka.streams.TestTopicsTest > testKeyValuesToMapWithNull STARTED

org.apache.kafka.streams.TestTopicsTest > testKeyValuesToMapWithNull PASSED

org.apache.kafka.streams.TestTopicsTest > testNonExistingOutputTopic STARTED

org.apache.kafka.streams.TestTopicsTest > testNonExistingOutputTopic PASSED

org.apache.kafka.streams.TestTopicsTest > testMultipleTopics STARTED

org.apache.kafka.streams.TestTopicsTest > testMultipleTopics PASSED

org.apache.kafka.streams.TestTopicsTest > testKeyValueList STARTED

org.apache.kafka.streams.TestTopicsTest > testKeyValueList PASSED

org.apache.kafka.streams.TestTopicsTest > 
shouldNotAllowToCreateOutputWithNullDriver STARTED

org.apache.kafka.streams.TestTopicsTest > 
shouldNotAllowToCreateOutputWithNullDriver PASSED

org.apache.kafka.streams.TestTopicsTest > testValueList STARTED

org.apache.kafka.streams.TestTopicsTest > testValueList PASSED

org.apache.kafka.streams.TestTopicsTest > testRecordList STARTED

org.apache.kafka.streams.TestTopicsTest > testRecordList PASSED

org.apache.kafka.streams.TestTopicsTest > testNonExistingInputTopic STARTED

org.apache.kafka.streams.TestTopicsTest > testNonExistingInputTopic PASSED

org.apache.kafka.streams.TestTopicsTest > testKeyValuesToMap STARTED

org.apache.kafka.streams.TestTopicsTest > testKeyValuesToMap PASSED

org.apache.kafka.streams.TestTopicsTest > testRecordsToList STARTED

org.apache.kafka.streams.TestTopicsTest > testRecordsToList PASSED

org.apache.kafka.streams.TestTopicsTest > testKeyValueListDuration STARTED

org.apache.kafka.streams.TestTopicsTest > testKeyValueListDuration PASSED

org.apache.kafka.streams.TestTopicsTest > testInputToString STARTED

org.apache.kafka.streams.TestTopicsTest > testInputToString PASSED

org.apache.kafka.streams.TestTopicsTest > testTimestamp STARTED

org.apache.kafka.streams.TestTopicsTest > testTimestamp PASSED

org.apache.kafka.streams.TestTopicsTest > testWithHeaders STARTED

org.apache.kafka.streams.TestTopicsTest > testWithHeaders PASSED

org.apache.kafka.streams.TestTopicsTest > testKeyValue STARTED

org.apache.kafka.streams.TestTopicsTest > testKeyValue PASSED

org.apache.kafka.streams.TestTopicsTest > shouldNotAllowToCreateTopicWithNullTopicName STARTED

org.apache.kafka.streams.TestTopicsTest > shouldNotAllowToCreateTopicWithNullTopicName PASSED

> Task :streams:upgrade-system-tests-0100:compileJava NO-SOURCE
> Task :streams:upgrade-system-tests-0100:processResources NO-SOURCE
> Task :streams:upgrade-system-tests-0100:classes UP-TO-DATE
> Task :streams:upgrade-system-tests-0100:checkstyleMain NO-SOURCE
> Task :streams:upgrade-system-tests-0100:compileTestJava
> Task :streams:upgrade-system-tests-0100:processTestResources NO-SOURCE
> Task :strea

Re: [VOTE] KIP-557: Add emit on change support for Kafka Streams

2020-03-04 Thread Bruno Cadonna
Hi all,

may I make a non-binding proposal for the metric name? I would prefer
"skipped-idempotent-updates", to be consistent with "dropped-records".

Best,
Bruno

On Tue, Mar 3, 2020 at 11:57 PM Richard Yu  wrote:
>
> Hi all,
>
> Thanks for the discussion!
>
> @Guozhang, I will make the corresponding changes to the KIP (i.e. renaming
> the sensor and adding some notes).
> With the current state of things, we are very close. Just need that one
> last binding vote.
>
> @Matthias J. Sax   It would be ideal if we can also
> get your last two cents on this as well.
> Other than that, we are good.
>
> Best,
> Richard
>
>
> On Tue, Mar 3, 2020 at 10:46 AM Guozhang Wang  wrote:
>
> > Hi Bruno, John:
> >
> > 1) That makes sense. If we consider them to be node-specific metrics that
> > only apply to a subset of built-in processor nodes and are not relevant for
> > alerting (just like suppression-emit (rate | total)), they'd better be
> > per-node instead of per-task, and we would not associate such events with
> > warnings. With that in mind, I'd suggest renaming the metric without the
> > `dropped` keyword to distinguish it from the per-task-level sensor. How
> > about "idempotent-update-skip (rate | total)"?
> >
> > Also a minor suggestion: we should clarify in the KIP / javadocs which
> > built-in processor nodes would have this metric while others don't.
> >
> > 2) About stream time tracking, there are multiple known issues that we
> > should close to improve our consistency semantics:
> >
> >  a. preserve stream time of active tasks across rebalances where they may
> > be migrated. This is what KAFKA-9368
> >  meant for.
> >  b. preserve stream time of standby tasks to be aligned with the active
> > tasks, via the changelog topics.
> >
> > And what I'm more concerning is b) here. For example: let's say we have a
> > topology of `source -> A -> repartition -> B` where both A and B have
> > states along with changelogs, and both of them have standbys. If a record
> > is piped from the source and completed traversed through the topology, we
> > need to make sure that the stream time inferred across:
> >
> > * active task A (inferred from the source record),
> > * active task B (inferred from the derived record from repartition topic),
> > * standby task A (inferred from the changelog topic of A's store),
> > * standby task B (inferred from the changelog topic of B's store)
> >
> > are consistent (note I'm not saying they should be "exactly the same", but
> > consistent, meaning that they may have different values but as long as that
> > does not impact the time-based queries, it is fine). The main motivation is
> > that on IQ, where both active and standby tasks could be accessed, we can
> > eventually improve our consistency guarantee to have 1) read-your-write, 2)
> > consistency across stores, etc.
> >
> > I agree with John's assessment in the previous email, and just to clarify
> > more concretely what I'm thinking.
> >
> >
> > Guozhang
> >
> >
> > On Tue, Mar 3, 2020 at 9:03 AM John Roesler  wrote:
> >
> > > Thanks, Guozhang and Bruno!
> > >
> > > 2)
> > > I had a similar thought to both of you about the metrics, but I ultimately
> > > came out with a conclusion like Bruno's. These aren't dropped invalid
> > > records, they're intentionally dropped, valid, but unnecessary, updates.
> > > A "warning" for this case definitely seems wrong, and I'd also not
> > > recommend
> > > counting these events along with "dropped-records", because those are
> > > all dropped invalid records, e.g., late or null-keyed or couldn't be
> > > deserialized.
> > >
> > > Like Bruno pointed out, an operator should be concerned to see
> > > non-zero "dropped-records", and would then consult the logs for warnings.
> > > But that same person should be happy to see "dropped-idempotent-updates"
> > > increasing, since it means they're saving time and money. Maybe the name
> > > of the metric could be different, but I couldn't think of a better one.
> > > OTOH,
> > > maybe it just stands out to us because we recently discussed those other
> > > metrics in KIP-444?
> > >
> > > 1)
> > > Maybe we should discuss this point more. It seems like we should maintain
> > > an invariant that the following three objects always have exactly the same
> > > state (modulo flush boundaries):
> > > 1. The internal state store
> > > 2. The changelog
> > > 3. The operation's result view
> > >
> > > That is, if I have a materialized Filter, then it seems like I _must_ store
> > > exactly the same record in the store and the changelog, and also forward
> > > the exact same record, including the timestamp, to the downstream
> > > operations.
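The invariant John describes can be sketched in plain Java (an illustrative model under stated assumptions, not Kafka Streams' actual code: the class name and the `skippedIdempotentUpdates` counter are hypothetical, and real emit-on-change must also consider record timestamps). The store, the changelog, and the downstream result all see the identical record, and an update whose value equals the prior value touches none of them:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of the emit-on-change invariant: the state store,
// the changelog, and the forwarded result stay in lockstep, and a no-op
// update is skipped everywhere (and counted for the proposed metric).
public class EmitOnChangeSketch {
    final Map<String, String> store = new HashMap<>();   // 1. internal state store
    final List<String> changelog = new ArrayList<>();    // 2. changelog
    final List<String> forwarded = new ArrayList<>();    // 3. operation's result view
    long skippedIdempotentUpdates = 0;                   // proposed metric (name TBD)

    // Returns true if the update was applied and forwarded, false if skipped.
    public boolean process(String key, String value) {
        String prior = store.get(key);
        if (Objects.equals(prior, value)) {
            skippedIdempotentUpdates++;                  // idempotent update: drop it
            return false;
        }
        String record = key + "=" + value;
        store.put(key, value);                           // exactly the same record...
        changelog.add(record);                           // ...goes to all three places
        forwarded.add(record);
        return true;
    }

    public static void main(String[] args) {
        EmitOnChangeSketch sketch = new EmitOnChangeSketch();
        sketch.process("key", "a");   // forwarded
        sketch.process("key", "a");   // skipped: value unchanged
        System.out.println(sketch.skippedIdempotentUpdates); // prints 1
    }
}
```

Diverging from this lockstep (for example, writing a different timestamp to the changelog than is forwarded) is exactly what produces the restoration inconsistency described next.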
> > >
> > > If we store something different in the internal state store than the
> > > changelog, we can get a situation where the state is actually different
> > > after
> > > restoration than it is during processing, and queries against s

Re: [VOTE] KIP-570: Add leader epoch in StopReplicaRequest

2020-03-04 Thread David Jacot
Hi Jun,

You're right. I noticed it while implementing the KIP. I plan to use a
default value as a sentinel in the protocol (e.g. -2) to cover this case.

David
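A minimal sketch of the sentinel scheme discussed above (the concrete values and the handling rule are assumptions for illustration, not the final KIP-570 protocol): one sentinel marks "epoch not present" when the controller is still on an old version during upgrade (David's -2 example), and a second, here assumed to be -1, marks topic deletion. A broker would only reject requests carrying a stale real epoch:

```java
// Hypothetical sketch of StopReplicaRequest leader-epoch handling.
// The sentinel values and the acceptance rule are assumptions made for
// illustration; they are not the finalized KIP-570 protocol.
public class LeaderEpochSentinels {
    static final int NO_EPOCH = -2;        // epoch absent: controller on old version
    static final int DELETE_SENTINEL = -1; // assumed value marking topic deletion

    // Accept when the epoch is absent (upgrade path), marks a deletion,
    // or is newer than the epoch the replica has already seen; a stale
    // epoch indicates a request from an outdated controller and is ignored.
    static boolean shouldHandle(int requestEpoch, int currentEpoch) {
        return requestEpoch == NO_EPOCH
            || requestEpoch == DELETE_SENTINEL
            || requestEpoch > currentEpoch;
    }

    public static void main(String[] args) {
        System.out.println(shouldHandle(NO_EPOCH, 5));  // prints true
        System.out.println(shouldHandle(4, 5));         // prints false: stale epoch
    }
}
```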

On Wed, Mar 4, 2020 at 3:18 AM Jun Rao  wrote:

> Hi, David,
>
> Thanks for the KIP. +1 from me too. Just one comment below.
>
> 1. Regarding the sentinel leader epoch to indicate topic deletion, it seems
> that we need to use a different sentinel value to indicate that the leader
> epoch is not present when the controller is still on the old version during
> upgrade.
>
> Jun
>
> On Mon, Mar 2, 2020 at 11:20 AM Gwen Shapira  wrote:
>
> > +1
> >
> > On Mon, Feb 24, 2020, 2:16 AM David Jacot  wrote:
> >
> > > Hi all,
> > >
> > > I would like to start a vote on KIP-570: Add leader epoch in
> > > StopReplicaRequest
> > >
> > > The KIP is here:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-570%3A+Add+leader+epoch+in+StopReplicaRequest
> > >
> > > Thanks,
> > > David
> > >
> >
>