[jira] [Created] (KAFKA-8164) Improve test passing rate by rerunning flaky tests

2019-03-27 Thread Viktor Somogyi-Vass (JIRA)
Viktor Somogyi-Vass created KAFKA-8164:
--

 Summary: Improve test passing rate by rerunning flaky tests
 Key: KAFKA-8164
 URL: https://issues.apache.org/jira/browse/KAFKA-8164
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 2.3.0
Reporter: Viktor Somogyi-Vass
Assignee: Viktor Somogyi-Vass


Failing flaky tests are a recurring problem:
* pull requests need to be rerun to achieve a green build
* rerunning the whole build for a PR is resource and time consuming
* a green build gives high confidence when releasing (and in general, too), but
flaky tests often erode this confidence and annoy developers

The aim of this JIRA is to provide an improvement which automatically retries
any test that fails on its first run.
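
For illustration, a minimal sketch of one possible retry mechanism at the JUnit level (the retry count and class name are illustrative; the final mechanism may well live in the Gradle build instead):
{code:java}
import org.junit.rules.TestRule;
import org.junit.runner.Description;
import org.junit.runners.model.Statement;

/** Retries a failing test a fixed number of times before reporting the failure. */
public class RetryOnFailureRule implements TestRule {
    private final int maxAttempts;

    public RetryOnFailureRule(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    @Override
    public Statement apply(Statement base, Description description) {
        return new Statement() {
            @Override
            public void evaluate() throws Throwable {
                Throwable lastFailure = null;
                for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                    try {
                        base.evaluate();   // run the actual test body
                        return;            // the first successful attempt wins
                    } catch (Throwable t) {
                        lastFailure = t;   // remember the failure and retry
                    }
                }
                throw lastFailure;         // all attempts failed
            }
        };
    }
}
{code}
A test class would then declare {{@Rule public RetryOnFailureRule retry = new RetryOnFailureRule(2);}} to get one automatic retry.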



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7981) Add Replica Fetcher and Log Cleaner Count Metrics

2019-02-22 Thread Viktor Somogyi-Vass (JIRA)
Viktor Somogyi-Vass created KAFKA-7981:
--

 Summary: Add Replica Fetcher and Log Cleaner Count Metrics
 Key: KAFKA-7981
 URL: https://issues.apache.org/jira/browse/KAFKA-7981
 Project: Kafka
  Issue Type: Improvement
  Components: metrics
Affects Versions: 2.3.0
Reporter: Viktor Somogyi-Vass
Assignee: Viktor Somogyi-Vass






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-8566) Force Topic Deletion

2019-06-19 Thread Viktor Somogyi-Vass (JIRA)
Viktor Somogyi-Vass created KAFKA-8566:
--

 Summary: Force Topic Deletion
 Key: KAFKA-8566
 URL: https://issues.apache.org/jira/browse/KAFKA-8566
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 2.3.0
Reporter: Viktor Somogyi-Vass
Assignee: Viktor Somogyi-Vass


Forcing topic deletion is sometimes useful if normal topic deletion gets stuck.
In this case we want to remove the topic data from ZooKeeper anyway, and also
the segment files from the online brokers. On offline brokers the segment files
could be removed on startup somehow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-8330) Separate Replica and Reassignment Throttling

2019-05-07 Thread Viktor Somogyi-Vass (JIRA)
Viktor Somogyi-Vass created KAFKA-8330:
--

 Summary: Separate Replica and Reassignment Throttling
 Key: KAFKA-8330
 URL: https://issues.apache.org/jira/browse/KAFKA-8330
 Project: Kafka
  Issue Type: Improvement
  Components: core, replication
Affects Versions: 2.3.0
Reporter: Viktor Somogyi-Vass
Assignee: Viktor Somogyi-Vass


Currently Kafka doesn't separate reassignment-related replication from ISR
replication. That is dangerous because if replication is throttled during
reassignment and some replicas fall out of the ISR, they could get further
behind while the reassignment is also slowed down. We need a mechanism that
throttles reassignment-related replication only.

A KIP is in progress for this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-9167) Implement a broker to controller request channel

2019-11-09 Thread Viktor Somogyi-Vass (Jira)
Viktor Somogyi-Vass created KAFKA-9167:
--

 Summary: Implement a broker to controller request channel
 Key: KAFKA-9167
 URL: https://issues.apache.org/jira/browse/KAFKA-9167
 Project: Kafka
  Issue Type: Sub-task
  Components: controller, core
Reporter: Viktor Somogyi-Vass
Assignee: Viktor Somogyi-Vass


In some cases, we will need to create a new API to replace an operation that
was formerly done via ZooKeeper. One example of this is that when the leader
of a partition wants to modify the in-sync replica set, it currently modifies
ZooKeeper directly. In the post-ZK world, the leader will make an RPC to the
active controller instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9166) Implement MetadataFetch API

2019-11-09 Thread Viktor Somogyi-Vass (Jira)
Viktor Somogyi-Vass created KAFKA-9166:
--

 Summary: Implement MetadataFetch API
 Key: KAFKA-9166
 URL: https://issues.apache.org/jira/browse/KAFKA-9166
 Project: Kafka
  Issue Type: Sub-task
  Components: controller
Reporter: Viktor Somogyi-Vass


Instead of the controller pushing out updates to the other brokers, those 
brokers will fetch updates from the active controller via the new MetadataFetch 
API.

A MetadataFetch is similar to a fetch request.  Just like with a fetch request, 
the broker will track the offset of the last updates it fetched, and only 
request newer updates from the active controller. 

The broker will persist the metadata it fetched to disk.  This will allow the 
broker to start up very quickly, even if there are hundreds of thousands or 
even millions of partitions.  (Note that since this persistence is an 
optimization, we can leave it out of the first version, if it makes development 
easier.)

Most of the time, the broker should only need to fetch the deltas, not the full 
state.  However, if the broker is too far behind the active controller, or if 
the broker has no cached metadata at all, the controller will send a full 
metadata image rather than a series of deltas.
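
A rough sketch of the broker-side loop described above; every type and method name here is hypothetical and only illustrates the offset-tracking and delta-vs-full-image behavior, not the actual API:
{code:java}
// All types and names below are hypothetical sketches, not the real Kafka API.
interface MetadataFetchResponse {
    boolean isFullImage();
    byte[] image();        // full metadata snapshot
    byte[] deltas();       // incremental updates since the requested offset
    long endOffset();      // offset of the last update included in this response
}

interface ControllerChannel {
    MetadataFetchResponse metadataFetch(long lastFetchedOffset);
}

class BrokerMetadataState {
    private long fetchOffset = -1L;        // -1 means "no cached metadata at all"
    private byte[] metadata = new byte[0];

    void pollOnce(ControllerChannel controller) {
        // Like a fetch request: only ask for updates newer than what we already applied.
        MetadataFetchResponse response = controller.metadataFetch(fetchOffset);
        if (response.isFullImage()) {
            // Too far behind, or nothing cached: the controller sends a full image.
            metadata = response.image();
        } else {
            // Normal case: apply only the deltas.
            metadata = applyDeltas(metadata, response.deltas());
        }
        fetchOffset = response.endOffset();
        // Persisting metadata + fetchOffset to disk would go here; it is an optimization.
    }

    private byte[] applyDeltas(byte[] current, byte[] deltas) {
        return deltas; // placeholder: real merge logic is out of scope for this sketch
    }
}
{code}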



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9154) ProducerId generation should be managed by the Controller

2019-11-06 Thread Viktor Somogyi-Vass (Jira)
Viktor Somogyi-Vass created KAFKA-9154:
--

 Summary: ProducerId generation should be managed by the Controller
 Key: KAFKA-9154
 URL: https://issues.apache.org/jira/browse/KAFKA-9154
 Project: Kafka
  Issue Type: Sub-task
  Components: core
Reporter: Viktor Somogyi-Vass


Currently producerIds are maintained in ZooKeeper, but in the future we'd like
them to be managed by the controller quorum in an internal topic.
The reason for storing them in ZooKeeper was that they must be unique across
the cluster. In this task the logic should be refactored so that the
TransactionManager asks the Controller for a ProducerId, and the Controller in
turn connects to ZooKeeper to acquire the ID. Since ZK remains the single source
of truth and the PID won't be cached anywhere, this should be safe (just one
extra hop is added).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9125) GroupMetadataManager and TransactionStateManager should use metadata cache instead of zkClient

2019-10-31 Thread Viktor Somogyi-Vass (Jira)
Viktor Somogyi-Vass created KAFKA-9125:
--

 Summary: GroupMetadataManager and TransactionStateManager should 
use metadata cache instead of zkClient
 Key: KAFKA-9125
 URL: https://issues.apache.org/jira/browse/KAFKA-9125
 Project: Kafka
  Issue Type: Sub-task
Reporter: Viktor Somogyi-Vass


Both classes query their respective topic's partition count via the zkClient.
However, this could be served by the broker's local metadata cache instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (KAFKA-9124) Topic creation should be done by the controller

2019-10-31 Thread Viktor Somogyi-Vass (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viktor Somogyi-Vass reopened KAFKA-9124:


> Topic creation should be done by the controller
> ---
>
> Key: KAFKA-9124
> URL: https://issues.apache.org/jira/browse/KAFKA-9124
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Viktor Somogyi-Vass
>Priority: Major
>
> Currently {{KafkaApis}} invokes the {{adminManager}} (which essentially wraps 
> the zkClient) to create a topic. Instead of this we should create a protocol 
> that sends a request to the broker and listens for confirmation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-9124) Topic creation should be done by the controller

2019-10-31 Thread Viktor Somogyi-Vass (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viktor Somogyi-Vass resolved KAFKA-9124.

Resolution: Invalid

Opened by mistake.

> Topic creation should be done by the controller
> ---
>
> Key: KAFKA-9124
> URL: https://issues.apache.org/jira/browse/KAFKA-9124
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Viktor Somogyi-Vass
>Priority: Major
>
> Currently {{KafkaApis}} invokes the {{adminManager}} (which essentially wraps 
> the zkClient) to create a topic. Instead of this we should create a protocol 
> that sends a request to the broker and listens for confirmation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9116) Improve navigation on the kafka site

2019-10-30 Thread Viktor Somogyi-Vass (Jira)
Viktor Somogyi-Vass created KAFKA-9116:
--

 Summary: Improve navigation on the kafka site
 Key: KAFKA-9116
 URL: https://issues.apache.org/jira/browse/KAFKA-9116
 Project: Kafka
  Issue Type: Improvement
  Components: documentation
Reporter: Viktor Somogyi-Vass


It is often very hard to navigate the Kafka documentation, especially the
configs, because it is a one-pager. It would be useful to somehow shrink its
size and make it more readable, for instance by using tabs for the broker,
producer, consumer, topic etc. configs (like in
https://getbootstrap.com/docs/3.3/components/#nav), but any other idea that
makes it better would be fine.
It is important that it stays searchable, as the number of configs is huge.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9118) LogDirFailureHandler shouldn't use Zookeeper

2019-10-30 Thread Viktor Somogyi-Vass (Jira)
Viktor Somogyi-Vass created KAFKA-9118:
--

 Summary: LogDirFailureHandler shouldn't use Zookeeper
 Key: KAFKA-9118
 URL: https://issues.apache.org/jira/browse/KAFKA-9118
 Project: Kafka
  Issue Type: Improvement
Reporter: Viktor Somogyi-Vass
Assignee: Viktor Somogyi-Vass






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9124) Topic creation should be done by the controller

2019-10-31 Thread Viktor Somogyi-Vass (Jira)
Viktor Somogyi-Vass created KAFKA-9124:
--

 Summary: Topic creation should be done by the controller
 Key: KAFKA-9124
 URL: https://issues.apache.org/jira/browse/KAFKA-9124
 Project: Kafka
  Issue Type: Sub-task
Reporter: Viktor Somogyi-Vass


Currently {{KafkaApis}} invokes the {{adminManager}} (which essentially wraps 
the zkClient) to create a topic. Instead of this we should create a protocol 
that sends a request to the broker and listens for confirmation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-8081) Flaky Test TopicCommandWithAdminClientTest#testDescribeUnderMinIsrPartitions

2019-10-21 Thread Viktor Somogyi-Vass (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viktor Somogyi-Vass resolved KAFKA-8081.

  Assignee: Viktor Somogyi-Vass
Resolution: Fixed

> Flaky Test TopicCommandWithAdminClientTest#testDescribeUnderMinIsrPartitions
> 
>
> Key: KAFKA-8081
> URL: https://issues.apache.org/jira/browse/KAFKA-8081
> Project: Kafka
>  Issue Type: Bug
>  Components: admin, unit tests
>Affects Versions: 2.3.0
>Reporter: Matthias J. Sax
>Assignee: Viktor Somogyi-Vass
>Priority: Critical
>  Labels: flaky-test
> Fix For: 2.5.0
>
>
> [https://builds.apache.org/blue/organizations/jenkins/kafka-trunk-jdk8/detail/kafka-trunk-jdk8/3450/tests]
> {quote}java.lang.AssertionError
> at org.junit.Assert.fail(Assert.java:87)
> at org.junit.Assert.assertTrue(Assert.java:42)
> at org.junit.Assert.assertTrue(Assert.java:53)
> at 
> kafka.admin.TopicCommandWithAdminClientTest.testDescribeUnderMinIsrPartitions(TopicCommandWithAdminClientTest.scala:560){quote}
> STDOUT
> {quote}[2019-03-09 00:13:23,129] ERROR [ReplicaFetcher replicaId=2, 
> leaderId=0, fetcherId=0] Error for partition 
> testAlterPartitionCount-yrX0KHVdgf-1 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-03-09 00:13:23,130] ERROR [ReplicaFetcher replicaId=3, leaderId=5, 
> fetcherId=0] Error for partition testAlterPartitionCount-yrX0KHVdgf-0 at 
> offset 0 (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-03-09 00:13:33,184] ERROR [ReplicaFetcher replicaId=4, leaderId=1, 
> fetcherId=0] Error for partition testAlterPartitionCount-yrX0KHVdgf-2 at 
> offset 0 (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-03-09 00:13:36,319] WARN Unable to read additional data from client 
> sessionid 0x10433c934dc0006, likely client has closed socket 
> (org.apache.zookeeper.server.NIOServerCnxn:376)
> [2019-03-09 00:13:46,311] WARN Unable to read additional data from client 
> sessionid 0x10433c98cb10003, likely client has closed socket 
> (org.apache.zookeeper.server.NIOServerCnxn:376)
> [2019-03-09 00:13:46,356] WARN Unable to read additional data from client 
> sessionid 0x10433c98cb10004, likely client has closed socket 
> (org.apache.zookeeper.server.NIOServerCnxn:376)
> [2019-03-09 00:14:06,066] WARN Unable to read additional data from client 
> sessionid 0x10433c9b17d0005, likely client has closed socket 
> (org.apache.zookeeper.server.NIOServerCnxn:376)
> [2019-03-09 00:14:06,457] WARN Unable to read additional data from client 
> sessionid 0x10433c9b17d0001, likely client has closed socket 
> (org.apache.zookeeper.server.NIOServerCnxn:376)
> [2019-03-09 00:14:11,206] ERROR [ReplicaFetcher replicaId=2, leaderId=3, 
> fetcherId=0] Error for partition kafka.testTopic1-1 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-03-09 00:14:11,218] ERROR [ReplicaFetcher replicaId=4, leaderId=1, 
> fetcherId=0] Error for partition __consumer_offsets-1 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
> does not host this topic-partition.
> [2019-03-09 00:14:22,096] WARN Unable to read additional data from client 
> sessionid 0x10433c9f1210004, likely client has closed socket 
> (org.apache.zookeeper.server.NIOServerCnxn:376)
> [2019-03-09 00:14:28,290] WARN Unable to read additional data from client 
> sessionid 0x10433ca28de0005, likely client has closed socket 
> (org.apache.zookeeper.server.NIOServerCnxn:376)
> [2019-03-09 00:14:28,733] WARN Unable to read additional data from client 
> sessionid 0x10433ca28de0006, likely client has closed socket 
> (org.apache.zookeeper.server.NIOServerCnxn:376)
> [2019-03-09 00:14:29,529] WARN Unable to read additional data from client 
> sessionid 0x10433ca28de, likely client has closed socket 
> (org.apache.zookeeper.server.NIOServerCnxn:376)
> [2019-03-09 00:14:31,841] WARN Unable to read additional data from client 
> sessionid 0x10433ca39ed0002, likely client has closed socket 
> (org.apache.zookeeper.server.NIOServerCnxn:376)
> [2019-03-09 00:14:40,221] ERROR [ReplicaFetcher replicaId=0, leaderId=1, 
> fetcherId=0] Error for partition 
> testCreateAlterTopicWithRackAware-IQe98agDrW-16 at offset 0 
> (kafka.server.ReplicaFetcherThread:76)
> 

[jira] [Created] (KAFKA-9049) TopicCommandWithAdminClientTest should use mocks

2019-10-16 Thread Viktor Somogyi-Vass (Jira)
Viktor Somogyi-Vass created KAFKA-9049:
--

 Summary: TopicCommandWithAdminClientTest should use mocks
 Key: KAFKA-9049
 URL: https://issues.apache.org/jira/browse/KAFKA-9049
 Project: Kafka
  Issue Type: Improvement
  Components: unit tests
Affects Versions: 2.5.0
Reporter: Viktor Somogyi-Vass


The {{TopicCommandWithAdminClientTest}} class currently sets up a few brokers
for every test case, which is wasteful and slow. We should improve it by mocking
out the broker behavior (maybe by using {{MockAdminClient}}?).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9059) Implement MaxReassignmentLag

2019-10-17 Thread Viktor Somogyi-Vass (Jira)
Viktor Somogyi-Vass created KAFKA-9059:
--

 Summary: Implement MaxReassignmentLag
 Key: KAFKA-9059
 URL: https://issues.apache.org/jira/browse/KAFKA-9059
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 2.5.0
Reporter: Viktor Somogyi-Vass
Assignee: Viktor Somogyi-Vass


Some more thinking is required for the implementation of MaxReassignmentLag
proposed by
[KIP-352|https://cwiki.apache.org/confluence/display/KAFKA/KIP-352:+Distinguish+URPs+caused+by+reassignment]
as it's not so straightforward (we need to maintain per-partition lag),
therefore we separated it out into this JIRA.
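
For illustration, the kind of per-partition bookkeeping the metric needs; a sketch with hypothetical names, not the proposed implementation:
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the bookkeeping behind a MaxReassignmentLag gauge.
class ReassignmentLagTracker {
    // Lag of each reassigning partition, updated whenever the adding replica fetches.
    private final Map<String, Long> lagByReassigningPartition = new ConcurrentHashMap<>();

    void onFetch(String topicPartition, long logEndOffset, long replicaFetchOffset, boolean isReassigning) {
        if (isReassigning) {
            lagByReassigningPartition.put(topicPartition, logEndOffset - replicaFetchOffset);
        } else {
            lagByReassigningPartition.remove(topicPartition); // reassignment finished or not in progress
        }
    }

    // Value the MaxReassignmentLag gauge would report.
    long maxReassignmentLag() {
        return lagByReassigningPartition.values().stream()
                .mapToLong(Long::longValue)
                .max()
                .orElse(0L);
    }
}
{code}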



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9015) Unify metric names

2019-10-10 Thread Viktor Somogyi-Vass (Jira)
Viktor Somogyi-Vass created KAFKA-9015:
--

 Summary: Unify metric names
 Key: KAFKA-9015
 URL: https://issues.apache.org/jira/browse/KAFKA-9015
 Project: Kafka
  Issue Type: Improvement
  Components: metrics
Affects Versions: 2.5.0
Reporter: Viktor Somogyi-Vass
Assignee: Viktor Somogyi-Vass


Some of the metrics use lower-case, hyphenated names like "i-am-a-metric" while
the majority uses UpperCamelCase (IAmAnotherMetric). We need consistency across
the project, and since the majority of the metrics uses the camel case notation,
we should change the others.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9016) Warn when log dir stopped serving replicas

2019-10-10 Thread Viktor Somogyi-Vass (Jira)
Viktor Somogyi-Vass created KAFKA-9016:
--

 Summary: Warn when log dir stopped serving replicas
 Key: KAFKA-9016
 URL: https://issues.apache.org/jira/browse/KAFKA-9016
 Project: Kafka
  Issue Type: Improvement
  Components: log, log cleaner
Reporter: Viktor Somogyi-Vass


Kafka should warn if a log directory stops serving replicas, as this is usually
due to an error and it is much more visible at WARN level.

{noformat}
2019-09-19 12:39:54,194 ERROR kafka.server.LogDirFailureChannel: Error while 
writing to checkpoint file /kafka/data/diskX/replication-offset-checkpoint
java.io.SyncFailedException: sync failed
..
2019-09-19 12:39:54,197 INFO kafka.server.ReplicaManager: [ReplicaManager 
broker=638] Stopping serving replicas in dir /kafka/data/diskX
..
2019-09-19 12:39:54,205 INFO kafka.server.ReplicaFetcherManager: 
[ReplicaFetcherManager on broker 638] Removed fetcher for partitions 
Set(test1-0, test2-2, test-0, test2-2, test4-14, test4-0, test2-6)
2019-09-19 12:39:54,206 INFO kafka.server.ReplicaAlterLogDirsManager: 
[ReplicaAlterLogDirsManager on broker 638] Removed fetcher for partitions 
Set(test1-0, test2-2, test-0, test3-2, test4-14, test4-0, test2-6)
2019-09-19 12:39:54,222 INFO kafka.server.ReplicaManager: [ReplicaManager 
broker=638] Broker 638 stopped fetcher for partitions 
test1-0,test2-2,test-0,test3-2,test4-14,test4-0,test2-6 and stopped moving logs 
for partitions  because they are in the failed log directory /kafka/data/diskX.

{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9814) Allow AdminClient.listTopics to list based on expression

2020-04-03 Thread Viktor Somogyi-Vass (Jira)
Viktor Somogyi-Vass created KAFKA-9814:
--

 Summary: Allow AdminClient.listTopics to list based on expression
 Key: KAFKA-9814
 URL: https://issues.apache.org/jira/browse/KAFKA-9814
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 2.5.0
Reporter: Viktor Somogyi-Vass
Assignee: Viktor Somogyi-Vass


Currently listTopics lists all the topics in the cluster, which often isn't
desired: clusters can contain a large number of topics, the authorizer has to
check access for every topic (which bloats the access/audit logs and makes them
hard to read), and with many topics it isn't really efficient to return all of
them due to the response's size.

To take an example, let's have a user who issues a naive command such as:
{noformat}
kafka-topics --list --topic topic6 --bootstrap-server mycompany.com:9092 
--command-config client.properties
{noformat}
In this case the admin client issues a metadata request for all the topics.
When this request gets to the broker, it will try to check permissions for
every topic in the cluster and then return to the client those which are
allowed for "describe". The command, when it gets the answer, will simply
filter out {{topic6}} from the response.

Looking at this problem we can see that if we allow the broker to do the
filtering then we can:
* do fewer permission checks
* return less data to the clients

Regarding the solution I see two paths, both of course subject to community
discussion:
# modify the MetadataRequest and put some kind of regex in it
# create a new list API that would contain this filter pattern

Perhaps the list API would make more sense, as the MetadataRequest is currently
used only because it fits, and there is no real reason to extend it with this
since the functionality won't be used elsewhere.
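
For illustration, this is roughly what the client side does today versus what a server-side filter could look like; the pattern-based option in the comment is hypothetical and only sketches the proposal:
{code:java}
import java.util.Properties;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;

public class ListTopicsFilterExample {
    public static void main(String[] args) throws ExecutionException, InterruptedException {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "mycompany.com:9092");

        try (Admin admin = Admin.create(props)) {
            // Today: the broker authorizes and returns *all* describable topics,
            // and the filtering happens on the client side.
            Set<String> all = admin.listTopics().names().get();
            Set<String> matching = all.stream()
                    .filter(Pattern.compile("topic6").asPredicate())
                    .collect(Collectors.toSet());
            System.out.println(matching);

            // Proposed (hypothetical API): push the pattern to the broker so it only
            // authorizes and returns matching topics, e.g. something like:
            // admin.listTopics(new ListTopicsOptions().namePattern("topic6")).names().get();
        }
    }
}
{code}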



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-7805) Use --bootstrap-server option for kafka-topics.sh in ducktape tests where applicable

2020-04-30 Thread Viktor Somogyi-Vass (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viktor Somogyi-Vass resolved KAFKA-7805.

Resolution: Fixed

> Use --bootstrap-server option for kafka-topics.sh in ducktape tests where 
> applicable
> 
>
> Key: KAFKA-7805
> URL: https://issues.apache.org/jira/browse/KAFKA-7805
> Project: Kafka
>  Issue Type: Improvement
>  Components: system tests
>Reporter: Viktor Somogyi-Vass
>Priority: Major
>
> KIP-377 introduces the {{--bootstrap-server}} option and deprecates the 
> {{--zookeeper}} option in {{kafka-topics.sh}}. I'd be nice to use the new 
> option in the ducktape tests to gain higher test coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-10650) Use Murmur3 hashing instead of MD5 in SkimpyOffsetMap

2020-10-27 Thread Viktor Somogyi-Vass (Jira)
Viktor Somogyi-Vass created KAFKA-10650:
---

 Summary: Use Murmur3 hashing instead of MD5 in SkimpyOffsetMap
 Key: KAFKA-10650
 URL: https://issues.apache.org/jira/browse/KAFKA-10650
 Project: Kafka
  Issue Type: Improvement
  Components: core
Reporter: Viktor Somogyi-Vass
Assignee: Viktor Somogyi-Vass


The usage of MD5 was uncovered while testing Kafka for FIPS (Federal
Information Processing Standards) verification.

While MD5 isn't a FIPS incompatibility here, as it isn't used for cryptographic
purposes, it isn't ideal either. MD5 is a relatively fast cryptographic hash,
but there are much better performing algorithms for hash tables, which is how
it's used in SkimpyOffsetMap.

By applying Murmur3 (as implemented in Streams) I could achieve a 3x faster
{{put}} operation, and the overall segment cleaning sped up by 30% while
preserving the same collision rate (both performed within 0.0015 - 0.007, with
a median around 0.004).
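
A minimal sketch of the kind of change measured above: keep the byte slots of SkimpyOffsetMap but derive them from a non-cryptographic hash instead of MD5. The {{murmur3_128}} helper below is a placeholder for whatever Murmur3 implementation gets reused (e.g. the one in Streams); it is not a real Kafka API call:
{code:java}
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch only: how the key-to-slot hashing in SkimpyOffsetMap could swap MD5 for Murmur3.
class OffsetMapHashing {

    // Current approach: MD5 over the key bytes.
    static byte[] md5Hash(byte[] key) throws NoSuchAlgorithmException {
        return MessageDigest.getInstance("MD5").digest(key);
    }

    // Proposed approach: a fast, non-cryptographic 128-bit hash of the key bytes.
    static byte[] murmurHash(byte[] key) {
        long[] h = murmur3_128(key);
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putLong(h[0]).putLong(h[1]);
        return buf.array();
    }

    // Placeholder standing in for an existing Murmur3 implementation.
    static long[] murmur3_128(byte[] key) {
        throw new UnsupportedOperationException("plug in an existing Murmur3 implementation here");
    }
}
{code}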



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12922) MirrorCheckpointTask should close topic filter

2021-06-09 Thread Viktor Somogyi-Vass (Jira)
Viktor Somogyi-Vass created KAFKA-12922:
---

 Summary: MirrorCheckpointTask should close topic filter
 Key: KAFKA-12922
 URL: https://issues.apache.org/jira/browse/KAFKA-12922
 Project: Kafka
  Issue Type: Improvement
  Components: mirrormaker
Affects Versions: 2.8.0
Reporter: Viktor Somogyi-Vass
Assignee: Viktor Somogyi-Vass


When a lot of connectors are restarted, it turns out that the underlying
ConfigConsumers are not closed properly, and from the logs we can see that the
old ones are still running.

It turns out that MirrorCheckpointTask uses a TopicFilter but never closes it,
leaking resources.
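
The fix is essentially to close the filter when the task stops, along the lines of the sketch below (assuming the filter is AutoCloseable; the stand-in interface and method placement are illustrative, not the real task code):
{code:java}
import org.apache.kafka.common.utils.Utils;

// Sketch of the intended fix: release the filter when the task shuts down.
class CheckpointTaskSketch {
    // Stand-in for org.apache.kafka.connect.mirror.TopicFilter, assumed to be AutoCloseable.
    interface TopicFilter extends AutoCloseable {
        boolean shouldReplicateTopic(String topic);
    }

    private TopicFilter topicFilter;

    void stop() {
        // Close quietly so shutdown never fails because of the filter.
        Utils.closeQuietly(topicFilter, "topic filter");
    }
}
{code}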



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13240) HTTP TRACE should be disabled in Connect

2021-08-27 Thread Viktor Somogyi-Vass (Jira)
Viktor Somogyi-Vass created KAFKA-13240:
---

 Summary: HTTP TRACE should be disabled in Connect
 Key: KAFKA-13240
 URL: https://issues.apache.org/jira/browse/KAFKA-13240
 Project: Kafka
  Issue Type: Improvement
  Components: KafkaConnect
Reporter: Viktor Somogyi-Vass
Assignee: Viktor Somogyi-Vass


Modern browsers mostly disable HTTP TRACE to prevent XST (cross-site tracing)
attacks. Because of this, this type of attack isn't too prevalent these days,
but since TRACE isn't disabled in Connect it may still open up possible attack
vectors (and it constantly pops up in security scans :) ). Therefore we'd like
to disable it.
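
The usual way to do this with the embedded Jetty server Connect uses is a security constraint that forbids the TRACE method; a rough sketch (the handler wiring is simplified and may differ from how the Connect REST server actually builds its context):
{code:java}
import org.eclipse.jetty.security.ConstraintMapping;
import org.eclipse.jetty.security.ConstraintSecurityHandler;
import org.eclipse.jetty.servlet.ServletContextHandler;
import org.eclipse.jetty.util.security.Constraint;

class DisableTraceSketch {
    static void disableTrace(ServletContextHandler context) {
        // A constraint that requires authentication but allows no roles => always forbidden.
        Constraint disableTrace = new Constraint();
        disableTrace.setName("Disable TRACE");
        disableTrace.setAuthenticate(true);

        ConstraintMapping mapping = new ConstraintMapping();
        mapping.setConstraint(disableTrace);
        mapping.setMethod("TRACE");
        mapping.setPathSpec("/*");

        ConstraintSecurityHandler securityHandler = new ConstraintSecurityHandler();
        securityHandler.addConstraintMapping(mapping);
        context.setSecurityHandler(securityHandler);
    }
}
{code}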



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13442) REST API endpoint for fetching a connector's config definition

2021-11-10 Thread Viktor Somogyi-Vass (Jira)
Viktor Somogyi-Vass created KAFKA-13442:
---

 Summary: REST API endpoint for fetching a connector's config 
definition
 Key: KAFKA-13442
 URL: https://issues.apache.org/jira/browse/KAFKA-13442
 Project: Kafka
  Issue Type: Improvement
  Components: KafkaConnect
Affects Versions: 3.2.0
Reporter: Viktor Somogyi-Vass
Assignee: Viktor Somogyi-Vass


To enhance UI-based applications' ability to help users create new connectors
from default configurations, it would be very useful to have an API which can
fetch a connector type's configuration definition. This definition can then be
filled out by users and sent back for validation before creating a new
connector from it.

The API should be placed under {{connector-plugins}} and since 
{{connector-plugins/\{connectorType\}/config/validate}} already exists, 
{{connector-plugins/\{connectorType\}/config}} might be a good option for the 
new API.
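
A rough sketch of what such an endpoint could look like next to the existing validate endpoint, using JAX-RS as the Connect REST layer does; the handler body and the returned entity type are hypothetical:
{code:java}
import java.util.List;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

@Path("/connector-plugins")
@Produces(MediaType.APPLICATION_JSON)
public class ConnectorPluginsConfigResourceSketch {

    // Hypothetical new endpoint: GET /connector-plugins/{connectorType}/config
    // returning the connector's config definition (names, types, defaults, docs).
    @GET
    @Path("/{connectorType}/config")
    public List<ConfigKeySketch> getConnectorConfigDef(@PathParam("connectorType") String connectorType) {
        // Would look up the plugin by class name/alias and serialize its ConfigDef.
        throw new UnsupportedOperationException("sketch only");
    }

    // Minimal stand-in for whatever entity the real endpoint would return.
    public static class ConfigKeySketch {
        public String name;
        public String type;
        public String defaultValue;
        public String documentation;
    }
}
{code}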



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (KAFKA-13917) Avoid calling lookupCoordinator() in tight loop

2022-05-19 Thread Viktor Somogyi-Vass (Jira)
Viktor Somogyi-Vass created KAFKA-13917:
---

 Summary: Avoid calling lookupCoordinator() in tight loop
 Key: KAFKA-13917
 URL: https://issues.apache.org/jira/browse/KAFKA-13917
 Project: Kafka
  Issue Type: Improvement
  Components: consumer
Affects Versions: 3.1.1, 3.1.0, 3.1.2
Reporter: Viktor Somogyi-Vass
Assignee: Viktor Somogyi-Vass


Currently the heartbeat thread calls lookupCoordinator() in a tight loop if the
brokers crash and the consumer is left running. Besides flooding the logs at
debug level, it also increases CPU usage.

The fix is easy: we just need to add a backoff call after the coordinator lookup.
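
The shape of the fix, as a standalone sketch; the real code lives in the consumer's heartbeat thread and the names here are simplified placeholders:
{code:java}
// Simplified sketch of the heartbeat loop; only the backoff line is the point of the fix.
class HeartbeatLoopSketch {
    private final long retryBackoffMs;
    private volatile boolean closed = false;

    HeartbeatLoopSketch(long retryBackoffMs) {
        this.retryBackoffMs = retryBackoffMs;
    }

    void run() throws InterruptedException {
        while (!closed) {
            if (coordinatorUnknown()) {
                lookupCoordinator();
                // The fix: back off before checking again, instead of spinning in a
                // tight loop that floods the debug logs and burns CPU while brokers are down.
                Thread.sleep(retryBackoffMs);
            } else {
                sendHeartbeat();
            }
        }
    }

    private boolean coordinatorUnknown() { return true; }   // placeholder
    private void lookupCoordinator() { }                    // placeholder
    private void sendHeartbeat() { }                        // placeholder
}
{code}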



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (KAFKA-13949) Connect /connectors endpoint should support querying the active topics and the task configs

2022-05-31 Thread Viktor Somogyi-Vass (Jira)
Viktor Somogyi-Vass created KAFKA-13949:
---

 Summary: Connect /connectors endpoint should support querying the 
active topics and the task configs
 Key: KAFKA-13949
 URL: https://issues.apache.org/jira/browse/KAFKA-13949
 Project: Kafka
  Issue Type: Improvement
  Components: KafkaConnect
Affects Versions: 3.2.0
Reporter: Viktor Somogyi-Vass
Assignee: Viktor Somogyi-Vass


The /connectors endpoint supports the "expand" query parameter, which acts as a
set of queried categories, currently supporting info (config) and status
(monitoring status).

The endpoint should also support returning the active topics of a connector and
the individual task configs as well.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Reopened] (KAFKA-6945) Add support to allow users to acquire delegation tokens for other users

2022-06-22 Thread Viktor Somogyi-Vass (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viktor Somogyi-Vass reopened KAFKA-6945:


> Add support to allow users to acquire delegation tokens for other users
> ---
>
> Key: KAFKA-6945
> URL: https://issues.apache.org/jira/browse/KAFKA-6945
> Project: Kafka
>  Issue Type: Sub-task
>Affects Versions: 3.3.0
>Reporter: Manikumar
>Assignee: Viktor Somogyi-Vass
>Priority: Major
>  Labels: needs-kip
> Fix For: 3.3.0
>
>
> Currently, we only allow a user to create delegation tokens for themselves.
> We should allow users to acquire delegation tokens for other users.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (KAFKA-13452) MM2 creates invalid checkpoint when offset mapping is not available

2022-04-28 Thread Viktor Somogyi-Vass (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viktor Somogyi-Vass resolved KAFKA-13452.
-
Resolution: Duplicate

> MM2 creates invalid checkpoint when offset mapping is not available
> ---
>
> Key: KAFKA-13452
> URL: https://issues.apache.org/jira/browse/KAFKA-13452
> Project: Kafka
>  Issue Type: Improvement
>  Components: mirrormaker
>Reporter: Daniel Urban
>Assignee: Viktor Somogyi-Vass
>Priority: Major
>
> MM2 checkpointing reads the offset-syncs topic to create offset mappings for 
> committed consumer group offsets. In some corner cases, it is possible that a 
> mapping is not available in offset-syncs - in that case, MM2 simply copies 
> the source offset, which might not be a valid offset in the replica topic at 
> all.
> One possible situation is if there is an empty topic in the source cluster 
> with a non-zero end offset (e.g. retention already removed the records), and a 
> consumer group which has a committed offset set to the end offset. If 
> replication is configured to start replicating this topic, it will not have 
> an offset mapping available in offset-syncs (as the topic is empty), causing 
> MM2 to copy the source offset.
> This can cause issues when auto offset sync is enabled, as the consumer group 
> offset can be potentially set to a high number. MM2 never rewinds these 
> offsets, so even when there is a correct offset mapping available, the offset 
> will not be updated correctly.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (KAFKA-6084) ReassignPartitionsCommand should propagate JSON parsing failures

2022-04-28 Thread Viktor Somogyi-Vass (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viktor Somogyi-Vass resolved KAFKA-6084.

Fix Version/s: 2.8.0
   Resolution: Fixed

> ReassignPartitionsCommand should propagate JSON parsing failures
> 
>
> Key: KAFKA-6084
> URL: https://issues.apache.org/jira/browse/KAFKA-6084
> Project: Kafka
>  Issue Type: Improvement
>  Components: admin
>Affects Versions: 0.11.0.0
>Reporter: Viktor Somogyi-Vass
>Assignee: Viktor Somogyi-Vass
>Priority: Minor
>  Labels: easyfix, newbie
> Fix For: 2.8.0
>
> Attachments: Screen Shot 2017-10-18 at 23.31.22.png
>
>
> Basically looking at Json.scala it will always swallow any parsing errors:
> {code}
>   def parseFull(input: String): Option[JsonValue] =
> try Option(mapper.readTree(input)).map(JsonValue(_))
> catch { case _: JsonProcessingException => None }
> {code}
> However, while sometimes it is easy to figure out the problem by simply 
> looking at the JSON, in other cases it is not trivial: some invisible 
> characters (like a byte order mark) won't be displayed by most text editors, 
> and people can spend a lot of time figuring out what the problem is.
> As Jackson provides a really detailed exception about what failed and how, it 
> is easy to propagate the failure to the user.
> As an example I attached a BOM prefixed JSON which fails with the following 
> error which is very counterintuitive:
> {noformat}
> [root@localhost ~]# kafka-reassign-partitions --zookeeper localhost:2181 
> --reassignment-json-file /root/increase-replication-factor.json --execute
> Partitions reassignment failed due to Partition reassignment data file 
> /root/increase-replication-factor.json is empty
> kafka.common.AdminCommandFailedException: Partition reassignment data file 
> /root/increase-replication-factor.json is empty
> at 
> kafka.admin.ReassignPartitionsCommand$.executeAssignment(ReassignPartitionsCommand.scala:120)
> at 
> kafka.admin.ReassignPartitionsCommand$.main(ReassignPartitionsCommand.scala:52)
> at kafka.admin.ReassignPartitionsCommand.main(ReassignPartitionsCommand.scala)
> ...
> {noformat}
> In case of the above error it would be much better to see what fails exactly:
> {noformat}
> kafka.common.AdminCommandFailedException: Admin command failed
>   at 
> kafka.admin.ReassignPartitionsCommand$.parsePartitionReassignmentData(ReassignPartitionsCommand.scala:267)
>   at 
> kafka.admin.ReassignPartitionsCommand$.parseAndValidate(ReassignPartitionsCommand.scala:275)
>   at 
> kafka.admin.ReassignPartitionsCommand$.executeAssignment(ReassignPartitionsCommand.scala:197)
>   at 
> kafka.admin.ReassignPartitionsCommand$.executeAssignment(ReassignPartitionsCommand.scala:193)
>   at 
> kafka.admin.ReassignPartitionsCommand$.main(ReassignPartitionsCommand.scala:64)
>   at 
> kafka.admin.ReassignPartitionsCommand.main(ReassignPartitionsCommand.scala)
> Caused by: com.fasterxml.jackson.core.JsonParseException: Unexpected 
> character ('' (code 65279 / 0xfeff)): expected a valid value (number, 
> String, array, object, 'true', 'false' or 'null')
>  at [Source: (String)"{"version":1,
>   "partitions":[
>{"topic": "test1", "partition": 0, "replicas": [1,2]},
>{"topic": "test2", "partition": 1, "replicas": [2,3]}
> ]}"; line: 1, column: 2]
>   at 
> com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1798)
>   at 
> com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:663)
>   at 
> com.fasterxml.jackson.core.base.ParserMinimalBase._reportUnexpectedChar(ParserMinimalBase.java:561)
>   at 
> com.fasterxml.jackson.core.json.ReaderBasedJsonParser._handleOddValue(ReaderBasedJsonParser.java:1892)
>   at 
> com.fasterxml.jackson.core.json.ReaderBasedJsonParser.nextToken(ReaderBasedJsonParser.java:747)
>   at 
> com.fasterxml.jackson.databind.ObjectMapper._readTreeAndClose(ObjectMapper.java:4030)
>   at 
> com.fasterxml.jackson.databind.ObjectMapper.readTree(ObjectMapper.java:2539)
>   at kafka.utils.Json$.kafka$utils$Json$$doParseFull(Json.scala:46)
>   at kafka.utils.Json$$anonfun$tryParseFull$1.apply(Json.scala:44)
>   at kafka.utils.Json$$anonfun$tryParseFull$1.apply(Json.scala:44)
>   at scala.util.Try$.apply(Try.scala:192)
>   at kafka.utils.Json$.tryParseFull(Json.scala:44)
>   at 
> kafka.admin.ReassignPartitionsCommand$.parsePartitionReassignmentData(ReassignPartitionsCommand.scala:241)
>   ... 5 more
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (KAFKA-13442) REST API endpoint for fetching a connector's config definition

2022-04-28 Thread Viktor Somogyi-Vass (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viktor Somogyi-Vass resolved KAFKA-13442.
-
Resolution: Duplicate

> REST API endpoint for fetching a connector's config definition
> --
>
> Key: KAFKA-13442
> URL: https://issues.apache.org/jira/browse/KAFKA-13442
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 3.2.0
>Reporter: Viktor Somogyi-Vass
>Assignee: Viktor Somogyi-Vass
>Priority: Major
>
> To enhance UI-based applications' ability to help users create new connectors 
> from default configurations, it would be very useful to have an API which can 
> fetch a connector type's configuration definition. This definition can then be 
> filled out by users and sent back for validation before creating a new 
> connector from it.
> The API should be placed under {{connector-plugins}} and since 
> {{connector-plugins/\{connectorType\}/config/validate}} already exists, 
> {{connector-plugins/\{connectorType\}/config}} might be a good option for the 
> new API.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (KAFKA-14331) Upgrade to Scala 2.13.10

2022-10-25 Thread Viktor Somogyi-Vass (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viktor Somogyi-Vass resolved KAFKA-14331.
-
Resolution: Duplicate

> Upgrade to Scala 2.13.10
> 
>
> Key: KAFKA-14331
> URL: https://issues.apache.org/jira/browse/KAFKA-14331
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 3.4.0
>Reporter: Viktor Somogyi-Vass
>Priority: Major
>
> There are some CVEs in Scala 2.13.8, so we should upgrade to the latest.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14331) Upgrade to Scala 2.13.10

2022-10-24 Thread Viktor Somogyi-Vass (Jira)
Viktor Somogyi-Vass created KAFKA-14331:
---

 Summary: Upgrade to Scala 2.13.10
 Key: KAFKA-14331
 URL: https://issues.apache.org/jira/browse/KAFKA-14331
 Project: Kafka
  Issue Type: Improvement
  Components: core
Affects Versions: 3.4.0
Reporter: Viktor Somogyi-Vass


There are some CVEs in Scala 2.13.8, so we should upgrade to the latest.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14250) Exception during normal operation in MirrorSourceTask causes the task to fail instead of shutting down gracefully

2022-09-21 Thread Viktor Somogyi-Vass (Jira)
Viktor Somogyi-Vass created KAFKA-14250:
---

 Summary: Exception during normal operation in MirrorSourceTask 
causes the task to fail instead of shutting down gracefully
 Key: KAFKA-14250
 URL: https://issues.apache.org/jira/browse/KAFKA-14250
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect, mirrormaker
Affects Versions: 3.3
Reporter: Viktor Somogyi-Vass
Assignee: Viktor Somogyi-Vass


In MirrorSourceTask we are loading offsets for the topic partitions. At this 
point, while we are fetching the partitions, it is possible for the offset 
reader to be stopped by a parallel thread. Stopping the reader causes a 
CancellationException to be thrown, due to KAFKA-9051.

Currently this exception is not caught in MirrorSourceTask, so it propagates up
and causes the task to go into the FAILED state. We only need it to go into the
STOPPED state so that it can be restarted later.

This can be achieved by catching the exception and stopping the task directly.
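
A sketch of the intended handling; the surrounding method and helper names are simplified placeholders, and only the catch clause is the point:
{code:java}
import java.util.concurrent.CancellationException;

// Simplified sketch: treat a cancelled offset read during start() as a graceful stop,
// not as a task failure.
class MirrorSourceTaskStartSketch {

    void start() {
        try {
            loadOffsets(); // reads committed offsets via the offset reader; may be cancelled
        } catch (CancellationException e) {
            // The offset reader was stopped by a parallel thread (see KAFKA-9051).
            // Stop the task cleanly instead of letting the exception fail it.
            stop();
        }
    }

    private void loadOffsets() { }  // placeholder for the real offset loading
    private void stop() { }         // placeholder for the task's stop logic
}
{code}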



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14281) Multi-level rack awareness

2022-10-05 Thread Viktor Somogyi-Vass (Jira)
Viktor Somogyi-Vass created KAFKA-14281:
---

 Summary: Multi-level rack awareness
 Key: KAFKA-14281
 URL: https://issues.apache.org/jira/browse/KAFKA-14281
 Project: Kafka
  Issue Type: Improvement
  Components: core
Affects Versions: 3.4.0
Reporter: Viktor Somogyi-Vass
Assignee: Viktor Somogyi-Vass


h1. Motivation

With replication services, data can be replicated across independent Kafka
clusters in multiple data centers. In addition, many customers need "stretch
clusters" - a single Kafka cluster that spans multiple data centers. This
architecture has the following useful characteristics:
 - Data is natively replicated into all data centers by Kafka topic replication.
 - No data is lost when one DC is lost and no configuration change is required -
the design implicitly relies on native Kafka replication.
 - From an operational point of view, it is much easier to configure and operate
such a topology than a replication scenario via MM2.

Kafka should provide "native" support for stretch clusters, covering any 
special aspects of operations of stretch cluster.

h2. Multi-level rack awareness

Additionally, stretch clusters are implemented using the rack awareness 
feature, where each DC is represented as a rack. This ensures that replicas are 
spread across DCs evenly. Unfortunately, there are cases where this is too 
limiting - in case there are actual racks inside the DCs, we cannot specify 
those. Consider having 3 DCs with 2 racks each:

/DC1/R1, /DC1/R2
/DC2/R1, /DC2/R2
/DC3/R1, /DC3/R2

If we were to use racks as DC1, DC2, DC3, we lose the rack-level information of
the setup. This means that when using RF=6, it is possible that the 2 replicas
assigned to DC1 both end up in the same rack.

If we were to use racks as /DC1/R1, /DC1/R2, etc, then when using RF=3, it is 
possible that 2 replicas end up in the same DC, e.g. /DC1/R1, /DC1/R2, /DC2/R1.

Because of this, Kafka should support "multi-level" racks, which means that 
rack IDs should be able to describe some kind of a hierarchy. With this 
feature, brokers should be able to:
 # spread replicas evenly based on the top level of the hierarchy (i.e. first, 
between DCs)
 # then inside a top-level unit (DC), if there are multiple replicas, they 
should be spread evenly among lower-level units (i.e. between racks, then 
between physical hosts, and so on)
 ## repeat for all levels
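
For illustration, a hierarchical rack ID such as {{/DC1/R1}} can simply be split into its levels, and the assignment logic then spreads replicas level by level; a tiny parsing sketch, purely illustrative and not part of any proposed API:
{code:java}
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: turning a hierarchical rack id into its levels so replicas can be
// spread first across DCs, then across racks within a DC, and so on.
class MultiLevelRack {
    static List<String> levels(String rackId) {
        // "/DC1/R1" -> ["DC1", "R1"]
        return Arrays.stream(rackId.split("/"))
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(levels("/DC1/R1")); // [DC1, R1]
    }
}
{code}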



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14929) Flaky KafkaStatusBackingStoreFormatTest#putTopicStateRetriableFailure

2023-04-27 Thread Viktor Somogyi-Vass (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viktor Somogyi-Vass resolved KAFKA-14929.
-
Resolution: Fixed

> Flaky KafkaStatusBackingStoreFormatTest#putTopicStateRetriableFailure
> -
>
> Key: KAFKA-14929
> URL: https://issues.apache.org/jira/browse/KAFKA-14929
> Project: Kafka
>  Issue Type: Test
>  Components: KafkaConnect
>Reporter: Greg Harris
>Assignee: Sagar Rao
>Priority: Major
>  Labels: flaky-test
> Fix For: 3.5.0
>
>
> This test recently started flaky-failing with the following stack trace:
> {noformat}
> org.mockito.exceptions.verification.TooFewActualInvocations: 
> kafkaBasedLog.send(, , );
> Wanted 2 times:->
>  at org.apache.kafka.connect.util.KafkaBasedLog.send(KafkaBasedLog.java:376)
> But was 1 time:->
>  at 
> org.apache.kafka.connect.storage.KafkaStatusBackingStore.sendTopicStatus(KafkaStatusBackingStore.java:315)
>   at 
> app//org.apache.kafka.connect.util.KafkaBasedLog.send(KafkaBasedLog.java:376)
>   at 
> app//org.apache.kafka.connect.storage.KafkaStatusBackingStoreFormatTest.putTopicStateRetriableFailure(KafkaStatusBackingStoreFormatTest.java:219)
>   at 
> java.base@11.0.16.1/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
>  Method)
>   at 
> java.base@11.0.16.1/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base@11.0.16.1/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base@11.0.16.1/java.lang.reflect.Method.invoke(Method.java:566)
> ...{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15161) InvalidReplicationFactorException at connect startup

2023-07-07 Thread Viktor Somogyi-Vass (Jira)
Viktor Somogyi-Vass created KAFKA-15161:
---

 Summary: InvalidReplicationFactorException at connect startup
 Key: KAFKA-15161
 URL: https://issues.apache.org/jira/browse/KAFKA-15161
 Project: Kafka
  Issue Type: Improvement
  Components: clients, KafkaConnect
Affects Versions: 3.6.0
Reporter: Viktor Somogyi-Vass


h2. Problem description

In our system test environment, Connect may in certain cases fail to start up
due to a very specific timing issue. The problem lies in the specific timing of
the Kafka cluster and Connect start/restart: if the broker doesn't have metadata
yet when a consumer in Connect starts and asks for topic metadata, the following
exception is returned and Connect fails:
{noformat}
[2023-07-07 13:56:47,994] ERROR [Worker clientId=connect-1, 
groupId=connect-cluster] Uncaught exception in herder work thread, exiting:  
(org.apache.kafka.connect.runtime.distributed.DistributedHerder)
org.apache.kafka.common.KafkaException: Unexpected error fetching metadata for 
topic connect-offsets
at 
org.apache.kafka.clients.consumer.internals.TopicMetadataFetcher.getTopicMetadata(TopicMetadataFetcher.java:130)
at 
org.apache.kafka.clients.consumer.internals.TopicMetadataFetcher.getTopicMetadata(TopicMetadataFetcher.java:66)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.partitionsFor(KafkaConsumer.java:2001)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.partitionsFor(KafkaConsumer.java:1969)
at 
org.apache.kafka.connect.util.KafkaBasedLog.start(KafkaBasedLog.java:251)
at 
org.apache.kafka.connect.storage.KafkaOffsetBackingStore.start(KafkaOffsetBackingStore.java:242)
at org.apache.kafka.connect.runtime.Worker.start(Worker.java:230)
at 
org.apache.kafka.connect.runtime.AbstractHerder.startServices(AbstractHerder.java:151)
at 
org.apache.kafka.connect.runtime.distributed.DistributedHerder.run(DistributedHerder.java:363)
at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.kafka.common.errors.InvalidReplicationFactorException: 
Replication factor is below 1 or larger than the number of available brokers.
{noformat}

Due to this error the Connect node stops and has to be manually restarted (and
of course it fails the test scenarios as well).

h2. Reproduction

In my test scenario I had:
- 1 broker
- 1 connect distributed node
- I also had a patch that I applied on the broker to make sure we don't have 
metadata

Steps to repro:
# start up a zookeeper based broker without the patch
# put a breakpoint here: 
https://github.com/apache/kafka/blob/1d8b07ed6435568d3daf514c2d902107436d2ac8/clients/src/main/java/org/apache/kafka/clients/consumer/internals/TopicMetadataFetcher.java#L94
# start up a distributed connect node
# restart the kafka broker with the patch to make sure there is no metadata
# once the broker is started, release the debugger in connect

It should run into the error cited above and shut down.

This is not desirable: the Connect cluster should retry to ensure its continuous
operation, or the broker should handle this case differently, for instance by
returning a RetriableException.

The earliest I've tried this is 2.8 but I think this affects versions before 
that as well (and after).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15219) Support delegation tokens in KRaft

2023-07-19 Thread Viktor Somogyi-Vass (Jira)
Viktor Somogyi-Vass created KAFKA-15219:
---

 Summary: Support delegation tokens in KRaft
 Key: KAFKA-15219
 URL: https://issues.apache.org/jira/browse/KAFKA-15219
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 3.6.0
Reporter: Viktor Somogyi-Vass
Assignee: Viktor Somogyi-Vass


Delegation tokens were introduced in KIP-48 and improved in KIP-373. KIP-900
paved the way to supporting them in KRaft by adding SCRAM support, but
delegation tokens still don't support KRaft.

There are multiple issues:
- TokenManager would still try to create tokens in ZooKeeper. Instead, we should
forward admin requests to the controller, which would store them in the metadata,
similarly to SCRAM.
- TokenManager should run on controller nodes only (or in mixed mode).
- Integration tests will need to be adapted as well and parameterized with
ZooKeeper/KRaft.
- Documentation needs to be improved to cover KRaft.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-12384) Flaky Test ListOffsetsRequestTest.testResponseIncludesLeaderEpoch

2023-05-24 Thread Viktor Somogyi-Vass (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-12384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viktor Somogyi-Vass resolved KAFKA-12384.
-
Fix Version/s: 3.6.0
   Resolution: Fixed

> Flaky Test ListOffsetsRequestTest.testResponseIncludesLeaderEpoch
> -
>
> Key: KAFKA-12384
> URL: https://issues.apache.org/jira/browse/KAFKA-12384
> Project: Kafka
>  Issue Type: Test
>  Components: core, unit tests
>Reporter: Matthias J. Sax
>Assignee: Chia-Ping Tsai
>Priority: Critical
>  Labels: flaky-test
> Fix For: 3.6.0, 3.0.0
>
>
> {quote}org.opentest4j.AssertionFailedError: expected: <(0,0)> but was: 
> <(-1,-1)> at 
> org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:55) at 
> org.junit.jupiter.api.AssertionUtils.failNotEqual(AssertionUtils.java:62) at 
> org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:182) at 
> org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:177) at 
> org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:1124) at 
> kafka.server.ListOffsetsRequestTest.testResponseIncludesLeaderEpoch(ListOffsetsRequestTest.scala:172){quote}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15059) Exactly-once source tasks fail to start during pending rebalances

2023-06-21 Thread Viktor Somogyi-Vass (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viktor Somogyi-Vass resolved KAFKA-15059.
-
Resolution: Fixed

[~ChrisEgerton] since the PR is merged, I'm resolving this ticket.

> Exactly-once source tasks fail to start during pending rebalances
> -
>
> Key: KAFKA-15059
> URL: https://issues.apache.org/jira/browse/KAFKA-15059
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect, mirrormaker
>Affects Versions: 3.6.0
>Reporter: Chris Egerton
>Assignee: Chris Egerton
>Priority: Blocker
> Fix For: 3.6.0
>
>
> When asked to perform a round of zombie fencing, the distributed herder will 
> [reject the 
> request|https://github.com/apache/kafka/blob/17fd30e6b457f097f6a524b516eca1a6a74a9144/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L1249-L1250]
>  if a rebalance is pending, which can happen if (among other things) a config 
> for a new connector or a new set of task configs has been recently read from 
> the config topic.
> Normally this can be alleviated with a simple task restart, which isn't great 
> but isn't terrible.
> However, when running MirrorMaker 2 in dedicated mode, there is no API to 
> restart failed tasks, and it can be more common to see this kind of failure 
> on a fresh cluster because three connector configurations are written in 
> rapid succession to the config topic.
>  
> In order to provide a better experience for users of both vanilla Kafka 
> Connect and dedicated MirrorMaker 2 clusters, we can retry (likely with the 
> same exponential backoff introduced with KAFKA-14732) zombie fencing attempts 
> that fail due to a pending rebalance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15992) Make MM2 heartbeats topic name configurable

2023-12-11 Thread Viktor Somogyi-Vass (Jira)
Viktor Somogyi-Vass created KAFKA-15992:
---

 Summary: Make MM2 heartbeats topic name configurable
 Key: KAFKA-15992
 URL: https://issues.apache.org/jira/browse/KAFKA-15992
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 3.7.0
Reporter: Viktor Somogyi-Vass
Assignee: Viktor Somogyi-Vass


With DefaultReplicationPolicy, the heartbeats topic name is hard-coded. 
Instead, this should be configurable, so users can avoid collisions with the 
"heartbeats" topics of other systems.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16596) Flaky test – org.apache.kafka.clients.ClientUtilsTest.testParseAndValidateAddressesWithReverseLookup()

2024-04-22 Thread Viktor Somogyi-Vass (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viktor Somogyi-Vass resolved KAFKA-16596.
-
Fix Version/s: 3.8.0
 Assignee: Andras Katona
   Resolution: Fixed

> Flaky test – 
> org.apache.kafka.clients.ClientUtilsTest.testParseAndValidateAddressesWithReverseLookup()
>  
> ---
>
> Key: KAFKA-16596
> URL: https://issues.apache.org/jira/browse/KAFKA-16596
> Project: Kafka
>  Issue Type: Test
>Reporter: Igor Soarez
>Assignee: Andras Katona
>Priority: Major
>  Labels: GoodForNewContributors, good-first-issue
> Fix For: 3.8.0
>
>
> org.apache.kafka.clients.ClientUtilsTest.testParseAndValidateAddressesWithReverseLookup()
>  failed in the following way:
>  
> {code:java}
> org.opentest4j.AssertionFailedError: Unexpected addresses [93.184.215.14, 
> 2606:2800:21f:cb07:6820:80da:af6b:8b2c] ==> expected:  but was:  
>   at 
> app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
>at 
> app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
>at app//org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)   
>  at app//org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36) 
> at app//org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:214)  
>   at 
> app//org.apache.kafka.clients.ClientUtilsTest.testParseAndValidateAddressesWithReverseLookup(ClientUtilsTest.java:65)
>  {code}
> As a result of the following assertions:
>  
> {code:java}
> // With lookup of example.com, either one or two addresses are expected 
> depending on
> // whether ipv4 and ipv6 are enabled
> List validatedAddresses = 
> checkWithLookup(asList("example.com:1"));
> assertTrue(validatedAddresses.size() >= 1, "Unexpected addresses " + 
> validatedAddresses);
> List validatedHostNames = 
> validatedAddresses.stream().map(InetSocketAddress::getHostName)
> .collect(Collectors.toList());
> List expectedHostNames = asList("93.184.216.34", 
> "2606:2800:220:1:248:1893:25c8:1946"); {code}
> It seems that the DNS result has changed for example.com.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)