[jira] [Updated] (KAFKA-10672) Restarting Kafka always takes a lot of time

2020-11-01 Thread shenwenbing (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shenwenbing updated KAFKA-10672:

Attachment: server.log

> Restarting Kafka always takes a lot of time
> ---
>
> Key: KAFKA-10672
> URL: https://issues.apache.org/jira/browse/KAFKA-10672
> Project: Kafka
> Issue Type: Improvement
> Components: core
> Affects Versions: 2.0.0
> Environment: A cluster of 21 Kafka nodes;
> Each node has 12 disks;
> Each node has about 1500 partitions;
> There are approximately 700 leader partitions per node;
> Slow-loading partitions have about 1000 log segments;
> Reporter: shenwenbing
> Priority: Major
> Attachments: server.log
>
>
> When the producer snapshot file does not exist, or the latest snapshot file 
> predates the current active segment, restoring the producer state traverses 
> the log segments and reads every batch in them. When a single broker hosts 
> many partitions, most of them with many log segments, this causes a large 
> number of IO operations, because each IO loads only one batch at a time and a 
> log can contain tens of thousands of batches. From the code, I found that 
> each batch requires at least two IO operations. With the default batch size 
> of 16 KB and a log segment of 1 GB, there are 65,536 batches, and therefore 
> at least 65,536 * 2 = 131,072 IO operations, which makes the Kafka startup 
> process take a long time. 
> We configured 15 log recovery threads in the production environment, and it 
> still took more than 2 hours to load a partition. Can the community suggest 
> a workaround or an improvement for this situation? For detailed logs, see 
> the entries for the test-perf-18 partitions in the attached log.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-10672) Restarting Kafka always takes a lot of time

2020-11-01 Thread shenwenbing (Jira)
shenwenbing created KAFKA-10672:
---

Summary: Restarting Kafka always takes a lot of time
Key: KAFKA-10672
URL: https://issues.apache.org/jira/browse/KAFKA-10672
Project: Kafka
Issue Type: Improvement
Components: core
Affects Versions: 2.0.0
Environment: A cluster of 21 Kafka nodes;
Each node has 12 disks;
Each node has about 1500 partitions;
There are approximately 700 leader partitions per node;
Slow-loading partitions have about 1000 log segments;
Reporter: shenwenbing
Attachments: server.log

When the producer snapshot file does not exist, or the latest snapshot file 
predates the current active segment, restoring the producer state traverses the 
log segments and reads every batch in them. When a single broker hosts many 
partitions, most of them with many log segments, this causes a large number of 
IO operations, because each IO loads only one batch at a time and a log can 
contain tens of thousands of batches. From the code, I found that each batch 
requires at least two IO operations. With the default batch size of 16 KB and a 
log segment of 1 GB, there are 65,536 batches, and therefore at least 
65,536 * 2 = 131,072 IO operations, which makes the Kafka startup process take 
a long time. We configured 15 log recovery threads in the production 
environment, and it still took more than 2 hours to load a partition. Can the 
community suggest a workaround or an improvement for this situation? For 
detailed logs, see the entries for the test-perf-18 partitions in the attached 
log.
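
The estimate in this report can be reproduced with a few lines of arithmetic. 
This is only a sketch of the reporter's calculation: it assumes every batch is 
exactly 16 KiB, the segment is exactly 1 GiB, and each batch costs two IO 
operations, as stated above. The function name is illustrative and is not taken 
from the Kafka code base, and the 1000-segment figure is a hypothetical upper 
bound based on the Environment field, not a measured value.

```python
KIB = 1024
GIB = 1024 ** 3


def estimated_io_ops(segment_bytes, batch_bytes, io_per_batch=2):
    """Return (batches, io_ops) for scanning one segment batch by batch."""
    batches = segment_bytes // batch_bytes
    return batches, batches * io_per_batch


# One 1 GiB segment filled with 16 KiB batches, two IO operations per batch.
batches, io_ops = estimated_io_ops(GIB, 16 * KIB)
print(batches, io_ops)

# Hypothetical upper bound: every one of about 1000 segments is scanned.
print(io_ops * 1000)
```

If all segments of a slow-loading partition had to be scanned, the count would 
grow linearly with the number of segments, which is consistent with the long 
load times described above.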





[jira] [Assigned] (KAFKA-10660) Poll time out logstash

2020-11-01 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen reassigned KAFKA-10660:
-

Assignee: Luke Chen

> Poll time out logstash
> --
>
> Key: KAFKA-10660
> URL: https://issues.apache.org/jira/browse/KAFKA-10660
> Project: Kafka
> Issue Type: Bug
> Components: KafkaConnect
> Affects Versions: 2.2.1
> Environment: Non Production
> Reporter: David
> Assignee: Luke Chen
> Priority: Minor
>
> I am getting the message below (a Logstash log from the Kafka input), which 
> I believe means I need to increase max.poll.interval.ms (I think the default 
> is 3)
>  
> This member will leave the group because consumer poll timeout has expired. 
> This means the time between subsequent calls to poll() was longer than the 
> configured max.poll.interval.ms, which typically implies that the poll loop 
> is spending too much time processing messages. You can address this either by 
> increasing max.poll.interval.ms or by reducing the maximum size of batches 
> returned in poll() with max.poll.records
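
The trade-off described in that log message can be sketched numerically. The 
helper below is hypothetical; the values 300000 ms and 500 are, to my 
knowledge, the Kafka Java consumer defaults for max.poll.interval.ms and 
max.poll.records, but the Logstash Kafka input may be configured differently, 
and the 700 ms per-record processing time is an invented figure for 
illustration.

```python
def poll_loop_budget(max_poll_interval_ms, max_poll_records, ms_per_record):
    """Estimate whether one full batch can be processed between two polls.

    Returns (worst_case_ms, fits, largest_safe_batch).
    """
    # Worst case: poll() returns a full batch and every record is processed
    # before the next call to poll().
    worst_case_ms = max_poll_records * ms_per_record
    fits = worst_case_ms <= max_poll_interval_ms
    # Largest batch size that still finishes inside the interval.
    largest_safe_batch = max_poll_interval_ms // ms_per_record
    return worst_case_ms, fits, largest_safe_batch


# Assumed defaults and a hypothetical per-record processing time of 700 ms.
print(poll_loop_budget(300000, 500, 700))
```

With these assumed numbers a full batch does not fit in the interval, so either 
max.poll.interval.ms has to go up or max.poll.records has to come down, which 
is what the log message suggests.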





[jira] [Commented] (KAFKA-10670) repo.maven.apache.org: Name or service not known

2020-11-01 Thread Luke Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224380#comment-17224380
 ] 

Luke Chen commented on KAFKA-10670:
---

I cannot reproduce this issue.

> repo.maven.apache.org: Name or service not known
> 
>
> Key: KAFKA-10670
> URL: https://issues.apache.org/jira/browse/KAFKA-10670
> Project: Kafka
> Issue Type: Bug
> Components: build
> Environment: Fedora 33, Aarch64 
> Reporter: Lutz Weischer
> Priority: Minor
>
> ./gradlew jar 
> fails: 
> > Configure project :
> Building project 'core' with Scala version 2.13.3
> Building project 'streams-scala' with Scala version 2.13.3
> > Task :clients:processMessages
> MessageGenerator: processed 121 Kafka message JSON files(s).
> > Task :clients:compileJava FAILED
> FAILURE: Build failed with an exception.
> * What went wrong:
> Execution failed for task ':clients:compileJava'.
> > Could not resolve all files for configuration ':clients:compileClasspath'.
>    > Could not resolve org.xerial.snappy:snappy-java:1.1.7.7.
>      Required by:
>          project :clients
>       > Could not resolve org.xerial.snappy:snappy-java:1.1.7.7.
>          > Could not get resource 
> 'https://repo.maven.apache.org/maven2/org/xerial/snappy/snappy-java/1.1.7.7/snappy-java-1.1.7.7.pom'.
>             > Could not HEAD 
> 'https://repo.maven.apache.org/maven2/org/xerial/snappy/snappy-java/1.1.7.7/snappy-java-1.1.7.7.pom'.
>                > repo.maven.apache.org: Name or service not known
> * Try:
> Run with --stacktrace option to get the stack trace. Run with --info or 
> --debug option to get more log output. Run with --scan to get full insights.
> * Get more help at https://help.gradle.org
> Deprecated Gradle features were used in this build, making it incompatible 
> with Gradle 7.0.
> Use '--warning-mode all' to show the individual deprecation warnings.
> See 
> https://docs.gradle.org/6.7/userguide/command_line_interface.html#sec:command_line_warnings
> BUILD FAILED in 21s
> 4 actionable tasks: 4 executed





[jira] [Assigned] (KAFKA-10671) partition.assignment.strategy documentation does not include all options

2020-11-01 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen reassigned KAFKA-10671:
-

Assignee: Luke Chen

> partition.assignment.strategy documentation does not include all options
> 
>
> Key: KAFKA-10671
> URL: https://issues.apache.org/jira/browse/KAFKA-10671
> Project: Kafka
> Issue Type: Bug
> Components: documentation
> Reporter: Dave Shook
> Assignee: Luke Chen
> Priority: Minor
> Fix For: 2.6.0
>
>
> The current documentation for partition.assignment.strategy does not mention 
> the following options:
> org.apache.kafka.clients.consumer.StickyAssignor or
> org.apache.kafka.clients.consumer.CooperativeStickyAssignor





[jira] [Comment Edited] (KAFKA-4628) Support KTable/GlobalKTable Joins

2020-11-01 Thread Hartmut Armbruster (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-4628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224330#comment-17224330
 ] 

Hartmut Armbruster edited comment on KAFKA-4628 at 11/1/20, 8:16 PM:
-

Hi, has the content of KIP-314 been reviewed yet?

Are the concerns 'for relational data with a very narrow domain' 
- which have been outlined in great detail by [~abellemare] - still valid with 
the final solution/implementation of KIP-213 / KAFKA-3705 ? (given both KIPs 
span over several years...)


was (Author: hartmutcouk):
Hi, has the content of KIP-314 been reviewed yet?

Are the concerns 'for relational data with a very narrow domain' 
- which have been outlined in great detail by [~abellemare] - still valid with 
the final solution/implementation of KIP-213 / KAFKA-3705 ?

> Support KTable/GlobalKTable Joins
> -
>
> Key: KAFKA-4628
> URL: https://issues.apache.org/jira/browse/KAFKA-4628
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Affects Versions: 0.10.2.0
> Reporter: Damian Guy
> Priority: Major
> Labels: needs-kip
>
> In KIP-99 we have added support for GlobalKTables, however we don't currently 
> support KTable/GlobalKTable joins as they require materializing a state store 
> for the join. 





[jira] [Commented] (KAFKA-4628) Support KTable/GlobalKTable Joins

2020-11-01 Thread Hartmut Armbruster (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-4628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224330#comment-17224330
 ] 

Hartmut Armbruster commented on KAFKA-4628:
---

Hi, has the content of KIP-314 been reviewed yet?

Are the concerns 'for relational data with a very narrow domain' 
- which have been outlined in great detail by [~abellemare] - still valid with 
the final solution/implementation of KIP-213 / KAFKA-3705 ?

> Support KTable/GlobalKTable Joins
> -
>
> Key: KAFKA-4628
> URL: https://issues.apache.org/jira/browse/KAFKA-4628
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Affects Versions: 0.10.2.0
> Reporter: Damian Guy
> Priority: Major
> Labels: needs-kip
>
> In KIP-99 we have added support for GlobalKTables, however we don't currently 
> support KTable/GlobalKTable joins as they require materializing a state store 
> for the join. 





[jira] [Commented] (KAFKA-10633) Constant probing rebalances in Streams 2.6

2020-11-01 Thread Bradley Peterson (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224310#comment-17224310
 ] 

Bradley Peterson commented on KAFKA-10633:
--

[~eran-levy] The fix did seem to work for us; we've been running it for a few 
days. We ended up using the latest commit for 2.6.1-SNAPSHOT, which we built 
locally from https://github.com/apache/kafka/tree/2.6. As you said, 2.6.1 is 
not released and doesn't have a planned release date yet.

> Constant probing rebalances in Streams 2.6
> --
>
> Key: KAFKA-10633
> URL: https://issues.apache.org/jira/browse/KAFKA-10633
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Affects Versions: 2.6.0
> Reporter: Bradley Peterson
> Priority: Major
> Attachments: Discover 2020-10-21T23 34 03.867Z - 2020-10-21T23 44 
> 46.409Z.csv
>
>
> We are seeing a few issues with the new rebalancing behavior in Streams 2.6. 
> This ticket is for constant probing rebalances on one StreamThread, but I'll 
> mention the other issues, as they may be related.
> First, when we redeploy the application we see tasks being moved, even though 
> the task assignment was stable before redeploying. We would expect to see 
> tasks assigned back to the same instances and no movement. The application is 
> in EC2, with persistent EBS volumes, and we use static group membership to 
> avoid rebalancing. To redeploy the app we terminate all EC2 instances. The 
> new instances will reattach the EBS volumes and use the same group member id.
> After redeploying, we sometimes see the group leader go into a tight probing 
> rebalance loop. This doesn't happen immediately, it could be several hours 
> later. Because the redeploy caused task movement, we see expected probing 
> rebalances every 10 minutes. But, then one thread will go into a tight loop 
> logging messages like "Triggering the followup rebalance scheduled for 
> 1603323868771 ms.", handling the partition assignment (which doesn't change), 
> then "Requested to schedule probing rebalance for 1603323868771 ms." This 
> repeats several times a second until the app is restarted again. I'll attach 
> a log export from one such incident.
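
The timestamps in the quoted log messages are epoch milliseconds. A minimal 
sketch for converting them to UTC, which may make it easier to line them up 
with the attached log export; the helper name is hypothetical and not part of 
Kafka Streams.

```python
from datetime import datetime, timedelta, timezone


def epoch_ms_to_utc(ms):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    # Split into whole seconds and milliseconds to avoid float rounding.
    seconds, millis = divmod(ms, 1000)
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return base + timedelta(milliseconds=millis)


# Timestamp taken from the "Requested to schedule probing rebalance" message.
scheduled = epoch_ms_to_utc(1603323868771)
print(scheduled.isoformat())
```

The converted time falls inside the window named in the attachment's file name, 
so the same scheduled time appears to be logged repeatedly during the incident.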





[jira] [Created] (KAFKA-10671) partition.assignment.strategy documentation does not include all options

2020-11-01 Thread Dave Shook (Jira)
Dave Shook created KAFKA-10671:
--

Summary: partition.assignment.strategy documentation does not include all options
Key: KAFKA-10671
URL: https://issues.apache.org/jira/browse/KAFKA-10671
Project: Kafka
Issue Type: Bug
Components: documentation
Reporter: Dave Shook
Fix For: 2.6.0


The current documentation for partition.assignment.strategy does not mention 
the following options:

org.apache.kafka.clients.consumer.StickyAssignor or
org.apache.kafka.clients.consumer.CooperativeStickyAssignor
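
For context, a small sketch of how such a setting might be written out. The 
four class names are the assignors that, as far as I know, ship in the 
org.apache.kafka.clients.consumer package; the helper function is hypothetical 
and only formats a properties line, it does not talk to Kafka.

```python
# Assignor classes assumed to be available in the Kafka clients library.
ASSIGNORS = [
    "org.apache.kafka.clients.consumer.RangeAssignor",
    "org.apache.kafka.clients.consumer.RoundRobinAssignor",
    "org.apache.kafka.clients.consumer.StickyAssignor",
    "org.apache.kafka.clients.consumer.CooperativeStickyAssignor",
]


def assignment_strategy_property(class_names):
    """Format a consumer properties line for partition.assignment.strategy."""
    return "partition.assignment.strategy=" + ",".join(class_names)


# Example: select only the two assignors missing from the documentation.
print(assignment_strategy_property(ASSIGNORS[2:]))
```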





[jira] [Created] (KAFKA-10670) repo.maven.apache.org: Name or service not known

2020-11-01 Thread Lutz Weischer (Jira)
Lutz Weischer created KAFKA-10670:
-

Summary: repo.maven.apache.org: Name or service not known
Key: KAFKA-10670
URL: https://issues.apache.org/jira/browse/KAFKA-10670
Project: Kafka
Issue Type: Bug
Components: build
Environment: Fedora 33, Aarch64 
Reporter: Lutz Weischer


./gradlew jar 

fails: 

> Configure project :
Building project 'core' with Scala version 2.13.3
Building project 'streams-scala' with Scala version 2.13.3

> Task :clients:processMessages
MessageGenerator: processed 121 Kafka message JSON files(s).

> Task :clients:compileJava FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':clients:compileJava'.
> Could not resolve all files for configuration ':clients:compileClasspath'.
   > Could not resolve org.xerial.snappy:snappy-java:1.1.7.7.
     Required by:
         project :clients
      > Could not resolve org.xerial.snappy:snappy-java:1.1.7.7.
         > Could not get resource 
'https://repo.maven.apache.org/maven2/org/xerial/snappy/snappy-java/1.1.7.7/snappy-java-1.1.7.7.pom'.
            > Could not HEAD 
'https://repo.maven.apache.org/maven2/org/xerial/snappy/snappy-java/1.1.7.7/snappy-java-1.1.7.7.pom'.
               > repo.maven.apache.org: Name or service not known

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug 
option to get more log output. Run with --scan to get full insights.

* Get more help at https://help.gradle.org

Deprecated Gradle features were used in this build, making it incompatible with 
Gradle 7.0.
Use '--warning-mode all' to show the individual deprecation warnings.
See 
https://docs.gradle.org/6.7/userguide/command_line_interface.html#sec:command_line_warnings

BUILD FAILED in 21s
4 actionable tasks: 4 executed
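
"Name or service not known" is the message a failed host name lookup typically 
produces, so one way to narrow this down is to check name resolution outside of 
Gradle. The sketch below is only a diagnostic aid under that assumption; the 
function name and the injectable resolver parameter are hypothetical 
conveniences, not part of the Kafka build.

```python
import socket


def can_resolve(host, resolver=socket.getaddrinfo):
    """Return True if the host name resolves, False on a lookup failure."""
    try:
        resolver(host, 443)
        return True
    except socket.gaierror:
        # Raised when the name cannot be resolved, for example when DNS
        # is unreachable or the host name does not exist.
        return False


# Usage on the affected machine (requires network access):
#   can_resolve("repo.maven.apache.org")
```

If the lookup fails here as well, the problem is more likely in the machine's 
DNS configuration than in the Kafka build.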





[jira] [Commented] (KAFKA-10633) Constant probing rebalances in Streams 2.6

2020-11-01 Thread Eran Levy (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17224247#comment-17224247
 ] 

Eran Levy commented on KAFKA-10633:
---

[~thebearmayor] The same happens here. I haven't deployed the fix yet; just to 
make sure, do you still see any issues?

I see that it hasn't been released yet: 
https://github.com/apache/kafka/releases

What's the best way to get that fix?

 

> Constant probing rebalances in Streams 2.6
> --
>
> Key: KAFKA-10633
> URL: https://issues.apache.org/jira/browse/KAFKA-10633
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Affects Versions: 2.6.0
> Reporter: Bradley Peterson
> Priority: Major
> Attachments: Discover 2020-10-21T23 34 03.867Z - 2020-10-21T23 44 
> 46.409Z.csv
>
>
> We are seeing a few issues with the new rebalancing behavior in Streams 2.6. 
> This ticket is for constant probing rebalances on one StreamThread, but I'll 
> mention the other issues, as they may be related.
> First, when we redeploy the application we see tasks being moved, even though 
> the task assignment was stable before redeploying. We would expect to see 
> tasks assigned back to the same instances and no movement. The application is 
> in EC2, with persistent EBS volumes, and we use static group membership to 
> avoid rebalancing. To redeploy the app we terminate all EC2 instances. The 
> new instances will reattach the EBS volumes and use the same group member id.
> After redeploying, we sometimes see the group leader go into a tight probing 
> rebalance loop. This doesn't happen immediately, it could be several hours 
> later. Because the redeploy caused task movement, we see expected probing 
> rebalances every 10 minutes. But, then one thread will go into a tight loop 
> logging messages like "Triggering the followup rebalance scheduled for 
> 1603323868771 ms.", handling the partition assignment (which doesn't change), 
> then "Requested to schedule probing rebalance for 1603323868771 ms." This 
> repeats several times a second until the app is restarted again. I'll attach 
> a log export from one such incident.





[jira] [Assigned] (KAFKA-10629) TopologyTestDriver should not require a Properties arg

2020-11-01 Thread Rohit Deshpande (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohit Deshpande reassigned KAFKA-10629:
---

Assignee: Rohit Deshpande

> TopologyTestDriver should not require a Properties arg
> --
>
> Key: KAFKA-10629
> URL: https://issues.apache.org/jira/browse/KAFKA-10629
> Project: Kafka
> Issue Type: Task
> Components: streams, streams-test-utils
> Reporter: John Roesler
> Assignee: Rohit Deshpande
> Priority: Minor
> Labels: needs-kip, newbie
>
> As of https://github.com/apache/kafka/pull/9477, many TopologyTestDriver 
> usages will have no configurations at all to specify, so we should provide a 
> constructor that doesn't take a Properties argument. Right now, such 
> configuration-free usages have to provide an empty Properties object.


