[jira] [Commented] (KAFKA-1464) Add a throttling option to the Kafka replication tool

2016-09-15 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15494596#comment-15494596
 ] 

Jiangjie Qin commented on KAFKA-1464:
-

It seems the PR title did not start with "KAFKA-1464" so the PR link is not 
updated. Anyway, the PR link is https://github.com/apache/kafka/pull/1776

> Add a throttling option to the Kafka replication tool
> -
>
> Key: KAFKA-1464
> URL: https://issues.apache.org/jira/browse/KAFKA-1464
> Project: Kafka
>  Issue Type: New Feature
>  Components: replication
>Affects Versions: 0.8.0
>Reporter: mjuarez
>Assignee: Ben Stopford
>Priority: Minor
>  Labels: replication, replication-tools
> Fix For: 0.10.1.0
>
>
> When performing replication on new nodes of a Kafka cluster, the replication 
> process will use all available resources to replicate as fast as possible.  
> This causes performance issues (mostly disk IO and sometimes network 
> bandwidth) when doing this in a production environment, in which you're 
> trying to serve downstream applications, at the same time you're performing 
> maintenance on the Kafka cluster.
> An option to throttle the replication to a specific rate (in either MB/s or 
> activities/second) would help production systems to better handle maintenance 
> tasks while still serving downstream applications.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1464) Add a throttling option to the Kafka replication tool

2016-07-12 Thread Ralph Weires (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15372870#comment-15372870
 ] 

Ralph Weires commented on KAFKA-1464:
-

Another related idea then, since those consumer rebalancing issues that result 
during maintenance for us drove me up the walls yesterday... Just desperately 
looking for a way to get this stabilized (on our v0.8.2.1) ;)

Wouldn't a (manual and temporary) modification of the partition assignment also 
be a viable option, to prevent a given node from becoming leader for any 
partitions?

I mean, could I issue kafka-reassign-partitions.sh with a customized partition 
assignment, that wouldn't actually re-assign any partitions to different 
brokers, but would merely change the replica *order* for several of the 
partitions - such that the node in question no longer is first replica for any 
partition? If I understand it right, the controller will always prefer the 
first replica as leader in balancing, so I'd just need to make sure that my 
node won't be the first replica for anything. All this temporarily of course, 
so after the maintenance I'd restore the original partition assignment back 
again.
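
As a rough illustration of the reordering idea, the sketch below rewrites an assignment so that a given broker is never listed first, without changing which brokers hold each partition. It assumes the JSON layout accepted by kafka-reassign-partitions.sh via --reassignment-json-file; the topic name, broker ids and the helper `demote_broker` are hypothetical, and the sketch does not answer whether the controller acts on an order-only change.

```python
import json


def demote_broker(assignment, broker_id):
    """Return a copy of the assignment in which broker_id is never first replica.

    Only the replica order changes; the replica set of each partition is kept.
    """
    result = {"version": assignment.get("version", 1), "partitions": []}
    for entry in assignment["partitions"]:
        replicas = list(entry["replicas"])
        if len(replicas) > 1 and replicas[0] == broker_id:
            # Move the broker to the end so another replica becomes preferred.
            replicas = replicas[1:] + [broker_id]
        result["partitions"].append(
            {
                "topic": entry["topic"],
                "partition": entry["partition"],
                "replicas": replicas,
            }
        )
    return result


# Hypothetical current assignment; broker 3 is the node under maintenance.
current = {
    "version": 1,
    "partitions": [
        {"topic": "events", "partition": 0, "replicas": [3, 1, 2]},
        {"topic": "events", "partition": 1, "replicas": [1, 2, 3]},
    ],
}
print(json.dumps(demote_broker(current, 3), indent=2))
```

Keeping the untouched `current` document around gives the file needed to restore the original order after the maintenance.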

Should this work, or would you expect specific problems with this workaround...?

Also: Let me know if this rather belongs onto the mailing list, since 
admittedly it isn't really related to throttling... But as a side-remark in 
this regard, I also tried throttling outside kafka (i.e. on side of the 
network, tried via wondershaper) in our problem case, but that didn't help. I'd 
agree this would need to be within kafka, i.e. to be able to separate 
out-of-sync replica recovery traffic from the rest.



[jira] [Commented] (KAFKA-1464) Add a throttling option to the Kafka replication tool

2016-07-11 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15372039#comment-15372039
 ] 

Jun Rao commented on KAFKA-1464:


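The controller does leader balancing. So, auto.leader.rebalance.enable needs to 
be set on the controller. However, the controller can move between brokers, so 
the setting would have to be applied on every broker that may become controller.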



[jira] [Commented] (KAFKA-1464) Add a throttling option to the Kafka replication tool

2016-07-11 Thread Ralph Weires (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371776#comment-15371776
 ] 

Ralph Weires commented on KAFKA-1464:
-

Thanks a lot for the input - so if I understand this right, the config setting 
James proposed would not work for me if I only set this on a single node (i.e. 
the node under maintenance) before starting it up again, correct? Otherwise, 
that would have been the perfect solution for me. I wouldn't mind running the 
node with the custom setting during recovery, and just restarting it again once 
more in the end without the setting.

If this won't work, what would even happen if this setting is defined 
differently on various nodes in the cluster? Anyhow, alternatively I'd still 
even consider using that option along with a full cluster restart before (and 
disabling with another cluster restart afterwards), since a maintenance 
scenario as described happens every now and then for us, and currently really 
causes us major hassle for many hours, every time.

Jun - I'm also not sure if disabling leader balancing during catch up would 
necessarily be a good idea in general - but having / allowing the possibility 
to configure this some way would be a nice option to have IMO.



[jira] [Commented] (KAFKA-1464) Add a throttling option to the Kafka replication tool

2016-07-11 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371583#comment-15371583
 ] 

Jun Rao commented on KAFKA-1464:


Currently, our leader balancing logic happens automatically on a per partition 
basis. Turning this off requires a restart of all brokers.

I am not sure if we always want to disable leader balancing during catch up 
though. Balancing the leaders as the replicas catch up allows us to balance 
the client traffic to more brokers. Doing this may slow down the catch-up 
traffic a bit. However, this is probably fine if we do the throttling properly.



[jira] [Commented] (KAFKA-1464) Add a throttling option to the Kafka replication tool

2016-07-11 Thread James Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371578#comment-15371578
 ] 

James Cheng commented on KAFKA-1464:


[~r.weires], you might be able to control this a little by setting 
auto.leader.rebalance.enable=false. If you set it to false, then the broker would 
come up but would not assume leadership for any partitions at all, unless 
manually told to. You would then have to use the 
kafka-preferred-replica-election.sh tool 
[https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#Replicationtools-1.PreferredReplicaLeaderElectionTool],
 to allow it to assume leadership.

This would mean that you wouldn't have the problem you described. But the 
downside is that you are now in charge of handling rebalancing on your own.

The auto.leader.rebalance.enable flag is not changeable during runtime, tho. I 
think it is only read at startup time.
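
For the manual step, a minimal sketch of preparing the partition list for a preferred replica election follows. It assumes the JSON layout described for the tool's --path-to-json-file option (a top-level "partitions" array of topic/partition pairs); the topic name and the helper `election_request` are hypothetical.

```python
import json


def election_request(topic_partitions):
    """Build the JSON document listing partitions for a preferred replica election."""
    return {
        "partitions": [
            {"topic": topic, "partition": partition}
            for topic, partition in sorted(topic_partitions)
        ]
    }


# Hypothetical partitions to hand back to their preferred leader once the
# broker has caught up.
doc = election_request([("events", 1), ("events", 0)])
print(json.dumps(doc))
```

Listing only the partitions that have already caught up lets leadership move back in stages rather than all at once.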




[jira] [Commented] (KAFKA-1464) Add a throttling option to the Kafka replication tool

2016-07-11 Thread K Zakee (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371402#comment-15371402
 ] 

K Zakee commented on KAFKA-1464:


I agree with Ralph. 
Let's say we have a high produce rate and a system failure that lasts as long 
as the Kafka retention period itself: there is a lot of data to catch up on, 
and it should be fetched as fast as possible. Throttling the catch-up of 
out-of-sync replicas in this case may become a "chase-your-own-tail" thing: 
the replicas may never catch up with their leader, or may take days, depending 
on the produce rate and the throttle limit. Suppressing new replicas from 
taking leadership until they have fully caught up sounds like a better idea.



[jira] [Commented] (KAFKA-1464) Add a throttling option to the Kafka replication tool

2016-06-13 Thread Ralph Weires (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15327561#comment-15327561
 ] 

Ralph Weires commented on KAFKA-1464:
-

We have similar problems as described by Jason above, in our case usually when 
taking a broker offline due to hardware failure (broken HD, with each broker 
being equipped with 2 HDs / log directories in our case). If the broker gets 
back online with one fresh disk and corresponding missing data (i.e. half of 
the partitions of that broker missing), its network link is saturated for some 
time by inbound traffic to catch up with replication.

While the broker is re-streaming all the missing data, we are additionally 
experiencing problems with consumers as well. After the broker has caught up 
with its missing data, the situation normalizes again quickly.

To me it seems as if the partitions for which the broker already catches up 
soon after restart (esp. the ones from non-broken HD which just had little data 
missing) are causing issues if the broker becomes leader for them, while it is 
otherwise still clogging its incoming link with replication of the remaining 
data.

In this scenario, I would actually prefer to just let the broker catch up with 
any replication it still needs to do, without it becoming leader for any 
partition it has. Isn't there actually a way to achieve this? I.e. just keeping 
a broker online with replication and all, but not having it take over any 
partition leadership (at least so long as there are other candidates available 
for leadership). Being able to toggle that behavior at run-time would be ideal, 
so that we would just explicitly activate it again after the maintenance 
interval, once the node has caught up the bulk of necessary replication. Could 
IMO be an alternative to any throttling approach.

> Add a throttling option to the Kafka replication tool
> -
>
> Key: KAFKA-1464
> URL: https://issues.apache.org/jira/browse/KAFKA-1464
> Project: Kafka
>  Issue Type: New Feature
>  Components: replication
>Affects Versions: 0.8.0
>Reporter: mjuarez
>Assignee: Ismael Juma
>Priority: Minor
>  Labels: replication, replication-tools
> Fix For: 0.10.1.0
>
>
> When performing replication on new nodes of a Kafka cluster, the replication 
> process will use all available resources to replicate as fast as possible.  
> This causes performance issues (mostly disk IO and sometimes network 
> bandwidth) when doing this in a production environment, in which you're 
> trying to serve downstream applications, at the same time you're performing 
> maintenance on the Kafka cluster.
> An option to throttle the replication to a specific rate (in either MB/s or 
> activities/second) would help production systems to better handle maintenance 
> tasks while still serving downstream applications.





[jira] [Commented] (KAFKA-1464) Add a throttling option to the Kafka replication tool

2016-04-14 Thread Jason Ruckman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241774#comment-15241774
 ] 

Jason Ruckman commented on KAFKA-1464:
--

Hello Neha, 

One problem we've run into: we run a system where we sometimes replace 
brokers completely, in an automated fashion, and rebalance leadership and 
replicas across them.  When we bring a new broker online, we move some 
partitions to it.  What we see is something like this:

Consider topics A, B, C with replication factors of 3
Consider brokers 1,2,3 as serving topics A,B,C

A new broker 4 is replacing 1 (maybe the machine died, or whatever)

A and B are relatively small, but C is large

1. Move some leaders and replicas to 4 for A and B from 2 and 3.  Everything is 
good up until now
2. Move some leaders and replicas to 4 for C from 2 and 3. 

At this point, broker 4 is pegged, since it's trying to pull in data from 2 and 
3 (the other two replicas) to catch up, so it causes timeouts for the 
partitions it is the leader for.  Brokers 2 and 3 are OK because 4 can only use 
1/2 of their bandwidth to replicate, so they still have some bandwidth 
available to serve requests.



[jira] [Commented] (KAFKA-1464) Add a throttling option to the Kafka replication tool

2016-02-13 Thread Neha Narkhede (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15146250#comment-15146250
 ] 

Neha Narkhede commented on KAFKA-1464:
--

The most useful resource to throttle for is network bandwidth usage by 
replication, as measured by the rate of total outgoing replication data on 
every leader. Adding the ability on every leader to cap data transferred under 
an upper limit is what we are looking for. This can be a config option similar 
to the one we have for the log cleaner. It seems to me that it is better to 
have the leader send less instead of having the replica fetch less, as the 
leader has a holistic view of the total amount of data being transferred out.
Data transferred from a leader includes:
1. Fetch requests from an in-sync replica
2. Fetch requests from an out-of-sync replica of a partition being reassigned
3. Fetch requests from an out-of-sync replica of a partition not being reassigned
Data transferred across 1+2+3 should stay roughly within the configured upper 
limit. If the limit is crossed, we want to start throttling requests, all 
except the ones that fall under #1. The leader can divide the remaining 
available bandwidth amongst partitions that fall under #2 and #3, allowing 
more bandwidth to #3, since presumably it is fine to let partitions being 
reassigned catch up slower than the rest. Throttling could involve returning 
fewer bytes, as determined by this computation, for each such partition, as Jay 
suggests.
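
The budget computation described above could be sketched roughly as follows. This is an illustration under stated assumptions, not Kafka code: the function name `split_budget`, the 2:1 weighting in favour of #3, and the use of a single rate unit (for example bytes per second) are all hypothetical choices.

```python
def split_budget(limit, in_sync, reassigning_demand, recovering_demand,
                 recovering_weight=2.0):
    """Split what is left of the leader's outgoing cap between catch-up classes.

    in_sync traffic (#1) is never throttled. Partitions being reassigned (#2)
    get weight 1; other out-of-sync partitions (#3) get recovering_weight.
    Returns the allowed rates for (#2, #3).
    """
    remaining = max(0.0, limit - in_sync)
    if reassigning_demand + recovering_demand <= remaining:
        # Under the cap: nothing needs to be throttled.
        return reassigning_demand, recovering_demand
    total_weight = 1.0 + recovering_weight
    recovering = min(recovering_demand,
                     remaining * recovering_weight / total_weight)
    reassigning = min(reassigning_demand, remaining - recovering)
    # Give any share that #2 cannot use back to #3.
    recovering = min(recovering_demand, remaining - reassigning)
    return reassigning, recovering


# Cap of 100 units, 40 already used by in-sync followers, 50 + 50 demanded.
print(split_budget(100.0, 40.0, 50.0, 50.0))
```

A weighted split keeps reassignment traffic from starving completely while still favouring replicas that are recovering from a failure.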

> Add a throttling option to the Kafka replication tool
> -
>
> Key: KAFKA-1464
> URL: https://issues.apache.org/jira/browse/KAFKA-1464
> Project: Kafka
>  Issue Type: New Feature
>  Components: replication
>Affects Versions: 0.8.0
>Reporter: mjuarez
>Assignee: Ismael Juma
>Priority: Minor
>  Labels: replication, replication-tools
> Fix For: 0.9.1.0
>
>
> When performing replication on new nodes of a Kafka cluster, the replication 
> process will use all available resources to replicate as fast as possible.  
> This causes performance issues (mostly disk IO and sometimes network 
> bandwidth) when doing this in a production environment, in which you're 
> trying to serve downstream applications, at the same time you're performing 
> maintenance on the Kafka cluster.
> An option to throttle the replication to a specific rate (in either MB/s or 
> activities/second) would help production systems to better handle maintenance 
> tasks while still serving downstream applications.





[jira] [Commented] (KAFKA-1464) Add a throttling option to the Kafka replication tool

2016-01-29 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15123618#comment-15123618
 ] 

Ismael Juma commented on KAFKA-1464:


Thanks for your input [~jkreps].

With regards to the issue where a replica may never catch up, it is a good 
point that came up previously. One option may be to disable throttling (or 
increase the catch-up rate) in the case where the replica is falling further 
behind.
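
One way to picture the "increase the catch-up rate" option is the small sketch below. It is only an illustration: the function name, the multiplicative step of 1.5 and the idea of comparing two consecutive lag samples are assumptions, not something proposed in this thread in that form.

```python
def next_catch_up_rate(current_rate, previous_lag, current_lag, max_rate,
                       step=1.5):
    """Raise the catch-up rate when the lag grew since the last check.

    The rate is capped at max_rate; if the lag is shrinking or stable,
    the current rate is kept.
    """
    if current_lag > previous_lag:
        return min(max_rate, current_rate * step)
    return current_rate


# Lag grew from 100 to 150 messages, so the allowed rate is raised.
print(next_catch_up_rate(10.0, 100, 150, max_rate=100.0))
```

Setting `max_rate` to the unthrottled rate makes the "disable throttling" case the upper end of the same rule.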

One important question is whether users have enough information to be able to 
configure an appropriate throttling/catch-up rate that takes into account both 
disk IO and network bandwidth while keeping resource utilisation at an 
appropriate level. Thoughts? (the log cleaner has a similar config: 
`log.cleaner.io.max.bytes.per.second`, although it seems simpler to figure out).



[jira] [Commented] (KAFKA-1464) Add a throttling option to the Kafka replication tool

2016-01-29 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15123956#comment-15123956
 ] 

Jiangjie Qin commented on KAFKA-1464:
-

It looks like our purpose is to minimize user impact while replicas are 
catching up. From the broker's point of view, as long as client request latency 
is acceptable we should fully utilize the bandwidth we have to let replicas 
keep up. We should be able to measure the user experience by checking the 
queuing time of requests from and responses to clients.

If that is the case, maybe we can let the user set an SLA for latency. We will 
not throttle replication as long as the user ProduceRequest / FetchRequest 
queuing time stays within that SLA. Otherwise, we will throttle the fetching 
from out-of-sync replicas (we probably don't want to throttle in-sync replicas).
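
The decision rule suggested here can be written down in a few lines. This is a sketch only: the function name and the choice of milliseconds are hypothetical, and how the broker would measure client queuing time is left open.

```python
def should_throttle(replica_in_sync, client_queue_time_ms, sla_ms):
    """Throttle only out-of-sync followers, and only while clients miss the SLA."""
    if replica_in_sync:
        # In-sync replicas affect commit latency, so they are never throttled.
        return False
    return client_queue_time_ms > sla_ms


# A lagging follower is throttled once client queuing time exceeds a 50 ms SLA.
print(should_throttle(False, client_queue_time_ms=80, sla_ms=50))
```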



[jira] [Commented] (KAFKA-1464) Add a throttling option to the Kafka replication tool

2016-01-28 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15121706#comment-15121706
 ] 

Jay Kreps commented on KAFKA-1464:
--

Another issue this raises is that a partition might have a natural rate of new 
data coming in that is higher than the catch-up rate in which case if it ever 
falls out of sync it can never catch up. This is possible today to some extent 
but not a common problem since the followers are, if anything, a bit faster 
than the leader and have no throttle.
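
The arithmetic behind this concern is simple: a throttled follower closes its gap at the throttle rate minus the incoming produce rate. The sketch below works that out; the function name and the example figures are made up for illustration.

```python
def catch_up_seconds(backlog_bytes, throttle_rate, produce_rate):
    """Seconds until a lagging follower is back in sync, or None if never.

    Rates are in bytes per second. The gap shrinks at
    throttle_rate - produce_rate, so it only closes when that is positive.
    """
    closing_rate = throttle_rate - produce_rate
    if closing_rate <= 0:
        return None
    return backlog_bytes / closing_rate


# 600 GB behind, throttled to 50 MB/s while 30 MB/s of new data arrives.
print(catch_up_seconds(600e9, 50e6, 30e6))
# Same backlog with the throttle below the produce rate: never catches up.
print(catch_up_seconds(600e9, 30e6, 50e6))
```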



[jira] [Commented] (KAFKA-1464) Add a throttling option to the Kafka replication tool

2016-01-28 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15121691#comment-15121691
 ] 

Jay Kreps commented on KAFKA-1464:
--

I agree that the key difference is in-sync vs out-of-sync replicas. In-sync 
replicas add to the commit time so they are really the highest priority and 
generally shouldn't add much load anyway. Out-of-sync replicas are the catch up 
case that add load.

Blindly reducing the fetch size for out-of-sync partitions probably would make 
things worse though. Large fetch size is actually good for efficiency and 
shrinking it will add overhead (more physical I/O, more FS reads, more requests 
overall, etc).

However it should be possible to throttle dynamically at the partition level 
for out of sync partitions. This could be done by dynamically omitting 
partitions that have exceeded their throttle rate from either the fetch request 
that the follower sends or from the fetch response the leader constructs. For 
example when handling follower fetch requests the leader could check the 
observed fetch rate for that follower and whether it is in sync or not; if the 
rate exceeds the configured maximum for catch-up traffic the leader would 
ignore that partition and only answer for other partitions (if there are no 
other partitions the purgatory time would need to be calculated to be no 
greater than the time in which the fetch rate might come down below the 
throttle). This would allow for dynamically throttling down the catch up 
traffic without reducing efficiency.
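
A minimal sketch of the leader-side variant is shown below: partitions over their catch-up quota are left out of the response, while the fetch size for the remaining partitions is untouched. The class name, the fixed accounting window and the partition labels are assumptions made for illustration; the purgatory timing mentioned above is not modelled.

```python
class CatchUpThrottle:
    """Tracks bytes sent per out-of-sync partition within a fixed time window."""

    def __init__(self, max_bytes_per_window):
        self.max_bytes_per_window = max_bytes_per_window
        self.sent = {}  # partition -> bytes sent in the current window

    def record(self, partition, num_bytes):
        """Account for bytes returned to a follower for this partition."""
        self.sent[partition] = self.sent.get(partition, 0) + num_bytes

    def reset_window(self):
        """Start a new accounting window (to be called once per interval)."""
        self.sent.clear()

    def partitions_to_answer(self, requested, in_sync):
        """Keep in-sync partitions and out-of-sync ones still under quota."""
        return [
            p for p in requested
            if p in in_sync or self.sent.get(p, 0) < self.max_bytes_per_window
        ]


throttle = CatchUpThrottle(max_bytes_per_window=1000)
throttle.record("C-0", 1500)  # large partition, already over its quota
throttle.record("A-0", 200)
answered = throttle.partitions_to_answer(["A-0", "B-0", "C-0"], in_sync={"B-0"})
print(answered)
```

Omitting whole partitions rather than shrinking the fetch size keeps each answered partition's read large, which is the efficiency argument made above.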



[jira] [Commented] (KAFKA-1464) Add a throttling option to the Kafka replication tool

2016-01-18 Thread Eno Thereska (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15105639#comment-15105639
 ] 

Eno Thereska commented on KAFKA-1464:
-

An alternative to throttling background maintenance traffic is to use 
priority levels (just two: foreground and background). This has the advantage 
of being fairly simple and allows for important replication work to proceed 
fast if there is little or no foreground traffic. If most of the contention 
happens at the disk (as [~mjuarez] seems to indicate) then priorities 
implemented as two queues at the receiving end could be sufficient. However, if 
the network is a problem as well, then throttling would probably work best 
since it limits background traffic at the source.
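
The two-queue idea might look like the following sketch at the receiving end. The class and request names are hypothetical, and strict priority is assumed, which means background work only runs when no foreground request is waiting.

```python
from collections import deque


class TwoLevelQueue:
    """Serves foreground requests first; background work uses the idle time."""

    def __init__(self):
        self.foreground = deque()
        self.background = deque()

    def put(self, request, background=False):
        (self.background if background else self.foreground).append(request)

    def get(self):
        """Return the next request to serve, or None if both queues are empty."""
        if self.foreground:
            return self.foreground.popleft()
        if self.background:
            return self.background.popleft()
        return None


q = TwoLevelQueue()
q.put("replica-fetch-1", background=True)
q.put("produce-1")
q.put("replica-fetch-2", background=True)
q.put("consume-1")
order = [q.get() for _ in range(5)]
print(order)
```

With little or no foreground traffic the background queue drains at full speed, which is the advantage over a fixed throttle noted above.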

> Add a throttling option to the Kafka replication tool
> -
>
> Key: KAFKA-1464
> URL: https://issues.apache.org/jira/browse/KAFKA-1464
> Project: Kafka
>  Issue Type: New Feature
>  Components: replication
>Affects Versions: 0.8.0
>Reporter: mjuarez
>Assignee: Ismael Juma
>Priority: Minor
>  Labels: replication, replication-tools
> Fix For: 0.9.1.0
>
>
> When performing replication on new nodes of a Kafka cluster, the replication 
> process will use all available resources to replicate as fast as possible.  
> This causes performance issues (mostly disk IO and sometimes network 
> bandwidth) when doing this in a production environment, in which you're 
> trying to serve downstream applications, at the same time you're performing 
> maintenance on the Kafka cluster.
> An option to throttle the replication to a specific rate (in either MB/s or 
> activities/second) would help production systems to better handle maintenance 
> tasks while still serving downstream applications.





[jira] [Commented] (KAFKA-1464) Add a throttling option to the Kafka replication tool

2015-08-04 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654015#comment-14654015
 ] 

Ismael Juma commented on KAFKA-1464:


I'd like to take a look at this. In a separate conversation, [~junrao] 
suggested that the throttling should perhaps only happen for out-of-sync 
replicas.

> Add a throttling option to the Kafka replication tool
> -
>
> Key: KAFKA-1464
> URL: https://issues.apache.org/jira/browse/KAFKA-1464
> Project: Kafka
>  Issue Type: New Feature
>  Components: replication
>Affects Versions: 0.8.0
>Reporter: mjuarez
>Assignee: Ismael Juma
>Priority: Minor
>  Labels: replication, replication-tools
>
> When performing replication on new nodes of a Kafka cluster, the replication 
> process will use all available resources to replicate as fast as possible.  
> This causes performance issues (mostly disk IO and sometimes network 
> bandwidth) when doing this in a production environment, in which you're 
> trying to serve downstream applications, at the same time you're performing 
> maintenance on the Kafka cluster.
> An option to throttle the replication to a specific rate (in either MB/s or 
> activities/second) would help production systems to better handle maintenance 
> tasks while still serving downstream applications.





[jira] [Commented] (KAFKA-1464) Add a throttling option to the Kafka replication tool

2015-08-04 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654052#comment-14654052
 ] 

Jun Rao commented on KAFKA-1464:


Another thing we need to be a bit careful about is that throttling typically 
just slows down a request. However, in our case, a single replica fetch request 
may cover multiple partitions, and we don't want to slow down the in-sync 
replicas. Perhaps we should always respond asap but just give back less data 
for out-of-sync replicas.
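
A rough sketch of that "respond right away, but return less data" idea, in 
Python for brevity. Everything here is hypothetical (the PartitionFetch type, 
the plan_fetch_response function and the per-response byte budget are invented 
for illustration) and it does not reflect how the broker actually builds fetch 
responses.

```python
from dataclasses import dataclass


@dataclass
class PartitionFetch:
    name: str             # hypothetical partition label, e.g. "topic-0"
    available_bytes: int  # bytes the leader could return right now
    in_sync: bool         # whether the fetching replica is in sync


def plan_fetch_response(partitions, out_of_sync_budget):
    """Decide how many bytes to return per partition in one response.

    In-sync partitions get everything that is available. Out-of-sync
    partitions share a byte budget, so the response is never delayed;
    it just carries less data for the replicas that are catching up.
    """
    remaining = out_of_sync_budget
    plan = {}
    for p in partitions:
        if p.in_sync:
            plan[p.name] = p.available_bytes
        else:
            granted = min(p.available_bytes, remaining)
            plan[p.name] = granted
            remaining -= granted
    return plan
```

For example, with a 6000-byte budget, one in-sync partition holding 1000 bytes 
and two out-of-sync partitions holding 5000 bytes each, the in-sync partition 
gets its full 1000 bytes while the other two get 5000 and 1000 bytes.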

> Add a throttling option to the Kafka replication tool
> -
>
> Key: KAFKA-1464
> URL: https://issues.apache.org/jira/browse/KAFKA-1464
> Project: Kafka
>  Issue Type: New Feature
>  Components: replication
>Affects Versions: 0.8.0
>Reporter: mjuarez
>Assignee: Ismael Juma
>Priority: Minor
>  Labels: replication, replication-tools
>
> When performing replication on new nodes of a Kafka cluster, the replication 
> process will use all available resources to replicate as fast as possible.  
> This causes performance issues (mostly disk IO and sometimes network 
> bandwidth) when doing this in a production environment, in which you're 
> trying to serve downstream applications, at the same time you're performing 
> maintenance on the Kafka cluster.
> An option to throttle the replication to a specific rate (in either MB/s or 
> activities/second) would help production systems to better handle maintenance 
> tasks while still serving downstream applications.





[jira] [Commented] (KAFKA-1464) Add a throttling option to the Kafka replication tool

2014-05-21 Thread Jon Bringhurst (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005052#comment-14005052
 ] 

Jon Bringhurst commented on KAFKA-1464:
---

Although this would be nice to have, something similar already exists as part 
of Linux. You can accomplish the same type of thing by first adding the process 
to a net_cls cgroup. Then you can use the tc command to classify the marked 
packets into an htb qdisc (possibly with an stb further down the tree to 
completely prevent starvation) to throttle the packets coming from Kafka.

* https://www.kernel.org/doc/Documentation/cgroups/net_cls.txt
* http://www.tldp.org/
* http://linux.die.net/man/8/tc

The blkio cgroup works in a similar way to throttle disk io.
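
For the net_cls part, the kernel document linked above says the cgroup's 
net_cls.classid file takes a hexadecimal value of the form 0xAAAABBBB, where 
AAAA is the major and BBBB the minor number of the tc handle (its example is 
0x100001 for handle 10:1). A small Python helper to compute that value might 
look like the following; the function name classid_for_handle is just an 
illustration, not part of any tool.

```python
def classid_for_handle(handle):
    """Convert a tc handle such as "10:1" into a net_cls.classid value.

    tc handles are written in hexadecimal as major:minor. The classid
    packs them as 0xAAAABBBB (major in the high 16 bits, minor in the
    low 16 bits), following the kernel's net_cls documentation.
    """
    major_text, minor_text = handle.split(":")
    major = int(major_text, 16)
    minor = int(minor_text, 16)
    if not (0 <= major <= 0xFFFF and 0 <= minor <= 0xFFFF):
        raise ValueError("major and minor must each fit in 16 bits")
    return hex((major << 16) | minor)
```

The returned string is what would be written into net_cls.classid for the 
cgroup holding the Kafka process, so that a tc cgroup filter can match its 
packets.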

> Add a throttling option to the Kafka replication tool
> -
>
> Key: KAFKA-1464
> URL: https://issues.apache.org/jira/browse/KAFKA-1464
> Project: Kafka
>  Issue Type: New Feature
>  Components: replication
>Affects Versions: 0.8.0
>Reporter: Marcos Juarez
>Assignee: Neha Narkhede
>Priority: Minor
>  Labels: replication, replication-tools
>
> When performing replication on new nodes of a Kafka cluster, the replication 
> process will use all available resources to replicate as fast as possible.  
> This causes performance issues (mostly disk IO and sometimes network 
> bandwidth) when doing this in a production environment, in which you're 
> trying to serve downstream applications, at the same time you're performing 
> maintenance on the Kafka cluster.
> An option to throttle the replication to a specific rate (in either MB/s or 
> activities/second) would help production systems to better handle maintenance 
> tasks while still serving downstream applications.



--
This message was sent by Atlassian JIRA
(v6.2#6252)