Re: How to prevent custom Partitioner from increasing the number of producer's requests?

2015-06-02 Thread Sebastien Falquier
Hi Jason,

The default partitioner does not do the job since my producers don't have
smooth traffic. What I mean is that they can deliver lots of messages
during 10 minutes and far fewer during the next 10 minutes, which is to say
the first partition will have accumulated most of the messages of the last
20 minutes.

By the way, I don't understand your point about breaking a batch into 2
separate partitions. With that code, I jump to a new partition on messages
201, 401, 601, ... with batch size = 200, so where is my mistake?
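
For reference, the logic in the gist is essentially this (a minimal sketch, not
the exact code, assuming the old kafka.producer.Partitioner API; the counter is
per producer instance):

    import kafka.producer.Partitioner;
    import kafka.utils.VerifiableProperties;

    // Sketch: switch to the next partition once every BATCH_SIZE messages, so
    // messages 1..200 map to partition p, 201..400 to p+1, and so on.
    public class BatchRoundRobinPartitioner implements Partitioner {
        private static final int BATCH_SIZE = 200; // assumed equal to batch.num.messages
        private long counter = 0;

        // The old producer instantiates partitioners reflectively with this constructor.
        public BatchRoundRobinPartitioner(VerifiableProperties props) {
        }

        @Override
        public int partition(Object key, int numPartitions) {
            // Integer division keeps the result constant for BATCH_SIZE consecutive calls.
            int p = (int) ((counter / BATCH_SIZE) % numPartitions);
            counter++;
            return p;
        }
    }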

Thanks for your help,
Sébastien

2015-06-02 16:55 GMT+02:00 Jason Rosenberg :

> Hi Sebastien,
>
> You might just try using the default partitioner (which is random).  It
> works by choosing a random partition each time it re-polls the meta-data
> for the topic.  By default, this happens every 10 minutes for each topic
> you produce to (so it evenly distributes load at a granularity of 10
> minutes).  This is based on 'topic.metadata.refresh.interval.ms'.
>
> I suspect your code is causing double requests for each batch, if your
> partitioning is actually breaking up your batches into 2 separate
> partitions.  Could be an off by 1 error, with your modulo calculation?
> Perhaps you need to use '% 0' instead of '% 1' there?
>
> Jason
>
>
>
> On Tue, Jun 2, 2015 at 3:35 AM, Sebastien Falquier <
> sebastien.falqu...@teads.tv> wrote:
>
> > Hi guys,
> >
> > I am new to Kafka and I am facing a problem I am not able to sort out.
> >
> > To smooth traffic over all my brokers' partitions, I have coded a custom
> > Paritioner for my producers, using a simple round robin algorithm that
> > jumps from a partition to another on every batch of messages
> (corresponding
> > to batch.num.messages value). It looks like that :
> > https://gist.github.com/sfalquier/4c0c7f36dd96d642b416
> >
> > With that fix, every partitions are used equally, but the amount of
> > requests from the producers to the brokers have been multiplied by 2. I
> do
> > not understand since all producers are async with batch.num.messages=200
> > and the amount of messages processed is still the same as before. Why do
> > producers need more requests to do the job? As internal traffic is a bit
> > critical on our platform, I would really like to reduce producers'
> requests
> > volume if possible.
> >
> > Any idea? Any suggestion?
> >
> > Regards,
> > Sébastien
> >
>


Re: KafkaMetricsConfig not documented

2015-06-02 Thread Stevo Slavić
Created https://issues.apache.org/jira/browse/KAFKA-2244
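
For anyone finding this via the archives, these settings go in the broker's
server.properties and look roughly like this (a sketch; the reporter class
below is just a placeholder for whatever KafkaMetricsReporter implementation
you plug in):

    # comma-separated list of kafka.metrics.KafkaMetricsReporter implementations
    kafka.metrics.reporters=com.example.MyMetricsReporter
    # how often the reporters are polled, in seconds
    kafka.metrics.polling.interval.secs=60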

On Mon, Jun 1, 2015 at 7:18 AM, Aditya Auradkar <
aaurad...@linkedin.com.invalid> wrote:

> Yeah, they aren't included in KafkaConfig for some reason but I think they
> should. Can you file a jira?
>
> Aditya
>
> 
> From: Stevo Slavić [ssla...@gmail.com]
> Sent: Sunday, May 31, 2015 3:57 PM
> To: users@kafka.apache.org
> Subject: KafkaMetricsConfig not documented
>
> Hello Apache Kafka community,
>
> In current (v0.8.2.1) documentation at
> http://kafka.apache.org/documentation.html#configuration I cannot find
> anything about two configuration properties used in
> kafka.metrics.KafkaMetricsConfig, namely "kafka.metrics.reporters" and
> "kafka.metrics.polling.interval.secs". I'd expect them in broker
> configuration section. Am I missing something? Or are those two
> intentionally left out of configuration documentation?
>
> Kind regards,
> Stevo Slavic.
>


Re: potential bug with offset request and just rolled log segment

2015-06-02 Thread Guozhang Wang
Alfred,

As for 0.8.3, we are shooting for end of July:

https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan

Guozhang

On Tue, Jun 2, 2015 at 8:43 AM, Alfred Landrum  wrote:

> I filed KAFKA-2236:
> https://issues.apache.org/jira/browse/KAFKA-2236
>
> Is there any guidance on when 0.8.3 might be released?
>



-- 
-- Guozhang


Re: HDD or SSD or EBS for kafka brokers in Amazon EC2

2015-06-02 Thread Theo Hultberg
Henry: We run Kafka on the old and trusty m1.xlarge. We avoid EBS
completely: it's network storage that pretends to be local, and when the
network (which is AWS' weak spot) acts up, EBS is a big liability. It's also
slow and expensive.

Others: Thanks for sharing your experience with the d2's. We have been
considering them for Kafka, but now it sounds like we should hold off on that
until they're fixed.

T#

On Wed, Jun 3, 2015 at 1:26 AM, Henry Cai 
wrote:

> Steven,
>
> Do you have the AWS case # (or the Ubuntu bug/case #) when you hit that
> kernel panic issue?
>
> Our company will still be running on AMI image 12.04 for a while, I will
> see whether the fix was also ported onto Ubuntu 12.04
>
> On Tue, Jun 2, 2015 at 2:53 PM, Steven Wu  wrote:
>
> > now I remember we had same kernel panic issue in the first week of D2
> > rolling-out. then AWS fixed it and we haven't seen any issue since. try
> > Ubuntu 14.04 and see if it resolves your remaining kernel/instability
> issue.
> >
> > On Tue, Jun 2, 2015 at 2:30 PM, Wes Chow  wrote:
> >
> >>
> >>   Daniel Nelson 
> >>  June 2, 2015 at 4:39 PM
> >>
> >> On Jun 2, 2015, at 1:22 PM, Steven Wu  <
> stevenz...@gmail.com> wrote:
> >>
> >> can you elaborate what kind of instability you have encountered?
> >>
> >> We have seen the nodes become completely non-responsive. Usually they
> get rebooted automatically after 10-20 minutes, but occasionally they get
> stuck for days in a state where they cannot be rebooted via the Amazon APIs.
> >>
> >>
> >> Same here. It was worse right after d2 launch. We had 6 out of 9 servers
> >> die within 10 hours after spinning them up. Amazon rolled out a fix, but
> >> we're still seeing similar issues, though not nearly as bad. The first
> fix
> >> was for something network related, and apparently sending lots of data
> >> through the instances caused a kernel panic on the host. We have no
> >> information yet about the current issue.
> >>
> >> Wes
> >>
> >>   Steven Wu 
> >>  June 2, 2015 at 4:22 PM
> >> Wes/Daniel,
> >>
> >> can you elaborate what kind of instability you have encountered?
> >>
> >> we are on Ubuntu 14.04.2 and haven't encountered any issues so far. in
> >> the announcement, they did mention using Ubuntu 14.04 for better disk
> >> throughput. not sure whether 14.04 also addresses any instability issue
> you
> >> encountered or not.
> >>
> >> Thanks,
> >> Steven
> >>
> >> In order to ensure the best disk throughput performance from your D2
> instances
> >> on Linux, we recommend that you use the most recent version of the
> Amazon
> >> Linux AMI, or another Linux AMI with a kernel version of 3.8 or later.
> The
> >> D2 instances provide the best disk performance when you use a Linux
> >> kernel that supports Persistent Grants – an extension to the Xen block
> ring
> >> protocol that significantly improves disk throughput and scalability.
> The
> >> following Linux AMIs support this feature:
> >>
> >>- Amazon Linux AMI 2015.03 (HVM)
> >>- Ubuntu Server 14.04 LTS (HVM)
> >>- Red Hat Enterprise Linux 7.1 (HVM)
> >>- SUSE Linux Enterprise Server 12 (HVM)
> >>
> >>
> >>
> >>
> >>   Daniel Nelson 
> >>  June 2, 2015 at 2:42 PM
> >>
> >> Do you have any workarounds for the d2 issues? We’ve been using them for
> >> our Kafkas too, and ran into the instability. We’re on Ubuntu 12.04 and
> >> plan to try on 14.04 with the latest HWE to see if that helps any.
> >>
> >> Thanks!
> >>   Wes Chow 
> >>  June 2, 2015 at 1:39 PM
> >>
> >> We have run d2 instances with Kafka. They're currently unstable --
> Amazon
> >> confirmed a host issue with d2 instances that gets tickled by a Kafka
> >> workload yesterday. Otherwise, it seems the d2 instance type is ideal
> as it
> >> gets an enormous amount of disk throughput and you'll likely be network
> >> bottlenecked.
> >>
> >> Wes
> >>
> >>
> >>   Steven Wu 
> >>  June 2, 2015 at 1:07 PM
> >> EBS (network attached storage) has got a lot better over the last a few
> >> years. we don't quite trust it for kafka workload.
> >>
> >> At Netflix, we were going with the new d2 instance type (HDD). our
> >> perf/load testing shows it satisfy our workload. SSD is better in
> latency
> >> curve but pretty comparable in terms of throughput. we can use the extra
> >> space from HDD for longer retention period.
> >>
> >> On Tue, Jun 2, 2015 at 9:37 AM, Henry Cai 
> >> 
> >>
> >>
> >
>


Re: Using SimpleConsumer to get messages from offset until now

2015-06-02 Thread luo.fucong
I think the SimpleConsumer Example
(https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example)
in the wiki is a very good starting point.

You can pass in the offset to the FetchRequest. And you can ask for the latest 
offset using kafka.api.OffsetRequest.LatestTime()
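
Roughly, the loop would look like this (a sketch based on that wiki example;
broker host, topic and partition are placeholders, and leader lookup / error
handling are omitted):

    import java.util.Collections;
    import java.util.Map;

    import kafka.api.FetchRequest;
    import kafka.api.FetchRequestBuilder;
    import kafka.api.PartitionOffsetRequestInfo;
    import kafka.common.TopicAndPartition;
    import kafka.javaapi.FetchResponse;
    import kafka.javaapi.OffsetResponse;
    import kafka.javaapi.consumer.SimpleConsumer;
    import kafka.message.MessageAndOffset;

    public class ReadUntilNow {
        public static void main(String[] args) {
            String topic = "my-topic";              // placeholder
            int partition = 0;
            long offset = Long.parseLong(args[0]);  // offset to start reading from
            String clientId = "read-until-now";

            SimpleConsumer consumer =
                new SimpleConsumer("broker-host", 9092, 100000, 64 * 1024, clientId);

            // Ask the broker for the current latest offset, i.e. "now".
            TopicAndPartition tp = new TopicAndPartition(topic, partition);
            Map<TopicAndPartition, PartitionOffsetRequestInfo> info = Collections.singletonMap(
                tp, new PartitionOffsetRequestInfo(kafka.api.OffsetRequest.LatestTime(), 1));
            OffsetResponse offsets = consumer.getOffsetsBefore(new kafka.javaapi.OffsetRequest(
                info, kafka.api.OffsetRequest.CurrentVersion(), clientId));
            long latest = offsets.offsets(topic, partition)[0];

            // Fetch until we have caught up to that offset, then stop.
            while (offset < latest) {
                FetchRequest req = new FetchRequestBuilder()
                    .clientId(clientId)
                    .addFetch(topic, partition, offset, 100000)
                    .build();
                FetchResponse resp = consumer.fetch(req);
                for (MessageAndOffset mo : resp.messageSet(topic, partition)) {
                    // process mo.message() here
                    offset = mo.nextOffset();
                }
            }
            consumer.close();
        }
    }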

> On Jun 3, 2015, at 12:25 AM, Kevin Sjöberg  wrote:
> 
> Hello,
> 
> I'm trying to create a custom consumer that given a offset returns all
> messages until now. After this is done, the consumer is not needed anymore,
> hence, the consumer does not have to continue consuming messages that are
> being produced.
> 
> The Kafka cluster exists of one broker and we only use one partition as
> well. My understanding is that I can use the SimpleConsumer API for this,
> but I'm a bit unsure on how to go about it.
> 
> Would anyone mind helping me out or point me in the right direction?
> 
> Cheers,
> Kevin
> 
> -- 
> Kevin Sjöberg
> 
> -- 
> You create your invoice, we send it and make sure the customer pays. We
> handle everything, all the way until the invoice is paid!
> Read about Nox Finans here
> 
> .



Re: HDD or SSD or EBS for kafka brokers in Amazon EC2

2015-06-02 Thread Henry Cai
Steven,

Do you have the AWS case # (or the Ubuntu bug/case #) when you hit that
kernel panic issue?

Our company will still be running on the Ubuntu 12.04 AMI for a while; I will
see whether the fix was also ported to Ubuntu 12.04.

On Tue, Jun 2, 2015 at 2:53 PM, Steven Wu  wrote:

> now I remember we had same kernel panic issue in the first week of D2
> rolling-out. then AWS fixed it and we haven't seen any issue since. try
> Ubuntu 14.04 and see if it resolves your remaining kernel/instability issue.
>
> On Tue, Jun 2, 2015 at 2:30 PM, Wes Chow  wrote:
>
>>
>>   Daniel Nelson 
>>  June 2, 2015 at 4:39 PM
>>
>> On Jun 2, 2015, at 1:22 PM, Steven Wu  
>>  wrote:
>>
>> can you elaborate what kind of instability you have encountered?
>>
>> We have seen the nodes become completely non-responsive. Usually they get 
>> rebooted automatically after 10-20 minutes, but occasionally they get stuck 
>> for days in a state where they cannot be rebooted via the Amazon APIs.
>>
>>
>> Same here. It was worse right after d2 launch. We had 6 out of 9 servers
>> die within 10 hours after spinning them up. Amazon rolled out a fix, but
>> we're still seeing similar issues, though not nearly as bad. The first fix
>> was for something network related, and apparently sending lots of data
>> through the instances caused a kernel panic on the host. We have no
>> information yet about the current issue.
>>
>> Wes
>>
>>   Steven Wu 
>>  June 2, 2015 at 4:22 PM
>> Wes/Daniel,
>>
>> can you elaborate what kind of instability you have encountered?
>>
>> we are on Ubuntu 14.04.2 and haven't encountered any issues so far. in
>> the announcement, they did mention using Ubuntu 14.04 for better disk
>> throughput. not sure whether 14.04 also addresses any instability issue you
>> encountered or not.
>>
>> Thanks,
>> Steven
>>
>> In order to ensure the best disk throughput performance from your D2 
>> instances
>> on Linux, we recommend that you use the most recent version of the Amazon
>> Linux AMI, or another Linux AMI with a kernel version of 3.8 or later. The
>> D2 instances provide the best disk performance when you use a Linux
>> kernel that supports Persistent Grants – an extension to the Xen block ring
>> protocol that significantly improves disk throughput and scalability. The
>> following Linux AMIs support this feature:
>>
>>- Amazon Linux AMI 2015.03 (HVM)
>>- Ubuntu Server 14.04 LTS (HVM)
>>- Red Hat Enterprise Linux 7.1 (HVM)
>>- SUSE Linux Enterprise Server 12 (HVM)
>>
>>
>>
>>
>>   Daniel Nelson 
>>  June 2, 2015 at 2:42 PM
>>
>> Do you have any workarounds for the d2 issues? We’ve been using them for
>> our Kafkas too, and ran into the instability. We’re on Ubuntu 12.04 and
>> plan to try on 14.04 with the latest HWE to see if that helps any.
>>
>> Thanks!
>>   Wes Chow 
>>  June 2, 2015 at 1:39 PM
>>
>> We have run d2 instances with Kafka. They're currently unstable -- Amazon
>> confirmed a host issue with d2 instances that gets tickled by a Kafka
>> workload yesterday. Otherwise, it seems the d2 instance type is ideal as it
>> gets an enormous amount of disk throughput and you'll likely be network
>> bottlenecked.
>>
>> Wes
>>
>>
>>   Steven Wu 
>>  June 2, 2015 at 1:07 PM
>> EBS (network attached storage) has got a lot better over the last a few
>> years. we don't quite trust it for kafka workload.
>>
>> At Netflix, we were going with the new d2 instance type (HDD). our
>> perf/load testing shows it satisfy our workload. SSD is better in latency
>> curve but pretty comparable in terms of throughput. we can use the extra
>> space from HDD for longer retention period.
>>
>> On Tue, Jun 2, 2015 at 9:37 AM, Henry Cai 
>> 
>>
>>
>


Re: HDD or SSD or EBS for kafka brokers in Amazon EC2

2015-06-02 Thread Steven Wu
Now I remember: we had the same kernel panic issue in the first week of the D2
roll-out. Then AWS fixed it and we haven't seen any issue since. Try
Ubuntu 14.04 and see if it resolves your remaining kernel/instability issue.

On Tue, Jun 2, 2015 at 2:30 PM, Wes Chow  wrote:

>
>   Daniel Nelson 
>  June 2, 2015 at 4:39 PM
>
> On Jun 2, 2015, at 1:22 PM, Steven Wu  
>  wrote:
>
> can you elaborate what kind of instability you have encountered?
>
> We have seen the nodes become completely non-responsive. Usually they get 
> rebooted automatically after 10-20 minutes, but occasionally they get stuck 
> for days in a state where they cannot be rebooted via the Amazon APIs.
>
>
> Same here. It was worse right after d2 launch. We had 6 out of 9 servers
> die within 10 hours after spinning them up. Amazon rolled out a fix, but
> we're still seeing similar issues, though not nearly as bad. The first fix
> was for something network related, and apparently sending lots of data
> through the instances caused a kernel panic on the host. We have no
> information yet about the current issue.
>
> Wes
>
>   Steven Wu 
>  June 2, 2015 at 4:22 PM
> Wes/Daniel,
>
> can you elaborate what kind of instability you have encountered?
>
> we are on Ubuntu 14.04.2 and haven't encountered any issues so far. in the
> announcement, they did mention using Ubuntu 14.04 for better disk
> throughput. not sure whether 14.04 also addresses any instability issue you
> encountered or not.
>
> Thanks,
> Steven
>
> In order to ensure the best disk throughput performance from your D2 instances
> on Linux, we recommend that you use the most recent version of the Amazon
> Linux AMI, or another Linux AMI with a kernel version of 3.8 or later. The
> D2 instances provide the best disk performance when you use a Linux
> kernel that supports Persistent Grants – an extension to the Xen block ring
> protocol that significantly improves disk throughput and scalability. The
> following Linux AMIs support this feature:
>
>- Amazon Linux AMI 2015.03 (HVM)
>- Ubuntu Server 14.04 LTS (HVM)
>- Red Hat Enterprise Linux 7.1 (HVM)
>- SUSE Linux Enterprise Server 12 (HVM)
>
>
>
>
>   Daniel Nelson 
>  June 2, 2015 at 2:42 PM
>
> Do you have any workarounds for the d2 issues? We’ve been using them for
> our Kafkas too, and ran into the instability. We’re on Ubuntu 12.04 and
> plan to try on 14.04 with the latest HWE to see if that helps any.
>
> Thanks!
>   Wes Chow 
>  June 2, 2015 at 1:39 PM
>
> We have run d2 instances with Kafka. They're currently unstable -- Amazon
> confirmed a host issue with d2 instances that gets tickled by a Kafka
> workload yesterday. Otherwise, it seems the d2 instance type is ideal as it
> gets an enormous amount of disk throughput and you'll likely be network
> bottlenecked.
>
> Wes
>
>
>   Steven Wu 
>  June 2, 2015 at 1:07 PM
> EBS (network attached storage) has got a lot better over the last a few
> years. we don't quite trust it for kafka workload.
>
> At Netflix, we were going with the new d2 instance type (HDD). our
> perf/load testing shows it satisfy our workload. SSD is better in latency
> curve but pretty comparable in terms of throughput. we can use the extra
> space from HDD for longer retention period.
>
> On Tue, Jun 2, 2015 at 9:37 AM, Henry Cai 
> 
>
>


Re: HDD or SSD or EBS for kafka brokers in Amazon EC2

2015-06-02 Thread Wes Chow



Daniel Nelson 
June 2, 2015 at 4:39 PM
On Jun 2, 2015, at 1:22 PM, Steven Wu  wrote:

can you elaborate what kind of instability you have encountered?

We have seen the nodes become completely non-responsive. Usually they get 
rebooted automatically after 10-20 minutes, but occasionally they get stuck for 
days in a state where they cannot be rebooted via the Amazon APIs.


Same here. It was worse right after d2 launch. We had 6 out of 9 servers 
die within 10 hours after spinning them up. Amazon rolled out a fix, but 
we're still seeing similar issues, though not nearly as bad. The first 
fix was for something network related, and apparently sending lots of 
data through the instances caused a kernel panic on the host. We have no 
information yet about the current issue.


Wes


Steven Wu 
June 2, 2015 at 4:22 PM
Wes/Daniel,

can you elaborate what kind of instability you have encountered?

we are on Ubuntu 14.04.2 and haven't encountered any issues so far. in 
the announcement, they did mention using Ubuntu 14.04 for better disk 
throughput. not sure whether 14.04 also addresses any instability 
issue you encountered or not.


Thanks,
Steven

In order to ensure the best disk throughput performance from your 
D2 instances on Linux, we recommend that you use the most recent 
version of the Amazon Linux AMI, or another Linux AMI with a kernel 
version of 3.8 or later. The D2 instances provide the best disk 
performance when you use a Linux kernel that supports Persistent 
Grants – an extension to the Xen block ring protocol that 
significantly improves disk throughput and scalability. The following 
Linux AMIs support this feature:


  * Amazon Linux AMI 2015.03 (HVM)
  * Ubuntu Server 14.04 LTS (HVM)
  * Red Hat Enterprise Linux 7.1 (HVM)
  * SUSE Linux Enterprise Server 12 (HVM)




Daniel Nelson 
June 2, 2015 at 2:42 PM

Do you have any workarounds for the d2 issues? We’ve been using them 
for our Kafkas too, and ran into the instability. We’re on Ubuntu 
12.04 and plan to try on 14.04 with the latest HWE to see if that 
helps any.


Thanks!
Wes Chow 
June 2, 2015 at 1:39 PM

We have run d2 instances with Kafka. They're currently unstable -- 
Amazon confirmed a host issue with d2 instances that gets tickled by a 
Kafka workload yesterday. Otherwise, it seems the d2 instance type is 
ideal as it gets an enormous amount of disk throughput and you'll 
likely be network bottlenecked.


Wes


Steven Wu 
June 2, 2015 at 1:07 PM
EBS (network attached storage) has got a lot better over the last a few
years. we don't quite trust it for kafka workload.

At Netflix, we were going with the new d2 instance type (HDD). our
perf/load testing shows it satisfy our workload. SSD is better in latency
curve but pretty comparable in terms of throughput. we can use the extra
space from HDD for longer retention period.

On Tue, Jun 2, 2015 at 9:37 AM, Henry Cai 



Re: HDD or SSD or EBS for kafka brokers in Amazon EC2

2015-06-02 Thread Daniel Nelson
On Jun 2, 2015, at 1:22 PM, Steven Wu  wrote:
> 
> can you elaborate what kind of instability you have encountered? 
We have seen the nodes become completely non-responsive. Usually they get 
rebooted automatically after 10-20 minutes, but occasionally they get stuck for 
days in a state where they cannot be rebooted via the Amazon APIs.

> 
> we are on Ubuntu 14.04.2 and haven't encountered any issues so far. in the 
> announcement, they did mention using Ubuntu 14.04 for better disk throughput. 
> not sure whether 14.04 also addresses any instability issue you encountered 
> or not.
> 

That’s encouraging, we’ll try on 14.04 and hopefully our issues will go away. 
I’ll update the list as soon as we have a chance to test.

-- 
Daniel Nelson

Re: HDD or SSD or EBS for kafka brokers in Amazon EC2

2015-06-02 Thread Steven Wu
Wes/Daniel,

can you elaborate what kind of instability you have encountered?

we are on Ubuntu 14.04.2 and haven't encountered any issues so far. in the
announcement, they did mention using Ubuntu 14.04 for better disk
throughput. not sure whether 14.04 also addresses any instability issue you
encountered or not.

Thanks,
Steven

In order to ensure the best disk throughput performance from your D2 instances
on Linux, we recommend that you use the most recent version of the Amazon
Linux AMI, or another Linux AMI with a kernel version of 3.8 or later. The
D2 instances provide the best disk performance when you use a Linux kernel
that supports Persistent Grants – an extension to the Xen block ring
protocol that significantly improves disk throughput and scalability. The
following Linux AMIs support this feature:

   - Amazon Linux AMI 2015.03 (HVM)
   - Ubuntu Server 14.04 LTS (HVM)
   - Red Hat Enterprise Linux 7.1 (HVM)
   - SUSE Linux Enterprise Server 12 (HVM)



On Tue, Jun 2, 2015 at 12:31 PM, Wes Chow  wrote:

>
> Our workaround is to switch to i2's. Amazon didn't mention anything,
> though we're getting on a call with them soon so I'll be sure to ask. Fwiw,
> we're also on 12.04.
>
> Wes
>
>
>   Daniel Nelson 
>  June 2, 2015 at 2:42 PM
>
> Do you have any workarounds for the d2 issues? We’ve been using them for
> our Kafkas too, and ran into the instability. We’re on Ubuntu 12.04 and
> plan to try on 14.04 with the latest HWE to see if that helps any.
>
> Thanks!
>   Wes Chow 
>  June 2, 2015 at 1:39 PM
>
> We have run d2 instances with Kafka. They're currently unstable -- Amazon
> confirmed a host issue with d2 instances that gets tickled by a Kafka
> workload yesterday. Otherwise, it seems the d2 instance type is ideal as it
> gets an enormous amount of disk throughput and you'll likely be network
> bottlenecked.
>
> Wes
>
>
>   Henry Cai 
>  June 2, 2015 at 12:37 PM
> We have been hosting kafka brokers in Amazon EC2 and we are using EBS
> disk. But periodically we were hit by long I/O wait time on EBS in some
> Availability Zones.
>
> We are thinking to change the instance types to a local HDD or local SSD.
> HDD is cheaper and bigger and seems quite fit for the Kafka use case which
> is mostly sequential read/write, but some early experiments show the HDD
> cannot catch up with the message producing speed since there are many
> topic/partitions on the broker which actually makes the disk I/O more
> randomly accessed.
>
> How are people's experience of choosing disk types on Amazon?
>
>


Consumer lag lies - orphaned offsets?

2015-06-02 Thread Otis Gospodnetic
Hi,

I've noticed that when we restart our Kafka consumers our consumer lag
metric sometimes looks "weird".

Here's an example: https://apps.sematext.com/spm-reports/s/0Hq5zNb4hH

You can see lag go up around 15:00, when some consumers were restarted.
The "weird" thing is that the lag remains flat!
How could it remain flat if consumers are running? (they have enough juice
to catch up!)

What I think is happening is this:
1) consumers are initially not really lagging
2) consumers get stopped
3) lag grows
4) consumers get started again
5) something shifts around...not sure what...
6) consumers start consuming, and there is actually no lag, but the offsets
written to ZK sometime during 3) don't get updated because after restart
consumers are reading from somewhere else, not from partition(s) whose lag
and offset delta jumped during 3)

Oh, and:
7) Kafka JMX still exposes all offsets, even those for partitions that are
no longer being read, so the consumer lag metric remains constant/flat,
even though consumers are actually not lagging on partitions from which
they are now consuming.

What bugs me is 7), because reading lag info from JMX looks like it's
"lying".

Does this sound crazy or reasonable?

If anyone has any comments/advice/suggestions for what one can do about
this, I'm all ears!

Thanks,
Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


Re: HDD or SSD or EBS for kafka brokers in Amazon EC2

2015-06-02 Thread Wes Chow


Our workaround is to switch to i2's. Amazon didn't mention anything, 
though we're getting on a call with them soon so I'll be sure to ask. 
Fwiw, we're also on 12.04.


Wes



Daniel Nelson 
June 2, 2015 at 2:42 PM

Do you have any workarounds for the d2 issues? We’ve been using them 
for our Kafkas too, and ran into the instability. We’re on Ubuntu 
12.04 and plan to try on 14.04 with the latest HWE to see if that 
helps any.


Thanks!
Wes Chow 
June 2, 2015 at 1:39 PM

We have run d2 instances with Kafka. They're currently unstable -- 
Amazon confirmed a host issue with d2 instances that gets tickled by a 
Kafka workload yesterday. Otherwise, it seems the d2 instance type is 
ideal as it gets an enormous amount of disk throughput and you'll 
likely be network bottlenecked.


Wes


Henry Cai 
June 2, 2015 at 12:37 PM
We have been hosting kafka brokers in Amazon EC2 and we are using EBS
disk. But periodically we were hit by long I/O wait time on EBS in some
Availability Zones.

We are thinking to change the instance types to a local HDD or local SSD.
HDD is cheaper and bigger and seems quite fit for the Kafka use case which
is mostly sequential read/write, but some early experiments show the HDD
cannot catch up with the message producing speed since there are many
topic/partitions on the broker which actually makes the disk I/O more
randomly accessed.

How are people's experience of choosing disk types on Amazon?



Re: HDD or SSD or EBS for kafka brokers in Amazon EC2

2015-06-02 Thread Daniel Nelson
> On Jun 2, 2015, at 10:39 AM, Wes Chow  wrote:
> 
> 
> We have run d2 instances with Kafka. They're currently unstable -- Amazon 
> confirmed a host issue with d2 instances that gets tickled by a Kafka 
> workload yesterday. Otherwise, it seems the d2 instance type is ideal as it 
> gets an enormous amount of disk throughput and you'll likely  be network 
> bottlenecked.
> 

Do you have any workarounds for the d2 issues? We’ve been using them for our 
Kafkas too, and ran into the instability. We’re on Ubuntu 12.04 and plan to try 
on 14.04 with the latest HWE to see if that helps any.

Thanks!

Re: Kafka JMS metrics meaning

2015-06-02 Thread Marina
Thanks a lot to everybody for your suggestions! 
In addition to the Consumer lag (on the Consumers side though), 
under-replicated partitions, offline partitions, active controller count, I am 
also thinking of monitoring the total size of partitions to not exceed some MAX 
(like 10G, for example) - to prevent disk out of space issues.

Now, can somebody shed light on my second question? :) "2. What do these
metrics mean: ReplicaManager -> LeaderCount and ReplicaManager ->
PartitionCount? I have three topics created, with one partition each, and
replication = 1, however the values for both of the above attributes is
"53". So I am not sure what the count '53' means here."
Thanks! Marina

  From: Todd Palino 
 To: "users@kafka.apache.org"  
Cc: Marina  
 Sent: Tuesday, June 2, 2015 1:29 PM
 Subject: Re: Kafka JMS metrics meaning
   
Under replicated is a must. Offline partitions is also good to monitor. We also 
use the active controller metric (it's 1 or 0) in aggregate for a cluster to 
know that the controller is running somewhere. 

For more general metrics, all topics bytes in and bytes out is good. We also 
watch the leader partitions count to know when to do a preferred replica 
election. Specifically, we take the ratio of that number to the total partition 
count for the broker and keep it near 50%

Most other things, like specific request type time and 99% metrics, we 
generally only look at when we are doing performance testing or have a specific 
concern. 

-Todd



> On Jun 2, 2015, at 1:01 PM, Aditya Auradkar  
> wrote:
> 
> Number of underreplicated partitions, total request time are some good bets.
> 
> Aditya
> 
> 
> From: Otis Gospodnetic [otis.gospodne...@gmail.com]
> Sent: Tuesday, June 02, 2015 9:56 AM
> To: users@kafka.apache.org; Marina
> Subject: Re: Kafka JMS metrics meaning
> 
> Hi,
> 
>> On Tue, Jun 2, 2015 at 12:50 PM, Marina  wrote:
>> 
>> Hi,
>> I have enabled JMX_PORT for KAfka server and am trying to understand some
>> of the metrics that are being exposed. I have two questions:
>> 1. what are the best metrics to monitor to quickly spot unhealthy Kafka
>> cluster?
> 
> People lve looking at consumer lag :)
> 
> Otis
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
> 
> 2. what do these metrics mean: ReplicaManager -> LeaderCount ? and
>> ReplicaManager -> PartitionCount ?I have three topics created, with one
>> partition each, and replication = 1, however the values for both of the
>> above attributes is "53" So I am not sure what the count '53' means
>> here
>> thanksMarina
>> 

  

Re: HDD or SSD or EBS for kafka brokers in Amazon EC2

2015-06-02 Thread Wes Chow


We have run d2 instances with Kafka. They're currently unstable -- yesterday 
Amazon confirmed a host issue with d2 instances that gets tickled by a 
Kafka workload. Otherwise, it seems the d2 instance type is 
ideal as it gets an enormous amount of disk throughput and you'll likely 
be network bottlenecked.


Wes



Steven Wu 
June 2, 2015 at 1:07 PM
EBS (network attached storage) has got a lot better over the last a few
years. we don't quite trust it for kafka workload.

At Netflix, we were going with the new d2 instance type (HDD). our
perf/load testing shows it satisfy our workload. SSD is better in latency
curve but pretty comparable in terms of throughput. we can use the extra
space from HDD for longer retention period.

On Tue, Jun 2, 2015 at 9:37 AM, Henry Cai 

Henry Cai 
June 2, 2015 at 12:37 PM
We have been hosting kafka brokers in Amazon EC2 and we are using EBS
disk. But periodically we were hit by long I/O wait time on EBS in some
Availability Zones.

We are thinking to change the instance types to a local HDD or local SSD.
HDD is cheaper and bigger and seems quite fit for the Kafka use case which
is mostly sequential read/write, but some early experiments show the HDD
cannot catch up with the message producing speed since there are many
topic/partitions on the broker which actually makes the disk I/O more
randomly accessed.

How are people's experience of choosing disk types on Amazon?



Re: Kafka JMS metrics meaning

2015-06-02 Thread Todd Palino
Under-replicated partitions is a must. Offline partitions is also good to monitor. We also 
use the active controller metric (it's 1 or 0) in aggregate for a cluster to 
know that the controller is running somewhere. 

For more general metrics, all-topics bytes in and bytes out is good. We also 
watch the leader partition count to know when to do a preferred replica 
election. Specifically, we take the ratio of that number to the total partition 
count for the broker and keep it near 50%.

Most other things, like specific request type time and 99% metrics, we 
generally only look at when we are doing performance testing or have a specific 
concern. 

-Todd
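
PS: for anyone scripting this, a minimal sketch of pulling a couple of these
gauges over JMX (the MBean names are from memory for 0.8.2, so verify them in
jconsole against your broker; host and port are placeholders):

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class BrokerHealthProbe {
        public static void main(String[] args) throws Exception {
            // Assumes the broker was started with JMX_PORT=9999.
            JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://broker-host:9999/jmxrmi");
            JMXConnector connector = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbsc = connector.getMBeanServerConnection();

                // Yammer gauges expose their reading via the "Value" attribute.
                Number under = (Number) mbsc.getAttribute(new ObjectName(
                    "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions"), "Value");
                Number leaders = (Number) mbsc.getAttribute(new ObjectName(
                    "kafka.server:type=ReplicaManager,name=LeaderCount"), "Value");
                Number partitions = (Number) mbsc.getAttribute(new ObjectName(
                    "kafka.server:type=ReplicaManager,name=PartitionCount"), "Value");

                // Ratio of leaders to total partitions on this broker, ideally near 0.5.
                double leaderRatio = leaders.doubleValue() / partitions.doubleValue();
                System.out.println("under-replicated=" + under + " leaderRatio=" + leaderRatio);
            } finally {
                connector.close();
            }
        }
    }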

> On Jun 2, 2015, at 1:01 PM, Aditya Auradkar  
> wrote:
> 
> Number of underreplicated partitions, total request time are some good bets.
> 
> Aditya
> 
> 
> From: Otis Gospodnetic [otis.gospodne...@gmail.com]
> Sent: Tuesday, June 02, 2015 9:56 AM
> To: users@kafka.apache.org; Marina
> Subject: Re: Kafka JMS metrics meaning
> 
> Hi,
> 
>> On Tue, Jun 2, 2015 at 12:50 PM, Marina  wrote:
>> 
>> Hi,
>> I have enabled JMX_PORT for KAfka server and am trying to understand some
>> of the metrics that are being exposed. I have two questions:
>> 1. what are the best metrics to monitor to quickly spot unhealthy Kafka
>> cluster?
> 
> People lve looking at consumer lag :)
> 
> Otis
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
> 
> 2. what do these metrics mean: ReplicaManager -> LeaderCount ? and
>> ReplicaManager -> PartitionCount ?I have three topics created, with one
>> partition each, and replication = 1, however the values for both of the
>> above attributes is "53" So I am not sure what the count '53' means
>> here
>> thanksMarina
>> 


Re: Offset management: client vs broker side responsibility

2015-06-02 Thread Otis Gospodnetic
Hi,

I haven't followed the changes to offset tracking closely, other than that
storing them in ZK is not the only option any more.
I think what Stevo is asking about/suggesting is that there be a
single API from which offset information can be retrieved (e.g. by
monitoring tools), so that monitoring tools work regardless of where one
chose to store offsets.
I know we'd love to have this for SPM's Kafka monitoring and can tell you
that adding support for N different APIs for N different offset storage
systems would be hard/time-consuming/expensive.
But maybe this single API already exists?

Thanks,
Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On Mon, Jun 1, 2015 at 4:41 PM, Jason Rosenberg  wrote:

> Stevo,
>
> Both of the main solutions used by the high-level consumer are standardized
> and supported directly by the kafka client libraries (e.g. maintaining
> offsets in zookeeper or in kafka itself).  And for the zk case, there is
> the consumer offset checker (which is good for monitoring).  Consumer
> offset checker still needs to be extended for offsets stored in kafka
> _consumer_offset topics though.
>
> Anyway, I'm not sure I understand your question, you want something for
> better monitoring of all possible clients (some of which might choose to
> manage offsets in their own way)?
>
> It's just not part of the kafka design to directly track individual
> consumers.
>
> Jason
>
> On Wed, May 27, 2015 at 7:42 AM, Shady Xu  wrote:
>
> > I guess adding a new component will increase the complexity of the system
> > structure. And if the new component consists of one or a few nodes, it
> may
> > becomes the bottleneck of the whole system, if it consists of many nodes,
> > it will make the system even more complex.
> >
> > Although every solution has its downsides, I think the current one is
> > decent.
> >
> > 2015-05-27 17:10 GMT+08:00 Stevo Slavić :
> >
> > > It could be a separate server component, does not have to be
> > > monolith/coupled with broker.
> > > Such solution would have benefits - single API, pluggable
> > implementations.
> > >
> > > On Wed, May 27, 2015 at 8:57 AM, Shady Xu  wrote:
> > >
> > > > Storing and managing offsets by broker will leave high pressure on
> the
> > > > brokers which will affect the performance of the cluster.
> > > >
> > > > You can use the advanced consumer APIs, then you can get the offsets
> > > either
> > > > from zookeeper or the __consumer_offsets__ topic. On the other hand,
> if
> > > you
> > > > use the simple consumer APIs, you mean to manage offsets yourself,
> then
> > > you
> > > > should monitor them yourself, simple and plain, right?
> > > >
> > > > 2015-04-22 14:36 GMT+08:00 Stevo Slavić :
> > > >
> > > > > Hello Apache Kafka community,
> > > > >
> > > > > Please correct me if wrong, AFAIK currently (Kafka 0.8.2.x) offset
> > > > > management responsibility is mainly client/consumer side
> > > responsibility.
> > > > >
> > > > > Wouldn't it be better if it was broker side only responsibility?
> > > > >
> > > > > E.g. now if one wants to use custom offset management, any of the
> > Kafka
> > > > > monitoring tools cannot see the offsets - they would need to use
> same
> > > > > custom client implementation which is practically not possible.
> > > > >
> > > > > Kind regards,
> > > > > Stevo Slavic.
> > > > >
> > > >
> > >
> >
>


Re: HDD or SSD or EBS for kafka brokers in Amazon EC2

2015-06-02 Thread Steven Wu
EBS (network-attached storage) has gotten a lot better over the last few
years, but we don't quite trust it for a Kafka workload.

At Netflix, we went with the new d2 instance type (HDD). Our
perf/load testing shows it satisfies our workload. SSD has a better latency
curve but is pretty comparable in terms of throughput, and we can use the extra
space from HDD for a longer retention period.

On Tue, Jun 2, 2015 at 9:37 AM, Henry Cai 
wrote:

> We have been hosting kafka brokers in Amazon EC2 and we are using EBS
> disk.  But periodically we were hit by long I/O wait time on EBS in some
> Availability Zones.
>
> We are thinking to change the instance types to a local HDD or local SSD.
> HDD is cheaper and bigger and seems quite fit for the Kafka use case which
> is mostly sequential read/write, but some early experiments show the HDD
> cannot catch up with the message producing speed since there are many
> topic/partitions on the broker which actually makes the disk I/O more
> randomly accessed.
>
> How are people's experience of choosing disk types on Amazon?
>


RE: Kafka JMS metrics meaning

2015-06-02 Thread Aditya Auradkar
Number of underreplicated partitions, total request time are some good bets.

Aditya


From: Otis Gospodnetic [otis.gospodne...@gmail.com]
Sent: Tuesday, June 02, 2015 9:56 AM
To: users@kafka.apache.org; Marina
Subject: Re: Kafka JMS metrics meaning

Hi,

On Tue, Jun 2, 2015 at 12:50 PM, Marina  wrote:

> Hi,
> I have enabled JMX_PORT for KAfka server and am trying to understand some
> of the metrics that are being exposed. I have two questions:
> 1. what are the best metrics to monitor to quickly spot unhealthy Kafka
> cluster?
>

People lve looking at consumer lag :)

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/

2. what do these metrics mean: ReplicaManager -> LeaderCount ? and
> ReplicaManager -> PartitionCount ?I have three topics created, with one
> partition each, and replication = 1, however the values for both of the
> above attributes is "53" So I am not sure what the count '53' means
> here
> thanksMarina
>


Re: Kafka JMS metrics meaning

2015-06-02 Thread Otis Gospodnetic
Hi,

On Tue, Jun 2, 2015 at 12:50 PM, Marina  wrote:

> Hi,
> I have enabled JMX_PORT for KAfka server and am trying to understand some
> of the metrics that are being exposed. I have two questions:
> 1. what are the best metrics to monitor to quickly spot unhealthy Kafka
> cluster?
>

People love looking at consumer lag :)

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/

2. what do these metrics mean: ReplicaManager -> LeaderCount ? and
> ReplicaManager -> PartitionCount ?I have three topics created, with one
> partition each, and replication = 1, however the values for both of the
> above attributes is "53" So I am not sure what the count '53' means
> here
> thanksMarina
>


Kafka JMS metrics meaning

2015-06-02 Thread Marina
Hi, 
I have enabled JMX_PORT for the Kafka server and am trying to understand some of 
the metrics that are being exposed. I have two questions:
1. What are the best metrics to monitor to quickly spot an unhealthy Kafka cluster?
2. What do these metrics mean: ReplicaManager -> LeaderCount and 
ReplicaManager -> PartitionCount? I have three topics created, with one 
partition each, and replication = 1, however the values for both of the above 
attributes is "53". So I am not sure what the count '53' means here.
Thanks, Marina


HDD or SSD or EBS for kafka brokers in Amazon EC2

2015-06-02 Thread Henry Cai
We have been hosting Kafka brokers in Amazon EC2 and we are using EBS
disks.  But periodically we were hit by long I/O wait times on EBS in some
Availability Zones.

We are thinking of changing the instance types to local HDD or local SSD.
HDD is cheaper and bigger and seems a good fit for the Kafka use case, which
is mostly sequential read/write, but some early experiments show the HDD
cannot keep up with the message-producing rate since there are many
topics/partitions on the broker, which makes the disk I/O more
random.

What are people's experiences with choosing disk types on Amazon?


Using SimpleConsumer to get messages from offset until now

2015-06-02 Thread Kevin Sjöberg
Hello,

I'm trying to create a custom consumer that, given an offset, returns all
messages until now. After this is done, the consumer is not needed anymore,
hence, the consumer does not have to continue consuming messages that are
being produced.

The Kafka cluster consists of one broker and we only use one partition as
well. My understanding is that I can use the SimpleConsumer API for this,
but I'm a bit unsure on how to go about it.

Would anyone mind helping me out or point me in the right direction?

Cheers,
Kevin

-- 
Kevin Sjöberg

-- 
You create your invoice, we send it and make sure the customer pays. We 
handle everything, all the way until the invoice is paid! 
Read about Nox Finans here 

.


Re: potential bug with offset request and just rolled log segment

2015-06-02 Thread Alfred Landrum
I filed KAFKA-2236:
https://issues.apache.org/jira/browse/KAFKA-2236

Is there any guidance on when 0.8.3 might be released?


Re: How to prevent custom Partitioner from increasing the number of producer's requests?

2015-06-02 Thread Jason Rosenberg
Hi Sebastien,

You might just try using the default partitioner (which is random).  It
works by choosing a random partition each time it re-polls the meta-data
for the topic.  By default, this happens every 10 minutes for each topic
you produce to (so it evenly distributes load at a granularity of 10
minutes).  This is based on 'topic.metadata.refresh.interval.ms'.
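
For reference, the relevant old-producer settings look roughly like this
(illustrative values only):

    producer.type=async
    batch.num.messages=200
    # the default partitioner sticks with a randomly chosen partition until the
    # next metadata refresh; the default interval is 600000 ms = 10 minutes
    topic.metadata.refresh.interval.ms=600000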

I suspect your code is causing double requests for each batch, if your
partitioning is actually breaking up your batches into 2 separate
partitions.  Could be an off by 1 error, with your modulo calculation?
Perhaps you need to use '% 0' instead of '% 1' there?

Jason



On Tue, Jun 2, 2015 at 3:35 AM, Sebastien Falquier <
sebastien.falqu...@teads.tv> wrote:

> Hi guys,
>
> I am new to Kafka and I am facing a problem I am not able to sort out.
>
> To smooth traffic over all my brokers' partitions, I have coded a custom
> Paritioner for my producers, using a simple round robin algorithm that
> jumps from a partition to another on every batch of messages (corresponding
> to batch.num.messages value). It looks like that :
> https://gist.github.com/sfalquier/4c0c7f36dd96d642b416
>
> With that fix, every partitions are used equally, but the amount of
> requests from the producers to the brokers have been multiplied by 2. I do
> not understand since all producers are async with batch.num.messages=200
> and the amount of messages processed is still the same as before. Why do
> producers need more requests to do the job? As internal traffic is a bit
> critical on our platform, I would really like to reduce producers' requests
> volume if possible.
>
> Any idea? Any suggestion?
>
> Regards,
> Sébastien
>


RE: leader update partitions fail with KeeperErrorCode = BadVersion,kafka version=0.8.1.1

2015-06-02 Thread chenlax
I created a topic with 72 partitions and 2 replicas, then increased it to 108 
partitions, and the cluster ran OK.
Some days later I found that the topic has 2 partitions whose ISR only includes 
the leader; checking the followers' log segments for those partitions, the log 
segments do not appear to be behind.

And I cannot find any more useful logs in the Kafka logs or ZK logs.

If I restart the leader, the follower will become the leader, and the previous 
leader will come back as a follower.

Thanks,
Lax


> Date: Mon, 1 Jun 2015 16:14:15 -0400
> Subject: Re: leader update partitions fail with KeeperErrorCode = 
> BadVersion,kafka version=0.8.1.1
> From: j...@squareup.com
> To: users@kafka.apache.org
> 
> I've seen this problem now too with 0.8.2.1.  It happened after we had a
> disk failure (but the server failed to shutdown:  KAFKA-).  After that
> happened, subsequently, several ISR's underwent I think 'unclean leader
> election', but I'm not 100% sure. But I did see lots of those same error
> messages: "Cached zkVersion [X] not equal to that in zookeeper, skip
> updating ISR...".
> 
> So, I don't know that the issue was fixed in 0.8.1.1.
> 
> Can you describe the circumstances for the errors you saw?
> 
> Jason
> 
> On Fri, May 29, 2015 at 12:17 AM, chenlax  wrote:
> 
> > kafka version =0.8.1.1
> >
> >  i get the error log as follow:
> >
> > INFO Partition [Topic_Beacon_1,10] on broker 4: Shrinking ISR for
> > partition [Topic_Beacon_1,10] from 4,7 to 4 (kafka.cluster.Partition)
> > ERROR Conditional update of path
> > /brokers/topics/Topic_Beacon_1/partitions/10/state with data
> > {"controller_epoch":16,"leader":4,"version":1,"leader_epoch":7,"isr":[4]}
> > and expecte
> > d version 5032 failed due to
> > org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode =
> > BadVersion for /brokers/topics/Topic_Beacon_1/partitions/10/state
> > (kafka.utils.ZkUtils$)
> > INFO Partition [Topic_Beacon_1,10] on broker 4: Cached zkVersion [5032]
> > not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
> >
> >
> > Only restart broker,the issue will fix, why the broker can not back to ISR
> > for partition?
> >
> > and i find the same issue,
> > http://mail-archives.apache.org/mod_mbox/kafka-users/201404.mbox/%3CCAHwHRrW_vKsSpoAnhEqQUZVBT5_Djx3qbixsH8=6hAe=vg4...@mail.gmail.com%3E
> >
> > it point out kafka_0.8.1.1 fix the bug,so i want to know, what causes this
> > problem.
> >
> >
> > Thanks,
> > Lax
> >
  

How to prevent custom Partitioner from increasing the number of producer's requests?

2015-06-02 Thread Sebastien Falquier
Hi guys,

I am new to Kafka and I am facing a problem I am not able to sort out.

To smooth traffic over all my brokers' partitions, I have coded a custom
Partitioner for my producers, using a simple round-robin algorithm that
jumps from one partition to another on every batch of messages (corresponding
to the batch.num.messages value). It looks like this:
https://gist.github.com/sfalquier/4c0c7f36dd96d642b416

With that fix, every partition is used equally, but the number of
requests from the producers to the brokers has been multiplied by 2. I do
not understand this, since all producers are async with batch.num.messages=200
and the number of messages processed is still the same as before. Why do
producers need more requests to do the job? As internal traffic is a bit
critical on our platform, I would really like to reduce the producers' request
volume if possible.

Any idea? Any suggestion?

Regards,
Sébastien


Re: Kafka partitions unbalanced

2015-06-02 Thread Vijay Patil
I ran into a similar issue. I configured 3 disks, but partitions were
allocated only to 2 disks (disk2 and disk3). Then I found that the left-out
disk (disk1) was already hosting a large number of other partitions from
different topics. So maybe partition allocation happens based on "how many
partitions a disk is already hosting (across all topics)". It's just my
observation and guess.

Regards,
Vijay

On 2 June 2015 at 02:00, Jason Rosenberg  wrote:

> Andrew Otto,
>
> This is a known problem (and which I have run into as well).  Generally, my
> solution has been to increase the number of partitions such that the
> granularity of partitions is much higher than the number of disks, such
> that its more unlikely for the imbalance to be significant.
>
> I would not recommend explicitly trying to game the system, by manually
> moving partitions and recovery files.  You could do something to cause it
> to recreate the replicas by having them recreated from scratch (e.g. use
> the partition reassignment tool to move it to a new broker and hope for a
> cleaner distribution).  Also, I've removed a log-dir from the 'log.dirs'
> list and restarted a broker when dealing with a failed disk (this will
> cause any data on the removed log.dir to be reassigned elsewhere, and the
> data will have to re-sync from replicas to fully recover).
>
> There is a 'KIP' about this issue, to make JBOD support in Kafka a bit more
> first-class, and I think this would be one of the main issues to solve.
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-18+-+JBOD+Support
>
> Jason
>
> On Wed, May 27, 2015 at 5:55 PM, Jonathan Creasy  >
> wrote:
>
> > I have a similar issue, let me know how it goes. :)
> >
> > -Original Message-
> > From: Andrew Otto [mailto:ao...@wikimedia.org]
> > Sent: Wednesday, May 27, 2015 3:12 PM
> > To: users@kafka.apache.org
> > Subject: Kafka partitions unbalanced
> >
> > Hi all,
> >
> > I’ve recently noticed that our broker log.dirs are using up different
> > amounts of storage.  We use JBOD for our brokers, with 12 log.dirs, 1 on
> > each disk.  One of our topics is larger than the others, and has 12
> > partitions.  Replication factor is 3, and we have 4 brokers.  Each broker
> > then has to store 9 partitions for this topic (12*3/4 == 9).
> >
> > I guess I had originally assumed that Kafka would be smart enough to
> > spread partitions for a given topic across each of the log.dirs as evenly
> > as it could.  However, on some brokers this one topic has 2 partitions
> in a
> > single log.dir, meaning that the storage taken up on a single disk by
> this
> > topic on those brokers is twice what it should be.
> >
> > e.g.
> >
> > Filesystem  Size  Used Avail Use% Mounted on
> > /dev/sda3   1.8T  1.2T  622G  66% /var/spool/kafka/a
> > /dev/sdb3   1.8T  1.7T  134G  93% /var/spool/kafka/b
> > …
> > $ du -sh /var/spool/kafka/{a,b}/data/webrequest_upload-*
> > 501Ga/data/webrequest_upload-4
> > 500Gb/data/webrequest_upload-11
> > 501Gb/data/webrequest_upload-8
> >
> >
> > This also means that those over populated disks have more writes to do.
> > My I/O is imbalanced!
> >
> > This is sort of documented at http://kafka.apache.org/documentation.html
> <
> > http://kafka.apache.org/documentation.html>:
> >
> > "If you configure multiple data directories partitions will be assigned
> > round-robin to data directories. Each partition will be entirely in one
> of
> > the data directories. If data is not well balanced among partitions this
> > can lead to load imbalance between disks.”
> >
> > But my data is well balanced among partitions!  It’s just that multiple
> > partitions are assigned to a single disk.
> >
> > Anyyway, on to a question:  Is it possible to move partitions between
> > log.dirs?  Is there tooling to do so?  Poking around in there, it looks
> > like it might be as simple as shutting down the broker, moving the
> > partition directory, and then editing both replication-offset-checkpoint
> > and recovery-point-offset-checkpoint files so that they say the
> appropriate
> > things in the appropriate directories, and then restarting broker.
> >
> > Someone tell me that this is a horrible idea. :)
> >
> > -Ao
> >
> >
> >
>
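
For anyone landing on this thread later, the reassignment route Jason mentions
above looks roughly like this (broker ids are made up; note that it moves whole
replicas between brokers rather than choosing a log.dir within a broker):

    # reassignment.json (hypothetical broker ids):
    #   {"version":1,"partitions":[{"topic":"webrequest_upload","partition":8,"replicas":[2,3,4]}]}

    bin/kafka-reassign-partitions.sh --zookeeper zk1:2181 \
      --reassignment-json-file reassignment.json --execute

    # check progress later
    bin/kafka-reassign-partitions.sh --zookeeper zk1:2181 \
      --reassignment-json-file reassignment.json --verify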