Re: [KIP-DISCUSSION] KIP-13 Quotas

2015-03-11 Thread Bhavesh Mistry
Hi Aditya,

I just wanted to give you a use case of rate limiting that we have
implemented with the producer as a work-around:

Use Case 1:

1) Topic-based rate limiting per producer instance (not yet across multiple
producer instances). We have a producer that sends both heartbeat and
regular messages, and we do not want to rate limit the heartbeats (they
carry very important data about application health and are periodic, not
dependent on site traffic).
2) The major goal was to prevent network saturation by the Kafka producer
(i.e., limit at the producer before messages are sent across the network to
the brokers; having the broker reject them provides no protection for the
network itself).
3) Reset the quota limit per minute (not per second); the limit can be
changed while the producer instance is running via configuration management
and should not impact the producer life-cycle.

The design/implementation issues are:
1) Quota enforcement is per producer instance. (If an application team
really wants more quota, they can just create multiple producer instances
as a work-around, which defeats the purpose of the quota in my opinion.)
2) Before inserting a message into the producer's in-memory queue, we count
the number of uncompressed bytes, so our quota accounting is not very
accurate.

From the producer quota perspective, I think we need to consider the use
case where people just want to limit data at the producer and be able to
control it regardless of the number of producer instances for a topic
within the same JVM and class loader (e.g., a Tomcat container).
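As a rough illustration of that work-around, here is a minimal sketch (all names hypothetical, not a Kafka API) of a JVM-wide, per-topic byte budget that resets every minute and exempts a heartbeat topic; all producer instances in the same JVM would share one instance of it:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// JVM-wide, per-topic byte budget with a per-minute reset. Because every
// producer instance in the JVM consults the same limiter, creating extra
// producers does not bypass the limit. Topic names are illustrative.
public class TopicRateLimiter {
    private final long bytesPerMinute;
    private final Set<String> exemptTopics;   // e.g. the heartbeat topic
    private final Map<String, Long> usedBytes = new HashMap<>();
    private long windowStart = System.currentTimeMillis();

    public TopicRateLimiter(long bytesPerMinute, Set<String> exemptTopics) {
        this.bytesPerMinute = bytesPerMinute;
        this.exemptTopics = exemptTopics;
    }

    /** Returns true if the (uncompressed) payload may be sent now. */
    public synchronized boolean tryAcquire(String topic, int bytes) {
        long now = System.currentTimeMillis();
        if (now - windowStart >= 60_000) {    // per-minute quota reset
            usedBytes.clear();
            windowStart = now;
        }
        if (exemptTopics.contains(topic))
            return true;                      // never throttle heartbeats
        long used = usedBytes.getOrDefault(topic, 0L);
        if (used + bytes > bytesPerMinute)
            return false;                     // caller drops or queues the message
        usedBytes.put(topic, used + bytes);
        return true;
    }
}
```

The application would call tryAcquire(topic, payload.length) before producer.send(); as noted above, counting uncompressed bytes here is what makes the accounting approximate.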


Use Case 2:

Based on message key:

For example, if you are building a LinkedIn-style distributed tracing
solution, you need key-based sampling: if hash(key) % 100 < 10, send the
message; otherwise reject it. (You could consider this an
application-specific or pluggable quota, or a sampling/selection of
messages that the app can do prior to producer.send(), but it is
nonetheless another producer-side use case.)
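That selection step can be sketched as a tiny helper (hypothetical name) the application applies before producer.send():

```java
// Key-based sampling: keep roughly `percent`% of keys. The decision is a
// pure function of the key, so every message with the same key (e.g. the
// same trace id) is consistently kept or dropped.
public class KeySampler {
    public static boolean shouldSample(String key, int percent) {
        int bucket = (key.hashCode() & 0x7fffffff) % 100;  // non-negative bucket 0..99
        return bucket < percent;
    }
}
```

Usage would look like: if (KeySampler.shouldSample(traceId, 10)) producer.send(record);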

Let me know your thoughts and suggestions.

Thanks,

Bhavesh

On Wed, Mar 11, 2015 at 10:22 PM, Jay Kreps  wrote:

> Hey Todd,
>
> Yeah it is kind of weird to do the quota check after taking a request, but
> since the penalty is applied during that request and it just delays you to
> the right rate, I think it isn't exactly wrong. I admit it is weird,
> though.
>
> What you say about closing the connection makes sense. The issue is that
> our current model for connections is totally transient. The clients are
> supposed to handle any kind of transient connection loss and just
> re-establish. So basically all existing clients would likely just retry all
> the same whether you closed the connection or not, so at the moment there
> would be no way to know a retried request is actually a retry.
>
> Your point about the REST proxy is a good one, I don't think we had
> considered that. Currently the java producer just has a single client.id
> for all requests so the rest proxy would be a single client. But actually
> what you want is the original sender to be the client. This is technically
> very hard to do because the client will actually be batching records from
> all senders together into one request so the only way to get the client id
> right would be to make a new producer for each rest proxy client and this
> would mean a lot of memory and connections. This needs thought, not sure
> what solution there is.
>
> I am not 100% convinced we need to obey the request timeout. The
> configuration issue actually isn't a problem because the request timeout is
> sent with the request so the broker actually knows it now even without a
> handshake. However the question is, if someone sets a pathologically low
> request timeout do we need to obey it? and if so won't that mean we can't
> quota them? I claim the answer is no! I think we should redefine request
> timeout to mean "replication timeout", which is actually what it is today.
> Even today if you interact with a slow server it may take longer than that
> timeout (say because the fs write queues up for a long-ass time). I think
> we need a separate client timeout which should be fairly long and unlikely
> to be hit (default to 30 secs or something).
>
> -Jay
>
> On Tue, Mar 10, 2015 at 10:12 AM, Todd Palino  wrote:
>
> > Thanks, Jay. On the interface, I agree with Aditya (and you, I believe)
> > that we don't need to expose the public API contract at this time, but
> > structuring the internal logic to allow for it later with low cost is a
> > good idea.
> >
> > Glad you explained the thoughts on where to hold requests. While my gut
> > reaction is to not like processing a produce request that is over quota,
> it
> > makes sense to do it that way if you are going to have your quota action
> be
> > a delay.
> >
> > On the delay, I see your point on the bootstrap cases. However, one of
> the
> > places I differ, and part of the reason that I prefer the error, is that
> I
> > would never allow a producer who is over quota to resend a produce
> request.
> > A producer should identify itself at the s

[jira] [Commented] (KAFKA-1646) Improve consumer read performance for Windows

2015-03-11 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358125#comment-14358125
 ] 

Jay Kreps commented on KAFKA-1646:
--

Hey [~waldenchen] this patch is adding a TON of windows-specific if/else 
statements. I don't think that is sustainable. I think if we are going to do 
this we need to try to make it the same strategy across OS's just for 
maintainability.

That said, are you sure NTFS can't just be tuned to accomplish the same thing?

> Improve consumer read performance for Windows
> -
>
> Key: KAFKA-1646
> URL: https://issues.apache.org/jira/browse/KAFKA-1646
> Project: Kafka
>  Issue Type: Improvement
>  Components: log
>Affects Versions: 0.8.1.1
> Environment: Windows
>Reporter: xueqiang wang
>Assignee: xueqiang wang
>  Labels: newbie, patch
> Attachments: Improve consumer read performance for Windows.patch, 
> KAFKA-1646-truncate-off-trailing-zeros-on-broker-restart-if-bro.patch, 
> KAFKA-1646_20141216_163008.patch, KAFKA-1646_20150306_005526.patch
>
>
> This patch is for the Windows platform only. On Windows, if there is more 
> than one replica writing to disk, the segment log files will not be laid 
> out contiguously on disk, and consumer read performance drops greatly. This 
> fix allocates more disk space when rolling a new segment, which improves 
> consumer read performance on the NTFS file system. The patch doesn't affect 
> file allocation on other filesystems, since it only adds statements like 
> 'if(Os.iswindow)' or methods used only on Windows.
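As a rough sketch of the preallocation idea described in the issue (not the actual patch; names are illustrative): grow the segment file to its full size when it is created, so the filesystem can lay it out contiguously, and truncate back to the real length on clean close or recovery:

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Illustrative sketch only: preallocate a log segment file up front so NTFS
// can reserve contiguous space, instead of growing the file append by append.
public class SegmentPreallocate {
    /** Open a segment file, optionally reserving the full segment size. */
    public static RandomAccessFile open(String path, long segmentBytes, boolean preallocate)
            throws IOException {
        RandomAccessFile file = new RandomAccessFile(path, "rw");
        if (preallocate)
            file.setLength(segmentBytes); // reserve space; the log must track the real end offset
        return file;
    }
}
```

The trade-off the reviewers discuss is visible here: once the file is preallocated, the broker can no longer rely on file length to find the log end, which is where the truncate-trailing-zeros-on-restart patch comes in.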



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1646) Improve consumer read performance for Windows

2015-03-11 Thread Jay Kreps (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Kreps updated KAFKA-1646:
-
Reviewer:   (was: Jay Kreps)



Re: [KIP-DISCUSSION] KIP-13 Quotas

2015-03-11 Thread Jay Kreps
Hey Todd,

Yeah it is kind of weird to do the quota check after taking a request, but
since the penalty is applied during that request and it just delays you to
the right rate, I think it isn't exactly wrong. I admit it is weird, though.

What you say about closing the connection makes sense. The issue is that
our current model for connections is totally transient. The clients are
supposed to handle any kind of transient connection loss and just
re-establish. So basically all existing clients would likely just retry all
the same whether you closed the connection or not, so at the moment there
would be no way to know a retried request is actually a retry.

Your point about the REST proxy is a good one, I don't think we had
considered that. Currently the java producer just has a single client.id
for all requests so the rest proxy would be a single client. But actually
what you want is the original sender to be the client. This is technically
very hard to do because the client will actually be batching records from
all senders together into one request so the only way to get the client id
right would be to make a new producer for each rest proxy client and this
would mean a lot of memory and connections. This needs thought, not sure
what solution there is.

I am not 100% convinced we need to obey the request timeout. The
configuration issue actually isn't a problem because the request timeout is
sent with the request so the broker actually knows it now even without a
handshake. However the question is, if someone sets a pathologically low
request timeout do we need to obey it? and if so won't that mean we can't
quota them? I claim the answer is no! I think we should redefine request
timeout to mean "replication timeout", which is actually what it is today.
Even today if you interact with a slow server it may take longer than that
timeout (say because the fs write queues up for a long-ass time). I think
we need a separate client timeout which should be fairly long and unlikely
to be hit (default to 30 secs or something).

-Jay

On Tue, Mar 10, 2015 at 10:12 AM, Todd Palino  wrote:

> Thanks, Jay. On the interface, I agree with Aditya (and you, I believe)
> that we don't need to expose the public API contract at this time, but
> structuring the internal logic to allow for it later with low cost is a
> good idea.
>
> Glad you explained the thoughts on where to hold requests. While my gut
> reaction is to not like processing a produce request that is over quota, it
> makes sense to do it that way if you are going to have your quota action be
> a delay.
>
> On the delay, I see your point on the bootstrap cases. However, one of the
> places I differ, and part of the reason that I prefer the error, is that I
> would never allow a producer who is over quota to resend a produce request.
> A producer should identify itself at the start of its connection, and at
> that point if it is over quota, the broker would return an error and close
> the connection. The same goes for a consumer. I'm a fan, in general, of
> pushing all error cases and handling down to the client and doing as little
> special work to accommodate those cases on the broker side as possible.
>
> A case to consider here is what does this mean for REST endpoints to Kafka?
> Are you going to hold the HTTP connection open as well? Is the endpoint
> going to queue and hold requests?
>
> I think the point that we can only delay as long as the producer's timeout
> is a valid one, especially given that we do not have any means for the
> broker and client to negotiate settings, whether that is timeouts or
> message sizes or anything else. There are a lot of things that you have to
> know when setting up a Kafka client about what your settings should be,
> when much of that should be provided for in the protocol handshake. It's
> not as critical in an environment like ours, where we have central
> configuration for most clients, but we still see issues with it. I think
> being able to have the client and broker negotiate a minimum timeout
> allowed would make the delay more palatable.
>
> I'm still not sure it's the right solution, and that we're not just going
> with what's fast and cheap as opposed to what is good (or right). But given
> the details of where to hold the request, I have less of a concern with the
> burden on the broker.
>
> -Todd
>
>
> On Mon, Mar 9, 2015 at 5:01 PM, Jay Kreps  wrote:
>
> > Hey Todd,
> >
> > Nice points, let me try to respond:
> >
> > Plugins
> >
> > Yeah let me explain what I mean about plugins. The contracts that matter
> > for us are public contracts, i.e. the protocol, the binary format, stuff
> in
> > zk, and the various plug-in apis we expose. Adding an internal interface
> > later will not be hard--the quota check is going to be done in 2-6 places
> > in the code which would need to be updated, all internal to the broker.
> >
> > The challenge with making things pluggable up front is that the policy is
> > usually fairl

Re: [KIP-DISCUSSION] KIP-13 Quotas

2015-03-11 Thread Jay Kreps
1. Cool

2. Yeah I just wanted to flag the dependency/interaction.

3. Cool, I think we are in agreement then that a pluggable system could
possibly be nice but we can get to know it operationally before deciding to
expose such a thing.

4. Yeah, I agree, let's do it as a separate discussion. We actually had a
full discussion and vote back when we started down the path with metrics,
but I think there were some concerns so let's talk about it a bit more and
see.

5. Yeah I think my concern was just the resulting api. Basically because
the logic for each quota is different--at the very least a different metric
to check and a different request type to compute the value from, it seems
that the seemingly generic api just masks the fact that we handle each case
separately. I.e. the implementation of the method internally would be

  def check(request: T) {
    if (request.isInstanceOf[ProduceRequest])
      [check produce request]
    if (request.isInstanceOf[FetchRequest])
      [check fetch request]
    ...
  }

So basically we have logic specific to each request, but rather than
putting that logic into the method for handling that request we kind of put
it into a big case statement. So it seems like this doesn't really abstract
things, since any time you add a new thing to quota you have to jump inside
the big case statement and add a new case, right? I think I may be
misunderstanding, though... In any case, I'm not arguing that we want to
just shove this into the existing methods; I just want to make sure that if
we introduce an abstraction it's a good one.

6. Yes, I think it is preferable not to have the seesaw effect in the delay
time. So if you need to impose 20 seconds of delay it is better to delay
all 200 requests 100 ms each rather than 199 requests 0 ms and one request
20 seconds. Several reasons for this:
a. gives predictable latency to the producer.
b. avoids hitting the request timeout on the one slow request
c. there is a trade-off between window size and delay time. If the window
is too small the estimate will be inaccurate and you will accidentally
penalize an okay client (e.g. imagine a 100 ms window, one big request
could overflow it). If the window is too large you will allow the system to
be brought to its knees for a long period of time prior to the throttling.

The other important question here is the details of the windowing policy.
If the window resets every 30 seconds, the client exhausts it in 10
seconds, then is throttled for 20, then it resets and the client starts
blitzing again. The result is basically 10 second outages every 30 seconds
as the throttling expires and the client goes full tilt, crushing the
server. So the quotas don't really do their job very well.

-Jay


On Mon, Mar 9, 2015 at 6:22 PM, Aditya Auradkar <
aaurad...@linkedin.com.invalid> wrote:

> Thanks for the comments Jay and Todd. Firstly, I'd like to address some of
> Jay's points.
>
> 1. You are right that we don't need to disable quotas on a per-client
> basis.
>
> 2. Configuration management: I must admit, I haven't read Joe's proposal
> in detail. My thinking was that we can keep configuration management
> separate from this discussion since this is already quite a meaty topic.
> Let me spend some time reading that KIP and then I can add more detail to
> the quota KIP.
>
> 3. Custom Quota implementations: I don't think it is necessarily a bad
> idea to have an interface called the QuotaManager (RequestThrottler). This
> doesn't necessarily mean exposing the interface as a public API. It is a
> mechanism to limit code changes to 1-2 specific classes. It prevents quota
> logic from bleeding into multiple places in the code, as happens in any big
> piece of code. I fully agree that we should not expose this as a public API
> unless there is a very strong reason to. This seems to be more of an
> implementation detail.
>
> 4. Metrics Package: I'll add a section on the wiki about using things from
> the metrics package. Currently, the quota stuff is located in
> "clients/common/metrics". This means that we will have to migrate all that
> functionality into core. Does this also mean that we will need to replace the
> existing metrics code in "core" with the newly imported package as a part
> of this project? If so, that's a relatively large undertaking and it needs
> to be discussed separately IMO.
>
> 5. Request Throttler vs QuotaManager -
> I wanted my quota manager to do something similar to what you proposed.
> Inside KafkaApis, I could do:
>
> if (quotaManager.check())
>   // process request
> else
>   return
>
> Internally, QuotaManager.check() could do exactly what you suggested:
>
> try {
>   quotaMetric.record(newVal)
> } catch {
>   case e: QuotaException =>
>     // logic to calculate delay
>     requestThrottler.add(new DelayedResponse(...), ...)
>     return
> }
>
> This approach gives us the flexibility of deciding what metric we want to
> record inside QuotaManager. This brings us back to the same discussion of
> pluggable quota p

[jira] [Comment Edited] (KAFKA-1646) Improve consumer read performance for Windows

2015-03-11 Thread Honghai Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358096#comment-14358096
 ] 

Honghai Chen edited comment on KAFKA-1646 at 3/12/15 4:48 AM:
--

Hey, [~jkreps] Would you like to help check the review at 
https://reviews.apache.org/r/29091/diff/7/ ? Really appreciated, thanks.


was (Author: waldenchen):
Het, [~jkreps] Would you like help check the review at 
https://reviews.apache.org/r/29091/diff/7/  , really appreciate, thanks.



[jira] [Commented] (KAFKA-1646) Improve consumer read performance for Windows

2015-03-11 Thread Honghai Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358096#comment-14358096
 ] 

Honghai Chen commented on KAFKA-1646:
-

Hey, [~jkreps] Would you like to help check the review at 
https://reviews.apache.org/r/29091/diff/7/ ? Really appreciated, thanks.



Re: 0.8.3 release plan

2015-03-11 Thread Jay Kreps
With regard to mm, I was kind of assuming just based on the amount of work
that that would go in for sure, but yeah I agree it is important.

-Jay

On Wed, Mar 11, 2015 at 9:39 PM, Jay Kreps  wrote:

> What I was trying to say was let's do a real release whenever either
> consumer or authn is done whichever happens first (or both if they can
> happen close together)--not sure which is more likely to slip.
>
> WRT the beta thing I think the question for people is whether the beta
> period was helpful or not in getting a more stable release? We could either
> do a beta release again or we could just do a normal release and call the
> consumer feature "experimental" or whatever...basically something to get it
> in people's hands before it is supposed to work perfectly and never change
> again.
>
> -Jay
>
>
> On Wed, Mar 11, 2015 at 9:27 PM, Gwen Shapira 
> wrote:
>
>> So basically you are suggesting - let's do a beta release whenever we
>> feel the new consumer is done?
>>
>> This can definitely work.
>>
>> I'd prefer holding for MM improvements too. IMO, it's not just more
>> improvements like flush() and compression optimization.
>> Current MirrorMaker can lose data, which makes it pretty useless for
>> its job. We hear lots of requests for robust MM from our customers, so
>> I can imagine its pretty important to the Kafka community (unless I
>> have a completely skewed sample).
>>
>> Gwen
>>
>>
>>
>> On Wed, Mar 11, 2015 at 9:18 PM, Jay Kreps  wrote:
>> > Yeah the real question is always what will we block on?
>> >
>> > I don't think we should try to hold back smaller changes. In this
>> bucket I
>> > would include most things you described: mm improvements, replica
>> > assignment tool improvements, flush, purgatory improvements, compression
>> > optimization, etc. Likely these will all get done in time as well as
>> many
>> > things that kind of pop up from users but probably aren't worth doing a
>> > release for on their own. If one of them slips, that's fine. I also don't
>> > think we should try to hold back work that is done if it isn't on a
>> list.
>> >
>> > I would consider either SSL+SASL or the consumer worthy of a release on
>> its
>> > own. If they finish close to the same time that is great. We can maybe
>> just
>> > assess as these evolve where the other one is at and make a call
>> whether it
>> > will be one or both?
>> >
>> > -Jay
>> >
>> > On Wed, Mar 11, 2015 at 8:51 PM, Gwen Shapira 
>> wrote:
>> >
>> >> If we are going in terms of features, I can see the following features
>> >> getting in in the next month or two:
>> >>
>> >> * New consumer
>> >> * Improved Mirror Maker (I've seen tons of interest)
>> >> * Centralized admin requests (aka KIP-4)
>> >> * Nicer replica-reassignment tool
>> >> * SSL (and perhaps also SASL)?
>> >>
>> >> I think this collection will make a nice release. Perhaps we can cap
>> >> it there and focus (as a community) on getting these in, we can have a
>> >> release without too much scope creep in the not-very-distant-future?
>> >> Even just 3 out of these 5 will still make a nice incremental
>> >> improvement.
>> >>
>> >> Gwen
>> >>
>> >>
>> >> On Wed, Mar 11, 2015 at 8:29 PM, Jay Kreps 
>> wrote:
>> >> > Yeah I'd be in favor of a quicker, smaller release but I think as
>> long as
>> >> > we have these big things in flight we should probably keep the
>> release
>> >> > criteria feature-based rather than time-based, though (e.g. "when X
>> >> works"
>> >> > not "every other month").
>> >> >
>> >> > Ideally the next release would have at least a "beta" version of the
>> new
>> >> > consumer. I think having a new hunk of code like that available but
>> >> marked
>> >> > as "beta" is maybe a good way to go, as it gets it into peoples
>> hands for
>> >> > testing. This way we can declare the API not fully locked down until
>> the
>> >> > final release too, since mostly users only look at stuff after we
>> release
>> >> > it. Maybe we can try to construct a schedule around this?
>> >> >
>> >> > -Jay
>> >> >
>> >> >
>> >> > On Wed, Mar 11, 2015 at 7:55 PM, Joe Stein 
>> wrote:
>> >> >
>> >> >> There hasn't been any public discussion about the 0.8.3 release
>> plan.
>> >> >>
>> >> >> There seems to be a lot of work in flight, work with patches and
>> review
>> >> >> that could/should get committed but now just pending KIPS, work
>> without
>> >> KIP
>> >> >> but that is in trunk already (e.g. the new Consumer) that would be
>> the
>> >> the
>> >> >> release but missing the KIP for the release...
>> >> >>
>> >> >> What does this mean for the 0.8.3 release? What are we trying to
>> get out
>> >> >> and when?
>> >> >>
>> >> >> Also looking at
>> >> >>
>> https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan
>> >> >> there
>> >> >> seems to be things we are getting earlier (which is great of
>> course) so
>> >> are
>> >> >> we going to try to up the version and go with 0.9.0?
>> >> >>
>> >> >> 0.8.2.0 ended up getting very bloated and that delayed it much
>> long

Re: 0.8.3 release plan

2015-03-11 Thread Jay Kreps
What I was trying to say was let's do a real release whenever either
consumer or authn is done whichever happens first (or both if they can
happen close together)--not sure which is more likely to slip.

WRT the beta thing I think the question for people is whether the beta
period was helpful or not in getting a more stable release? We could either
do a beta release again or we could just do a normal release and call the
consumer feature "experimental" or whatever...basically something to get it
in people's hands before it is supposed to work perfectly and never change
again.

-Jay


On Wed, Mar 11, 2015 at 9:27 PM, Gwen Shapira  wrote:

> So basically you are suggesting - let's do a beta release whenever we
> feel the new consumer is done?
>
> This can definitely work.
>
> I'd prefer holding for MM improvements too. IMO, it's not just more
> improvements like flush() and compression optimization.
> Current MirrorMaker can lose data, which makes it pretty useless for
> its job. We hear lots of requests for robust MM from our customers, so
> I can imagine its pretty important to the Kafka community (unless I
> have a completely skewed sample).
>
> Gwen
>
>
>
> On Wed, Mar 11, 2015 at 9:18 PM, Jay Kreps  wrote:
> > Yeah the real question is always what will we block on?
> >
> > I don't think we should try to hold back smaller changes. In this bucket
> I
> > would include most things you described: mm improvements, replica
> > assignment tool improvements, flush, purgatory improvements, compression
> > optimization, etc. Likely these will all get done in time as well as many
> > things that kind of pop up from users but probably aren't worth doing a
> > release for on their own. If one of them slips, that's fine. I also don't
> > think we should try to hold back work that is done if it isn't on a list.
> >
> > I would consider either SSL+SASL or the consumer worthy of a release on
> its
> > own. If they finish close to the same time that is great. We can maybe
> just
> > assess as these evolve where the other one is at and make a call whether
> it
> > will be one or both?
> >
> > -Jay
> >
> > On Wed, Mar 11, 2015 at 8:51 PM, Gwen Shapira 
> wrote:
> >
> >> If we are going in terms of features, I can see the following features
> >> getting in in the next month or two:
> >>
> >> * New consumer
> >> * Improved Mirror Maker (I've seen tons of interest)
> >> * Centralized admin requests (aka KIP-4)
> >> * Nicer replica-reassignment tool
> >> * SSL (and perhaps also SASL)?
> >>
> >> I think this collection will make a nice release. Perhaps we can cap
> >> it there and focus (as a community) on getting these in, we can have a
> >> release without too much scope creep in the not-very-distant-future?
> >> Even just 3 out of these 5 will still make a nice incremental
> >> improvement.
> >>
> >> Gwen
> >>
> >>
> >> On Wed, Mar 11, 2015 at 8:29 PM, Jay Kreps  wrote:
> >> > Yeah I'd be in favor of a quicker, smaller release but I think as
> long as
> >> > we have these big things in flight we should probably keep the release
> >> > criteria feature-based rather than time-based, though (e.g. "when X
> >> works"
> >> > not "every other month").
> >> >
> >> > Ideally the next release would have at least a "beta" version of the
> new
> >> > consumer. I think having a new hunk of code like that available but
> >> marked
> >> > as "beta" is maybe a good way to go, as it gets it into peoples hands
> for
> >> > testing. This way we can declare the API not fully locked down until
> the
> >> > final release too, since mostly users only look at stuff after we
> release
> >> > it. Maybe we can try to construct a schedule around this?
> >> >
> >> > -Jay
> >> >
> >> >
> >> > On Wed, Mar 11, 2015 at 7:55 PM, Joe Stein 
> wrote:
> >> >
> >> >> There hasn't been any public discussion about the 0.8.3 release plan.
> >> >>
> >> >> There seems to be a lot of work in flight, work with patches and
> review
> >> >> that could/should get committed but now just pending KIPS, work
> without
> >> KIP
> >> >> but that is in trunk already (e.g. the new Consumer) that would be
> the
> >> the
> >> >> release but missing the KIP for the release...
> >> >>
> >> >> What does this mean for the 0.8.3 release? What are we trying to get
> out
> >> >> and when?
> >> >>
> >> >> Also looking at
> >> >>
> https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan
> >> >> there
> >> >> seems to be things we are getting earlier (which is great of course)
> so
> >> are
> >> >> we going to try to up the version and go with 0.9.0?
> >> >>
> >> >> 0.8.2.0 ended up getting very bloated and that delayed it much longer
> >> than
> >> >> we had originally communicated to the community and want to make
> sure we
> >> >> take that feedback from the community and try to improve upon it.
> >> >>
> >> >> Thanks!
> >> >>
> >> >> ~ Joe Stein
> >> >> - - - - - - - - - - - - - - - - -
> >> >>
> >> >>   http://www.stealth.ly
> >> >> - - - - - - - - - - - - - - - - -
> >> >>
> >>

Re: 0.8.3 release plan

2015-03-11 Thread Gwen Shapira
So basically you are suggesting - let's do a beta release whenever we
feel the new consumer is done?

This can definitely work.

I'd prefer holding for the MM improvements too. IMO, it's not just another
incremental improvement like flush() or the compression optimization.
Current MirrorMaker can lose data, which makes it pretty useless for
its job. We hear lots of requests for robust MM from our customers, so
I can imagine it's pretty important to the Kafka community (unless I
have a completely skewed sample).

Gwen



On Wed, Mar 11, 2015 at 9:18 PM, Jay Kreps  wrote:
> Yeah the real question is always what will we block on?
>
> I don't think we should try to hold back smaller changes. In this bucket I
> would include most things you described: mm improvements, replica
> assignment tool improvements, flush, purgatory improvements, compression
> optimization, etc. Likely these will all get done in time as well as many
> things that kind of pop up from users but probably aren't worth doing a
> release for on their own. If one of them slips, that's fine. I also don't
> think we should try to hold back work that is done if it isn't on a list.
>
> I would consider either SSL+SASL or the consumer worthy of a release on its
> own. If they finish close to the same time, that is great. We can maybe just
> assess, as these evolve, where the other one is and make a call on whether
> it will be one or both?
>
> -Jay
>
> On Wed, Mar 11, 2015 at 8:51 PM, Gwen Shapira  wrote:
>
>> If we are thinking in terms of features, I can see the following
>> getting in within the next month or two:
>>
>> * New consumer
>> * Improved Mirror Maker (I've seen tons of interest)
>> * Centralized admin requests (aka KIP-4)
>> * Nicer replica-reassignment tool
>> * SSL (and perhaps also SASL)?
>>
>> I think this collection will make a nice release. Perhaps we can cap
>> it there and focus (as a community) on getting these in, so we can have a
>> release without too much scope creep in the not-very-distant future.
>> Even just 3 out of these 5 will still make a nice incremental
>> improvement.
>>
>> Gwen
>>
>>
>> On Wed, Mar 11, 2015 at 8:29 PM, Jay Kreps  wrote:
>> > Yeah I'd be in favor of a quicker, smaller release but I think as long as
>> > we have these big things in flight we should probably keep the release
>> > criteria feature-based rather than time-based, though (e.g. "when X
>> works"
>> > not "every other month").
>> >
>> > Ideally the next release would have at least a "beta" version of the new
>> > consumer. I think having a new hunk of code like that available but
>> marked
>> > as "beta" is maybe a good way to go, as it gets it into people's hands for
>> > testing. This way we can declare the API not fully locked down until the
>> > final release too, since mostly users only look at stuff after we release
>> > it. Maybe we can try to construct a schedule around this?
>> >
>> > -Jay
>> >
>> >
>> > On Wed, Mar 11, 2015 at 7:55 PM, Joe Stein  wrote:
>> >
>> >> There hasn't been any public discussion about the 0.8.3 release plan.
>> >>
>> >> There seems to be a lot of work in flight, work with patches and review
>> >> that could/should get committed but now just pending KIPs, work without
>> KIP
>> >> but that is in trunk already (e.g. the new Consumer) that would be in the
>> >> release but missing the KIP for the release...
>> >>
>> >> What does this mean for the 0.8.3 release? What are we trying to get out
>> >> and when?
>> >>
>> >> Also looking at
>> >> https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan
>> >> there
>> >> seems to be things we are getting earlier (which is great of course) so
>> are
>> >> we going to try to up the version and go with 0.9.0?
>> >>
>> >> 0.8.2.0 ended up getting very bloated and that delayed it much longer
>> than
>> >> we had originally communicated to the community, and we want to make sure we
>> >> take that feedback from the community and try to improve upon it.
>> >>
>> >> Thanks!
>> >>
>> >> ~ Joe Stein
>> >> - - - - - - - - - - - - - - - - -
>> >>
>> >>   http://www.stealth.ly
>> >> - - - - - - - - - - - - - - - - -
>> >>
>>


[jira] [Updated] (KAFKA-1930) Move server over to new metrics library

2015-03-11 Thread Aditya Auradkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Auradkar updated KAFKA-1930:
---
Assignee: Aditya Auradkar

> Move server over to new metrics library
> ---
>
> Key: KAFKA-1930
> URL: https://issues.apache.org/jira/browse/KAFKA-1930
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jay Kreps
>Assignee: Aditya Auradkar
>
> We are using org.apache.kafka.common.metrics on the clients, but using Coda 
> Hale metrics on the server. We should move the server over to the new metrics 
> package as well. This will help to make all our metrics self-documenting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1930) Move server over to new metrics library

2015-03-11 Thread Aditya Auradkar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358077#comment-14358077
 ] 

Aditya Auradkar commented on KAFKA-1930:


I plan to work on this ticket since this has been called out as a prerequisite 
for implementing quotas (KIP 13) in Kafka. I shall circulate a KIP once I 
understand the scope of the change well enough.

> Move server over to new metrics library
> ---
>
> Key: KAFKA-1930
> URL: https://issues.apache.org/jira/browse/KAFKA-1930
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jay Kreps
>
> We are using org.apache.kafka.common.metrics on the clients, but using Coda 
> Hale metrics on the server. We should move the server over to the new metrics 
> package as well. This will help to make all our metrics self-documenting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
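The "self-documenting" goal of KAFKA-1930 refers to metrics that carry a human-readable description alongside their name. A minimal stdlib-only sketch of the idea follows; the class and method names here are illustrative only, not the actual `org.apache.kafka.common.metrics` API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch: a "self-documenting" metric registry pairs each metric
// name with a description, so tooling can list every metric a broker exposes
// along with what it measures. (Not the real Kafka metrics API.)
public class SelfDocumentingMetrics {
    // metric name -> description; insertion order preserved for stable listings
    private final Map<String, String> descriptions = new LinkedHashMap<>();
    private final Map<String, Double> values = new LinkedHashMap<>();

    public void register(String name, String description) {
        descriptions.put(name, description);
        values.put(name, 0.0);
    }

    public void record(String name, double value) {
        values.merge(name, value, Double::sum); // accumulate recorded values
    }

    public String describe(String name) {
        return descriptions.get(name);
    }

    public double value(String name) {
        return values.getOrDefault(name, 0.0);
    }
}
```

Usage would look like `register("produce-request-rate", "Total produce requests received by the broker")` followed by `record(...)` on each request; the description travels with the metric instead of living only in external documentation.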


Re: 0.8.3 release plan

2015-03-11 Thread Jay Kreps
Yeah the real question is always what will we block on?

I don't think we should try to hold back smaller changes. In this bucket I
would include most things you described: mm improvements, replica
assignment tool improvements, flush, purgatory improvements, compression
optimization, etc. Likely these will all get done in time as well as many
things that kind of pop up from users but probably aren't worth doing a
release for on their own. If one of them slips, that's fine. I also don't
think we should try to hold back work that is done if it isn't on a list.

I would consider either SSL+SASL or the consumer worthy of a release on its
own. If they finish close to the same time, that is great. We can maybe just
assess, as these evolve, where the other one is and make a call on whether
it will be one or both?

-Jay

On Wed, Mar 11, 2015 at 8:51 PM, Gwen Shapira  wrote:

> If we are thinking in terms of features, I can see the following
> getting in within the next month or two:
>
> * New consumer
> * Improved Mirror Maker (I've seen tons of interest)
> * Centralized admin requests (aka KIP-4)
> * Nicer replica-reassignment tool
> * SSL (and perhaps also SASL)?
>
> I think this collection will make a nice release. Perhaps we can cap
> it there and focus (as a community) on getting these in, so we can have a
> release without too much scope creep in the not-very-distant future.
> Even just 3 out of these 5 will still make a nice incremental
> improvement.
>
> Gwen
>
>
> On Wed, Mar 11, 2015 at 8:29 PM, Jay Kreps  wrote:
> > Yeah I'd be in favor of a quicker, smaller release but I think as long as
> > we have these big things in flight we should probably keep the release
> > criteria feature-based rather than time-based, though (e.g. "when X
> works"
> > not "every other month").
> >
> > Ideally the next release would have at least a "beta" version of the new
> > consumer. I think having a new hunk of code like that available but
> marked
> > as "beta" is maybe a good way to go, as it gets it into people's hands for
> > testing. This way we can declare the API not fully locked down until the
> > final release too, since mostly users only look at stuff after we release
> > it. Maybe we can try to construct a schedule around this?
> >
> > -Jay
> >
> >
> > On Wed, Mar 11, 2015 at 7:55 PM, Joe Stein  wrote:
> >
> >> There hasn't been any public discussion about the 0.8.3 release plan.
> >>
> >> There seems to be a lot of work in flight, work with patches and review
> >> that could/should get committed but now just pending KIPs, work without
> KIP
> >> but that is in trunk already (e.g. the new Consumer) that would be in the
> >> release but missing the KIP for the release...
> >>
> >> What does this mean for the 0.8.3 release? What are we trying to get out
> >> and when?
> >>
> >> Also looking at
> >> https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan
> >> there
> >> seems to be things we are getting earlier (which is great of course) so
> are
> >> we going to try to up the version and go with 0.9.0?
> >>
> >> 0.8.2.0 ended up getting very bloated and that delayed it much longer
> than
> >> we had originally communicated to the community, and we want to make sure we
> >> take that feedback from the community and try to improve upon it.
> >>
> >> Thanks!
> >>
> >> ~ Joe Stein
> >> - - - - - - - - - - - - - - - - -
> >>
> >>   http://www.stealth.ly
> >> - - - - - - - - - - - - - - - - -
> >>
>


[jira] [Commented] (KAFKA-1546) Automate replica lag tuning

2015-03-11 Thread Aditya Auradkar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358074#comment-14358074
 ] 

Aditya Auradkar commented on KAFKA-1546:


I'll write a short KIP on this and circulate it tomorrow. In the meantime, I 
guess Jun/Neha can also review it since the actual fix has been discussed in 
enough detail on this jira.

> Automate replica lag tuning
> ---
>
> Key: KAFKA-1546
> URL: https://issues.apache.org/jira/browse/KAFKA-1546
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.0, 0.8.1, 0.8.1.1
>Reporter: Neha Narkhede
>Assignee: Aditya Auradkar
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1546.patch, KAFKA-1546_2015-03-11_18:48:09.patch
>
>
> Currently, there is no good way to tune the replica lag configs to 
> automatically account for high and low volume topics on the same cluster. 
> For the low-volume topic it will take a very long time to detect a lagging
> replica, and for the high-volume topic it will have false-positives.
> One approach to making this easier would be to have the configuration
> be something like replica.lag.max.ms and translate this into a number
> of messages dynamically based on the throughput of the partition.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
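The time-based approach described in KAFKA-1546 (a `replica.lag.max.ms`-style config instead of a fixed message-count threshold) can be sketched as follows. Class and method names are hypothetical, not the broker's actual code: the idea is to track when each follower was last fully caught up to the leader's log end offset and flag it only after it has been behind for longer than the configured time, which adapts automatically to both high- and low-volume topics.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of time-based replica lag detection (hypothetical names).
public class ReplicaLagTracker {
    private final long lagTimeMaxMs; // analogous to the proposed replica.lag.max.ms
    private final Map<Integer, Long> lastCaughtUpTimeMs = new HashMap<>();

    public ReplicaLagTracker(long lagTimeMaxMs) {
        this.lagTimeMaxMs = lagTimeMaxMs;
    }

    // Called on every fetch from a follower replica.
    public void onFetch(int replicaId, long fetchOffset, long leaderEndOffset, long nowMs) {
        if (fetchOffset >= leaderEndOffset) {
            lastCaughtUpTimeMs.put(replicaId, nowMs); // fully caught up right now
        }
        lastCaughtUpTimeMs.putIfAbsent(replicaId, nowMs); // first contact counts as caught up
    }

    // A replica is out of sync once it has not been caught up for lagTimeMaxMs,
    // regardless of how many messages behind it is.
    public boolean isOutOfSync(int replicaId, long nowMs) {
        Long last = lastCaughtUpTimeMs.get(replicaId);
        return last == null || nowMs - last > lagTimeMaxMs;
    }
}
```

A low-volume topic no longer waits many messages to detect a dead follower, and a high-volume topic no longer produces false positives from momentary bursts.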


[jira] [Updated] (KAFKA-1831) Producer does not provide any information about which host the data was sent to

2015-03-11 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1831:
-
Fix Version/s: 0.8.2.0

> Producer does not provide any information about which host the data was sent 
> to
> ---
>
> Key: KAFKA-1831
> URL: https://issues.apache.org/jira/browse/KAFKA-1831
> Project: Kafka
>  Issue Type: Improvement
>  Components: producer 
>Affects Versions: 0.8.1.1
>Reporter: Mark Payne
>Assignee: Jun Rao
> Fix For: 0.8.2.0
>
>
> For traceability purposes and for troubleshooting, when sending data to 
> Kafka, the Producer should provide information about which host the data was 
> sent to. This works well already in the SimpleConsumer, which provides host() 
> and port() methods.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1852) OffsetCommitRequest can commit offset on unknown topic

2015-03-11 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1852:
-
Fix Version/s: 0.8.3

> OffsetCommitRequest can commit offset on unknown topic
> --
>
> Key: KAFKA-1852
> URL: https://issues.apache.org/jira/browse/KAFKA-1852
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.3
>Reporter: Jun Rao
>Assignee: Sriharsha Chintalapani
> Fix For: 0.8.3
>
> Attachments: KAFKA-1852.patch, KAFKA-1852_2015-01-19_10:44:01.patch, 
> KAFKA-1852_2015-02-12_16:46:10.patch, KAFKA-1852_2015-02-16_13:21:46.patch, 
> KAFKA-1852_2015-02-18_13:13:17.patch, KAFKA-1852_2015-02-27_13:50:34.patch
>
>
> Currently, we allow an offset to be committed to Kafka, even when the 
> topic/partition for the offset doesn't exist. We probably should disallow 
> that and send an error back in that case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1865) Add a flush() call to the new producer API

2015-03-11 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1865:
-
Fix Version/s: 0.8.3

> Add a flush() call to the new producer API
> --
>
> Key: KAFKA-1865
> URL: https://issues.apache.org/jira/browse/KAFKA-1865
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jay Kreps
>Assignee: Jay Kreps
> Fix For: 0.8.3
>
> Attachments: KAFKA-1865.patch, KAFKA-1865_2015-02-21_15:36:54.patch, 
> KAFKA-1865_2015-02-22_16:26:46.patch, KAFKA-1865_2015-02-23_18:29:16.patch, 
> KAFKA-1865_2015-02-25_17:15:26.patch, KAFKA-1865_2015-02-26_10:37:16.patch
>
>
> The postconditions of this would be that any record enqueued prior to flush() 
> would have completed being sent (either successfully or not).
> An open question is whether you can continue sending new records while this 
> call is executing (on other threads).
> We should only do this if it doesn't add inefficiencies for people who don't 
> use it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
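The postcondition described in KAFKA-1865 can be sketched with a snapshot-and-wait pattern: flush() blocks until every record enqueued before the call has completed (successfully or not), while other threads may keep sending concurrently. This is an illustrative sketch using stdlib futures, not the producer's actual internals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Illustrative sketch of the flush() contract (not the real producer code).
public class FlushSketch {
    private final List<CompletableFuture<Void>> inFlight = new ArrayList<>();

    // Enqueue a record; the returned future is completed later by an I/O thread.
    public synchronized CompletableFuture<Void> send(String record) {
        CompletableFuture<Void> f = new CompletableFuture<>();
        inFlight.add(f);
        return f;
    }

    public void flush() {
        List<CompletableFuture<Void>> snapshot;
        synchronized (this) {
            // Snapshot only what was enqueued before flush() was called;
            // records sent afterwards (on other threads) are not waited on.
            snapshot = new ArrayList<>(inFlight);
        }
        CompletableFuture.allOf(snapshot.toArray(new CompletableFuture[0])).join();
    }
}
```

The snapshot answers the ticket's open question in the affirmative: sends on other threads proceed during flush(), and the non-flushing path pays only the cost of adding a future to a list.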


[jira] [Updated] (KAFKA-1914) Count TotalProduceRequestRate and TotalFetchRequestRate in BrokerTopicMetrics

2015-03-11 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1914:
-
Fix Version/s: 0.8.3

> Count TotalProduceRequestRate and TotalFetchRequestRate in BrokerTopicMetrics
> -
>
> Key: KAFKA-1914
> URL: https://issues.apache.org/jira/browse/KAFKA-1914
> Project: Kafka
>  Issue Type: Sub-task
>  Components: core
>Reporter: Aditya A Auradkar
>Assignee: Aditya Auradkar
> Fix For: 0.8.3
>
> Attachments: KAFKA-1914.patch, KAFKA-1914.patch, 
> KAFKA-1914_2015-02-17_15:46:27.patch
>
>
> Currently the BrokerTopicMetrics only counts the failedProduceRequestRate and 
> the failedFetchRequestRate. We should add 2 metrics to count the overall 
> produce/fetch request rates.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1910) Refactor KafkaConsumer

2015-03-11 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1910:
-
Fix Version/s: 0.8.3

> Refactor KafkaConsumer
> --
>
> Key: KAFKA-1910
> URL: https://issues.apache.org/jira/browse/KAFKA-1910
> Project: Kafka
>  Issue Type: Sub-task
>  Components: consumer
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
> Fix For: 0.8.3
>
> Attachments: KAFKA-1910.patch, KAFKA-1910_2015-03-05_14:55:33.patch
>
>
> KafkaConsumer now contains all the logic on the consumer side, making it a 
> very large class file; it would be better to refactor it into multiple layers 
> on top of KafkaClient.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1959) Class CommitThread overwrite group of Thread class causing compile errors

2015-03-11 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1959:
-
Fix Version/s: 0.8.3

> Class CommitThread overwrite group of Thread class causing compile errors
> -
>
> Key: KAFKA-1959
> URL: https://issues.apache.org/jira/browse/KAFKA-1959
> Project: Kafka
>  Issue Type: Bug
>  Components: core
> Environment: scala 2.10.4
>Reporter: Tong Li
>Assignee: Tong Li
>  Labels: newbie
> Fix For: 0.8.3
>
> Attachments: KAFKA-1959.patch, compileError.png
>
>
> class CommitThread(id: Int, partitionCount: Int, commitIntervalMs: Long, 
> zkClient: ZkClient)
> extends ShutdownableThread("commit-thread")
> with KafkaMetricsGroup {
> private val group = "group-" + id
> The val group overrides the group member of class Thread, causing the 
> following compile error:
> overriding variable group in class Thread of type ThreadGroup;  value group 
> has weaker access privileges; it should not be private



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1925) Coordinator Node Id set to INT_MAX breaks coordinator metadata updates

2015-03-11 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1925:
-
Fix Version/s: 0.8.3

> Coordinator Node Id set to INT_MAX breaks coordinator metadata updates
> --
>
> Key: KAFKA-1925
> URL: https://issues.apache.org/jira/browse/KAFKA-1925
> Project: Kafka
>  Issue Type: Sub-task
>  Components: consumer
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
>Priority: Critical
> Fix For: 0.8.3
>
> Attachments: KAFKA-1925.v1.patch
>
>
> KafkaConsumer used INT_MAX to mimic a new socket for coordinator (details can 
> be found in KAFKA-1760). However, this behavior breaks the coordinator as the 
> underlying NetworkClient only used the node id to determine when to initiate 
> a new connection:
> {code}
> if (connectionStates.canConnect(node.id(), now))
> // if we are interested in sending to a node and we don't have a 
> connection to it, initiate one
> initiateConnect(node, now);
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
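The collision in KAFKA-1925 arises because NetworkClient keys connection state by node id alone, so reusing INT_MAX for every coordinator confuses it with a real connection. One fix, roughly the scheme Kafka's consumer later adopted (treat the exact details here as illustrative), is to derive a distinct synthetic id per coordinator that cannot clash with any real broker id:

```java
// Illustrative sketch: give the coordinator a synthetic node id derived from
// the broker id, so NetworkClient's per-node connection state never confuses
// the coordinator connection with the ordinary connection to the same broker.
public class CoordinatorId {
    // Broker ids are small non-negative ints, so Integer.MAX_VALUE - brokerId
    // yields a distinct, collision-free id per coordinator.
    public static int coordinatorNodeId(int brokerId) {
        return Integer.MAX_VALUE - brokerId;
    }
}
```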


[jira] [Updated] (KAFKA-1943) Producer request failure rate should not include MessageSetSizeTooLarge and MessageSizeTooLargeException

2015-03-11 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1943:
-
Fix Version/s: 0.8.3

> Producer request failure rate should not include MessageSetSizeTooLarge and 
> MessageSizeTooLargeException
> 
>
> Key: KAFKA-1943
> URL: https://issues.apache.org/jira/browse/KAFKA-1943
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Aditya A Auradkar
>Assignee: Aditya Auradkar
> Fix For: 0.8.3
>
> Attachments: KAFKA-1943.patch
>
>
> If MessageSetSizeTooLargeException or MessageSizeTooLargeException is thrown 
> from Log, then ReplicaManager counts it as a failed produce request. My 
> understanding is that this metric should only count failures as a result of 
> broker issues and not bad requests sent by the clients.
> If the message or message set is too large, then it is a client-side error 
> and should not be reported (similar to NotLeaderForPartitionException or 
> UnknownTopicOrPartitionException).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
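The classification KAFKA-1943 argues for (count only broker-side failures in the failed-request metric, not client mistakes) can be sketched as a simple predicate. The exception classes below are simplified stand-ins for Kafka's actual error types:

```java
// Illustrative sketch: exclude client-side mistakes from the broker's
// failed-request metric, which should reflect broker health only.
public class FailureClassifier {
    // Stand-ins for Kafka's real exception classes.
    static class MessageSizeTooLargeException extends RuntimeException {}
    static class MessageSetSizeTooLargeException extends RuntimeException {}
    static class NotLeaderForPartitionException extends RuntimeException {}

    // True only for errors indicating a broker-side problem, i.e. those that
    // should increment failedProduceRequestRate.
    public static boolean countsAsBrokerFailure(RuntimeException e) {
        return !(e instanceof MessageSizeTooLargeException
                || e instanceof MessageSetSizeTooLargeException
                || e instanceof NotLeaderForPartitionException);
    }
}
```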


[jira] [Updated] (KAFKA-1957) code doc typo

2015-03-11 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1957:
-
Fix Version/s: 0.8.3

> code doc typo
> -
>
> Key: KAFKA-1957
> URL: https://issues.apache.org/jira/browse/KAFKA-1957
> Project: Kafka
>  Issue Type: Improvement
>  Components: config
>Reporter: Yaguo Zhou
>Priority: Trivial
> Fix For: 0.8.3
>
> Attachments: 
> 0001-fix-typo-SO_SNDBUFF-SO_SNDBUF-SO_RCVBUFF-SO_RCVBUF.patch
>
>
> There is a doc typo in kafka.server.KafkaConfig.scala: SO_SNDBUFF should be 
> SO_SNDBUF and SO_RCVBUFF should be SO_RCVBUF.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1938) [doc] Quick start example should reference appropriate Kafka version

2015-03-11 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1938:
-
Fix Version/s: 0.8.3

> [doc] Quick start example should reference appropriate Kafka version
> 
>
> Key: KAFKA-1938
> URL: https://issues.apache.org/jira/browse/KAFKA-1938
> Project: Kafka
>  Issue Type: Improvement
>  Components: website
>Affects Versions: 0.8.2.0
>Reporter: Stevo Slavic
>Assignee: Manikumar Reddy
>Priority: Trivial
> Fix For: 0.8.3
>
> Attachments: lz4-compression.patch, remove-081-references-1.patch, 
> remove-081-references.patch
>
>
> Kafka 0.8.2.0 documentation, quick start example on 
> https://kafka.apache.org/documentation.html#quickstart in step 1 links and 
> instructs reader to download Kafka 0.8.1.1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1948) kafka.api.consumerTests are hanging

2015-03-11 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1948:
-
Fix Version/s: 0.8.3

> kafka.api.consumerTests are hanging
> ---
>
> Key: KAFKA-1948
> URL: https://issues.apache.org/jira/browse/KAFKA-1948
> Project: Kafka
>  Issue Type: Bug
>Reporter: Gwen Shapira
>Assignee: Guozhang Wang
> Fix For: 0.8.3
>
> Attachments: KAFKA-1948.patch
>
>
> Noticed today that very often when I run the full test suite, it hangs on 
> kafka.api.consumerTest (not always the same test though). It doesn't reproduce 
> 100% of the time, but enough to be very annoying.
> I also saw it happening on trunk after KAFKA-1333:
> https://builds.apache.org/view/All/job/Kafka-trunk/389/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1975) testGroupConsumption occasionally hang

2015-03-11 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1975:
-
Fix Version/s: 0.8.3

> testGroupConsumption occasionally hang
> --
>
> Key: KAFKA-1975
> URL: https://issues.apache.org/jira/browse/KAFKA-1975
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.3
>Reporter: Jun Rao
> Fix For: 0.8.3
>
> Attachments: stack.out
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1964) testPartitionReassignmentCallback hangs occasionally

2015-03-11 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1964:
-
Fix Version/s: 0.8.3

>  testPartitionReassignmentCallback hangs occasionally
> -
>
> Key: KAFKA-1964
> URL: https://issues.apache.org/jira/browse/KAFKA-1964
> Project: Kafka
>  Issue Type: Bug
>  Components: admin
>Affects Versions: 0.8.3
>Reporter: Jun Rao
>Assignee: Guozhang Wang
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: stack.out
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1960) .gitignore does not exclude test generated files and folders.

2015-03-11 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1960:
-
Fix Version/s: 0.8.3

> .gitignore does not exclude test generated files and folders.
> -
>
> Key: KAFKA-1960
> URL: https://issues.apache.org/jira/browse/KAFKA-1960
> Project: Kafka
>  Issue Type: Bug
>  Components: build
>Reporter: Tong Li
>Assignee: Tong Li
>Priority: Minor
>  Labels: newbie
> Fix For: 0.8.3
>
> Attachments: KAFKA-1960.patch
>
>
> gradle test can create quite a few folders; .gitignore should exclude these 
> files for an easier git submit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1969) NPE in unit test for new consumer

2015-03-11 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1969:
-
Fix Version/s: 0.8.3

> NPE in unit test for new consumer
> -
>
> Key: KAFKA-1969
> URL: https://issues.apache.org/jira/browse/KAFKA-1969
> Project: Kafka
>  Issue Type: Bug
>Reporter: Neha Narkhede
>Assignee: Guozhang Wang
>  Labels: newbie
> Fix For: 0.8.3
>
> Attachments: stack.out
>
>
> {code}
> kafka.api.ConsumerTest > testConsumptionWithBrokerFailures FAILED
> java.lang.NullPointerException
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.ensureCoordinatorReady(KafkaConsumer.java:1238)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.initiateCoordinatorRequest(KafkaConsumer.java:1189)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commit(KafkaConsumer.java:777)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commit(KafkaConsumer.java:816)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:704)
> at 
> kafka.api.ConsumerTest.consumeWithBrokerFailures(ConsumerTest.scala:167)
> at 
> kafka.api.ConsumerTest.testConsumptionWithBrokerFailures(ConsumerTest.scala:152)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1986) Producer request failure rate should not include InvalidMessageSizeException and OffsetOutOfRangeException

2015-03-11 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1986:
-
Fix Version/s: 0.8.3

> Producer request failure rate should not include InvalidMessageSizeException 
> and OffsetOutOfRangeException
> --
>
> Key: KAFKA-1986
> URL: https://issues.apache.org/jira/browse/KAFKA-1986
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Aditya Auradkar
>Assignee: Aditya Auradkar
> Fix For: 0.8.3
>
> Attachments: KAFKA-1986.patch
>
>
> InvalidMessageSizeException and OffsetOutOfRangeException should not be 
> counted as failedProduce and failedFetch requests since they are client-side 
> errors. They should be treated similarly to UnknownTopicOrPartitionException 
> or NotLeaderForPartitionException.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2001) OffsetCommitTest hang during setup

2015-03-11 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-2001:
-
Fix Version/s: 0.8.3

> OffsetCommitTest hang during setup
> --
>
> Key: KAFKA-2001
> URL: https://issues.apache.org/jira/browse/KAFKA-2001
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.3
>Reporter: Jun Rao
>Assignee: Joel Koshy
> Fix For: 0.8.3
>
>
> OffsetCommitTest seems to hang in trunk reliably. The following is the 
> stacktrace. It seems to hang during setup.
> "Test worker" prio=5 tid=7fb5ab154800 nid=0x11198e000 waiting on condition 
> [11198c000]
>java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at 
> kafka.server.OffsetCommitTest$$anonfun$setUp$2.apply(OffsetCommitTest.scala:59)
> at 
> kafka.server.OffsetCommitTest$$anonfun$setUp$2.apply(OffsetCommitTest.scala:58)
> at scala.collection.immutable.Stream.dropWhile(Stream.scala:825)
> at kafka.server.OffsetCommitTest.setUp(OffsetCommitTest.scala:58)
> at junit.framework.TestCase.runBare(TestCase.java:132)
> at junit.framework.TestResult$1.protect(TestResult.java:110)
> at junit.framework.TestResult.runProtected(TestResult.java:128)
> at junit.framework.TestResult.run(TestResult.java:113)
> at junit.framework.TestCase.run(TestCase.java:124)
> at junit.framework.TestSuite.runTest(TestSuite.java:232)
> at junit.framework.TestSuite.run(TestSuite.java:227)
> at 
> org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:91)
> at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.runTestClass(JUnitTestClassExecuter.java:86)
> at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.execute(JUnitTestClassExecuter.java:49)
> at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassProcessor.processTestClass(JUnitTestClassProcessor.java:69)
> at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:48)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2009) Fix UncheckedOffset.removeOffset synchronization and trace logging issue in mirror maker

2015-03-11 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-2009:
-
Fix Version/s: 0.8.3

> Fix UncheckedOffset.removeOffset synchronization and trace logging issue in 
> mirror maker
> 
>
> Key: KAFKA-2009
> URL: https://issues.apache.org/jira/browse/KAFKA-2009
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jiangjie Qin
>Assignee: Jiangjie Qin
> Fix For: 0.8.3
>
> Attachments: KAFKA-2009.patch, KAFKA-2009_2015-03-11_11:26:57.patch
>
>
> This ticket is to fix the mirror maker problem on current trunk that was 
> introduced in KAFKA-1650.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1546) Automate replica lag tuning

2015-03-11 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358063#comment-14358063
 ] 

Jay Kreps commented on KAFKA-1546:
--

Well iiuc the wonderfulness of this feature is that it actually doesn't add any 
new configs, it removes an old one that was impossible to set correctly and 
slightly modifies the meaning of an existing one to do what it sounds like it 
does. So I do think that for 99.5% of the world this will be like, wow, Kafka 
replication is much more robust.

I do think it is definitely a bug fix not a feature. But hey, I love me some 
KIPs, so I can't object to a nice write-up if you think it would be good to 
have.

> Automate replica lag tuning
> ---
>
> Key: KAFKA-1546
> URL: https://issues.apache.org/jira/browse/KAFKA-1546
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.0, 0.8.1, 0.8.1.1
>Reporter: Neha Narkhede
>Assignee: Aditya Auradkar
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1546.patch, KAFKA-1546_2015-03-11_18:48:09.patch
>
>
> Currently, there is no good way to tune the replica lag configs to 
> automatically account for high and low volume topics on the same cluster. 
> For the low-volume topic it will take a very long time to detect a lagging
> replica, and for the high-volume topic it will have false-positives.
> One approach to making this easier would be to have the configuration
> be something like replica.lag.max.ms and translate this into a number
> of messages dynamically based on the throughput of the partition.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1546) Automate replica lag tuning

2015-03-11 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1546:
-
Fix Version/s: 0.8.3

> Automate replica lag tuning
> ---
>
> Key: KAFKA-1546
> URL: https://issues.apache.org/jira/browse/KAFKA-1546
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.0, 0.8.1, 0.8.1.1
>Reporter: Neha Narkhede
>Assignee: Aditya Auradkar
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1546.patch, KAFKA-1546_2015-03-11_18:48:09.patch
>
>
> Currently, there is no good way to tune the replica lag configs to 
> automatically account for high and low volume topics on the same cluster. 
> For the low-volume topic it will take a very long time to detect a lagging
> replica, and for the high-volume topic it will have false-positives.
> One approach to making this easier would be to have the configuration
> be something like replica.lag.max.ms and translate this into a number
> of messages dynamically based on the throughput of the partition.





Re: 0.8.3 release plan

2015-03-11 Thread Gwen Shapira
If we are going in terms of features, I can see the following features
getting in in the next month or two:

* New consumer
* Improved Mirror Maker (I've seen tons of interest)
* Centralized admin requests (aka KIP-4)
* Nicer replica-reassignment tool
* SSL (and perhaps also SASL)?

I think this collection will make a nice release. Perhaps we can cap
it there and focus (as a community) on getting these in, so we can have a
release without too much scope creep in the not-very-distant future?
Even just 3 out of these 5 will still make a nice incremental
improvement.

Gwen


On Wed, Mar 11, 2015 at 8:29 PM, Jay Kreps  wrote:
> Yeah I'd be in favor of a quicker, smaller release but I think as long as
> we have these big things in flight we should probably keep the release
> criteria feature-based rather than time-based, though (e.g. "when X works"
> not "every other month").
>
> Ideally the next release would have at least a "beta" version of the new
> consumer. I think having a new hunk of code like that available but marked
> as "beta" is maybe a good way to go, as it gets it into people's hands for
> testing. This way we can declare the API not fully locked down until the
> final release too, since mostly users only look at stuff after we release
> it. Maybe we can try to construct a schedule around this?
>
> -Jay
>
>
> On Wed, Mar 11, 2015 at 7:55 PM, Joe Stein  wrote:
>
>> There hasn't been any public discussion about the 0.8.3 release plan.
>>
>> There seems to be a lot of work in flight, work with patches and review
>> that could/should get committed but is now just pending KIPs, work without a KIP
>> but that is in trunk already (e.g. the new Consumer) that would be in the
>> release but is missing the KIP for the release...
>>
>> What does this mean for the 0.8.3 release? What are we trying to get out
>> and when?
>>
>> Also looking at
>> https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan
>> there
>> seems to be things we are getting earlier (which is great of course) so are
>> we going to try to up the version and go with 0.9.0?
>>
>> 0.8.2.0 ended up getting very bloated, and that delayed it much longer than
>> we had originally communicated to the community; we want to make sure we
>> take that feedback and try to improve upon it.
>>
>> Thanks!
>>
>> ~ Joe Stein
>> - - - - - - - - - - - - - - - - -
>>
>>   http://www.stealth.ly
>> - - - - - - - - - - - - - - - - -
>>


[jira] [Commented] (KAFKA-1546) Automate replica lag tuning

2015-03-11 Thread Joe Stein (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358058#comment-14358058
 ] 

Joe Stein commented on KAFKA-1546:
--

We could also mark the JIRA as a bug instead of an improvement.

> Automate replica lag tuning
> ---
>
> Key: KAFKA-1546
> URL: https://issues.apache.org/jira/browse/KAFKA-1546
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.0, 0.8.1, 0.8.1.1
>Reporter: Neha Narkhede
>Assignee: Aditya Auradkar
>  Labels: newbie++
> Attachments: KAFKA-1546.patch, KAFKA-1546_2015-03-11_18:48:09.patch
>
>
> Currently, there is no good way to tune the replica lag configs to 
> automatically account for high and low volume topics on the same cluster. 
> For the low-volume topic it will take a very long time to detect a lagging
> replica, and for the high-volume topic it will have false-positives.
> One approach to making this easier would be to have the configuration
> be something like replica.lag.max.ms and translate this into a number
> of messages dynamically based on the throughput of the partition.





Re: [VOTE] KIP-15 add a close method with timeout to KafkaProducer

2015-03-11 Thread Jay Kreps
Yeah guys, I'd like to second that. I'd really really love to get the
quality of these to the point where we could broadly solicit user input and
use them as a permanent document of the alternatives and rationale.

I know it is a little painful to have process, but I think we all saw what
happened to the previous clients as public interfaces so I really really
really want us to just be incredibly thoughtful and disciplined as we make
changes. I think we all want to avoid another "client rewrite".

To second Joe's question in a more specific way, I think an alternative I
don't see considered to give close() a bounded time is just to enforce the
request time on the client side, which will cause all requests to be failed
after the request timeout expires. This was the same behavior as for flush.
In the case where the user just wants to ensure close doesn't block forever
I think that may be sufficient?

So one alternative might be to just do that request timeout feature and add
a new producer config that is something like
  abort.on.failure=false
which causes the producer to hard exit if it can't send a request. Which I
think is closer to what you want, with this just being a way to implement
that behavior.

I'm not sure if this is better or worse, but we should be sure before we
make the change.

I also have a concern about
  producer.close(0, TimeUnit.MILLISECONDS)
not meaning close with a timeout of 0 ms.

I realize this exists in other Java APIs, but it is so confusing that it even
led us into that recent producer bug, because of course all the
other numbers mean "wait that long".

I'd propose
  close()--block until all completed
  close(0, TimeUnit.MILLISECONDS)--block for 0 ms
  close(5, TimeUnit.MILLISECONDS)--block for 5 ms
  close(-1, TimeUnit.MILLISECONDS)--error because blocking for negative ms
would mean completing in the past :-)
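Jay's proposed semantics can be illustrated with a small sketch. This is hypothetical Python, not the actual KafkaProducer API; the string return values merely label the behavior in each case:

```python
def close(timeout_ms=None):
    """Illustrative close() semantics as proposed in the thread:
    no argument = block until all requests complete; 0 = do not block
    at all; any positive value = block at most that long; negative = error."""
    if timeout_ms is None:
        return "block until all completed"
    if timeout_ms < 0:
        # Blocking for a negative duration would mean completing in the past.
        raise ValueError("timeout must be non-negative")
    if timeout_ms == 0:
        return "block for 0 ms"
    return f"block for at most {timeout_ms} ms"
```

The point of the proposal is that the timeout argument is always taken literally, so `close(0, MILLISECONDS)` is not a special "wait forever" flag the way it is in some other Java APIs.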

-Jay

On Wed, Mar 11, 2015 at 8:31 PM, Joe Stein  wrote:

> Could the KIP Confluence page please be updated with the discussion thread
> link? Thanks. Could you also remove the template boilerplate at the top ("*This
> page is meant as a template ..*") so we can capture it cleanly for the
> release.
>
> Also I don't really/fully understand how this is different than
> flush(time); close() and why close has its own timeout also?
>
> Lastly, what is the forceClose flag? This isn't documented in the public
> interface so it isn't clear how to completely use the feature just by
> reading the KIP.
>
> ~ Joe Stein
> - - - - - - - - - - - - - - - - -
>
>   http://www.stealth.ly
> - - - - - - - - - - - - - - - - -
>
> On Wed, Mar 11, 2015 at 11:24 PM, Guozhang Wang 
> wrote:
>
> > +1 (binding)
> >
> > On Wed, Mar 11, 2015 at 8:10 PM, Jiangjie Qin  >
> > wrote:
> >
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-15+-+Add+a+close+method+with+a+timeout+in+the+producer
> > >
> > >
> >
> >
> > --
> > -- Guozhang
> >
>


[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-11 Thread Joe Stein (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358056#comment-14358056
 ] 

Joe Stein commented on KAFKA-1461:
--

I personally think it is overkill, but I bring it up because it seems to be
required for other changes, so I am just asking the question. If we are using the
KIP to help folks understand the reasoning behind changes, then we should do that
and be complete, or not do it at all.

> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1461.patch, KAFKA-1461.patch, 
> KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.
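The fix discussed in this thread (KAFKA-1461 and the new replica.fetch.backoff.ms config) amounts to sleeping between failed fetch attempts instead of spinning. A rough Python sketch under those assumptions — the `fetch` callable and `max_attempts` bound are invented for illustration, and the real fetcher thread (Scala) loops until shutdown rather than giving up:

```python
import time

def fetch_loop(fetch, backoff_ms=1000, max_attempts=5):
    """Retry fetch with a fixed back-off on error, instead of a tight loop."""
    attempts = 0
    while attempts < max_attempts:
        try:
            return fetch()
        except ConnectionError:
            attempts += 1
            # Back off so a refused connection doesn't burn CPU by retrying
            # thousands of times per second.
            time.sleep(backoff_ms / 1000.0)
    raise RuntimeError("fetch failed after %d attempts" % max_attempts)
```

With this shape, a broker whose leader is temporarily unreachable generates one retry per back-off interval rather than an unbounded stream of failed connections.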





[jira] [Commented] (KAFKA-1546) Automate replica lag tuning

2015-03-11 Thread Joe Stein (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358054#comment-14358054
 ] 

Joe Stein commented on KAFKA-1546:
--

If folks are going to read the KIPs to understand what features went into a
release and why, and this isn't there, I think that would be odd. How will they
know what to do with the setting? Granted, that is what JIRA is for too, if
folks read the JIRAs for a release, but that isn't how the KIPs have been
discussed and used so far, regardless of what I think here.

> Automate replica lag tuning
> ---
>
> Key: KAFKA-1546
> URL: https://issues.apache.org/jira/browse/KAFKA-1546
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.0, 0.8.1, 0.8.1.1
>Reporter: Neha Narkhede
>Assignee: Aditya Auradkar
>  Labels: newbie++
> Attachments: KAFKA-1546.patch, KAFKA-1546_2015-03-11_18:48:09.patch
>
>
> Currently, there is no good way to tune the replica lag configs to 
> automatically account for high and low volume topics on the same cluster. 
> For the low-volume topic it will take a very long time to detect a lagging
> replica, and for the high-volume topic it will have false-positives.
> One approach to making this easier would be to have the configuration
> be something like replica.lag.max.ms and translate this into a number
> of messages dynamically based on the throughput of the partition.





Re: [VOTE] KIP-15 add a close method with timeout to KafkaProducer

2015-03-11 Thread Joe Stein
Could the KIP Confluence page please be updated with the discussion thread
link? Thanks. Could you also remove the template boilerplate at the top ("*This
page is meant as a template ..*") so we can capture it cleanly for the
release.

Also I don't really/fully understand how this is different than
flush(time); close() and why close has its own timeout also?

Lastly, what is the forceClose flag? This isn't documented in the public
interface so it isn't clear how to completely use the feature just by
reading the KIP.

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Wed, Mar 11, 2015 at 11:24 PM, Guozhang Wang  wrote:

> +1 (binding)
>
> On Wed, Mar 11, 2015 at 8:10 PM, Jiangjie Qin 
> wrote:
>
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-15+-+Add+a+close+method+with+a+timeout+in+the+producer
> >
> >
>
>
> --
> -- Guozhang
>


Re: [VOTE] KIP-15 add a close method with timeout to KafkaProducer

2015-03-11 Thread Harsha
+1 non-binding
-Harsha

On Wed, Mar 11, 2015, at 08:24 PM, Guozhang Wang wrote:
> +1 (binding)
> 
> On Wed, Mar 11, 2015 at 8:10 PM, Jiangjie Qin 
> wrote:
> 
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-15+-+Add+a+close+method+with+a+timeout+in+the+producer
> >
> >
> 
> 
> -- 
> -- Guozhang


Re: 0.8.3 release plan

2015-03-11 Thread Jay Kreps
Yeah I'd be in favor of a quicker, smaller release but I think as long as
we have these big things in flight we should probably keep the release
criteria feature-based rather than time-based, though (e.g. "when X works"
not "every other month").

Ideally the next release would have at least a "beta" version of the new
consumer. I think having a new hunk of code like that available but marked
as "beta" is maybe a good way to go, as it gets it into people's hands for
testing. This way we can declare the API not fully locked down until the
final release too, since mostly users only look at stuff after we release
it. Maybe we can try to construct a schedule around this?

-Jay


On Wed, Mar 11, 2015 at 7:55 PM, Joe Stein  wrote:

> There hasn't been any public discussion about the 0.8.3 release plan.
>
> There seems to be a lot of work in flight, work with patches and review
> that could/should get committed but is now just pending KIPs, work without a KIP
> but that is in trunk already (e.g. the new Consumer) that would be in the
> release but is missing the KIP for the release...
>
> What does this mean for the 0.8.3 release? What are we trying to get out
> and when?
>
> Also looking at
> https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan
> there
> seems to be things we are getting earlier (which is great of course) so are
> we going to try to up the version and go with 0.9.0?
>
> 0.8.2.0 ended up getting very bloated, and that delayed it much longer than
> we had originally communicated to the community; we want to make sure we
> take that feedback and try to improve upon it.
>
> Thanks!
>
> ~ Joe Stein
> - - - - - - - - - - - - - - - - -
>
>   http://www.stealth.ly
> - - - - - - - - - - - - - - - - -
>


Re: [VOTE] KIP-15 add a close method with timeout to KafkaProducer

2015-03-11 Thread Guozhang Wang
+1 (binding)

On Wed, Mar 11, 2015 at 8:10 PM, Jiangjie Qin 
wrote:

>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-15+-+Add+a+close+method+with+a+timeout+in+the+producer
>
>


-- 
-- Guozhang


Re: Review Request 31967: Patch for KAFKA-1546

2015-03-11 Thread Aditya Auradkar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31967/
---

(Updated March 12, 2015, 3:17 a.m.)


Review request for kafka.


Bugs: KAFKA-1546
https://issues.apache.org/jira/browse/KAFKA-1546


Repository: kafka


Description (updated)
---

PATCH for KAFKA-1546

Brief summary of changes:
- Added a lagBegin metric inside Replica to track, in time, how long the replica 
has not read up to the LEO
- Using the lag begin value in the check for ISR expansion and shrinkage
- Removed the max lag messages config since it is no longer necessary
- Returning the initialLogEndOffset in LogReadResult, corresponding to the 
LEO before actually reading from the log.
- Unit test cases to test ISR shrinkage and expansion
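The initialLogEndOffset point is subtle: the LEO can advance while a read is in progress, so the caught-up check must compare a follower's fetch against the LEO snapshotted before the read. An illustrative Python sketch — the dict-based "log" and the function names are invented for illustration, not the actual LogReadResult API:

```python
def read_from_log(log, appends_during_read=2):
    """Snapshot the LEO before reading; producers may append concurrently."""
    initial_leo = log["leo"]             # LEO captured before the read
    records = list(log["records"])       # the read itself
    log["leo"] += appends_during_read    # simulate appends racing the read
    return {"initial_leo": initial_leo, "records": records}

def follower_caught_up(fetch_offset, read_result):
    # Compare against the pre-read LEO; using the current (advanced) LEO
    # would wrongly mark a fully caught-up follower as lagging.
    return fetch_offset >= read_result["initial_leo"]

log = {"leo": 100, "records": []}
result = read_from_log(log)
print(follower_caught_up(100, result))  # True, although log["leo"] is now 102
```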


Diffs
-

  core/src/main/scala/kafka/cluster/Partition.scala 
c4bf48a801007ebe7497077d2018d6dffe1677d4 
  core/src/main/scala/kafka/cluster/Replica.scala 
bd13c20338ce3d73113224440e858a12814e5adb 
  core/src/main/scala/kafka/server/KafkaConfig.scala 
48e33626695ad8a28b0018362ac225f11df94973 
  core/src/main/scala/kafka/server/ReplicaManager.scala 
c5274822c57bf3c1f9e4135c0bdcaa87ee50ce20 
  core/src/test/scala/unit/kafka/server/ISRExpirationTest.scala 
92152358c95fa9178d71bd1c079af0a0bd8f1da8 
  core/src/test/scala/unit/kafka/server/KafkaConfigConfigDefTest.scala 
c124c8df5b5079e5ffbd0c4ea359562a66aaf317 
  core/src/test/scala/unit/kafka/server/LogRecoveryTest.scala 
92d6b2c672f74cdd526f2e98da8f7fb3696a88e3 
  core/src/test/scala/unit/kafka/server/SimpleFetchTest.scala 
efb457334bd87e4c5b0dd2c66ae1995993cd0bc1 

Diff: https://reviews.apache.org/r/31967/diff/


Testing
---


Thanks,

Aditya Auradkar



[VOTE] KIP-15 add a close method with timeout to KafkaProducer

2015-03-11 Thread Jiangjie Qin
https://cwiki.apache.org/confluence/display/KAFKA/KIP-15+-+Add+a+close+method+with+a+timeout+in+the+producer



[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-11 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358040#comment-14358040
 ] 

Sriharsha Chintalapani commented on KAFKA-1461:
---

[~charmalloc] since there aren't any interface changes, I am not sure a KIP 
is necessary. Of course, we did add a new config, replica.fetch.backoff.ms. If 
this warrants a KIP, then I can write one up.

> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1461.patch, KAFKA-1461.patch, 
> KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.





[jira] [Commented] (KAFKA-1546) Automate replica lag tuning

2015-03-11 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358039#comment-14358039
 ] 

Jay Kreps commented on KAFKA-1546:
--

Personally I don't think it really needs a KIP; it subtly changes the meaning 
of one config, but it actually changes it to mean what everyone thinks it 
currently means. What do you think? I think this one is less about user 
expectations or our opinions and more about "does it actually work". Speaking 
of which...

[~auradkar] What is the test plan for this? It is trivially easy to reproduce 
the problems with the old approach. Start a server with default settings and 
1-2 replicas and use the perf test to generate a ton of load with itty bitty 
messages and just watch the replicas drop in and out of sync. We should concoct 
the most brutal case of this and validate that unless the follower actually 
falls behind it never failure detects out of the ISR. But we also need to check 
the reverse condition, that both a soft death and a lag are still detected. You 
can cause a soft death by setting the zk session timeout to something massive 
and just using unix signals to pause the process. You can cause lag by just 
running some commands on one of the followers to eat up all the cpu or I/O 
while a load test is running until the follower falls behind. Both cases should 
get caught.

Anyhow, awesome job getting this done. I think this is one of the biggest 
stability issues in Kafka right now. The patch lgtm, but it would be good for 
[~junrao] and [~nehanarkhede] to take a look.



> Automate replica lag tuning
> ---
>
> Key: KAFKA-1546
> URL: https://issues.apache.org/jira/browse/KAFKA-1546
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.0, 0.8.1, 0.8.1.1
>Reporter: Neha Narkhede
>Assignee: Aditya Auradkar
>  Labels: newbie++
> Attachments: KAFKA-1546.patch, KAFKA-1546_2015-03-11_18:48:09.patch
>
>
> Currently, there is no good way to tune the replica lag configs to 
> automatically account for high and low volume topics on the same cluster. 
> For the low-volume topic it will take a very long time to detect a lagging
> replica, and for the high-volume topic it will have false-positives.
> One approach to making this easier would be to have the configuration
> be something like replica.lag.max.ms and translate this into a number
> of messages dynamically based on the throughput of the partition.





0.8.3 release plan

2015-03-11 Thread Joe Stein
There hasn't been any public discussion about the 0.8.3 release plan.

There seems to be a lot of work in flight, work with patches and review
that could/should get committed but is now just pending KIPs, work without a KIP
but that is in trunk already (e.g. the new Consumer) that would be in the
release but is missing the KIP for the release...

What does this mean for the 0.8.3 release? What are we trying to get out
and when?

Also looking at
https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan there
seems to be things we are getting earlier (which is great of course) so are
we going to try to up the version and go with 0.9.0?

0.8.2.0 ended up getting very bloated, and that delayed it much longer than
we had originally communicated to the community; we want to make sure we
take that feedback and try to improve upon it.

Thanks!

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -


[jira] [Commented] (KAFKA-1546) Automate replica lag tuning

2015-03-11 Thread Joe Stein (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358017#comment-14358017
 ] 

Joe Stein commented on KAFKA-1546:
--

Shouldn't we have a KIP for this? It seems like we are changing/adding public 
features that will affect folks.

> Automate replica lag tuning
> ---
>
> Key: KAFKA-1546
> URL: https://issues.apache.org/jira/browse/KAFKA-1546
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.0, 0.8.1, 0.8.1.1
>Reporter: Neha Narkhede
>Assignee: Aditya Auradkar
>  Labels: newbie++
> Attachments: KAFKA-1546.patch, KAFKA-1546_2015-03-11_18:48:09.patch
>
>
> Currently, there is no good way to tune the replica lag configs to 
> automatically account for high and low volume topics on the same cluster. 
> For the low-volume topic it will take a very long time to detect a lagging
> replica, and for the high-volume topic it will have false-positives.
> One approach to making this easier would be to have the configuration
> be something like replica.lag.max.ms and translate this into a number
> of messages dynamically based on the throughput of the partition.





[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-11 Thread Joe Stein (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358018#comment-14358018
 ] 

Joe Stein commented on KAFKA-1461:
--

Shouldn't there be a KIP for this?

> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1461.patch, KAFKA-1461.patch, 
> KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.





[jira] [Commented] (KAFKA-1997) Refactor Mirror Maker

2015-03-11 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357993#comment-14357993
 ] 

Jiangjie Qin commented on KAFKA-1997:
-

Updated reviewboard https://reviews.apache.org/r/31706/diff/
 against branch origin/trunk

> Refactor Mirror Maker
> -
>
> Key: KAFKA-1997
> URL: https://issues.apache.org/jira/browse/KAFKA-1997
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jiangjie Qin
>Assignee: Jiangjie Qin
> Attachments: KAFKA-1997.patch, KAFKA-1997_2015-03-03_16:28:46.patch, 
> KAFKA-1997_2015-03-04_15:07:46.patch, KAFKA-1997_2015-03-04_15:42:45.patch, 
> KAFKA-1997_2015-03-05_20:14:58.patch, KAFKA-1997_2015-03-09_18:55:54.patch, 
> KAFKA-1997_2015-03-10_18:31:34.patch, KAFKA-1997_2015-03-11_15:20:18.patch, 
> KAFKA-1997_2015-03-11_19:10:53.patch
>
>
> Refactor mirror maker based on KIP-3





[jira] [Updated] (KAFKA-1997) Refactor Mirror Maker

2015-03-11 Thread Jiangjie Qin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiangjie Qin updated KAFKA-1997:

Attachment: KAFKA-1997_2015-03-11_19:10:53.patch

> Refactor Mirror Maker
> -
>
> Key: KAFKA-1997
> URL: https://issues.apache.org/jira/browse/KAFKA-1997
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jiangjie Qin
>Assignee: Jiangjie Qin
> Attachments: KAFKA-1997.patch, KAFKA-1997_2015-03-03_16:28:46.patch, 
> KAFKA-1997_2015-03-04_15:07:46.patch, KAFKA-1997_2015-03-04_15:42:45.patch, 
> KAFKA-1997_2015-03-05_20:14:58.patch, KAFKA-1997_2015-03-09_18:55:54.patch, 
> KAFKA-1997_2015-03-10_18:31:34.patch, KAFKA-1997_2015-03-11_15:20:18.patch, 
> KAFKA-1997_2015-03-11_19:10:53.patch
>
>
> Refactor mirror maker based on KIP-3





Re: Review Request 31706: Patch for KAFKA-1997

2015-03-11 Thread Jiangjie Qin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31706/
---

(Updated March 12, 2015, 2:10 a.m.)


Review request for kafka.


Bugs: KAFKA-1997
https://issues.apache.org/jira/browse/KAFKA-1997


Repository: kafka


Description (updated)
---

Addressed Guozhang's comments.


Changed the exit behavior on send failure because close(0) is not ready yet. 
Will submit followup patch after KAFKA-1660 is checked in.


Expanded imports from _ and * to full class path


Incorporated Joel's comments.


Addressed Joel's comments.


Addressed Guozhang's comments.


Incorporated Guozhang's comments.


Fix a transient bug in ZookeeperConsumerConnectTest


Diffs (updated)
-

  
clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordAccumulator.java
 d5c79e2481d5e9a2524ac2ef6a6879f61cb7cb5f 
  core/src/main/scala/kafka/consumer/PartitionAssignor.scala 
e6ff7683a0df4a7d221e949767e57c34703d5aad 
  core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala 
5487259751ebe19f137948249aa1fd2637d2deb4 
  core/src/main/scala/kafka/javaapi/consumer/ConsumerRebalanceListener.java 
7f45a90ba6676290172b7da54c15ee5dc1a42a2e 
  core/src/main/scala/kafka/tools/MirrorMaker.scala 
bafa379ff57bc46458ea8409406f5046dc9c973e 
  core/src/test/scala/unit/kafka/consumer/PartitionAssignorTest.scala 
543070f4fd3e96f3183cae9ee2ccbe843409ee58 
  core/src/test/scala/unit/kafka/consumer/ZookeeperConsumerConnectorTest.scala 
19640cc55b5baa0a26a808d708b7f4caf491c9f0 

Diff: https://reviews.apache.org/r/31706/diff/


Testing
---


Thanks,

Jiangjie Qin



Re: Review Request 31706: Patch for KAFKA-1997

2015-03-11 Thread Jiangjie Qin


> On March 12, 2015, 1:22 a.m., Guozhang Wang wrote:
> > Hit this unit test failure, is this relevant?
> > 
> > --
> > 
> > kafka.consumer.ZookeeperConsumerConnectorTest > 
> > testConsumerRebalanceListener FAILED
> > junit.framework.AssertionFailedError: 
> > expected: but 
> > was:
> > at junit.framework.Assert.fail(Assert.java:47)
> > at junit.framework.Assert.failNotEquals(Assert.java:277)
> > at junit.framework.Assert.assertEquals(Assert.java:64)
> > at junit.framework.Assert.assertEquals(Assert.java:71)
> > at 
> > kafka.consumer.ZookeeperConsumerConnectorTest.testConsumerRebalanceListener(ZookeeperConsumerConnectorTest.scala:393)

It is a bug, but irrelevant to this patch I believe. The reason is that consumer 
2 finishes its rebalance before consumer 1 does, so ZooKeeper did not yet have the 
complete ownership info when we did the assertion. I fixed this by consuming 
one message from consumer 1 before checking the ownership info in ZooKeeper.


- Jiangjie


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31706/#review76189
---


On March 11, 2015, 10:20 p.m., Jiangjie Qin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31706/
> ---
> 
> (Updated March 11, 2015, 10:20 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1997
> https://issues.apache.org/jira/browse/KAFKA-1997
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> Addressed Guozhang's comments.
> 
> 
> Changed the exit behavior on send failure because close(0) is not ready yet. 
> Will submit followup patch after KAFKA-1660 is checked in.
> 
> 
> Expanded imports from _ and * to full class path
> 
> 
> Incorporated Joel's comments.
> 
> 
> Addressed Joel's comments.
> 
> 
> Addressed Guozhang's comments.
> 
> 
> Incorporated Guozhang's comments.
> 
> 
> Diffs
> -
> 
>   
> clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordAccumulator.java
>  d5c79e2481d5e9a2524ac2ef6a6879f61cb7cb5f 
>   core/src/main/scala/kafka/consumer/PartitionAssignor.scala 
> e6ff7683a0df4a7d221e949767e57c34703d5aad 
>   core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala 
> 5487259751ebe19f137948249aa1fd2637d2deb4 
>   core/src/main/scala/kafka/javaapi/consumer/ConsumerRebalanceListener.java 
> 7f45a90ba6676290172b7da54c15ee5dc1a42a2e 
>   core/src/main/scala/kafka/tools/MirrorMaker.scala 
> bafa379ff57bc46458ea8409406f5046dc9c973e 
>   core/src/test/scala/unit/kafka/consumer/PartitionAssignorTest.scala 
> 543070f4fd3e96f3183cae9ee2ccbe843409ee58 
>   
> core/src/test/scala/unit/kafka/consumer/ZookeeperConsumerConnectorTest.scala 
> 19640cc55b5baa0a26a808d708b7f4caf491c9f0 
> 
> Diff: https://reviews.apache.org/r/31706/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Jiangjie Qin
> 
>



Re: Review Request 31967: Patch for KAFKA-1546

2015-03-11 Thread Aditya Auradkar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31967/
---

(Updated March 12, 2015, 1:48 a.m.)


Review request for kafka.


Bugs: KAFKA-1546
https://issues.apache.org/jira/browse/KAFKA-1546


Repository: kafka


Description (updated)
---

PATCH for KAFKA-1546


Diffs (updated)
-

  core/src/main/scala/kafka/cluster/Partition.scala 
c4bf48a801007ebe7497077d2018d6dffe1677d4 
  core/src/main/scala/kafka/cluster/Replica.scala 
bd13c20338ce3d73113224440e858a12814e5adb 
  core/src/main/scala/kafka/server/KafkaConfig.scala 
48e33626695ad8a28b0018362ac225f11df94973 
  core/src/main/scala/kafka/server/ReplicaManager.scala 
c5274822c57bf3c1f9e4135c0bdcaa87ee50ce20 
  core/src/test/scala/unit/kafka/server/ISRExpirationTest.scala 
92152358c95fa9178d71bd1c079af0a0bd8f1da8 
  core/src/test/scala/unit/kafka/server/KafkaConfigConfigDefTest.scala 
c124c8df5b5079e5ffbd0c4ea359562a66aaf317 
  core/src/test/scala/unit/kafka/server/LogRecoveryTest.scala 
92d6b2c672f74cdd526f2e98da8f7fb3696a88e3 
  core/src/test/scala/unit/kafka/server/SimpleFetchTest.scala 
efb457334bd87e4c5b0dd2c66ae1995993cd0bc1 

Diff: https://reviews.apache.org/r/31967/diff/


Testing
---


Thanks,

Aditya Auradkar



[jira] [Commented] (KAFKA-1546) Automate replica lag tuning

2015-03-11 Thread Aditya A Auradkar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357956#comment-14357956
 ] 

Aditya A Auradkar commented on KAFKA-1546:
--

Updated reviewboard https://reviews.apache.org/r/31967/diff/
 against branch origin/trunk

> Automate replica lag tuning
> ---
>
> Key: KAFKA-1546
> URL: https://issues.apache.org/jira/browse/KAFKA-1546
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.0, 0.8.1, 0.8.1.1
>Reporter: Neha Narkhede
>Assignee: Aditya Auradkar
>  Labels: newbie++
> Attachments: KAFKA-1546.patch, KAFKA-1546_2015-03-11_18:48:09.patch
>
>
> Currently, there is no good way to tune the replica lag configs to 
> automatically account for high and low volume topics on the same cluster. 
> For the low-volume topic it will take a very long time to detect a lagging
> replica, and for the high-volume topic it will have false-positives.
> One approach to making this easier would be to have the configuration
> be something like replica.lag.max.ms and translate this into a number
> of messages dynamically based on the throughput of the partition.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1546) Automate replica lag tuning

2015-03-11 Thread Aditya A Auradkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya A Auradkar updated KAFKA-1546:
-
Attachment: KAFKA-1546_2015-03-11_18:48:09.patch

> Automate replica lag tuning
> ---
>
> Key: KAFKA-1546
> URL: https://issues.apache.org/jira/browse/KAFKA-1546
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.0, 0.8.1, 0.8.1.1
>Reporter: Neha Narkhede
>Assignee: Aditya Auradkar
>  Labels: newbie++
> Attachments: KAFKA-1546.patch, KAFKA-1546_2015-03-11_18:48:09.patch
>
>
> Currently, there is no good way to tune the replica lag configs to 
> automatically account for high and low volume topics on the same cluster. 
> For the low-volume topic it will take a very long time to detect a lagging
> replica, and for the high-volume topic it will have false-positives.
> One approach to making this easier would be to have the configuration
> be something like replica.lag.max.ms and translate this into a number
> of messages dynamically based on the throughput of the partition.





Re: Review Request 31967: Patch for KAFKA-1546

2015-03-11 Thread Aditya Auradkar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31967/
---

(Updated March 12, 2015, 1:39 a.m.)


Review request for kafka.


Bugs: KAFKA-1546
https://issues.apache.org/jira/browse/KAFKA-1546


Repository: kafka


Description (updated)
---

Patch for KAFKA-1546
Brief summary of changes:
- Added a lagBegin metric inside Replica to track the lag in terms of time 
since the replica last read up to the LEO
- Using the lag begin value in the check for ISR expand and shrink
- Removed the max lag messages config since it is no longer necessary
- Returning the initialLogEndOffset in LogReadResult corresponding to the 
LEO before actually reading from the log.
- Unit test cases to test ISR shrinkage and expansion
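
The time-based check summarized above can be sketched as a tiny stand-alone 
model (hypothetical names such as updateFetch/isOutOfSync; the actual patch 
lives in Replica.scala and Partition.scala):

```java
// Sketch of time-based replica lag detection (hypothetical names, not the
// actual KAFKA-1546 patch): the leader records the last time a follower was
// caught up to its log end offset (LEO); the ISR shrink check then compares
// that timestamp against a single time-based config instead of a message count.
public class ReplicaLagSketch {
    private long lastCaughtUpTimeMs;

    public ReplicaLagSketch(long nowMs) {
        this.lastCaughtUpTimeMs = nowMs;
    }

    // Called when the leader processes a fetch request from this replica.
    public void updateFetch(long fetchOffset, long leaderLeo, long nowMs) {
        if (fetchOffset >= leaderLeo) {
            // The replica has read up to the LEO: reset the lag clock.
            this.lastCaughtUpTimeMs = nowMs;
        }
    }

    // ISR shrink check: out of sync if not caught up within maxLagMs.
    public boolean isOutOfSync(long nowMs, long maxLagMs) {
        return nowMs - lastCaughtUpTimeMs > maxLagMs;
    }
}
```

This is why the throughput of the partition no longer matters: a low-volume 
topic's follower stays "caught up" as long as it keeps reaching the LEO, while 
a high-volume follower gets the same time budget to catch up.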


Diffs
-

  core/src/main/scala/kafka/cluster/Partition.scala 
c4bf48a801007ebe7497077d2018d6dffe1677d4 
  core/src/main/scala/kafka/cluster/Replica.scala 
bd13c20338ce3d73113224440e858a12814e5adb 
  core/src/main/scala/kafka/log/Log.scala 
06b8ecc5d11a1acfbaf3c693c42bf3ce5b2cd86d 
  core/src/main/scala/kafka/server/KafkaConfig.scala 
48e33626695ad8a28b0018362ac225f11df94973 
  core/src/main/scala/kafka/server/ReplicaManager.scala 
c5274822c57bf3c1f9e4135c0bdcaa87ee50ce20 
  core/src/test/scala/unit/kafka/server/ISRExpirationTest.scala 
92152358c95fa9178d71bd1c079af0a0bd8f1da8 
  core/src/test/scala/unit/kafka/server/KafkaConfigConfigDefTest.scala 
c124c8df5b5079e5ffbd0c4ea359562a66aaf317 
  core/src/test/scala/unit/kafka/server/LogRecoveryTest.scala 
92d6b2c672f74cdd526f2e98da8f7fb3696a88e3 
  core/src/test/scala/unit/kafka/server/SimpleFetchTest.scala 
efb457334bd87e4c5b0dd2c66ae1995993cd0bc1 

Diff: https://reviews.apache.org/r/31967/diff/


Testing
---


Thanks,

Aditya Auradkar



[jira] [Updated] (KAFKA-1546) Automate replica lag tuning

2015-03-11 Thread Aditya A Auradkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya A Auradkar updated KAFKA-1546:
-
Attachment: KAFKA-1546.patch

> Automate replica lag tuning
> ---
>
> Key: KAFKA-1546
> URL: https://issues.apache.org/jira/browse/KAFKA-1546
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.0, 0.8.1, 0.8.1.1
>Reporter: Neha Narkhede
>Assignee: Aditya Auradkar
>  Labels: newbie++
> Attachments: KAFKA-1546.patch
>
>
> Currently, there is no good way to tune the replica lag configs to 
> automatically account for high and low volume topics on the same cluster. 
> For the low-volume topic it will take a very long time to detect a lagging
> replica, and for the high-volume topic it will have false-positives.
> One approach to making this easier would be to have the configuration
> be something like replica.lag.max.ms and translate this into a number
> of messages dynamically based on the throughput of the partition.





[jira] [Commented] (KAFKA-1546) Automate replica lag tuning

2015-03-11 Thread Aditya A Auradkar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357947#comment-14357947
 ] 

Aditya A Auradkar commented on KAFKA-1546:
--

Created reviewboard https://reviews.apache.org/r/31967/diff/
 against branch origin/trunk

> Automate replica lag tuning
> ---
>
> Key: KAFKA-1546
> URL: https://issues.apache.org/jira/browse/KAFKA-1546
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.0, 0.8.1, 0.8.1.1
>Reporter: Neha Narkhede
>Assignee: Aditya Auradkar
>  Labels: newbie++
> Attachments: KAFKA-1546.patch
>
>
> Currently, there is no good way to tune the replica lag configs to 
> automatically account for high and low volume topics on the same cluster. 
> For the low-volume topic it will take a very long time to detect a lagging
> replica, and for the high-volume topic it will have false-positives.
> One approach to making this easier would be to have the configuration
> be something like replica.lag.max.ms and translate this into a number
> of messages dynamically based on the throughput of the partition.





Review Request 31967: Patch for KAFKA-1546

2015-03-11 Thread Aditya Auradkar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31967/
---

Review request for kafka.


Bugs: KAFKA-1546
https://issues.apache.org/jira/browse/KAFKA-1546


Repository: kafka


Description
---

Patch for KAFKA-1546


Diffs
-

  core/src/main/scala/kafka/cluster/Partition.scala 
c4bf48a801007ebe7497077d2018d6dffe1677d4 
  core/src/main/scala/kafka/cluster/Replica.scala 
bd13c20338ce3d73113224440e858a12814e5adb 
  core/src/main/scala/kafka/log/Log.scala 
06b8ecc5d11a1acfbaf3c693c42bf3ce5b2cd86d 
  core/src/main/scala/kafka/server/KafkaConfig.scala 
48e33626695ad8a28b0018362ac225f11df94973 
  core/src/main/scala/kafka/server/ReplicaManager.scala 
c5274822c57bf3c1f9e4135c0bdcaa87ee50ce20 
  core/src/test/scala/unit/kafka/server/ISRExpirationTest.scala 
92152358c95fa9178d71bd1c079af0a0bd8f1da8 
  core/src/test/scala/unit/kafka/server/KafkaConfigConfigDefTest.scala 
c124c8df5b5079e5ffbd0c4ea359562a66aaf317 
  core/src/test/scala/unit/kafka/server/LogRecoveryTest.scala 
92d6b2c672f74cdd526f2e98da8f7fb3696a88e3 
  core/src/test/scala/unit/kafka/server/SimpleFetchTest.scala 
efb457334bd87e4c5b0dd2c66ae1995993cd0bc1 

Diff: https://reviews.apache.org/r/31967/diff/


Testing
---


Thanks,

Aditya Auradkar



Re: Review Request 31706: Patch for KAFKA-1997

2015-03-11 Thread Guozhang Wang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31706/#review76189
---


Hit this unit test failure, is this relevant?

--

kafka.consumer.ZookeeperConsumerConnectorTest > testConsumerRebalanceListener 
FAILED
junit.framework.AssertionFailedError: 
expected: but 
was:
at junit.framework.Assert.fail(Assert.java:47)
at junit.framework.Assert.failNotEquals(Assert.java:277)
at junit.framework.Assert.assertEquals(Assert.java:64)
at junit.framework.Assert.assertEquals(Assert.java:71)
at 
kafka.consumer.ZookeeperConsumerConnectorTest.testConsumerRebalanceListener(ZookeeperConsumerConnectorTest.scala:393)

- Guozhang Wang


On March 11, 2015, 10:20 p.m., Jiangjie Qin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31706/
> ---
> 
> (Updated March 11, 2015, 10:20 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1997
> https://issues.apache.org/jira/browse/KAFKA-1997
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> Addressed Guozhang's comments.
> 
> 
> Changed the exit behavior on send failure because close(0) is not ready yet. 
> Will submit followup patch after KAFKA-1660 is checked in.
> 
> 
> Expanded imports from _ and * to full class path
> 
> 
> Incorporated Joel's comments.
> 
> 
> Addressed Joel's comments.
> 
> 
> Addressed Guozhang's comments.
> 
> 
> Incorporated Guozhang's comments.
> 
> 
> Diffs
> -
> 
>   
> clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordAccumulator.java
>  d5c79e2481d5e9a2524ac2ef6a6879f61cb7cb5f 
>   core/src/main/scala/kafka/consumer/PartitionAssignor.scala 
> e6ff7683a0df4a7d221e949767e57c34703d5aad 
>   core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala 
> 5487259751ebe19f137948249aa1fd2637d2deb4 
>   core/src/main/scala/kafka/javaapi/consumer/ConsumerRebalanceListener.java 
> 7f45a90ba6676290172b7da54c15ee5dc1a42a2e 
>   core/src/main/scala/kafka/tools/MirrorMaker.scala 
> bafa379ff57bc46458ea8409406f5046dc9c973e 
>   core/src/test/scala/unit/kafka/consumer/PartitionAssignorTest.scala 
> 543070f4fd3e96f3183cae9ee2ccbe843409ee58 
>   
> core/src/test/scala/unit/kafka/consumer/ZookeeperConsumerConnectorTest.scala 
> 19640cc55b5baa0a26a808d708b7f4caf491c9f0 
> 
> Diff: https://reviews.apache.org/r/31706/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Jiangjie Qin
> 
>



[jira] [Updated] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-11 Thread Sriharsha Chintalapani (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriharsha Chintalapani updated KAFKA-1461:
--
Attachment: KAFKA-1461_2015-03-11_18:17:51.patch

> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1461.patch, KAFKA-1461.patch, 
> KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
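
The back-off behavior the issue asks for can be sketched as a small piece of 
state kept by the fetcher thread (a hypothetical helper, not the actual 
KAFKA-1461 patch): after an error, the next fetch attempt is delayed by a 
configured interval instead of retrying immediately.

```java
// Sketch of fixed back-off for a fetcher thread (hypothetical helper):
// the fetch loop asks nextDelayMs() before each attempt and sleeps that
// long, so a broker that keeps refusing connections is retried at the
// configured interval rather than in a tight loop.
public class FetchBackoff {
    private final long backoffMs;
    private int consecutiveErrors = 0;

    public FetchBackoff(long backoffMs) {
        this.backoffMs = backoffMs;
    }

    public void onSuccess() { consecutiveErrors = 0; }

    public void onError() { consecutiveErrors++; }

    // Delay before the next fetch: 0 after a success, the configured
    // back-off after any error.
    public long nextDelayMs() {
        return consecutiveErrors == 0 ? 0 : backoffMs;
    }
}
```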


[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-11 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357927#comment-14357927
 ] 

Sriharsha Chintalapani commented on KAFKA-1461:
---

Updated reviewboard https://reviews.apache.org/r/31927/diff/
 against branch origin/trunk

> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1461.patch, KAFKA-1461.patch, 
> KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.





Re: Review Request 31927: Patch for KAFKA-1461

2015-03-11 Thread Sriharsha Chintalapani

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31927/
---

(Updated March 12, 2015, 1:17 a.m.)


Review request for kafka.


Bugs: KAFKA-1461
https://issues.apache.org/jira/browse/KAFKA-1461


Repository: kafka


Description
---

KAFKA-1461. Replica fetcher thread does not implement any back-off behavior.


Diffs (updated)
-

  core/src/main/scala/kafka/consumer/ConsumerFetcherThread.scala 
ee6139c901082358382c5ef892281386bf6fc91b 
  core/src/main/scala/kafka/server/AbstractFetcherThread.scala 
8c281d4668f92eff95a4a5df3c03c4b5b20e7095 
  core/src/main/scala/kafka/server/KafkaConfig.scala 
48e33626695ad8a28b0018362ac225f11df94973 
  core/src/main/scala/kafka/server/ReplicaFetcherThread.scala 
d6d14fbd167fb8b085729cda5158898b1a3ee314 
  core/src/test/scala/unit/kafka/server/KafkaConfigConfigDefTest.scala 
c124c8df5b5079e5ffbd0c4ea359562a66aaf317 
  core/src/test/scala/unit/kafka/utils/TestUtils.scala 
52c79201af7c60f9b44a0aaa09cdf968d89a7b87 

Diff: https://reviews.apache.org/r/31927/diff/


Testing
---


Thanks,

Sriharsha Chintalapani



Re: Review Request 30809: Patch for KAFKA-1888

2015-03-11 Thread Abhishek Nigam


> On March 12, 2015, 12:13 a.m., Jiangjie Qin wrote:
> > core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 183
> > 
> >
> > This is essentially a sync approach, can we use callback to do this?
> 
> Abhishek Nigam wrote:
> This is intentional. We want to make sure the event has successfully 
> reached the brokers. This enables us to form a reasonable expectation of what 
> the consumer should expect.
> 
> Jiangjie Qin wrote:
> The callback should be able to make sure everything goes well; otherwise 
> you can choose to stop sending or do whatever you want. One big issue with 
> this approach is that you will only send a single message out for each 
> batch, and that would be very slow, especially if you don't set the linger 
> time to something very small, even zero.
> Ideally the test case should work with different producer settings, I 
> would say at least ack=1 and ack=-1, also with different batch sizes and 
> linger times. Sending a single message out for each batch does not look 
> like a very useful test case.

I think what you bring up about ack=1 and ack=-1 is a very good point, but it 
is orthogonal to this discussion. This can be a config parameter, and we could 
have different configurations as you pointed out. 

This is, however, not intended to be a performance test, since there are other 
system tests which do that much better.
LeaderNotAvailableException and UnknownHostOrTopicException are actual 
exceptions that I see in my logs every time I run this test, and it does not 
make sense to stop the producer thread in those cases.

To clarify the purpose of this test:
a) Data integrity - the old consumer can pull from the new broker without any 
issues. There is no data format change that will cause problems for old 
consumers consuming from the new broker. (This is tested through bootstrap.)
b) Data is received in order, and this guarantee is not violated despite the 
rolling bounce.


> On March 12, 2015, 12:13 a.m., Jiangjie Qin wrote:
> > core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 184
> > 
> >
> > When a send fails, should we at least log the sequence number?
> 
> Abhishek Nigam wrote:
> I log the exception and the logger gives me the timestamp in the logs.
> Maybe I am missing something. Can you explain the rationale of why we 
> would want to log the sequence number on the producer side when send fails.
> 
> Jiangjie Qin wrote:
> Suppose someone is reading the log because something went wrong; wouldn't 
> it be much faster to see how many and which messages are lost if you have 
> sequence numbers logged? 
> For example, with sequence numbers, you can see the producer saying that 
> messages 1,2,3 were sent successfully while message 4 failed. And the 
> consumer would say: I expected to see 1,2,3 but I just got 1,3; 2 is lost.
> In your current log, all you can see is that some message wasn't sent 
> successfully, and some message was not consumed as expected. It's more 
> complicated to debug, right?

Since the sender retries the message until it succeeds, I am not sure how we 
would arrive at the issue you are mentioning, but maybe I am missing something.


- Abhishek


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30809/#review76173
---


On March 9, 2015, 11:55 p.m., Abhishek Nigam wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30809/
> ---
> 
> (Updated March 9, 2015, 11:55 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1888
> https://issues.apache.org/jira/browse/KAFKA-1888
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> Fixing the tests based on Mayuresh comments, code cleanup after proper IDE 
> setup
> 
> 
> Diffs
> -
> 
>   build.gradle 0f0fe60a74542efa91a0e727146e896edcaa38af 
>   core/src/main/scala/kafka/tools/ContinuousValidationTest.java PRE-CREATION 
>   system_test/broker_upgrade/bin/kafka-run-class.sh PRE-CREATION 
>   system_test/broker_upgrade/bin/test.sh PRE-CREATION 
>   system_test/broker_upgrade/configs/server1.properties PRE-CREATION 
>   system_test/broker_upgrade/configs/server2.properties PRE-CREATION 
>   system_test/broker_upgrade/configs/zookeeper_source.properties PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/30809/diff/
> 
> 
> Testing
> ---
> 
> Scripted it to run 20 times without any failures.
> Command-line: broker-upgrade/bin/test.sh  
> 
> 
> Thanks,
> 
> Abhishek Nigam
> 
>



Re: Review Request 30809: Patch for KAFKA-1888

2015-03-11 Thread Jiangjie Qin


> On March 12, 2015, 12:13 a.m., Jiangjie Qin wrote:
> > core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 183
> > 
> >
> > This is essentially a sync approach, can we use callback to do this?
> 
> Abhishek Nigam wrote:
> This is intentional. We want to make sure the event has successfully 
> reached the brokers. This enables us to form a reasonable expectation of what 
> the consumer should expect.

The callback should be able to make sure everything goes well; otherwise you 
can choose to stop sending or do whatever you want. One big issue with this 
approach is that you will only send a single message out for each batch, and 
that would be very slow, especially if you don't set the linger time to 
something very small, even zero.
Ideally the test case should work with different producer settings, I would say 
at least ack=1 and ack=-1, also with different batch sizes and linger times. 
Sending a single message out for each batch does not look like a very useful 
test case.


> On March 12, 2015, 12:13 a.m., Jiangjie Qin wrote:
> > core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 184
> > 
> >
> > When a send fails, should we at least log the sequence number?
> 
> Abhishek Nigam wrote:
> I log the exception and the logger gives me the timestamp in the logs.
> Maybe I am missing something. Can you explain the rationale of why we 
> would want to log the sequence number on the producer side when send fails.

Suppose someone is reading the log because something went wrong; wouldn't it be 
much faster to see how many and which messages are lost if you have sequence 
numbers logged? 
For example, with sequence numbers, you can see the producer saying that 
messages 1,2,3 were sent successfully while message 4 failed. And the consumer 
would say: I expected to see 1,2,3 but I just got 1,3; 2 is lost.
In your current log, all you can see is that some message wasn't sent 
successfully, and some message was not consumed as expected. It's more 
complicated to debug, right?
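
The bookkeeping being suggested here can be sketched as a small gap detector 
on the consumer side (a hypothetical helper, not part of the patch): the 
producer logs each sequence it sends, and the consumer compares each received 
sequence against the next expected one, so lost messages are named explicitly.

```java
// Sketch of consumer-side sequence checking (hypothetical helper): any
// sequence numbers skipped between the expected and the received value
// are recorded as missing, which makes "2 is lost" visible in the logs.
public class SequenceChecker {
    private long expected = 0;
    private final java.util.List<Long> missing = new java.util.ArrayList<>();

    // Called with each consumed message's sequence number (assumed ascending).
    public void onReceive(long seq) {
        while (expected < seq) {
            missing.add(expected++);   // every skipped sequence is a gap
        }
        expected = seq + 1;
    }

    public java.util.List<Long> missingSoFar() {
        return missing;
    }
}
```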


- Jiangjie


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30809/#review76173
---


On March 9, 2015, 11:55 p.m., Abhishek Nigam wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30809/
> ---
> 
> (Updated March 9, 2015, 11:55 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1888
> https://issues.apache.org/jira/browse/KAFKA-1888
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> Fixing the tests based on Mayuresh comments, code cleanup after proper IDE 
> setup
> 
> 
> Diffs
> -
> 
>   build.gradle 0f0fe60a74542efa91a0e727146e896edcaa38af 
>   core/src/main/scala/kafka/tools/ContinuousValidationTest.java PRE-CREATION 
>   system_test/broker_upgrade/bin/kafka-run-class.sh PRE-CREATION 
>   system_test/broker_upgrade/bin/test.sh PRE-CREATION 
>   system_test/broker_upgrade/configs/server1.properties PRE-CREATION 
>   system_test/broker_upgrade/configs/server2.properties PRE-CREATION 
>   system_test/broker_upgrade/configs/zookeeper_source.properties PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/30809/diff/
> 
> 
> Testing
> ---
> 
> Scripted it to run 20 times without any failures.
> Command-line: broker-upgrade/bin/test.sh  
> 
> 
> Thanks,
> 
> Abhishek Nigam
> 
>



Re: Review Request 30809: Patch for KAFKA-1888

2015-03-11 Thread Abhishek Nigam


> On March 11, 2015, 11:12 p.m., Gwen Shapira wrote:
> > This looks like a very good start. I think the framework is flexible enough 
> > to allow us to add a variety of upgrade tests. I'm looking forward to it.
> > 
> > 
> > I have few comments, but mostly I'm still confused on how this will be 
> > used. Perhaps more comments or even a readme is in order
> > 
> > You wrote that we invoke "test.sh  ", what should each 
> > directory contain? just the kafka jar of different versions? or an entire 
> > installation (including bin/ and conf/)?
> > Which one of the directories should be the newer and which is older? does 
> > it matter?
> > Which version of clients will be used?
> > 
> > Perhaps a more descriptive name for test.sh can help too. I'm guessing 
> > we'll have a whole collection of those test scripts soon.
> > 
> > Gwen

Each directory contains the Kafka jars: 
kafka_2.10-0.8.3-SNAPSHOT.jar
kafka-clients-0.8.3-SNAPSHOT.jar
The other jars are shared between both Kafka brokers.


> On March 11, 2015, 11:12 p.m., Gwen Shapira wrote:
> > build.gradle, line 209
> > 
> >
> > This should probably be a test dependency (if needed at all)
> > 
> > Packaging Guava will be a pain, since so many systems use different 
> > versions of Guava and they are all incompatible.

Guava provides an excellent rate limiter, which I am using in the test and have 
used in the past.
As for packaging, we are already pulling in other external libraries, like a 
specific version of ZooKeeper, which applications might be using extensively 
and might similarly run into conflicts with.

If you have a suggestion for a less popular library that provides rate 
limiting, I can use that instead of Guava; otherwise I will move this 
dependency to the test scope for now.
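
For reference, the core of what Guava's RateLimiter provides can be sketched 
as a simple token bucket. The version below is a hypothetical, simplified 
stand-in (non-blocking, burst capped at one second's worth of permits) shown 
only to illustrate the dependency under discussion, not a drop-in replacement:

```java
// Minimal token-bucket rate limiter sketch (hypothetical, illustrating the
// idea behind Guava's RateLimiter): permits accumulate over time at a fixed
// rate; tryAcquire succeeds only if a whole permit is available.
public class TokenBucket {
    private final double permitsPerSecond;
    private double available;
    private long lastRefillNanos;

    public TokenBucket(double permitsPerSecond, long nowNanos) {
        this.permitsPerSecond = permitsPerSecond;
        this.available = permitsPerSecond;  // allow one second's burst
        this.lastRefillNanos = nowNanos;
    }

    // Refill based on elapsed time, then try to take one permit.
    public boolean tryAcquire(long nowNanos) {
        double elapsedSec = (nowNanos - lastRefillNanos) / 1e9;
        available = Math.min(permitsPerSecond,
                             available + elapsedSec * permitsPerSecond);
        lastRefillNanos = nowNanos;
        if (available >= 1.0) {
            available -= 1.0;
            return true;
        }
        return false;
    }
}
```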


> On March 11, 2015, 11:12 p.m., Gwen Shapira wrote:
> > core/src/main/scala/kafka/tools/ContinuousValidationTest.java, lines 409-440
> > 
> >
> > Do we really want to do this? 
> > 
> > We are using joptsimple for a bunch of other tools. It is easier to 
> > read, maintain, nice error messages, help screen, etc.

Thanks, I will switch to joptsimple.


> On March 11, 2015, 11:12 p.m., Gwen Shapira wrote:
> > system_test/broker_upgrade/bin/kafka-run-class.sh, lines 152-156
> > 
> >
> > Why did we decide to duplicate this entire file?

The only difference is that it takes an additional argument which contains the 
directory from which the kafka jars should be pulled.
Would you recommend adding it to the original script as an optional argument?


- Abhishek


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30809/#review76157
---


On March 9, 2015, 11:55 p.m., Abhishek Nigam wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30809/
> ---
> 
> (Updated March 9, 2015, 11:55 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1888
> https://issues.apache.org/jira/browse/KAFKA-1888
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> Fixing the tests based on Mayuresh comments, code cleanup after proper IDE 
> setup
> 
> 
> Diffs
> -
> 
>   build.gradle 0f0fe60a74542efa91a0e727146e896edcaa38af 
>   core/src/main/scala/kafka/tools/ContinuousValidationTest.java PRE-CREATION 
>   system_test/broker_upgrade/bin/kafka-run-class.sh PRE-CREATION 
>   system_test/broker_upgrade/bin/test.sh PRE-CREATION 
>   system_test/broker_upgrade/configs/server1.properties PRE-CREATION 
>   system_test/broker_upgrade/configs/server2.properties PRE-CREATION 
>   system_test/broker_upgrade/configs/zookeeper_source.properties PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/30809/diff/
> 
> 
> Testing
> ---
> 
> Scripted it to run 20 times without any failures.
> Command-line: broker-upgrade/bin/test.sh  
> 
> 
> Thanks,
> 
> Abhishek Nigam
> 
>



Jenkins build is back to normal : Kafka-trunk #423

2015-03-11 Thread Apache Jenkins Server
See 



Re: Review Request 30809: Patch for KAFKA-1888

2015-03-11 Thread Abhishek Nigam


> On March 12, 2015, 12:13 a.m., Jiangjie Qin wrote:
> > core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 183
> > 
> >
> > This is essentially a sync approach, can we use callback to do this?

This is intentional. We want to make sure the event has successfully reached 
the brokers. This enables us to form a reasonable expectation of what the 
consumer should expect.


> On March 12, 2015, 12:13 a.m., Jiangjie Qin wrote:
> > core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 184
> > 
> >
> > When a send fails, should we at least log the sequence number?

I log the exception, and the logger gives me the timestamp in the logs.
Maybe I am missing something. Can you explain the rationale for logging the 
sequence number on the producer side when a send fails?


> On March 12, 2015, 12:13 a.m., Jiangjie Qin wrote:
> > core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 321
> > 
> >
> > Similar to producer, can we log the expected sequence number and the 
> > seq we actually saw?

Sure, in cases where there is a mismatch I could do that.


> On March 12, 2015, 12:13 a.m., Jiangjie Qin wrote:
> > core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 386
> > 
> >
> > Can we use KafkaThread here?

I will take a look at that.


- Abhishek


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30809/#review76173
---


On March 9, 2015, 11:55 p.m., Abhishek Nigam wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30809/
> ---
> 
> (Updated March 9, 2015, 11:55 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1888
> https://issues.apache.org/jira/browse/KAFKA-1888
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> Fixing the tests based on Mayuresh comments, code cleanup after proper IDE 
> setup
> 
> 
> Diffs
> -
> 
>   build.gradle 0f0fe60a74542efa91a0e727146e896edcaa38af 
>   core/src/main/scala/kafka/tools/ContinuousValidationTest.java PRE-CREATION 
>   system_test/broker_upgrade/bin/kafka-run-class.sh PRE-CREATION 
>   system_test/broker_upgrade/bin/test.sh PRE-CREATION 
>   system_test/broker_upgrade/configs/server1.properties PRE-CREATION 
>   system_test/broker_upgrade/configs/server2.properties PRE-CREATION 
>   system_test/broker_upgrade/configs/zookeeper_source.properties PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/30809/diff/
> 
> 
> Testing
> ---
> 
> Scripted it to run 20 times without any failures.
> Command-line: broker-upgrade/bin/test.sh  
> 
> 
> Thanks,
> 
> Abhishek Nigam
> 
>



Re: Review Request 30809: Patch for KAFKA-1888

2015-03-11 Thread Jiangjie Qin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30809/#review76173
---



core/src/main/scala/kafka/tools/ContinuousValidationTest.java


This is essentially a sync approach, can we use callback to do this?



core/src/main/scala/kafka/tools/ContinuousValidationTest.java


When a send fails, should we at least log the sequence number?



core/src/main/scala/kafka/tools/ContinuousValidationTest.java


Similar to producer, can we log the expected sequence number and the seq we 
actually saw?



core/src/main/scala/kafka/tools/ContinuousValidationTest.java


Can we use KafkaThread here?


- Jiangjie Qin


On March 9, 2015, 11:55 p.m., Abhishek Nigam wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30809/
> ---
> 
> (Updated March 9, 2015, 11:55 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1888
> https://issues.apache.org/jira/browse/KAFKA-1888
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> Fixing the tests based on Mayuresh comments, code cleanup after proper IDE 
> setup
> 
> 
> Diffs
> -
> 
>   build.gradle 0f0fe60a74542efa91a0e727146e896edcaa38af 
>   core/src/main/scala/kafka/tools/ContinuousValidationTest.java PRE-CREATION 
>   system_test/broker_upgrade/bin/kafka-run-class.sh PRE-CREATION 
>   system_test/broker_upgrade/bin/test.sh PRE-CREATION 
>   system_test/broker_upgrade/configs/server1.properties PRE-CREATION 
>   system_test/broker_upgrade/configs/server2.properties PRE-CREATION 
>   system_test/broker_upgrade/configs/zookeeper_source.properties PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/30809/diff/
> 
> 
> Testing
> ---
> 
> Scripted it to run 20 times without any failures.
> Command-line: broker-upgrade/bin/test.sh  
> 
> 
> Thanks,
> 
> Abhishek Nigam
> 
>



Build failed in Jenkins: KafkaPreCommit #36

2015-03-11 Thread Apache Jenkins Server
See 

Changes:

[wangguoz] KAFKA-1910 Follow-up again; fix ListOffsetResponse handling for the 
expected error codes

--
[...truncated 1643 lines...]
at kafka.integration.PrimitiveApiTest.setUp(PrimitiveApiTest.scala:40)

kafka.integration.PrimitiveApiTest > testPipelinedProduceRequests FAILED
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind(Native Method)
at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
at kafka.zk.EmbeddedZookeeper.<init>(EmbeddedZookeeper.scala:33)
at 
kafka.zk.ZooKeeperTestHarness$class.setUp(ZooKeeperTestHarness.scala:33)
at 
kafka.integration.PrimitiveApiTest.kafka$integration$KafkaServerTestHarness$$super$setUp(PrimitiveApiTest.scala:40)
at 
kafka.integration.KafkaServerTestHarness$class.setUp(KafkaServerTestHarness.scala:44)
at 
kafka.integration.PrimitiveApiTest.kafka$integration$ProducerConsumerTestHarness$$super$setUp(PrimitiveApiTest.scala:40)
at 
kafka.integration.ProducerConsumerTestHarness$class.setUp(ProducerConsumerTestHarness.scala:33)
at kafka.integration.PrimitiveApiTest.setUp(PrimitiveApiTest.scala:40)

kafka.integration.AutoOffsetResetTest > testResetToEarliestWhenOffsetTooHigh 
FAILED
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind(Native Method)
at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
at kafka.zk.EmbeddedZookeeper.<init>(EmbeddedZookeeper.scala:33)
at 
kafka.zk.ZooKeeperTestHarness$class.setUp(ZooKeeperTestHarness.scala:33)
at 
kafka.integration.AutoOffsetResetTest.kafka$integration$KafkaServerTestHarness$$super$setUp(AutoOffsetResetTest.scala:32)
at 
kafka.integration.KafkaServerTestHarness$class.setUp(KafkaServerTestHarness.scala:44)
at 
kafka.integration.AutoOffsetResetTest.setUp(AutoOffsetResetTest.scala:46)

kafka.integration.AutoOffsetResetTest > testResetToEarliestWhenOffsetTooLow 
FAILED
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind(Native Method)
at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
at kafka.zk.EmbeddedZookeeper.<init>(EmbeddedZookeeper.scala:33)
at 
kafka.zk.ZooKeeperTestHarness$class.setUp(ZooKeeperTestHarness.scala:33)
at 
kafka.integration.AutoOffsetResetTest.kafka$integration$KafkaServerTestHarness$$super$setUp(AutoOffsetResetTest.scala:32)
at 
kafka.integration.KafkaServerTestHarness$class.setUp(KafkaServerTestHarness.scala:44)
at 
kafka.integration.AutoOffsetResetTest.setUp(AutoOffsetResetTest.scala:46)

kafka.integration.AutoOffsetResetTest > testResetToLatestWhenOffsetTooHigh 
FAILED
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind(Native Method)
at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
at kafka.zk.EmbeddedZookeeper.<init>(EmbeddedZookeeper.scala:33)
at 
kafka.zk.ZooKeeperTestHarness$class.setUp(ZooKeeperTestHarness.scala:33)
at 
kafka.integration.AutoOffsetResetTest.kafka$integration$KafkaServerTestHarness$$super$setUp(AutoOffsetResetTest.scala:32)
at 
kafka.integration.KafkaServerTestHarness$class.setUp(KafkaServerTestHarness.scala:44)
at 
kafka.integration.AutoOffsetResetTest.setUp(AutoOffsetResetTest.scala:46)

kafka.integration.AutoOffsetResetTest > testResetToLatestWhenOffsetTooLow FAILED
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind(Native Method)
at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOS

[jira] [Commented] (KAFKA-1910) Refactor KafkaConsumer

2015-03-11 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357794#comment-14357794
 ] 

Guozhang Wang commented on KAFKA-1910:
--

Thanks Jun, incorporated the comments and commit to trunk.

> Refactor KafkaConsumer
> --
>
> Key: KAFKA-1910
> URL: https://issues.apache.org/jira/browse/KAFKA-1910
> Project: Kafka
>  Issue Type: Sub-task
>  Components: consumer
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
> Attachments: KAFKA-1910.patch, KAFKA-1910_2015-03-05_14:55:33.patch
>
>
> KafkaConsumer now contains all the logic on the consumer side, making it a 
> very large class file; it would be better to refactor it into multiple 
> layers on top of KafkaClient.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 30809: Patch for KAFKA-1888

2015-03-11 Thread Gwen Shapira

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30809/#review76157
---


This looks like a very good start. I think the framework is flexible enough to 
allow us to add a variety of upgrade tests. I'm looking forward to it.


I have a few comments, but mostly I'm still confused about how this will be 
used. Perhaps more comments, or even a README, are in order.

You wrote that we invoke "test.sh  ". What should each directory contain? 
Just the Kafka jar of different versions, or an entire installation 
(including bin/ and conf/)?
Which of the directories should be the newer one and which the older? Does it 
matter?
Which version of the clients will be used?

Perhaps a more descriptive name for test.sh can help too. I'm guessing we'll 
have a whole collection of those test scripts soon.

Gwen


build.gradle


This should probably be a test dependency (if needed at all)

Packaging Guava will be a pain, since so many systems use different 
versions of Guava and they are all incompatible.



core/src/main/scala/kafka/tools/ContinuousValidationTest.java


Do we really want to do this? 

We are using joptsimple for a bunch of other tools. It is easier to read and 
maintain, and it gives nice error messages, a help screen, etc.



system_test/broker_upgrade/bin/kafka-run-class.sh


Why did we decide to duplicate this entire file?


- Gwen Shapira


On March 9, 2015, 11:55 p.m., Abhishek Nigam wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30809/
> ---
> 
> (Updated March 9, 2015, 11:55 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1888
> https://issues.apache.org/jira/browse/KAFKA-1888
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> Fixing the tests based on Mayuresh's comments; code cleanup after a proper
> IDE setup.
> 
> 
> Diffs
> -
> 
>   build.gradle 0f0fe60a74542efa91a0e727146e896edcaa38af 
>   core/src/main/scala/kafka/tools/ContinuousValidationTest.java PRE-CREATION 
>   system_test/broker_upgrade/bin/kafka-run-class.sh PRE-CREATION 
>   system_test/broker_upgrade/bin/test.sh PRE-CREATION 
>   system_test/broker_upgrade/configs/server1.properties PRE-CREATION 
>   system_test/broker_upgrade/configs/server2.properties PRE-CREATION 
>   system_test/broker_upgrade/configs/zookeeper_source.properties PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/30809/diff/
> 
> 
> Testing
> ---
> 
> Scripted it to run 20 times without any failures.
> Command-line: broker-upgrade/bin/test.sh  
> 
> 
> Thanks,
> 
> Abhishek Nigam
> 
>



Re: Review Request 31925: KAFKA-1054: Fix remaining compiler warnings

2015-03-11 Thread Jiangjie Qin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31925/#review76149
---

Ship it!


Makes sense. Not a committer but looks good to me :)
Unit tests passed and compilation warnings went away.

- Jiangjie Qin


On March 11, 2015, 4:35 a.m., Blake Smith wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31925/
> ---
> 
> (Updated March 11, 2015, 4:35 a.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1054
> https://issues.apache.org/jira/browse/KAFKA-1054
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> Brought the patch outlined [here](https://reviews.apache.org/r/25461/diff/) 
> up to date with the latest trunk and fixed 2 more lingering compiler warnings.
> 
> 
> Diffs
> -
> 
>   core/src/main/scala/kafka/admin/AdminUtils.scala b700110 
>   core/src/main/scala/kafka/coordinator/DelayedJoinGroup.scala df60cbc 
>   core/src/main/scala/kafka/server/KafkaServer.scala dddef93 
>   core/src/main/scala/kafka/utils/Utils.scala 738c1af 
>   core/src/test/scala/unit/kafka/server/ServerShutdownTest.scala b46daa4 
> 
> Diff: https://reviews.apache.org/r/31925/diff/
> 
> 
> Testing
> ---
> 
> Ran full test suite.
> 
> 
> Thanks,
> 
> Blake Smith
> 
>



[jira] [Commented] (KAFKA-1910) Refactor KafkaConsumer

2015-03-11 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357725#comment-14357725
 ] 

Jun Rao commented on KAFKA-1910:


Are the changes in ConsumerTest needed? The extra logging and sleep seem to be 
just for debugging. Other than that, +1. Ran the tests locally 20 times and 
they all pass.

> Refactor KafkaConsumer
> --
>
> Key: KAFKA-1910
> URL: https://issues.apache.org/jira/browse/KAFKA-1910
> Project: Kafka
>  Issue Type: Sub-task
>  Components: consumer
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
> Attachments: KAFKA-1910.patch, KAFKA-1910_2015-03-05_14:55:33.patch
>
>
> KafkaConsumer now contains all the logic on the consumer side, making it a 
> very large class file; it would be better to refactor it into multiple 
> layers on top of KafkaClient.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1997) Refactor Mirror Maker

2015-03-11 Thread Jiangjie Qin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiangjie Qin updated KAFKA-1997:

Attachment: KAFKA-1997_2015-03-11_15:20:18.patch

> Refactor Mirror Maker
> -
>
> Key: KAFKA-1997
> URL: https://issues.apache.org/jira/browse/KAFKA-1997
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jiangjie Qin
>Assignee: Jiangjie Qin
> Attachments: KAFKA-1997.patch, KAFKA-1997_2015-03-03_16:28:46.patch, 
> KAFKA-1997_2015-03-04_15:07:46.patch, KAFKA-1997_2015-03-04_15:42:45.patch, 
> KAFKA-1997_2015-03-05_20:14:58.patch, KAFKA-1997_2015-03-09_18:55:54.patch, 
> KAFKA-1997_2015-03-10_18:31:34.patch, KAFKA-1997_2015-03-11_15:20:18.patch
>
>
> Refactor mirror maker based on KIP-3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1997) Refactor Mirror Maker

2015-03-11 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357714#comment-14357714
 ] 

Jiangjie Qin commented on KAFKA-1997:
-

Updated reviewboard https://reviews.apache.org/r/31706/diff/
 against branch origin/trunk

> Refactor Mirror Maker
> -
>
> Key: KAFKA-1997
> URL: https://issues.apache.org/jira/browse/KAFKA-1997
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jiangjie Qin
>Assignee: Jiangjie Qin
> Attachments: KAFKA-1997.patch, KAFKA-1997_2015-03-03_16:28:46.patch, 
> KAFKA-1997_2015-03-04_15:07:46.patch, KAFKA-1997_2015-03-04_15:42:45.patch, 
> KAFKA-1997_2015-03-05_20:14:58.patch, KAFKA-1997_2015-03-09_18:55:54.patch, 
> KAFKA-1997_2015-03-10_18:31:34.patch, KAFKA-1997_2015-03-11_15:20:18.patch
>
>
> Refactor mirror maker based on KIP-3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 31706: Patch for KAFKA-1997

2015-03-11 Thread Jiangjie Qin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31706/
---

(Updated March 11, 2015, 10:20 p.m.)


Review request for kafka.


Bugs: KAFKA-1997
https://issues.apache.org/jira/browse/KAFKA-1997


Repository: kafka


Description (updated)
---

Addressed Guozhang's comments.


Changed the exit behavior on send failure because close(0) is not ready yet. 
Will submit followup patch after KAFKA-1660 is checked in.


Expanded imports from _ and * to full class path


Incorporated Joel's comments.


Addressed Joel's comments.


Addressed Guozhang's comments.


Incorporated Guozhang's comments.


Diffs (updated)
-

  
clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordAccumulator.java
 d5c79e2481d5e9a2524ac2ef6a6879f61cb7cb5f 
  core/src/main/scala/kafka/consumer/PartitionAssignor.scala 
e6ff7683a0df4a7d221e949767e57c34703d5aad 
  core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala 
5487259751ebe19f137948249aa1fd2637d2deb4 
  core/src/main/scala/kafka/javaapi/consumer/ConsumerRebalanceListener.java 
7f45a90ba6676290172b7da54c15ee5dc1a42a2e 
  core/src/main/scala/kafka/tools/MirrorMaker.scala 
bafa379ff57bc46458ea8409406f5046dc9c973e 
  core/src/test/scala/unit/kafka/consumer/PartitionAssignorTest.scala 
543070f4fd3e96f3183cae9ee2ccbe843409ee58 
  core/src/test/scala/unit/kafka/consumer/ZookeeperConsumerConnectorTest.scala 
19640cc55b5baa0a26a808d708b7f4caf491c9f0 

Diff: https://reviews.apache.org/r/31706/diff/


Testing
---


Thanks,

Jiangjie Qin



Re: Review Request 31830: Patch for KAFKA-2009

2015-03-11 Thread Onur Karaman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31830/#review76145
---

Ship it!


Ship It!

- Onur Karaman


On March 11, 2015, 6:26 p.m., Jiangjie Qin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31830/
> ---
> 
> (Updated March 11, 2015, 6:26 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-2009
> https://issues.apache.org/jira/browse/KAFKA-2009
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> Because the send callback could be fired inside producer.send() as well, 
> the unacked offset needs to be added to the unacked-offsets list before 
> calling producer.send()
> 
> 
> Diffs
> -
> 
>   core/src/main/scala/kafka/tools/MirrorMaker.scala 
> bafa379ff57bc46458ea8409406f5046dc9c973e 
> 
> Diff: https://reviews.apache.org/r/31830/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Jiangjie Qin
> 
>
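The ordering issue described in the patch, namely that the send callback can fire synchronously inside producer.send(), can be sketched with a stand-in producer. The synchronous send() helper and the unacked set below are illustrative only, not MirrorMaker's actual code:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class UnackedOffsetDemo {
    static final Set<Long> unacked = ConcurrentHashMap.newKeySet();

    // Stand-in for producer.send(record, callback): here the ack callback
    // runs synchronously, as it can in the real producer on certain paths.
    static void send(long offset, Runnable callback) {
        callback.run();
    }

    public static void main(String[] args) {
        long offset = 42L;
        // Correct order: record the offset as unacked BEFORE calling send(),
        // because the callback may run inside send() itself.
        unacked.add(offset);
        send(offset, () -> unacked.remove(offset));
        System.out.println(unacked.isEmpty()); // true: offset was acked
    }
}
```

If the offset were instead added after send() returned, a synchronously fired callback would attempt to remove an offset that was never recorded, and the subsequent add would leave it stuck in the unacked list forever.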



Re: Review Request 31706: Patch for KAFKA-1997

2015-03-11 Thread Jiangjie Qin


> On March 11, 2015, 8:39 p.m., Guozhang Wang wrote:
> > core/src/main/scala/kafka/tools/MirrorMaker.scala, line 195
> > 
> >
> > Should we add the index as a suffix to the consumer id in the 
> > consumerConfig to distinguish different connector instances? Otherwise the 
> > consumer metrics would collide.

Good point!


> On March 11, 2015, 8:39 p.m., Guozhang Wang wrote:
> > core/src/main/scala/kafka/tools/MirrorMaker.scala, lines 477-481
> > 
> >
> > Why not just import java.util.List?

Because List is a native Scala type, even after importing java.util.List we 
would still need to write util.List to avoid a collision.


- Jiangjie


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31706/#review76121
---


On March 11, 2015, 1:31 a.m., Jiangjie Qin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31706/
> ---
> 
> (Updated March 11, 2015, 1:31 a.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1997
> https://issues.apache.org/jira/browse/KAFKA-1997
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> Addressed Guozhang's comments.
> 
> 
> Changed the exit behavior on send failure because close(0) is not ready yet. 
> Will submit followup patch after KAFKA-1660 is checked in.
> 
> 
> Expanded imports from _ and * to full class path
> 
> 
> Incorporated Joel's comments.
> 
> 
> Addressed Joel's comments.
> 
> 
> Addressed Guozhang's comments.
> 
> 
> Diffs
> -
> 
>   
> clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordAccumulator.java
>  d5c79e2481d5e9a2524ac2ef6a6879f61cb7cb5f 
>   core/src/main/scala/kafka/consumer/PartitionAssignor.scala 
> e6ff7683a0df4a7d221e949767e57c34703d5aad 
>   core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala 
> 5487259751ebe19f137948249aa1fd2637d2deb4 
>   core/src/main/scala/kafka/javaapi/consumer/ConsumerRebalanceListener.java 
> 7f45a90ba6676290172b7da54c15ee5dc1a42a2e 
>   core/src/main/scala/kafka/tools/MirrorMaker.scala 
> bafa379ff57bc46458ea8409406f5046dc9c973e 
>   core/src/test/scala/unit/kafka/consumer/PartitionAssignorTest.scala 
> 543070f4fd3e96f3183cae9ee2ccbe843409ee58 
>   
> core/src/test/scala/unit/kafka/consumer/ZookeeperConsumerConnectorTest.scala 
> 19640cc55b5baa0a26a808d708b7f4caf491c9f0 
> 
> Diff: https://reviews.apache.org/r/31706/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Jiangjie Qin
> 
>



Re: Review Request 31925: KAFKA-1054: Fix remaining compiler warnings

2015-03-11 Thread Blake Smith


> On March 11, 2015, 8:39 p.m., Jiangjie Qin wrote:
> > Works for me, but I still see the following line:
> > there were 12 feature warning(s); re-run with -feature for details
> > I tried ./gradlew jar -feature, but it does not seem to work at all. If 
> > this is the related issue, can we solve it in this patch? Otherwise we 
> > can create another ticket to address it.

Thanks for looking at this Jiangjie,

In order for feature warnings to display, you have to add 
scalaCompileOptions.additionalParameters = ["-feature"] below line 131 in 
build.gradle at the root of the project. I posted the verbose output from 
enabling the flag on the ticket: 
https://issues.apache.org/jira/browse/KAFKA-1054?focusedCommentId=14356280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14356280.
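As a rough sketch of the change described above (the exact placement and surrounding blocks in Kafka's build.gradle at the time may differ):

```groovy
// Hypothetical fragment for the root build.gradle: pass -feature to scalac
// so the "12 feature warning(s)" are printed in full.
subprojects {
  tasks.withType(ScalaCompile) {
    scalaCompileOptions.additionalParameters = ["-feature"]
  }
}
```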

Bottom line is: I don't think we can fix these feature warnings until Kafka 
stops supporting scala 2.9. I get these build errors when trying to build a 2.9 
jar with the language imports brought in:



/Users/blake/src/kafka/core/src/main/scala/kafka/javaapi/Implicits.scala:19: 
object language is not a member of package scala
import scala.language.implicitConversions

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or 
--debug option to get more log output.

BUILD FAILED

Does that make sense?


- Blake


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31925/#review76122
---


On March 11, 2015, 4:35 a.m., Blake Smith wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31925/
> ---
> 
> (Updated March 11, 2015, 4:35 a.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1054
> https://issues.apache.org/jira/browse/KAFKA-1054
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> Brought the patch outlined [here](https://reviews.apache.org/r/25461/diff/) 
> up to date with the latest trunk and fixed 2 more lingering compiler warnings.
> 
> 
> Diffs
> -
> 
>   core/src/main/scala/kafka/admin/AdminUtils.scala b700110 
>   core/src/main/scala/kafka/coordinator/DelayedJoinGroup.scala df60cbc 
>   core/src/main/scala/kafka/server/KafkaServer.scala dddef93 
>   core/src/main/scala/kafka/utils/Utils.scala 738c1af 
>   core/src/test/scala/unit/kafka/server/ServerShutdownTest.scala b46daa4 
> 
> Diff: https://reviews.apache.org/r/31925/diff/
> 
> 
> Testing
> ---
> 
> Ran full test suite.
> 
> 
> Thanks,
> 
> Blake Smith
> 
>



[jira] [Commented] (KAFKA-1501) transient unit tests failures due to port already in use

2015-03-11 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357673#comment-14357673
 ] 

Ewen Cheslack-Postava commented on KAFKA-1501:
--

[~guozhang] Yes, that's correct, so the only way to completely avoid the 
problem is to allow the kernel to assign the port automatically. I haven't 
checked, but I also wouldn't be surprised if the kernel actually saves recently 
freed ports to use for a fast path, which could explain why this happens more 
frequently than you might think it would given the fairly large range used for 
ephemeral ports.
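The kernel-assigned-port approach can be demonstrated with plain java.net (not Kafka's TestUtils): binding to port 0 asks the OS to pick a currently free ephemeral port, avoiding the choose-then-bind race behind these "Address already in use" failures.

```java
import java.net.ServerSocket;

public class EphemeralPortDemo {
    public static void main(String[] args) throws Exception {
        // Port 0 tells the kernel to assign a free ephemeral port at bind
        // time, so no other process can have grabbed it in between.
        try (ServerSocket socket = new ServerSocket(0)) {
            int port = socket.getLocalPort(); // the actual assigned port
            System.out.println(port > 0);
        }
    }
}
```

The trade-off is that the test must read the assigned port back and thread it through its configuration rather than hard-coding one in advance.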

> transient unit tests failures due to port already in use
> 
>
> Key: KAFKA-1501
> URL: https://issues.apache.org/jira/browse/KAFKA-1501
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Reporter: Jun Rao
>Assignee: Guozhang Wang
>  Labels: newbie
> Attachments: KAFKA-1501-choosePorts.patch, KAFKA-1501.patch, 
> KAFKA-1501.patch, KAFKA-1501.patch, KAFKA-1501.patch, 
> KAFKA-1501_2015-03-09_11:41:07.patch, test-100.out, test-100.out, 
> test-27.out, test-29.out, test-32.out, test-35.out, test-38.out, test-4.out, 
> test-42.out, test-45.out, test-46.out, test-51.out, test-55.out, test-58.out, 
> test-59.out, test-60.out, test-69.out, test-72.out, test-74.out, test-76.out, 
> test-84.out, test-87.out, test-91.out, test-92.out
>
>
> Saw the following transient failures.
> kafka.api.ProducerFailureHandlingTest > testTooLargeRecordWithAckOne FAILED
> kafka.common.KafkaException: Socket server failed to bind to 
> localhost:59909: Address already in use.
> at kafka.network.Acceptor.openServerSocket(SocketServer.scala:195)
> at kafka.network.Acceptor.<init>(SocketServer.scala:141)
> at kafka.network.SocketServer.startup(SocketServer.scala:68)
> at kafka.server.KafkaServer.startup(KafkaServer.scala:95)
> at kafka.utils.TestUtils$.createServer(TestUtils.scala:123)
> at 
> kafka.api.ProducerFailureHandlingTest.setUp(ProducerFailureHandlingTest.scala:68)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-11 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357668#comment-14357668
 ] 

Jun Rao commented on KAFKA-1461:


[~sriharsha] and [~guozhang], thinking about this a bit more. There are really 
two types of states that we need to manage in AbstractFetcherThread. The first 
one is the connection state, i.e., if a connection breaks, we want to backoff 
the reconnection. The second one is the partition state, i.e., if the partition 
hits an exception, we want to backoff that particular partition a bit.

The first one is what [~sriharsha]'s current RB is addressing. How about we 
complete that first, since it affects the performance of the unit tests? Once 
that's committed, we can address the second one, which is in [~sriharsha]'s 
initial patch.


> Replica fetcher thread does not implement any back-off behavior
> ---
>
> Key: KAFKA-1461
> URL: https://issues.apache.org/jira/browse/KAFKA-1461
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Sam Meder
>Assignee: Sriharsha Chintalapani
>  Labels: newbie++
> Fix For: 0.8.3
>
> Attachments: KAFKA-1461.patch, KAFKA-1461.patch, 
> KAFKA-1461_2015-03-11_10:41:26.patch
>
>
> The current replica fetcher thread will retry in a tight loop if any error 
> occurs during the fetch call. For example, we've seen cases where the fetch 
> continuously throws a connection refused exception leading to several replica 
> fetcher threads that spin in a pretty tight loop.
> To a much lesser degree this is also an issue in the consumer fetcher thread, 
> although the fact that erroring partitions are removed so a leader can be 
> re-discovered helps some.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1684) Implement TLS/SSL authentication

2015-03-11 Thread Sriharsha Chintalapani (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357660#comment-14357660
 ] 

Sriharsha Chintalapani commented on KAFKA-1684:
---

Created reviewboard https://reviews.apache.org/r/31958/diff/
 against branch origin/trunk

> Implement TLS/SSL authentication
> 
>
> Key: KAFKA-1684
> URL: https://issues.apache.org/jira/browse/KAFKA-1684
> Project: Kafka
>  Issue Type: Sub-task
>  Components: security
>Affects Versions: 0.9.0
>Reporter: Jay Kreps
>Assignee: Ivan Lyutov
> Attachments: KAFKA-1684.patch, KAFKA-1684.patch
>
>
> Add an SSL port to the configuration and advertise this as part of the 
> metadata request.
> If the SSL port is configured the socket server will need to add a second 
> Acceptor thread to listen on it. Connections accepted on this port will need 
> to go through the SSL handshake prior to being registered with a Processor 
> for request processing.
> SSL requests and responses may need to be wrapped or unwrapped using the 
> SSLEngine that was initialized by the acceptor. This wrapping and unwrapping 
> is very similar to what will need to be done for SASL-based authentication 
> schemes. We should have a uniform interface that covers both of these and we 
> will need to store the instance in the session with the request. The socket 
> server will have to use this object when reading and writing requests. We 
> will need to take care with the FetchRequests as the current 
> FileChannel.transferTo mechanism will be incompatible with wrap/unwrap so we 
> can only use this optimization for unencrypted sockets that don't require 
> userspace translation (wrapping).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1684) Implement TLS/SSL authentication

2015-03-11 Thread Sriharsha Chintalapani (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriharsha Chintalapani updated KAFKA-1684:
--
Attachment: KAFKA-1684.patch

> Implement TLS/SSL authentication
> 
>
> Key: KAFKA-1684
> URL: https://issues.apache.org/jira/browse/KAFKA-1684
> Project: Kafka
>  Issue Type: Sub-task
>  Components: security
>Affects Versions: 0.9.0
>Reporter: Jay Kreps
>Assignee: Ivan Lyutov
> Attachments: KAFKA-1684.patch, KAFKA-1684.patch
>
>
> Add an SSL port to the configuration and advertise this as part of the 
> metadata request.
> If the SSL port is configured the socket server will need to add a second 
> Acceptor thread to listen on it. Connections accepted on this port will need 
> to go through the SSL handshake prior to being registered with a Processor 
> for request processing.
> SSL requests and responses may need to be wrapped or unwrapped using the 
> SSLEngine that was initialized by the acceptor. This wrapping and unwrapping 
> is very similar to what will need to be done for SASL-based authentication 
> schemes. We should have a uniform interface that covers both of these and we 
> will need to store the instance in the session with the request. The socket 
> server will have to use this object when reading and writing requests. We 
> will need to take care with the FetchRequests as the current 
> FileChannel.transferTo mechanism will be incompatible with wrap/unwrap so we 
> can only use this optimization for unencrypted sockets that don't require 
> userspace translation (wrapping).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Review Request 31958: Patch for KAFKA-1684

2015-03-11 Thread Sriharsha Chintalapani

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31958/
---

Review request for kafka.


Bugs: KAFKA-1684
https://issues.apache.org/jira/browse/KAFKA-1684


Repository: kafka


Description
---

KAFKA-1684. Implement TLS/SSL authentication.


Diffs
-

  core/src/main/scala/kafka/network/Channel.scala PRE-CREATION 
  core/src/main/scala/kafka/network/SocketServer.scala 
76ce41aed6e04ac5ba88395c4d5008aca17f9a73 
  core/src/main/scala/kafka/network/ssl/SSLChannel.scala PRE-CREATION 
  core/src/main/scala/kafka/network/ssl/SSLConnectionConfig.scala PRE-CREATION 
  core/src/main/scala/kafka/server/KafkaConfig.scala 
48e33626695ad8a28b0018362ac225f11df94973 
  core/src/main/scala/kafka/server/KafkaServer.scala 
dddef938fabae157ed8644536eb1a2f329fb42b7 
  core/src/main/scala/kafka/utils/SSLAuthUtils.scala PRE-CREATION 
  core/src/test/scala/unit/kafka/network/SocketServerTest.scala 
0af23abf146d99e3d6cf31e5d6b95a9e63318ddb 
  core/src/test/scala/unit/kafka/server/KafkaConfigConfigDefTest.scala 
c124c8df5b5079e5ffbd0c4ea359562a66aaf317 
  core/src/test/scala/unit/kafka/utils/TestSSLUtils.scala PRE-CREATION 

Diff: https://reviews.apache.org/r/31958/diff/


Testing
---


Thanks,

Sriharsha Chintalapani



Re: Review Request 31927: Patch for KAFKA-1461

2015-03-11 Thread Jun Rao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31927/#review76130
---


Yes, we need to set the backoff for the consumerFetcherThread as well. We can 
just use refreshLeaderBackoffMs.


core/src/main/scala/kafka/server/AbstractFetcherThread.scala


Instead of adding the new backOffWaitLatch, we can probably just wait on 
the existing partitionMapCond. This way, if there is a change in partitions, 
the thread can wake up immediately.

We probably can improve shutdown() a bit to wake up the thread waiting on 
the partitionMapCond. To do that, we can do the following steps in shutdown().

initiateShutdown()
partitionMapCond.signalAll()
awaitShutdown()
simpleConsumer.close()
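A minimal sketch of the pattern suggested above: the fetcher backs off by waiting on a condition rather than a dedicated latch, so shutdown (or a partition change) can wake it immediately. The class and field names mirror the discussion, but the code is illustrative, not AbstractFetcherThread itself:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class BackoffDemo {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition partitionMapCond = lock.newCondition();
    private volatile boolean running = true;

    // Back off by waiting on the condition instead of sleeping, so a
    // signal can cut the wait short.
    void backoff(long ms) throws InterruptedException {
        lock.lock();
        try {
            if (running) partitionMapCond.await(ms, TimeUnit.MILLISECONDS);
        } finally {
            lock.unlock();
        }
    }

    // Mirrors the ordering above: flip the flag, then signal the condition
    // so any thread parked in backoff() wakes up right away.
    void shutdown() {
        running = false;
        lock.lock();
        try {
            partitionMapCond.signalAll();
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws Exception {
        BackoffDemo fetcher = new BackoffDemo();
        Thread t = new Thread(() -> {
            try { fetcher.backoff(10_000); } catch (InterruptedException ignored) { }
        });
        t.start();
        Thread.sleep(100);  // let the thread park in backoff()
        fetcher.shutdown(); // wakes it well before the 10s timeout expires
        t.join(2_000);
        System.out.println(!t.isAlive());
    }
}
```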


- Jun Rao


On March 11, 2015, 5:41 p.m., Sriharsha Chintalapani wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31927/
> ---
> 
> (Updated March 11, 2015, 5:41 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1461
> https://issues.apache.org/jira/browse/KAFKA-1461
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> KAFKA-1461. Replica fetcher thread does not implement any back-off behavior.
> 
> 
> Diffs
> -
> 
>   core/src/main/scala/kafka/server/AbstractFetcherThread.scala 
> 8c281d4668f92eff95a4a5df3c03c4b5b20e7095 
>   core/src/main/scala/kafka/server/KafkaConfig.scala 
> 48e33626695ad8a28b0018362ac225f11df94973 
>   core/src/main/scala/kafka/server/ReplicaFetcherThread.scala 
> d6d14fbd167fb8b085729cda5158898b1a3ee314 
>   core/src/test/scala/unit/kafka/server/KafkaConfigConfigDefTest.scala 
> c124c8df5b5079e5ffbd0c4ea359562a66aaf317 
>   core/src/test/scala/unit/kafka/utils/TestUtils.scala 
> 52c79201af7c60f9b44a0aaa09cdf968d89a7b87 
> 
> Diff: https://reviews.apache.org/r/31927/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Sriharsha Chintalapani
> 
>



[jira] [Commented] (KAFKA-2011) Rebalance with auto.leader.rebalance.enable=false

2015-03-11 Thread K Zakee (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357643#comment-14357643
 ] 

K Zakee commented on KAFKA-2011:


Did you mean 600 secs (10 mins)?

> Rebalance with auto.leader.rebalance.enable=false 
> --
>
> Key: KAFKA-2011
> URL: https://issues.apache.org/jira/browse/KAFKA-2011
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.2.0
> Environment: 5 Hosts of below config:
> "x86_64" "32-bit, 64-bit" "Little Endian" "24 GenuineIntel CPUs Model 44 
> 1600.000MHz" "RAM 189 GB" GNU/Linux
>Reporter: K Zakee
>Priority: Blocker
> Attachments: controller-logs-1.zip, controller-logs-2.zip
>
>
> Started with clean cluster 0.8.2 with 5 brokers. Setting the properties as 
> below:
> auto.leader.rebalance.enable=false
> controlled.shutdown.enable=true
> controlled.shutdown.max.retries=1
> controlled.shutdown.retry.backoff.ms=5000
> default.replication.factor=3
> log.cleaner.enable=true
> log.cleaner.threads=5
> log.cleanup.policy=delete
> log.flush.scheduler.interval.ms=3000
> log.retention.minutes=1440
> log.segment.bytes=1073741824
> message.max.bytes=100
> num.io.threads=14
> num.network.threads=14
> num.partitions=10
> queued.max.requests=500
> num.replica.fetchers=4
> replica.fetch.max.bytes=1048576
> replica.fetch.min.bytes=51200
> replica.lag.max.messages=5000
> replica.lag.time.max.ms=3
> replica.fetch.wait.max.ms=1000
> fetch.purgatory.purge.interval.requests=5000
> producer.purgatory.purge.interval.requests=5000
> delete.topic.enable=true
> Logs show rebalance happening well up to 24 hours after the start.
> [2015-03-07 16:52:48,969] INFO [Controller 2]: Resuming preferred replica 
> election for partitions:  (kafka.controller.KafkaController)
> [2015-03-07 16:52:48,969] INFO [Controller 2]: Partitions that completed 
> preferred replica election:  (kafka.controller.KafkaController)
> …
> [2015-03-07 12:07:06,783] INFO [Controller 4]: Resuming preferred replica 
> election for partitions:  (kafka.controller.KafkaController)
> ...
> [2015-03-07 09:10:41,850] INFO [Controller 3]: Resuming preferred replica 
> election for partitions:  (kafka.controller.KafkaController)
> ...
> [2015-03-07 08:26:56,396] INFO [Controller 1]: Starting preferred replica 
> leader election for partitions  (kafka.controller.KafkaController)
> ...
> [2015-03-06 16:52:59,506] INFO [Controller 2]: Partitions undergoing 
> preferred replica election:  (kafka.controller.KafkaController)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1501) transient unit tests failures due to port already in use

2015-03-11 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357640#comment-14357640
 ] 

Guozhang Wang commented on KAFKA-1501:
--

Thanks for the patch [~ewencp]. I have started to look at your patch but want 
to get some clarifications: 

TestUtils.choosePorts should return a random port number each time it gets 
called, but since the socket is closed right away, a subsequent call could 
return the same port number before the first one gets used, hence causing the 
conflict?
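For illustration, the race being described can be sketched as follows (hypothetical class and method names, not the actual TestUtils code):

```java
import java.net.ServerSocket;

public class ChoosePortsRace {
    // Mimics what TestUtils.choosePorts is described as doing: bind an
    // ephemeral port, record its number, then close the socket so the
    // caller can bind it later.
    public static int choosePort() throws Exception {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        }
    }

    public static void main(String[] args) throws Exception {
        int p1 = choosePort();
        int p2 = choosePort();
        // Nothing holds p1 once choosePort() returns, so the OS is free to
        // hand out the same ephemeral port again. If an earlier port is
        // repeated before its server actually starts, the second bind fails
        // with "Address already in use".
        System.out.println("chose ports " + p1 + " and " + p2);
    }
}
```

The window between closing the probe socket and the test server binding the port is exactly where the transient failures can occur.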

> transient unit tests failures due to port already in use
> 
>
> Key: KAFKA-1501
> URL: https://issues.apache.org/jira/browse/KAFKA-1501
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Reporter: Jun Rao
>Assignee: Guozhang Wang
>  Labels: newbie
> Attachments: KAFKA-1501-choosePorts.patch, KAFKA-1501.patch, 
> KAFKA-1501.patch, KAFKA-1501.patch, KAFKA-1501.patch, 
> KAFKA-1501_2015-03-09_11:41:07.patch, test-100.out, test-100.out, 
> test-27.out, test-29.out, test-32.out, test-35.out, test-38.out, test-4.out, 
> test-42.out, test-45.out, test-46.out, test-51.out, test-55.out, test-58.out, 
> test-59.out, test-60.out, test-69.out, test-72.out, test-74.out, test-76.out, 
> test-84.out, test-87.out, test-91.out, test-92.out
>
>
> Saw the following transient failures.
> kafka.api.ProducerFailureHandlingTest > testTooLargeRecordWithAckOne FAILED
> kafka.common.KafkaException: Socket server failed to bind to 
> localhost:59909: Address already in use.
> at kafka.network.Acceptor.openServerSocket(SocketServer.scala:195)
> at kafka.network.Acceptor.(SocketServer.scala:141)
> at kafka.network.SocketServer.startup(SocketServer.scala:68)
> at kafka.server.KafkaServer.startup(KafkaServer.scala:95)
> at kafka.utils.TestUtils$.createServer(TestUtils.scala:123)
> at 
> kafka.api.ProducerFailureHandlingTest.setUp(ProducerFailureHandlingTest.scala:68)





Re: Review Request 31927: Patch for KAFKA-1461

2015-03-11 Thread Sriharsha Chintalapani


> On March 11, 2015, 5:06 p.m., Jun Rao wrote:
> > core/src/main/scala/kafka/server/KafkaConfig.scala, line 97
> > 
> >
> > How about making the default 1000 ms?

On second thought, I think we should use a longer backoff as the default. Most 
users won't override all of these configs, so having 1 sec here doesn't help 
much if the ReplicaFetcherThread keeps encountering errors.
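As a rough illustration of why the default matters (illustrative sketch only, not the actual AbstractFetcherThread code; the names are made up), a fixed back-off applied on every failed fetch directly sets the retry rate against a broker that stays unreachable:

```java
public class FetcherBackoffSketch {
    // Count how many retries a persistently failing fetch produces in a
    // given window, assuming one failed fetch per back-off interval.
    public static long retriesInWindow(long windowMs, long backoffMs) {
        return windowMs / backoffMs;
    }

    public static void main(String[] args) {
        // With a 1 s back-off, a broker that is unreachable for one minute
        // is retried ~60 times; with a 10 s back-off, only ~6 times.
        System.out.println(retriesInWindow(60_000L, 1_000L));  // 60
        System.out.println(retriesInWindow(60_000L, 10_000L)); // 6
    }
}
```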


- Sriharsha


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31927/#review76075
---


On March 11, 2015, 5:41 p.m., Sriharsha Chintalapani wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31927/
> ---
> 
> (Updated March 11, 2015, 5:41 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1461
> https://issues.apache.org/jira/browse/KAFKA-1461
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> KAFKA-1461. Replica fetcher thread does not implement any back-off behavior.
> 
> 
> Diffs
> -
> 
>   core/src/main/scala/kafka/server/AbstractFetcherThread.scala 
> 8c281d4668f92eff95a4a5df3c03c4b5b20e7095 
>   core/src/main/scala/kafka/server/KafkaConfig.scala 
> 48e33626695ad8a28b0018362ac225f11df94973 
>   core/src/main/scala/kafka/server/ReplicaFetcherThread.scala 
> d6d14fbd167fb8b085729cda5158898b1a3ee314 
>   core/src/test/scala/unit/kafka/server/KafkaConfigConfigDefTest.scala 
> c124c8df5b5079e5ffbd0c4ea359562a66aaf317 
>   core/src/test/scala/unit/kafka/utils/TestUtils.scala 
> 52c79201af7c60f9b44a0aaa09cdf968d89a7b87 
> 
> Diff: https://reviews.apache.org/r/31927/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Sriharsha Chintalapani
> 
>



Re: Can I be added as a contributor?

2015-03-11 Thread Brock Noland
Hi,

Sorry to pile on :) but could I be added as a contributor and to
confluence as well? I am brocknoland on JIRA and brockn at gmail on
confluence.

Cheers!
Brock

On Wed, Mar 11, 2015 at 1:44 PM, Joe Stein  wrote:
> Grant, I added your perms for Confluence.
>
> Grayson, I couldn't find a confluence account for you so couldn't give you
> perms.
>
> ~ Joe Stein
> - - - - - - - - - - - - - - - - -
>
>   http://www.stealth.ly
> - - - - - - - - - - - - - - - - -
>
> On Tue, Mar 10, 2015 at 8:20 AM, Grant Henke  wrote:
>
>> Thanks Joe. I added a Confluence account.
>>
>> On Tue, Mar 10, 2015 at 12:04 AM, Joe Stein  wrote:
>>
>> > Grant, I added you.
>> >
>> > Grayson and Grant, you should both also please setup Confluence accounts
>> > and we can grant perms to Confluence also too for your username.
>> >
>> > ~ Joe Stein
>> > - - - - - - - - - - - - - - - - -
>> >
>> >   http://www.stealth.ly
>> > - - - - - - - - - - - - - - - - -
>> >
>> > On Tue, Mar 10, 2015 at 12:54 AM, Grant Henke 
>> wrote:
>> >
>> > > I am also starting to work with the Kafka codebase with plans to
>> > contribute
>> > > more significantly in the near future. Could I also be added to the
>> > > contributor list so that I can assign myself tickets?
>> > >
>> > > Thank you,
>> > > Grant
>> > >
>> > > On Mon, Mar 9, 2015 at 1:39 PM, Guozhang Wang 
>> > wrote:
>> > >
>> > > > Added grayson.c...@gmail.com to the list.
>> > > >
>> > > > On Mon, Mar 9, 2015 at 10:41 AM, Grayson Chao
>> > > > > >
>> > > > wrote:
>> > > >
>> > > > > Hello Kafka devs,
>> > > > >
>> > > > > I'm working on the ops side of Kafka at LinkedIn (embedded SRE on
>> the
>> > > > > Kafka team) and would like to start familiarizing myself with the
>> > > > codebase
>> > > > > with a view to eventually making substantial contributions. Could
>> you
>> > > > > please add me as a contributor to the Kafka JIRA so that I can
>> assign
>> > > > > myself a newbie ticket?
>> > > > >
>> > > > > Thanks!
>> > > > > Grayson
>> > > > > --
>> > > > > Grayson Chao
>> > > > > Data Infra Streaming SRE
>> > > > >
>> > > >
>> > > >
>> > > >
>> > > > --
>> > > > -- Guozhang
>> > > >
>> > >
>> > >
>> > >
>> > > --
>> > > Grant Henke
>> > > Solutions Consultant | Cloudera
>> > > ghe...@cloudera.com | 920-980-8979
>> > > twitter.com/ghenke  |
>> > > linkedin.com/in/granthenke
>> > >
>> >
>>
>>
>>
>> --
>> Grant Henke
>> Solutions Consultant | Cloudera
>> ghe...@cloudera.com | 920-980-8979
>> twitter.com/ghenke  |
>> linkedin.com/in/granthenke
>>


Re: Can I be added as a contributor?

2015-03-11 Thread Joe Stein
Grant, I added your perms for Confluence.

Grayson, I couldn't find a confluence account for you so couldn't give you
perms.

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Tue, Mar 10, 2015 at 8:20 AM, Grant Henke  wrote:

> Thanks Joe. I added a Confluence account.
>
> On Tue, Mar 10, 2015 at 12:04 AM, Joe Stein  wrote:
>
> > Grant, I added you.
> >
> > Grayson and Grant, you should both also please setup Confluence accounts
> > and we can grant perms to Confluence also too for your username.
> >
> > ~ Joe Stein
> > - - - - - - - - - - - - - - - - -
> >
> >   http://www.stealth.ly
> > - - - - - - - - - - - - - - - - -
> >
> > On Tue, Mar 10, 2015 at 12:54 AM, Grant Henke 
> wrote:
> >
> > > I am also starting to work with the Kafka codebase with plans to
> > contribute
> > > more significantly in the near future. Could I also be added to the
> > > contributor list so that I can assign myself tickets?
> > >
> > > Thank you,
> > > Grant
> > >
> > > On Mon, Mar 9, 2015 at 1:39 PM, Guozhang Wang 
> > wrote:
> > >
> > > > Added grayson.c...@gmail.com to the list.
> > > >
> > > > On Mon, Mar 9, 2015 at 10:41 AM, Grayson Chao
> >  > > >
> > > > wrote:
> > > >
> > > > > Hello Kafka devs,
> > > > >
> > > > > I'm working on the ops side of Kafka at LinkedIn (embedded SRE on
> the
> > > > > Kafka team) and would like to start familiarizing myself with the
> > > > codebase
> > > > > with a view to eventually making substantial contributions. Could
> you
> > > > > please add me as a contributor to the Kafka JIRA so that I can
> assign
> > > > > myself a newbie ticket?
> > > > >
> > > > > Thanks!
> > > > > Grayson
> > > > > --
> > > > > Grayson Chao
> > > > > Data Infra Streaming SRE
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> > > >
> > >
> > >
> > >
> > > --
> > > Grant Henke
> > > Solutions Consultant | Cloudera
> > > ghe...@cloudera.com | 920-980-8979
> > > twitter.com/ghenke  |
> > > linkedin.com/in/granthenke
> > >
> >
>
>
>
> --
> Grant Henke
> Solutions Consultant | Cloudera
> ghe...@cloudera.com | 920-980-8979
> twitter.com/ghenke  |
> linkedin.com/in/granthenke
>


Re: Review Request 31925: KAFKA-1054: Fix remaining compiler warnings

2015-03-11 Thread Jiangjie Qin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31925/#review76122
---


Works for me, but I still see the following line:
there were 12 feature warning(s); re-run with -feature for details
I tried ./gradlew jar -feature, but it does not seem to work at all. If this is 
a related issue, can we solve it in this patch? Otherwise we can create 
another ticket to address it.

- Jiangjie Qin


On March 11, 2015, 4:35 a.m., Blake Smith wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31925/
> ---
> 
> (Updated March 11, 2015, 4:35 a.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1054
> https://issues.apache.org/jira/browse/KAFKA-1054
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> Brought the patch outlined [here](https://reviews.apache.org/r/25461/diff/) 
> up to date with the latest trunk and fixed 2 more lingering compiler warnings.
> 
> 
> Diffs
> -
> 
>   core/src/main/scala/kafka/admin/AdminUtils.scala b700110 
>   core/src/main/scala/kafka/coordinator/DelayedJoinGroup.scala df60cbc 
>   core/src/main/scala/kafka/server/KafkaServer.scala dddef93 
>   core/src/main/scala/kafka/utils/Utils.scala 738c1af 
>   core/src/test/scala/unit/kafka/server/ServerShutdownTest.scala b46daa4 
> 
> Diff: https://reviews.apache.org/r/31925/diff/
> 
> 
> Testing
> ---
> 
> Ran full test suite.
> 
> 
> Thanks,
> 
> Blake Smith
> 
>



Re: Review Request 31706: Patch for KAFKA-1997

2015-03-11 Thread Guozhang Wang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31706/#review76121
---


LGTM overall except one potential issue on consumer metrics colliding.


core/src/main/scala/kafka/tools/MirrorMaker.scala


Should we add the index as a suffix to the consumer id in the 
consumerConfig to distinguish different connector instances? Otherwise the 
consumer metrics would collide.



core/src/main/scala/kafka/tools/MirrorMaker.scala


Add a comment like "Creating just one stream per connector instance"



core/src/main/scala/kafka/tools/MirrorMaker.scala


Leave a TODO comment that after KAFKA-1660 this will be changed accordingly.



core/src/main/scala/kafka/tools/MirrorMaker.scala


Why not just import java.util.List?


- Guozhang Wang


On March 11, 2015, 1:31 a.m., Jiangjie Qin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31706/
> ---
> 
> (Updated March 11, 2015, 1:31 a.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-1997
> https://issues.apache.org/jira/browse/KAFKA-1997
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> Addressed Guozhang's comments.
> 
> 
> Changed the exit behavior on send failure because close(0) is not ready yet. 
> Will submit followup patch after KAFKA-1660 is checked in.
> 
> 
> Expanded imports from _ and * to full class path
> 
> 
> Incorporated Joel's comments.
> 
> 
> Addressed Joel's comments.
> 
> 
> Addressed Guozhang's comments.
> 
> 
> Diffs
> -
> 
>   
> clients/src/main/java/org/apache/kafka/clients/producer/internals/RecordAccumulator.java
>  d5c79e2481d5e9a2524ac2ef6a6879f61cb7cb5f 
>   core/src/main/scala/kafka/consumer/PartitionAssignor.scala 
> e6ff7683a0df4a7d221e949767e57c34703d5aad 
>   core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala 
> 5487259751ebe19f137948249aa1fd2637d2deb4 
>   core/src/main/scala/kafka/javaapi/consumer/ConsumerRebalanceListener.java 
> 7f45a90ba6676290172b7da54c15ee5dc1a42a2e 
>   core/src/main/scala/kafka/tools/MirrorMaker.scala 
> bafa379ff57bc46458ea8409406f5046dc9c973e 
>   core/src/test/scala/unit/kafka/consumer/PartitionAssignorTest.scala 
> 543070f4fd3e96f3183cae9ee2ccbe843409ee58 
>   
> core/src/test/scala/unit/kafka/consumer/ZookeeperConsumerConnectorTest.scala 
> 19640cc55b5baa0a26a808d708b7f4caf491c9f0 
> 
> Diff: https://reviews.apache.org/r/31706/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Jiangjie Qin
> 
>



[jira] [Commented] (KAFKA-1910) Refactor KafkaConsumer

2015-03-11 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357529#comment-14357529
 ] 

Guozhang Wang commented on KAFKA-1910:
--

[~junrao] Could you take a look at the patch so that I can check in the fix if 
it LGTY?

> Refactor KafkaConsumer
> --
>
> Key: KAFKA-1910
> URL: https://issues.apache.org/jira/browse/KAFKA-1910
> Project: Kafka
>  Issue Type: Sub-task
>  Components: consumer
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
> Attachments: KAFKA-1910.patch, KAFKA-1910_2015-03-05_14:55:33.patch
>
>
> KafkaConsumer now contains all the logic on the consumer side, making it a 
> very huge class file, better re-factoring it to have multiple layers on top 
> of KafkaClient.





[jira] [Commented] (KAFKA-2011) Rebalance with auto.leader.rebalance.enable=false

2015-03-11 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357509#comment-14357509
 ] 

Jiangjie Qin commented on KAFKA-2011:
-

Yes, 6000 ms sounds too short. Can you try bumping it up to 60?
200MB/s sounds OK for 5 brokers.

> Rebalance with auto.leader.rebalance.enable=false 
> --
>
> Key: KAFKA-2011
> URL: https://issues.apache.org/jira/browse/KAFKA-2011
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.2.0
> Environment: 5 Hosts of below config:
> "x86_64" "32-bit, 64-bit" "Little Endian" "24 GenuineIntel CPUs Model 44 
> 1600.000MHz" "RAM 189 GB" GNU/Linux
>Reporter: K Zakee
>Priority: Blocker
> Attachments: controller-logs-1.zip, controller-logs-2.zip
>
>
> Started with clean cluster 0.8.2 with 5 brokers. Setting the properties as 
> below:
> auto.leader.rebalance.enable=false
> controlled.shutdown.enable=true
> controlled.shutdown.max.retries=1
> controlled.shutdown.retry.backoff.ms=5000
> default.replication.factor=3
> log.cleaner.enable=true
> log.cleaner.threads=5
> log.cleanup.policy=delete
> log.flush.scheduler.interval.ms=3000
> log.retention.minutes=1440
> log.segment.bytes=1073741824
> message.max.bytes=100
> num.io.threads=14
> num.network.threads=14
> num.partitions=10
> queued.max.requests=500
> num.replica.fetchers=4
> replica.fetch.max.bytes=1048576
> replica.fetch.min.bytes=51200
> replica.lag.max.messages=5000
> replica.lag.time.max.ms=3
> replica.fetch.wait.max.ms=1000
> fetch.purgatory.purge.interval.requests=5000
> producer.purgatory.purge.interval.requests=5000
> delete.topic.enable=true
> Logs show rebalance happening well up to 24 hours after the start.
> [2015-03-07 16:52:48,969] INFO [Controller 2]: Resuming preferred replica 
> election for partitions:  (kafka.controller.KafkaController)
> [2015-03-07 16:52:48,969] INFO [Controller 2]: Partitions that completed 
> preferred replica election:  (kafka.controller.KafkaController)
> …
> [2015-03-07 12:07:06,783] INFO [Controller 4]: Resuming preferred replica 
> election for partitions:  (kafka.controller.KafkaController)
> ...
> [2015-03-07 09:10:41,850] INFO [Controller 3]: Resuming preferred replica 
> election for partitions:  (kafka.controller.KafkaController)
> ...
> [2015-03-07 08:26:56,396] INFO [Controller 1]: Starting preferred replica 
> leader election for partitions  (kafka.controller.KafkaController)
> ...
> [2015-03-06 16:52:59,506] INFO [Controller 2]: Partitions undergoing 
> preferred replica election:  (kafka.controller.KafkaController)





[jira] [Commented] (KAFKA-2011) Rebalance with auto.leader.rebalance.enable=false

2015-03-11 Thread K Zakee (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357426#comment-14357426
 ] 

K Zakee commented on KAFKA-2011:


1) I see the zk timeout occurring when the controller migration happened. 
The current ZK timeout setting on the brokers is the default value of 6000 ms. 
Is increasing the timeout recommended?

2) As for the data loads, we did have high producer loads up to 200 MB/s for 
a stretch of hours, reducing gradually to 150 then 130. But given the 1 Gbps 
NIC on each of the five brokers and 3 zookeepers, do you think these data 
loads would be a problem? 

3) Do you think changing any of the configuration settings provided above may 
help?

> Rebalance with auto.leader.rebalance.enable=false 
> --
>
> Key: KAFKA-2011
> URL: https://issues.apache.org/jira/browse/KAFKA-2011
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.2.0
> Environment: 5 Hosts of below config:
> "x86_64" "32-bit, 64-bit" "Little Endian" "24 GenuineIntel CPUs Model 44 
> 1600.000MHz" "RAM 189 GB" GNU/Linux
>Reporter: K Zakee
>Priority: Blocker
> Attachments: controller-logs-1.zip, controller-logs-2.zip
>
>
> Started with clean cluster 0.8.2 with 5 brokers. Setting the properties as 
> below:
> auto.leader.rebalance.enable=false
> controlled.shutdown.enable=true
> controlled.shutdown.max.retries=1
> controlled.shutdown.retry.backoff.ms=5000
> default.replication.factor=3
> log.cleaner.enable=true
> log.cleaner.threads=5
> log.cleanup.policy=delete
> log.flush.scheduler.interval.ms=3000
> log.retention.minutes=1440
> log.segment.bytes=1073741824
> message.max.bytes=100
> num.io.threads=14
> num.network.threads=14
> num.partitions=10
> queued.max.requests=500
> num.replica.fetchers=4
> replica.fetch.max.bytes=1048576
> replica.fetch.min.bytes=51200
> replica.lag.max.messages=5000
> replica.lag.time.max.ms=3
> replica.fetch.wait.max.ms=1000
> fetch.purgatory.purge.interval.requests=5000
> producer.purgatory.purge.interval.requests=5000
> delete.topic.enable=true
> Logs show rebalance happening well up to 24 hours after the start.
> [2015-03-07 16:52:48,969] INFO [Controller 2]: Resuming preferred replica 
> election for partitions:  (kafka.controller.KafkaController)
> [2015-03-07 16:52:48,969] INFO [Controller 2]: Partitions that completed 
> preferred replica election:  (kafka.controller.KafkaController)
> …
> [2015-03-07 12:07:06,783] INFO [Controller 4]: Resuming preferred replica 
> election for partitions:  (kafka.controller.KafkaController)
> ...
> [2015-03-07 09:10:41,850] INFO [Controller 3]: Resuming preferred replica 
> election for partitions:  (kafka.controller.KafkaController)
> ...
> [2015-03-07 08:26:56,396] INFO [Controller 1]: Starting preferred replica 
> leader election for partitions  (kafka.controller.KafkaController)
> ...
> [2015-03-06 16:52:59,506] INFO [Controller 2]: Partitions undergoing 
> preferred replica election:  (kafka.controller.KafkaController)





Re: [DISCUSS] KIP-6 - New reassignment partition logic for re-balancing

2015-03-11 Thread Joe Stein
Sorry for not catching up on this thread earlier; I wanted to do this
before the KIP got its updates so we could discuss if need be, and not waste
more time re-writing and reworking things that folks have issues with. I
captured all the comments so far here with responses.

<< So fair assignment by count (taking into account the current partition
count of each broker) is very good. However, it's worth noting that all
partitions are not created equal. We have actually been performing more
rebalance work based on the partition size on disk, as given equal
retention of all topics, the size on disk is a better indicator of the
amount of traffic a partition gets, both in terms of storage and network
traffic. Overall, this seems to be a better balance.

Agreed, though this is out of scope (imho) for the motivations of the
KIP. The motivations section is blank (that is on me), but honestly it is
because we did all the development, went back and forth with Neha on the
testing, and then had to back it all into the KIP process... It's a
time/resource/scheduling issue and I hope to update the KIP soon ... all of
this is in the JIRA and code patch, so it's not like it is not there, just
maybe not in the place where folks are looking, since we changed where folks
should look.

Initial cut at "Motivations": --generate is not used by a lot of folks
because they don't trust it, due to issues such as sometimes giving different
results when you run it. There is also other feedback from the community that
it does not account for specific use cases like "adding new brokers" and
"removing brokers" (which is where that patch started
https://issues.apache.org/jira/browse/KAFKA-1678 but then we changed it
after review into just --rebalance
https://issues.apache.org/jira/browse/KAFKA-1792). The use case for adding and
removing brokers is one that happens in AWS and auto scaling. There are
other reasons for this too, of course. The goal originally was to make what
folks are already coding today (with the output of " available in the
project for the community. Based on the discussion in the JIRA with Neha, we
all agreed that making it a fair rebalance would fulfill both use
cases.
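For concreteness, a fair-by-count assignment of the kind being discussed can be sketched as a greedy loop that always places each replica on the least-loaded brokers (an illustrative sketch only; the actual algorithm lives in the KAFKA-1792 patch):

```java
import java.util.*;

public class FairAssignSketch {
    // Greedily assign `partitions` partitions, each with `replicas` replicas,
    // across `brokers`: for every partition, pick the `replicas` brokers
    // currently holding the fewest replicas. Ties are broken by broker order.
    public static Map<Integer, List<Integer>> assign(List<Integer> brokers,
                                                     int partitions,
                                                     int replicas) {
        Map<Integer, Integer> load = new HashMap<>();
        for (int b : brokers) load.put(b, 0);
        Map<Integer, List<Integer>> assignment = new HashMap<>();
        for (int p = 0; p < partitions; p++) {
            List<Integer> sorted = new ArrayList<>(brokers);
            sorted.sort(Comparator.comparingInt(load::get));
            List<Integer> chosen = new ArrayList<>(sorted.subList(0, replicas));
            for (int b : chosen) load.merge(b, 1, Integer::sum);
            assignment.put(p, chosen);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // 10 partitions, replication factor 2, 5 brokers: 20 replicas total,
        // so every broker ends up holding exactly 4 of them.
        System.out.println(assign(Arrays.asList(1, 2, 3, 4, 5), 10, 2));
    }
}
```

The same greedy structure is where rack or size constraints could later be layered in, by filtering the candidate list before taking the least-loaded brokers.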

<< In addition to this, I think there is very much a need to have Kafka be
rack-aware. That is, to be able to assure that for a given cluster, you
never assign all replicas for a given partition in the same rack. This
would allow us to guard against maintenances or power failures that affect
a full rack of systems (or a given switch).

Agreed, though I think this is out of scope for this change and something
we can do in the future. There is more that we have to figure out for
rack awareness, specifically answering "how do we know what rack the broker is
on". I really really (really) worry that when we keep trying to put too much
into a single change, the discussions go into rabbit holes, and good,
important, community-driven features that could get out there
get bogged down with different use cases and scope creep. So, I think
rack awareness is its own KIP that has two parts... setting the broker rack and
rebalancing for it. That feature doesn't invalidate the need for
--rebalance but can be built on top of it.

<< I think it would make sense to implement the reassignment logic as a
pluggable component. That way it would be easy to select a scheme when
performing a reassignment (count, size, rack aware). Configuring a default
scheme for a cluster would allow for the brokers to create new topics and
partitions in compliance with the requested policy.

I don't agree with this, because right now you get back "the current state
of the partitions", so you can (today) write whatever logic you want (with
the information that is there). With --rebalance you also get that back.
Moving forward we can maybe expose more information so that
folks can write whatever different logic they want
(like partition number, location (label string for rack), size, throughput
average, etc., etc.)... but again, that to me is a different
KIP entirely from the motivations of this one. If eventually we want to
make it pluggable then we should have a KIP and discussion around it; I just
don't see how it relates to the original motivations of this change.

<< Is it possible to describe the proposed partition reassignment algorithm
in more detail on the KIP? In fact, it would be really easy to understand
if we had some concrete examples comparing partition assignment with the
old algorithm and the new.

Sure — it is in the JIRA linked to the KIP
(https://issues.apache.org/jira/browse/KAFKA-1792) and documented in comments
in the patch as requested. Let me know if this is what you are looking
for, and we can simply update the KIP with this information, or please give
more detail on specifically what you think might be missing.

<< Would we want to
support some kind of automated reassignment of existing partitions
(personally - no. I want to trigger that manually because i

[jira] [Commented] (KAFKA-1910) Refactor KafkaConsumer

2015-03-11 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357366#comment-14357366
 ] 

Guozhang Wang commented on KAFKA-1910:
--

Got some problems with RB, uploading the patch here for a quick review:

{code}
diff --git 
a/clients/src/main/java/org/apache/kafka/clients/consumer/internals/Coordinator.java
 
b/clients/src/main/java/org/apache/kafka/clients/consumer/internals/Coordinator.java
index e972efb..436f9b2 100644
--- 
a/clients/src/main/java/org/apache/kafka/clients/consumer/internals/Coordinator.java
+++ 
b/clients/src/main/java/org/apache/kafka/clients/consumer/internals/Coordinator.java
@@ -129,7 +129,7 @@ public final class Coordinator {
 
 // process the response
 JoinGroupResponse response = new 
JoinGroupResponse(resp.responseBody());
-// TODO: needs to handle disconnects and errors
+// TODO: needs to handle disconnects and errors, should not just throw 
exceptions
 Errors.forCode(response.errorCode()).maybeThrow();
 this.consumerId = response.consumerId();
 
diff --git 
a/clients/src/main/java/org/apache/kafka/clients/consumer/internals/Fetcher.java
 
b/clients/src/main/java/org/apache/kafka/clients/consumer/internals/Fetcher.java
index 27c78b8..8b71fba 100644
--- 
a/clients/src/main/java/org/apache/kafka/clients/consumer/internals/Fetcher.java
+++ 
b/clients/src/main/java/org/apache/kafka/clients/consumer/internals/Fetcher.java
@@ -231,11 +231,12 @@ public class Fetcher {
 log.debug("Fetched offset {} for partition {}", 
offset, topicPartition);
 return offset;
 } else if (errorCode == 
Errors.NOT_LEADER_FOR_PARTITION.code()
-|| errorCode == Errors.LEADER_NOT_AVAILABLE.code()) {
+|| errorCode == 
Errors.UNKNOWN_TOPIC_OR_PARTITION.code()) {
 log.warn("Attempt to fetch offsets for partition {} 
failed due to obsolete leadership information, retrying.",
 topicPartition);
 awaitMetadataUpdate();
 } else {
+// TODO: we should not just throw exceptions but 
should handle and log it.
 Errors.forCode(errorCode).maybeThrow();
 }
 }
diff --git 
a/clients/src/main/java/org/apache/kafka/common/requests/ListOffsetResponse.java
 
b/clients/src/main/java/org/apache/kafka/common/requests/ListOffsetResponse.java
index af704f3..f706086 100644
--- 
a/clients/src/main/java/org/apache/kafka/common/requests/ListOffsetResponse.java
+++ 
b/clients/src/main/java/org/apache/kafka/common/requests/ListOffsetResponse.java
@@ -45,7 +45,9 @@ public class ListOffsetResponse extends 
AbstractRequestResponse {
 /**
  * Possible error code:
  *
- * TODO
+ *  UNKNOWN_TOPIC_OR_PARTITION (3)
+ *  NOT_LEADER_FOR_PARTITION (6)
+ *  UNKNOWN (-1)
  */
 
 private static final String OFFSETS_KEY_NAME = "offsets";
diff --git a/core/src/test/scala/integration/kafka/api/ConsumerTest.scala 
b/core/src/test/scala/integration/kafka/api/ConsumerTest.scala
index fed37e3..8eae1ab 100644
--- a/core/src/test/scala/integration/kafka/api/ConsumerTest.scala
+++ b/core/src/test/scala/integration/kafka/api/ConsumerTest.scala
@@ -260,8 +260,10 @@ class ConsumerTest extends IntegrationTestHarness with 
Logging {
 var iter: Int = 0
 
 override def doWork(): Unit = {
-  killRandomBroker()
+  info("Killed broker %d".format(killRandomBroker()))
+  Thread.sleep(500)
   restartDeadBrokers()
+  info("Restarted all brokers")
 
   iter += 1
   if (iter == numIters)
{code}

> Refactor KafkaConsumer
> --
>
> Key: KAFKA-1910
> URL: https://issues.apache.org/jira/browse/KAFKA-1910
> Project: Kafka
>  Issue Type: Sub-task
>  Components: consumer
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
> Attachments: KAFKA-1910.patch, KAFKA-1910_2015-03-05_14:55:33.patch
>
>
> KafkaConsumer now contains all the logic on the consumer side, making it a 
> very huge class file, better re-factoring it to have multiple layers on top 
> of KafkaClient.




