Re: [ANNOUNCE] New committer: Damian Guy

2017-06-09 Thread Joe Stein
Congrats!


~ Joe Stein

On Fri, Jun 9, 2017 at 6:49 PM, Neha Narkhede <n...@confluent.io> wrote:

> Well deserved. Congratulations Damian!
>
> On Fri, Jun 9, 2017 at 1:34 PM Guozhang Wang <wangg...@gmail.com> wrote:
>
> > Hello all,
> >
> >
> > The PMC of Apache Kafka is pleased to announce that we have invited
> Damian
> > Guy as a committer to the project.
> >
> > Damian has made tremendous contributions to Kafka. He has not only
> > contributed a lot to the Streams API, but has also been involved in many
> > other areas like the producer and consumer clients, broker-side
> > coordinators (group coordinator and the ongoing transaction coordinator).
> > He has contributed more than 100 patches so far, and has been driving 6
> > KIP contributions.
> >
> > More importantly, Damian has been a very prolific reviewer on open PRs
> > and has been actively participating in community activities such as the
> > mailing lists and Slack and Stack Overflow questions. Through his code
> > contributions and reviews, Damian has demonstrated good judgement on
> > system design and code quality, especially on thorough unit test
> > coverage. We believe he will make a great addition to the committers of
> > the community.
> >
> >
> > Thank you for your contributions, Damian!
> >
> >
> > -- Guozhang, on behalf of the Apache Kafka PMC
> >
> --
> Thanks,
> Neha
>


Re: [VOTE] 0.10.0.0 RC6

2016-05-20 Thread Joe Stein
+1 ran quick start from source and binary release

On Fri, May 20, 2016 at 1:07 PM, Ewen Cheslack-Postava 
wrote:

> +1 validated connect with a couple of simple connectors and console
> producer/consumer.
>
> -Ewen
>
> On Fri, May 20, 2016 at 9:53 AM, Guozhang Wang  wrote:
>
> > +1. Validated maven (should be
> > https://repository.apache.org/content/groups/staging/org/apache/kafka/
> > btw)
> > and binary libraries, quick start.
> >
> > On Fri, May 20, 2016 at 9:36 AM, Harsha  wrote:
> >
> > > +1. Ran a 3-node cluster with a few system tests on our side. Looks
> good.
> > >
> > > -Harsha
> > >
> > > On Thu, May 19, 2016, at 07:47 PM, Jun Rao wrote:
> > > > Thanks for running the release. +1 from me. Verified the quickstart.
> > > >
> > > > Jun
> > > >
> > > > On Tue, May 17, 2016 at 10:00 PM, Gwen Shapira 
> > > wrote:
> > > >
> > > > > Hello Kafka users, developers and client-developers,
> > > > >
> > > > > This is the seventh (!) candidate for release of Apache Kafka
> > > > > 0.10.0.0. This is a major release that includes: (1) New message
> > > > > format including timestamps (2) client interceptor API (3) Kafka
> > > > > Streams.
> > > > >
> > > > > This RC was rolled out to fix an issue with our packaging that
> caused
> > > > > dependencies to leak in ways that broke our licensing, and an issue
> > > > > with protocol versions that broke upgrade for LinkedIn and others
> who
> > > > > may run from trunk. Thanks to Ewen, Ismael, Becket and Jun for
> > > > > finding and fixing the issues.
> > > > >
> > > > > Release notes for the 0.10.0.0 release:
> > > > > http://home.apache.org/~gwenshap/0.10.0.0-rc6/RELEASE_NOTES.html
> > > > >
> > > > > Let's try to vote within the 72h release vote window and get this
> baby
> > > > > out already!
> > > > >
> > > > > *** Please download, test and vote by Friday, May 20, 23:59 PT
> > > > >
> > > > > Kafka's KEYS file containing PGP keys we use to sign the release:
> > > > > http://kafka.apache.org/KEYS
> > > > >
> > > > > * Release artifacts to be voted upon (source and binary):
> > > > > http://home.apache.org/~gwenshap/0.10.0.0-rc6/
> > > > >
> > > > > * Maven artifacts to be voted upon:
> > > > > https://repository.apache.org/content/groups/staging/
> > > > >
> > > > > * java-doc
> > > > > http://home.apache.org/~gwenshap/0.10.0.0-rc6/javadoc/
> > > > >
> > > > > * tag to be voted upon (off 0.10.0 branch) is the 0.10.0.0 tag:
> > > > >
> > > > >
> > >
> >
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=065899a3bc330618e420673acf9504d123b800f3
> > > > >
> > > > > * Documentation:
> > > > > http://kafka.apache.org/0100/documentation.html
> > > > >
> > > > > * Protocol:
> > > > > http://kafka.apache.org/0100/protocol.html
> > > > >
> > > > > /**
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Gwen
> > > > >
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>
>
>
> --
> Thanks,
> Ewen
>


Re: KIP-4 Wiki Update

2016-03-30 Thread Joe Stein
If the metadata change for KIP-4 happens, maybe we just update the KIP-4
page for the release. It would be great to have this available in 0.10.

On Wed, Mar 30, 2016 at 3:36 PM, Grant Henke  wrote:

> Additionally feel free to review the PR (#1095
> ) to see what the current
> proposal/implementation looks like. Right now it's using null to signify no
> topics. That can be changed based on the discussion here.
>
> On Wed, Mar 30, 2016 at 2:34 PM, Grant Henke  wrote:
>
> > Thank you for the input everyone! I will try to address some of it below.
> > I will stick to the metadata related discussion first and we can go from
> > there.
> >
> > I would like to stay away from language/client style discussion to start.
> > Instead I think it would be useful to focus on reviewing the wire
> protocol
> > suggested. Languages have many ways of handling null, but this can all be
> > handled by the client API to be user friendly/clear.
> >
> > I think right now the most critical question is whether null vs. empty list
> > is clear enough for protocol usage. Or is a boolean better, and why?
> >
> >
> > MetadataRequest v1: long-term / conceptually, I think a "null" topic list
> >> aligns better with fetching all topics. Empty list aligns better with
> >> fetching no topics. I recognize this means that empty list behaves
> >> differently in v0 versus v1. But hey, what are protocol versions good
> for
> >> if not changing behavior... :) API design comment. take it or leave it.
> >
> >
> > I do agree that if we are making a change to use nulls, empty list = no
> > topics and null = all makes more sense. But it also makes the behavior
> > transition of the protocol a bit more confusing. Since the existing
> > behavior changes, not just the new behavior.
> >
> >
> > In the wire protocol, we would still use a size of -1 like we normally do
> >> for nullable strings and nullable bytes. So, it's still a bit magical,
> but
> >> efficient and associated with the right field (which avoids some invalid
> >> states that are possible if we use two fields). In other words, each
> >> implementation of the protocol is responsible for figuring out an
> >> idiomatic
> >> and hopefully safe way to represent the absence of a value.
> >
> >
> > Agreed. Thank you for summarizing.
> >
> >
> > Process comment: Since KIP-4 is voted and signed. Perhaps a small KIP
> >> detailing the suggested Metadata API changes so it will be easy to refer
> >> to
> >> what we are discussing?
> >>
> >
> > I would prefer not to break KIP-4 out into more KIPs, since the KIP
> > encompasses the high-level goal (it is the sum of the parts that is the
> > functionality/goal). That said, I do like breaking it up to get things in
> > and make progress. Could we just hold multiple votes for various pieces? In
> > this case we would be voting for the "KIP-4 Metadata API Changes".
> >
> >
> > On Wed, Mar 30, 2016 at 1:02 PM, Ismael Juma  wrote:
> >
> >> We can add an internal class until then (it's pretty trivial) since the
> >> request classes are internal.
> >>
> >> Ismael
> >>
> >> On Wed, Mar 30, 2016 at 7:00 PM, Gwen Shapira 
> wrote:
> >>
> >> > I like it, but we are not on Java8 yet, and I don't think we want to
> >> block
> >> > on that :)
> >> >
> >> > On Wed, Mar 30, 2016 at 10:53 AM, Ismael Juma 
> >> wrote:
> >> >
> >> > > On Wed, Mar 30, 2016 at 6:21 PM, Gwen Shapira 
> >> wrote:
> >> > >
> >> > > > Ismael, can you detail how the Optional approach would work in the
> >> wire
> >> > > > protocol? It sounds good, but I'm unclear on what this would look
> >> like
> >> > on
> >> > > > the wire.
> >> > > >
> >> > >
> >> > > In the wire protocol, we would still use a size of -1 like we
> >> normally do
> >> > > for nullable strings and nullable bytes. So, it's still a bit
> magical,
> >> > but
> >> > > efficient and associated with the right field (which avoids some
> >> invalid
> >> > > states that are possible if we use two fields). In other words, each
> >> > > implementation of the protocol is responsible for figuring out an
> >> > idiomatic
> >> > > and hopefully safe way to represent the absence of a value.
> >> > >
> >> > > In Java (Scala would be similar) we would convert this to an
> >> > > Optional to make it clear that the value could be
> absent
> >> > (and
> >> > > avoid NPEs). The fact that absence of a value means "all topics"
> makes
> >> > > sense if one thinks about that field as a filter (absence of a value
> >> > means
> >> > > no filter).
> >> > >
> >> > > I can see pros and cons for each approach, personally. :)
> >> > >
> >> > > Ismael
> >> > >
> >> >
> >>
> >
> >
> >
> > --
> > Grant Henke
> > Software Engineer | Cloudera
> > gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
> >
>
>
>
> --
> Grant Henke
> Software Engineer | Cloudera
> gr...@cloudera.com | 
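The null-vs-empty-list distinction debated above can be sketched concretely. Below is a minimal sketch (not Kafka's actual serialization classes; the class and method names are made up for illustration) of the nullable-array convention the thread describes: null ("fetch all topics") is encoded as size -1, while an empty list ("fetch no topics") is encoded as size 0, so the two requests are distinguishable on the wire.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class NullableTopicList {
    static ByteBuffer encode(List<String> topics) {
        if (topics == null) {
            ByteBuffer buf = ByteBuffer.allocate(4);
            buf.putInt(-1);  // -1 marks an absent value, i.e. "all topics"
            buf.flip();
            return buf;
        }
        int payload = 0;
        for (String t : topics)
            payload += 2 + t.getBytes(StandardCharsets.UTF_8).length;
        ByteBuffer buf = ByteBuffer.allocate(4 + payload);
        buf.putInt(topics.size());  // 0 means "no topics", N means a filter
        for (String t : topics) {
            byte[] b = t.getBytes(StandardCharsets.UTF_8);
            buf.putShort((short) b.length);  // short length prefix per string
            buf.put(b);
        }
        buf.flip();
        return buf;
    }

    public static void main(String[] args) {
        if (encode(null).getInt() != -1)
            throw new AssertionError("null should encode as -1");
        if (encode(java.util.Collections.<String>emptyList()).getInt() != 0)
            throw new AssertionError("empty list should encode as 0");
        System.out.println("null -> size -1, empty -> size 0");
    }
}
```

As Ismael notes in the thread, each client implementation then maps the -1 size to whatever idiomatic absent-value representation its language offers (e.g. `Optional` in Java 8+).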

Re: [VOTE] Release plan - Kafka 0.10.0

2016-03-07 Thread Joe Stein
+1

A quick question about the definition of the release cut (assuming the vote
passes) you're proposing, please.

Is it critical bug fixes for new features and regressions, or just
regressions, with a new feature getting pulled if it's not working right and
pulling it is less impactful to do? Understandably that is dependent on the
feature and/or fix, but we have a bunch on the plan at
https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan and
is that the hard freeze? I think the producer and consumer interceptors are
part of streams, so maybe just an update on that?

+1, the timeline seems manageable, and adjusting for what is released or not
doesn't affect the approach, so g2g, lots to get out.

For 0.11 I would like to suggest trying to nominate (or, if someone
volunteers, always good) a release manager, along with what is being worked
on, and to collaborate around the different KIPs so folks know where to
contribute and work downstream too, operationally.

~ Joe Stein

On Mon, Mar 7, 2016 at 12:27 PM, Gwen Shapira <g...@confluent.io> wrote:

> Greetings Kafka Developer Community,
>
> As you all know, we have few big features that are almost complete
> (Timestamps! Interceptors! Streams!). It is time to start planning our
> next release.
>
> I suggest the following:
> * Cut branches on March 21st
> * Publish the first release candidate the next day
> * Start testing, finding important issues, fixing them, rolling out new
> releases
> * And eventually get a release candidate that we all agree is awesome
> enough to release. Hopefully this won't take too many iterations :)
>
> Note that this is a 2 weeks heads-up on branch cutting. After we cut
> branches, we will try to minimize cherrypicks to just critical bugs
> (because last major release was a bit insane).
> Therefore,  if you have a feature that you really want to see in
> 0.10.0 - you'll need to have it committed by March 21st. As a courtesy
> to the release manager, if you have features that you are not planning
> on getting in for 0.10.0, please change the "fix version" field in
> JIRA accordingly.
>
> I will send a heads-up few days before cutting branches, to give
> everyone a chance to get stragglers in.
>
> The vote will be open for 72 hours.
> All in favor, please reply with +1.
>
> Gwen Shapira
>


Re: [VOTE] Deprecating the old Scala producers for 0.10.0.0

2016-03-03 Thread Joe Stein
+1

~ Joestein
On Mar 3, 2016 3:01 PM, "Neha Narkhede"  wrote:

> +1
>
> On Thu, Mar 3, 2016 at 2:36 PM, Ismael Juma  wrote:
>
> > Hi all,
> >
> > The new Java producer was introduced in 0.8.2.0 (released in February
> > 2015). It has become the default implementation for various tools since
> > 0.9.0.0 (released in October 2015) and it is the only implementation with
> > support for the security features introduced in 0.9.0.0.
> >
> > Given this, I think we should deprecate the old Scala producers for
> > 0.10.0.0 by adding @deprecated annotations in the code and updating the
> > documentation to encourage usage of the new Java producer. This would
> give
> > our users a stronger signal regarding our plans to focus on the new Java
> > producer going forward.
> >
> > Note that this proposal is only about deprecating the old Scala producers
> > as,
> > in my opinion, it is too early to do the same for the old Scala
> consumers.
> > The
> > new Java consumer was only introduced in 0.9.0.0 and it's still marked as
> > beta. It would be good to have a full release cycle where the new
> consumer
> > is no longer in beta before we deprecate the old consumers. We are hoping
> > to remove the beta label for the consumer for 0.10.0.0, but that's a
> > separate discussion.
> >
> > With regards to removal of the deprecated producers, the current thinking
> > is to remove all Scala clients at the same time, so it will take at least
> > two non bug-fix release cycles (it could take longer depending on users'
> > feedback).
> >
> > The feedback was mostly positive in the discuss thread although some
> points
> > were raised about deprecating the old producers before deprecating the
> old
> > consumers:
> >
> >
> >
> http://search-hadoop.com/m/uyzND1KVJJmcbgAf2=+DISCUSS+Deprecating+the+old+Scala+producers+for+the+next+release
> >
> > The JIRA for tracking this is KAFKA-2982.
> >
> > The vote will run for 72 hours.
> >
> > Thanks,
> > Ismael
> >
>
>
>
> --
> Thanks,
> Neha
>


[jira] [Commented] (KAFKA-3015) Improve JBOD data balancing

2015-12-22 Thread Joe Stein (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069215#comment-15069215
 ] 

Joe Stein commented on KAFKA-3015:
--

Can we do both of these at the same time
(https://cwiki.apache.org/confluence/display/KAFKA/KIP-18+-+JBOD+Support) and
provide a per-topic option for which approach is used? I haven't taken a
look in a while at KAFKA-2188; if that is also a good direction for folks, we
should talk about picking it back up too. It's a little stale, but with some
rebasing, fixes, and reviews it would help folks who need Kafka brokers to
stay up on disk failure without RAID. So there would be at least 3 parts to
it. There may be other items in the "JBOD" realm folks want to work on too.

> Improve JBOD data balancing
> ---
>
> Key: KAFKA-3015
> URL: https://issues.apache.org/jira/browse/KAFKA-3015
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jay Kreps
>
> When running with multiple data directories (i.e. JBOD) we currently place 
> partitions entirely within one data directory. This tends to lead to poor 
> balancing across disks as some topics have more throughput/retention and not 
> all disks get data from all topics. You can't fix this problem with smarter 
> partition placement strategies because ultimately you don't know when a 
> partition is created when or how heavily it will be used (this is a subtle 
> point, and the tendency is to try to think of some more sophisticated way to 
> place partitions based on current data size but this is actually 
> exceptionally dangerous and can lead to much worse imbalance when creating 
> many partitions at once as they would all go to the disk with the least 
> data). We don't support online rebalancing across directories/disks so this 
> imbalance is a big problem and limits the usefulness of this configuration. 
> Implementing online rebalancing of data across disks without downtime is 
> actually quite hard and requires lots of I/O since you have to actually 
> rewrite full partitions of data.
> An alternative would be to place each partition in *all* directories/drives 
> and round-robin *segments* within the partition across the directories. So 
> the layout would be something like:
>   drive-a/mytopic-0/
>   000.data
>   000.index
>   0024680.data
>   0024680.index
>   drive-b/mytopic-0/
>   0012345.data
>   0012345.index
>   0036912.data
>   0036912.index
> This is a little harder to implement than the current approach but not very 
> hard, and it is a lot easier than implementing online data balancing across 
> disks while retaining the current approach. I think this could easily be done 
> in a backwards compatible way.
> I think the balancing you would get from this in most cases would be good 
> enough to make JBOD the default configuration. Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
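The segment-level round-robin layout described in the JIRA above can be sketched in a few lines. This is a minimal illustration under assumed names (the class and method names are not Kafka's actual LogManager code): instead of pinning a whole partition to one data directory, each new segment is placed in the next directory in turn.

```java
import java.util.Arrays;
import java.util.List;

public class SegmentRoundRobin {
    private final List<String> dataDirs;
    private int next = 0;

    SegmentRoundRobin(List<String> dataDirs) {
        this.dataDirs = dataDirs;
    }

    // Pick the directory and file name for the segment starting at baseOffset.
    String dirForSegment(String topicPartition, long baseOffset) {
        String dir = dataDirs.get(next);
        next = (next + 1) % dataDirs.size();  // rotate across directories
        return String.format("%s/%s/%020d.data", dir, topicPartition, baseOffset);
    }

    public static void main(String[] args) {
        SegmentRoundRobin rr =
            new SegmentRoundRobin(Arrays.asList("drive-a", "drive-b"));
        System.out.println(rr.dirForSegment("mytopic-0", 0L));      // drive-a
        System.out.println(rr.dirForSegment("mytopic-0", 12345L));  // drive-b
        System.out.println(rr.dirForSegment("mytopic-0", 24680L));  // drive-a
    }
}
```

Because every drive receives segments from every partition, per-drive load tends toward the average regardless of which topics are hot, which is the balancing property the JIRA is after.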


Re: Unifying kafka-clients call signatures

2015-12-22 Thread Joe Stein
Along with the KIP, what do folks think about (if the vote passes and code
commits) a 0.9.0.1? We could slate the 0.9.0.1 release for the second week of
January maybe?

It might be too soon to flip the entire "unstable" bit at once. A few more
weeks might help flesh that out some. We could also keep doing that until
0.9.1.0 and call it stable then.

It looks like the KIP has to get written up and voted on. The discussion
looks like it happened in the JIRA; we should give folks at least some time
(via a [DISCUSS]) to see that, comment, and chat more if need be in the
JIRA.

~ Joe Stein

On Tue, Dec 22, 2015 at 9:57 PM, Gwen Shapira <g...@confluent.io> wrote:

> (Moving discussion to dev)
>
> Since this is a public API change, don't we technically need a KIP + Vote?
>
> On Mon, Dec 21, 2015 at 11:38 PM, Pierre-Yves Ritschard <p...@spootnik.org>
> wrote:
>
> > Hi list,
> >
> > I've been working on an issue at
> > https://issues.apache.org/jira/browse/KAFKA-3006 and it is now a good
> > time to ask for feedback.
> >
> > The attached PR moves all signatures which accepted either arrays or
> > java.util.List to accept java.util.Collection. The aim is to provide
> > consumers of kafka-clients a unified way to work with sequences.
> >
> > Some concern was raised in the issue wrt potential source
> > compatibility issues when different versions of the kafka-clients JAR
> > end up on a given classpath. Any people who feel they might be impacted
> > is encouraged to mention it here to inform the decision (it would still
> > be possible to keep the other signatures around but it adds a load of
> > bloat and decreases legibility/clarity IMO).
> >
>
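The signature unification proposed in KAFKA-3006 can be illustrated with a small sketch. The `subscribe()` name mirrors the consumer API under discussion, but this class is purely illustrative, not the actual kafka-clients code: a method accepting `Collection<String>` works with a `List`, a `Set`, or a wrapped array alike, where a `List`-only signature would not.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;

public class CollectionSignatures {
    private Collection<String> subscription;

    // Accepting Collection covers List, Set, and Arrays.asList(...) callers.
    void subscribe(Collection<String> topics) {
        this.subscription = topics;
    }

    int subscriptionSize() {
        return subscription == null ? 0 : subscription.size();
    }

    public static void main(String[] args) {
        CollectionSignatures c = new CollectionSignatures();
        c.subscribe(Arrays.asList("a", "b"));            // a wrapped array
        c.subscribe(new HashSet<>(Arrays.asList("a")));  // a Set works too
        System.out.println("subscribed to " + c.subscriptionSize() + " topic(s)");
    }
}
```

The source-compatibility concern raised in the thread is that callers compiled against a `List` overload would need a recompile once the signature widens to `Collection`, which is why keeping the old overloads around was discussed (and rejected as bloat).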


Re: [ANNOUNCE] New Kafka Committer Ewen Cheslack-Postava

2015-12-08 Thread Joe Stein
Ewen,

Congrats!

~ Joestein

On Tue, Dec 8, 2015 at 2:51 PM, Guozhang Wang  wrote:

> Congrats Ewen! Welcome onboard.
>
> Guozhang
>
> On Tue, Dec 8, 2015 at 11:42 AM, Liquan Pei  wrote:
>
> > Congrats, Ewen!
> >
> > On Tue, Dec 8, 2015 at 11:37 AM, Neha Narkhede 
> wrote:
> >
> > > I am pleased to announce that the Apache Kafka PMC has voted to
> > > invite Ewen Cheslack-Postava as a committer and Ewen has accepted.
> > >
> > > Ewen is an active member of the community and has contributed and
> > reviewed
> > > numerous patches to Kafka. His most significant contribution is Kafka
> > > Connect which was released a few days ago as part of 0.9.
> > >
> > > Please join me on welcoming and congratulating Ewen.
> > >
> > > Ewen, we look forward to your continued contributions to the Kafka
> > > community!
> > >
> > > --
> > > Thanks,
> > > Neha
> > >
> >
> >
> >
> > --
> > Liquan Pei
> > Department of Physics
> > University of Massachusetts Amherst
> >
>
>
>
> --
> -- Guozhang
>


Re: [DISCUSS] KIP-30 Allow for brokers to have plug-able consensus and meta data storage sub systems

2015-12-01 Thread Joe Stein
Yeah, let's do both! :) I always had trepidations about leaving things as is
with ZooKeeper there. Can we have this new internal system be what replaces
that, but still make it somewhat modular?

The problem with any new system is that everyone already trusts and relies
on the existing scars we know have healed. That is why we are all still using
ZooKeeper (I bet at least 3 clusters are still on 3.3.4 and one maybe
3.3.1 or something nutty).

etcd
consul
c*
riak
akka

All have viable solutions, and I have no idea which will be best or worst or
even work, but lots of folks are working on it now, trying to get things to
be different and work right for them.

I think a native version should be there in the project, and I am 100% on
board with that native version NOT being ZooKeeper but homegrown.

I also think the native default should use the KIP-30 interface so other
servers can also plug in what they are solving (that way deployments that
have already adopted XYZ for consensus can use it).

~ Joe Stein
- - - - - - - - - - - - - - - - - - -
  http://www.elodina.net
http://www.stealth.ly
- - - - - - - - - - - - - - - - - - -

On Tue, Dec 1, 2015 at 2:58 PM, Jay Kreps <j...@confluent.io> wrote:

> Hey Joe,
>
> Thanks for raising this. People really want to get rid of the ZK
> dependency, I agree it is among the most asked for things. Let me give a
> quick critique and a more radical plan.
>
> I don't think making ZK pluggable is the right thing to do. I have a lot of
> experience with this dynamic of introducing plugins for core functionality
> because I previously worked on a key-value store called Voldemort in which
> we made both the protocol and storage engine totally pluggable. I
> originally felt this was a good thing both philosophically and practically,
> but in retrospect came to believe it was a huge mistake--what people really
> wanted was one really excellent implementation with the kind of insane
> levels of in-production usage and test coverage that infrastructure
> demands. Pluggability is actually really at odds with this, and the ability
> to actually abstract over some really meaty dependency like a storage
> engine never quite works.
>
> People dislike the ZK dependency because it effectively doubles the
> operational load of Kafka--it doubles the amount of configuration,
> monitoring, and understanding needed. Replacing ZK with a similar system
> won't fix this problem though--all the other consensus services are equally
> complex (and often less mature)--and it will cause two new problems. First
> there will be a layer of indirection that will make reasoning and improving
> the ZK implementation harder. For example, note that your plug-in API
> doesn't seem to cover multi-get and multi-write; when we add that we
> would end up breaking all plugins. Each new thing will be like that. Ops
> tools, config, documentation, etc. will no longer be able to include any
> coverage of ZK because we can't assume ZK, so all that becomes much harder.
> The second problem is that this introduces a combinatorial testing problem.
> People say they want to swap out ZK but they are assuming whatever they
> swap in will work equally well. How will we know that is true? The only way
> to know is to explode out the testing to run with every possible plugin.
>
> If you want to see this in action take a look at ActiveMQ. ActiveMQ is less
> a system than a family of co-operating plugins and a configuration language
> for assembling them. Software engineers and open source communities are
> really prone to this kind of thing because "we can just make it pluggable"
> ends any argument. But the actual implementation is a mess, and later
> improvements in their threading, I/O, and other core models simply couldn't
> be made across all the plugins.
>
> This blog post on configurability in UI is a really good summary of a
> similar dynamic:
> http://ometer.com/free-software-ui.html
>
> Anyhow, not to go too far off on a rant. Clearly I have plugin PTSD :-)
>
> I think instead we should explore the idea of getting rid of the zookeeper
> dependency and replace it with an internal facility. Let me explain what I
> mean. In terms of API what Kafka and ZK do is super different, but
> internally it is actually quite similar--they are both trying to maintain a
> CP log.
>
> What would actually make the system significantly simpler would be to
> reimplement the facilities you describe on top of Kafka's existing
> infrastructure--using the same log implementation, network stack, config,
> monitoring, etc. If done correctly this would dramatically lower the
> operational load of the system versus the current Kafka+ZK or proposed
> Kafka+X.
>
> I don't have a proposal for how this would work and it's some 

[DISCUSS] KIP-30 Allow for brokers to have plug-able consensus and meta data storage sub systems

2015-12-01 Thread Joe Stein
I would like to start a discussion around the work that has started in
regards to KIP-30
https://cwiki.apache.org/confluence/display/KAFKA/KIP-30+-+Allow+for+brokers+to+have+plug-able+consensus+and+meta+data+storage+sub+systems

The impetus for working on this came a lot from the community. For the last
year(~+) it has been the most asked question at any talk I have given
(personally speaking). It has also come up a bit on the mailing list,
talking about zkclient vs. Curator. A lot of folks want to use Kafka, but
introducing dependencies is hard for the enterprise, so the goal behind
this is making adopting Kafka as easy as possible for operations teams. If
they are already supporting ZooKeeper they can keep doing that, but if not,
they (users) want to use something else they are already supporting that
can plug in to do the same things.

For the core project I think we should leave in upstream what we have. This
gives a great baseline regression for folks and makes the work of "making
what we have plug-able" a well-defined task (carve out, layer in the API
impl, push back until tests pass). From there, when folks want their
implementation to be something besides ZooKeeper, they can develop, test and
support that if they choose.

We would like to suggest that the plugin interface be Java based to
minimize dependencies for JVM implementations. This could be in another
directory, something TBD /.

If you have a server you want to try to get working but you aren't on
the JVM, don't be afraid: think about a REST impl, and if you can work
inside of that you have some light RPC layers (this was the first-pass
prototype we did to flesh out the public API presented on the KIP).

There are a lot of parts to working on this, and the more implementations we
have, the better we can flesh out the public interface. I will leave the
technical details and design to the JIRA tickets that are linked through the
confluence page as these decisions come about and code starts up for review;
we can target the specific modules, and having the context separate is
helpful, especially if multiple folks are working on it.
https://issues.apache.org/jira/browse/KAFKA-2916

Do other folks want to build implementations? Maybe we should start a
confluence page for those, or use an existing one and add to it, so we can
coordinate some there too.

Thanks!

~ Joe Stein
- - - - - - - - - - - - - - - - - - -
  http://www.elodina.net
http://www.stealth.ly
- - - - - - - - - - - - - - - - - - -
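The announcement above proposes a Java-based plugin interface for the consensus/metadata layer. As a purely hypothetical sketch of what such an interface might look like (these method names are assumptions for illustration, not KIP-30's actual API; a real interface would also need the batched multi-get/multi-write operations raised later in the discussion), paired with a trivial in-memory implementation standing in for a ZooKeeper/etcd/consul-backed one:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plug-in surface: get/put metadata by path, plus change watches.
interface MetadataStore {
    byte[] get(String path);
    void put(String path, byte[] value);
    void watch(String path, Runnable onChange);
}

// In-memory implementation, useful mainly as a baseline for plugin tests.
public class InMemoryMetadataStore implements MetadataStore {
    private final Map<String, byte[]> data = new HashMap<>();
    private final Map<String, List<Runnable>> watchers = new HashMap<>();

    public byte[] get(String path) {
        return data.get(path);
    }

    public void put(String path, byte[] value) {
        data.put(path, value);
        for (Runnable r : watchers.getOrDefault(path, java.util.Collections.emptyList()))
            r.run();  // notify watchers on change
    }

    public void watch(String path, Runnable onChange) {
        watchers.computeIfAbsent(path, k -> new ArrayList<>()).add(onChange);
    }

    public static void main(String[] args) {
        MetadataStore store = new InMemoryMetadataStore();
        boolean[] fired = {false};
        store.watch("/brokers/ids/0", () -> fired[0] = true);
        store.put("/brokers/ids/0", "host:9092".getBytes());
        if (!fired[0]) throw new AssertionError("watcher not fired");
        System.out.println(new String(store.get("/brokers/ids/0")));
    }
}
```

An etcd- or consul-backed implementation would supply the same three operations over its own client library, which is the "plug in what you already support" idea the thread describes.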


Re: [kafka-clients] Re: 0.9.0.0 RC4

2015-11-24 Thread Joe Stein
Thanks to everyone who contributed to this release! It has been a long
time in the works, with some really great new additions for folks waiting
with excitement for the new consumer, security, and connect (copycat), and
everything else baked in.

Thanks again!

~ Joe Stein
- - - - - - - - - - - - - - - - - - -
  http://www.elodina.net
http://www.stealth.ly
- - - - - - - - - - - - - - - - - - -

On Mon, Nov 23, 2015 at 11:49 PM, Jun Rao <j...@confluent.io> wrote:

> Thanks everyone for voting.
>
> The following are the results of the votes.
>
> +1 binding = 4 votes (Neha Narkhede, Sriharsha Chintalapani, Guozhang
> Wang, Jun Rao)
> +1 non-binding = 3 votes
> -1 = 0 votes
> 0 = 0 votes
>
> The vote passes.
>
> I will release artifacts to maven central, update the dist svn and download
> site. Will send out an announce after that.
>
> Jun
>
> On Mon, Nov 23, 2015 at 8:46 PM, Jun Rao <j...@confluent.io> wrote:
>
>> +1
>>
>> Thanks,
>>
>> Jun
>>
>> On Fri, Nov 20, 2015 at 5:21 PM, Jun Rao <j...@confluent.io> wrote:
>>
>>> This is the fourth candidate for release of Apache Kafka 0.9.0.0. This a
>>> major release that includes (1) authentication (through SSL and SASL) and
>>> authorization, (2) a new java consumer, (3) a Kafka connect framework for
>>> data ingestion and egression, and (4) quotas. Since this is a major
>>> release, we will give people a bit more time for trying this out.
>>>
>>> Release Notes for the 0.9.0.0 release
>>>
>>> https://people.apache.org/~junrao/kafka-0.9.0.0-candidate4/RELEASE_NOTES.html
>>>
>>> *** Please download, test and vote by Monday, Nov. 23, 6pm PT
>>>
>>> Kafka's KEYS file containing PGP keys we use to sign the release:
>>> http://kafka.apache.org/KEYS in addition to the md5, sha1
>>> and sha2 (SHA256) checksum.
>>>
>>> * Release artifacts to be voted upon (source and binary):
>>> https://people.apache.org/~junrao/kafka-0.9.0.0-candidate4/
>>>
>>> * Maven artifacts to be voted upon prior to release:
>>> https://repository.apache.org/content/groups/staging/
>>>
>>> * scala-doc
>>> https://people.apache.org/~junrao/kafka-0.9.0.0-candidate4/scaladoc/
>>>
>>> * java-doc
>>> https://people.apache.org/~junrao/kafka-0.9.0.0-candidate4/javadoc/
>>>
>>> * The tag to be voted upon (off the 0.9.0 branch) is the 0.9.0.0 tag
>>>
>>> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=132943b0f83831132cd46ac961cf6f1c00132565
>>>
>>> * Documentation
>>> http://kafka.apache.org/090/documentation.html
>>>
>>> /***
>>>
>>> Thanks,
>>>
>>> Jun
>>>
>>>
>>
> --
> You received this message because you are subscribed to the Google Groups
> "kafka-clients" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to kafka-clients+unsubscr...@googlegroups.com.
> To post to this group, send email to kafka-clie...@googlegroups.com.
> Visit this group at http://groups.google.com/group/kafka-clients.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/kafka-clients/CAFc58G9cig_D_9b4cvcs%3DnLyC-1%2BwF1f%2B%2BnMg%2B89qDHGVcLb_A%40mail.gmail.com
> <https://groups.google.com/d/msgid/kafka-clients/CAFc58G9cig_D_9b4cvcs%3DnLyC-1%2BwF1f%2B%2BnMg%2B89qDHGVcLb_A%40mail.gmail.com?utm_medium=email_source=footer>
> .
>
> For more options, visit https://groups.google.com/d/optout.
>


Re: One more Kafka Meetup hosted by LinkedIn in 2015 (this time in San Francisco) - does anyone want to talk?

2015-11-04 Thread Joe Stein
They should all be in the user groups section of the confluence page
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+papers+and+presentations
for those for which there was video. It might need some curating, but that is
where it has been going so far.

~ Joe Stein

On Tue, Nov 3, 2015 at 4:48 PM, Grant Henke <ghe...@cloudera.com> wrote:

> Is there a place where we can find all previously streamed/recorded
> meetups?
>
> Thank you,
> Grant
>
> On Tue, Nov 3, 2015 at 2:07 PM, Ed Yakabosky <eyakabo...@linkedin.com>
> wrote:
>
> > I'm sorry to hear that Lukas.  I have heard that people are starting to
> do
> > carpools via rydeful.com for some of these meetups.
> >
> > Additionally, we will live stream and record the presentations, so you
> can
> > participate remotely.
> >
> > Ed
> >
> > On Tue, Nov 3, 2015 at 10:43 AM, Lukas Steiblys <lu...@doubledutch.me>
> > wrote:
> >
> > > This is sad news. I was looking forward to finally going to a Kafka or
> > > Samza meetup. Going to Mountain View for a meetup is just unrealistic
> > with
> > > 2h travel time each way.
> > >
> > > Lukas
> > >
> > > -Original Message- From: Ed Yakabosky
> > > Sent: Tuesday, November 3, 2015 10:36 AM
> > > To: us...@kafka.apache.org ; dev@kafka.apache.org ; Clark Haskins
> > > Subject: Re: One more Kafka Meetup hosted by LinkedIn in 2015 (this
> time
> > > in San Francisco) - does anyone want to talk?
> > >
> > > Hi all,
> > >
> > > Two corrections to the invite:
> > >
> > >   1. The invitation is for November 18, 2015.  *NOT 2016.*  I was a
> > little
> > >   hasty...
> > >   2. LinkedIn has finished remodeling our broadcast room, so we are
> going
> > >
> > >   to host the meet up in Mountain View, not San Francisco.
> > >
> > > We've arranged for speakers from HortonWorks to talk about Security and
> > > LinkedIn to talk about Quotas.  We are still looking for one more
> > speaker,
> > > so please let me know if you are interested.
> > >
> > > Thanks!
> > > Ed
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Fri, Oct 30, 2015 at 12:49 PM, Ed Yakabosky <
> eyakabo...@linkedin.com>
> > > wrote:
> > >
> > > Hi all,
> > >>
> > >> LinkedIn is hoping to host one more Apache Kafka meetup this year on
> > >> November 18 in our San Francisco office.  We're working on building
> the
> > >> agenda now.  Does anyone want to talk?  Please send me (and Clark) a
> > >> private email with a short description of what you would be talking
> > about
> > >> if interested.
> > >>
> > >> --
> > >> Thanks,
> > >>
> > >> Ed Yakabosky
> > >> Technical Program Management @ LinkedIn
> > >>
> > >>
> > >
> > > --
> > > Thanks,
> > > Ed Yakabosky
> > >
> >
> >
> >
> > --
> > Thanks,
> > Ed Yakabosky
> >
>
>
>
> --
> Grant Henke
> Software Engineer | Cloudera
> gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>


[jira] [Commented] (KAFKA-2079) Support exhibitor

2015-09-25 Thread Joe Stein (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907723#comment-14907723
 ] 

Joe Stein commented on KAFKA-2079:
--

Not yet, I started a KIP for it 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-30+-+Allow+for+brokers+to+have+plug-able+consensus+and+meta+data+storage+sub+systems
 

I think we need a way to have some pluggable support for different remote
interface/system libraries.

It would be great for the base Kafka code to continue to support how it works
now. The existing code should get refactored and shipped with the release as a
kafka-zkclient-connector.jar (or such) that still supports how it works now for folks.
We could then have a way of launching other libraries (much as we do for
metrics) to implement the remote interfaces.

- async watchers
- leader election
- meta data storage

I don't think we should split the plug-in apart; it should contain at least
those three pieces of base functionality.

Then folks can work on and build out different implementations and/or 
collaborate on implementations. 

I still need to write it up some more in the KIP and start a discussion on the 
mailing list. 

KAFKA-873 could be another implementation, using Curator (or maybe it can get
folded into the Exhibitor work or something). Other implementations folks have brought up are
Akka, Consul and etcd. I think folks can work on, build out and support the
different implementations, and the existing Kafka brokers can still work how
they do now. At some point in a subsequent release we could replace the
project-released jar with something else.

> Support exhibitor
> -
>
> Key: KAFKA-2079
> URL: https://issues.apache.org/jira/browse/KAFKA-2079
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Aaron Dixon
>
> Exhibitor (https://github.com/Netflix/exhibitor) is a discovery/monitoring 
> solution for managing Zookeeper clusters. It supports use cases like 
> discovery, node replacements and auto-scaling of Zk cluster hosts (so you 
> don't have to manage a fixed set of Zk hosts--especially useful in cloud 
> environments.)
> The easiest way for Kafka to support connection to Zk clusters via exhibitor 
> is to use curator as its client. There is already a separate ticket for this: 
> KAFKA-873



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] KIP-28 - Add a processor client for stream data processing

2015-09-21 Thread Joe Stein
+1

~ Joestein
On Sep 21, 2015 6:28 PM, "Guozhang Wang"  wrote:

> Hello all,
>
> I would like to start the voting process on the following KIP: add a
> processor client
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-28+-+Add+a+processor+client
> >
> .
>
> The design summary and the discussion threads can be found on the wiki.
>
> The vote will run for 72 hours.
>
> -- Guozhang
>


Re: Maybe 0.8.3 should really be 0.9.0?

2015-09-10 Thread Joe Stein
Are we going to deem the new consumer in 0.9.0 as beta? Do we want to do a
0.9.0-beta, and then when the consumer is good to go we release 0.9.0.0?

A 0.9.0-beta also allows us to release a lot of new things a bit sooner and
have some good cycles of fixes (because you know they will come)

There is enough new stuff that 0.9-something makes sense, +1 on not 0.8.3


On Thu, Sep 10, 2015 at 11:01 AM, Grant Henke  wrote:

> +1 for 0.9
>
> On Wed, Sep 9, 2015 at 2:20 AM, Stevo Slavić  wrote:
>
> > +1 (non-binding) for 0.9
> >
> > On Wed, Sep 9, 2015 at 6:41 AM, Jun Rao  wrote:
> >
> > > +1 for 0.9.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Tue, Sep 8, 2015 at 3:04 PM, Ismael Juma  wrote:
> > >
> > > > +1 (non-binding) for 0.9.
> > > >
> > > > Ismael
> > > >
> > > > On Tue, Sep 8, 2015 at 10:19 AM, Gwen Shapira 
> > wrote:
> > > >
> > > > > Hi Kafka Fans,
> > > > >
> > > > > What do you think of making the next release (the one with
> security,
> > > new
> > > > > consumer, quotas, etc) a 0.9.0 instead of 0.8.3?
> > > > >
> > > > > It has lots of new features, and new consumer was pretty much
> scoped
> > > for
> > > > > 0.9.0, so it matches our original roadmap. I feel that so many
> > awesome
> > > > > features deserve a better release number.
> > > > >
> > > > > The downside is mainly some confusion (we refer to 0.8.3 in bunch
> of
> > > > > places), and noisy emails from JIRA while we change "fix version"
> > field
> > > > > everywhere.
> > > > >
> > > > > Thoughts?
> > > > >
> > > >
> > >
> >
>
>
>
> --
> Grant Henke
> Software Engineer | Cloudera
> gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>


Re: Maybe 0.8.3 should really be 0.9.0?

2015-09-10 Thread Joe Stein
Jun,

Makes sense, thanks!

~ Joestein
On Sep 10, 2015 1:05 PM, "Jun Rao" <j...@confluent.io> wrote:

> Hi, Joe,
>
> One of the reasons that we have been doing beta releases before is to
> stabilize the public apis. However, in trunk, we have introduced the api
> stability annotation. The new java consumer api is marked as unstable. With
> this, even if we name the first release of the new consumer as 0.9.0.0
> (i.e., w/o beta), the users will understand that the api is subject to
> change. Then, we just need to be prepared for 0.9.0.x releases soon after
> for critical bug fixes since there is a lot of new code in 0.9.0.0.
>
> Thanks,
>
> Jun
>
> On Thu, Sep 10, 2015 at 8:24 AM, Joe Stein <joe.st...@stealth.ly> wrote:
>
> > are we going to deem the new consumer in 0.9.0 as beta? Do we want to-do
> a
> > 0.9.0-beta and this way when the consumer is g2g we 0.9.0.0
> >
> > 0.9.0-beta also allows us to release a lot of new things a bit sooner and
> > have some good cycles of fixes (because you know they will come)
> >
> > There is enough new stuff that 0.9-something makes sense, +1 on not 0.8.3
> >
> >
> > On Thu, Sep 10, 2015 at 11:01 AM, Grant Henke <ghe...@cloudera.com>
> wrote:
> >
> > > +1 for 0.9
> > >
> > > On Wed, Sep 9, 2015 at 2:20 AM, Stevo Slavić <ssla...@gmail.com>
> wrote:
> > >
> > > > +1 (non-binding) for 0.9
> > > >
> > > > On Wed, Sep 9, 2015 at 6:41 AM, Jun Rao <j...@confluent.io> wrote:
> > > >
> > > > > +1 for 0.9.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > > On Tue, Sep 8, 2015 at 3:04 PM, Ismael Juma <ism...@juma.me.uk>
> > wrote:
> > > > >
> > > > > > +1 (non-binding) for 0.9.
> > > > > >
> > > > > > Ismael
> > > > > >
> > > > > > On Tue, Sep 8, 2015 at 10:19 AM, Gwen Shapira <g...@confluent.io
> >
> > > > wrote:
> > > > > >
> > > > > > > Hi Kafka Fans,
> > > > > > >
> > > > > > > What do you think of making the next release (the one with
> > > security,
> > > > > new
> > > > > > > consumer, quotas, etc) a 0.9.0 instead of 0.8.3?
> > > > > > >
> > > > > > > It has lots of new features, and new consumer was pretty much
> > > scoped
> > > > > for
> > > > > > > 0.9.0, so it matches our original roadmap. I feel that so many
> > > > awesome
> > > > > > > features deserve a better release number.
> > > > > > >
> > > > > > > The downside is mainly some confusion (we refer to 0.8.3 in
> bunch
> > > of
> > > > > > > places), and noisy emails from JIRA while we change "fix
> version"
> > > > field
> > > > > > > everywhere.
> > > > > > >
> > > > > > > Thoughts?
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Grant Henke
> > > Software Engineer | Cloudera
> > > gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
> > >
> >
>


Re: question in regard to KIP-30

2015-08-17 Thread Joe Stein
I don't think it makes sense to change the core default implementation with
KIP-30. Too much risk, both in stability and in increasing the time to
get it done and available for folks that want to try Kafka without
ZooKeeper.

It would be interesting to see how that implementation would work along
with the others too; if that is something folks want to support in
their environment, I encourage it.

I will be sending out a discuss thread on KIP-30 hopefully in the next day
or so, and we can go back and forth on the motivations and purposes behind it
and whatever technical details are required too, of course.

~ Joe Stein

On Mon, Aug 17, 2015 at 9:29 AM, Sergiy Yevtushenko <
sergiy.yevtushe...@gmail.com> wrote:

 Hi,

 Are there any plans to work on this improvement?
 As a possible core for default implementation it might worth to consider
 https://github.com/belaban/jgroups-raft .
 It already contains RAFT consensus algorithm implementation. Adding
 something like distributed hash map for shared metadata should not be an
 issue.

 Regards,
 Sergiy.



[jira] [Commented] (KAFKA-2339) broker becomes unavailable if bad data is passed through the protocol

2015-07-20 Thread Joe Stein (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633756#comment-14633756
 ] 

Joe Stein commented on KAFKA-2339:
--

I haven't had a chance to try to reproduce this yet more exactly. I will see 
about doing that in the next day or so.

 broker becomes unavailable if bad data is passed through the protocol
 -

 Key: KAFKA-2339
 URL: https://issues.apache.org/jira/browse/KAFKA-2339
 Project: Kafka
  Issue Type: Bug
Reporter: Joe Stein
Assignee: Timothy Chen
Priority: Critical
 Fix For: 0.8.3


 I ran into a situation where a non-integer value got passed for the partition 
 and the brokers went bonkers.
 reproducible
 {code}
 ah=1..2
 echo don't do this in production|kafkacat -b localhost:9092 -p $ah
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
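The repro above works because nothing checks that the partition argument is an integer before it reaches the wire. As an illustration only (this is neither Kafka's nor kafkacat's actual code), the client-side half of the fix is ordinary defensive parsing of untrusted input:

```java
// Illustrative only: defensively parse an untrusted partition argument,
// rejecting anything that is not an in-range integer instead of letting
// garbage reach the broker.
final class PartitionValidator {
    static int parsePartition(String raw, int numPartitions) {
        final int partition;
        try {
            partition = Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("partition is not an integer: " + raw, e);
        }
        if (partition < 0 || partition >= numPartitions) {
            throw new IllegalArgumentException("partition out of range: " + partition);
        }
        return partition;
    }
}
```

With a check like this, an input such as `1..2` fails fast on the client rather than producing a malformed request. The broker-side fix tracked in this ticket is the complementary half: tolerating malformed requests instead of becoming unavailable.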


[jira] [Created] (KAFKA-2339) broker becomes unavailable if bad data is passed through the protocol

2015-07-15 Thread Joe Stein (JIRA)
Joe Stein created KAFKA-2339:


 Summary: broker becomes unavailable if bad data is passed through 
the protocol
 Key: KAFKA-2339
 URL: https://issues.apache.org/jira/browse/KAFKA-2339
 Project: Kafka
  Issue Type: Bug
Reporter: Joe Stein
Priority: Critical
 Fix For: 0.8.3


I ran into a situation where a non-integer value got passed for the partition and 
the brokers went bonkers.

reproducible

{code}
ah=1..2
echo don't do this in production|kafkacat -b localhost:9092 -p $ah
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] Json libraries for Kafka

2015-07-14 Thread Joe Stein
FasterXML/Jackson, +1 to that. The Scala databinding to case classes is great.

~ Joestein
On Jul 14, 2015 5:42 PM, Ewen Cheslack-Postava e...@confluent.io wrote:

 Currently the clients/server mismatch wouldn't be an issue since there are
 no client-side uses of JSON, right? That said, if Copycat ends up included
 in Kafka we'll need to provide at least one serializer which would be
 written in Java and I suspect some people would like JSON to be a candidate
 for that.

 I'd personally go with Jackson just because of how widely used it is and so
 we have one library for both Scala and Java. The use of JSON in the code
 base isn't terribly complex, so I don't think a specialized API for scala
 provides much benefit.

 -Ewen

 On Mon, Jul 13, 2015 at 2:05 PM, Ismael Juma ism...@juma.me.uk wrote:

  Hi all,
 
  Kafka currently use scala.util.parsing.json.JSON as its json parser and
 it
  has a number of issues:
 
  * It encourages unsafe casts (returns `Option[Any]`)
  * It's slow (it relies on parser combinators under the hood)
  * It's not thread-safe (so external locks are needed to use it in a
  concurrent environment)
  * It's deprecated (it should have never been included in the standard
  library in the first place)
 
  KAFKA-1595[1] has been filed to track this issue.
 
  I initially proposed a change using spray-json's AST with the jawn
  parser[2]. Gwen expressed some reservations about the choice (a previous
  discussion had concluded that Jackson should be used instead) and asked
 me
  to raise the issue in the mailing list[3].
 
  In order to have a fair comparison, I implemented the change using
 Jackson
  as well[4]. I paste part of the commit message:
 
  A thin wrapper over Jackson's Tree Model API is used as the replacement.
  This wrapper
  increases safety while providing a simple, but powerful API through the
  usage of the
  `DecodeJson` type class. Even though this has a maintenance cost, it
 makes
  the API
  much more convenient from Scala. A number of tests were added to verify
 the
  behaviour of this wrapper. The Scala module for Jackson doesn't provide
 any
  help for our current usage, so we don't
  depend on it.
 
  A comparison between the two approaches as I see it:
 
  Similarities:
 
 1. The code for users of the JSON library is similar
 2. No third-party dependencies
 3. Good performance
 
  In favour of using Jackson:
 
 1. Same library for client and broker
 2. Widely used
 
  In favour of using spray-json and jawn:
 
 1. Simple type class based API is included and it has a number of nice
 features:
1. Support for parsing into case classes (we don't use this yet,
 but
we could use it to make the code safer and more readable in some
  cases)[5].
2. Very little reflection used (only for retrieving case classes
field names).
3. Write support (could replace our `Json.encode` method).
 2. Less code to maintain (ie we don't need a wrapper to make it nice
 to
 use from Scala)
 3. No memory overhead from wrapping the Jackson classes (probably not
 a
 big deal)
 
  I am happy to go either way as both approaches have been implemented and
 I
  am torn between the options.
 
  What do you think?
 
  Best,
  Ismael
 
  [1] https://issues.apache.org/jira/browse/KAFKA-1595
  [2]
 
 
 https://github.com/ijuma/kafka/commit/80974afefc00eb6313a7357e7942d5d86ffce84d
  [3]
 
 
 https://issues.apache.org/jira/browse/KAFKA-1595?focusedCommentId=14512881&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14512881
  [4]
 
 
 https://github.com/ijuma/kafka/commit/4ca0feb37e8be2d388b60efacc19bc6788b6
  [5] The Scala module for Jackson (which is not being used in the commit
  above) also supports this, but it uses a reflection-based approach
 instead
  of type classes.
 



 --
 Thanks,
 Ewen
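The `DecodeJson` type-class idea discussed above can be illustrated without any JSON library at all. The sketch below is stdlib-only Java operating on an already-parsed tree of Map/String/Integer nodes; the names (`Decoder`, `field`) are illustrative and are not Kafka's, spray-json's, or Jackson's API. The point is the contrast with `Option[Any]`: each decoder yields a typed `Optional<T>` instead of forcing an unsafe cast:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Type-class-style decoding: each Decoder<T> turns an untyped parsed-JSON
// node into a typed value, returning Optional.empty() on a type mismatch
// instead of throwing a ClassCastException. Illustrative names only.
interface Decoder<T> {
    Optional<T> decode(Object node);

    static Decoder<Integer> intDecoder() {
        return node -> node instanceof Integer
                ? Optional.of((Integer) node)
                : Optional.empty();
    }

    static Decoder<String> stringDecoder() {
        return node -> node instanceof String
                ? Optional.of((String) node)
                : Optional.empty();
    }

    // Decode a named field of a JSON-object node (a Map after parsing).
    static <T> Optional<T> field(Object objNode, String name, Decoder<T> dec) {
        if (!(objNode instanceof Map)) {
            return Optional.empty();
        }
        Object value = ((Map<?, ?>) objNode).get(name);
        return value == null ? Optional.empty() : dec.decode(value);
    }
}
```

Decoding a field through `Decoder.field(tree, "version", Decoder.intDecoder())` either produces a typed value or an empty `Optional` — the caller never performs a blind cast, which is the safety property both the spray-json and the Jackson-wrapper proposals aim for.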



Re: [DISCUSS] Json libraries for Kafka

2015-07-14 Thread Joe Stein
Maybe after the existing scala clients are deprecated.

~ Joestein
On Jul 14, 2015 6:04 PM, Jay Kreps j...@confluent.io wrote:

 Is this going to become a dependency for core and then transitively for the
 old clients? The current json library is definitely not great, but it does
 parse json and it's not used in any context where performance is a concern.

 Because the older clients aren't well modularized, adding core dependencies
 sucks these up into every user of the clients. This particularly becomes a
 problem with common libraries since it will turn out we require version X
 but other code in the same app requires version Y.

 The new clients fix this issue but not everyone is using them yet.

 If there is a pressing need maybe we should just do it and people who have
 problems can just hack their build to exclude the dependency (since the
 client code won't need it). If not it might be better just to leave it for
 a bit until we have at least get a couple releases with both the new
 producer and the new consumer.

 -Jay

 On Mon, Jul 13, 2015 at 2:05 PM, Ismael Juma ism...@juma.me.uk wrote:

  Hi all,
 
  Kafka currently use scala.util.parsing.json.JSON as its json parser and
 it
  has a number of issues:
 
  * It encourages unsafe casts (returns `Option[Any]`)
  * It's slow (it relies on parser combinators under the hood)
  * It's not thread-safe (so external locks are needed to use it in a
  concurrent environment)
  * It's deprecated (it should have never been included in the standard
  library in the first place)
 
  KAFKA-1595[1] has been filed to track this issue.
 
  I initially proposed a change using spray-json's AST with the jawn
  parser[2]. Gwen expressed some reservations about the choice (a previous
  discussion had concluded that Jackson should be used instead) and asked
 me
  to raise the issue in the mailing list[3].
 
  In order to have a fair comparison, I implemented the change using
 Jackson
  as well[4]. I paste part of the commit message:
 
  A thin wrapper over Jackson's Tree Model API is used as the replacement.
  This wrapper
  increases safety while providing a simple, but powerful API through the
  usage of the
  `DecodeJson` type class. Even though this has a maintenance cost, it
 makes
  the API
  much more convenient from Scala. A number of tests were added to verify
 the
  behaviour of this wrapper. The Scala module for Jackson doesn't provide
 any
  help for our current usage, so we don't
  depend on it.
 
  A comparison between the two approaches as I see it:
 
  Similarities:
 
 1. The code for users of the JSON library is similar
 2. No third-party dependencies
 3. Good performance
 
  In favour of using Jackson:
 
 1. Same library for client and broker
 2. Widely used
 
  In favour of using spray-json and jawn:
 
 1. Simple type class based API is included and it has a number of nice
 features:
1. Support for parsing into case classes (we don't use this yet,
 but
we could use it to make the code safer and more readable in some
  cases)[5].
2. Very little reflection used (only for retrieving case classes
field names).
3. Write support (could replace our `Json.encode` method).
 2. Less code to maintain (ie we don't need a wrapper to make it nice
 to
 use from Scala)
 3. No memory overhead from wrapping the Jackson classes (probably not
 a
 big deal)
 
  I am happy to go either way as both approaches have been implemented and
 I
  am torn between the options.
 
  What do you think?
 
  Best,
  Ismael
 
  [1] https://issues.apache.org/jira/browse/KAFKA-1595
  [2]
 
 
 https://github.com/ijuma/kafka/commit/80974afefc00eb6313a7357e7942d5d86ffce84d
  [3]
 
 
 https://issues.apache.org/jira/browse/KAFKA-1595?focusedCommentId=14512881&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14512881
  [4]
 
 
 https://github.com/ijuma/kafka/commit/4ca0feb37e8be2d388b60efacc19bc6788b6
  [5] The Scala module for Jackson (which is not being used in the commit
  above) also supports this, but it uses a reflection-based approach
 instead
  of type classes.
 



Re: [VOTE] KIP-26 Add Copycat connector framework for data import/export

2015-07-14 Thread Joe Stein
+1 (binding)

~ Joe Stein
- - - - - - - - - - - - - - - - - - -
  http://www.elodina.net
http://www.stealth.ly
- - - - - - - - - - - - - - - - - - -

On Tue, Jul 14, 2015 at 5:09 PM, Ewen Cheslack-Postava e...@confluent.io
wrote:

 Hi all,

 Let's start a vote on KIP-26: Add Copycat connector framework for data
 import/export

 For reference, here's the wiki:
 https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=58851767
 And the mailing list thread (split across two months):

 http://mail-archives.apache.org/mod_mbox/kafka-dev/201506.mbox/%3CCAE1jLMOEJjnorFK5CtR3g-n%3Dm_AkrFsYeccsB4QimTRfGBrAGQ%40mail.gmail.com%3E

 http://mail-archives.apache.org/mod_mbox/kafka-dev/201507.mbox/%3CCAHwHRrUeNh%2BnCHwCTUCrcipHM3Po0ECUysO%2B%3DX3nwUeOGrcgdw%40mail.gmail.com%3E

 Just to clarify since this is a bit different from the KIPs voted on so
 far, the KIP just covers including Copycat in Kafka (rather than having it
 as a separate project). While the KIP aimed to be clear about the exact
 scope, the details require further discussion. The aim is to include some
 connectors as well, at a minimum for demonstration purposes, but the
 expectation is that connector development will, by necessity, be federated.

 I'll kick it off with a +1 (non-binding).

 --
 Thanks,
 Ewen



Re: [DISCUSS] JIRA issue required even for minor/hotfix pull requests?

2015-07-13 Thread Joe Stein
Ismael,

If you create a pull request on github today then a JIRA is created so
folks can see and respond and such. The JIRA hooks also provide in comment
updates too.

What issue are you having or looking to-do?

~ Joe Stein

On Mon, Jul 13, 2015 at 6:52 AM, Ismael Juma ism...@juma.me.uk wrote:

 Hi all,

 Guozhang raised this topic in the [DISCUSS] Using GitHub Pull Requests for
 contributions and code review thread and suggested starting a new thread
 for it.

 In the Spark project, they say:

 If the change is new, then it usually needs a new JIRA. However, trivial
 changes, where what should change is virtually the same as how it should
 change do not require a JIRA.
 Example: Fix typos in Foo scaladoc.

 In such cases, the commit message would be prefixed with [MINOR] or
 [HOTFIX] instead of [KAFKA-xxx].

 I can see the pros and cons for each approach.

 Always requiring a JIRA ticket makes it more consistent and makes it
 possible to use JIRA as the place to prioritise what needs attention
 (although this is imperfect as code review will take place in the pull
 request and it's likely that JIRA won't always be fully in sync for
 in-progress items).

 Skipping JIRA tickets for minor/hotfix pull requests (where the JIRA ticket
 just duplicates the information in the pull request) eliminates redundant
 work and reduces the barrier to contribution (it is likely that people will
 occasionally submit PRs without a JIRA even when the change is too big for
 that though).

 Guozhang suggested in the original thread:

 Personally I think it is better to not enforcing a JIRA ticket for minor /
 hotfix commits, for example, we can format the title with [MINOR] [HOTFIX]
 etc as in Spark

 What do others think?

 Best,
 Ismael



Re: [DISCUSS] JIRA issue required even for minor/hotfix pull requests?

2015-07-13 Thread Joe Stein
Sorry, meant to say 'an email to dev list' instead of 'a JIRA' below. The
hooks in JIRA comments I have seen working recently.

~ Joe Stein

On Mon, Jul 13, 2015 at 8:42 AM, Joe Stein joe.st...@stealth.ly wrote:

 Ismael,

 If you create a pull request on github today then a JIRA is created so
 folks can see and respond and such. The JIRA hooks also provide in comment
 updates too.

 What issue are you having or looking to-do?

 ~ Joe Stein

 On Mon, Jul 13, 2015 at 6:52 AM, Ismael Juma ism...@juma.me.uk wrote:

 Hi all,

 Guozhang raised this topic in the [DISCUSS] Using GitHub Pull Requests
 for
 contributions and code review thread and suggested starting a new thread
 for it.

 In the Spark project, they say:

 If the change is new, then it usually needs a new JIRA. However, trivial
 changes, where what should change is virtually the same as how it
 should
 change do not require a JIRA.
 Example: Fix typos in Foo scaladoc.

 In such cases, the commit message would be prefixed with [MINOR] or
 [HOTFIX] instead of [KAFKA-xxx].

 I can see the pros and cons for each approach.

 Always requiring a JIRA ticket makes it more consistent and makes it
 possible to use JIRA as the place to prioritise what needs attention
 (although this is imperfect as code review will take place in the pull
 request and it's likely that JIRA won't always be fully in sync for
 in-progress items).

 Skipping JIRA tickets for minor/hotfix pull requests (where the JIRA
 ticket
 just duplicates the information in the pull request) eliminates redundant
 work and reduces the barrier to contribution (it is likely that people
 will
 occasionally submit PRs without a JIRA even when the change is too big for
 that though).

 Guozhang suggested in the original thread:

 Personally I think it is better to not enforcing a JIRA ticket for minor
 /
 hotfix commits, for example, we can format the title with [MINOR] [HOTFIX]
 etc as in Spark

 What do others think?

 Best,
 Ismael





Re: [DISCUSS] JIRA issue required even for minor/hotfix pull requests?

2015-07-13 Thread Joe Stein
Ismael,

If the patch lives on a pull request and is a simple hotfix, a committer
could +1 and commit it. I don't see anything in the
https://cwiki.apache.org/confluence/display/KAFKA/Bylaws preventing this
already. I guess I am still struggling to see what is not set up that
you think we need to get set up, or what changes you are looking to make
differently. What are we trying to discuss and decide in regards to this?

~ Joe Stein

On Mon, Jul 13, 2015 at 8:51 AM, Ismael Juma ism...@juma.me.uk wrote:

 Hi Joe,

 Yes, I am aware of the emails and automatic JIRA updates.

 The question is whether a contributor who wants to make a simple change (eg
 fix a typo, improve a scaladoc, make a small code improvement) should have
 to create a JIRA for it and then submit the PR or if they can just skip the
 JIRA step. I will update the following wiki page accordingly once we decide
 one way or another:

 https://cwiki.apache.org/confluence/display/KAFKA/Contributing+Code+Changes

 Best,
 Ismael

 On Mon, Jul 13, 2015 at 1:46 PM, Joe Stein joe.st...@stealth.ly wrote:

  Sorry, meant to say 'an email to dev list' instead of 'a JIRA' below. The
  hooks in JIRA comments I have seen working recently.
 
  ~ Joe Stein
 
  On Mon, Jul 13, 2015 at 8:42 AM, Joe Stein joe.st...@stealth.ly wrote:
 
   Ismael,
  
   If you create a pull request on github today then a JIRA is created so
   folks can see and respond and such. The JIRA hooks also provide in
  comment
   updates too.
  
   What issue are you having or looking to-do?
  
   ~ Joe Stein
  
   On Mon, Jul 13, 2015 at 6:52 AM, Ismael Juma ism...@juma.me.uk
 wrote:
  
   Hi all,
  
   Guozhang raised this topic in the [DISCUSS] Using GitHub Pull
 Requests
   for
   contributions and code review thread and suggested starting a new
  thread
   for it.
  
   In the Spark project, they say:
  
   If the change is new, then it usually needs a new JIRA. However,
  trivial
   changes, where what should change is virtually the same as how it
   should
   change do not require a JIRA.
   Example: Fix typos in Foo scaladoc.
  
   In such cases, the commit message would be prefixed with [MINOR] or
   [HOTFIX] instead of [KAFKA-xxx].
  
   I can see the pros and cons for each approach.
  
   Always requiring a JIRA ticket makes it more consistent and makes it
   possible to use JIRA as the place to prioritise what needs attention
   (although this is imperfect as code review will take place in the pull
   request and it's likely that JIRA won't always be fully in sync for
   in-progress items).
  
   Skipping JIRA tickets for minor/hotfix pull requests (where the JIRA
   ticket
   just duplicates the information in the pull request) eliminates
  redundant
   work and reduces the barrier to contribution (it is likely that people
   will
   occasionally submit PRs without a JIRA even when the change is too big
  for
   that though).
  
   Guozhang suggested in the original thread:
  
   Personally I think it is better to not enforcing a JIRA ticket for
  minor
   /
   hotfix commits, for example, we can format the title with [MINOR]
  [HOTFIX]
   etc as in Spark
  
   What do others think?
  
   Best,
   Ismael
  
  
  
 



Re: [Discussion] Limitations on topic names

2015-07-12 Thread Joe Stein
Can we provide a tool so folks can migrate old topic names to the new
format, so their clusters aren't lopsided?

~ Joestein
On Jul 11, 2015 1:33 PM, Todd Palino tpal...@gmail.com wrote:

 I tend to agree with this as a compromise at this point. The reality is
 that this is technical debt that has built up in the project, and it does
 not go away by documenting it, and it will only get worse.

 As pointed out, eliminating either character at this point is going to
 cause problems for someone. And unfortunately, Guozhang, converting to __
 doesn't really solve the problem either because that is still a valid topic
 name that could collide. It's less likely, but all it does is move the debt
 around a little.

 -Todd

  On Jul 11, 2015, at 10:16 AM, Brock Noland br...@apache.org wrote:
 
  On Sat, Jul 11, 2015 at 12:54 AM, Ewen Cheslack-Postava
  e...@confluent.io wrote:
  On Fri, Jul 10, 2015 at 4:41 PM, Gwen Shapira gshap...@cloudera.com
 wrote:
 
  Yeah, I have an actual customer who ran into this. Unfortunately,
  inconsistencies in the way things are named are pretty common - just
  look at Kafka's many CLI options.
 
  I don't think that supporting both and pointing at the docs with "I
  told you so" when our metrics break is a good solution.
 
  I agree, especially since we don't *already* have something in the docs
  indicating this will be an issue. I was flippant about the situation
  because I *wish* there was more careful consideration + naming policy in
  place, but I realize that doesn't always happen in practice. I guess I
 need
  to take Compatibility Czar more seriously :)
 
  I see think the obvious practical options are as follows:
 
  1. Kill support for _. Piss off the entire set of people who currently
  use _ anywhere in topic names.
  2. Kill support for .. Piss off the entire set of people who currently
  use . anywhere in topic names.
  3. Tell people they need to be careful about this issue. Piss off the
 set
  of people who use both _ and . *and* happen to have conflicting
 topic
  names. They will have some pain when they discover the issue and have to
  figure out how to move one of those topics over to a non-conflicting
 name.
  I'm going to claim that this group must be an *extremely* small
 fraction of
  users, which doesn't make it better to allow things to break for them,
 but
  at least gives us an idea of the scale of impact.
 
  (One other alternative suggested earlier was encoding metric names to
  account for differences; given the metric renaming mess in the last
  release, I'm extremely hesitant to suggest anything of the sort...)
 
  None of the options are ideal, but to me, 3 seems like the least
 painful.
  Both for us, and for the vast majority of users. It seems to me that the
  number of users that would complain about (1) or (2) drastically
 outweigh
  (3).
 
  At this point, I don't think it's practical to keep switching the rules
  about which characters are allowed and which aren't because the previous
  attempts haven't been successful -- it seems the rules have changed
  multiple times, whether intentionally or accidentally, such that any
 more
  changes will cause problems. At this point, I think we just need to
 accept
  being liberal in accepting the range of topic names that have been
  permitted so far and make the best of the situation, even if it means
 only
  being able to warn people of conflicts.
 
  Here's another alternative: how about being liberal with topic name
  characters, but upon topic creation we convert the name to the metric
 name
  and fail if there's a conflict with another topic? This is relatively
  expensive (requires getting the metric name of all other topics), but it
  avoids the bad situation we're encountering here (conflicting metrics),
  avoids getting into a persistent conflict (we kill topic creation when
 we
  detect the issue rather than noticing it when the metrics conflict
  happens), and keeps the vast majority of existing users happy (both _
 and .
  work in topic names as long as you don't create topics with conflicting
  metric names).
 
  There are definitely details to be worked out (auto topic creation?),
 but
  it seems like a more realistic solution than to start disallowing _ or
 . in
  topic names.
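  To make the proposed creation-time check concrete, here is a minimal
  sketch. The class and method names are hypothetical, and it assumes the
  only mangling is that metric names replace '.' with '_' (so "a.b" and
  "a_b" end up with the same metric name); this is an illustration, not
  actual Kafka code:

```java
import java.util.Set;

public class TopicNameCollisionCheck {

    // Mangle a topic name the way its metric name would be formed
    // (assumption: '.' becomes '_')
    static String metricName(String topic) {
        return topic.replace('.', '_');
    }

    // Fail topic creation if the new topic's metric name collides with
    // an existing topic's metric name
    static void checkCreate(String newTopic, Set<String> existingTopics) {
        String mangled = metricName(newTopic);
        for (String t : existingTopics) {
            if (!t.equals(newTopic) && metricName(t).equals(mangled)) {
                throw new IllegalArgumentException("Topic '" + newTopic
                        + "' would collide with existing topic '" + t
                        + "' once '.' is mangled to '_' in metric names");
            }
        }
    }
}
```

  With this, creating "a.b" while "a_b" exists fails loudly at creation
  time, while topics that only use one separator are unaffected.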
 
  I was thinking the same. Allow a.b or a_b, but not both a.b and a_b. This
  seems like it will impact a trivial number of users and keep both the
  . and _ camps happy.
 
 
  -Ewen
 
 
 
  On Fri, Jul 10, 2015 at 4:33 PM, Ewen Cheslack-Postava
  e...@confluent.io wrote:
  I figure you'll probably see complaints no matter what change you
 make.
  Gwen, given that you raised this, another important question might be
 how
  many people you see using *both*. I'm guessing this question came up
  because you actually saw a conflict? But I'd imagine (or at least
 hope)
  that most organizations are mostly consistent about naming topics --
 they
  standardize on one or the other.
 
  Since there's no right way to name them, 

[jira] [Commented] (KAFKA-2310) Add config to prevent broker becoming controller

2015-07-08 Thread Joe Stein (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14618034#comment-14618034
 ] 

Joe Stein commented on KAFKA-2310:
--

Hey [~jjkoshy] I had suggested to Andrii that it might make sense to make this 
a new ticket. The old ticket (which I think we should close) had the idea that 
we wanted to re-elect the controller. That would be problematic for what this 
is trying to solve, based on what we have seen in the field. E.g. if you have 12 
brokers and they are under heavy load, then providing a way to bounce the 
controller around isn't going to help if, whenever it lands on a broker, that 
broker can't perform its responsibilities sufficiently. The consensus I have 
been able to get from ops folks is that separating/isolating the controller 
onto two brokers (two for redundancy) on lower end equipment solves the 
problem fully. Since this is just another config I didn't think it needed a 
KIP, but honestly I wasn't 100% sure; otherwise I would have already committed 
this feature. The purpose of providing the patch for different versions is 
that I know a bunch of folks who are going to take it for the version of 
Kafka they are using and start using the feature.

 Add config to prevent broker becoming controller
 

 Key: KAFKA-2310
 URL: https://issues.apache.org/jira/browse/KAFKA-2310
 Project: Kafka
  Issue Type: Bug
Reporter: Andrii Biletskyi
Assignee: Andrii Biletskyi
 Attachments: KAFKA-2310.patch, KAFKA-2310_0.8.1.patch, 
 KAFKA-2310_0.8.2.patch


 The goal is to be able to specify which cluster brokers can serve as a 
 controller and which cannot. This way it will be possible to reserve a 
 particular broker, one not overloaded with partitions and other operations, as 
 the controller.
 Proposed: add a config _controller.eligibility_ defaulting to true (for 
 backward compatibility, since currently any broker can become a controller)
 Patch will be available for trunk, 0.8.2 and 0.8.1



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Dropping support for Scala 2.9.x

2015-07-08 Thread Joe Stein
We should consider deprecating the Scala API so the Scala version doesn't even
matter anymore for folks... we could even pin the broker to a specific
Scala version too.

Of course this makes sense for the Java producer, but maybe not just yet for
the consumer; maybe 0.9.0.

Not having to build 0.8.3 for 2.9 makes sense, yeah... folks can still
use their 0.8.2.1 - 2.9 clients with 0.8.3, so there shouldn't be much fuss.

+1

~ Joe Stein
- - - - - - - - - - - - - - - - - - -
 [image: Logo-Black.jpg]
  http://www.elodina.net
http://www.stealth.ly
- - - - - - - - - - - - - - - - - - -

On Wed, Jul 8, 2015 at 11:07 AM, Ashish Singh asi...@cloudera.com wrote:

 +1

 On Wed, Jul 8, 2015 at 9:52 AM, Guozhang Wang wangg...@gmail.com wrote:

  +1.
 
  Scala 2.9 is 4 years old and I think it is time to drop it.
 
  On Wed, Jul 8, 2015 at 7:22 AM, Grant Henke ghe...@cloudera.com wrote:
 
   +1 for dropping 2.9
  
   On Wed, Jul 8, 2015 at 9:15 AM, Sriharsha Chintalapani 
 ka...@harsha.io
   wrote:
  
I am +1 on dropping 2.9.x support.
   
Thanks,
Harsha
   
   
On July 8, 2015 at 7:08:12 AM, Ismael Juma (mli...@juma.me.uk)
 wrote:
   
Hi,
   
The responses in this thread were positive, but there weren't many. A
  few
months passed and Sriharsha encouraged me to reopen the thread given
  that
the 2.9 build has been broken for at least a week[1] and no-one
 seemed
  to
notice.
   
Do we want to invest more time so that the 2.9 build continues to
 work
  or
do we want to focus our efforts on 2.10 and 2.11? Please share your
opinion.
   
Best,
Ismael
   
[1] https://issues.apache.org/jira/browse/KAFKA-2325
   
On Fri, Mar 27, 2015 at 2:20 PM, Ismael Juma mli...@juma.me.uk
  wrote:
   
 Hi all,

 The Kafka build currently includes support for Scala 2.9, which
 means
that
 it cannot take advantage of features introduced in Scala 2.10 or
  depend
on
 libraries that require it.

 This restricts the solutions available while trying to solve
 existing
 issues. I was browsing JIRA looking for areas to contribute and I
   quickly
 ran into two issues where this is the case:

 * KAFKA-1351 (String.format is very expensive in Scala) could be solved
 nicely by using the String interpolation feature introduced in Scala
 2.10.

 * KAFKA-1595 (Remove deprecated and slower scala JSON parser from
 kafka.consumer.TopicCount) could be solved by using an existing JSON
 library, but both jackson-scala and play-json require 2.10
 (argonaut
 supports Scala 2.9, but it brings other dependencies like scalaz).
 We
   can
 workaround this by writing our own code instead of using libraries,
  of
 course, but it's not ideal.

 Other features like Scala Futures and value classes would also be
   useful
 in some situations, I would think (for a more extensive list of new
 features, see

   
  
 
 http://scala-language.1934581.n4.nabble.com/Scala-2-10-0-now-available-td4634126.html
 ).

 Another pain point of supporting 2.9.x is that it doubles the
 number
  of
 build and test configurations required from 2 to 4 (because the
 2.9.x
 series was not necessarily binary compatible).

 A strong argument for maintaining support for 2.9.x was the client
 library, but that has been rewritten in Java.

 It's also worth mentioning that Scala 2.9.1 was released in August
  2011
 (more than 3.5 years ago) and the 2.9.x series hasn't received
  updates
   of
 any sort since early 2013. Scala 2.10.0, in turn, was released in
   January
 2013 (over 2 years ago) and 2.10.5, the last planned release in the
2.10.x
 series, has been recently released (so even 2.10.x won't be
 receiving
 updates any longer).

 All in all, I think it would not be unreasonable to drop support
 for
Scala
 2.9.x in a future release, but I may be missing something. What do
   others
 think?

 Ismael

   
  
  
  
   --
   Grant Henke
   Solutions Consultant | Cloudera
   ghe...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
  
 
 
 
  --
  -- Guozhang
 



 --

 Regards,
 Ashish



[jira] [Commented] (KAFKA-2304) Support enabling JMX in Kafka Vagrantfile

2015-07-07 Thread Joe Stein (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616979#comment-14616979
 ] 

Joe Stein commented on KAFKA-2304:
--

Thanks [~ewencp] for the review, will take a look later tonight and commit if 
good to go

 Support enabling JMX in Kafka Vagrantfile
 -

 Key: KAFKA-2304
 URL: https://issues.apache.org/jira/browse/KAFKA-2304
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.3
Reporter: Stevo Slavic
Assignee: Joe Stein
Priority: Minor
 Fix For: 0.8.3

 Attachments: KAFKA-2304-JMX.patch, KAFKA-2304-JMX.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2304) Support enabling JMX in Kafka Vagrantfile

2015-07-07 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-2304:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

pushed to trunk, thanks for the patch and review

 Support enabling JMX in Kafka Vagrantfile
 -

 Key: KAFKA-2304
 URL: https://issues.apache.org/jira/browse/KAFKA-2304
 Project: Kafka
  Issue Type: Bug
Reporter: Stevo Slavic
Assignee: Stevo Slavic
Priority: Minor
 Fix For: 0.8.3

 Attachments: KAFKA-2304-JMX.patch, KAFKA-2304-JMX.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-2304) Support enabling JMX in Kafka Vagrantfile

2015-07-07 Thread Joe Stein (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616979#comment-14616979
 ] 

Joe Stein edited comment on KAFKA-2304 at 7/7/15 5:01 PM:
--

Thanks [~ewencp] for the review, will take a look now and commit if good to go.


was (Author: joestein):
Thanks [~ewencp] for the review, will take a look later tonight and commit if 
good to go

 Support enabling JMX in Kafka Vagrantfile
 -

 Key: KAFKA-2304
 URL: https://issues.apache.org/jira/browse/KAFKA-2304
 Project: Kafka
  Issue Type: Bug
Reporter: Stevo Slavic
Assignee: Stevo Slavic
Priority: Minor
 Fix For: 0.8.3

 Attachments: KAFKA-2304-JMX.patch, KAFKA-2304-JMX.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2304) Support enabling JMX in Kafka Vagrantfile

2015-07-07 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-2304:
-
Affects Version/s: (was: 0.8.3)
 Reviewer: Ewen Cheslack-Postava

 Support enabling JMX in Kafka Vagrantfile
 -

 Key: KAFKA-2304
 URL: https://issues.apache.org/jira/browse/KAFKA-2304
 Project: Kafka
  Issue Type: Bug
Reporter: Stevo Slavic
Assignee: Stevo Slavic
Priority: Minor
 Fix For: 0.8.3

 Attachments: KAFKA-2304-JMX.patch, KAFKA-2304-JMX.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2304) Support enabling JMX in Kafka Vagrantfile

2015-07-07 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-2304:
-
Fix Version/s: 0.8.3

 Support enabling JMX in Kafka Vagrantfile
 -

 Key: KAFKA-2304
 URL: https://issues.apache.org/jira/browse/KAFKA-2304
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.3
Reporter: Stevo Slavic
Assignee: Stevo Slavic
Priority: Minor
 Fix For: 0.8.3

 Attachments: KAFKA-2304-JMX.patch, KAFKA-2304-JMX.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2304) Support enabling JMX in Kafka Vagrantfile

2015-07-07 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-2304:
-
Assignee: Stevo Slavic  (was: Joe Stein)

 Support enabling JMX in Kafka Vagrantfile
 -

 Key: KAFKA-2304
 URL: https://issues.apache.org/jira/browse/KAFKA-2304
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.3
Reporter: Stevo Slavic
Assignee: Stevo Slavic
Priority: Minor
 Fix For: 0.8.3

 Attachments: KAFKA-2304-JMX.patch, KAFKA-2304-JMX.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[ANNOUNCE] New Committer

2015-07-06 Thread Joe Stein
I am pleased to announce that the Apache Kafka PMC has voted to invite Gwen
Shapira as a committer and Gwen has accepted.

Please join me on welcoming and congratulating Gwen.

Thanks for the contributions both in the project (code, email, etc, etc,
etc) and throughout the community too (other projects, conferences, etc,
etc, etc). I look forward to your continued contributions and much more to
come!

~ Joe Stein
- - - - - - - - - - - - - - - - - - -
 [image: Logo-Black.jpg]
  http://www.elodina.net
http://www.stealth.ly
- - - - - - - - - - - - - - - - - - -


[jira] [Commented] (KAFKA-1173) Using Vagrant to get up and running with Apache Kafka

2015-06-28 Thread Joe Stein (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14605003#comment-14605003
 ] 

Joe Stein commented on KAFKA-1173:
--

Can you create a new ticket please, thanks.

 Using Vagrant to get up and running with Apache Kafka
 -

 Key: KAFKA-1173
 URL: https://issues.apache.org/jira/browse/KAFKA-1173
 Project: Kafka
  Issue Type: Improvement
Reporter: Joe Stein
Assignee: Ewen Cheslack-Postava
 Fix For: 0.8.3

 Attachments: KAFKA-1173-JMX.patch, KAFKA-1173.patch, 
 KAFKA-1173_2013-12-07_12:07:55.patch, KAFKA-1173_2014-11-11_13:50:55.patch, 
 KAFKA-1173_2014-11-12_11:32:09.patch, KAFKA-1173_2014-11-18_16:01:33.patch


 Vagrant has been getting a lot of pickup in the tech communities. I have 
 found it very useful for development and testing, and I am now working with a few 
 clients who are using it to help virtualize their environments in repeatable ways.
 Using Vagrant to get up and running.
 For 0.8.0 I have a patch on github https://github.com/stealthly/kafka
 1) Install Vagrant [http://www.vagrantup.com/](http://www.vagrantup.com/)
 2) Install Virtual Box 
 [https://www.virtualbox.org/](https://www.virtualbox.org/)
 In the main kafka folder
 1) ./sbt update
 2) ./sbt package
 3) ./sbt assembly-package-dependency
 4) vagrant up
 once this is done 
 * Zookeeper will be running 192.168.50.5
 * Broker 1 on 192.168.50.10
 * Broker 2 on 192.168.50.20
 * Broker 3 on 192.168.50.30
 When you are all up and running you will be back at a command prompt.
 If you want you can log in to the machines using vagrant ssh machineName, but 
 you don't need to.
 You can access the brokers and zookeeper by their IP
 e.g.
 bin/kafka-console-producer.sh --broker-list 
 192.168.50.10:9092,192.168.50.20:9092,192.168.50.30:9092 --topic sandbox
 bin/kafka-console-consumer.sh --zookeeper 192.168.50.5:2181 --topic sandbox 
 --from-beginning



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2254) The shell script should be optimized, even kafka-run-class.sh has a syntax error.

2015-06-16 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-2254:
-
Fix Version/s: (was: 0.8.2.1)
   0.8.3

 The shell script should be optimized, even kafka-run-class.sh has a syntax 
 error.
 --

 Key: KAFKA-2254
 URL: https://issues.apache.org/jira/browse/KAFKA-2254
 Project: Kafka
  Issue Type: Bug
  Components: build
Affects Versions: 0.8.2.1
 Environment: linux
Reporter: Bo Wang
  Labels: client-script, kafka-run-class.sh, shell-script
 Fix For: 0.8.3

 Attachments: kafka-shell-script.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

  kafka-run-class.sh line 128 has a syntax error (missing a space):
 127-loggc)
 128 if [ -z $KAFKA_GC_LOG_OPTS] ; then
 129    GC_LOG_ENABLED=true
 130 fi
 And using ShellCheck to check the shell scripts, the results show some 
 errors, warnings and notes:
 https://github.com/koalaman/shellcheck/wiki/SC2068
 https://github.com/koalaman/shellcheck/wiki/Sc2046
 https://github.com/koalaman/shellcheck/wiki/Sc2086
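 For reference, a corrected form of that branch might look like the 
 following. This is a sketch rather than the exact patch; quoting the 
 expansion also addresses the SC2086-style warnings linked above:

```shell
# Corrected branch from kafka-run-class.sh: note the space before ']'
# and the quotes around the expansion, so an unset or empty
# KAFKA_GC_LOG_OPTS is handled safely (cf. ShellCheck SC2086)
if [ -z "$KAFKA_GC_LOG_OPTS" ]; then
  GC_LOG_ENABLED="true"
fi
echo "$GC_LOG_ENABLED"
```

 Without the quotes, an unset variable expands to nothing and `[ -z ]` is 
 evaluated with a missing operand; with them, the test behaves as intended.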



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-1000) Inbuilt consumer offset management feature for Kafka

2015-06-16 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein resolved KAFKA-1000.
--
Resolution: Fixed

 Inbuilt consumer offset management feature for Kafka
 

 Key: KAFKA-1000
 URL: https://issues.apache.org/jira/browse/KAFKA-1000
 Project: Kafka
  Issue Type: New Feature
  Components: consumer
Affects Versions: 0.8.1
Reporter: Tejas Patil
Assignee: Tejas Patil
Priority: Minor
  Labels: features
 Fix For: 0.8.2.0


 Kafka currently stores offsets in ZooKeeper. This is a problem for several 
 reasons. First, it means the consumer must embed the ZooKeeper client, which is 
 not available in all languages. Secondly, offset commits are actually quite 
 frequent and ZooKeeper does not scale to this kind of high-write load. 
 This Jira is for tracking the phase #2 of Offset Management [0]. Joel and I 
 have been working on this. [1] is the overall design of the feature.
 [0] : https://cwiki.apache.org/confluence/display/KAFKA/Offset+Management
 [1] : 
 https://cwiki.apache.org/confluence/display/KAFKA/Inbuilt+Consumer+Offset+Management



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2207) The testCannotSendToInternalTopic test method in ProducerFailureHandlingTest fails consistently with the following exception:

2015-06-16 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-2207:
-
Fix Version/s: (was: 0.8.2.1)
   0.8.3

 The testCannotSendToInternalTopic test method in ProducerFailureHandlingTest 
 fails consistently with the following exception:
 -

 Key: KAFKA-2207
 URL: https://issues.apache.org/jira/browse/KAFKA-2207
 Project: Kafka
  Issue Type: Bug
Reporter: Deepthi
 Fix For: 0.8.3

 Attachments: KAFKA-2207.patch


 kafka.api.ProducerFailureHandlingTest > testCannotSendToInternalTopic FAILED
 java.util.concurrent.ExecutionException: 
 org.apache.kafka.common.errors.TimeoutException: Failed to update metadata 
 after 3000 ms.
 at 
 org.apache.kafka.clients.producer.KafkaProducer$FutureFailure.<init>(KafkaProducer.java:437)
 at 
 org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:352)
 at 
 org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:248)
 at 
 kafka.api.ProducerFailureHandlingTest.testCannotSendToInternalTopic(ProducerFailureHandlingTest.scala:309)
 Caused by:
 org.apache.kafka.common.errors.TimeoutException: Failed to update 
 metadata after 3000 ms.
 The following attached patch has resolved the issue 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1326) New consumer checklist

2015-06-16 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1326:
-
Fix Version/s: 0.8.3

 New consumer checklist
 --

 Key: KAFKA-1326
 URL: https://issues.apache.org/jira/browse/KAFKA-1326
 Project: Kafka
  Issue Type: New Feature
  Components: consumer
Affects Versions: 0.9.0
Reporter: Neha Narkhede
Assignee: Neha Narkhede
  Labels: feature
 Fix For: 0.8.3


 We will use this JIRA to track the list of issues to resolve to get a working 
 new consumer client. The consumer client can work in phases -
 1. Add new consumer APIs and configs
 2. Refactor Sender. We will need to use some common APIs from Sender.java 
 (https://issues.apache.org/jira/browse/KAFKA-1316)
 3. Add metadata fetch and refresh functionality to the consumer (This will 
 require https://issues.apache.org/jira/browse/KAFKA-1316)
 4. Add functionality to support subscribe(TopicPartition...partitions). This 
 will add SimpleConsumer functionality to the new consumer. This does not 
 include any group management related work.
 5. Add ability to commit offsets to Kafka. This will include adding 
 functionality to the commit()/commitAsync()/committed() APIs. This still does 
 not include any group management related work.
 6. Add functionality to the offsetsBeforeTime() API.
 7. Add consumer co-ordinator election to the server. This will only add a new 
 module for the consumer co-ordinator, but not necessarily all the logic to do 
 group management. 
 At this point, we will have a fully functional standalone consumer and a 
 server side co-ordinator module. This will be a good time to start adding 
 group management functionality to the server and consumer.
 8. Add failure detection capability to the consumer when group management is 
 used. This will not include any rebalancing logic, just the ability to detect 
 failures using session.timeout.ms.
 9. Add rebalancing logic to the server and consumer. This will be a tricky 
 and potentially large change since it will involve implementing the group 
 management protocol.
 10. Add system tests for the new consumer
 11. Add metrics 
 12. Convert mirror maker to use the new consumer.
 13. Convert perf test to use the new consumer
 14. Performance testing and analysis.
 15. Review and fine tune log4j logging



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-26 - Add Copycat, a connector framework for data import/export

2015-06-16 Thread Joe Stein
Hey Ewen, very interesting!

I like the idea of the connector and making one side always being Kafka for
all the reasons you mentioned. It makes having to build consumers (over and
over and over (and over)) again for these type of tasks much more
consistent for everyone.

Some initial comments (will read a few more times and think more through
it).

1) Copycat: it might be weird/hard to talk about producers, consumers,
brokers and copycat for what and how Kafka runs. I think the other naming
makes sense, but maybe we can call it something else? Sinks or whatever
(don't really care, just bringing up that it might be something to consider). We
could also just call it connectors... dunno: producers, consumers,
brokers and connectors...

2) Can we do copycat-workers without having to rely on ZooKeeper? So much
work has been done to remove this dependency; if we can do something without
ZK let's try (or at least abstract it so it is easier later to make it
pluggable).

3) Even though connectors being managed in the project has already been
rejected... maybe we want to have a few (or one) that are in the project
and maintained. This makes out of the box really out of the box (even if only
file or HDFS or something).

4) "all records include schemas which describe the format of their data" - I
don't totally get this... a lot of data doesn't have the schema with it; we
have to plug that in... so would the plugin you are talking about for the
serializer inject the schema to use with the record when it sees the
data?


~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Tue, Jun 16, 2015 at 4:33 PM, Ewen Cheslack-Postava e...@confluent.io
wrote:

 Oops, linked the wrong thing. Here's the correct one:
 https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=58851767

 -Ewen

 On Tue, Jun 16, 2015 at 4:32 PM, Ewen Cheslack-Postava e...@confluent.io
 wrote:

  Hi all,
 
  I just posted KIP-26 - Add Copycat, a connector framework for data
  import/export here:
 
 https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
 
  This is a large KIP compared to what we've had so far, and is a bit
  different from most. We're proposing the addition of a fairly big new
  component to Kafka because we think including it as part of Kafka rather
  than as an external project is in the best interest of both Copycat and
  Kafka itself.
 
  The goal with this KIP is to decide whether such a tool would make sense
  in Kafka, give a high level sense of what it would entail, and scope what
  would be included vs what would be left to third-parties. I'm hoping to
  leave discussion of specific design and implementation details, as well as
  logistics like how best to include it in the Kafka repository & project,
 to
  the subsequent JIRAs or follow up KIPs.
 
  Looking forward to your feedback!
 
  -Ewen
 
  P.S. Preemptive relevant XKCD: https://xkcd.com/927/
 
 


 --
 Thanks,
 Ewen



Re: [VOTE] KIP-25 System test improvements

2015-06-10 Thread Joe Stein
+1

~ Joestein
On Jun 10, 2015 10:21 PM, Neha Narkhede n...@confluent.io wrote:

 +1. Thanks Geoff!





 On Wed, Jun 10, 2015 at 6:20 PM -0700, Gwen Shapira 
 gshap...@cloudera.com wrote:










 +1 (non-binding. Actually, since this is non-binding anyway, let's make
 it +100. I'm so, so excited about having a usable testing framework)

 On Wed, Jun 10, 2015 at 6:10 PM, Geoffrey Anderson  wrote:
  Hi Kafka,
 
  After a few rounds of discussion on KIP-25, there doesn't seem to be
  opposition, so I'd like to propose a vote.
 
  Thanks,
  Geoff
 
  On Mon, Jun 8, 2015 at 10:56 PM, Geoffrey Anderson
  wrote:
 
  Hi KIP-25 thread,
 
  I consolidated some of the questions from this thread and elsewhere.
 
  Q: Can we see a map of what system-test currently tests, which ones we
  want to replace and JIRAs for replacing?
  A: Initial draft here:
 
 https://cwiki.apache.org/confluence/display/KAFKA/Roadmap+-+port+existing+system+tests
 
  Q: Will ducktape be maintained separately as a github repo?
  A: Yes https://github.com/confluentinc/ducktape
 
  Q: How easy is viewing the test results and logs, how will test output
 be
  structured?
  A: Hierarchical structure as outlined here:
  https://github.com/confluentinc/ducktape/wiki/Design-overview#output
 
  Q: Does it support code coverage? If not, how easy/ difficult would it
 be
  to support?
  A: It does not, and we have no immediate plans to support this.
 Difficulty
  unclear.
 
  Q: It would be nice if each Kafka version that we release will also
  have a separate tests artifact that users can download, untar and
 easily
  run against a Kafka cluster of the same version.
  A: This seems reasonable and not too much extra work. Definitely open to
  discussion on this.
 
  Q: Why not share running services across multiple tests?
  A: Prefer to optimize for simplicity and correctness over what might be
 a
  questionable improvement in run-time.
 
  Q: Are regressions - in the road map?
  A: yes
 
  Q: Are Jepsen style tests involving network failures in the road map?
  A: yes
 
  Thanks much,
  Geoff
 
 
 


[jira] [Commented] (KAFKA-2161) Fix a few copyrights

2015-06-03 Thread Joe Stein (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14571985#comment-14571985
 ] 

Joe Stein commented on KAFKA-2161:
--

We stopped running rat when we moved to gradle 
https://github.com/apache/kafka/blob/trunk/build.gradle#L44 in 0.8.1. We should 
add running rat again for a release. I don't think we need to put the script 
back into the repo though to do that. I never used that script; I always did 
java -jar ../../apache-rat-0.8/apache-rat-0.8.jar in 0.8.0 and below in the release 
steps https://cwiki.apache.org/confluence/display/KAFKA/Release+Process

 Fix a few copyrights
 

 Key: KAFKA-2161
 URL: https://issues.apache.org/jira/browse/KAFKA-2161
 Project: Kafka
  Issue Type: Bug
Reporter: Ewen Cheslack-Postava
Assignee: Ewen Cheslack-Postava
Priority: Trivial
 Attachments: KAFKA-2161.patch


 I noticed that I accidentally let some incorrect copyright headers slip in 
 with the KAFKA-1501 patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2161) Fix a few copyrights

2015-06-03 Thread Joe Stein (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14572045#comment-14572045
 ] 

Joe Stein commented on KAFKA-2161:
--

[~ewencp] that would be great

 Fix a few copyrights
 

 Key: KAFKA-2161
 URL: https://issues.apache.org/jira/browse/KAFKA-2161
 Project: Kafka
  Issue Type: Bug
Reporter: Ewen Cheslack-Postava
Assignee: Ewen Cheslack-Postava
Priority: Trivial
 Attachments: KAFKA-2161.patch


 I noticed that I accidentally let some incorrect copyright headers slip in 
 with the KAFKA-1501 patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: KIP Wiki

2015-06-01 Thread Joe Stein
We should probably have some release/vXYZ section so that over time we can
keep track of which KIPs were approved for which release, etc.

Anything not in a release folder (we could do release/v0.8.3.0 now for
everything already approved) would be deemed under discussion,
or some such.

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Mon, Jun 1, 2015 at 9:46 PM, Guozhang Wang wangg...@gmail.com wrote:

 +1

 On Mon, Jun 1, 2015 at 12:00 PM, Jiangjie Qin j...@linkedin.com.invalid
 wrote:

  +1
 
  On 6/1/15, 11:53 AM, Ashish Singh asi...@cloudera.com wrote:
 
  I like the idea!
  
  
  On Mon, Jun 1, 2015 at 9:51 AM, Aditya Auradkar 
  aaurad...@linkedin.com.invalid wrote:
  
   Hey everyone,
  
   We have enough KIP's now (25) that it's a bit hard to tell which ones
  are
   adopted or under discussion by glancing at the wiki. Any concerns if I
   split it into 3 tables (adopted, discarded and KIP's under
 discussion)?
  
  
  
 
 https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Propo
  sals
  
   Aditya
  
  
  
  
  --
  
  Regards,
  Ashish
 
 


 --
 -- Guozhang



Re: Review Request 34103: Patch for KAFKA-2188

2015-05-30 Thread Joe Stein

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34103/#review85856
---



core/src/main/scala/kafka/controller/KafkaController.scala
https://reviews.apache.org/r/34103/#comment137695

Online to Offline



core/src/main/scala/kafka/controller/KafkaController.scala
https://reviews.apache.org/r/34103/#comment137696

Don't totally understand this. After this line, please add an example of 
what you are proposing, with example partitions, leaders and followers, showing 
more of what is going on.



core/src/main/scala/kafka/controller/KafkaController.scala
https://reviews.apache.org/r/34103/#comment137697

At this point, if we can't restart partitions, are we deciding here (by 
throwing) that the server will then get shut down? If that is our intent we 
should do that explicitly, imho.



core/src/main/scala/kafka/log/Log.scala
https://reviews.apache.org/r/34103/#comment137698

I think this try / catch (and all the ditto ones after this) can be 
passed in as an anonymous function to another, reused function

//something like
def kafkaStorageCheck(f: () => Unit): Unit = {
  try {
    f()
  } catch {
    case ...
  }
}

... then for what we have now just wrap it 
kafkaStorageCheck(() => { 
  existingFunc()
  var exist = 2
  moreFunc()
})

This will help to centralize, and reason more about, what exceptions are 
doing, by maybe being able to eventually remove them from the application 
control flow. But, at least for now, we can structure it more concisely.



core/src/main/scala/kafka/log/Log.scala
https://reviews.apache.org/r/34103/#comment137699

ditto



core/src/main/scala/kafka/log/Log.scala
https://reviews.apache.org/r/34103/#comment137700

ditto



core/src/main/scala/kafka/log/Log.scala
https://reviews.apache.org/r/34103/#comment137701

ditto



core/src/main/scala/kafka/log/Log.scala
https://reviews.apache.org/r/34103/#comment137702

ditto



core/src/main/scala/kafka/log/Log.scala
https://reviews.apache.org/r/34103/#comment137703

ditto



core/src/main/scala/kafka/log/Log.scala
https://reviews.apache.org/r/34103/#comment137704

ditto



core/src/main/scala/kafka/log/LogManager.scala
https://reviews.apache.org/r/34103/#comment137705

ditto



core/src/main/scala/kafka/log/LogManager.scala
https://reviews.apache.org/r/34103/#comment137706

ditto



core/src/main/scala/kafka/log/LogManager.scala
https://reviews.apache.org/r/34103/#comment137707

I feel like this should be in its own structure? I can imagine admin 
commands interacting with this in the future, and more interactions with 
other parts of the code.



core/src/main/scala/kafka/log/LogSegment.scala
https://reviews.apache.org/r/34103/#comment137708

Is there a way to change this to not use exceptions for control flow and 
still make sure we aren't killing the broker?


- Joe Stein


On May 12, 2015, 12:39 p.m., Andrii Biletskyi wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34103/
 ---
 
 (Updated May 12, 2015, 12:39 p.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-2188
 https://issues.apache.org/jira/browse/KAFKA-2188
 
 
 Repository: kafka
 
 
 Description
 ---
 
 KAFKA-2188 - JBOD Support
 
 
 Diffs
 -
 
   core/src/main/scala/kafka/cluster/Partition.scala 
 122b1dbbe45cb27aed79b5be1e735fb617c716b0 
   core/src/main/scala/kafka/common/GenericKafkaStorageException.scala 
 PRE-CREATION 
   core/src/main/scala/kafka/controller/KafkaController.scala 
 a6351163f5b6f080d6fa50bcc3533d445fcbc067 
   core/src/main/scala/kafka/controller/PartitionLeaderSelector.scala 
 3b15ab4eef22c6f50a7483e99a6af40fb55aca9f 
   core/src/main/scala/kafka/log/Log.scala 
 84e7b8fe9dd014884b60c4fbe13c835cf02a40e4 
   core/src/main/scala/kafka/log/LogManager.scala 
 e781ebac2677ebb22e0c1fef0cf7e5ad57c74ea4 
   core/src/main/scala/kafka/log/LogSegment.scala 
 ed039539ac18ea4d65144073915cf112f7374631 
   core/src/main/scala/kafka/server/KafkaApis.scala 
 417960dd1ab407ebebad8fdb0e97415db3e91a2f 
   core/src/main/scala/kafka/server/KafkaConfig.scala 
 9efa15ca5567b295ab412ee9eea7c03eb4cdc18b 
   core/src/main/scala/kafka/server/OffsetCheckpoint.scala 
 8c5b0546908d3b3affb9f48e2ece9ed252518783 
   core/src/main/scala/kafka/server/ReplicaFetcherThread.scala 
 b31b432a226ba79546dd22ef1d2acbb439c2e9a3 
   core/src/main/scala/kafka/server/ReplicaManager.scala 
 59c9bc3ac3a8afc07a6f8c88c5871304db588d17 
   core/src/main/scala/kafka/utils/ZkUtils.scala 
 1da8f90b3a7abda5868186bddf221e31adbe02ce 
   core/src/test/scala/unit/kafka/log/LogManagerTest.scala 
 01dfbc4f8d21f6905327cd4ed6c61d657adc0143 
   core/src/test/scala/unit/kafka/server/HighwatermarkPersistenceTest.scala

[jira] [Commented] (KAFKA-1778) Create new re-elect controller admin function

2015-05-30 Thread Joe Stein (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565914#comment-14565914
 ] 

Joe Stein commented on KAFKA-1778:
--

Hey, sorry for the late reply. I have now seen, on a few dozen clusters, 
situations where the broker gets into a state where the controller is hung and 
the only recourse is to either delete the znode from Zookeeper (/controller) to 
force a re-election or shut down the broker. In the former case I have seen one 
situation where the entire cluster went down. I am fairly certain this was 
because of the version of Zookeeper they were running (3.4.5), however I haven't 
ever tried to reproduce it. In the latter case many folks don't want to shut 
down the broker because they are in high traffic situations and doing so could 
be a lot worse than the controller not working... sometimes that changes and 
they shut the broker down so the controller can fail over and their partition 
reassignment can continue to the new brokers they just launched (as an example).

So, originally we were thinking of fixing this by having an admin call that 
could safely trigger another leader election. We have been finding, though, that 
just having the broker start without it ever being able to be the controller 
(can.be.controller = false) is preferable in *a lot* of cases. This way there 
are brokers that will never be the controller and some that could, and of the 
brokers that could, one of them would be.

~ Joestein

 Create new re-elect controller admin function
 -

 Key: KAFKA-1778
 URL: https://issues.apache.org/jira/browse/KAFKA-1778
 Project: Kafka
  Issue Type: Sub-task
Reporter: Joe Stein
Assignee: Abhishek Nigam
 Fix For: 0.8.3


 kafka --controller --elect



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Nagging - pending review requests :)

2015-05-29 Thread Joe Stein
Hey Jai, see below

On Fri, May 29, 2015 at 3:03 AM, Jaikiran Pai jai.forums2...@gmail.com
wrote:

 Hi Joe,

 Comments inline.

 On Friday 29 May 2015 12:15 PM, Joe Stein wrote:

 see below

 On Fri, May 29, 2015 at 2:25 AM, Jaikiran Pai jai.forums2...@gmail.com
 wrote:

  Could someone please look at these few review requests and let me know if
 any changes are needed:

 https://reviews.apache.org/r/34394/ related to
 https://issues.apache.org/jira/browse/KAFKA-1907


  I haven't looked at all the other changes that would be introduced from
  that release that could break things between zk and kafka by bumping the
  zk client. A less ops-negative way to deal with this might be to create a
  pluggable interface; then someone could use a patched zkclient if they
  wanted, or exhibitor, or consul, or akka, etc.



 The ZkClient has already been bumped to this newer version as part of a
 separate task https://issues.apache.org/jira/browse/KAFKA-2169 and it's
 already in trunk. This change in my review request only passes along an
 (optional) value to the ZkClient constructor that was introduced in that
 newer version.


I left a comment in the review.





  https://reviews.apache.org/r/30403/ related to
 https://issues.apache.org/jira/browse/KAFKA-1906


  I don't understand the patch and how it would fix the issue. I also don't
  think there is necessarily an issue. It's a balance between the community
  having a good out-of-the-box experience vs taking defaults and rushing
  them into production. No matter what we do we can't stop the latter from
  happening, which will also cause issues.


 The change to use a default directory that's within the Kafka installation
  path rather than the /tmp folder (which gets erased on restarts) is more from a
 development environment point of view rather than production. As you note,
 production environments will anyway have to deal with setting the right
 configs. From a developer perspective, I like the Kafka logs to survive
 system restarts when I'm working on applications which use Kafka. Of
 course, I can go ahead and change that default value in the
 server.properties on each fresh installation. But personally, I like it
  more if the logs are stored within the Kafka installation itself so
 that even if I have multiple different versions of Kafka running (for
 different applications) on the same system, the logs are isolated to the
 Kafka installation and don't interfere with each other. We currently have a
 development setup where we have a bunch of VMs with different Kafka
 installations. These VMs are then handed out to developers to work on
 various different applications (which are under development). The first
 thing we currently do is edit the server.properties and update the log path
  (and that's the only change we do for dev). It would be much easier and
  more convenient/manageable if this log directory defaulted to a path
  within the Kafka installation.


Developers like things to work when they try them out too. If there is
another way to have something other than /tmp be the default for log.dirs
and still run pretty much everywhere folks want it to, then let's discuss
that in a separate thread. If you have a proposal for what that is and how
it would work you could submit it to
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals.
I think most developers that use Kafka use it because they have an eye to
production, and they check and change things in the configs like the data
being saved to /tmp. The relative dir is a tad scary, especially when you
have log and kafka-logs; which is which?

This will also be _really_ confusing to people imho

-# A comma seperated list of directories under which to store log files
-log.dirs=/tmp/kafka-logs
+# A comma separated list of directories under which to store log files
+#log.dirs=






 There's also this one https://reviews.apache.org/r/34697/ for
 https://issues.apache.org/jira/browse/KAFKA-2221 but it's only been up
 since a couple of days and is a fairly minor one.


  Folks should start to transition in 0.8.3 to the new java consumer (which
  is on trunk). If this fix is so critical we should release it in 0.8.2.2;
  otherwise continue to try not to make changes to the existing scala
  consumer.


 Fair enough. It was more to help narrow down the real issues when a
 reconnect happens and isn't that critical. Do you want me to close that
 review request?


Your call. Folks may want to patch the change, so knowing what version the
fix is for is helpful if they want to do that. It is also one less ticket
for folks to look at.




 -Jaikiran



Re: Review Request 34394: Patch for KAFKA-1907

2015-05-29 Thread Joe Stein

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34394/#review85691
---



core/src/main/scala/kafka/utils/ZkUtils.scala
https://reviews.apache.org/r/34394/#comment137408

If we are going to add this it should be exposed as a configuration and 
written up in a KIP. We can't hard-code values that folks won't understand 
without some clear information about why it is 5000.


- Joe Stein


On May 25, 2015, 3:49 a.m., Jaikiran Pai wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34394/
 ---
 
 (Updated May 25, 2015, 3:49 a.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-1907
 https://issues.apache.org/jira/browse/KAFKA-1907
 
 
 Repository: kafka
 
 
 Description
 ---
 
 KAFKA-1907 Set operation retry timeout on ZkClient. Also mark certain Kafka 
 threads as daemon to allow proper JVM shutdown
 
 
 Diffs
 -
 
   clients/src/main/java/org/apache/kafka/common/utils/Utils.java 
 f73eedb030987f018d8446bb1dcd98d19fa97331 
   core/src/main/scala/kafka/network/SocketServer.scala 
 edf6214278935c031cf493d72d266e715d43dd06 
   core/src/main/scala/kafka/server/DelayedOperation.scala 
 123078d97a7bfe2121655c00f3b2c6af21c53015 
   core/src/main/scala/kafka/server/KafkaServer.scala 
 e66710d2368334ece66f70d55f57b3f888262620 
   core/src/main/scala/kafka/utils/ZkUtils.scala 
 78475e3d5ec477cef00caeaa34ff2d196466be96 
 
 Diff: https://reviews.apache.org/r/34394/diff/
 
 
 Testing
 ---
 
 ZkClient was recently upgraded to 0.5 version, as part of KAFKA-2169. The 0.5 
 version of ZkClient contains an enhancement which allows passing of operation 
 retry timeout https://github.com/sgroschupf/zkclient/pull/29. This now allows 
 us to fix the issue reported in 
 https://issues.apache.org/jira/browse/KAFKA-1907.
 
 The commit here passes the operation retry timeout while creating the 
  ZkClient instances. The commit also contains a change to mark certain threads 
  as daemon to allow a clean shutdown of the Kafka server when the zookeeper 
  instance has gone down first.
 
  I've locally tested that shutting down Kafka, after zookeeper has already 
  shut down, works fine now (it tries to reconnect to zookeeper for a maximum 
  of 5 seconds before cleanly shutting down). I've also checked that shutting 
  down Kafka first, when zookeeper is still up, works fine too.
 
 
 Thanks,
 
 Jaikiran Pai
 




Re: Mesos Community Networking Hangout

2015-05-28 Thread Joe Stein
sorry, wrong email list :)

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Thu, May 28, 2015 at 11:41 AM, Joe Stein joe.st...@stealth.ly wrote:

 Hi, a few folks from a few different companies (Cisco, Elodina,
 Mesosphere, Moz)  are going to be having a hangout
 https://plus.google.com/hangouts/_/stealth.ly/mesos-network every Friday
 11am PT / 2pm ET and wanted to invite any other interested parties to join
 too. We have already talked with folks from a few other companies so if we
 chatted or not would be good for you to join if you have something you want
 to contribute (requirements, code, whatever).

 The goal is to have a single solution for Mesos networking so that all of
 the different requirements around floating ip can be achieved with
 different implementations and use cases. What goes into mesos, what is a
 module, what is something to plugin that others can help define interfaces
 for, etc, etc.

 If you are interested let me know and I can add you to the invite, thanks!

 ~ Joe Stein
 - - - - - - - - - - - - - - - - -

   http://www.stealth.ly
 - - - - - - - - - - - - - - - - -



Mesos Community Networking Hangout

2015-05-28 Thread Joe Stein
Hi, a few folks from a few different companies (Cisco, Elodina, Mesosphere,
Moz)  are going to be having a hangout
https://plus.google.com/hangouts/_/stealth.ly/mesos-network every Friday
11am PT / 2pm ET and wanted to invite any other interested parties to join
too. We have already talked with folks from a few other companies so if we
chatted or not would be good for you to join if you have something you want
to contribute (requirements, code, whatever).

The goal is to have a single solution for Mesos networking so that all of
the different requirements around floating ip can be achieved with
different implementations and use cases. What goes into mesos, what is a
module, what is something to plugin that others can help define interfaces
for, etc, etc.

If you are interested let me know and I can add you to the invite, thanks!

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -


[jira] [Created] (KAFKA-2218) reassignment tool needs to parse and validate the json

2015-05-23 Thread Joe Stein (JIRA)
Joe Stein created KAFKA-2218:


 Summary: reassignment tool needs to parse and validate the json
 Key: KAFKA-2218
 URL: https://issues.apache.org/jira/browse/KAFKA-2218
 Project: Kafka
  Issue Type: Bug
Reporter: Joe Stein
Priority: Critical
 Fix For: 0.8.3


Ran into a production issue with the broker.id being set to a string instead of 
an integer; the controller had nothing in the log and stayed stuck. Eventually 
we saw this in the log of the brokers:


[2015-05-23 15:41:05,863] 67396362 [ZkClient-EventThread-14-ERROR 
org.I0Itec.zkclient.ZkEventThread - Error handling event ZkEvent[Data of 
/admin/reassign_partitions changed sent to 
kafka.controller.PartitionsReassignedListener@78c6aab8]
java.lang.ClassCastException: java.lang.String cannot be cast to 
java.lang.Integer
 at scala.runtime.BoxesRunTime.unboxToInt(Unknown Source)
 at kafka.controller.KafkaController$$anonfun$4.apply(KafkaController.scala:579)

we then had to delete the znode from zookeeper (/admin/reassign_partitions) and 
then fix the json and try it again
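The pre-flight validation this ticket asks for could look roughly like the sketch below. The parsed-map structure and all names are hypothetical stand-ins for whatever JSON parser the reassignment tool uses; the point is to reject non-integer replica ids before anything is written to /admin/reassign_partitions:

```scala
// Sketch: given partitions parsed from reassignment JSON (as generic maps),
// collect every replica id that is not an Int, so the tool can fail fast
// instead of letting the controller hit a ClassCastException later.
object ValidateReassignment {
  def invalidReplicas(partitions: List[Map[String, Any]]): List[Any] =
    partitions.flatMap { p =>
      p.getOrElse("replicas", List.empty[Any]) match {
        case ids: List[_] => ids.filterNot(_.isInstanceOf[Int]) // e.g. "1" as a String
        case other        => List(other) // "replicas" was not even a list
      }
    }

  def main(args: Array[String]): Unit = {
    val good = List(Map("topic" -> "t", "partition" -> 0, "replicas" -> List(1, 2)))
    val bad  = List(Map("topic" -> "t", "partition" -> 0, "replicas" -> List("1", 2)))
    assert(invalidReplicas(good).isEmpty)
    assert(invalidReplicas(bad) == List("1"))
    println("replica id validation behaves as described")
  }
}
```

A real fix would also validate topic names and partition numbers, but the broker.id-as-string case from the stack trace is the one that bit here.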





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [Vote] KIP-11 Authorization design for kafka security

2015-05-18 Thread Joe Stein
+1

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Fri, May 15, 2015 at 7:35 PM, Jun Rao j...@confluent.io wrote:

 +1

 Thanks,

 Jun

 On Fri, May 15, 2015 at 9:18 AM, Parth Brahmbhatt 
 pbrahmbh...@hortonworks.com wrote:

  Hi,
 
  Opening the voting thread for KIP-11.
 
  Link to the KIP:
 
 https://cwiki.apache.org/confluence/display/KAFKA/KIP-11+-+Authorization+Interface
  Link to Jira: https://issues.apache.org/jira/browse/KAFKA-1688
 
  Thanks
  Parth
 



[jira] [Commented] (KAFKA-1778) Create new re-elect controller admin function

2015-05-18 Thread Joe Stein (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14547838#comment-14547838
 ] 

Joe Stein commented on KAFKA-1778:
--

I was thinking that the broker, when starting up, would have another property: 
can.be.controller=false || can.be.controller=true.

If a broker has this value set to true, then it can be the controller and the 
KafkaController thread starts up; else it doesn't. Should be a few lines change 
in KafkaServer and a config mod.
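A minimal sketch of the gate described above. Only the can.be.controller property name comes from this comment; the object and method names and the Properties-based lookup are hypothetical, not actual KafkaServer code:

```scala
// Sketch: decide at startup whether this broker is eligible to run the
// controller thread, defaulting to true so existing deployments are unchanged.
object ControllerGate {
  def shouldStartController(props: java.util.Properties): Boolean =
    props.getProperty("can.be.controller", "true").toBoolean

  def main(args: Array[String]): Unit = {
    val p = new java.util.Properties()
    assert(shouldStartController(p))            // default: eligible for election
    p.setProperty("can.be.controller", "false")
    assert(!shouldStartController(p))           // opted out: never the controller
    println("gate behaves as described")
  }
}
```

In KafkaServer this boolean would simply guard the KafkaController startup call, so ineligible brokers never register for /controller election.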

 Create new re-elect controller admin function
 -

 Key: KAFKA-1778
 URL: https://issues.apache.org/jira/browse/KAFKA-1778
 Project: Kafka
  Issue Type: Sub-task
Reporter: Joe Stein
Assignee: Abhishek Nigam
 Fix For: 0.8.3


 kafka --controller --elect



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] Add missing API to old high level consumer

2015-05-15 Thread Joe Stein
I peeked at the Java producer SSL changes, haven't tried it yet though. I
can see about getting a Go version to help testing compatibility done in
the next few weeks.

I still don't understand the Auth pieces. I haven't been able to make the
KIP calls lately; I need to try to attend every other one or something.

I will re-read Auth this weekend. I guess my question still is how do I
test it. With SSL you configure it and it either works or it doesn't. Auth
is not so straightforward and is more opinionated.

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Fri, May 15, 2015 at 2:27 AM, Gwen Shapira gshap...@cloudera.com wrote:

 I thought we wanted security on 0.8.3 too... the SSL + Authz patches seem
 close to ready, no?

 On Fri, May 15, 2015 at 3:56 AM, Joe Stein joe.st...@stealth.ly wrote:

  Hey Becket, yeah good point. Officially there is no 0.8.3
  https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan
  release
  planned.
 
  I agree we should have the new consumer beta and a patch for the old one.
  If we do that in 0.8.3 that makes good sense, yup. We should also include
  https://issues.apache.org/jira/browse/KAFKA-1694 server side admin in
  0.8.3
  too. We are testing that next week in a few languages to get it through
  testing first before committing.
 
  Maybe we branch 0.8.3 in the near future so that can go through getting
  released while the security changes for 0.9 get on trunk.
 
  ~ Joe Stein
 
  On Thu, May 14, 2015 at 8:00 PM, Jiangjie Qin j...@linkedin.com.invalid
 
  wrote:
 
   Hey Joe,
  
   Actually this API was asked for before, and we have several use cases
 in
   LinkedIn as well. I thought we have added that in KAFKA-1650 but
  obviously
   I forgot to do that.
  
    My understanding is that we won't really deprecate high level consumer
   until we move to 0.9.0. So we can have this API either in 0.8.3 or
   0.8.2.2. Do you mean we only add them to those releases but not put it
   into trunk? Any specific concern on that?
  
   Considering this API has already been provided in new consumer. Adding
    this method probably won't cause any API compatibility issue even if
   people move to new consumer later.
   Given it is both backward and forward compatible and is a one line
  change,
   I think it is probably OK to have it added.
  
   Thanks,
  
   Jiangjie (Becket) Qin
  
  
  
   On 5/13/15, 3:18 PM, Joe Stein joe.st...@stealth.ly wrote:
  
   My gut reaction is that this isn't that important for folks otherwise
  they
   would have complained already. If it is a blocker for folks upgrading
 to
   0.8.2.1 then we should do a 0.8.2.2 release with this fix in it. For
   0.9.0.
   we are pushing for folks to start using the new consumer and that is
 the
   upgrade path we should continue on, imho. If we are going to phase out
  the
   scala clients then we need to strive to not be making changes to them
 on
   trunk.
   
   ~ Joe Stein
   - - - - - - - - - - - - - - - - -
   
 http://www.stealth.ly
   - - - - - - - - - - - - - - - - -
   
   On Wed, May 13, 2015 at 6:01 PM, Jiangjie Qin
 j...@linkedin.com.invalid
  
   wrote:
   
Add the DISCUSS prefix to the email title : )
   
From: Jiangjie Qin j...@linkedin.commailto:j...@linkedin.com
Date: Tuesday, May 12, 2015 at 4:51 PM
To: dev@kafka.apache.orgmailto:dev@kafka.apache.org 
dev@kafka.apache.orgmailto:dev@kafka.apache.org
Subject: Add missing API to old high level consumer
   
Hi,
   
I just noticed that in KAFKA-1650 (which is before we use KIP) we
  added
   an
offset commit method in high level consumer that commits offsets
  using a
user provided offset map.
   
     public void commitOffsets(Map<TopicPartition, OffsetAndMetadata>
     offsetsToCommit, boolean retryOnFailure);
   
This method was added to all the Scala classes but I forgot to add
 it
  to
Java API of ConsumerConnector. (Already regretting now. . .)
This method is very useful in several cases and has been asked for
  from
time to time. For example, people have several threads consuming
   messages
and processing them. Without this method, one thread will
 unexpectedly
commit offsets for another thread, thus might lose some messages if
something goes wrong.
   
I created KAFKA-2186 and hope we can add this missing method into
 the
   Java
 API of old high level consumer (literally a one line change).
Although this method should have been there since KAFKA-1650,
 adding
   this
method to Java API now is a public API change, just want to see if
   people
think we need a KIP for this.
   
Thanks.
   
Jiangjie (Becket) Qin
   
  
  
 



Re: [DISCUSS] Add missing API to old high level consumer

2015-05-14 Thread Joe Stein
Hey Becket, yeah good point. Officially there is no 0.8.3
https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan release
planned.

I agree we should have the new consumer beta and a patch for the old one.
If we do that in 0.8.3 that makes good sense, yup. We should also include
https://issues.apache.org/jira/browse/KAFKA-1694 server side admin in 0.8.3
too. We are testing that next week in a few languages to get it through
testing first before committing.

Maybe we branch 0.8.3 in the near future so that can go through getting
released while the security changes for 0.9 get on trunk.

~ Joe Stein

On Thu, May 14, 2015 at 8:00 PM, Jiangjie Qin j...@linkedin.com.invalid
wrote:

 Hey Joe,

 Actually this API was asked for before, and we have several use cases in
 LinkedIn as well. I thought we have added that in KAFKA-1650 but obviously
 I forgot to do that.

  My understanding is that we won't really deprecate high level consumer
 until we move to 0.9.0. So we can have this API either in 0.8.3 or
 0.8.2.2. Do you mean we only add them to those releases but not put it
 into trunk? Any specific concern on that?

 Considering this API has already been provided in new consumer. Adding
  this method probably won't cause any API compatibility issue even if
 people move to new consumer later.
 Given it is both backward and forward compatible and is a one line change,
 I think it is probably OK to have it added.

 Thanks,

 Jiangjie (Becket) Qin



 On 5/13/15, 3:18 PM, Joe Stein joe.st...@stealth.ly wrote:

 My gut reaction is that this isn't that important for folks otherwise they
 would have complained already. If it is a blocker for folks upgrading to
 0.8.2.1 then we should do a 0.8.2.2 release with this fix in it. For
 0.9.0.
 we are pushing for folks to start using the new consumer and that is the
 upgrade path we should continue on, imho. If we are going to phase out the
 scala clients then we need to strive to not be making changes to them on
 trunk.
 
 ~ Joe Stein
 - - - - - - - - - - - - - - - - -
 
   http://www.stealth.ly
 - - - - - - - - - - - - - - - - -
 
 On Wed, May 13, 2015 at 6:01 PM, Jiangjie Qin j...@linkedin.com.invalid
 wrote:
 
  Add the DISCUSS prefix to the email title : )
 
  From: Jiangjie Qin j...@linkedin.commailto:j...@linkedin.com
  Date: Tuesday, May 12, 2015 at 4:51 PM
  To: dev@kafka.apache.orgmailto:dev@kafka.apache.org 
  dev@kafka.apache.orgmailto:dev@kafka.apache.org
  Subject: Add missing API to old high level consumer
 
  Hi,
 
  I just noticed that in KAFKA-1650 (which is before we use KIP) we added
 an
  offset commit method in high level consumer that commits offsets using a
  user provided offset map.
 
   public void commitOffsets(Map<TopicPartition, OffsetAndMetadata>
   offsetsToCommit, boolean retryOnFailure);
 
  This method was added to all the Scala classes but I forgot to add it to
  Java API of ConsumerConnector. (Already regretting now. . .)
  This method is very useful in several cases and has been asked for from
  time to time. For example, people have several threads consuming
 messages
  and processing them. Without this method, one thread will unexpectedly
  commit offsets for another thread, thus might lose some messages if
  something goes wrong.
 
  I created KAFKA-2186 and hope we can add this missing method into the
 Java
   API of old high level consumer (literally a one line change).
  Although this method should have been there since KAFKA-1650,  adding
 this
  method to Java API now is a public API change, just want to see if
 people
  think we need a KIP for this.
 
  Thanks.
 
  Jiangjie (Becket) Qin
 




Re: [DISCUSS] Add missing API to old high level consumer

2015-05-13 Thread Joe Stein
My gut reaction is that this isn't that important for folks otherwise they
would have complained already. If it is a blocker for folks upgrading to
0.8.2.1 then we should do a 0.8.2.2 release with this fix in it. For 0.9.0.
we are pushing for folks to start using the new consumer and that is the
upgrade path we should continue on, imho. If we are going to phase out the
scala clients then we need to strive to not be making changes to them on
trunk.

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Wed, May 13, 2015 at 6:01 PM, Jiangjie Qin j...@linkedin.com.invalid
wrote:

 Add the DISCUSS prefix to the email title : )

 From: Jiangjie Qin j...@linkedin.commailto:j...@linkedin.com
 Date: Tuesday, May 12, 2015 at 4:51 PM
 To: dev@kafka.apache.orgmailto:dev@kafka.apache.org 
 dev@kafka.apache.orgmailto:dev@kafka.apache.org
 Subject: Add missing API to old high level consumer

 Hi,

 I just noticed that in KAFKA-1650 (which is before we use KIP) we added an
 offset commit method in high level consumer that commits offsets using a
 user provided offset map.

  public void commitOffsets(Map<TopicPartition, OffsetAndMetadata>
  offsetsToCommit, boolean retryOnFailure);

 This method was added to all the Scala classes but I forgot to add it to
 Java API of ConsumerConnector. (Already regretting now. . .)
 This method is very useful in several cases and has been asked for from
 time to time. For example, people have several threads consuming messages
 and processing them. Without this method, one thread will unexpectedly
 commit offsets for another thread, thus might lose some messages if
 something goes wrong.

 I created KAFKA-2186 and hope we can add this missing method into the Java
  API of old high level consumer (literally a one line change).
 Although this method should have been there since KAFKA-1650,  adding this
 method to Java API now is a public API change, just want to see if people
 think we need a KIP for this.

 Thanks.

 Jiangjie (Becket) Qin



Re: KAFKA-1977

2015-05-09 Thread Joe Stein
Hey Will, I took a quick look at your patch and JIRA. For patches could you
please use
https://cwiki.apache.org/confluence/display/KAFKA/Patch+submission+and+review.

I think changing the behavior of the scala high level consumer will be
confusing for folks. If this is something we want to do then it should be in
the new java consumer, please.

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Sat, May 9, 2015 at 2:52 PM, Will Funnell w.f.funn...@gmail.com wrote:

 Hey,

 Still waiting for some feedback on the patch for a while now, please can
 you take a look.

 Many thanks,

 Will.



[jira] [Created] (KAFKA-2180) topics never create on brokers though it succeeds in tool and is in zookeeper

2015-05-07 Thread Joe Stein (JIRA)
Joe Stein created KAFKA-2180:


 Summary: topics never create on brokers though it succeeds in tool 
and is in zookeeper
 Key: KAFKA-2180
 URL: https://issues.apache.org/jira/browse/KAFKA-2180
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.1.2
Reporter: Joe Stein
Priority: Critical
 Fix For: 0.8.3


Ran into an issue with a 0.8.2.1 cluster where create topic was succeeding when 
running bin/kafka-topics.sh --create, and the topic was seen in zookeeper, but 
the brokers never got updated.

We ended up fixing this by deleting the /controller znode so controller leader 
election would result. We really should have some better way to make the 
controller fail over (KAFKA-1778) than rmr /controller in the zookeeper shell.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-2179) no graceful nor fast way to shutdown every broker without killing them

2015-05-07 Thread Joe Stein (JIRA)
Joe Stein created KAFKA-2179:


 Summary: no graceful nor fast way to shutdown every broker without 
killing them
 Key: KAFKA-2179
 URL: https://issues.apache.org/jira/browse/KAFKA-2179
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.1.2
Reporter: Joe Stein
Priority: Minor
 Fix For: 0.8.3


If you do a controlled shutdown of every broker at the same time, the controlled 
shutdown process spins out of control. Every leader can't go anywhere because 
every broker is trying to controlled-shutdown itself. The result is that the 
brokers take a long (variable) time before they eventually do actually shut down.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] KIP-4 Admin Commands / Phase-1

2015-05-06 Thread Joe Stein
+1 (binding)

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Tue, May 5, 2015 at 11:16 AM, Andrii Biletskyi 
andrii.bilets...@stealth.ly wrote:

 Hi all,

 This is a voting thread for KIP-4 Phase-1. It will include Wire protocol
 changes
 and server side handling code.


 https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations

 Thanks,
 Andrii Biletskyi



Re: [DISCUSS] KIP-21 Configuration Management

2015-05-04 Thread Joe Stein
Aditya, when I think about the motivation of not having to restart brokers
to change a config I think about all of the configurations I have seen
having to get changed in brokers and restarted (which is just about all of
them). What I mean by stop the world is when producers and/or consumers
will not be able to use the broker(s) for a period of time or something
within the broker holds/blocks everything for the changes to take effect
and leader election or an ISR change is going to occur.

Lets say someone wanted to change replicaFetchMaxBytes
or replicaFetchBackoffMs dynamically you would have to stop the
ReplicaFetcherManager. If you use a watcher then all brokers at the
same time will have to stop and (hopefully) start ReplicaFetcherManager at
the same time. Or let's say someone wanted to change NumNetworkThreads: the
entire SocketServer for every broker at the same time would have to stop
and (hopefully) start. I believe most of the configurations fall into this
category and using a watcher notification to every broker without some
control is going to be a problem. If the notification just goes to the
controller and the controller is able to managing the processing for every
broker that might work but doesn't solve all the problems to be worked on.
We would also have to think about what to do for the controller broker
itself (unless we possibly make the controller not a broker), as well as
how to deal with some of these changes that could take brokers in and out
of the ISR or cause leader election. Ideally we could make these changes
without stopping the world (not just having the controller manage a
broker-by-broker restart), so that brokers that are leaders would still be
leaders (perhaps the connections for producing / consuming get buffered or
something) when (if) they come back online.

The thing is that lots of folks want all (or as many as possible of) the
configurations to be dynamic, and I am concerned that if we don't code for
the harder cases then we only end up with one or two configurations able to
be dynamic. If that is the motivation for this KIP, so quotas work, that is ok.

The more I think about it, I am not sure just labeling certain configs as
dynamic is going to be helpful for folks, because they are still having to
manage the updates for all the configurations, restarting brokers, and now a
new burden to understand dynamic properties. I think we need to add
solutions for folks where we can to make things easier without having to
add new items for them to contend with.
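The split being debated here — a small set of dynamically updatable configs versus everything else still requiring a restart — can be sketched as follows. All names and config keys below are hypothetical illustrations, not Kafka's actual configuration code:

```python
class ConfigRegistry:
    """Sketch of a broker config registry where only a whitelisted
    subset of keys may be changed without a restart."""

    # Hypothetical: only quota overrides are safe to apply live.
    DYNAMIC_KEYS = {"quota.producer.default", "quota.consumer.default"}

    def __init__(self, configs):
        self.configs = dict(configs)

    def update_dynamic(self, key, value):
        # Reject live updates to configs that would require stopping a
        # component (e.g. num.network.threads stops the SocketServer).
        if key not in self.DYNAMIC_KEYS:
            raise ValueError("%s requires a broker restart" % key)
        self.configs[key] = value
```

Under this model, operators still carry the burden Joe describes: they must know which keys land in the whitelist and which ones still force a rolling restart.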

Thanks!

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Sun, May 3, 2015 at 8:23 PM, Aditya Auradkar 
aaurad...@linkedin.com.invalid wrote:

 Hey Joe,

 Can you elaborate on what you mean by a stop-the-world change? In this
 protocol, we can target notifications to a subset of brokers in the cluster
 (controller if we need to). Is the AdminChangeNotification a ZK
 notification or a request type exposed by each broker?

 Thanks,
 Aditya

 
 From: Joe Stein [joe.st...@stealth.ly]
 Sent: Friday, May 01, 2015 5:25 AM
 To: dev@kafka.apache.org
 Subject: Re: [DISCUSS] KIP-21 Configuration Management

 Hi Aditya, thanks for the write up and focusing on this piece.

 Agreed we need something that we can do broker changes dynamically without
 rolling restarts.

 I think though if every broker is getting changes with notifications, it
 is going to limit which configs can be dynamic.

 We could never deliver a stop-the-world configuration change, because then
 that would happen on the entire cluster, on every broker at the same time.

 Can maybe just the controller get the notification?

 And we provide a layer for brokers to work with the controller to do the
 config change operations at its discretion (so it can stop things if it needs to).

 controller gets notification, sends AdminChangeNotification to broker [X ..
 N] then brokers can do their things, even send a response for heartbeating
 while it takes the few milliseconds it needs or crashes. We need to go
 through both scenarios.

 I am worried we put this change in like this and it works for quotas and
 maybe a few other things, but nothing else gets dynamic and we don't get far
 enough toward almost no more rolling restarts.

 ~ Joe Stein
 - - - - - - - - - - - - - - - - -

   http://www.stealth.ly
 - - - - - - - - - - - - - - - - -

 On Thu, Apr 30, 2015 at 8:14 PM, Joel Koshy jjkosh...@gmail.com wrote:

  1. I have deep concerns about managing configuration in ZooKeeper.
  First, Producers and Consumers shouldn't depend on ZK at all, this
  seems
  to add back a dependency we are trying to get away from.
 
  The KIP probably needs to be clarified here - I don't think Aditya was
  referring to client (producer/consumer) configs. These are global
  client-id-specific configs that need to be managed centrally.
  (Specifically, quota overrides on a per-client basis).
 
 



Re: [DISCUSS] KIP-21 Configuration Management

2015-05-01 Thread Joe Stein
Hi Aditya, thanks for the write up and focusing on this piece.

Agreed we need something that we can do broker changes dynamically without
rolling restarts.

I think though if every broker is getting changes with notifications, it
is going to limit which configs can be dynamic.

We could never deliver a stop-the-world configuration change, because then
that would happen on the entire cluster, on every broker at the same time.

Can maybe just the controller get the notification?

And we provide a layer for brokers to work with the controller to do the
config change operations at its discretion (so it can stop things if it needs to).

controller gets notification, sends AdminChangeNotification to broker [X ..
N] then brokers can do their things, even send a response for heartbeating
while it takes the few milliseconds it needs or crashes. We need to go
through both scenarios.
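The controller-mediated flow described above — the controller alone receives the notification and rolls the change out broker by broker, so the whole cluster never pauses at once — could look roughly like this. The class and method names are hypothetical, not actual Kafka code:

```python
class Controller:
    """Sketch: the controller fans a config change out one broker at a
    time, collecting acks, instead of every broker reacting to a ZK
    watcher simultaneously."""

    def __init__(self, brokers):
        # broker_id -> callable(change) -> bool ack (could crash / time out)
        self.brokers = brokers

    def on_change_notification(self, change):
        acks = {}
        for broker_id, apply_change in self.brokers.items():
            # One broker at a time; an un-acked broker can be retried or
            # skipped without stopping the others.
            acks[broker_id] = apply_change(change)
        return acks
```

The heartbeating Joe mentions would slot in where the ack is collected, so a broker that needs a few milliseconds (or crashes) does not block the rollout.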

I am worried we put this change in like this and it works for quotas and
maybe a few other things, but nothing else gets dynamic and we don't get far
enough toward almost no more rolling restarts.

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Thu, Apr 30, 2015 at 8:14 PM, Joel Koshy jjkosh...@gmail.com wrote:

 1. I have deep concerns about managing configuration in ZooKeeper.
 First, Producers and Consumers shouldn't depend on ZK at all, this
 seems
 to add back a dependency we are trying to get away from.

 The KIP probably needs to be clarified here - I don't think Aditya was
 referring to client (producer/consumer) configs. These are global
 client-id-specific configs that need to be managed centrally.
 (Specifically, quota overrides on a per-client basis).




Re: [VOTE] KIP-11- Authorization design for kafka security

2015-04-30 Thread Joe Stein
Hi, sorry I am coming in late to chime back in on this thread and haven't
been able to make the KIP hangouts the last few weeks. Sorry if any of this
was brought up already or I missed it.

I read through the KIP and the thread(s) and a couple of things jumped out.


   - Can we break out the open issues in JIRA (maybe during the hangout)
   that are in the KIP and resolve/flesh those out more?



   - I don't see any updates with the system tests or how we can know the
   code works.



   - We need some implementation/example/sample that we know can work in
   all the different existing entitlement servers, and not just ones that
   run in certain types of data centers. I am not saying we should support
   everything, but if someone had to implement
   https://docs.oracle.com/cd/E19225-01/820-6551/bzafm/index.html with
   Kafka, it has to work for them out of the box.



   - We should shy away from storing JSON in Zookeeper. Let's store bytes
   in storage.



   - We should spend some time thinking through exceptions in the wire
   protocol, maybe as part of this, so it can keep moving forward.


~ Joe Stein

On Tue, Apr 28, 2015 at 3:33 AM, Sun, Dapeng dapeng@intel.com wrote:

 Thank you for your reply, Gwen.

 1. Complex rule systems can be difficult to reason about and therefore
 end up being less secure. The rule "Deny always wins" is very easy to grasp.
 Yes, I agree with your point: we should not make the rule complex.

 2. We currently don't have any mechanism for specifying IP ranges (or host
 ranges) at all. I think its a pretty significant deficiency, but it does
 mean that we don't need to worry about the issue of blocking a large range
 while unblocking few servers in the range.
 Support for ranges sounds reasonable. If this feature is in the development
 plan, I also don't think we can put "best matching acl" and "support
 ip ranges" together.
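The "Deny always wins" evaluation being discussed can be sketched as follows. This is illustrative only, not the actual authorizer implementation; the acl shape is a hypothetical simplification:

```python
def is_allowed(acls, principal, host, operation, resource):
    """Deny-takes-precedence ACL evaluation: if any matching acl denies,
    the request is denied, regardless of any matching allows."""
    matching = [a for a in acls
                if a["principal"] in (principal, "*")
                and a["host"] in (host, "*")
                and a["operation"] == operation
                and a["resource"] == resource]
    if any(a["type"] == "deny" for a in matching):
        return False  # deny always wins
    return any(a["type"] == "allow" for a in matching)
```

Under these semantics, the case raised later in the thread — one acl denying an operation from all hosts and one allowing it from host1 — resolves to deny for host1, which is exactly the surprise being debated against a best-match rule.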

 We have a call tomorrow (Tuesday, April 28) at 3pm PST - to discuss this
 and other outstanding design issues (not all related to security). If you
 are interested in joining - let me know and I'll forward you the invite.
 Thank you, Gwen. I have the invite and I should be at home at that time.
 But due to a network issue, I may not be able to join the meeting smoothly.

 Regards
 Dapeng

 -Original Message-
 From: Gwen Shapira [mailto:gshap...@cloudera.com]
 Sent: Tuesday, April 28, 2015 1:31 PM
 To: dev@kafka.apache.org
 Subject: Re: [VOTE] KIP-11- Authorization design for kafka security

 While I see the advantage of being able to say something like "deny user
 X from hosts h1...h200; also allow user X from host h189", there are two
 issues here:

 1. Complex rule systems can be difficult to reason about and therefore end
 up being less secure. The rule "Deny always wins" is very easy to grasp.

 2. We currently don't have any mechanism for specifying IP ranges (or host
 ranges) at all. I think its a pretty significant deficiency, but it does
 mean that we don't need to worry about the issue of blocking a large range
 while unblocking few servers in the range.

 Gwen

 P.S
 We have a call tomorrow (Tuesday, April 28) at 3pm PST - to discuss this
 and other outstanding design issues (not all related to security). If you
 are interested in joining - let me know and I'll forward you the invite.

 Gwen

 On Mon, Apr 27, 2015 at 10:15 PM, Sun, Dapeng dapeng@intel.com
 wrote:

  Attach the image.
 
  https://raw.githubusercontent.com/sundapeng/attachment/master/kafka-ac
  l1.png
 
  Regards
  Dapeng
 
  From: Sun, Dapeng [mailto:dapeng@intel.com]
  Sent: Tuesday, April 28, 2015 11:44 AM
  To: dev@kafka.apache.org
  Subject: RE: [VOTE] KIP-11- Authorization design for kafka security
 
 
  Thank you for your rapid reply, Parth.
 
 
 
  * I think the wiki already describes the precedence order as Deny
  taking
  precedence over allow when conflicting acls are found
  https://cwiki.apache.org/confluence/display/KAFKA/KIP-11+-+Authorizati
  on+In
 
  terface#KIP-11-AuthorizationInterface-PermissionType
 
  Got it, thank you.
 
 
 
  * In the first version that I am currently writing there is no group
  support. Even when we add it I don't see the need to add a precedence
  for evaluation. it does not matter which principal matches as long as
 
   we have a match.
 
 
 
  About this part, I think we should choose the best matching acl for
  authorization, no matter whether we support groups or not.
 
 
 
  For the case
 
 
  https://raw.githubusercontent.com/sundapeng/attachment/master/kafka-ac
  l1.png
 
 
 
  if 2 acls are defined, one that denies an operation from all hosts and
  one that allows the operation from host1, will the operation from host1
  be denied or allowed?
 
  According to the wiki, "Deny will take precedence over Allow in competing
  acls", it seems acl_1 will win the competition, but the customer's
  intention may be to allow.
 
  I think deny always take precedence over Allow is okay, but  host1
  - user1host1 default may

Re: [VOTE] KIP-11- Authorization design for kafka security

2015-04-30 Thread Joe Stein
 j...@confluent.io wrote:
 
  Joe,
  
  Could you elaborate on why we should not store JSON in ZK? So far, all
  existing ZK data are in JSON.
  
  Thanks,
  
  Jun
  
  On Thu, Apr 30, 2015 at 2:06 AM, Joe Stein joe.st...@stealth.ly
 wrote:
  
   Hi, sorry I am coming in late to chime back in on this thread and
  haven't
   been able to make the KIP hangouts the last few weeks. Sorry if any of
  this
   was brought up already or I missed it.
  
   I read through the KIP and the thread(s) and a couple of things jumped
  out.
  
  
  - Can we break out the open issues in JIRA (maybe during the
 hangout)
  that are in the KIP and resolve/flesh those out more?
  
  
  
  - I don't see any updates with the systems test or how we can know
  the
  code works.
  
  
  
  - We need some implementation/example/sample that we know can work
 in
  all different existing entitlement servers and not just ones that
  run in
  types of data centers too. I am not saying we should support
  everything
   but
  if someone had to implement
  https://docs.oracle.com/cd/E19225-01/820-6551/bzafm/index.html
 with
  Kafka it has to work for them out of the box.
  
  
  
  - We should shy away from storing JSON in Zookeeper. Lets store
  bytes in
  Storage.
  
  
  
  - We should spend some time thinking through exceptions in the wire
  protocol maybe as part of this so it can keep moving forward.
  
  
   ~ Joe Stein
  
   On Tue, Apr 28, 2015 at 3:33 AM, Sun, Dapeng dapeng@intel.com
  wrote:
  
Thank you for your reply, Gwen.
   
1. Complex rule systems can be difficult to reason about and
  therefore
end up being less secure. The rule Deny always wins is very easy
 to
   grasp.
Yes, I'm agreed with your point: we should not make the rule
 complex.
   
2. We currently don't have any mechanism for specifying IP ranges
 (or
   host
ranges) at all. I think its a pretty significant deficiency, but it
  does
mean that we don't need to worry about the issue of blocking a large
   range
while unblocking few servers in the range.
Support ranges sounds reasonable. If this feature will be in
  development
plan, I also don't think we can put the best matching acl and 
  Support
ip ranges together.
   
We have a call tomorrow (Tuesday, April 28) at 3pm PST - to discuss
  this
and other outstanding design issues (not all related to security).
 If
  you
are interested in joining - let me know and I'll forward you the
  invite.
Thank you, Gwen. I have the invite and I should be at home at that
  time.
But due to network issue, I may can't join the meeting smoothly.
   
Regards
Dapeng
   
-Original Message-
From: Gwen Shapira [mailto:gshap...@cloudera.com]
Sent: Tuesday, April 28, 2015 1:31 PM
To: dev@kafka.apache.org
Subject: Re: [VOTE] KIP-11- Authorization design for kafka security
   
While I see the advantage of being able to say something like: deny
  user
X from hosts h1...h200 also allow user X from host h189, there
 are
  two
issues here:
   
1. Complex rule systems can be difficult to reason about and
 therefore
   end
up being less secure. The rule Deny always wins is very easy to
  grasp.
   
2. We currently don't have any mechanism for specifying IP ranges
 (or
   host
ranges) at all. I think its a pretty significant deficiency, but it
  does
mean that we don't need to worry about the issue of blocking a large
   range
while unblocking few servers in the range.
   
Gwen
   
P.S
We have a call tomorrow (Tuesday, April 28) at 3pm PST - to discuss
  this
and other outstanding design issues (not all related to security).
 If
  you
are interested in joining - let me know and I'll forward you the
  invite.
   
Gwen
   
On Mon, Apr 27, 2015 at 10:15 PM, Sun, Dapeng dapeng@intel.com
 
wrote:
   
 Attach the image.


  https://raw.githubusercontent.com/sundapeng/attachment/master/kafka-ac
 l1.png

 Regards
 Dapeng

 From: Sun, Dapeng [mailto:dapeng@intel.com]
 Sent: Tuesday, April 28, 2015 11:44 AM
 To: dev@kafka.apache.org
 Subject: RE: [VOTE] KIP-11- Authorization design for kafka
 security


 Thank you for your rapid reply, Parth.



 * I think the wiki already describes the precedence order as Deny
 taking
 precedence over allow when conflicting acls are found

  https://cwiki.apache.org/confluence/display/KAFKA/KIP-11+-+Authorizati
 on+In

 terface#KIP-11-AuthorizationInterface-PermissionType

 Got it, thank you.



 * In the first version that I am currently writing there is no
  group
 support. Even when we add it I don't see the need to add a
  precedence
 for evaluation. it does not matter which principal matches as long
  as

  we have a match

Re: [VOTE] KIP-11- Authorization design for kafka security

2015-04-30 Thread Joe Stein
Ok, I read through it all again a few times. I get the provider broker
piece now.

The configurations are still confusing if there are 2 or 3, and they should
be called out more specifically than as a change to a class. Configs are a
public interface; we should be a bit more explicit.

Was there any discussion about an auditing component? How would anyone
know when the authorization plugin was running or what it was doing?

If we can't audit the access then what good is controlling the access?

I still don't see where all the command line configuration options come in.
There are a lot of things to do with it, but I am not sure how to use it yet.

This plug-in still feels like a very specific case, and we should try to
generalize it some more to make it more straightforward for folks.

~ Joe Stein

On Thu, Apr 30, 2015 at 3:51 PM, Parth Brahmbhatt 
pbrahmbh...@hortonworks.com wrote:

 During the discussion Jun pointed out that mirror maker, which right now
 does not copy any zookeeper config overrides, will now replicate topics
 but will not replicate any acls. Given the authorizer interface exposes
 the acl management apis, list/get/add/remove, we proposed that mirror
 maker can just instantiate an instance of authorizer and call these apis
 directly to get acls for a topic and add it to the destination cluster if
 we want to add acls to be replicated as part of mirror maker.

 Thanks
 Parth

 On 4/30/15, 12:43 PM, Joe Stein joe.st...@stealth.ly wrote:

 Parth,
 
 Can you explain how mirror maker will have to start using the new acl
 management tool without it affecting any other client? If you aren't
 changing the wire protocol, then how do clients use it?
 
 ~ Joe Stein
 
 
 On Thu, Apr 30, 2015 at 3:15 PM, Parth Brahmbhatt 
 pbrahmbh...@hortonworks.com wrote:
 
  Hi Joe,
 
  Regarding the open question: I changed the title to “Questions resolved
  after community discussions”; let me know if you have a better name. I have
  a question and a bullet point under each question describing the final
  decision. Not sure how I can make it any cleaner, so I appreciate any
  suggestion.
 
  Regarding system tests: I went through a bunch of KIPs, none of which
  mentions what test cases will be added. Do you want to add a “How do you
  plan to test” section to the general KIP template, or do you think this is
  just a special case where the test cases should be listed and discussed as
  part of the KIP? I am not sure if the KIP really is the right forum for
  this discussion. This can easily be addressed during code review if people
  think we don’t have enough test coverage.
 
  I am still not sure which part is not clear. The scala exception is added
  for internal server-side representation. In the end all of our responses
  always return just an error code, for which we will add an
  AuthorizationErrorCode mapped to AuthorizationException. The error code
  itself will not reveal any information other than the fact that you are
  not authorized to perform an operation on a resource, and you will get
  this error code even for non-existent topics if no acls exist for those
  topics.
 
  I can add a diagram if that makes things more clear; I am not convinced
  it's needed given we have come so far without it. Essentially there are 3
  steps:
  * users use the acl cli to add acls to their topics/groups/cluster
  * brokers start with a broker config that specifies what authorizer
  implementation to use.
  * every api request first goes through the authorizer and fails if the
  authorizer denies it. (authorizer implementation described in the doc
  with pseudo code)
 
  Note: Authentication/Wire Encryption is a separate piece and is being
  discussed actively in another KIP if that is the detail you are looking
  for.
 
  I think the description under this section
 
 
 https://cwiki.apache.org/confluence/display/KAFKA/KIP-11+-+Authorization+
 In
  terface#KIP-11-AuthorizationInterface-DataFlows captures the internal
  details.
 
  Thanks
  Parth
 
  On 4/30/15, 11:24 AM, Joe Stein joe.st...@stealth.ly wrote:
 
  Gwen  regarding additional authorizers
  
  I think having these in the system tests doubles as good confidence in
  the language independence of the changes. It also makes sure that when we
  release we don't go breaking Sentry or Ranger or anyone else that
  wants to integrate.
  
  Gwen  Regarding AuthorizationException

  Yeah so I have two issues. The one you raised: yes, 100%. Also I don't
  understand how that is not a broker wire protocol response and only a JVM
  exception.
  
  Jun  Could you elaborate on why we should not store JSON in ZK? So
 far,
  all existing ZK data are in JSON.
  
  If I have 1,000,000 users in LDAP and 150 get access to Kafka topics
  through this mechanism, then I have to go and parse and push all of my
  changes into zookeeper for it to take effect?
  
  If someone wanted to implement SAML I don't think this would work. Not
  sure how it would work with NiFi either

Re: [VOTE] KIP-11- Authorization design for kafka security

2015-04-30 Thread Joe Stein
If you have Bucket A and Bucket B, and in Bucket A there are patients with
Disease X and in Bucket B patients without Disease X.

Now you try to access Alice from Bucket A and you get a 403, and then
from Bucket B you get a 404.

What does that tell you now about Alice? Yup, she has Disease X.

Uniform non-existence is a good policy for protecting data. If you don't
have permission then "404 not found" works too.
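The uniform non-existence policy described above amounts to collapsing "unauthorized" and "missing" into one response. A minimal sketch, with hypothetical function names (UNKNOWN_TOPIC_OR_PARTITION is Kafka's existing "doesn't exist" error code):

```python
def describe_topic(topic, topics, can_read):
    """Return the same error for a missing topic and an unauthorized one,
    so the response never leaks whether the topic exists."""
    if topic not in topics or not can_read(topic):
        return {"error": "UNKNOWN_TOPIC_OR_PARTITION"}
    return {"error": None, "metadata": topics[topic]}
```

With this shape, a caller probing "Bucket A" and "Bucket B" sees identical responses whether a topic is absent or merely off-limits, closing the inference channel Joe describes.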

The context where I thought that applied to this discussion is that I
thought the authorization module was going to be a bit more integrated
where the api responses were happening.

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Thu, Apr 30, 2015 at 6:51 PM, Suresh Srinivas sur...@hortonworks.com
wrote:

 Comment on AuthorizationException. I think the intent of the exception
 should be to capture why a request is rejected. It is important from an API
 perspective to be specific, to aid debugging. Having a generic or obfuscated
 exception is not very useful. Does someone, on getting an exception, reach
 out to an admin to understand if a topic exists or if it's an authorization
 issue?

 I am not getting the security concern. The system must ensure access is
 disallowed by implementing the security correctly, not based on security by
 obscurity.

 Regards,
 Suresh

 Sent from phone

 _
 From: Gwen Shapira gshap...@cloudera.commailto:gshap...@cloudera.com
 Sent: Thursday, April 30, 2015 10:14 AM
 Subject: Re: [VOTE] KIP-11- Authorization design for kafka security
 To: dev@kafka.apache.orgmailto:dev@kafka.apache.org


 * Regarding additional authorizers:
 Prasad, who is a PMC on Apache Sentry reviewed the design and confirmed
 Sentry can integrate with the current APIs. Dapeng Sun, a committer on
 Sentry had some concerns about the IP privileges and how we prioritize
 privileges - but nothing that prevents Sentry from integrating with the
 existing solution, from what I could see. It seems to me that the design is
 very generic and adapters can be written for other authorization systems
 (after all, you just need to implement setACL, getACL and Authorize - all
 pretty basic), although I can't speak for Oracle's Identity Manager
 specifically.

 * Regarding AuthorizationException to indicate that an operation was not
 authorized: Sorry I missed this in previous reviewed, but now that I look
 at it - Many systems intentionally don't return AuthorizationException when
 READ privilege is missing, since this already gives too much information
 (that the topic exists and that you don't have privileges on it). Instead
 they return a variant of "doesn't exist". I'm wondering if this approach is
 applicable / desirable for Kafka as well.
 Note that this doesn't remove the need for AuthorizationException - I'm
 just suggesting a possible refinement on its use.

 Gwen



 On Thu, Apr 30, 2015 at 9:52 AM, Parth Brahmbhatt 
 pbrahmbh...@hortonworks.commailto:pbrahmbh...@hortonworks.com wrote:

  Hi Joe, Thanks for taking the time to review.
 
  * All the open issues already have a resolution , I can open a jira for
  each one and add the resolution to it and resolve them immediately if you
  want this for tracking purposes.
  * We will update system tests to verify that the code works. We have
  thorough unit tests for all the new code except for modifications made to
  KafkaAPI as that has way too many dependencies to be mocked which I guess
  is the reason for no existing unit tests.
  * I don’t know if I completely understand the concern. We have talked
 with
  Ranger team (Don Bosco Durai) so we at least have one custom authorizer
  implementation that has approved this design and they will be able to
  inject their authorization framework with current interfaces. Do you see
  any issue with the design which will prevent anyone from providing a
  custom implementation?
  * Did not understand the concern around wire protocol, we are adding
  AuthorizationException to indicate that an operation was not authorized.
 
  Thanks
  Parth
 
  On 4/30/15, 5:59 AM, Jun Rao j...@confluent.iomailto:j...@confluent.io
 wrote:
 
  Joe,
  
  Could you elaborate on why we should not store JSON in ZK? So far, all
  existing ZK data are in JSON.
  
  Thanks,
  
  Jun
  
  On Thu, Apr 30, 2015 at 2:06 AM, Joe Stein joe.st...@stealth.ly
 mailto:joe.st...@stealth.ly wrote:
  
   Hi, sorry I am coming in late to chime back in on this thread and
  haven't
   been able to make the KIP hangouts the last few weeks. Sorry if any of
  this
   was brought up already or I missed it.
  
   I read through the KIP and the thread(s) and a couple of things jumped
  out.
  
  
  - Can we break out the open issues in JIRA (maybe during the
 hangout)
  that are in the KIP and resolve/flesh those out more?
  
  
  
  - I don't see any updates with the systems test or how we can know
  the
  code works.
  
  
  
  - We need some implementation/example

Re: [VOTE] KIP-11- Authorization design for kafka security

2015-04-30 Thread Joe Stein
I kind of thought of the authorization module as something that happens in
handle(request: RequestChannel.Request) in the request.requestId match.

If the request isn't allowed to do what it is doing, it should stop right
there. What it is allowed to do is a true/false callback to the loaded
class, with one function to accept the data and some more about what it
is about (that we have access to).

I think all of the other features are awesome, but you can build them on top
of this, and then others can do the same.

I am more hooked on the authorization module being a watchdog above
handle() than I am on the plug-in implementation options (less is more
imho).

If we do this approach the audit fits in nicely, because we see more of
what happens in one place, with the access decision made right there.
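The watchdog-above-handle() idea — a single true/false authorize callback guarding every request, with the audit record written at the same choke point — might be sketched as follows. The shapes below are hypothetical, not the actual KafkaApis code:

```python
audit_log = []

def handle(request, authorize, dispatch):
    """Guard every request with one true/false authorize callback and
    audit the decision at the same choke point."""
    allowed = authorize(request["principal"], request["operation"],
                        request["resource"])
    # Auditing and access control live together: one place sees every
    # decision, addressing the "if we can't audit it" concern.
    audit_log.append((request["principal"], request["operation"],
                      request["resource"], allowed))
    if not allowed:
        return {"error": "AUTHORIZATION_FAILED"}
    return dispatch(request)
```

Everything richer (groups, ranges, external entitlement servers) can then be built behind the authorize callback without touching the request path.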

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Thu, Apr 30, 2015 at 6:59 PM, Suresh Srinivas sur...@hortonworks.com
wrote:

 Joe,

 Can you add more details on what generalization looks like? Also is this a
 design issue or code issue?

 One more question. Does Kafka have audit capabilities today for topic
 creation, deletion, access etc.?

 Regards,
 Suresh

 Sent from phone

 _
 From: Joe Stein joe.st...@stealth.lymailto:joe.st...@stealth.ly
 Sent: Thursday, April 30, 2015 3:27 PM
 Subject: Re: [VOTE] KIP-11- Authorization design for kafka security
 To: dev@kafka.apache.orgmailto:dev@kafka.apache.org


 Ok, I read through it all again a few times. I get the provider broker
 piece now.

 The configurations are still confusing if there are 2 or 3 and they should
 be called out more specifically than as a change to a class. Configs are a
 public interface we should be a bit more explicit.

 Was there any discussion about any auditing component? How would anyone
 know if the authorization plugin was running for when or what it was doing?

 If we can't audit the access then what good is controlling the access?

 I still don't see where all the command line configuration options come in.
 There are a lot of things to-do with it but not sure how to use it yet.

 This plug-in still feels like a very specific case and we should try to
 generalize it down some more to make it more straight forward for folks.

 ~ Joe Stein

 On Thu, Apr 30, 2015 at 3:51 PM, Parth Brahmbhatt 
 pbrahmbh...@hortonworks.commailto:pbrahmbh...@hortonworks.com wrote:

  During the discussion Jun pointed out that mirror maker, which right now
  does not copy any zookeeper config overrides, will now replicate topics
  but will not replicate any acls. Given the authorizer interface exposes
  the acl management apis, list/get/add/remove, we proposed that mirror
  maker can just instantiate an instance of authorizer and call these apis
  directly to get acls for a topic and add it to the destination cluster if
  we want to add acls to be replicated as part of mirror maker.
 
  Thanks
  Parth
 
  On 4/30/15, 12:43 PM, Joe Stein joe.st...@stealth.lymailto:
 joe.st...@stealth.ly wrote:
 
  Parth,
  
  Can you explain how Mirror maker will have to start using new acl
  management tool) and it not affect any other client. If you aren't
  changing the wire protocol then how do clients use it?
  
  ~ Joe stein
  
  
  On Thu, Apr 30, 2015 at 3:15 PM, Parth Brahmbhatt 
  pbrahmbh...@hortonworks.commailto:pbrahmbh...@hortonworks.com wrote:
  
   Hi Joe,
  
   Regarding open question: I changed the title to “Questions resolved
  after
   community discussions” let me know if you have a better name. I have a
   question and a bullet point under each question describing the final
   decision. Not sure how can I make it any cleaner so appreciate any
   suggestion.
  
   Regarding system tests: I went through a bunch of KIP none of which
   mentions what test cases will be added. Do you want to add a “How do
 you
   plan to tet” section in the general KIP template or you think this is
   just a special case where the test cases should be listed and
 discussed
  as
   part of KIP? I am not sure if KIP really is the right forum for this
   discussion. This can easily be addressed during code review if people
   think we don’t have enough test coverage.
  
   I am still not sure which part is not clear. The scal exception is
  added
   for internal server side rpresentation. In the end all of our
 responses
   always return just an error code for which we will add an
   AuthorizationErroCode mapped to AuthorizationException. The error code
  it
   self will not reveal any informationother then the fact that you are
  not
   authorized to perform an operation on a resource and you will get this
   error code even for non existent topics if no acls exist for those
  topics.
  
can add a diagram if that makes things more clear, I am not convinced
   its needed given we have come so far without it. Essentially there
 are 3
   steps
   * users use the acl cli to add acls

[jira] [Commented] (KAFKA-2132) Move Log4J appender to clients module

2015-04-26 Thread Joe Stein (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14513110#comment-14513110
 ] 

Joe Stein commented on KAFKA-2132:
--

+1 - make a stand-alone module called log4j/ that has this one class, and move
the class to Java so people don't have the Scala dependency

 Move Log4J appender to clients module
 -

 Key: KAFKA-2132
 URL: https://issues.apache.org/jira/browse/KAFKA-2132
 Project: Kafka
  Issue Type: Improvement
Reporter: Gwen Shapira
Assignee: Ashish K Singh

 Log4j appender is just a producer.
 Since we have a new producer in the clients module, no need to keep Log4J 
 appender in core and force people to package all of Kafka with their apps.
 Let's move the Log4jAppender to the clients module.
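The observation that "a log appender is just a producer" can be sketched with Python's stdlib logging (illustrative only; the real Log4jAppender is Java and wraps an actual Kafka producer):

```python
import logging

class ProducerAppender(logging.Handler):
    """Sketch: a log appender that is nothing more than a thin wrapper
    around a producer-like object with send(topic, bytes)."""

    def __init__(self, producer, topic):
        super().__init__()
        self.producer = producer
        self.topic = topic

    def emit(self, record):
        # Each log record becomes one message on the configured topic.
        self.producer.send(self.topic, self.format(record).encode("utf-8"))
```

Since the appender holds only a producer reference, it belongs wherever the producer lives — which is the argument for moving it out of core.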



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2132) Move Log4J appender to clients module

2015-04-21 Thread Joe Stein (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505370#comment-14505370
 ] 

Joe Stein commented on KAFKA-2132:
--

Can we put it in the new admin client tools jar that KAFKA-1694 is creating?
tools/src/main/java/org/apache/kafka/loggers/KafkaLog4JAppenderBasic.java or
something... That is all Java code, and I think the Log4j appender being in
Java would be preferable.

 Move Log4J appender to clients module
 -

 Key: KAFKA-2132
 URL: https://issues.apache.org/jira/browse/KAFKA-2132
 Project: Kafka
  Issue Type: Improvement
Reporter: Gwen Shapira
Assignee: Ashish K Singh

 Log4j appender is just a producer.
 Since we have a new producer in the clients module, no need to keep Log4J 
 appender in core and force people to package all of Kafka with their apps.
 Let's move the Log4jAppender to the clients module.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Can't see KIP Template after click Create on https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals

2015-04-20 Thread Joe Stein
give it a try now

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Mon, Apr 20, 2015 at 9:22 PM, Honghai Chen honghai.c...@microsoft.com
wrote:

 Username: waldenchen
 Email:waldenc...@163.com

 Thanks, Honghai

 -Original Message-
 From: Joe Stein [mailto:joe.st...@stealth.ly]
 Sent: Tuesday, April 21, 2015 9:19 AM
 To: dev@kafka.apache.org
 Subject: Re: Can't see KIP Template after click Create on
 https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals

 What is your confluence username?

 ~ Joe Stein
 - - - - - - - - - - - - - - - - -

   http://www.stealth.ly
 - - - - - - - - - - - - - - - - -

 On Mon, Apr 20, 2015 at 9:15 PM, Honghai Chen honghai.c...@microsoft.com
 wrote:

   Hi dear dev,
 
  Need to create a KIP with title “Add one configuration
  log.preallocate” for https://issues.apache.org/jira/browse/KAFKA-1646
 
  But can't see KIP Template after click Create on
  https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Pr
  oposals
 
  The below picture is what I see.
 
  Can you see it? Is there anywhere I can get the
  permission or setting?
 
  Thanks, Honghai
 
 
 



Re: Can't see KIP Template after click Create on https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals

2015-04-20 Thread Joe Stein
What is your confluence username?

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Mon, Apr 20, 2015 at 9:15 PM, Honghai Chen honghai.c...@microsoft.com
wrote:

  Hi dear dev,

 Need to create a KIP with title “Add one configuration
 log.preallocate” for https://issues.apache.org/jira/browse/KAFKA-1646

 But can't see KIP Template after click Create on
 https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals

 The below picture is what I see.

 Can you see it? Is there anywhere I can get the permission
 or setting?

 Thanks, Honghai





Re: [DISCUSS] KIP-4 - Command line and centralized administrative operations (Thread 2)

2015-04-16 Thread Joe Stein
1. agreed

2. agree, new error

3. having discrete operations for tasks makes sense; combining them is
confusing for users, I think. +1 for letting the user change only one thing
at a time

4. let's be consistent with both the new code and the existing code. Let's not
confuse the user, but give them the right error information so they know
what they did wrong without much fuss.

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -
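The "change only one thing at a time" rule from point 3 could be enforced server-side with a check like this sketch. The four fields are hypothetical placeholders, not the actual KIP-4 AlterTopicRequest schema:

```java
// Sketch: reject an alter-topic request unless it changes exactly one thing.
// A request setting zero fields, or more than one, would be answered with an
// InvalidArgument-style error.
public class AlterTopicValidation {

    static boolean changesExactlyOneThing(Integer partitions,
                                          Integer replicas,
                                          String replicaAssignment,
                                          String config) {
        int set = 0;
        if (partitions != null) set++;
        if (replicas != null) set++;
        if (replicaAssignment != null) set++;
        if (config != null) set++;
        return set == 1;
    }

    public static void main(String[] args) {
        // change partition count only: allowed
        System.out.println(changesExactlyOneThing(3, null, null, null));
        // change partitions and replicas together: rejected
        System.out.println(changesExactlyOneThing(3, 2, null, null));
    }
}
```

If the stricter rule wins, the client could run the same check before sending, to fail fast instead of waiting on the async server response.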

On Wed, Apr 15, 2015 at 1:23 PM, Andrii Biletskyi 
andrii.bilets...@stealth.ly wrote:

 Guys,

 Thanks for the discussion!

 Summary:

 1. Q: How KAFKA-1367 (isr is inconsistent in brokers' metadata cache) can
 affect implementation?
 A: We can fix this issue for the leading broker - ReplicaManager (or
 Partition)
 component should have accurate isr list, then with leading broker
 having correct
 info, to do a describe-topic we will need to define leading brokers
 for partitions
 and ask those for a correct isr list.
 Also, we should consider adding lag information to TMR for each
 follower for
 partition reassignment, as Jun suggested above.

 2. Q: What if user adds different alter commands for the same topic in
 scope
  of one batch request?
 A: Because of the async nature of AlterTopicRequest it will be very
 hard then
 to assemble the expected (in terms of checking whether request is
 complete)
 result if we let users do this. Also it will be very confusing. It
 was proposed not to
 let users do this (probably add new Error for such cases).

 3. Q: AlterTopicRequest semantics: now when we merged AlterTopic and
 ReassingPartitons in which order AlterTopic fields should be
 resolved?
 A: This item is not clear. There was a proposal to let user change only
 one thing at a time, e.g. specify either new Replicas, or
 ReplicaAssignment.
 This can be a simple solution, but it's a very strict rule. E.g.
 currently with
 TopicCommand user can increase nr of partitions and define replica
 assignment
 for newly added partitions. Taking into account item 2. this will
 be even harder
 for user to achieve this.

 4. Q: Do we need such accurate errors returned from the server:
 InvalidArgumentPartitions,
  InvalidArgumentReplicas etc.
 A: I started implementation to add proposed error codes and now I think
 probably
 InvalidArgumentError should be sufficient. We can do simple
 validations on
 the client side (e.g. AdminClient can ensure nr of partitions
 argument is positive),
 others - which can be covered only on server (probably invalid
 topic config,
 replica assignment includes dead broker etc) - will be done on
 server, and in case
 of invalid argument we will return InvalidArgumentError without
 specifying the
 concrete field.

 It'd be great if we could cover these remaining issues, looks like they are
 minor,
 at least related to specific messages, not the overall protocol. - I think
 with that I can
 update confluence page and update patch to reflect all discussed items.
 This patch
 will probably include Wire protocol messages and server-side code to handle
 new
 requests. AdminClient and cli-tool implementation can be the next step.

 Thanks,
 Andrii Biletskyi

 On Wed, Apr 15, 2015 at 7:26 PM, Jun Rao j...@confluent.io wrote:

  Andrii,
 
  500. I think what you suggested also sounds reasonable. Since ISR is only
  maintained accurately at the leader, TMR can return ISR if the broker is
  the leader of a partition. Otherwise, we can return an empty ISR. For
  partition reassignment, it would be useful to know the lag of each
  follower. Again, the leader knows this info. We can probably include that
  info in TMR as well.
 
  300. I think it's probably reasonable to restrict AlterTopicRequest to
  change only one thing at a time, i.e., either partitions, replicas,
 replica
  assignment or config.
 
  Thanks,
 
  Jun
 
  On Mon, Apr 13, 2015 at 10:56 AM, Andrii Biletskyi 
  andrii.bilets...@stealth.ly wrote:
 
   Jun,
  
   404. Great, thanks!
  
   500. If I understand correctly KAFKA-1367 says ISR part of TMR may
   be inconsistent. If so, then I believe all admin commands but
  describeTopic
   are not affected. Let me emphasize that it's about AdminClient
  operations,
   not about Wire Protocol requests. What I mean:
   To verify AdminClient.createTopic we will need (consistent) 'topics'
 set
   from TMR (we don't need isr)
   To verify alterTopic - again, probably 'topics' and 'assigned
 replicas' +
   configs
   To verify deleteTopic - only 'topics'
   To verify preferredReplica - 'leader', 'assigned replicas'
   To verify reassignPartitions - 'assigned replicas' ? (I'm not sure
 about
   this one)
   If everything above is correct, then AdminClient.describeTopic is the
  only
   command under risk. We can actually workaround it - find out

[jira] [Commented] (KAFKA-2079) Support exhibitor

2015-04-01 Thread Joe Stein (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391680#comment-14391680
 ] 

Joe Stein commented on KAFKA-2079:
--

I think if we do this (which we should) the work should also go into separating 
out the metadata storage and consensus service (async watchers and leader 
election). Once that is pluggable you can use Exhibitor, Akka, Consul, 
Zookeeper, etc., whatever folks like to use. From there I think the project can 
either adopt support for the Exhibitor version or stick to the zkclient for out 
of the box support. Zkclient has worked and we should just abstract around it 
to reduce the most risk in code changes. Once it's pluggable then Exhibitor can 
get used too by folks that want to build and support it. 
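A very rough sketch of such a pluggable seam, with every name hypothetical: zkclient, Curator/Exhibitor, Consul, etc. would each be one implementation, and an in-memory version doubles as a test stub.

```java
// Sketch: one abstraction for metadata storage + watchers, so the backing
// consensus service can be swapped without touching the broker code.
import java.util.HashMap;
import java.util.Map;

interface MetadataStore {
    void write(String path, String value);
    String read(String path);                   // null when the path is absent
    void watch(String path, Runnable onChange); // async watcher hook
}

class InMemoryStore implements MetadataStore {
    private final Map<String, String> data = new HashMap<>();
    private final Map<String, Runnable> watchers = new HashMap<>();

    public void write(String path, String value) {
        data.put(path, value);
        Runnable w = watchers.get(path);
        if (w != null) w.run();                 // fire the watcher on change
    }

    public String read(String path) {
        return data.get(path);
    }

    public void watch(String path, Runnable onChange) {
        watchers.put(path, onChange);
    }
}

public class StoreSketch {
    public static void main(String[] args) {
        MetadataStore store = new InMemoryStore();
        store.watch("/brokers/ids/1", () -> System.out.println("watch fired"));
        store.write("/brokers/ids/1", "host1:9092");
        System.out.println(store.read("/brokers/ids/1"));
    }
}
```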

 Support exhibitor
 -

 Key: KAFKA-2079
 URL: https://issues.apache.org/jira/browse/KAFKA-2079
 Project: Kafka
  Issue Type: Improvement
Reporter: Aaron Dixon

 Exhibitor (https://github.com/Netflix/exhibitor) is a discovery/monitoring 
 solution for managing Zookeeper clusters. It supports use cases like 
 discovery, node replacements and auto-scaling of Zk cluster hosts (so you 
 don't have to manage a fixed set of Zk hosts--especially useful in cloud 
 environments.)
 The easiest way for Kafka to support connection to Zk clusters via exhibitor 
 is to use curator as its client. There is already a separate ticket for this: 
 KAFKA-873



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSSION] Keep docs updated per jira

2015-03-26 Thread Joe Stein
Could we move to git from svn so we can have the docs in the same patch as the
code? That would be easier to review, commit, and work on for contributors too.
To start we could do something easy like create a new directory /site, cp -r
from svn, and then when we release, cp -r /site/* svn/site and commit for the
release push to the website... or something.

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Thu, Mar 26, 2015 at 9:27 PM, Jun Rao j...@confluent.io wrote:

 Hi, Everyone,

 Quite a few jiras these days require documentation changes (e.g., wire
 protocol, ZK layout, configs, jmx, etc). Historically, we have been
 updating the documentation just before we do a release. The issue is that
 some of the changes will be missed since they were done a while back.
 Another way to do that is to keep the docs updated as we complete each
 jira. Currently, our documentations are in the following places.

 wire protocol:

 https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol
 ZK layout:

 https://cwiki.apache.org/confluence/display/KAFKA/Kafka+data+structures+in+Zookeeper
 configs/jmx: https://svn.apache.org/repos/asf/kafka/site/083

 We probably don't need to update configs already ported to ConfigDef since
 they can be generated automatically. However, for the rest of the doc
 related changes, keeping they updated per jira seems a better approach.
 What do people think?

 Thanks,

 Jun



[jira] [Updated] (KAFKA-1856) Add PreCommit Patch Testing

2015-03-25 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1856:
-
   Resolution: Fixed
Fix Version/s: 0.8.3
   Status: Resolved  (was: Patch Available)

This still requires the jenkins build to get updated with the job I created 
https://builds.apache.org/job/KafkaPreCommit/rename?newName=PreCommit-Kafka and 
I will ping INFRA about getting that connected.

Thanks!

 Add PreCommit Patch Testing
 ---

 Key: KAFKA-1856
 URL: https://issues.apache.org/jira/browse/KAFKA-1856
 Project: Kafka
  Issue Type: Task
Reporter: Ashish K Singh
Assignee: Ashish K Singh
 Fix For: 0.8.3

 Attachments: KAFKA-1845.result.txt, KAFKA-1856.patch, 
 KAFKA-1856_2015-01-18_21:43:56.patch, KAFKA-1856_2015-02-04_14:57:05.patch, 
 KAFKA-1856_2015-02-04_15:44:47.patch


 h1. Kafka PreCommit Patch Testing - *Don't wait for it to break*
 h2. Motivation
 *With great power comes great responsibility* - Uncle Ben. As the Kafka user 
 list grows, a mechanism to ensure quality of the product is required. Quality 
 becomes hard to measure and maintain in an open source project, because of a 
 wide community of contributors. Luckily, Kafka is not the first open source 
 project and can benefit from learnings of prior projects.
 PreCommit tests are the tests that are run for each patch that gets attached 
 to an open JIRA. Based on tests results, test execution framework, test bot, 
 +1 or -1 the patch. Having PreCommit tests take the load off committers to 
 look at or test each patch.
 h2. Tests in Kafka
 h3. Unit and Integration Tests
 [Unit and Integration 
 tests|https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Unit+and+Integration+Tests]
  are cardinal to help contributors to avoid breaking existing functionalities 
 while adding new functionalities or fixing older ones. These tests, at least 
 the ones relevant to the changes, must be run by contributors before 
 attaching a patch to a JIRA.
 h3. System Tests
 [System 
 tests|https://cwiki.apache.org/confluence/display/KAFKA/Kafka+System+Tests] 
 are much wider tests that, unlike unit tests, focus on end-to-end scenarios 
 and not some specific method or class.
 h2. Apache PreCommit tests
 Apache provides a mechanism to automatically build a project and run a series 
 of tests whenever a patch is uploaded to a JIRA. Based on test execution, the 
 test framework will comment with a +1 or -1 on the JIRA.
 You can read more about the framework here:
 http://wiki.apache.org/general/PreCommitBuilds
 h2. Plan
 # Create a test-patch.py script (similar to the one used in Flume, Sqoop and 
 other projects) that will take a jira as a parameter, apply on the 
 appropriate branch, build the project, run tests and report results. This 
 script should be committed into the Kafka code-base. To begin with, this will 
 only run unit tests. We can add code sanity checks, system_tests, etc in the 
 future.
 # Create a jenkins job for running the test (as described in 
 http://wiki.apache.org/general/PreCommitBuilds) and validate that it works 
 manually. This must be done by a committer with Jenkins access.
 # Ask someone with access to https://builds.apache.org/job/PreCommit-Admin/ 
 to add Kafka to the list of projects PreCommit-Admin triggers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1856) Add PreCommit Patch Testing

2015-03-24 Thread Joe Stein (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14379331#comment-14379331
 ] 

Joe Stein commented on KAFKA-1856:
--

Testing file 
[KAFKA-1856_2015-02-04_15%3A44%3A47.patch|https://issues.apache.org/jira/secure/attachment/12696611/KAFKA-1856_2015-02-04_15%3A44%3A47.patch]
 against branch trunk took 0:25:20.725178.

{color:red}Overall:{color} -1 due to 2 errors

{color:red}ERROR:{color} Some unit tests failed (report)
{color:red}ERROR:{color} Failed unit test: 
{{unit.kafka.consumer.PartitionAssignorTest > testRangePartitionAssignor FAILED}}
{color:green}SUCCESS:{color} Gradle bootstrap was successful
{color:green}SUCCESS:{color} Clean was successful
{color:green}SUCCESS:{color} Patch applied correctly
{color:green}SUCCESS:{color} Patch add/modify test case
{color:green}SUCCESS:{color} Gradle bootstrap was successful
{color:green}SUCCESS:{color} Patch compiled
{color:green}SUCCESS:{color} Checked style for Main
{color:green}SUCCESS:{color} Checked style for Test

This message is automatically generated.

 Add PreCommit Patch Testing
 ---

 Key: KAFKA-1856
 URL: https://issues.apache.org/jira/browse/KAFKA-1856
 Project: Kafka
  Issue Type: Task
Reporter: Ashish K Singh
Assignee: Ashish K Singh
 Attachments: KAFKA-1845.result.txt, KAFKA-1856.patch, 
 KAFKA-1856_2015-01-18_21:43:56.patch, KAFKA-1856_2015-02-04_14:57:05.patch, 
 KAFKA-1856_2015-02-04_15:44:47.patch


 h1. Kafka PreCommit Patch Testing - *Don't wait for it to break*
 h2. Motivation
 *With great power comes great responsibility* - Uncle Ben. As the Kafka user 
 list grows, a mechanism to ensure quality of the product is required. Quality 
 becomes hard to measure and maintain in an open source project, because of a 
 wide community of contributors. Luckily, Kafka is not the first open source 
 project and can benefit from learnings of prior projects.
 PreCommit tests are the tests that are run for each patch that gets attached 
 to an open JIRA. Based on tests results, test execution framework, test bot, 
 +1 or -1 the patch. Having PreCommit tests take the load off committers to 
 look at or test each patch.
 h2. Tests in Kafka
 h3. Unit and Integration Tests
 [Unit and Integration 
 tests|https://cwiki.apache.org/confluence/display/KAFKA/Kafka+0.9+Unit+and+Integration+Tests]
  are cardinal to help contributors to avoid breaking existing functionalities 
 while adding new functionalities or fixing older ones. These tests, at least 
 the ones relevant to the changes, must be run by contributors before 
 attaching a patch to a JIRA.
 h3. System Tests
 [System 
 tests|https://cwiki.apache.org/confluence/display/KAFKA/Kafka+System+Tests] 
 are much wider tests that, unlike unit tests, focus on end-to-end scenarios 
 and not some specific method or class.
 h2. Apache PreCommit tests
 Apache provides a mechanism to automatically build a project and run a series 
 of tests whenever a patch is uploaded to a JIRA. Based on test execution, the 
 test framework will comment with a +1 or -1 on the JIRA.
 You can read more about the framework here:
 http://wiki.apache.org/general/PreCommitBuilds
 h2. Plan
 # Create a test-patch.py script (similar to the one used in Flume, Sqoop and 
 other projects) that will take a jira as a parameter, apply on the 
 appropriate branch, build the project, run tests and report results. This 
 script should be committed into the Kafka code-base. To begin with, this will 
 only run unit tests. We can add code sanity checks, system_tests, etc in the 
 future.
 # Create a jenkins job for running the test (as described in 
 http://wiki.apache.org/general/PreCommitBuilds) and validate that it works 
 manually. This must be done by a committer with Jenkins access.
 # Ask someone with access to https://builds.apache.org/job/PreCommit-Admin/ 
 to add Kafka to the list of projects PreCommit-Admin triggers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1912) Create a simple request re-routing facility

2015-03-17 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1912:
-
Fix Version/s: 0.8.3

 Create a simple request re-routing facility
 ---

 Key: KAFKA-1912
 URL: https://issues.apache.org/jira/browse/KAFKA-1912
 Project: Kafka
  Issue Type: Improvement
Reporter: Jay Kreps
 Fix For: 0.8.3


 We are accumulating a lot of requests that have to be directed to the correct 
 server. This makes sense for high volume produce or fetch requests. But it is 
 silly to put the extra burden on the client for the many miscellaneous 
 requests such as fetching or committing offsets and so on.
 This adds a ton of practical complexity to the clients with little or no 
 payoff in performance.
 We should add a generic request-type agnostic re-routing facility on the 
 server. This would allow any server to accept a request and forward it to the 
 correct destination, proxying the response back to the user. Naturally it 
 needs to do this without blocking the thread.
 The result is that a client implementation can choose to be optimally 
 efficient and manage a local cache of cluster state and attempt to always 
 direct its requests to the proper server OR it can choose simplicity and just 
 send things all to a single host and let that host figure out where to 
 forward it.
 I actually think we should implement this more or less across the board, but 
 some requests such as produce and fetch require more logic to proxy since 
 they have to be scattered out to multiple servers and gathered back to create 
 the response. So these could be done in a second phase.
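The forwarding idea can be sketched as below. Every type and name here is a hypothetical stand-in for the real request plumbing; the non-blocking property comes from returning a future instead of waiting on the remote broker:

```java
// Sketch: any broker accepts a request, looks up the correct destination,
// forwards it asynchronously, and proxies the response back.
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

class Router {
    private final Function<String, String> leaderFor;                 // topic -> broker
    private final Function<String, CompletableFuture<String>> sendTo; // async send

    Router(Function<String, String> leaderFor,
           Function<String, CompletableFuture<String>> sendTo) {
        this.leaderFor = leaderFor;
        this.sendTo = sendTo;
    }

    // Returns immediately; the future completes when the remote broker
    // answers, so no request-handler thread blocks on the forwarded call.
    CompletableFuture<String> route(String topic, String request) {
        String broker = leaderFor.apply(topic);
        return sendTo.apply(broker + ":" + request);
    }
}

public class RouterSketch {
    public static void main(String[] args) {
        Router r = new Router(
                topic -> "broker-2",
                msg -> CompletableFuture.completedFuture("ack(" + msg + ")"));
        System.out.println(r.route("offsets", "commit").join());
    }
}
```

Scatter-gather requests like produce and fetch would need a variant that fans out to several brokers and merges the futures, which is why the ticket defers them to a second phase.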



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1927) Replace requests in kafka.api with requests in org.apache.kafka.common.requests

2015-03-17 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1927:
-
Fix Version/s: 0.8.3

 Replace requests in kafka.api with requests in 
 org.apache.kafka.common.requests
 ---

 Key: KAFKA-1927
 URL: https://issues.apache.org/jira/browse/KAFKA-1927
 Project: Kafka
  Issue Type: Improvement
Reporter: Jay Kreps
Assignee: Gwen Shapira
 Fix For: 0.8.3


 The common package introduced a better way of defining requests using a new 
 protocol definition DSL and also includes wrapper objects for these.
 We should switch KafkaApis over to use these request definitions and consider 
 the scala classes deprecated (we probably need to retain some of them for a 
 while for the scala clients).
 This will be a big improvement because
 1. We will have each request now defined in only one place (Protocol.java)
 2. We will have built-in support for multi-version requests
 3. We will have much better error messages (no more cryptic underflow errors)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-1927) Replace requests in kafka.api with requests in org.apache.kafka.common.requests

2015-03-17 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein reassigned KAFKA-1927:


Assignee: Gwen Shapira

 Replace requests in kafka.api with requests in 
 org.apache.kafka.common.requests
 ---

 Key: KAFKA-1927
 URL: https://issues.apache.org/jira/browse/KAFKA-1927
 Project: Kafka
  Issue Type: Improvement
Reporter: Jay Kreps
Assignee: Gwen Shapira
 Fix For: 0.8.3


 The common package introduced a better way of defining requests using a new 
 protocol definition DSL and also includes wrapper objects for these.
 We should switch KafkaApis over to use these request definitions and consider 
 the scala classes deprecated (we probably need to retain some of them for a 
 while for the scala clients).
 This will be a big improvement because
 1. We will have each request now defined in only one place (Protocol.java)
 2. We will have built-in support for multi-version requests
 3. We will have much better error messages (no more cryptic underflow errors)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2028) Unable to start the ZK instance after myid file was missing and had to recreate it.

2015-03-17 Thread Joe Stein (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366349#comment-14366349
 ] 

Joe Stein commented on KAFKA-2028:
--

the issue is likely related to using the /tmp directory 

dataDir=/tmp/zookeeper

you should not be using the /tmp directory for zookeeper nor kafka data (check 
your server.properties log.dirs)

what could have happened is a reboot, in which case the OS deletes everything 
in /tmp
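A concrete fix for the advice above is to point both data directories at persistent locations, for example (the exact paths are illustrative and should be adjusted to the host):

```properties
# config/zookeeper.properties
dataDir=/var/lib/zookeeper

# config/server.properties
log.dirs=/var/lib/kafka-logs
```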

 Unable to start the ZK instance after myid file was missing and had to 
 recreate it.
 ---

 Key: KAFKA-2028
 URL: https://issues.apache.org/jira/browse/KAFKA-2028
 Project: Kafka
  Issue Type: Bug
  Components: admin
Affects Versions: 0.8.1.1
 Environment: Non Prod
Reporter: InduR

 Created a Dev 3 node cluster environment in Jan and the environment has been 
 up and running without any issues until few days.
  Kafka server stopped running but ZK listener was up .Noticed that the Myid 
 file was missing in all 3 servers.
 Recreated the file when ZK was still running did not help.
 Stopped all of the ZK /kafka server instances and see the following error 
 when starting ZK.
 kafka_2.10-0.8.1.1
 OS : RHEL
 [root@lablx0025 bin]# ./zookeeper-server-start.sh 
 ../config/zookeeper.properties 
 [1] 31053
 [* bin]# [2015-03-17 15:04:33,876] INFO Reading configuration from: 
 ../config/zookeeper.properties 
 (org.apache.zookeeper.server.quorum.QuorumPeerConfig)
 [2015-03-17 15:04:33,885] INFO Defaulting to majority quorums 
 (org.apache.zookeeper.server.quorum.QuorumPeerConfig)
 [2015-03-17 15:04:33,911] DEBUG preRegister called. 
 Server=com.sun.jmx.mbeanserver.JmxMBeanServer@4891d863, 
 name=log4j:logger=kafka (kafka)
 [2015-03-17 15:04:33,915] INFO Starting quorum peer 
 (org.apache.zookeeper.server.quorum.QuorumPeerMain)
 [2015-03-17 15:04:33,940] INFO binding to port 0.0.0.0/0.0.0.0:2181 
 (org.apache.zookeeper.server.NIOServerCnxn)
 [2015-03-17 15:04:33,966] INFO tickTime set to 3000 
 (org.apache.zookeeper.server.quorum.QuorumPeer)
 [2015-03-17 15:04:33,966] INFO minSessionTimeout set to -1 
 (org.apache.zookeeper.server.quorum.QuorumPeer)
 [2015-03-17 15:04:33,966] INFO maxSessionTimeout set to -1 
 (org.apache.zookeeper.server.quorum.QuorumPeer)
 [2015-03-17 15:04:33,966] INFO initLimit set to 5 
 (org.apache.zookeeper.server.quorum.QuorumPeer)
 [2015-03-17 15:04:34,023] ERROR Failed to increment parent cversion for: 
 /consumers/console-consumer-6249/offsets/test 
 (org.apache.zookeeper.server.persistence.FileTxnSnapLog)
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
 NoNode for /consumers/console-consumer-6249/offsets/test
 at 
 org.apache.zookeeper.server.DataTree.incrementCversion(DataTree.java:1218)
 at 
 org.apache.zookeeper.server.persistence.FileTxnSnapLog.processTransaction(FileTxnSnapLog.java:222)
 at 
 org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:150)
 at 
 org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:222)
 at 
 org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:398)
 at 
 org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:143)
 at 
 org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:103)
 at 
 org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:76)
 [2015-03-17 15:04:34,027] FATAL Unable to load database on disk 
 (org.apache.zookeeper.server.quorum.QuorumPeer)
 java.io.IOException: Failed to process transaction type: 2 error: 
 KeeperErrorCode = NoNode for /consumers/console-consumer-6249/offsets/test
 at 
 org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:152)
 at 
 org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:222)
 at 
 org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:398)
 at 
 org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:143)
 at 
 org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:103)
 at 
 org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:76)
 [2015-03-17 15:04:34,027] FATAL Unexpected exception, exiting abnormally

[jira] [Updated] (KAFKA-2023) git clone kafka repository requires https

2015-03-16 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-2023:
-
Reviewer: Gwen Shapira

 git clone kafka repository requires https
 -

 Key: KAFKA-2023
 URL: https://issues.apache.org/jira/browse/KAFKA-2023
 Project: Kafka
  Issue Type: Bug
  Components: website
Reporter: Anatoli Fomenko
Assignee: Anatoly Fayngelerin
Priority: Minor
 Attachments: KAFKA-2023.patch


 From http://kafka.apache.org/code.html: 
 Our code is kept in git. You can check it out like this:
   git clone http://git-wip-us.apache.org/repos/asf/kafka.git kafka
 On CentOS 6.5:
 {code}
 $ git clone http://git-wip-us.apache.org/repos/asf/kafka.git kafka
 Initialized empty Git repository in /home/anatoli/git/kafka/.git/
 error: RPC failed; result=22, HTTP code = 405
 {code}
 while:
 {code}
 $ git clone https://git-wip-us.apache.org/repos/asf/kafka.git kafka
 Initialized empty Git repository in /home/anatoli/git/kafka/.git/
 remote: Counting objects: 24607, done.
 remote: Compressing objects: 100% (9212/9212), done.
 remote: Total 24607 (delta 14449), reused 19801 (delta 11465)
 Receiving objects: 100% (24607/24607), 15.61 MiB | 5.85 MiB/s, done.
 Resolving deltas: 100% (14449/14449), done.
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-2023) git clone kafka repository requires https

2015-03-16 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein reassigned KAFKA-2023:


Assignee: Anatoly Fayngelerin

 git clone kafka repository requires https
 -

 Key: KAFKA-2023
 URL: https://issues.apache.org/jira/browse/KAFKA-2023
 Project: Kafka
  Issue Type: Bug
  Components: website
Reporter: Anatoli Fomenko
Assignee: Anatoly Fayngelerin
Priority: Minor
 Attachments: KAFKA-2023.patch


 From http://kafka.apache.org/code.html: 
 Our code is kept in git. You can check it out like this:
   git clone http://git-wip-us.apache.org/repos/asf/kafka.git kafka
 On CentOS 6.5:
 {code}
 $ git clone http://git-wip-us.apache.org/repos/asf/kafka.git kafka
 Initialized empty Git repository in /home/anatoli/git/kafka/.git/
 error: RPC failed; result=22, HTTP code = 405
 {code}
 while:
 {code}
 $ git clone https://git-wip-us.apache.org/repos/asf/kafka.git kafka
 Initialized empty Git repository in /home/anatoli/git/kafka/.git/
 remote: Counting objects: 24607, done.
 remote: Compressing objects: 100% (9212/9212), done.
 remote: Total 24607 (delta 14449), reused 19801 (delta 11465)
 Receiving objects: 100% (24607/24607), 15.61 MiB | 5.85 MiB/s, done.
 Resolving deltas: 100% (14449/14449), done.
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2023) git clone kafka repository requires https

2015-03-16 Thread Joe Stein (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364263#comment-14364263
 ] 

Joe Stein commented on KAFKA-2023:
--

looks like maybe the issue is the version of git; I tried a few other ASF repos, 
same issue with git 1.7.1, which is what comes with yum install git

 git clone kafka repository requires https
 -

 Key: KAFKA-2023
 URL: https://issues.apache.org/jira/browse/KAFKA-2023
 Project: Kafka
  Issue Type: Bug
  Components: website
Reporter: Anatoli Fomenko
Priority: Minor
 Attachments: KAFKA-2023.patch


 From http://kafka.apache.org/code.html: 
 Our code is kept in git. You can check it out like this:
   git clone http://git-wip-us.apache.org/repos/asf/kafka.git kafka
 On CentOS 6.5:
 {code}
 $ git clone http://git-wip-us.apache.org/repos/asf/kafka.git kafka
 Initialized empty Git repository in /home/anatoli/git/kafka/.git/
 error: RPC failed; result=22, HTTP code = 405
 {code}
 while:
 {code}
 $ git clone https://git-wip-us.apache.org/repos/asf/kafka.git kafka
 Initialized empty Git repository in /home/anatoli/git/kafka/.git/
 remote: Counting objects: 24607, done.
 remote: Compressing objects: 100% (9212/9212), done.
 remote: Total 24607 (delta 14449), reused 19801 (delta 11465)
 Receiving objects: 100% (24607/24607), 15.61 MiB | 5.85 MiB/s, done.
 Resolving deltas: 100% (14449/14449), done.
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2023) git clone kafka repository requires https

2015-03-16 Thread Joe Stein (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364184#comment-14364184
 ] 

Joe Stein commented on KAFKA-2023:
--

works ok for me on ubuntu and redhat on two different networks

{code}

$ git clone http://git-wip-us.apache.org/repos/asf/kafka.git kafka
Cloning into 'kafka'...
remote: Counting objects: 24607, done.
remote: Compressing objects: 100% (9212/9212), done.
remote: Total 24607 (delta 14447), reused 19803 (delta 11465)
Receiving objects: 100% (24607/24607), 15.62 MiB | 3.46 MiB/s, done.
Resolving deltas: 100% (14447/14447), done.
Checking connectivity... done.
{code}

 git clone kafka repository requires https
 -

 Key: KAFKA-2023
 URL: https://issues.apache.org/jira/browse/KAFKA-2023
 Project: Kafka
  Issue Type: Bug
  Components: website
Reporter: Anatoli Fomenko
Priority: Minor

 From http://kafka.apache.org/code.html: 
 Our code is kept in git. You can check it out like this:
   git clone http://git-wip-us.apache.org/repos/asf/kafka.git kafka
 On CentOS 6.5:
 {code}
 $ git clone http://git-wip-us.apache.org/repos/asf/kafka.git kafka
 Initialized empty Git repository in /home/anatoli/git/kafka/.git/
 error: RPC failed; result=22, HTTP code = 405
 {code}
 while:
 {code}
 $ git clone https://git-wip-us.apache.org/repos/asf/kafka.git kafka
 Initialized empty Git repository in /home/anatoli/git/kafka/.git/
 remote: Counting objects: 24607, done.
 remote: Compressing objects: 100% (9212/9212), done.
 remote: Total 24607 (delta 14449), reused 19801 (delta 11465)
 Receiving objects: 100% (24607/24607), 15.61 MiB | 5.85 MiB/s, done.
 Resolving deltas: 100% (14449/14449), done.
 {code}





Re: [VOTE] KIP-16: Replica Lag Tuning

2015-03-15 Thread Joe Stein
+1

one more minor nit, please update the KIP with the link to the discuss
thread too.

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Sun, Mar 15, 2015 at 5:27 PM, Neha Narkhede n...@confluent.io wrote:

 +1 on the KIP. Minor nit: it is deemed to not be in ISR because it is not
 caught up = it is deemed to not be in the ISR because it has fallen
 behind for more than a certain amount of time as controlled by this config

 Also took a look at the patch. Looks correct, left review comments. Thanks
 for sharing the test results. This change is going to be great for users!

 On Sat, Mar 14, 2015 at 9:01 AM, Jay Kreps jay.kr...@gmail.com wrote:

  +1
 
  -Jay
 
  On Fri, Mar 13, 2015 at 9:54 AM, Aditya Auradkar 
  aaurad...@linkedin.com.invalid wrote:
 
   Details in the KIP, Jira and RB.
  
  
  
 
 https://cwiki.apache.org/confluence/display/KAFKA/KIP+16+:+Automated+Replica+Lag+Tuning
   https://issues.apache.org/jira/browse/KAFKA-1546
   https://reviews.apache.org/r/31967/
  
   Aditya
  
  
 



 --
 Thanks,
 Neha



KIP hangout next Tuesday?

2015-03-13 Thread Joe Stein
I wanted to propose another KIP hangout next Tuesday @ 2pm ET / 11am PT

I want to try to focus some of the conversations if we can on things
needing to get discussed, so let's formulate an agenda so items folks
want/need to have discussed can get in and folks have time to review beforehand.

Right now all i have for the agenda is KIP-4 - new message formats and
other open discussion items

I will send out invites. If I miss you or you want in, please let me know
and I can update, np.

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -


[jira] [Updated] (KAFKA-2006) switch the broker server over to the new java protocol definitions

2015-03-12 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-2006:
-
Description: 
This was brought up during the the review KAFKA-1694

The latest patch for KAFKA-1694 now uses the new java protocol for the new 
message so the work here will not be bloated for new messages just the ones 
that are already there.

 switch the broker server over to the new java protocol definitions
 --

 Key: KAFKA-2006
 URL: https://issues.apache.org/jira/browse/KAFKA-2006
 Project: Kafka
  Issue Type: Bug
Reporter: Joe Stein
 Fix For: 0.8.3


 This was brought up during the the review KAFKA-1694
 The latest patch for KAFKA-1694 now uses the new java protocol for the new 
 message so the work here will not be bloated for new messages just the ones 
 that are already there.





Re: [DISCUSS] KIP-4 - Command line and centralized administrative operations

2015-03-12 Thread Joe Stein
Guozhang and Tong, I really do like this idea and where your discussion
will lead as it will be very useful for folks to have. I am really
concerned though that we are scope creeping this KIP.

Andrii is already working on following up on ~ 14 different items of
feedback in regards to the core motivations/scope of the KIP. He has
uploaded a new patch already and the KIP based on those items and will be
responding to this thread about that and for what else still requires
discussion hopefully in the next few hours.

I want to make sure we are focusing on the open items still requiring
discussion and stabilizing what we have before trying to introduce more
new features.

Perhaps a new KIP can get added for the new features you are talking about
which can reference this and once this is committed that work can begin for
folks that are able to contribute to work on it?

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Thu, Mar 12, 2015 at 9:51 AM, Tong Li liton...@us.ibm.com wrote:

 Guozhang,
  augmenting topic is fine, but as soon as we start doing that, other
 issues follow, for example access control: who can access the topic, who
 can grant permissions, and how the information (metadata) itself gets secured.
 Should the information be saved in ZK or a datastore? Will using a metadata
 file cause long-term problems such as file updates/synchronization? Once
 we have this metadata file, more people will want to put more stuff in it;
 how can we control the format? K-V pairs are not good for large data sets.
 Clearly there is a need for it. I wonder if we can make this thing
 pluggable and provide a default implementation, which allows us to try
 different solutions and also allows people to completely ignore it if they
 do not want to deal with any of these.

 Thanks.

 Tong Li
 OpenStack & Kafka Community Development
 Building 501/B205
 liton...@us.ibm.com


 From: Guozhang Wang wangg...@gmail.com
 To: dev@kafka.apache.org dev@kafka.apache.org
 Date: 03/12/2015 09:39 AM
 Subject: Re: [DISCUSS] KIP-4 - Command line and centralized
 administrative operations
 --



 Folks,

 Just want to elaborate a bit more on the create-topic metadata and batching
 describe-topic based on config / metadata in my previous email as we work
 on KAFKA-1694. The main motivation is to have some sort of topic management
 mechanisms, which I think is quite important in a multi-tenant / cloud
 architecture: today anyone can create topics in a shared Kafka cluster, but
 there is no concept of ownership of topics that are created by different
 users. For example, at LinkedIn we basically distinguish topic owners via
 some casual topic name prefix, which is a bit awkward and does not fly as
 we scale our customers. It would be great to use describe-topics such as:

 Describe all topics that are created by me.

 Describe all topics whose retention time is overridden to X.

 Describe all topics whose writable group includes user Y (this is related to
 authorization), etc.

 One possible way to achieve this is to add a metadata file in the
 create-topic request, whose value will also be written ZK as we create the
 topic; then describe-topics can choose to batch topics based on 1) name
 regex, 2) config K-V matching, 3) metadata regex, etc.

 Thoughts?

 Guozhang
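The three batching criteria proposed above (name regex, config K-V matching, metadata regex) can be sketched as a simple client-side filter. This is purely illustrative: the data model here (a dict of topic name to config/metadata) is an assumption for the example, not Kafka's actual request format.

```python
import re

def describe_topics(topics, name_regex=None, config_match=None, metadata_regex=None):
    """Illustrative filter for the proposed describe-topics batching criteria.

    `topics` is assumed to be a dict: name -> {"config": {...}, "metadata": str}.
    """
    selected = []
    for name, info in topics.items():
        if name_regex and not re.search(name_regex, name):
            continue  # criterion 1: topic name regex
        if config_match and any(
                info.get("config", {}).get(k) != v for k, v in config_match.items()):
            continue  # criterion 2: config K-V matching
        if metadata_regex and not re.search(metadata_regex, info.get("metadata", "")):
            continue  # criterion 3: metadata regex
        selected.append(name)
    return sorted(selected)
```

For example, "describe all topics whose retention is overridden to X" becomes `describe_topics(topics, config_match={"retention.ms": "X"})`; doing this server-side is exactly what avoids pulling all topic descriptions from ZK first.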

 On Thu, Mar 5, 2015 at 4:37 PM, Guozhang Wang wangg...@gmail.com wrote:

  Thanks for the updated wiki. A few comments below:
 
  1. Error description in response: I think if some errorCode could
 indicate
  several different error cases then we should really change it to multiple
  codes. In general the errorCode itself would be precise and sufficient
 for
  describing the server side errors.
 
  2. Describe topic request: it would be great to go beyond just batching
 on
  topic name regex for this request. For example, a very common use case of
  the topic command is to list all topics whose config A's value is B. With
  topic name regex then we have to first retrieve __all__ topics's
  description info and then filter at the client end, which will be a huge
  burden on ZK.
 
  3. Config K-Vs in create topic: this is related to the previous point;
  maybe we can add another metadata K-V or just a metadata string along
 side
  with config K-V in create topic like we did for offset commit request.
 This
  field can be quite useful in storing information like owner of the
 topic
  who issue the create command, etc, which is quite important for a
  multi-tenant setting. Then in the describe topic request we can also
 batch
  on regex of the metadata field.
 
  4. Today all the admin operations are async in the sense that command

Re: [DISCUSS] KIP-4 - Command line and centralized administrative operations

2015-03-12 Thread Joe Stein
 Since we are for the first time defining a bunch of new
request formats, I feel it is better to think through its possible
common use cases and try to incorporate them

Agreed providing we are only talking about the fields and not the
implementation of the functionality.

I worry (only a little) about incorporating fields that are not used
initially but whole heartily believe doing so will outweigh the
pre-optimization criticism because of the requirement to version the
protocol (as you brought up).  We can then use those fields later without
actually implementing the functionality now.

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Thu, Mar 12, 2015 at 11:08 AM, Guozhang Wang wangg...@gmail.com wrote:

 The reason I want to bring it up sooner than later is that future changing
 a defined request protocol takes quite some effort: we need to bump up the
 version of the request, bump up the ZK path data version, and make sure
 server can handle old versions as well as new ones both from clients and
 from ZK, etc. Since we are for the first time defining a bunch of new
 request formats, I feel it is better to think through its possible
 common use cases and try to incorporate them, but I am also fine with
 creating another KIP if most people feel it drags too long.

 Guozhang

 On Thu, Mar 12, 2015 at 7:34 AM, Joe Stein joe.st...@stealth.ly wrote:

  Guozhang and Tong, I really do like this idea and where your discussion
  will lead as it will be very useful for folks to have. I am really
  concerned though that we are scope creeping this KIP.
 
  Andrii is already working on following up on ~ 14 different items of
  feedback in regards to the core motivations/scope of the KIP. He has
  uploaded a new patch already and the KIP based on those items and will be
  responding to this thread about that and for what else still requires
  discussion hopefully in the next few hours.
 
  I want to make sure we are focusing on the open items still requiring
   discussion and stabilizing what we have before trying to introduce more
   new features.
 
  Perhaps a new KIP can get added for the new features you are talking
 about
  which can reference this and once this is committed that work can begin
 for
  folks that are able to contribute to work on it?
 
  ~ Joe Stein
  - - - - - - - - - - - - - - - - -
 
http://www.stealth.ly
  - - - - - - - - - - - - - - - - -
 
  On Thu, Mar 12, 2015 at 9:51 AM, Tong Li liton...@us.ibm.com wrote:
 
   Guozhang,
augmenting topic is fine, but as soon as we start doing that,
 other
   issues follow, for example, access control, who can access the topic,
 who
   can grant permissions. how the information (metadata) itself gets
  secured.
   Should the information be saved in ZK or a datastore? Will using a
  metadata
   file causing long term problems such as file updates/synchronization,
  once
   we have this metadata file, more people will want to put more stuff in
  it.
   how can we control the format? K-V pair not good for large data set.
   Clearly there is a need for it, I wonder if we can make this thing
   plugable and provide a default implementation which allows us try
  different
   solutions and also allow people to completely ignore it if they do not
  want
   to deal with any of these.
  
   Thanks.
  
   Tong Li
    OpenStack & Kafka Community Development
   Building 501/B205
   liton...@us.ibm.com
  
  
   From: Guozhang Wang wangg...@gmail.com
   To: dev@kafka.apache.org dev@kafka.apache.org
   Date: 03/12/2015 09:39 AM
   Subject: Re: [DISCUSS] KIP-4 - Command line and centralized
   administrative operations
   --
  
  
  
   Folks,
  
   Just want to elaborate a bit more on the create-topic metadata and
  batching
   describe-topic based on config / metadata in my previous email as we
 work
   on KAFKA-1694. The main motivation is to have some sort of topic
  management
   mechanisms, which I think is quite important in a multi-tenant / cloud
   architecture: today anyone can create topics in a shared Kafka cluster,
  but
   there is no concept or ownership of topics that are created by
  different
   users. For example, at LinkedIn we basically distinguish topic owners
 via
   some casual topic name prefix, which is a bit awkward and does not fly
 as
   we scale our customers. It would be great to use describe-topics such
 as:
  
    Describe all topics that are created by me.
   
    Describe all topics whose retention time is overridden to X.
   
    Describe all topics whose writable group includes user Y (this is related
    to authorization), etc.
  
   One possible way

[jira] [Commented] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-12 Thread Joe Stein (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359359#comment-14359359
 ] 

Joe Stein commented on KAFKA-1461:
--

Here is my reasoning. Say you are an operations person. And, in the next 
release we tell folks about the KIP to learn and understand changes that affect 
them (yada yada language for the release). And something like this isn't in 
there. We are changing the behavior of an existing config and removing another. 
It makes the communication of behavior incongruent for the changes of a 
release. So, while I agree we don't need it for this the reason I even brought 
it up was looking at it from the release perspective for what ops folks are 
going to be looking at when we get there.

 Replica fetcher thread does not implement any back-off behavior
 ---

 Key: KAFKA-1461
 URL: https://issues.apache.org/jira/browse/KAFKA-1461
 Project: Kafka
  Issue Type: Improvement
  Components: replication
Affects Versions: 0.8.1.1
Reporter: Sam Meder
Assignee: Sriharsha Chintalapani
  Labels: newbie++
 Fix For: 0.8.3

 Attachments: KAFKA-1461.patch, KAFKA-1461.patch, 
 KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch


 The current replica fetcher thread will retry in a tight loop if any error 
 occurs during the fetch call. For example, we've seen cases where the fetch 
 continuously throws a connection refused exception leading to several replica 
 fetcher threads that spin in a pretty tight loop.
 To a much lesser degree this is also an issue in the consumer fetcher thread, 
 although the fact that erroring partitions are removed so a leader can be 
 re-discovered helps some.
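The back-off the ticket asks for can be sketched as follows: instead of retrying a failed fetch in a tight loop, sleep an exponentially growing (capped) interval between attempts. Names and structure here are illustrative, not Kafka's actual fetcher code.

```python
import time

def fetch_with_backoff(fetch, base_ms=500, max_ms=10_000, max_attempts=10):
    """Call `fetch()` until it succeeds, backing off exponentially on errors.

    Illustrative sketch of back-off behavior; parameter names are hypothetical.
    """
    delay_ms = base_ms
    for attempt in range(max_attempts):
        try:
            return fetch()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            time.sleep(delay_ms / 1000.0)
            delay_ms = min(max_ms, delay_ms * 2)  # exponential growth, capped
```

A repeated "connection refused" then costs a bounded, growing wait per retry rather than a hot spin on the fetcher thread.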





Re: [DISCUSS] KIP 16 - Replica lag tuning

2015-03-12 Thread Joe Stein
Hi Aditya, thanks for the writeup.

Let's say a broker follower goes down, and it is down for an hour or two.

When the broker follower comes back up it will start sending fetch requests
(let's say every 2ms, which would be under a configured, let's say, 100ms
(whatever)). Then right away the broker gets added back to the ISR?

Maybe it is just the wording or how I am reading it... I think/thought that
once the replica is caught up THEN the setting goes into action and as long
as (every 100ms ... whatever) the broker leader is seeing the broker
follower as caught up then it is in the ISR.

Also, what is the definition of caught up now, without the number of
messages? If it is ===, I worry about that not happening in some networks
where it is always off by one or something, maybe?

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Thu, Mar 12, 2015 at 4:36 PM, Aditya Auradkar 
aaurad...@linkedin.com.invalid wrote:

 I wrote a KIP for this after some discussion on KAFKA-1546.

 https://cwiki.apache.org/confluence/display/KAFKA/KIP+16+:+Automated+Replica+Lag+Tuning

 The RB is here: https://reviews.apache.org/r/31967/

 Thanks,
 Aditya



[jira] [Comment Edited] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-12 Thread Joe Stein (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359359#comment-14359359
 ] 

Joe Stein edited comment on KAFKA-1461 at 3/12/15 8:42 PM:
---

Here is my reasoning. Say you are an operations person. And, in the next 
release we tell folks about the KIP to learn and understand changes that affect 
them (yada yada language for the release). And something like this isn't in 
there. We are changing the behavior of an existing config and removing another. 
It makes the communication of behavior incongruent for the changes of a 
release. So, while I agree we don't need it technically but for this 
consistency reason is why I even brought it up. I was just looking at it from 
the release perspective for what ops folks are going to be looking at when we 
get there.


was (Author: joestein):
Here is my reasoning. Say you are an operations person. And, in the next 
release we tell folks about the KIP to learn and understand changes that affect 
them (yada yada language for the release). And something like this isn't in 
there. We are changing the behavior of an existing config and removing another. 
It makes the communication of behavior incongruent for the changes of a 
release. So, while I agree we don't need it for this the reason I even brought 
it up was looking at it from the release perspective for what ops folks are 
going to be looking at when we get there.

 Replica fetcher thread does not implement any back-off behavior
 ---

 Key: KAFKA-1461
 URL: https://issues.apache.org/jira/browse/KAFKA-1461
 Project: Kafka
  Issue Type: Improvement
  Components: replication
Affects Versions: 0.8.1.1
Reporter: Sam Meder
Assignee: Sriharsha Chintalapani
  Labels: newbie++
 Fix For: 0.8.3

 Attachments: KAFKA-1461.patch, KAFKA-1461.patch, 
 KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch


 The current replica fetcher thread will retry in a tight loop if any error 
 occurs during the fetch call. For example, we've seen cases where the fetch 
 continuously throws a connection refused exception leading to several replica 
 fetcher threads that spin in a pretty tight loop.
 To a much lesser degree this is also an issue in the consumer fetcher thread, 
 although the fact that erroring partitions are removed so a leader can be 
 re-discovered helps some.





[jira] [Issue Comment Deleted] (KAFKA-1461) Replica fetcher thread does not implement any back-off behavior

2015-03-12 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1461:
-
Comment: was deleted

(was: Here is my reasoning. Say you are an operations person. And, in the next 
release we tell folks about the KIP to learn and understand changes that affect 
them (yada yada language for the release). And something like this isn't in 
there. We are changing the behavior of an existing config and removing another. 
It makes the communication of behavior incongruent for the changes of a 
release. So, while I agree we don't need it technically but for this 
consistency reason is why I even brought it up. I was just looking at it from 
the release perspective for what ops folks are going to be looking at when we 
get there.)

 Replica fetcher thread does not implement any back-off behavior
 ---

 Key: KAFKA-1461
 URL: https://issues.apache.org/jira/browse/KAFKA-1461
 Project: Kafka
  Issue Type: Improvement
  Components: replication
Affects Versions: 0.8.1.1
Reporter: Sam Meder
Assignee: Sriharsha Chintalapani
  Labels: newbie++
 Fix For: 0.8.3

 Attachments: KAFKA-1461.patch, KAFKA-1461.patch, 
 KAFKA-1461_2015-03-11_10:41:26.patch, KAFKA-1461_2015-03-11_18:17:51.patch


 The current replica fetcher thread will retry in a tight loop if any error 
 occurs during the fetch call. For example, we've seen cases where the fetch 
 continuously throws a connection refused exception leading to several replica 
 fetcher threads that spin in a pretty tight loop.
 To a much lesser degree this is also an issue in the consumer fetcher thread, 
 although the fact that erroring partitions are removed so a leader can be 
 re-discovered helps some.





Re: Can I be added as a contributor?

2015-03-11 Thread Joe Stein
Grant, I added your perms for Confluence.

Grayson, I couldn't find a confluence account for you so couldn't give you
perms.

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Tue, Mar 10, 2015 at 8:20 AM, Grant Henke ghe...@cloudera.com wrote:

 Thanks Joe. I added a Confluence account.

 On Tue, Mar 10, 2015 at 12:04 AM, Joe Stein joe.st...@stealth.ly wrote:

  Grant, I added you.
 
  Grayson and Grant, you should both also please setup Confluence accounts
  and we can grant perms to Confluence also too for your username.
 
  ~ Joe Stein
  - - - - - - - - - - - - - - - - -
 
http://www.stealth.ly
  - - - - - - - - - - - - - - - - -
 
  On Tue, Mar 10, 2015 at 12:54 AM, Grant Henke ghe...@cloudera.com
 wrote:
 
   I am also starting to work with the Kafka codebase with plans to
  contribute
   more significantly in the near future. Could I also be added to the
   contributor list so that I can assign myself tickets?
  
   Thank you,
   Grant
  
   On Mon, Mar 9, 2015 at 1:39 PM, Guozhang Wang wangg...@gmail.com
  wrote:
  
Added grayson.c...@gmail.com to the list.
   
On Mon, Mar 9, 2015 at 10:41 AM, Grayson Chao
  gc...@linkedin.com.invalid
   
wrote:
   
 Hello Kafka devs,

 I'm working on the ops side of Kafka at LinkedIn (embedded SRE on
 the
 Kafka team) and would like to start familiarizing myself with the
codebase
 with a view to eventually making substantial contributions. Could
 you
 please add me as a contributor to the Kafka JIRA so that I can
 assign
 myself a newbie ticket?

 Thanks!
 Grayson
 --
 Grayson Chao
 Data Infra Streaming SRE

   
   
   
--
-- Guozhang
   
  
  
  
   --
   Grant Henke
   Solutions Consultant | Cloudera
   ghe...@cloudera.com | 920-980-8979
   twitter.com/ghenke http://twitter.com/gchenke |
   linkedin.com/in/granthenke
  
 



 --
 Grant Henke
 Solutions Consultant | Cloudera
 ghe...@cloudera.com | 920-980-8979
 twitter.com/ghenke http://twitter.com/gchenke |
 linkedin.com/in/granthenke



[jira] [Updated] (KAFKA-2006) switch the broker server over to the new java protocol definitions

2015-03-11 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-2006:
-
Priority: Major  (was: Blocker)

 switch the broker server over to the new java protocol definitions
 --

 Key: KAFKA-2006
 URL: https://issues.apache.org/jira/browse/KAFKA-2006
 Project: Kafka
  Issue Type: Bug
Reporter: Joe Stein
Assignee: Andrii Biletskyi
 Fix For: 0.8.3








[jira] [Updated] (KAFKA-2006) switch the broker server over to the new java protocol definitions

2015-03-11 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-2006:
-
Assignee: (was: Andrii Biletskyi)

 switch the broker server over to the new java protocol definitions
 --

 Key: KAFKA-2006
 URL: https://issues.apache.org/jira/browse/KAFKA-2006
 Project: Kafka
  Issue Type: Bug
Reporter: Joe Stein
 Fix For: 0.8.3








Re: [DISCUSS] KIP-6 - New reassignment partition logic for re-balancing

2015-03-11 Thread Joe Stein
and network intensive process)?

You can automate the reassignment with a line of code that takes the
response and calls --execute if folks want that... I don't think we should
ever link these (or at least not yet) because of the reasons you say. I
think as long as we have a way



If there is anything else I missed please let me know so I can make sure
that the detail gets update so we minimize the back and forth both in
efforts and elapsed time. This was always supposed to be a very small fix
for something that pains A LOT of people and I want to make sure that we
aren't running scope creep on the change but are making sure that folks
understand the motivation behind a new feature.

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Sun, Mar 8, 2015 at 1:21 PM, Joe Stein joe.st...@stealth.ly wrote:

 Jay,

 That makes sense. I think what folks are bringing up all sounds great, but
 I feel it can/should be done afterwards as further improvements, since the
 scope for this change has a very specific focus: resolving problems folks
 have today with --generate (with a patch tested and ready to go). I should
 be able to update the KIP this week and follow up.
 ~ Joestein
 On Mar 8, 2015 12:54 PM, Jay Kreps jay.kr...@gmail.com wrote:

 Hey Joe,

 This still seems pretty incomplete. It still has most the sections just
 containing the default text you are supposed to replace. It is really hard
 to understand what is being proposed and why and how much of the problem
 we
 are addressing. For example the motivation section just says
 operational.

 I'd really like us to do a good job of this. I actually think putting the
 time in to convey context really matters. For example I think (but can't
 really know) that what you are proposing is just a simple fix to the JSON
 output of the command line tool. But you can see that on the thread it is
 quickly going to spiral into automatic balancing, rack awareness, data
 movement throttling, etc.

 Just by giving people a fairly clear description of the change and how it
 fits into other efforts that could happen in the area really helps keep
 things focused on what you want.

 -Jay


 On Wed, Jan 21, 2015 at 10:11 PM, Joe Stein joe.st...@stealth.ly wrote:

  Posted a KIP for --re-balance for partition assignment in reassignment
  tool.
 
 
 
 https://cwiki.apache.org/confluence/display/KAFKA/KIP-6+-+New+reassignment+partition+logic+for+re-balancing
 
  JIRA https://issues.apache.org/jira/browse/KAFKA-1792
 
  While going through the KIP I thought of one thing from the JIRA that we
  should change. We should preserve --generate to be existing
 functionality
  for the next release it is in. If folks want to use --re-balance then
  great, it just won't break any upgrade paths, yet.
 
  /***
   Joe Stein
   Founder, Principal Consultant
   Big Data Open Source Security LLC
   http://www.stealth.ly
   Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
  /
 




0.8.3 release plan

2015-03-11 Thread Joe Stein
There hasn't been any public discussion about the 0.8.3 release plan.

There seems to be a lot of work in flight, work with patches and review
that could/should get committed but is now just pending KIPs, and work
without a KIP that is already in trunk (e.g. the new Consumer) that would be
in the release but is missing a KIP for the release...

What does this mean for the 0.8.3 release? What are we trying to get out
and when?

Also, looking at
https://cwiki.apache.org/confluence/display/KAFKA/Future+release+plan there
seem to be things we are getting in earlier (which is great of course), so are
we going to try to bump the version and go with 0.9.0?

0.8.2.0 ended up getting very bloated, and that delayed it much longer than
we had originally communicated to the community; we want to make sure we
take that feedback and try to improve upon it.

Thanks!

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -


[jira] [Updated] (KAFKA-1546) Automate replica lag tuning

2015-03-11 Thread Joe Stein (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Stein updated KAFKA-1546:
-
Fix Version/s: 0.8.3

 Automate replica lag tuning
 ---

 Key: KAFKA-1546
 URL: https://issues.apache.org/jira/browse/KAFKA-1546
 Project: Kafka
  Issue Type: Improvement
  Components: replication
Affects Versions: 0.8.0, 0.8.1, 0.8.1.1
Reporter: Neha Narkhede
Assignee: Aditya Auradkar
  Labels: newbie++
 Fix For: 0.8.3

 Attachments: KAFKA-1546.patch, KAFKA-1546_2015-03-11_18:48:09.patch


 Currently, there is no good way to tune the replica lag configs to 
 automatically account for high and low volume topics on the same cluster. 
 For the low-volume topic it will take a very long time to detect a lagging
 replica, and for the high-volume topic it will have false-positives.
 One approach to making this easier would be to have the configuration
 be something like replica.lag.max.ms and translate this into a number
 of messages dynamically based on the throughput of the partition.
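The time-based approach described above can be sketched as a small bookkeeping class: a follower stays in sync as long as it has fully caught up to the leader's log end offset within the last `max_lag_ms`. All names here are illustrative, not Kafka's actual implementation.

```python
class ReplicaLagChecker:
    """Sketch of a time-based ISR check (names and API are hypothetical)."""

    def __init__(self, max_lag_ms):
        self.max_lag_ms = max_lag_ms
        self.last_caught_up_ms = {}  # replica id -> last fully-caught-up time

    def record_fetch(self, replica, replica_offset, leader_end_offset, now_ms):
        # A replica counts as caught up only when it has fetched everything
        # the leader currently has.
        if replica_offset >= leader_end_offset:
            self.last_caught_up_ms[replica] = now_ms

    def in_sync(self, replica, now_ms):
        last = self.last_caught_up_ms.get(replica)
        return last is not None and now_ms - last <= self.max_lag_ms
```

This is what makes one config work for both cases: a low-volume topic's follower stays in sync as long as it keeps pace, while a high-volume follower that falls behind for longer than `max_lag_ms` drops out regardless of message counts.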




