Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review

2015-04-30 Thread Jaikiran Pai
I think this will help a lot with contributions. Some of my local changes 
that I want to contribute back have been pending because I sometimes 
switch machines and then have to go through setting up the Ruby/Python 
and other tooling for the current review process. Using just GitHub 
will make it much quicker to submit changes.


-Jaikiran
On Thursday 30 April 2015 06:42 PM, Ismael Juma wrote:

Hi all,

Kafka currently uses a combination of Review Board and JIRA for
contributions and code review. In my opinion, this makes contribution and
code review a bit harder than it has to be.

I think the approach used by Spark would improve the current situation:

"Generally, Spark uses JIRA to track logical issues, including bugs and
improvements, and uses Github pull requests to manage the review and merge
of specific code changes. That is, JIRAs are used to describe what should
be fixed or changed, and high-level approaches, and pull requests describe
how to implement that change in the project's source code. For example,
major design decisions are discussed in JIRA."[1]

It's worth reading the wiki page for all the details, but I will summarise
the suggested workflow for code changes:

1. Fork the Github repository at http://github.com/apache/kafka (if you
haven't already)
2. git checkout -b kafka-XXX
3. Make one or more commits (smaller commits can be easier to review, and
Review Board makes that hard)
4. git push origin kafka-XXX
5. Create PR against upstream/trunk (this will update JIRA
automatically[2] and it will send an email to the dev mailing list too)
6. A CI build will be triggered[3]
7. Review process happens on GitHub (it's quite handy to be able to
comment at both the commit and PR level, unlike Review Board)
8. Once all feedback has been addressed and the build is green, a
variant of the `merge_spark_pr.py`[4] script is used to squash, merge,
push, close the PR and JIRA issue. The squashed commit generated by the
script includes a bunch of useful information including links to the
original commits[5] (in the future, I think it's worth reconsidering the
squashing of commits, but retaining the information in the commit is
already an improvement)

Neha merged a couple of commits via GitHub already and it went smoothly
although we are still missing a few of the pieces described above:

1. CI builds triggered by GitHub PRs (this is supported by Apache Infra,
we need to request it for Kafka and provide whatever configuration is
needed)
2. Adapting Spark's merge_spark_pr script and integrating it into the
Kafka Git repository
3. Updating the Kafka contribution wiki and adding a CONTRIBUTING.md to
the Git repository (this is shown when someone is creating a pull request)
4. Go through existing GitHub pull requests and close the ones that are
no longer relevant (there are quite a few as people have been opening them
over the years, but nothing was done about most of them)
5. Other things I may be missing

I am volunteering to help with the above if people agree that this is the
right direction for Kafka. Thoughts?

Best,
Ismael

P.S. I was told in the Apache Infra HipChat that it's not currently
possible (and there are no plans to change that in the near future) to use
the GitHub merge button to merge PRs. The merge script does quite a few
useful things that the merge button does not in any case.

[1] https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
[2]
https://issues.apache.org/jira/browse/KAFKA-1054?focusedCommentId=14513614&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14513614
[3] https://blogs.apache.org/infra/entry/github_pull_request_builds_now
[4] https://github.com/apache/spark/blob/master/dev/merge_spark_pr.py
[5]
https://github.com/apache/spark/commit/59b7cfc41b2c06fbfbf6aca16c1619496a8d1d00





[jira] [Commented] (KAFKA-2084) byte rate metrics per client ID (producer and consumer)

2015-04-30 Thread Dong Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522757#comment-14522757
 ] 

Dong Lin commented on KAFKA-2084:
-

Here are my 2 cents. I vote for your 2nd approach: ClientQuotaMetrics2, which 
maintains per-client metrics for a single entity. My argument is that the 
ClientQuotaMetrics class doesn't need to distinguish between producer quota and 
consumer quota. That semantic difference is better maintained by whoever 
creates the ClientQuotaMetrics instance.
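
To illustrate, here is a minimal sketch of that idea (hypothetical names, not
the actual patch):

object ClientQuotaMetricsSketch {
  // The metrics class is agnostic to the quota type; whoever creates the
  // instance supplies the semantics via a name.
  class ClientQuotaMetrics(val quotaName: String) {
    private val bytesPerClient =
      scala.collection.mutable.Map[String, Long]().withDefaultValue(0L)
    def record(clientId: String, bytes: Long): Unit =
      bytesPerClient(clientId) += bytes
    def byteCount(clientId: String): Long = bytesPerClient(clientId)
  }

  // The creator, not the class, distinguishes producer from consumer quotas:
  val producerMetrics = new ClientQuotaMetrics("producer-byte-rate")
  val consumerMetrics = new ClientQuotaMetrics("consumer-byte-rate")
}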

> byte rate metrics per client ID (producer and consumer)
> ---
>
> Key: KAFKA-2084
> URL: https://issues.apache.org/jira/browse/KAFKA-2084
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Aditya Auradkar
>Assignee: Aditya Auradkar
> Attachments: KAFKA-2084.patch, KAFKA-2084_2015-04-09_18:10:56.patch, 
> KAFKA-2084_2015-04-10_17:24:34.patch, KAFKA-2084_2015-04-21_12:21:18.patch, 
> KAFKA-2084_2015-04-21_12:28:05.patch
>
>
> We need to be able to track the bytes-in/bytes-out rate on a per-client ID 
> basis. This is necessary for quotas.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] Using GitHub Pull Requests for contributions and code review

2015-04-30 Thread Neha Narkhede
Thanks a bunch for taking this up, Ismael!

+1, I think it will be much more convenient to sunset RB and move to
GitHub. Especially looking forward to the CIs on PRs and also the merge
script.

Alas, my wonderful patch-review script will be retired :-)

On Thu, Apr 30, 2015 at 6:12 AM, Ismael Juma  wrote:

> Hi all,
>
> Kafka currently uses a combination of Review Board and JIRA for
> contributions and code review. In my opinion, this makes contribution and
> code review a bit harder than it has to be.
>
> I think the approach used by Spark would improve the current situation:
>
> "Generally, Spark uses JIRA to track logical issues, including bugs and
> improvements, and uses Github pull requests to manage the review and merge
> of specific code changes. That is, JIRAs are used to describe what should
> be fixed or changed, and high-level approaches, and pull requests describe
> how to implement that change in the project's source code. For example,
> major design decisions are discussed in JIRA."[1]
>
> It's worth reading the wiki page for all the details, but I will summarise
> the suggested workflow for code changes:
>
>1. Fork the Github repository at http://github.com/apache/kafka (if you
>haven't already)
>2. git checkout -b kafka-XXX
>3. Make one or more commits (smaller commits can be easier to review and
>Review Board makes that hard)
>4. git push origin kafka-XXX
>5. Create PR against upstream/trunk (this will update JIRA
>automatically[2] and it will send an email to the dev mailing list too)
>6. A CI build will be triggered[3]
>7. Review process happens on GitHub (it's quite handy to be able to
>comment at both the commit and PR level, unlike Review Board)
>8. Once all feedback has been addressed and the build is green, a
>variant of the `merge_spark_pr.py`[4] script is used to squash, merge,
>push, close the PR and JIRA issue. The squashed commit generated by the
>script includes a bunch of useful information including links to the
>original commits[5] (in the future, I think it's worth reconsidering the
>squashing of commits, but retaining the information in the commit is
>already an improvement)
>
> Neha merged a couple of commits via GitHub already and it went smoothly
> although we are still missing a few of the pieces described above:
>
>1. CI builds triggered by GitHub PRs (this is supported by Apache Infra,
>we need to request it for Kafka and provide whatever configuration is
>needed)
>2. Adapting Spark's merge_spark_pr script and integrating it into the
>Kafka Git repository
>3. Updating the Kafka contribution wiki and adding a CONTRIBUTING.md to
>the Git repository (this is shown when someone is creating a pull
> request)
>4. Go through existing GitHub pull requests and close the ones that are
>no longer relevant (there are quite a few as people have been opening
> them
>over the years, but nothing was done about most of them)
>5. Other things I may be missing
>
> I am volunteering to help with the above if people agree that this is the
> right direction for Kafka. Thoughts?
>
> Best,
> Ismael
>
> P.S. I was told in the Apache Infra HipChat that it's not currently
> possible (and there are no plans to change that in the near future) to use
> the GitHub merge button to merge PRs. The merge script does quite a few
> useful things that the merge button does not in any case.
>
> [1]
> https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
> [2]
>
> https://issues.apache.org/jira/browse/KAFKA-1054?focusedCommentId=14513614&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14513614
> [3] https://blogs.apache.org/infra/entry/github_pull_request_builds_now
> [4] https://github.com/apache/spark/blob/master/dev/merge_spark_pr.py
> [5]
>
> https://github.com/apache/spark/commit/59b7cfc41b2c06fbfbf6aca16c1619496a8d1d00
>



-- 
Thanks,
Neha


Re: [VOTE] KIP-11- Authorization design for kafka security

2015-04-30 Thread Don Bosco Durai

>>Gwen << regarding additional authorizers

>I think having these in the system tests doubles as good confidence in
>the language independence of the changes. It also makes sure that when we
>release, we don't go breaking Sentry or Ranger or anyone else that
>wants to integrate.

As much as I would like to have Ranger included in the Kafka system tests, I
don’t think it is the right thing to do. The interface and the default Kafka
implementation should be the final contract. This also ensures that there are
no 3rd-party dependencies within Kafka.

>>Jun << Could you elaborate on why we should not store JSON in ZK? So
>>far, all existing ZK data are in JSON.

>>If I have 1,000,000 users in LDAP and 150 get access to Kafka topics
>>through this mechanism then I have to go and parse and push all of my
>>changes into ZooKeeper for it to take effect?

I assume the entries in ZooKeeper will be limited based on the number
of users who will be accessing and the number of topics they have permission
on. So in your case, for one topic there might be only 150 entries, or 1
entry with an array of 150 elements (users). I don’t know whether this
will be a limitation on the ZooKeeper side or the Kafka implementation. Based
on the discussion, each broker is going to store this in memory and refresh
it at a regular interval, so the interaction with ZooKeeper will also be
limited. I hope this won’t be a huge burden.
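
For illustration, one hypothetical shape for such an entry (not the KIP's
actual schema) would be a single JSON node per topic:

{
  "version": 1,
  "acls": [
    { "operation": "READ", "users": ["X", "Y", "Z"], "hosts": ["*"] }
  ]
}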

>If someone wanted to implement SAML I don't think this would work. Not
>sure how it would work with NiFi either (something around here I think
>maybe 
>https://git-wip-us.apache.org/repos/asf?p=incubator-nifi.git;a=blob;f=nar-
>bundles/framework-bundle/framework/web/web-security/src/main/java/org/apach
>e/nifi/web/security/authorization/NiFiAuthorizationService.java;hb=e67eb4f
>5).

I would put SAML and any other token-based solution (OAuth/OpenID) in the
bucket of Authentication. These would be beyond the scope of this
(authorization) KIP and should be parallel to the Authentication KIPs. If any
of these solutions also provides Authorization capability, then it would
require another (SAML) Authorization implementation of the current
interface. The flow, I guess, will be: during authentication with SAML, get
the roles from the payload and store them in the session context, and during
authorization, get the roles (or privileges) from the session context and
enforce them.
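
A rough sketch of that flow (hypothetical types, not the KIP-11 interface):

// Roles are captured at authentication time and enforced at authorization time.
case class SessionContext(principal: String, roles: Set[String])

object SamlFlowSketch {
  // During SAML authentication: extract roles from the payload.
  def onAuthenticate(user: String, samlRoles: Set[String]): SessionContext =
    SessionContext(user, samlRoles)

  // During authorization: enforce using the roles cached in the session.
  def authorize(session: SessionContext, requiredRole: String): Boolean =
    session.roles.contains(requiredRole)
}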

I hope this addresses some of your concerns.

Thanks

Bosco





On 4/30/15, 11:24 AM, "Joe Stein"  wrote:

>Gwen << regarding additional authorizers
>
>I think having these in the system tests doubles as good confidence in
>the language independence of the changes. It also makes sure that when we
>release, we don't go breaking Sentry or Ranger or anyone else that
>wants to integrate.
>
>Gwen << Regarding "AuthorizationException"
>
>Yeah so I have two issues. The one you raised, yes, 100%. Also I don't
>understand how that is not a broker wire protocol response and only a JVM
>exception.
>
>Jun << Could you elaborate on why we should not store JSON in ZK? So far,
>all existing ZK data are in JSON.
>
>If I have 1,000,000 users in LDAP and 150 get access to Kafka topics
>through this mechanism then I have to go and parse and push all of my
>changes into ZooKeeper for it to take effect?
>
>If someone wanted to implement SAML I don't think this would work. Not
>sure
>how it would work with NiFi either (something around here I think maybe
>https://git-wip-us.apache.org/repos/asf?p=incubator-nifi.git;a=blob;f=nar-
>bundles/framework-bundle/framework/web/web-security/src/main/java/org/apac
>he/nifi/web/security/authorization/NiFiAuthorizationService.java;hb=e67eb4
>f5>).
>
>Parth << All the open issues already have a resolution; I can open a JIRA
>for each one and add the resolution to it and resolve them immediately if
>you want this for tracking purposes.
>
>Are those inline to the questions? I didn't quite get that
>section at all. If the open questions are answered then they aren't open;
>can you tidy that up?
>
>Parth <<  We will update system tests to verify that the code works. We
>have thorough unit tests for all the new code except for modifications
>made
>to KafkaAPI as that has way too many dependencies to be mocked which I
>guess is the reason for no existing unit tests.
>
>Can you update the KIP with some more detail about that please.
>
>Parth << I don’t know if I completely understand the concern. We have
>talked with Ranger team (Don Bosco Durai) so we at least have one custom
>authorizer implementation that has approved this design and they will be
>able to inject their authorization framework with current interfaces. Do
>you see any issue with the design which will prevent anyone from providing
>a custom implementation?
>
>Maybe a diagram for all of the different parts interacting. I still don't
>get why there are no wire protocol changes and just a change in the JVM.
>What do non-JVM clients do and how do they work with Kafka? Very confusing,

Re: [VOTE] KIP-11- Authorization design for kafka security

2015-04-30 Thread Suresh Srinivas
Gwen,

Thanks for the clarification.

My objection is, we should not do it just because of the reason that
databases have always done it this way. Maybe there is a history
that forced a choice like that, which has led
other DBs to comply with it. Kafka is a different system. Let's do 
the correct thing.

I also think it is not clear what users want here. But as an API developer
I would want error conditions to be correctly identified so that 
supportability of the product does not suffer.

Today in HDFS (for that matter Hadoop in general), the error conditions
are clearly identified, such as:
- Object you are trying to access does not exist
- You do not have permission to access the object
- The operation you are trying to do is invalid

Here are some error codes that Amazon Kinesis supports, describing the
failure/error conditions clearly:
http://docs.aws.amazon.com/kinesis/latest/APIReference/CommonErrors.html
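
In code terms, the distinction being argued for is roughly the following
(hypothetical error types, not Kafka's actual protocol errors):

// Distinct, self-describing error conditions aid supportability:
sealed trait ApiError
case object ResourceNotFound extends ApiError  // the object does not exist
case object AccessDenied extends ApiError      // no permission on the object
case object InvalidOperation extends ApiError  // the operation itself is invalid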

From: Gwen Shapira 
Sent: Thursday, April 30, 2015 6:05 PM
To: dev@kafka.apache.org
Subject: Re: [VOTE] KIP-11- Authorization design for kafka security

I think Kafka's behavior should be driven by what users want. My only
indication to what they may want is what we were forced to fix in similar
cases. This is why I am advocating this behavior.

I agree that this is a minor point that should not be blocking the vote. I
already gave my non-binding +1 and thats the best I can do to drive this
forward.

If this vote passes without the behavior I believe is the right one, I will
create a follow up JIRA. However, since we are still in a discussion and
since both options are trivial to implement - why exactly are you objecting
to Kafka behaving more like a DB in this scenario?

Gwen



On Thu, Apr 30, 2015 at 5:54 PM, Suresh Srinivas 
wrote:

> It is a strange choice to return "does not exist" when the condition is
> actually "not authorized". I have a hard time understanding why that is
> better for security. Perhaps in DB world this is expected and changes may
> be necessary to comply with such behavior. But that should not guide what
> we do in Kafka.
>
> This is a voting thread for an important feature. Security is the number
> one feature that our users are asking for. Can't minor things like this be
> done in follow-up JIRAs? Should the focus be brought back to voting?
>
> Btw since I am new to the Kafka community, is there a period when voting
> thread needs to wrap up by? Other projects generally follow 3 or 7 days.
>
> Regards,
> Suresh
>
> Sent from phone
>
> _
> From: Gwen Shapira mailto:gshap...@cloudera.com>>
> Sent: Thursday, April 30, 2015 5:32 PM
> Subject: Re: [VOTE] KIP-11- Authorization design for kafka security
> To: mailto:dev@kafka.apache.org>>
>
>
> On Thu, Apr 30, 2015 at 4:39 PM, Parth Brahmbhatt <
> pbrahmbh...@hortonworks.com> wrote:
>
> > Hi Joe,
> >
> > Let me clarify on authZException. The caller gets a 403 regardless of
> > existence of the topic, even if the topic does not exist you always get
> > 403. This will fall under the case where we do not find any ACLs for a
> > resource and as per our last decision by default we are going to deny
> this
> > request.
> >
>
> The reason I'm digging into this is that in Hive we had to fix existing
> behavior after financial customers objected loudly to getting "insufficient
> privileges" when a real database would return "table does not exist".
>
> I completely agree that having to handle two separate error conditions
> ("TopicNotExist" if user doesn't have READ, unless user has CREATE in which
> case he can see all topics and can get "Unauthorized") adds complexity and
> will not be fun to debug. However, when implementing security, a lot of the
> stuff we do is around making customers pass security audits, and I suspect
> that "can't know that tables even exist" test is a thing.
>
> We share pretty much the same financial customers and they seem to have the
> same concerns. Perhaps you can double check if you also have this
> requirement?
>
> (and again, sorry for not seeing this earlier and holding up the vote on
> what seems like a minor point. I just don't want to punt for later
> something when we already have an idea of what customers expect)
>
> Gwen
>
>
>
> >
> > The configurations are listed explicitly here
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-11+-+Authorization+In
> > terface#KIP-11-AuthorizatiInterface-Changestoexistingclasses under
> > KafkaConfig. We may add an optional config to allow authorizer to read an
> > arbitrary property files incrementally but that does not need to be part
> > of this same KIP.
> >
> > The statement “If we can't audit the access then what good is controlling
> > the access?” seems extreme because we still get to control the access
> > which IMHO is a huge win. The default authorizer implementation right now
> > logs every allowed/denied 

Re: [VOTE] KIP-11- Authorization design for kafka security

2015-04-30 Thread Gwen Shapira
I think Kafka's behavior should be driven by what users want. My only
indication to what they may want is what we were forced to fix in similar
cases. This is why I am advocating this behavior.

I agree that this is a minor point that should not be blocking the vote. I
already gave my non-binding +1 and that's the best I can do to drive this
forward.

If this vote passes without the behavior I believe is the right one, I will
create a follow up JIRA. However, since we are still in a discussion and
since both options are trivial to implement - why exactly are you objecting
to Kafka behaving more like a DB in this scenario?

Gwen



On Thu, Apr 30, 2015 at 5:54 PM, Suresh Srinivas 
wrote:

> It is a strange choice to return "does not exist" when the condition is
> actually "not authorized". I have a hard time understanding why that is
> better for security. Perhaps in DB world this is expected and changes may
> be necessary to comply with such behavior. But that should not guide what
> we do in Kafka.
>
> This is a voting thread for an important feature. Security is the number
> one feature that our users are asking for. Can't minor things like this be
> done in follow-up JIRAs? Should the focus be brought back to voting?
>
> Btw since I am new to the Kafka community, is there a period when voting
> thread needs to wrap up by? Other projects generally follow 3 or 7 days.
>
> Regards,
> Suresh
>
> Sent from phone
>
> _
> From: Gwen Shapira mailto:gshap...@cloudera.com>>
> Sent: Thursday, April 30, 2015 5:32 PM
> Subject: Re: [VOTE] KIP-11- Authorization design for kafka security
> To: mailto:dev@kafka.apache.org>>
>
>
> On Thu, Apr 30, 2015 at 4:39 PM, Parth Brahmbhatt <
> pbrahmbh...@hortonworks.com> wrote:
>
> > Hi Joe,
> >
> > Let me clarify on authZException. The caller gets a 403 regardless of
> > existence of the topic, even if the topic does not exist you always get
> > 403. This will fall under the case where we do not find any ACLs for a
> > resource and as per our last decision by default we are going to deny
> this
> > request.
> >
>
> The reason I'm digging into this is that in Hive we had to fix existing
> behavior after financial customers objected loudly to getting "insufficient
> privileges" when a real database would return "table does not exist".
>
> I completely agree that having to handle two separate error conditions
> ("TopicNotExist" if user doesn't have READ, unless user has CREATE in which
> case he can see all topics and can get "Unauthorized") adds complexity and
> will not be fun to debug. However, when implementing security, a lot of the
> stuff we do is around making customers pass security audits, and I suspect
> that "can't know that tables even exist" test is a thing.
>
> We share pretty much the same financial customers and they seem to have the
> same concerns. Perhaps you can double check if you also have this
> requirement?
>
> (and again, sorry for not seeing this earlier and holding up the vote on
> what seems like a minor point. I just don't want to punt for later
> something when we already have an idea of what customers expect)
>
> Gwen
>
>
>
> >
> > The configurations are listed explicitly here
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-11+-+Authorization+In
> > terface#KIP-11-AuthorizatiInterface-Changestoexistingclasses under
> > KafkaConfig. We may add an optional config to allow authorizer to read an
> > arbitrary property files incrementally but that does not need to be part
> > of this same KIP.
> >
> > The statement “If we can't audit the access then what good is controlling
> > the access?” seems extreme because we still get to control the access
> > which IMHO is a huge win. The default authorizer implementation right now
> > logs every allowed/denied access (see here
> >
> https://github.com/Parth-Brahmbhatt/kafka/blob/KAFKA-1688-impl/core/src/mai
> > n/scala/kafka/security/auth/SimpleAclAthorizer.scala) in debug mode.
> > Anybody who needs auditing could create a log4j appender to allow debug
> > access to this class and send the log output to some audit file.
> >
> > Auditing is still a separate piece, we could either add an auditor
> > interface that wraps authorizer or the other way around so authorizer and
> > auditor can be two separate implementations. I would love to start a new
> > KIP and jira to discuss approaches in more details but I don’t see the
> > need to hold up Authorization work for the same.
> >
> > I don’t agree with the “this design seems too specific” given we already
> > have 3 implementations (default, Ranger, Sentry) that can be supported
> > with the current design.
> >
> > The authorization happens as part of handle and it is the first action,
> > see here
> >
> https://github.com/Parth-Brahmbhatt/kafka/blob/KAFKA-1688-impl/core/src/mai
> > n/scala/kafka/server/KafkaApis.scala#L103 for one example.
> >
> > Thanks
> > Parth
> >
> >
> >
> > On 4/30/15, 4:24

[jira] [Commented] (KAFKA-2145) An option to add topic owners.

2015-04-30 Thread Neelesh Srinivas Salian (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522578#comment-14522578
 ] 

Neelesh Srinivas Salian commented on KAFKA-2145:


[~parth.brahmbhatt]

I had a few additional questions:
1) Does the patch mean simply adding a field for the owner name as a String?
2) I feel the dependencies for this would extend to each topic and be 
associated with topic generation as well.
3) TopicCommand.scala has describeTopic, alterTopic and createTopic methods 
that could use an owner tag.

Thanks.
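
For question 1, a minimal sketch of what I mean (the key name is illustrative,
not the final design):

object OwnerConfigSketch {
  // Owner carried as a per-topic config entry, set at create/alter time:
  val topicConfig = new java.util.Properties()
  topicConfig.put("owner", "team-a,team-b")  // users/groups sharing ownership
  // TopicCommand --describe could then surface "owner" alongside the other
  // per-topic configs, and alterTopic could update it.
}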



> An option to add topic owners. 
> ---
>
> Key: KAFKA-2145
> URL: https://issues.apache.org/jira/browse/KAFKA-2145
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Parth Brahmbhatt
>Assignee: Neelesh Srinivas Salian
>
> We need to expose a way for users to identify the users/groups that share 
> ownership of a topic. We discussed adding this as part of 
> https://issues.apache.org/jira/browse/KAFKA-2035 and agreed that it will be 
> simpler to add owner as a log config. 
> The owner field can be used for auditing and also by the authorization layer 
> to grant access without having to explicitly configure ACLs. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2132) Move Log4J appender to clients module

2015-04-30 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522575#comment-14522575
 ] 

Gwen Shapira commented on KAFKA-2132:
-

+1 (non-binding). Looks good to me.

[~jkreps] [~joestein] - mind taking a look and committing if you agree?

> Move Log4J appender to clients module
> -
>
> Key: KAFKA-2132
> URL: https://issues.apache.org/jira/browse/KAFKA-2132
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Gwen Shapira
>Assignee: Ashish K Singh
> Attachments: KAFKA-2132.patch, KAFKA-2132_2015-04-27_19:59:46.patch, 
> KAFKA-2132_2015-04-30_12:22:02.patch, KAFKA-2132_2015-04-30_15:53:17.patch
>
>
> Log4j appender is just a producer.
> Since we have a new producer in the clients module, there is no need to keep 
> the Log4J appender in "core" and force people to package all of Kafka with 
> their apps.
> Let's move the Log4jAppender to the clients module.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 33614: Patch for KAFKA-2132

2015-04-30 Thread Gwen Shapira

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33614/#review82231
---

Ship it!


Ship It!

- Gwen Shapira


On April 30, 2015, 10:53 p.m., Ashish Singh wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33614/
> ---
> 
> (Updated April 30, 2015, 10:53 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-2132
> https://issues.apache.org/jira/browse/KAFKA-2132
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> KAFKA-2132: Move Log4J appender to clients module
> 
> 
> Diffs
> -
> 
>   build.gradle fef515b3b2276b1f861e7cc2e33e74c3ce5e405b 
>   checkstyle/import-control.xml f2e6cec267e67ce8e261341e373718e14a8e8e03 
>   core/src/main/scala/kafka/producer/KafkaLog4jAppender.scala 
> 5d36a019e3dbfb93737a9cd23404dcd1c5d836d1 
>   core/src/test/scala/unit/kafka/log4j/KafkaLog4jAppenderTest.scala 
> 41366a14590d318fced0e83d6921d8035fa882da 
>   
> log4j-appender/src/main/java/org/apache/kafka/log4jappender/KafkaLog4jAppender.java
>  PRE-CREATION 
>   
> log4j-appender/src/test/java/org/apache/kafka/log4jappender/KafkaLog4jAppenderTest.java
>  PRE-CREATION 
>   
> log4j-appender/src/test/java/org/apache/kafka/log4jappender/MockKafkaLog4jAppender.java
>  PRE-CREATION 
>   settings.gradle 83f764e6a4a15a5fdba232dce74a369870f26b45 
> 
> Diff: https://reviews.apache.org/r/33614/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Ashish Singh
> 
>



Re: [VOTE] KIP-11- Authorization design for kafka security

2015-04-30 Thread Don Bosco Durai
Joe, these are good use cases; however, in the first phase the granularity
is at the Topic (your bucket, e.g.) level and not what you are accessing
within the Topic. So in your use case, if you don’t have access to “Bucket
A”, then you won’t know who is in it, so you won’t know “Alice” or anyone
who has “X”.

The use case here is: there is an HL7 topic specifically for “New Patients”;
only users “A, B or C” can publish to it and only users “X, Y or Z”
can consume from it. In addition, only admin users “P, Q and R” can manage
the topic permissions.

I feel, keeping it simple should be good enough for the first phase.

Thanks

Bosco



On 4/30/15, 3:59 PM, "Joe Stein"  wrote:

>If you have bucket A and Bucket B and in Bucket A there are patients with
>"Disease X" and Bucket B patients without "Disease X".
>
>Now you try to access "Alice" from bucket A and you get a 403  and then
>from Bucket "B" you get a 404.
>
>What does that tell you now about Alice? Yup, she has "Disease X".
>
>Uniform non-existence is a good policy for protecting data. If you don't
>have permission, then 404 Not Found works too.
>
>The context that I thought applied to this discussion is that I thought
>the authorization module was going to be a bit more integrated where the
>API responses were happening.
>
>~ Joe Stein
>- - - - - - - - - - - - - - - - -
>
>  http://www.stealth.ly
>- - - - - - - - - - - - - - - - -
>
>On Thu, Apr 30, 2015 at 6:51 PM, Suresh Srinivas 
>wrote:
>
>> Comment on AuthorizationException. I think the intent of the exception
>> should be to capture why a request is rejected. It is important from an
>> API perspective to be specific, to aid debugging. Having a generic or
>> obfuscated exception is not very useful. Does someone on getting an
>> exception reach out to an admin to understand if a topic exists or if it's
>> an authorization issue?
>>
>> I am not getting the security concern. The system must ensure access is
>> disallowed by implementing the security correctly, not based on security
>> by obscurity.
>>
>> Regards,
>> Suresh
>>
>> Sent from phone
>>
>> _
>> From: Gwen Shapira mailto:gshap...@cloudera.com>>
>> Sent: Thursday, April 30, 2015 10:14 AM
>> Subject: Re: [VOTE] KIP-11- Authorization design for kafka security
>> To: mailto:dev@kafka.apache.org>>
>>
>>
>> * Regarding additional authorizers:
>> Prasad, who is a PMC on Apache Sentry reviewed the design and confirmed
>> Sentry can integrate with the current APIs. Dapeng Sun, a committer on
>> Sentry, had some concerns about the IP privileges and how we prioritize
>> privileges - but nothing that prevents Sentry from integrating with the
>> existing solution, from what I could see. It seems to me that the
>> design is
>> very generic and adapters can be written for other authorization systems
>> (after all, you just need to implement setACL, getACL and Authorize -
>>all
>> pretty basic), although I can't speak for Oracle's Identity Manager
>> specifically.
>>
>> * Regarding "AuthorizationException to indicate that an operation was
>> not authorized": Sorry I missed this in previous reviews, but now that I
>> look
>> at it - Many systems intentionally don't return AuthorizationException
>>when
>> READ privilege is missing, since this already gives too much information
>> (that the topic exists and that you don't have privileges on it).
>>Instead
>> they return a variant of "doesn't exist. I'm wondering if this
>>approach is
>> applicable / desirable for Kafka as well.
>> Note that this doesn't remove the need for AuthorizationException - I'm
>> just suggesting apossible refinement on its use.
>>
>> Gwen
>>
>>
>>
>> On Thu, Apr 30, 2015 at 9:52 AM, Parth Brahmbhatt <
>> pbrahmbh...@hortonworks.com> wrote:
>>
>> > Hi Joe, Thanks for taking the time to review.
>> >
>> > * All the open issues already have a resolution; I can open a JIRA
>>for
>> > each one and add the resolution to it and resolve them immediately if
>>you
>> > want this for tracking purposes.
>> > * We will update system tests to verify that the code works. We have
>> > thorough unit tests for all the new code except for modifications
>>made to
>> > KafkaAPI as that has way too many dependencies to be mocked which I
>>guess
>> > is the reason for no existing unit tests.
>> > * I don’t know if I completely understand the concern. We have talked
>> with
>> > Ranger team (Don Bosco Durai) so we at least have one custom
>>authorizer
>> > implementation that has approved this design and they will be able to
>> > inject their authorization framework with current interfaces. Do you
>>see
>> > any issue with the design which will prevent anyone from providing a
>> > custom implementation?
>> > * Did not understand the concern around wire protocol, we are adding
>> > AuthorizationException to indicate that an operation was not
>>authorized.
>> >
>> > Thanks
>> > Parth
>> >
>> > On 4/30/15, 5:59 AM, "Jun Rao"
>>mailto:j...@confluent.io

[jira] [Commented] (KAFKA-1928) Move kafka.network over to using the network classes in org.apache.kafka.common.network

2015-04-30 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522567#comment-14522567
 ] 

Gwen Shapira commented on KAFKA-1928:
-

Updated reviewboard https://reviews.apache.org/r/33065/diff/
 against branch trunk

> Move kafka.network over to using the network classes in 
> org.apache.kafka.common.network
> ---
>
> Key: KAFKA-1928
> URL: https://issues.apache.org/jira/browse/KAFKA-1928
> Project: Kafka
>  Issue Type: Sub-task
>  Components: security
>Reporter: Jay Kreps
>Assignee: Gwen Shapira
> Attachments: KAFKA-1928.patch, KAFKA-1928_2015-04-28_00:09:40.patch, 
> KAFKA-1928_2015-04-30_17:48:33.patch
>
>
> As part of the common package we introduced a bunch of network related code 
> and abstractions.
> We should look into replacing a lot of what is in kafka.network with this 
> code. Duplicate classes include things like Receive, Send, etc. It is likely 
> possible to also refactor the SocketServer to make use of Selector which 
> should significantly simplify its code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1928) Move kafka.network over to using the network classes in org.apache.kafka.common.network

2015-04-30 Thread Gwen Shapira (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gwen Shapira updated KAFKA-1928:

Attachment: KAFKA-1928_2015-04-30_17:48:33.patch

> Move kafka.network over to using the network classes in 
> org.apache.kafka.common.network
> ---
>
> Key: KAFKA-1928
> URL: https://issues.apache.org/jira/browse/KAFKA-1928
> Project: Kafka
>  Issue Type: Sub-task
>  Components: security
>Reporter: Jay Kreps
>Assignee: Gwen Shapira
> Attachments: KAFKA-1928.patch, KAFKA-1928_2015-04-28_00:09:40.patch, 
> KAFKA-1928_2015-04-30_17:48:33.patch
>
>
> As part of the common package we introduced a bunch of network related code 
> and abstractions.
> We should look into replacing a lot of what is in kafka.network with this 
> code. Duplicate classes include things like Receive, Send, etc. It is likely 
> possible to also refactor the SocketServer to make use of Selector which 
> should significantly simplify its code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 33065: Patch for KAFKA-1928

2015-04-30 Thread Gwen Shapira

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33065/
---

(Updated May 1, 2015, 12:48 a.m.)


Review request for kafka.


Bugs: KAFKA-1928
https://issues.apache.org/jira/browse/KAFKA-1928


Repository: kafka


Description (updated)
---

first pass on replacing Send


implement maxSize and improved docs


Merge branch 'trunk' of http://git-wip-us.apache.org/repos/asf/kafka into 
KAFKA-1928-v2


Merge branch 'trunk' of http://git-wip-us.apache.org/repos/asf/kafka into 
KAFKA-1928-v2

Conflicts:
core/src/main/scala/kafka/network/RequestChannel.scala

moved selector out of abstract thread


mid-way through putting selector in SocketServer


Merge branch 'trunk' of http://git-wip-us.apache.org/repos/asf/kafka into 
KAFKA-1928-v2

Also, SocketServer is now using Selector. Still a bit messy - but all tests pass.

Merge branch 'trunk' of http://git-wip-us.apache.org/repos/asf/kafka into 
KAFKA-1928-v2


renamed requestKey to connectionId to reflect new use and changed type from Any 
to String


Diffs (updated)
-

  clients/src/main/java/org/apache/kafka/clients/ClusterConnectionStates.java 
da76cc257b4cfe3c4bce7120a1f14c7f31ef8587 
  clients/src/main/java/org/apache/kafka/clients/InFlightRequests.java 
936487b16e7ac566f8bdcd39a7240ceb619fd30e 
  clients/src/main/java/org/apache/kafka/clients/KafkaClient.java 
1311f85847b022efec8cb05c450bb18231db6979 
  clients/src/main/java/org/apache/kafka/clients/NetworkClient.java 
b7ae595f2cc46e5dfe728bc3ce6082e9cd0b6d36 
  
clients/src/main/java/org/apache/kafka/clients/consumer/internals/Coordinator.java
 e55ab11df4db0b0084f841a74cbcf819caf780d5 
  
clients/src/main/java/org/apache/kafka/clients/consumer/internals/Fetcher.java 
ef9dd5238fbc771496029866ece1d85db6d7b7a5 
  clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java 
b2db91ca14bbd17fef5ce85839679144fff3f689 
  clients/src/main/java/org/apache/kafka/common/network/ByteBufferReceive.java 
129ae827bccbd982ad93d56e46c6f5c46f147fe0 
  clients/src/main/java/org/apache/kafka/common/network/ByteBufferSend.java 
c8213e156ec9c9af49ee09f5238492318516aaa3 
  clients/src/main/java/org/apache/kafka/common/network/NetworkReceive.java 
fc0d168324aaebb97065b0aafbd547a1994d76a7 
  clients/src/main/java/org/apache/kafka/common/network/NetworkSend.java 
68327cd3a734fd429966d3e2016a2488dbbb19e5 
  clients/src/main/java/org/apache/kafka/common/network/Receive.java 
4e33078c1eec834bd74aabcb5fc69f18c9d6d52a 
  clients/src/main/java/org/apache/kafka/common/network/Selectable.java 
b5f8d83e89f9026dc0853e5f92c00b2d7f043e22 
  clients/src/main/java/org/apache/kafka/common/network/Selector.java 
57de0585e5e9a53eb9dcd99cac1ab3eb2086a302 
  clients/src/main/java/org/apache/kafka/common/network/Send.java 
5d321a09e470166a1c33639cf0cab26a3bce98ec 
  clients/src/main/java/org/apache/kafka/common/requests/RequestSend.java 
27cbf390c7f148ffa8c5abc154c72cbf0829715c 
  clients/src/main/java/org/apache/kafka/common/requests/ResponseSend.java 
PRE-CREATION 
  clients/src/test/java/org/apache/kafka/clients/MockClient.java 
5e3fab13e3c02eb351558ec973b949b3d1196085 
  clients/src/test/java/org/apache/kafka/clients/NetworkClientTest.java 
8b278892883e63899b53e15efb9d8c926131e858 
  clients/src/test/java/org/apache/kafka/common/network/SelectorTest.java 
d5b306b026e788b4e5479f3419805aa49ae889f3 
  clients/src/test/java/org/apache/kafka/test/MockSelector.java 
ea89b06a4c9e5bb351201299cd3037f5226f0e6c 
  core/src/main/scala/kafka/admin/ConsumerGroupCommand.scala 
1c3b3802ac221d570e7610458e50518b4499e7ed 
  core/src/main/scala/kafka/api/ConsumerMetadataRequest.scala 
a3b1b78adb760eaeb029466b54f335a29caf3b0f 
  core/src/main/scala/kafka/api/ControlledShutdownRequest.scala 
fe81635c864cec03ca1d4681c9c47c3fc4f975ee 
  core/src/main/scala/kafka/api/FetchRequest.scala 
b038c15186c0cbcc65b59479324052498361b717 
  core/src/main/scala/kafka/api/FetchResponse.scala 
75aaf57fb76ec01660d93701a57ae953d877d81c 
  core/src/main/scala/kafka/api/LeaderAndIsrRequest.scala 
431190ab94afc4acfc14348a1fc720e17c071cea 
  core/src/main/scala/kafka/api/OffsetCommitRequest.scala 
cf8e6acc426aef6eb19d862bf6a108a5fc37907a 
  core/src/main/scala/kafka/api/OffsetFetchRequest.scala 
67811a752a470bf9bdbc8c5419e8d6e20a006169 
  core/src/main/scala/kafka/api/OffsetRequest.scala 
3d483bc7518ad76f9548772522751afb4d046b78 
  core/src/main/scala/kafka/api/ProducerRequest.scala 
570b2da1d865086f9830aa919a49063abbbe574d 
  core/src/main/scala/kafka/api/StopReplicaRequest.scala 
5e14987c990fe561c01dac2909f5ed21a506e038 
  core/src/main/scala/kafka/api/TopicMetadataRequest.scala 
363bae01752318f3849242b97a6619747697c1d9 
  core/src/main/scala/kafka/api/UpdateMetadataRequest.scala 
69f0397b187a737b4ddf50e390d3c2f418ce6b5d 
  core/src/main/scala/kafka/client/ClientUtils.scala 
62394c0d3813f19a443cf862c8bc6c

[jira] [Created] (KAFKA-2162) Kafka Auditing functionality

2015-04-30 Thread Sriharsha Chintalapani (JIRA)
Sriharsha Chintalapani created KAFKA-2162:
-

 Summary: Kafka Auditing functionality
 Key: KAFKA-2162
 URL: https://issues.apache.org/jira/browse/KAFKA-2162
 Project: Kafka
  Issue Type: Bug
Reporter: Sriharsha Chintalapani
Assignee: Parth Brahmbhatt


During the Kafka authorization discussion thread, concerns were raised about 
not having auditing. Auditing is important functionality, but it's not part of 
the authorizer. This JIRA will track adding audit functionality to Kafka.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-12 - Kafka Sasl/Kerberos implementation

2015-04-30 Thread Joel Koshy
Just went through this thread. I'm on-board with this as well.

@Gwen - yes at LinkedIn we do need to support both
authenticated/unauthenticated users on the same Kafka cluster because
we cannot switch all clients simultaneously. I would be surprised if
this is unique to LinkedIn.  Also, I think the multi-port approach
which you did in KAFKA-1809 is very helpful in that it makes it
feasible for us to add and/or deprecate options.  For e.g., in order
to address the concerns that Gari raised - i.e., to do full-fledged
pluggable SASL support - even though that would likely need significant
code changes in the Channel abstraction - it makes it feasible to
migrate clients over gradually.

@Sriharsha - there was a comment earlier in the thread on support for
BlockingChannel. I think this also came up at the last KIP hangout -
i.e., ideally we should move over to NetworkClient everywhere
including controller-broker communication.

Thanks,

Joel

On Wed, Apr 29, 2015 at 10:17:52AM -0700, Gwen Shapira wrote:
> Perfect.
> 
> Can't wait to review the patch :)
> 
> On Wed, Apr 29, 2015 at 10:14 AM, Sriharsha Chintalapani 
> wrote:
> 
> > Hi Gwen,
> >  Your understanding is right :) , sorry about the confusion.
> > " If I understand your new suggestion correctly, we will control all
> > options
> >
> > as separate ports. i.e. a port for each of those:
> > * SSL non-authenticated (or authenticated with SSL keys)
> > * SSL + SASL (which is what Hadoop does?)
> > * PLAINTEXT non-authenticated
> > * PLAINTEXT + SASL "
> >
> > Yes we will control this as separate ports
> >
> > 1) * PLAINTEXT non-authenticated
> >
> >  In this case it will be exactly like existing behavior which is
> > no-auth and no-encryption. This will be the default behavior.
> > 2)  SSL  authenticated with SSL keys
> >   SSL  implementation. Authenticated principal in the session will be
> > from the certificate presented or the peer host. This will be like regular
> > ssl authentication.
> >
> > 3) * PLAINTEXT + SASL
> >   SASL authentication will be used over the plaintext channel. Once
> > SASL authentication is established between the client and server, the
> > session will have the client’s principal as the authenticated user. There
> > won’t be any wire encryption in this case, as all the channel
> > communication will be over plain text.
> >
> > 4)  SSL + SASL
> >   SSL will be established initially and SASL authentication will be
> > done over SSL. Once SASL authentication is established, the user’s
> > principal will be used as the authenticated user.
> > This option is useful if users want to use SASL authentication (for
> > example, Kerberos) with wire encryption. HDFS is not using this option yet
> > but their recommendation is to use this option as using SASL for wire
> > encryption resulted in performance degradation .
> >
> >
> > Thanks,
> > Harsha
> >
> >
> > On April 29, 2015 at 9:58:56 AM, Gwen Shapira (gshap...@cloudera.com)
> > wrote:
> >
> > Just to make sure we are on the same page:
> >
> > After yesterday's call, it sounded like we have two orthogonal decisions:
> > 1) Do we want the channel to implement SSL or PLAINTEXT for communication
> > 2) Do we want to authenticate with Kerberos or not?
> >
> > I didn't think of this distinction, but it makes perfect sense and I'm
> > glad
> > you made it.
> >
> > KIP-12 as-of-yesterday (we need revision numbers...), said that decision
> > #1
> > will be controlled by port, while decision #2 will be controlled with a
> > configuration flag. The problem of configuration-flag is that it will
> > apply
> > to everyone, and there was a requirement (LinkedIn, I think?) to support
> > both authenticated and non-authenticated users on same Kafka. (probably
> > because they can't modify all clients at once).
> >
> > If I understand your new suggestion correctly, we will control all options
> > as separate ports. i.e. a port for each of those:
> > * SSL non-authenticated (or authenticated with SSL keys)
> > * SSL + SASL (which is what Hadoop does?)
> > * PLAINTEXT non-authenticated
> > * PLAINTEXT + SASL
> >
> > Did I get it right? If I did, than +1. If I didn't, I hope you can explain
> > a bit more :)
> >
> > Gwen
> >
> >
> > On Wed, Apr 29, 2015 at 8:50 AM, Sriharsha Chintalapani 
> > wrote:
> >
> > > I wasn't sure what the "security protocol" in the endpoint now stands
> > for.
> > > Is that the transport protocol? Or is that really a choice between
> > > secure/insecure with parameters for secure transport specified
> > separately?
> > > Would inter-broker transport properties be configurable?
> > > My thinking is to have two protocols, PLAINTEXT and SSL, and
> > > users can provide config if they want to use sasl authentication on top
> > of
> > > these protocols. After yesterdays’ KIP Hangout Gwen pointed out users
> > might
> > > want to run just a PLAINTEXT channel on one port ,to which users will
> > have
> > > “ANONYMOUS” access and another port they would run “PLAINTEXT with SASL
> > > aut

Re: [VOTE] KIP-11- Authorization design for kafka security

2015-04-30 Thread Gwen Shapira
On Thu, Apr 30, 2015 at 4:39 PM, Parth Brahmbhatt <
pbrahmbh...@hortonworks.com> wrote:

> Hi Joe,
>
> Let me clarify on authZException. The caller gets a 403 regardless of
> existence of the topic, even if the topic does not exist you always get
> 403. This will fall under the case where we do not find any ACLs for a
> resource and as per our last decision by default we are going to deny this
> request.
>

The reason I'm digging into this is that in Hive we had to fix existing
behavior after financial customers objected loudly to getting "insufficient
privileges" when a real database would return "table does not exist".

I completely agree that having to handle two separate error conditions
("TopicNotExist" if user doesn't have READ, unless user has CREATE in which
case he can see all topics and can get "Unauthorized") adds complexity and
will not be fun to debug. However, when implementing security, a lot of the
stuff we do is around making customers pass security audits, and I suspect
that "can't know that tables even exist" test is a thing.
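
Concretely, the masking rule I'm describing would look something like this (a
hypothetical sketch, not the KIP-11 API):

object ErrorMaskingSketch {
  // Without CREATE, a missing READ is reported as a missing topic so that
  // topic existence is not leaked; with CREATE, the caller could create the
  // topic anyway, so a plain authorization error gives nothing away.
  def errorFor(hasRead: Boolean, hasCreate: Boolean): String =
    if (hasRead) "NONE"
    else if (hasCreate) "AUTHORIZATION_FAILED"
    else "UNKNOWN_TOPIC_OR_PARTITION"
}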

We share pretty much the same financial customers and they seem to have the
same concerns. Perhaps you can double check if you also have this
requirement?

(and again, sorry for not seeing this earlier and holding up the vote on
what seems like a minor point. I just don't want to punt for later
something when we already have an idea of what customers expect)

Gwen



>
> The configurations are listed explicitly here
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-11+-+Authorization+In
> terface#KIP-11-AuthorizatiInterface-Changestoexistingclasses under
> KafkaConfig. We may add an optional config to allow authorizer to read an
> arbitrary property files incrementally but that does not need to be part
> of this same KIP.
>
> The statement “If we can't audit the access then what good is controlling
> the access?” seems extreme because we still get to control the access
> which IMHO is a huge win. The default authorizer implementation right now
> logs every allowed/denied access (see here
> https://github.com/Parth-Brahmbhatt/kafka/blob/KAFKA-1688-impl/core/src/mai
> n/scala/kafka/security/auth/SimpleAclAthorizer.scala) in debug mode.
> Anybody who needs auditing could create a log4j appender to allow debug
> access to this class and send the log output to some audit file.
>
> Auditing is still a separate piece, we could either add an auditor
> interface that wraps authorizer or the other way around so authorizer and
> auditor can be two separate implementations. I would love to start a new
> KIP and jira to discuss approaches in more details but I don’t see the
> need to hold up Authorization work for the same.
>
> I don’t agree with the “this design seems too specific” given we already
> have 3 implementations (default, Ranger, Sentry) that can be supported with
> the current design.
>
> The authorization happens as part of handle and it is the first action,
> see here
> https://github.com/Parth-Brahmbhatt/kafka/blob/KAFKA-1688-impl/core/src/mai
> n/scala/kafka/server/KafkaApis.scala#L103 for one example.
>
> Thanks
> Parth
>
>
>
> On 4/30/15, 4:24 PM, "Suresh Srinivas"  wrote:
>
> >Joe, thanks for the clarification.
> >
> >Regarding audits, sorry I might be misunderstanding your email.
> >Currently, if Kafka does not support audits, I think audits should be
> >considered as a separate effort. Here are the reasons:
> >- Audit, whether authorization is available or not, should record
> >operations to determine what is happening in the system. It should record
> >all the operations such as create, delete, consumption of topics along
> >with user information. It should work whether authorization is enabled or
> >not. In Hadoop long before we added real authorization, we had audit logs.
> >- Authorization will bring an additional element of who was denied. As
> >part of audit effort, it is important to add along with what operations
> >succeeded (and for whom), what operations were denied.
> >
> >From: Joe Stein 
> >Sent: Thursday, April 30, 2015 4:12 PM
> >To: dev@kafka.apache.org
> >Subject: Re: [VOTE] KIP-11- Authorization design for kafka security
> >
> >I kind of thought of the authorization module as something that happens in
> >handle(request: RequestChannel.Request) in the request.requestId match
> >
> >If the request doesn't do what it is allowed to, it should stop right
> >there. That "what it is allowed to do" is a true/false callback to the
> >loaded class, with one function to accept the data and some more about
> >what it is about (that we have access to).
> >
> >I think all of the other features are awesome but you can build them on
> >top
> >of this and then other can do the same.
> >
> >I am more hooked on the authorization module being a watch dog above
> >handle() than I am on the plug-in implementation options (less is more
> >imho).
> >
> >If we do this approach the audit fits in nice because we are seeing mor
>

Re: [DISCUSS] KIP-21 Configuration Management

2015-04-30 Thread Joel Koshy
>1. I have deep concerns about managing configuration in ZooKeeper.
>First, Producers and Consumers shouldn't depend on ZK at all, this seems
>to add back a dependency we are trying to get away from.

The KIP probably needs to be clarified here - I don't think Aditya was
referring to client (producer/consumer) configs. These are global
client-id-specific configs that need to be managed centrally.
(Specifically, quota overrides on a per-client basis).
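
For example, a per-client-id override could live under a well-known ZK path
along these lines (purely illustrative; the KIP does not pin down a layout):

/config/clients/<client-id>  ->  { "producer_byte_rate": "10485760" }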



Re: Review Request 33548: KAFKA-2068 Step I: merge in KAFKA-1841

2015-04-30 Thread Jun Rao

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33548/#review82224
---

Ship it!


Thanks for the patch. +1. Just a few minor comments below. Also, could you 
try using the following test code in 0.8.1 to see if v0 OffsetCommitRequest 
still works with a 0.8.3 broker?


package kafka

import kafka.utils.Logging
import kafka.consumer.SimpleConsumer
import kafka.common.{OffsetMetadataAndError, TopicAndPartition}
import kafka.api.{OffsetFetchRequest, OffsetCommitRequest}

object OffsetCommitMain extends Logging {

  def main(args: Array[String]): Unit = {
    val simpleConsumer = new SimpleConsumer("localhost", 9092, 100,
      64*1024, "test-client")

    val topic = "topic"

    // Commit an offset
    val topicAndPartition = TopicAndPartition(topic, 0)
    val commitRequest = OffsetCommitRequest("test-group",
      Map(topicAndPartition -> OffsetMetadataAndError(offset = 42L)))
    val commitResponse = simpleConsumer.commitOffsets(commitRequest)

    System.out.println("OffsetCommitResponse: " + commitResponse.toString())

    // Fetch it and verify
    val fetchRequest = OffsetFetchRequest("test-group", Seq(topicAndPartition))
    val fetchResponse = simpleConsumer.fetchOffsets(fetchRequest)

    System.out.println("OffsetFetchResponse: " + fetchResponse.toString())
  }
}


clients/src/main/java/org/apache/kafka/common/protocol/Protocol.java


It's probably clearer if we define OFFSET_FETCH_RESPONSE_V1.



core/src/test/scala/unit/kafka/server/OffsetCommitTest.scala


should not exist => Committed offset should not exist


- Jun Rao


On April 25, 2015, 7:59 a.m., Guozhang Wang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33548/
> ---
> 
> (Updated April 25, 2015, 7:59 a.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-2068
> https://issues.apache.org/jira/browse/KAFKA-2068
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> 1. Remove timestamp in partition-info for offset-commit-request.v0
> 2. Handle offset-commit-request.v0 by writing to ZK.
> 3. Add offset-fetch-request.v1 with the same format as 
> offset-fetch-request.v0, which expects the same version of 
> offset-fetch-response (v0).
> 4. Handle offset-fetch-request.v0 by reading from ZK.
> 5. Minor changes in unit tests
> 
> 
> Diffs
> -
> 
>   clients/src/main/java/org/apache/kafka/common/protocol/Protocol.java 
> d53fe45b9c5d5c873facd9696b1eacb67e812bca 
>   
> clients/src/main/java/org/apache/kafka/common/requests/OffsetCommitRequest.java
>  a0e19768ff400d74c87b592f6c25c96727d2 
>   core/src/main/scala/kafka/api/OffsetCommitRequest.scala 
> cf8e6acc426aef6eb19d862bf6a108a5fc37907a 
>   core/src/main/scala/kafka/api/OffsetFetchRequest.scala 
> 67811a752a470bf9bdbc8c5419e8d6e20a006169 
>   core/src/main/scala/kafka/common/OffsetMetadataAndError.scala 
> 139913f2a40a9afdf3baa7044af265afdebc1fda 
>   core/src/main/scala/kafka/server/KafkaApis.scala 
> b4004aa3a1456d337199aa1245fb0ae61f6add46 
>   core/src/test/scala/unit/kafka/api/RequestResponseSerializationTest.scala 
> dbf9f48fac0150bc2f1e655030c67c21bd160735 
>   core/src/test/scala/unit/kafka/server/OffsetCommitTest.scala 
> 652208a70f66045b854549d93cbbc2b77c24b10b 
> 
> Diff: https://reviews.apache.org/r/33548/diff/
> 
> 
> Testing
> ---
> 
> Unit tests.
> 
> 
> Thanks,
> 
> Guozhang Wang
> 
>



Re: Review Request 33731: KAFKA-2160

2015-04-30 Thread Onur Karaman


> On April 30, 2015, 10:45 p.m., Onur Karaman wrote:
> > core/src/main/scala/kafka/server/DelayedOperation.scala, line 224
> > 
> >
> > We can put the key inside Watchers and just use the watchersForKey 
> > that's already in the scope of the Watchers to avoid these call-by-name 
> > calls.

This approach still isn't right.

Say thread t1 calls purgatory.tryCompleteElseWatch(op, Seq(k)) and gets past the line: 
val watchers = watchersFor(key)
Another thread t2 then calls checkAndComplete(k), hits watchers.tryCompleteWatched, 
completes all of its items, and removes the watcher from watchersForKey.
t1 now calls watchers.watch(operation) on the watcher which has been removed 
from watchersForKey.
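
In other words, the problematic interleaving is (a simplified sketch, not the
actual DelayedOperationPurgatory code):

// t1: tryCompleteElseWatch(op, Seq(key))
//       val watchers = watchersFor(key)  // t1 now holds a Watchers reference
// t2: checkAndComplete(key)
//       watchers.tryCompleteWatched()    // completes every op watched so far
//       removeKey(key)                   // drops the Watchers from watchersForKey
// t1:   watchers.watch(op)               // op lands in an orphaned Watchers;
//                                        // no later checkAndComplete will see it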


- Onur


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33731/#review82217
---


On April 30, 2015, 10:20 p.m., Guozhang Wang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33731/
> ---
> 
> (Updated April 30, 2015, 10:20 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-2160
> https://issues.apache.org/jira/browse/KAFKA-2160
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> Remove key from purgatory watchers pool when its watcher list is empty
> 
> 
> Diffs
> -
> 
>   core/src/main/scala/kafka/server/DelayedOperation.scala 
> 2ed9b467c2865e5717d7f6fd933cd09a5c5b22c0 
>   core/src/main/scala/kafka/utils/timer/Timer.scala 
> b8cde820a770a4e894804f1c268b24b529940650 
>   core/src/test/scala/unit/kafka/log/LogConfigTest.scala 
> f3546adee490891e0d8d0214bef00b1dd7f42227 
>   core/src/test/scala/unit/kafka/server/DelayedOperationTest.scala 
> f3ab3f4ff8eb1aa6b2ab87ba75f72eceb6649620 
> 
> Diff: https://reviews.apache.org/r/33731/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Guozhang Wang
> 
>



Re: [DISCUSSION] java.io.Closeable in KAFKA-2121

2015-04-30 Thread Jay Kreps
Hey Jun,

I think the Closeable interface is what we've used elsewhere and what the
rest of the Java world uses. I don't think it is too hard for us to add the
override in our interface--implementors of the interface don't need to do
it.

-Jay

On Thu, Apr 30, 2015 at 4:02 PM, Jun Rao  wrote:

> That makes sense. Then, would it be better to have a KafkaClosable
> interface that doesn't throw an exception? This way, we don't need to override
> close in every implementing class.
>
> Thanks,
>
> Jun
>
> On Wed, Apr 29, 2015 at 10:36 AM, Steven Wu  wrote:
>
> > Jun,
> >
> > we still get the benefit of extending Closeable. e.g.
> Utils.closeQuietly()
> > can take FooSerializer as an argument. we can avoid the duplication of
> > boiler-plate code.
> >
> > class FooSerializer implements Serializer {
> >
> > @Override
> > public void close() {
> > // may throw unchecked RuntimeException
> > }
> > }
> >
> > final class Utils {
> > public static void closeQuietly(Closeable c, String name,
> > AtomicReference<Throwable> firstException) {
> > if (c != null) {
> > try {
> > c.close();
> > } catch (Throwable t) {
> > firstException.compareAndSet(null, t);
> > log.error("Failed to close " + name, t);
> > }
> > }
> > }
> > }
> >
> > On Wed, Apr 29, 2015 at 6:51 AM, Jun Rao  wrote:
> >
> > > If you do this, the code is no longer simple, which defeats the benefit
> > of
> > > extending Closeable. We can define our own Closeable that doesn't throw
> > > exceptions, but it may be confusing. So, it seems the original code is
> > > probably better.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Tue, Apr 28, 2015 at 3:11 PM, Steven Wu 
> wrote:
> > >
> > > > sorry for the previous empty msg.
> > > >
> > > > Jay's idea should work. basically, we override the close method in
> > > > Serializer interface.
> > > >
> > > > public interface Serializer<T> extends Closeable {
> > > > @Override
> > > > public void close();
> > > > }
> > > >
> > > > On Tue, Apr 28, 2015 at 1:10 PM, Steven Wu 
> > wrote:
> > > >
> > > > >
> > > > >
> > > > > On Tue, Apr 28, 2015 at 1:03 PM, Ewen Cheslack-Postava <
> > > > e...@confluent.io>
> > > > > wrote:
> > > > >
> > > > >> Good point Jay. More specifically we were already implementing
> > without
> > > > the
> > > > >> checked exception, we'd need to override close() in the Serializer
> > and
> > > > >> Deserializer interfaces and omit the throws clause. That
> definitely
> > > > makes
> > > > >> them source compatible. Not sure about binary compatibility, I
> > > couldn't
> > > > >> find a quick answer but I think it's probably still compatible.
> > > > >>
> > > > >> -Ewen
> > > > >>
> > > > >> On Tue, Apr 28, 2015 at 12:30 PM, Jay Kreps 
> > > > wrote:
> > > > >>
> > > > >> > Hey guys,
> > > > >> >
> > > > >> > You can implement Closeable without the checked exception. Having
> > > > close()
> > > > >> > methods throw checked exceptions isn't very useful unless there
> > is a
> > > > way
> > > > >> > for the caller to recover. In this case there really isn't,
> right?
> > > > >> >
> > > > >> > -Jay
> > > > >> >
> > > > >> > On Mon, Apr 27, 2015 at 5:51 PM, Guozhang Wang <
> > wangg...@gmail.com>
> > > > >> wrote:
> > > > >> >
> > > > >> > > Folks,
> > > > >> > >
> > > > >> > > In a recent commit I made regarding KAFKA-2121, there is an
> > > omitted
> > > > >> API
> > > > >> > > change which makes Serializer / Deserializer extend from
> > > > Closeable,
> > > > >> > > whose close() call could throw IOException by declaration.
> Hence
> > > now
> > > > >> some
> > > > >> > > scenario like:
> > > > >> > >
> > > > >> > > -
> > > > >> > >
> > > > >> > > Serializer keySerializer = ...
> > > > >> > > Serializer valueSerializer = ...
> > > > >> > > KafkaProducer producer = new KafkaProducer(config,
> > keySerializer,
> > > > >> > > valueSerializer)
> > > > >> > > // ...
> > > > >> > > keySerializer.close()
> > > > >> > > valueSerializer.close()
> > > > >> > >
> > > > >> > > -
> > > > >> > >
> > > > >> > > will need to catch IOException now.
> > > > >> > >
> > > > >> > > Want to bring this up for people's attention, and your opinion
> on
> > > > >> whether
> > > > >> > we
> > > > >> > > should revert this change?
> > > > >> > >
> > > > >> > > -- Guozhang
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >>
> > > > >>
> > > > >> --
> > > > >> Thanks,
> > > > >> Ewen
> > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>


Re: [VOTE] KIP-11- Authorization design for kafka security

2015-04-30 Thread Suresh Srinivas
Joe, thanks for the clarification.

Regarding audits, sorry, I might be misunderstanding your email. If Kafka does not 
currently support audits, I think audits should be considered as a separate effort. 
Here are the reasons:
- Audit, whether authorization is available or not, should record operations to 
determine what is happening in the system. It should record all operations, such 
as create, delete, and consumption of topics, along with user information. It 
should work whether authorization is enabled or not. In Hadoop, long before we 
added real authorization, we had audit logs.
- Authorization will bring an additional element of who was denied. As part of the 
audit effort, it is important to record, along with which operations succeeded (and 
for whom), which operations were denied.

From: Joe Stein 
Sent: Thursday, April 30, 2015 4:12 PM
To: dev@kafka.apache.org
Subject: Re: [VOTE] KIP-11- Authorization design for kafka security

I kind of thought of the authorization module as something that happens in
handle(request: RequestChannel.Request) in the request.requestId match.

If the request doesn't do what it is allowed to, it should stop right
there. That "what it is allowed to do" is a true/false callback to the
class loaded, with 1 function to accept the data and some more about what it
is about (that we have access to).

I think all of the other features are awesome, but you can build them on top
of this and then others can do the same.

I am more hooked on the authorization module being a watchdog above
handle() than I am on the plug-in implementation options (less is more
imho).

If we do this approach, the audit fits in nicely because we see more of
what happens in one place, with the access decision made right there.

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Thu, Apr 30, 2015 at 6:59 PM, Suresh Srinivas 
wrote:

> Joe,
>
> Can you add more details on what generalization looks like? Also is this a
> design issue or code issue?
>
> One more question. Does Kafka have audit capabilities today for topic
> creation, deletion, access etc.?
>
> Regards,
> Suresh
>
> Sent from phone
>
> _
> From: Joe Stein <joe.st...@stealth.ly>
> Sent: Thursday, April 30, 2015 3:27 PM
> Subject: Re: [VOTE] KIP-11- Authorization design for kafka security
> To: <dev@kafka.apache.org>
>
>
> Ok, I read through it all again a few times. I get the provider broker
> piece now.
>
> The configurations are still confusing if there are 2 or 3, and they should
> be called out more specifically than as a change to a class. Configs are a
> public interface; we should be a bit more explicit.
>
> Was there any discussion about any auditing component? How would anyone
> know whether the authorization plugin was running, when it ran, or what it was doing?
>
> If we can't audit the access then what good is controlling the access?
>
> I still don't see where all the command line configuration options come in.
> There are a lot of things to do with it but I'm not sure how to use it yet.
>
> This plug-in still feels like a very specific case and we should try to
> generalize it down some more to make it more straightforward for folks.
>
> ~ Joestein
>
> On Thu, Apr 30, 2015 at 3:51 PM, Parth Brahmbhatt <
> pbrahmbh...@hortonworks.com> wrote:
>
> > During the discussion Jun pointed out that mirror maker, which right now
> > does not copy any zookeeper config overrides, will now replicate topics
> > but will not replicate any acls. Given the authorizer interface exposes
> > the acl management apis, list/get/add/remove, we proposed that mirror
> > maker can just instantiate an instance of authorizer and call these apis
> > directly to get acls for a topic and add them to the destination cluster if
> > we want to add acls to be replicated as part of mirror maker.
> >
> > Thanks
> > Parth
> >
> > On 4/30/15, 12:43 PM, "Joe Stein" <joe.st...@stealth.ly> wrote:
> >
> > >Parth,
> > >
> > >Can you explain how ("Mirror maker will have to start using new acl
> > >management tool") and it not affect any other client. If you aren't
> > >changing the wire protocol then how do clients use it?
> > >
> > >~ Joe stein
> > >
> > >
> > >On Thu, Apr 30, 2015 at 3:15 PM, Parth Brahmbhatt <
> > >pbrahmbh...@hortonworks.com> wrote:
> > >
> > >> Hi Joe,
> > >>
> > >> Regarding open question: I changed the title to “Questions resolved
> > >>after
> > >> community discussions” let me know if you have a better name. I have a
> > >> question and a bullet point under each question describing the final
> > >> decision. Not sure how I can make it any cleaner, so I appreciate any
> > >> suggestion.
> > >>
> > >> Regarding system tests: I went through a bunch of KIP none of which
> > >> mentions what test cases will be added. Do you want to add a “How do
> you
> > >> plan to test” section

Re: [VOTE] KIP-11- Authorization design for kafka security

2015-04-30 Thread Joe Stein
I kind of thought of the authorization module as something that happens in
handle(request: RequestChannel.Request) in the request.requestId match.

If the request doesn't do what it is allowed to, it should stop right
there. That "what it is allowed to do" is a true/false callback to the
class loaded, with 1 function to accept the data and some more about what it
is about (that we have access to).

I think all of the other features are awesome, but you can build them on top
of this and then others can do the same.

I am more hooked on the authorization module being a watchdog above
handle() than I am on the plug-in implementation options (less is more
imho).

If we do this approach, the audit fits in nicely because we see more of
what happens in one place, with the access decision made right there.
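
A minimal sketch of that shape (illustrative only: Session, operationFor,
resourceFor and sendAuthorizationError are assumed names, not the KIP-11
interfaces):

def handle(request: RequestChannel.Request) {
  val session = Session(request.principal, request.host)
  if (!authorizer.authorize(session, operationFor(request.requestId), resourceFor(request)))
    sendAuthorizationError(request) // denied: stop right here, never reach the match
  else
    request.requestId match {
      case _ => // dispatch to the existing per-request handlers, unchanged
    }
}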

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Thu, Apr 30, 2015 at 6:59 PM, Suresh Srinivas 
wrote:

> Joe,
>
> Can you add more details on what generalization looks like? Also is this a
> design issue or code issue?
>
> One more question. Does Kafka have audit capabilities today for topic
> creation, deletion, access etc.?
>
> Regards,
> Suresh
>
> Sent from phone
>
> _
> From: Joe Stein <joe.st...@stealth.ly>
> Sent: Thursday, April 30, 2015 3:27 PM
> Subject: Re: [VOTE] KIP-11- Authorization design for kafka security
> To: <dev@kafka.apache.org>
>
>
> Ok, I read through it all again a few times. I get the provider broker
> piece now.
>
> The configurations are still confusing if there are 2 or 3, and they should
> be called out more specifically than as a change to a class. Configs are a
> public interface; we should be a bit more explicit.
>
> Was there any discussion about any auditing component? How would anyone
> know whether the authorization plugin was running, when it ran, or what it was doing?
>
> If we can't audit the access then what good is controlling the access?
>
> I still don't see where all the command line configuration options come in.
> There are a lot of things to do with it but I'm not sure how to use it yet.
>
> This plug-in still feels like a very specific case and we should try to
> generalize it down some more to make it more straightforward for folks.
>
> ~ Joestein
>
> On Thu, Apr 30, 2015 at 3:51 PM, Parth Brahmbhatt <
> pbrahmbh...@hortonworks.com> wrote:
>
> > During the discussion Jun pointed out that mirror maker, which right now
> > does not copy any zookeeper config overrides, will now replicate topics
> > but will not replicate any acls. Given the authorizer interface exposes
> > the acl management apis, list/get/add/remove, we proposed that mirror
> > maker can just instantiate an instance of authorizer and call these apis
> > directly to get acls for a topic and add them to the destination cluster if
> > we want to add acls to be replicated as part of mirror maker.
> >
> > Thanks
> > Parth
> >
> > On 4/30/15, 12:43 PM, "Joe Stein" <joe.st...@stealth.ly> wrote:
> >
> > >Parth,
> > >
> > >Can you explain how ("Mirror maker will have to start using new acl
> > >management tool") and it not affect any other client. If you aren't
> > >changing the wire protocol then how do clients use it?
> > >
> > >~ Joe stein
> > >
> > >
> > >On Thu, Apr 30, 2015 at 3:15 PM, Parth Brahmbhatt <
> > >pbrahmbh...@hortonworks.com> wrote:
> > >
> > >> Hi Joe,
> > >>
> > >> Regarding open question: I changed the title to “Questions resolved
> > >>after
> > >> community discussions” let me know if you have a better name. I have a
> > >> question and a bullet point under each question describing the final
> > >> decision. Not sure how I can make it any cleaner, so I appreciate any
> > >> suggestion.
> > >>
> > >> Regarding system tests: I went through a bunch of KIP none of which
> > >> mentions what test cases will be added. Do you want to add a “How do
> you
> > >> plan to test” section in the general KIP template, or do you think this is
> > >> just a special case where the test cases should be listed and
> discussed
> > >>as
> > >> part of KIP? I am not sure if KIP really is the right forum for this
> > >> discussion. This can easily be addressed during code review if people
> > >> think we don’t have enough test coverage.
> > >>
> > >> I am still not sure which part is not clear. The Scala exception is
> > >>added
> > >> for internal server-side representation. In the end all of our
> responses
> > >> always return just an error code for which we will add an
> > >> AuthorizationErrorCode mapped to AuthorizationException. The error code
> > >>itself
> > >> will not reveal any information other than the fact that you are
> > >>not
> > >> authorized to perform an operation on a resource and you will get this
> > >> error code even for non existent topics if no acls exist for those
> > >>topics.
> > >>
> > >> I can add a diagram if that makes things more clear

Re: [DISCUSSION] java.io.Closeable in KAFKA-2121

2015-04-30 Thread Jun Rao
That makes sense. Then, would it be better to have a KafkaClosable
interface that doesn't throw an exception? This way, we don't need to override
close in every implementing class.

Thanks,

Jun

On Wed, Apr 29, 2015 at 10:36 AM, Steven Wu  wrote:

> Jun,
>
> we still get the benefit of extending Closeable. e.g. Utils.closeQuietly()
> can take FooSerializer as an argument. we can avoid the duplication of
> boiler-plate code.
>
> class FooSerializer implements Serializer {
>
> @Override
> public void close() {
> // may throw unchecked RuntimeException
> }
> }
>
> final class Utils {
> public static void closeQuietly(Closeable c, String name,
> AtomicReference<Throwable> firstException) {
> if (c != null) {
> try {
> c.close();
> } catch (Throwable t) {
> firstException.compareAndSet(null, t);
> log.error("Failed to close " + name, t);
> }
> }
> }
> }
>
> On Wed, Apr 29, 2015 at 6:51 AM, Jun Rao  wrote:
>
> > If you do this, the code is no longer simple, which defeats the benefit
> of
> > extending Closeable. We can define our own Closeable that doesn't throw
> > exceptions, but it may be confusing. So, it seems the original code is
> > probably better.
> >
> > Thanks,
> >
> > Jun
> >
> > On Tue, Apr 28, 2015 at 3:11 PM, Steven Wu  wrote:
> >
> > > sorry for the previous empty msg.
> > >
> > > Jay's idea should work. basically, we override the close method in
> > > Serializer interface.
> > >
> > > public interface Serializer<T> extends Closeable {
> > > @Override
> > > public void close();
> > > }
> > >
> > > On Tue, Apr 28, 2015 at 1:10 PM, Steven Wu 
> wrote:
> > >
> > > >
> > > >
> > > > On Tue, Apr 28, 2015 at 1:03 PM, Ewen Cheslack-Postava <
> > > e...@confluent.io>
> > > > wrote:
> > > >
> > > >> Good point Jay. More specifically we were already implementing
> without
> > > the
> > > >> checked exception, we'd need to override close() in the Serializer
> and
> > > >> Deserializer interfaces and omit the throws clause. That definitely
> > > makes
> > > >> them source compatible. Not sure about binary compatibility, I
> > couldn't
> > > >> find a quick answer but I think it's probably still compatible.
> > > >>
> > > >> -Ewen
> > > >>
> > > >> On Tue, Apr 28, 2015 at 12:30 PM, Jay Kreps 
> > > wrote:
> > > >>
> > > >> > Hey guys,
> > > >> >
> > > >> > You can implement Closeable without the checked exception. Having
> > > close()
> > > >> > methods throw checked exceptions isn't very useful unless there
> is a
> > > way
> > > >> > for the caller to recover. In this case there really isn't, right?
> > > >> >
> > > >> > -Jay
> > > >> >
> > > >> > On Mon, Apr 27, 2015 at 5:51 PM, Guozhang Wang <
> wangg...@gmail.com>
> > > >> wrote:
> > > >> >
> > > >> > > Folks,
> > > >> > >
> > > >> > > In a recent commit I made regarding KAFKA-2121, there is an
> > omitted
> > > >> API
> > > >> > > change which makes Serializer / Deserializer extend from
> > > Closeable,
> > > >> > > whose close() call could throw IOException by declaration. Hence
> > now
> > > >> some
> > > >> > > scenario like:
> > > >> > >
> > > >> > > -
> > > >> > >
> > > >> > > Serializer keySerializer = ...
> > > >> > > Serializer valueSerializer = ...
> > > >> > > KafkaProducer producer = new KafkaProducer(config,
> keySerializer,
> > > >> > > valueSerializer)
> > > >> > > // ...
> > > >> > > keySerializer.close()
> > > >> > > valueSerializer.close()
> > > >> > >
> > > >> > > -
> > > >> > >
> > > >> > > will need to catch IOException now.
> > > >> > >
> > > >> > > Want to bring this up for people's attention, and your opinion on
> > > >> whether
> > > >> > we
> > > >> > > should revert this change?
> > > >> > >
> > > >> > > -- Guozhang
> > > >> > >
> > > >> >
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> Thanks,
> > > >> Ewen
> > > >>
> > > >
> > > >
> > >
> >
>


Re: [VOTE] KIP-11- Authorization design for kafka security

2015-04-30 Thread Gwen Shapira
Ah, I'm not talking about security by obscurity.

At least in the database world, if you don't have SELECT on a table, you
won't even see it when saying "show tables" because the very fact that the
table exists is privileged. In that case, a denied SELECT attempt will
return "table does not exist", and not "permission denied".
It is simply a question of what the privilege covers.

I was wondering if it is desirable to apply the same model to Kafka.

Gwen

On Thu, Apr 30, 2015 at 3:51 PM, Suresh Srinivas 
wrote:

> Comment on AuthorizationException. I think the intent of an exception should
> be to capture why a request is rejected. It is important from an API
> perspective to be specific, to aid debugging. Having a generic or obfuscated
> exception is not very useful. Does someone who gets an exception reach
> out to an admin to understand whether a topic exists or it's an authorization
> issue?
>
> I am not getting the security concern. The system must ensure access is
> disallowed by implementing the security correctly, not based on security by
> obscurity.
>
> Regards,
> Suresh
>
> Sent from phone
>
> _
> From: Gwen Shapira <gshap...@cloudera.com>
> Sent: Thursday, April 30, 2015 10:14 AM
> Subject: Re: [VOTE] KIP-11- Authorization design for kafka security
> To: <dev@kafka.apache.org>
>
>
> * Regarding additional authorizers:
> Prasad, who is a PMC member on Apache Sentry, reviewed the design and confirmed
> Sentry can integrate with the current APIs. Dapeng Sun, a committer on
> Sentry had some concerns about the IP privileges and how we prioritize
> privileges - but nothing that prevents Sentry from integrating with the
> existing solution, from what I could see. It seems to me that the design is
> very generic and adapters can be written for other authorization systems
> (after all, you just need to implement setACL, getACL and Authorize - all
> pretty basic), although I can't speak for Oracle's Identity Manager
> specifically.
>
> * Regarding "AuthorizationException to indicate that an operation was not
> authorized": Sorry I missed this in previous reviewed, but now that I look
> at it - Many systems intentionally don't return AuthorizationException when
> READ privilege is missing, since this already gives too much information
> (that the topic exists and that you don't have privileges on it). Instead
> they return a variant of "doesn't exist". I'm wondering if this approach is
> applicable / desirable for Kafka as well.
> Note that this doesn't remove the need for AuthorizationException - I'm
> just suggesting a possible refinement on its use.
>
> Gwen
>
>
>
> On Thu, Apr 30, 2015 at 9:52 AM, Parth Brahmbhatt <
> pbrahmbh...@hortonworks.com> wrote:
>
> > Hi Joe, Thanks for taking the time to review.
> >
> > * All the open issues already have a resolution; I can open a jira for
> > each one and add the resolution to it and resolve them immediately if you
> > want this for tracking purposes.
> > * We will update system tests to verify that the code works. We have
> > thorough unit tests for all the new code except for modifications made to
> > KafkaApis, as that has way too many dependencies to be mocked, which I guess
> > is the reason for no existing unit tests.
> > * I don’t know if I completely understand the concern. We have talked
> with
> > the Ranger team (Don Bosco Durai), so we at least have one custom authorizer
> > implementation that has approved this design and they will be able to
> > inject their authorization framework with current interfaces. Do you see
> > any issue with the design which will prevent anyone from providing a
> > custom implementation?
> > * Did not understand the concern around wire protocol, we are adding
> > AuthorizationException to indicate that an operation was not authorized.
> >
> > Thanks
> > Parth
> >
> > On 4/30/15, 5:59 AM, "Jun Rao" <j...@confluent.io>
> wrote:
> >
> > >Joe,
> > >
> > >Could you elaborate on why we should not store JSON in ZK? So far, all
> > >existing ZK data are in JSON.
> > >
> > >Thanks,
> > >
> > >Jun
> > >
> > >On Thu, Apr 30, 2015 at 2:06 AM, Joe Stein <joe.st...@stealth.ly> wrote:
> > >
> > >> Hi, sorry I am coming in late to chime back in on this thread and
> > >>haven't
> > >> been able to make the KIP hangouts the last few weeks. Sorry if any of
> > >>this
> > >> was brought up already or I missed it.
> > >>
> > >> I read through the KIP and the thread(s) and a couple of things jumped
> > >>out.
> > >>
> > >>
> > >>- Can we break out the open issues in JIRA (maybe during the
> hangout)
> > >>that are in the KIP and resolve/flesh those out more?
> > >>
> > >>
> > >>
> > >>- I don't see any updates with the system tests or how we can know
> > >>the
> > >>code works.
> > >>
> > >>
> > >>
> > >>- We need some implementation/example/sample that we know can work
> in
> > >>all different existing entitlement servers and not just ones that

Re: [VOTE] KIP-11- Authorization design for kafka security

2015-04-30 Thread Joe Stein
If you have bucket A and bucket B, and bucket A contains patients with
"Disease X" while bucket B contains patients without "Disease X".

Now you try to access "Alice" from bucket A and you get a 403, and then
from bucket B you get a 404.

What does that tell you now about Alice? Yup, she has "Disease X".

Uniform non-existence is a good policy for protecting data. If you don't
have permission, then 404 Not Found works too.
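
As a sketch, this policy is a single error-mapping rule (illustrative only:
topicExists, authorized, and the Read operation name are assumed helpers; the
error codes are Kafka's existing ones):

val errorCode =
  if (!topicExists(topic) || !authorized(session, Read, topic))
    ErrorMapping.UnknownTopicOrPartitionCode // same answer for missing and denied
  else
    ErrorMapping.NoError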

The context in which I thought that applied to this discussion is that I
thought the authorization module was going to be integrated a bit more
closely with where the API responses happen.

~ Joe Stein
- - - - - - - - - - - - - - - - -

  http://www.stealth.ly
- - - - - - - - - - - - - - - - -

On Thu, Apr 30, 2015 at 6:51 PM, Suresh Srinivas 
wrote:

> Comment on AuthorizationException. I think the intent of an exception should
> be to capture why a request is rejected. It is important from an API
> perspective to be specific, to aid debugging. Having a generic or obfuscated
> exception is not very useful. Does someone who gets an exception reach
> out to an admin to understand whether a topic exists or it's an authorization
> issue?
>
> I am not getting the security concern. The system must ensure access is
> disallowed by implementing the security correctly, not based on security by
> obscurity.
>
> Regards,
> Suresh
>
> Sent from phone
>
> _
> From: Gwen Shapira <gshap...@cloudera.com>
> Sent: Thursday, April 30, 2015 10:14 AM
> Subject: Re: [VOTE] KIP-11- Authorization design for kafka security
> To: <dev@kafka.apache.org>
>
>
> * Regarding additional authorizers:
> Prasad, who is a PMC member on Apache Sentry, reviewed the design and confirmed
> Sentry can integrate with the current APIs. Dapeng Sun, a committer on
> Sentry had some concerns about the IP privileges and how we prioritize
> privileges - but nothing that prevents Sentry from integrating with the
> existing solution, from what I could see. It seems to me that the design is
> very generic and adapters can be written for other authorization systems
> (after all, you just need to implement setACL, getACL and Authorize - all
> pretty basic), although I can't speak for Oracle's Identity Manager
> specifically.
>
> * Regarding "AuthorizationException to indicate that an operation was not
> authorized": Sorry I missed this in previous reviewed, but now that I look
> at it - Many systems intentionally don't return AuthorizationException when
> READ privilege is missing, since this already gives too much information
> (that the topic exists and that you don't have privileges on it). Instead
> they return a variant of "doesn't exist". I'm wondering if this approach is
> applicable / desirable for Kafka as well.
> Note that this doesn't remove the need for AuthorizationException - I'm
> just suggesting a possible refinement on its use.
>
> Gwen
>
>
>
> On Thu, Apr 30, 2015 at 9:52 AM, Parth Brahmbhatt <
> pbrahmbh...@hortonworks.com> wrote:
>
> > Hi Joe, Thanks for taking the time to review.
> >
> > * All the open issues already have a resolution; I can open a jira for
> > each one and add the resolution to it and resolve them immediately if you
> > want this for tracking purposes.
> > * We will update system tests to verify that the code works. We have
> > thorough unit tests for all the new code except for modifications made to
> > KafkaApis, as that has way too many dependencies to be mocked, which I guess
> > is the reason for no existing unit tests.
> > * I don’t know if I completely understand the concern. We have talked
> with
> > the Ranger team (Don Bosco Durai), so we at least have one custom authorizer
> > implementation that has approved this design and they will be able to
> > inject their authorization framework with current interfaces. Do you see
> > any issue with the design which will prevent anyone from providing a
> > custom implementation?
> > * Did not understand the concern around wire protocol, we are adding
> > AuthorizationException to indicate that an operation was not authorized.
> >
> > Thanks
> > Parth
> >
> > On 4/30/15, 5:59 AM, "Jun Rao" <j...@confluent.io>
> wrote:
> >
> > >Joe,
> > >
> > >Could you elaborate on why we should not store JSON in ZK? So far, all
> > >existing ZK data are in JSON.
> > >
> > >Thanks,
> > >
> > >Jun
> > >
> > >On Thu, Apr 30, 2015 at 2:06 AM, Joe Stein <joe.st...@stealth.ly> wrote:
> > >
> > >> Hi, sorry I am coming in late to chime back in on this thread and
> > >>haven't
> > >> been able to make the KIP hangouts the last few weeks. Sorry if any of
> > >>this
> > >> was brought up already or I missed it.
> > >>
> > >> I read through the KIP and the thread(s) and a couple of things jumped
> > >>out.
> > >>
> > >>
> > >>- Can we break out the open issues in JIRA (maybe during the
> hangout)
> > >>that are in the KIP and resolve/flesh those out more?
> > >>
> > >>
> > >>
> > >>- I don't see any upda

[jira] [Commented] (KAFKA-2132) Move Log4J appender to clients module

2015-04-30 Thread Ashish K Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522427#comment-14522427
 ] 

Ashish K Singh commented on KAFKA-2132:
---

Updated reviewboard https://reviews.apache.org/r/33614/
 against branch trunk

> Move Log4J appender to clients module
> -
>
> Key: KAFKA-2132
> URL: https://issues.apache.org/jira/browse/KAFKA-2132
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Gwen Shapira
>Assignee: Ashish K Singh
> Attachments: KAFKA-2132.patch, KAFKA-2132_2015-04-27_19:59:46.patch, 
> KAFKA-2132_2015-04-30_12:22:02.patch, KAFKA-2132_2015-04-30_15:53:17.patch
>
>
> Log4j appender is just a producer.
> Since we have a new producer in the clients module, no need to keep Log4J 
> appender in "core" and force people to package all of Kafka with their apps.
> Let's move the Log4jAppender to the clients module.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2132) Move Log4J appender to clients module

2015-04-30 Thread Ashish K Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish K Singh updated KAFKA-2132:
--
Attachment: KAFKA-2132_2015-04-30_15:53:17.patch

> Move Log4J appender to clients module
> -
>
> Key: KAFKA-2132
> URL: https://issues.apache.org/jira/browse/KAFKA-2132
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Gwen Shapira
>Assignee: Ashish K Singh
> Attachments: KAFKA-2132.patch, KAFKA-2132_2015-04-27_19:59:46.patch, 
> KAFKA-2132_2015-04-30_12:22:02.patch, KAFKA-2132_2015-04-30_15:53:17.patch
>
>
> Log4j appender is just a producer.
> Since we have a new producer in the clients module, no need to keep Log4J 
> appender in "core" and force people to package all of Kafka with their apps.
> Let's move the Log4jAppender to the clients module.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 33614: Patch for KAFKA-2132

2015-04-30 Thread Ashish Singh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33614/
---

(Updated April 30, 2015, 10:53 p.m.)


Review request for kafka.


Bugs: KAFKA-2132
https://issues.apache.org/jira/browse/KAFKA-2132


Repository: kafka


Description
---

KAFKA-2132: Move Log4J appender to clients module


Diffs (updated)
-

  build.gradle fef515b3b2276b1f861e7cc2e33e74c3ce5e405b 
  checkstyle/import-control.xml f2e6cec267e67ce8e261341e373718e14a8e8e03 
  core/src/main/scala/kafka/producer/KafkaLog4jAppender.scala 
5d36a019e3dbfb93737a9cd23404dcd1c5d836d1 
  core/src/test/scala/unit/kafka/log4j/KafkaLog4jAppenderTest.scala 
41366a14590d318fced0e83d6921d8035fa882da 
  
log4j-appender/src/main/java/org/apache/kafka/log4jappender/KafkaLog4jAppender.java
 PRE-CREATION 
  
log4j-appender/src/test/java/org/apache/kafka/log4jappender/KafkaLog4jAppenderTest.java
 PRE-CREATION 
  
log4j-appender/src/test/java/org/apache/kafka/log4jappender/MockKafkaLog4jAppender.java
 PRE-CREATION 
  settings.gradle 83f764e6a4a15a5fdba232dce74a369870f26b45 

Diff: https://reviews.apache.org/r/33614/diff/


Testing
---


Thanks,

Ashish Singh



Re: [VOTE] KIP-11- Authorization design for kafka security

2015-04-30 Thread Suresh Srinivas
Comment on AuthorizationException. I think the intent of an exception should be to 
capture why a request is rejected. It is important from an API perspective to be 
specific, to aid debugging. Having a generic or obfuscated exception is not very 
useful. Does someone who gets an exception reach out to an admin to 
understand whether a topic exists or it's an authorization issue?

I am not getting the security concern. The system must ensure access is disallowed 
by implementing the security correctly, not based on security by 
obscurity.

Regards,
Suresh

Sent from phone

_
From: Gwen Shapira <gshap...@cloudera.com>
Sent: Thursday, April 30, 2015 10:14 AM
Subject: Re: [VOTE] KIP-11- Authorization design for kafka security
To: <dev@kafka.apache.org>


* Regarding additional authorizers:
Prasad, who is a PMC member on Apache Sentry, reviewed the design and confirmed
Sentry can integrate with the current APIs. Dapeng Sun, a committer on
Sentry had some concerns about the IP privileges and how we prioritize
privileges - but nothing that prevents Sentry from integrating with the
existing solution, from what I could see. It seems to me that the design is
very generic and adapters can be written for other authorization systems
(after all, you just need to implement setACL, getACL and Authorize - all
pretty basic), although I can't speak for Oracle's Identity Manager
specifically.

* Regarding "AuthorizationException to indicate that an operation was not
authorized": Sorry I missed this in previous reviewed, but now that I look
at it - Many systems intentionally don't return AuthorizationException when
READ privilege is missing, since this already gives too much information
(that the topic exists and that you don't have privileges on it). Instead
they return a variant of "doesn't exist". I'm wondering if this approach is
applicable / desirable for Kafka as well.
Note that this doesn't remove the need for AuthorizationException - I'm
just suggesting a possible refinement on its use.

Gwen



On Thu, Apr 30, 2015 at 9:52 AM, Parth Brahmbhatt <
pbrahmbh...@hortonworks.com> wrote:

> Hi Joe, Thanks for taking the time to review.
>
> * All the open issues already have a resolution; I can open a jira for
> each one and add the resolution to it and resolve them immediately if you
> want this for tracking purposes.
> * We will update system tests to verify that the code works. We have
> thorough unit tests for all the new code except for modifications made to
> KafkaApis, as that has way too many dependencies to be mocked, which I guess
> is the reason for no existing unit tests.
> * I don’t know if I completely understand the concern. We have talked with
> the Ranger team (Don Bosco Durai), so we at least have one custom authorizer
> implementation that has approved this design and they will be able to
> inject their authorization framework with current interfaces. Do you see
> any issue with the design which will prevent anyone from providing a
> custom implementation?
> * Did not understand the concern around wire protocol, we are adding
> AuthorizationException to indicate that an operation was not authorized.
>
> Thanks
> Parth
>
> On 4/30/15, 5:59 AM, "Jun Rao" <j...@confluent.io>
> wrote:
>
> >Joe,
> >
> >Could you elaborate on why we should not store JSON in ZK? So far, all
> >existing ZK data are in JSON.
> >
> >Thanks,
> >
> >Jun
> >
> >On Thu, Apr 30, 2015 at 2:06 AM, Joe Stein <joe.st...@stealth.ly> wrote:
> >
> >> Hi, sorry I am coming in late to chime back in on this thread and
> >>haven't
> >> been able to make the KIP hangouts the last few weeks. Sorry if any of
> >>this
> >> was brought up already or I missed it.
> >>
> >> I read through the KIP and the thread(s) and a couple of things jumped
> >>out.
> >>
> >>
> >>- Can we break out the open issues in JIRA (maybe during the hangout)
> >>that are in the KIP and resolve/flesh those out more?
> >>
> >>
> >>
> >>- I don't see any updates with the system tests or how we can know
> >>the
> >>code works.
> >>
> >>
> >>
> >>- We need some implementation/example/sample that we know can work in
> >>all different existing entitlement servers and not just ones that
> >>run in
> >>types of data centers too. I am not saying we should support
> >>everything
> >> but
> >>if someone had to implement
> >>https://docs.oracle.com/cd/E19225-01/820-6551/bzafm/index.html with
> >>Kafka it has to work for them out of the box.
> >>
> >>
> >>
> >>- We should shy away from storing JSON in Zookeeper. Let's store
> >>bytes in
> >>storage.
> >>
> >>
> >>
> >>- We should spend some time thinking through exceptions in the wire
> >>protocol maybe as part of this so it can keep moving forward.
> >>
> >>
> >> ~ Joe Stein
> >>
> >> On Tue, Apr 28, 2015 at 3:33 AM, Sun, Dapeng <dapeng@intel.com>
> >>wrote:
> >>
> >> > Thank you for your reply, Gwen

Re: Review Request 33731: KAFKA-2160

2015-04-30 Thread Onur Karaman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33731/#review82217
---



core/src/main/scala/kafka/server/DelayedOperation.scala


We can put the key inside Watchers and just use the watchersForKey that's 
already in the scope of the Watchers to avoid these call-by-name calls.
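
A sketch of that refactor (illustrative only; not the posted patch):

private class Watchers(val key: Any) {
  private val operations = new java.util.LinkedList[T]() // T = the purgatory's operation type
  // with the key stored here, removal can use the enclosing purgatory's
  // watchersForKey pool directly instead of a call-by-name argument
  def removeKeyIfEmpty(): Unit = synchronized {
    if (operations.isEmpty)
      watchersForKey.remove(key)
  }
}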


- Onur Karaman


On April 30, 2015, 10:20 p.m., Guozhang Wang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33731/
> ---
> 
> (Updated April 30, 2015, 10:20 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-2160
> https://issues.apache.org/jira/browse/KAFKA-2160
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> Remove key from purgatory watchers pool when its watcher list is empty
> 
> 
> Diffs
> -
> 
>   core/src/main/scala/kafka/server/DelayedOperation.scala 
> 2ed9b467c2865e5717d7f6fd933cd09a5c5b22c0 
>   core/src/main/scala/kafka/utils/timer/Timer.scala 
> b8cde820a770a4e894804f1c268b24b529940650 
>   core/src/test/scala/unit/kafka/log/LogConfigTest.scala 
> f3546adee490891e0d8d0214bef00b1dd7f42227 
>   core/src/test/scala/unit/kafka/server/DelayedOperationTest.scala 
> f3ab3f4ff8eb1aa6b2ab87ba75f72eceb6649620 
> 
> Diff: https://reviews.apache.org/r/33731/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Guozhang Wang
> 
>



Re: [VOTE] KIP-11- Authorization design for kafka security

2015-04-30 Thread Joe Stein
Ok, I read through it all again a few times. I get the provider broker
piece now.

The configurations are still confusing if there are 2 or 3, and they should
be called out more specifically than as a change to a class. Configs are a
public interface; we should be a bit more explicit.

Was there any discussion about any auditing component? How would anyone
know whether the authorization plugin was running, when it ran, or what it was doing?

If we can't audit the access then what good is controlling the access?

I still don't see where all the command line configuration options come in.
There are a lot of things to do with it but I'm not sure how to use it yet.

This plug-in still feels like a very specific case and we should try to
generalize it down some more to make it more straightforward for folks.

~ Joestein

On Thu, Apr 30, 2015 at 3:51 PM, Parth Brahmbhatt <
pbrahmbh...@hortonworks.com> wrote:

> During the discussion Jun pointed out that mirror maker, which right now
> does not copy any zookeeper config overrides, will now replicate topics
> but will not replicate any acls. Given the authorizer interface exposes
> > the acl management apis, list/get/add/remove, we proposed that mirror
> > maker can just instantiate an instance of authorizer and call these apis
> > directly to get acls for a topic and add them to the destination cluster if
> we want to add acls to be replicated as part of mirror maker.
>
> Thanks
> Parth
>
> On 4/30/15, 12:43 PM, "Joe Stein"  wrote:
>
> >Parth,
> >
> >Can you explain how ("Mirror maker will have to start using new acl
> >management tool") and it not affect any other client. If you aren't
> >changing the wire protocol then how do clients use it?
> >
> >~ Joe stein
> >
> >
> >On Thu, Apr 30, 2015 at 3:15 PM, Parth Brahmbhatt <
> >pbrahmbh...@hortonworks.com> wrote:
> >
> >> Hi Joe,
> >>
> >> Regarding open question: I changed the title to “Questions resolved
> >>after
> >> community discussions” let me know if you have a better name. I have a
> >> question and a bullet point under each question describing the final
> >> decision. Not sure how I can make it any cleaner, so I appreciate any
> >> suggestion.
> >>
> >> Regarding system tests: I went through a bunch of KIP none of which
> >> mentions what test cases will be added. Do you want to add a “How do you
> >> plan to test” section in the general KIP template, or do you think this is
> >> just a special case where the test cases should be listed and discussed
> >>as
> >> part of KIP? I am not sure if KIP really is the right forum for this
> >> discussion. This can easily be addressed during code review if people
> >> think we don’t have enough test coverage.
> >>
> >> I am still not sure which part is not clear. The Scala exception is
> >>added
> >> for internal server-side representation. In the end all of our responses
> >> always return just an error code for which we will add an
> >> AuthorizationErrorCode mapped to AuthorizationException. The error code
> >>itself
> >> will not reveal any information other than the fact that you are
> >>not
> >> authorized to perform an operation on a resource and you will get this
> >> error code even for non existent topics if no acls exist for those
> >>topics.
> >>
> >> I can add a diagram if that makes things more clear, I am not convinced
> >> its needed given we have come so far without it. Essentially there are 3
> >> steps
> >> * users use the acl cli to add acls to their
> >>topics/groups/cluster
> >> * brokers start with a broker config that specifies what
> >>authorizer
> >>implementation to use.
> >> * every api request first goes through the authorizer and fails
> >>if
> >> authorizer denies it. (authorizer implementation described in the doc
> >>with
> >> pseudo code)
> >>
> >> Note: Authentication/Wire Encryption is a separate piece and is being
> >> discussed actively in another KIP if that is the detail you are looking
> >> for.
> >>
> >> I think the description under this section
> >>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-11+-+Authorization+
> >>In
> >> terface#KIP-11-AuthorizationInterface-DataFlows captures the internal
> >> details.
> >>
> >> Thanks
> >> Parth
> >>
> >> On 4/30/15, 11:24 AM, "Joe Stein"  wrote:
> >>
> >> >Gwen << regarding additional authorizers
> >> >
> >> >I think having these in the system tests doubles as both good confidence
> >> >in
> >> >the language independence of the changes. It also makes sure that when we
> >> >release that we don't go breaking Sentry or Ranger or anyone else that
> >> >wants to integrate.
> >> >
> >> >Gwen << Regarding "AuthorizationException
> >> >
> >> >Yeah so I have two issues. The one you raised yes, 100%. Also I don't
> >> >understand how that is not a broker wire protocol response and only a
> >>JVM
> >> >exception.
> >> >
> >> >Jun << Could you elaborate on why we should not store JSON in ZK? So
> >>far,
> >> >all existing ZK data are in JSON.
> >> >
> >> >If I have 1,000,000 users in LDAP and 150 ge

[jira] [Commented] (KAFKA-2160) DelayedOperationPurgatory should remove the pair in watchersForKey with empty watchers list

2015-04-30 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522384#comment-14522384
 ] 

Guozhang Wang commented on KAFKA-2160:
--

Updated reviewboard https://reviews.apache.org/r/33731/diff/
 against branch origin/trunk

> DelayedOperationPurgatory should remove the pair in watchersForKey with empty 
> watchers list
> ---
>
> Key: KAFKA-2160
> URL: https://issues.apache.org/jira/browse/KAFKA-2160
> Project: Kafka
>  Issue Type: Bug
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
> Attachments: KAFKA-2160.patch, KAFKA-2160_2015-04-30_15:20:14.patch
>
>
> With purgatory usage in consumer coordinator, it will be common that watcher 
> lists are very short and live only for a short time. So we'd better clean 
> them from the watchersForKey Pool once the list becomes empty in 
> checkAndComplete() calls. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2160) DelayedOperationPurgatory should remove the pair in watchersForKey with empty watchers list

2015-04-30 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-2160:
-
Attachment: KAFKA-2160_2015-04-30_15:20:14.patch

> DelayedOperationPurgatory should remove the pair in watchersForKey with empty 
> watchers list
> ---
>
> Key: KAFKA-2160
> URL: https://issues.apache.org/jira/browse/KAFKA-2160
> Project: Kafka
>  Issue Type: Bug
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
> Attachments: KAFKA-2160.patch, KAFKA-2160_2015-04-30_15:20:14.patch
>
>
> With purgatory usage in consumer coordinator, it will be common that watcher 
> lists are very short and live only for a short time. So we'd better clean 
> them from the watchersForKey Pool once the list becomes empty in 
> checkAndComplete() calls. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 33731: KAFKA-2160

2015-04-30 Thread Guozhang Wang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33731/
---

(Updated April 30, 2015, 10:20 p.m.)


Review request for kafka.


Bugs: KAFKA-2160
https://issues.apache.org/jira/browse/KAFKA-2160


Repository: kafka


Description
---

Remove key from purgatory watchers pool when its watcher list is empty


Diffs (updated)
-

  core/src/main/scala/kafka/server/DelayedOperation.scala 
2ed9b467c2865e5717d7f6fd933cd09a5c5b22c0 
  core/src/main/scala/kafka/utils/timer/Timer.scala 
b8cde820a770a4e894804f1c268b24b529940650 
  core/src/test/scala/unit/kafka/log/LogConfigTest.scala 
f3546adee490891e0d8d0214bef00b1dd7f42227 
  core/src/test/scala/unit/kafka/server/DelayedOperationTest.scala 
f3ab3f4ff8eb1aa6b2ab87ba75f72eceb6649620 

Diff: https://reviews.apache.org/r/33731/diff/


Testing
---


Thanks,

Guozhang Wang



Re: Review Request 33731: KAFKA-2160

2015-04-30 Thread Onur Karaman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33731/#review82204
---



core/src/main/scala/kafka/server/DelayedOperation.scala


It's possible for someone to add a watcher for that key in-between the 
condition and the remove, resulting in us accidentally removing the Watchers 
for that key.
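
Sketched (illustrative only), the window is between the emptiness check and the
remove:

// t1: cleanup after a checkAndComplete pass
if (watchers.isEmpty) {
  // t2: tryCompleteElseWatch adds a new operation to this same Watchers here
  watchersForKey.remove(key) // t1 drops the entry; t2's operation is now orphaned
}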


- Onur Karaman


On April 30, 2015, 9:03 p.m., Guozhang Wang wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33731/
> ---
> 
> (Updated April 30, 2015, 9:03 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-2160
> https://issues.apache.org/jira/browse/KAFKA-2160
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> Remove key from purgatory watchers pool when its watcher list is empty
> 
> 
> Diffs
> -
> 
>   core/src/main/scala/kafka/server/DelayedOperation.scala 
> 2ed9b467c2865e5717d7f6fd933cd09a5c5b22c0 
>   core/src/main/scala/kafka/utils/timer/Timer.scala 
> b8cde820a770a4e894804f1c268b24b529940650 
>   core/src/test/scala/unit/kafka/log/LogConfigTest.scala 
> f3546adee490891e0d8d0214bef00b1dd7f42227 
>   core/src/test/scala/unit/kafka/server/DelayedOperationTest.scala 
> f3ab3f4ff8eb1aa6b2ab87ba75f72eceb6649620 
> 
> Diff: https://reviews.apache.org/r/33731/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Guozhang Wang
> 
>



[jira] [Commented] (KAFKA-2161) Fix a few copyrights

2015-04-30 Thread Joel Koshy (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522285#comment-14522285
 ] 

Joel Koshy commented on KAFKA-2161:
---

We did this before and removed it in KAFKA-1158. [~joestein] do you remember why?

> Fix a few copyrights
> 
>
> Key: KAFKA-2161
> URL: https://issues.apache.org/jira/browse/KAFKA-2161
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
>Priority: Trivial
> Attachments: KAFKA-2161.patch
>
>
> I noticed that I accidentally let some incorrect copyright headers slip in 
> with the KAFKA-1501 patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2161) Fix a few copyrights

2015-04-30 Thread Aditya Auradkar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522273#comment-14522273
 ] 

Aditya Auradkar commented on KAFKA-2161:


+1 on using Rat. It doesn't seem to have a built-in gradle plugin though.
https://issues.apache.org/jira/browse/RAT-163

We can do what samza did: https://reviews.apache.org/r/23300/

> Fix a few copyrights
> 
>
> Key: KAFKA-2161
> URL: https://issues.apache.org/jira/browse/KAFKA-2161
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
>Priority: Trivial
> Attachments: KAFKA-2161.patch
>
>
> I noticed that I accidentally let some incorrect copyright headers slip in 
> with the KAFKA-1501 patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2160) DelayedOperationPurgatory should remove the pair in watchersForKey with empty watchers list

2015-04-30 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-2160:
-
Status: Patch Available  (was: Open)

> DelayedOperationPurgatory should remove the pair in watchersForKey with empty 
> watchers list
> ---
>
> Key: KAFKA-2160
> URL: https://issues.apache.org/jira/browse/KAFKA-2160
> Project: Kafka
>  Issue Type: Bug
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
> Attachments: KAFKA-2160.patch
>
>
> With purgatory usage in consumer coordinator, it will be common that watcher 
> lists are very short and live only for a short time. So we'd better clean 
> them from the watchersForKey Pool once the list becomes empty in 
> checkAndComplete() calls. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2160) DelayedOperationPurgatory should remove the pair in watchersForKey with empty watchers list

2015-04-30 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-2160:
-
Attachment: KAFKA-2160.patch

> DelayedOperationPurgatory should remove the pair in watchersForKey with empty 
> watchers list
> ---
>
> Key: KAFKA-2160
> URL: https://issues.apache.org/jira/browse/KAFKA-2160
> Project: Kafka
>  Issue Type: Bug
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
> Attachments: KAFKA-2160.patch
>
>
> With purgatory usage in consumer coordinator, it will be common that watcher 
> lists are very short and live only for a short time. So we'd better clean 
> them from the watchersForKey Pool once the list becomes empty in 
> checkAndComplete() calls. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2160) DelayedOperationPurgatory should remove the pair in watchersForKey with empty watchers list

2015-04-30 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522266#comment-14522266
 ] 

Guozhang Wang commented on KAFKA-2160:
--

Created reviewboard https://reviews.apache.org/r/33731/diff/
 against branch origin/trunk

> DelayedOperationPurgatory should remove the pair in watchersForKey with empty 
> watchers list
> ---
>
> Key: KAFKA-2160
> URL: https://issues.apache.org/jira/browse/KAFKA-2160
> Project: Kafka
>  Issue Type: Bug
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
> Attachments: KAFKA-2160.patch
>
>
> With purgatory usage in consumer coordinator, it will be common that watcher 
> lists are very short and live only for a short time. So we'd better clean 
> them from the watchersForKey Pool once the list becomes empty in 
> checkAndComplete() calls. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Review Request 33731: KAFKA-2160

2015-04-30 Thread Guozhang Wang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33731/
---

Review request for kafka.


Bugs: KAFKA-2160
https://issues.apache.org/jira/browse/KAFKA-2160


Repository: kafka


Description
---

Remove key from purgatory watchers pool when its watcher list is empty


Diffs
-

  core/src/main/scala/kafka/server/DelayedOperation.scala 
2ed9b467c2865e5717d7f6fd933cd09a5c5b22c0 
  core/src/main/scala/kafka/utils/timer/Timer.scala 
b8cde820a770a4e894804f1c268b24b529940650 
  core/src/test/scala/unit/kafka/log/LogConfigTest.scala 
f3546adee490891e0d8d0214bef00b1dd7f42227 
  core/src/test/scala/unit/kafka/server/DelayedOperationTest.scala 
f3ab3f4ff8eb1aa6b2ab87ba75f72eceb6649620 

Diff: https://reviews.apache.org/r/33731/diff/


Testing
---


Thanks,

Guozhang Wang



Re: [VOTE] KIP-11- Authorization design for kafka security

2015-04-30 Thread Parth Brahmbhatt
During the discussion Jun pointed out that mirror maker, which right now
does not copy any zookeeper config overrides, will now replicate topics
but will not replicate any acls. Given the authorizer interface exposes
the acl management apis, list/get/add/remove, we proposed that mirror
maker can just instantiate an instance of authorizer and call these apis
directly to get acls for a topic and add them to the destination cluster if
we want to add acls to be replicated as part of mirror maker.
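
A sketch of that flow (illustrative only; the Authorizer, Resource, and Acl names
follow the interface described above, but the exact signatures here are
assumptions):

def replicateTopicAcls(src: Authorizer, dest: Authorizer, topicName: String): Unit = {
  val resource = Resource(Topic, topicName) // the topic in both clusters
  // read the acls via the source cluster's authorizer, then add them to the
  // destination cluster's authorizer through the same management api
  dest.addAcls(src.getAcls(resource), resource)
}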

Thanks
Parth

On 4/30/15, 12:43 PM, "Joe Stein"  wrote:

>Parth,
>
>Can you explain how ("Mirror maker will have to start using new acl
>management tool") and it not affect any other client. If you aren't
>changing the wire protocol then how do clients use it?
>
>~ Joe stein
>
>
>On Thu, Apr 30, 2015 at 3:15 PM, Parth Brahmbhatt <
>pbrahmbh...@hortonworks.com> wrote:
>
>> Hi Joe,
>>
>> Regarding open question: I changed the title to “Questions resolved
>>after
>> community discussions” let me know if you have a better name. I have a
>> question and a bullet point under each question describing the final
>> decision. Not sure how I can make it any cleaner, so I appreciate any
>> suggestion.
>>
>> Regarding system tests: I went through a bunch of KIP none of which
>> mentions what test cases will be added. Do you want to add a “How do you
>> plan to test” section in the general KIP template, or do you think this is
>> just a special case where the test cases should be listed and discussed
>>as
>> part of KIP? I am not sure if KIP really is the right forum for this
>> discussion. This can easily be addressed during code review if people
>> think we don’t have enough test coverage.
>>
>> I am still not sure which part is not clear. The Scala exception is
>>added
>> for internal server-side representation. In the end all of our responses
>> always return just an error code for which we will add an
>> AuthorizationErrorCode mapped to AuthorizationException. The error code
>>itself
>> will not reveal any information other than the fact that you are
>>not
>> authorized to perform an operation on a resource and you will get this
>> error code even for non existent topics if no acls exist for those
>>topics.
>>
>> I can add a diagram if that makes things more clear, I am not convinced
>> its needed given we have come so far without it. Essentially there are 3
>> steps
>> * users use the acl cli to add acls to their
>>topics/groups/cluster
>> * brokers start with a broker config that specifies what
>>authorizer
>>implementation to use.
>> * every api request first goes through the authorizer and fails
>>if
>> authorizer denies it. (authorizer implementation described in the doc
>>with
>> pseudo code)
>>
>> Note: Authentication/Wire Encryption is a separate piece and is being
>> discussed actively in another KIP if that is the detail you are looking
>> for.
>>
>> I think the description under this section
>> 
>>https://cwiki.apache.org/confluence/display/KAFKA/KIP-11+-+Authorization+
>>In
>> terface#KIP-11-AuthorizationInterface-DataFlows captures the internal
>> details.
>>
>> Thanks
>> Parth
>>
>> On 4/30/15, 11:24 AM, "Joe Stein"  wrote:
>>
>> >Gwen << regarding additional authorizers
>> >
>> >I think having these in the system tests doubles as both good confidence
>> >in the language independence of the changes. It also makes sure that when
>> >we release we don't go breaking Sentry or Ranger or anyone else that
>> >wants to integrate.
>> >
>> >Gwen << Regarding "AuthorizationException"
>> >
>> >Yeah so I have two issues. The one you raised yes, 100%. Also I don't
>> >understand how that is not a broker wire protocol response and only a
>> >JVM exception.
>> >
>> >Jun << Could you elaborate on why we should not store JSON in ZK? So
>>far,
>> >all existing ZK data are in JSON.
>> >
>> >If I have 1,000,000 users in LDAP and 150 get access to Kafka topics
>> >through this mechanism then I have to go and parse and push all of my
>> >changes into zookeeper for it to take effect?
>> >
>> >If someone wanted to implement SAML I don't think this would work. Not
>> >sure
>> >how it would work with NiFi either (something around here I think maybe
>> >https://git-wip-us.apache.org/repos/asf?p=incubator-nifi.git;a=blob;f=nar-bundles/framework-bundle/framework/web/web-security/src/main/java/org/apache/nifi/web/security/authorization/NiFiAuthorizationService.java;hb=e67eb4f5).
>> >
>> >Parth << All the open issues already have a resolution; I can open a
>> >jira for each one and add the resolution to it and resolve them
>> >immediately if you want this for tracking purposes.
>> >
>> >Are those inline to the question with the   I didn't quite get
>> >that section at all. If the open questions are answered then they
>> >aren't open; can you tidy that up then?
>> >
>> >Parth <<  We will update system tests to verify that the code works. We
>> >have thorough unit tests for all the new code 

Re: [VOTE] KIP-11- Authorization design for kafka security

2015-04-30 Thread Parth Brahmbhatt
I am sorry but I am having a hard time understanding the core concern with
json storage in zookeeper.

* If you are concerned that in order to support only 150 users out of a
million we will have to add a huge json with all 1 million users, that is a
misunderstanding. By default anyone who does not have an allow acl gets
rejected, so the admin only needs to add one acl “Allow <principal>
to perform operation <operation> on resource <resource>”
and only those 150 users will be allowed; everyone else will be denied
access without having to define any explicit acls (see the sketch after
this list).
* If the admin actually wanted to add access for a million users, I
imagine a sane admin would create a user group and add an allow acl for
that group.
* If you are worried about json utf-8 bytes being too much storage
overhead, I can add compression; we reduce the ability to debug because an
admin can’t just go to zookeeper and make sense out of the acls, but they
still have the clis, so that should be fine.
* If you are concerned about json parsing, given the authorizer is going
to cache acls on its end this should be relatively infrequent.
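To make the default-deny point concrete, here is a rough sketch of the
behaviour described in the first bullet; the Acl shape and names are made
up for illustration and are not the KIP's actual classes:

    // Hypothetical default-deny check: if no allow acl matches, the request
    // is rejected, so nothing needs to be stored for the other 999,850 users.
    object DefaultDenySketch {
      case class Acl(principal: String, operation: String, resource: String)

      def authorize(principal: String, operation: String, resource: String,
                    acls: Set[Acl]): Boolean =
        acls.exists(a => a.principal == principal &&
                         a.operation == operation &&
                         a.resource == resource)  // no match => denied
    }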

Let me know if you were pointing at something completely different.

Thanks
Parth 

On 4/30/15, 12:15 PM, "Parth Brahmbhatt" 
wrote:

>Hi Joe,
>
>Regarding open question: I changed the title to “Questions resolved after
>community discussions”; let me know if you have a better name. I have a
>question and a bullet point under each question describing the final
>decision. Not sure how I can make it any cleaner, so I'd appreciate any
>suggestion.
>
>Regarding system tests: I went through a bunch of KIPs, none of which
>mentions what test cases will be added. Do you want to add a “How do you
>plan to test” section in the general KIP template, or do you think this is
>just a special case where the test cases should be listed and discussed as
>part of the KIP? I am not sure if a KIP really is the right forum for this
>discussion. This can easily be addressed during code review if people
>think we don’t have enough test coverage.
>
>I am still not sure which part is not clear. The scala exception is added
>for internal server side representation. In the end all of our responses
>always return just an error code, for which we will add an
>AuthorizationErrorCode mapped to AuthorizationException. The error code
>itself will not reveal any information other than the fact that you are not
>authorized to perform an operation on a resource, and you will get this
>error code even for non existent topics if no acls exist for those topics.
>
>I can add a diagram if that makes things more clear; I am not convinced
>it's needed given we have come so far without it. Essentially there are 3
>steps:
>   * users use the acl cli to add acls to their topics/groups/cluster
>   * brokers start with a broker config that specifies what authorizer
>implementation to use.
>   * every api request first goes through the authorizer and fails if
>the authorizer denies it. (authorizer implementation described in the doc
>with pseudo code)
>
>Note: Authentication/Wire Encryption is a separate piece and is being
>discussed actively in another KIP if that is the detail you are looking
>for.
>
>I think the description under this section
>https://cwiki.apache.org/confluence/display/KAFKA/KIP-11+-+Authorization+Interface#KIP-11-AuthorizationInterface-DataFlows
>captures the internal details.
>
>Thanks
>Parth
>
>On 4/30/15, 11:24 AM, "Joe Stein"  wrote:
>
>>Gwen << regarding additional authorizers
>>
>>I think having these in the system tests doubles as both good confidence in
>>the language independence of the changes. It also makes sure that when we
>>release we don't go breaking Sentry or Ranger or anyone else that
>>wants to integrate.
>>
>>Gwen << Regarding "AuthorizationException"
>>
>>Yeah so I have two issues. The one you raised yes, 100%. Also I don't
>>understand how that is not a broker wire protocol response and only a JVM
>>exception.
>>
>>Jun << Could you elaborate on why we should not store JSON in ZK? So far,
>>all existing ZK data are in JSON.
>>
>>If I have 1,000,000 users in LDAP and 150 get access to Kafka topics
>>through this mechanism then I have to go and parse and push all of my
>>changes into zookeeper for it to take effect?
>>
>>If someone wanted to implement SAML I don't think this would work. Not
>>sure
>>how it would work with NiFi either (something around here I think maybe
>>https://git-wip-us.apache.org/repos/asf?p=incubator-nifi.git;a=blob;f=nar-bundles/framework-bundle/framework/web/web-security/src/main/java/org/apache/nifi/web/security/authorization/NiFiAuthorizationService.java;hb=e67eb4f5).
>>
>>Parth << All the open issues already have a resolution; I can open a
>>jira for each one and add the resolution to it and resolve them
>>immediately if you want this for tracking purposes.
>>
>>Are those inline to the question with the   I didn't quite get
>>that section at all. If the open questions are answered then they aren't
>>open; can you tidy that up then?
>>
>>Parth <<

Re: [VOTE] KIP-11- Authorization design for kafka security

2015-04-30 Thread Joe Stein
Parth,

Can you explain how "Mirror maker will have to start using new acl
management tool") and it not affect any other client. If you aren't
changing the wire protocol then how do clients use it?

~ Joe Stein


On Thu, Apr 30, 2015 at 3:15 PM, Parth Brahmbhatt <
pbrahmbh...@hortonworks.com> wrote:

> Hi Joe,
>
> Regarding open question: I changed the title to “Questions resolved after
> community discussions”; let me know if you have a better name. I have a
> question and a bullet point under each question describing the final
> decision. Not sure how I can make it any cleaner, so I'd appreciate any
> suggestion.
>
> Regarding system tests: I went through a bunch of KIPs, none of which
> mentions what test cases will be added. Do you want to add a “How do you
> plan to test” section in the general KIP template, or do you think this is
> just a special case where the test cases should be listed and discussed as
> part of the KIP? I am not sure if a KIP really is the right forum for this
> discussion. This can easily be addressed during code review if people
> think we don’t have enough test coverage.
>
> I am still not sure which part is not clear. The scala exception is added
> for internal server side representation. In the end all of our responses
> always return just an error code, for which we will add an
> AuthorizationErrorCode mapped to AuthorizationException. The error code
> itself will not reveal any information other than the fact that you are not
> authorized to perform an operation on a resource, and you will get this
> error code even for non existent topics if no acls exist for those topics.
>
> I can add a diagram if that makes things more clear; I am not convinced
> it's needed given we have come so far without it. Essentially there are 3
> steps:
> * users use the acl cli to add acls to their topics/groups/cluster
> * brokers start with a broker config that specifies what authorizer
> implementation to use.
> * every api request first goes through the authorizer and fails if
> the authorizer denies it. (authorizer implementation described in the doc
> with pseudo code)
>
> Note: Authentication/Wire Encryption is a separate piece and is being
> discussed actively in another KIP if that is the detail you are looking
> for.
>
> I think the description under this section
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-11+-+Authorization+Interface#KIP-11-AuthorizationInterface-DataFlows
> captures the internal details.
>
> Thanks
> Parth
>
> On 4/30/15, 11:24 AM, "Joe Stein"  wrote:
>
> >Gwen << regarding additional authorizers
> >
> >I think having these in the system tests doubles as both good confidence in
> >the language independence of the changes. It also makes sure that when we
> >release we don't go breaking Sentry or Ranger or anyone else that
> >wants to integrate.
> >
> >Gwen << Regarding "AuthorizationException"
> >
> >Yeah so I have two issues. The one you raised yes, 100%. Also I don't
> >understand how that is not a broker wire protocol response and only a JVM
> >exception.
> >
> >Jun << Could you elaborate on why we should not store JSON in ZK? So far,
> >all existing ZK data are in JSON.
> >
> >If I have 1,000,000 users in LDAP and 150 get access to Kafka topics
> >through this mechanism then I have to go and parse and push all of my
> >changes into zookeeper for it to take effect?
> >
> >If someone wanted to implement SAML I don't think this would work. Not
> >sure
> >how it would work with NiFi either (something around here I think maybe
> >https://git-wip-us.apache.org/repos/asf?p=incubator-nifi.git;a=blob;f=nar-bundles/framework-bundle/framework/web/web-security/src/main/java/org/apache/nifi/web/security/authorization/NiFiAuthorizationService.java;hb=e67eb4f5).
> >
> >Parth << All the open issues already have a resolution; I can open a jira
> >for each one and add the resolution to it and resolve them immediately if
> >you want this for tracking purposes.
> >
> >Are those inline to the question with the   I didn't quite get
> >that section at all. If the open questions are answered then they aren't
> >open; can you tidy that up then?
> >
> >Parth << We will update system tests to verify that the code works. We
> >have thorough unit tests for all the new code except for modifications
> >made to KafkaAPI, as that has way too many dependencies to be mocked,
> >which I guess is the reason for no existing unit tests.
> >
> >Can you update the KIP with some more detail about that please.
> >
> >Parth << I don’t know if I completely understand the concern. We have
> >talked with the Ranger team (Don Bosco Durai) so we at least have one custom
> >authorizer implementation that has approved this design and they will be
> >able to inject their authorization framework with current interfaces. Do
> >you see any issue with the design which will prevent anyone from providing
> >a custom implementation?
> >
> >Maybe a diagram for all of the different parts interacting. 

[jira] [Commented] (KAFKA-2161) Fix a few copyrights

2015-04-30 Thread Gwen Shapira (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522112#comment-14522112
 ] 

Gwen Shapira commented on KAFKA-2161:
-

Good catch :)

I think other projects are using Apache RAT to validate licenses during the 
build:
http://creadur.apache.org/rat/

Perhaps we can integrate that too and avoid future license issues.

> Fix a few copyrights
> 
>
> Key: KAFKA-2161
> URL: https://issues.apache.org/jira/browse/KAFKA-2161
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
>Priority: Trivial
> Attachments: KAFKA-2161.patch
>
>
> I noticed that I accidentally let some incorrect copyright headers slip in 
> with the KAFKA-1501 patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 33614: Patch for KAFKA-2132

2015-04-30 Thread Ashish Singh


> On April 29, 2015, 11:46 p.m., Gwen Shapira wrote:
> > Overall, looks good. 
> > I had a bunch of nits :)

Thanks for the review Gwen!


> On April 29, 2015, 11:46 p.m., Gwen Shapira wrote:
> > build.gradle, line 402
> > 
> >
> > What's this?

This is required for junit tests.


> On April 29, 2015, 11:46 p.m., Gwen Shapira wrote:
> > log4j/src/main/java/org/apache/kafka/log4j/KafkaLog4jAppender.java, lines 
> > 130-131
> > 
> >
> > Any idea why we are using ByteArraySerializer (and not 
> > StringSerializer) when the messages are strings?
> > 
> > I know the original class was the same, but I'm not sure why.

Honestly, I had the same question. However, I did not want to change the 
original behaviour. I doubt using StringSerializer would change the behaviour, 
but having the byte serializer is helpful for testing as it can be easily 
overridden by MockProducer. If we really have to use StringSerializer, I can 
create a wrapper on top of the current MockProducer to support String values.
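To illustrate the testing angle, here is a hedged sketch. My reading is
that the clients' MockProducer is currently fixed to byte[] keys and
values, which is why byte serialization eases mocking; the topic name and
payload below are made up:

    import org.apache.kafka.clients.producer.{MockProducer, Producer, ProducerRecord}

    object AppenderTestSketch {
      // Because the appender hands the producer raw bytes, the real producer
      // can be swapped for the byte[]-based MockProducer with no type changes.
      val producer: Producer[Array[Byte], Array[Byte]] = new MockProducer(true) // autoComplete
      producer.send(new ProducerRecord[Array[Byte], Array[Byte]](
        "log4j-test-topic", "a log line".getBytes("UTF-8")))
    }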


- Ashish


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33614/#review82056
---


On April 30, 2015, 7:22 p.m., Ashish Singh wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33614/
> ---
> 
> (Updated April 30, 2015, 7:22 p.m.)
> 
> 
> Review request for kafka.
> 
> 
> Bugs: KAFKA-2132
> https://issues.apache.org/jira/browse/KAFKA-2132
> 
> 
> Repository: kafka
> 
> 
> Description
> ---
> 
> KAFKA-2132: Move Log4J appender to clients module
> 
> 
> Diffs
> -
> 
>   build.gradle fef515b3b2276b1f861e7cc2e33e74c3ce5e405b 
>   checkstyle/import-control.xml f2e6cec267e67ce8e261341e373718e14a8e8e03 
>   core/src/main/scala/kafka/producer/KafkaLog4jAppender.scala 
> 5d36a019e3dbfb93737a9cd23404dcd1c5d836d1 
>   core/src/test/scala/unit/kafka/log4j/KafkaLog4jAppenderTest.scala 
> 41366a14590d318fced0e83d6921d8035fa882da 
>   
> log4j-appender/src/main/java/org/apache/kafka/log4jappender/KafkaLog4jAppender.java
>  PRE-CREATION 
>   
> log4j-appender/src/test/java/org/apache/kafka/log4jappender/KafkaLog4jAppenderTest.java
>  PRE-CREATION 
>   
> log4j-appender/src/test/java/org/apache/kafka/log4jappender/MockKafkaLog4jAppender.java
>  PRE-CREATION 
>   settings.gradle 83f764e6a4a15a5fdba232dce74a369870f26b45 
> 
> Diff: https://reviews.apache.org/r/33614/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Ashish Singh
> 
>



[jira] [Commented] (KAFKA-2132) Move Log4J appender to clients module

2015-04-30 Thread Ashish K Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522094#comment-14522094
 ] 

Ashish K Singh commented on KAFKA-2132:
---

Updated reviewboard https://reviews.apache.org/r/33614/
 against branch trunk

> Move Log4J appender to clients module
> -
>
> Key: KAFKA-2132
> URL: https://issues.apache.org/jira/browse/KAFKA-2132
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Gwen Shapira
>Assignee: Ashish K Singh
> Attachments: KAFKA-2132.patch, KAFKA-2132_2015-04-27_19:59:46.patch, 
> KAFKA-2132_2015-04-30_12:22:02.patch
>
>
> Log4j appender is just a producer.
> Since we have a new producer in the clients module, no need to keep Log4J 
> appender in "core" and force people to package all of Kafka with their apps.
> Let's move the Log4jAppender to the clients module.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 33614: Patch for KAFKA-2132

2015-04-30 Thread Ashish Singh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33614/
---

(Updated April 30, 2015, 7:22 p.m.)


Review request for kafka.


Bugs: KAFKA-2132
https://issues.apache.org/jira/browse/KAFKA-2132


Repository: kafka


Description
---

KAFKA-2132: Move Log4J appender to clients module


Diffs (updated)
-

  build.gradle fef515b3b2276b1f861e7cc2e33e74c3ce5e405b 
  checkstyle/import-control.xml f2e6cec267e67ce8e261341e373718e14a8e8e03 
  core/src/main/scala/kafka/producer/KafkaLog4jAppender.scala 
5d36a019e3dbfb93737a9cd23404dcd1c5d836d1 
  core/src/test/scala/unit/kafka/log4j/KafkaLog4jAppenderTest.scala 
41366a14590d318fced0e83d6921d8035fa882da 
  
log4j-appender/src/main/java/org/apache/kafka/log4jappender/KafkaLog4jAppender.java
 PRE-CREATION 
  
log4j-appender/src/test/java/org/apache/kafka/log4jappender/KafkaLog4jAppenderTest.java
 PRE-CREATION 
  
log4j-appender/src/test/java/org/apache/kafka/log4jappender/MockKafkaLog4jAppender.java
 PRE-CREATION 
  settings.gradle 83f764e6a4a15a5fdba232dce74a369870f26b45 

Diff: https://reviews.apache.org/r/33614/diff/


Testing
---


Thanks,

Ashish Singh



[jira] [Updated] (KAFKA-2132) Move Log4J appender to clients module

2015-04-30 Thread Ashish K Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashish K Singh updated KAFKA-2132:
--
Attachment: KAFKA-2132_2015-04-30_12:22:02.patch

> Move Log4J appender to clients module
> -
>
> Key: KAFKA-2132
> URL: https://issues.apache.org/jira/browse/KAFKA-2132
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Gwen Shapira
>Assignee: Ashish K Singh
> Attachments: KAFKA-2132.patch, KAFKA-2132_2015-04-27_19:59:46.patch, 
> KAFKA-2132_2015-04-30_12:22:02.patch
>
>
> Log4j appender is just a producer.
> Since we have a new producer in the clients module, no need to keep Log4J 
> appender in "core" and force people to package all of Kafka with their apps.
> Let's move the Log4jAppender to the clients module.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request: Adding ability to provide a prefix for the des...

2015-04-30 Thread pedersen
GitHub user pedersen opened a pull request:

https://github.com/apache/kafka/pull/59

Adding ability to provide a prefix for the destination topic name. This ...

...can be used to allow two clusters to mirror to each other without 
causing a loop.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/pulsepointinc/kafka mirror-maker-prefix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/59.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #59


commit 5648375624208e08ded11025a5b7d3f893f39113
Author: Michael J. Pedersen 
Date:   2015-04-30T19:10:16Z

Adding ability to provide a prefix for the destination topic name. This can 
be used to allow two clusters to mirror to each other without causing a loop.
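This is not the PR's actual code; just a rough sketch of the
loop-avoidance idea as described, with all names hypothetical:

    object MirrorPrefixSketch {
      // Cluster A mirrors to B with prefix "A.", B mirrors back with "B.".
      // Skipping topics that already carry a known mirror prefix breaks the
      // A -> B -> A loop.
      def destinationTopic(sourceTopic: String, prefix: String,
                           knownPrefixes: Seq[String]): Option[String] =
        if (knownPrefixes.exists(sourceTopic.startsWith)) None // mirrored already; skip
        else Some(prefix + sourceTopic)
    }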




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [VOTE] KIP-11- Authorization design for kafka security

2015-04-30 Thread Sriharsha Chintalapani
Jun << Could you elaborate on why we should not store JSON in ZK? So far, 
all existing ZK data are in JSON. 

If I have 1,000,000 users in LDAP and 150 get access to Kafka topics 
through this mechanism then I have to go and parse and push all of my 
changes into zookeeper for it to take effect? 


 Is this a valid use case to support? Are we expecting 1,000,000 users in a 
kafka cluster, and expecting the default authorizer with the zookeeper backend 
to support that? We need to differentiate between users of the cluster and 
users of applications that might be built on kafka.


Thanks,
Harsha


On April 30, 2015 at 11:25:02 AM, Joe Stein (joe.st...@stealth.ly) wrote:

Gwen << regarding additional authorizers  

I think having these in the system tests doubles as both good confidence in
the language independence of the changes. It also makes sure that when we
release we don't go breaking Sentry or Ranger or anyone else that
wants to integrate.

Gwen << Regarding "AuthorizationException"

Yeah so I have two issues. The one you raised yes, 100%. Also I don't  
understand how that is not a broker wire protocol response and only a JVM  
exception.  

Jun << Could you elaborate on why we should not store JSON in ZK? So far,  
all existing ZK data are in JSON.  

If I have 1,000,000 users in LDAP and 150 get access to Kafka topics  
through this mechanism then I have to go and parse and push all of my  
changes into zookeeper for it to take effect?

If someone wanted to implement SAML I don't think this would work. Not sure  
how it would work with NiFi either (something around here I think maybe  
https://git-wip-us.apache.org/repos/asf?p=incubator-nifi.git;a=blob;f=nar-bundles/framework-bundle/framework/web/web-security/src/main/java/org/apache/nifi/web/security/authorization/NiFiAuthorizationService.java;hb=e67eb4f5
  
).  

Parth << All the open issues already have a resolution; I can open a jira
for each one and add the resolution to it and resolve them immediately if
you want this for tracking purposes.

Are those inline to the question with the   I didn't quite get that
section at all. If the open questions are answered then they aren't open;
can you tidy that up then?

Parth << We will update system tests to verify that the code works. We  
have thorough unit tests for all the new code except for modifications made  
to KafkaAPI as that has way too many dependencies to be mocked which I  
guess is the reason for no existing unit tests.  

Can you update the KIP with some more detail about that please.  

Parth << I don’t know if I completely understand the concern. We have  
talked with Ranger team (Don Bosco Durai) so we at least have one custom  
authorizer implementation that has approved this design and they will be  
able to inject their authorization framework with current interfaces. Do  
you see any issue with the design which will prevent anyone from providing  
a custom implementation?  

Maybe a diagram for all of the different parts interacting. I still don't  
get why there are no wire protocol changes and just change in the JVM. What  
do non-jvm clients do and how do they work with Kafka. Very confusing,  
almost obfuscating.  

~ Joe Stein


On Thu, Apr 30, 2015 at 1:14 PM, Gwen Shapira  wrote:  

> * Regarding additional authorizers:  
> Prasad, who is a PMC on Apache Sentry reviewed the design and confirmed  
> Sentry can integrate with the current APIs. Dapeng Sun, a committer on  
> Sentry had some concerns about the IP privileges and how we prioritize  
> privileges - but nothing that prevents Sentry from integrating with the  
> existing solution, from what I could see. It seems to me that the design is  
> very generic and adapters can be written for other authorization systems  
> (after all, you just need to implement setACL, getACL and Authorize - all  
> pretty basic), although I can't speak for Oracle's Identity Manager  
> specifically.  
>  
> * Regarding "AuthorizationException to indicate that an operation was not  
> authorized": Sorry I missed this in previous reviewed, but now that I look  
> at it - Many systems intentionally don't return AuthorizationException when  
> READ privilege is missing, since this already gives too much information  
> (that the topic exists and that you don't have privileges on it). Instead  
> they return a variant of "doesn't exist". I'm wondering if this approach is  
> applicable / desirable for Kafka as well.  
> Note that this doesn't remove the need for AuthorizationException - I'm  
> just suggesting a possible refinement on its use.  
>  
> Gwen  
>  
>  
>  
> On Thu, Apr 30, 2015 at 9:52 AM, Parth Brahmbhatt <  
> pbrahmbh...@hortonworks.com> wrote:  
>  
> > Hi Joe, Thanks for taking the time to review.  
> >  
> > * All the open issues already have a resolution , I can open a jira for  
> > each one and add the resolution to it and resolve them immediately if you  
> > want this for tracking purposes.  
> > * We

Re: [VOTE] KIP-11- Authorization design for kafka security

2015-04-30 Thread Parth Brahmbhatt
Hi Joe,

Regarding open question: I changed the title to “Questions resolved after
community discussions”; let me know if you have a better name. I have a
question and a bullet point under each question describing the final
decision. Not sure how I can make it any cleaner, so I'd appreciate any
suggestion.

Regarding system tests: I went through a bunch of KIPs, none of which
mentions what test cases will be added. Do you want to add a “How do you
plan to test” section in the general KIP template, or do you think this is
just a special case where the test cases should be listed and discussed as
part of the KIP? I am not sure if a KIP really is the right forum for this
discussion. This can easily be addressed during code review if people
think we don’t have enough test coverage.

I am still not sure which part is not clear. The scala exception is added
for internal server side representation. In the end all of our responses
always return just an error code, for which we will add an
AuthorizationErrorCode mapped to AuthorizationException. The error code
itself will not reveal any information other than the fact that you are not
authorized to perform an operation on a resource, and you will get this
error code even for non existent topics if no acls exist for those topics.

I can add a diagram if that makes things more clear; I am not convinced
it's needed given we have come so far without it. Essentially there are 3
steps (a rough sketch of the last step follows below):
* users use the acl cli to add acls to their topics/groups/cluster
* brokers start with a broker config that specifies what authorizer
implementation to use.
* every api request first goes through the authorizer and fails if
the authorizer denies it. (authorizer implementation described in the doc
with pseudo code)
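A rough sketch of that last step, with all names invented for illustration
(the KIP's pseudo code remains the authoritative description):

    object AuthCheckSketch {
      case class Session(principal: String, host: String)
      case class ApiRequest(operation: String, resource: String)

      // Every api request is checked before being handled; a denial surfaces
      // as nothing more than an authorization error code.
      def handle(authorized: (Session, ApiRequest) => Boolean)
                (session: Session, request: ApiRequest): String =
        if (!authorized(session, request)) "AUTHORIZATION_ERROR_CODE" // denied
        else s"handled ${request.operation} on ${request.resource}"
    }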

Note: Authentication/Wire Encryption is a separate piece and is being
discussed actively in another KIP if that is the detail you are looking
for.

I think the description under this section
https://cwiki.apache.org/confluence/display/KAFKA/KIP-11+-+Authorization+Interface#KIP-11-AuthorizationInterface-DataFlows
captures the internal details.

Thanks
Parth

On 4/30/15, 11:24 AM, "Joe Stein"  wrote:

>Gwen << regarding additional authorizers
>
>I think having these in the system tests doubles as both good confidence in
>the language independence of the changes. It also makes sure that when we
>release we don't go breaking Sentry or Ranger or anyone else that
>wants to integrate.
>
>Gwen << Regarding "AuthorizationException"
>
>Yeah so I have two issues. The one you raised yes, 100%. Also I don't
>understand how that is not a broker wire protocol response and only a JVM
>exception.
>
>Jun << Could you elaborate on why we should not store JSON in ZK? So far,
>all existing ZK data are in JSON.
>
>If I have 1,000,000 users in LDAP and 150 get access to Kafka topics
>through this mechanism then I have to go and parse and push all of my
>changes into zookeeper for it to take effect?
>
>If someone wanted to implement SAML I don't think this would work. Not
>sure
>how it would work with NiFi either (something around here I think maybe
>https://git-wip-us.apache.org/repos/asf?p=incubator-nifi.git;a=blob;f=nar-bundles/framework-bundle/framework/web/web-security/src/main/java/org/apache/nifi/web/security/authorization/NiFiAuthorizationService.java;hb=e67eb4f5).
>
>Parth << All the open issues already have a resolution; I can open a jira
>for each one and add the resolution to it and resolve them immediately if
>you want this for tracking purposes.
>
>Are those inline to the question with the   I didn't quite get
>that section at all. If the open questions are answered then they aren't
>open; can you tidy that up then?
>
>Parth << We will update system tests to verify that the code works. We
>have thorough unit tests for all the new code except for modifications
>made to KafkaAPI, as that has way too many dependencies to be mocked,
>which I guess is the reason for no existing unit tests.
>
>Can you update the KIP with some more detail about that please.
>
>Parth << I don’t know if I completely understand the concern. We have
>talked with the Ranger team (Don Bosco Durai) so we at least have one custom
>authorizer implementation that has approved this design and they will be
>able to inject their authorization framework with current interfaces. Do
>you see any issue with the design which will prevent anyone from providing
>a custom implementation?
>
>Maybe a diagram for all of the different parts interacting. I still don't
>get why there are no wire protocol changes and just change in the JVM.
>What
>do non-jvm clients do and how do they work with Kafka. Very confusing,
>almost obfuscating.
>
>~ Joe Stein
>
>
>On Thu, Apr 30, 2015 at 1:14 PM, Gwen Shapira 
>wrote:
>
>> * Regarding additional authorizers:
>> Prasad, who is a PMC on Apache Sentry reviewed the design and confirmed
> Sentry can integrate with the current APIs. Dapeng Sun, a committer on
>> Sentry had some concerns about the IP privilege

[jira] [Updated] (KAFKA-2161) Fix a few copyrights

2015-04-30 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-2161:
-
Status: Patch Available  (was: Open)

> Fix a few copyrights
> 
>
> Key: KAFKA-2161
> URL: https://issues.apache.org/jira/browse/KAFKA-2161
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
>Priority: Trivial
> Attachments: KAFKA-2161.patch
>
>
> I noticed that I accidentally let some incorrect copyright headers slip in 
> with the KAFKA-1501 patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2161) Fix a few copyrights

2015-04-30 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-2161:
-
Attachment: KAFKA-2161.patch

> Fix a few copyrights
> 
>
> Key: KAFKA-2161
> URL: https://issues.apache.org/jira/browse/KAFKA-2161
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
>Priority: Trivial
> Attachments: KAFKA-2161.patch
>
>
> I noticed that I accidentally let some incorrect copyright headers slip in 
> with the KAFKA-1501 patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2161) Fix a few copyrights

2015-04-30 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522033#comment-14522033
 ] 

Ewen Cheslack-Postava commented on KAFKA-2161:
--

Created reviewboard https://reviews.apache.org/r/33729/diff/
 against branch origin/trunk

> Fix a few copyrights
> 
>
> Key: KAFKA-2161
> URL: https://issues.apache.org/jira/browse/KAFKA-2161
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
>Priority: Trivial
> Attachments: KAFKA-2161.patch
>
>
> I noticed that I accidentally let some incorrect copyright headers slip in 
> with the KAFKA-1501 patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Review Request 33729: Patch for KAFKA-2161

2015-04-30 Thread Ewen Cheslack-Postava

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33729/
---

Review request for kafka.


Bugs: KAFKA-2161
https://issues.apache.org/jira/browse/KAFKA-2161


Repository: kafka


Description
---

KAFKA-2161: Fix copyright headers


Diffs
-

  core/src/test/scala/integration/kafka/api/ConsumerBounceTest.scala 
5c4cca653b3801df3494003cc40a56ae60a789a6 
  core/src/test/scala/integration/kafka/api/FixedPortTestUtils.scala 
1d31a4397e4d4f087b12bc8c6c1685f49d3a8f0e 
  core/src/test/scala/integration/kafka/api/ProducerBounceTest.scala 
c9d16bb6b851639683bf72e5f3a4dd65b0671cf4 

Diff: https://reviews.apache.org/r/33729/diff/


Testing
---


Thanks,

Ewen Cheslack-Postava



[jira] [Created] (KAFKA-2161) Fix a few copyrights

2015-04-30 Thread Ewen Cheslack-Postava (JIRA)
Ewen Cheslack-Postava created KAFKA-2161:


 Summary: Fix a few copyrights
 Key: KAFKA-2161
 URL: https://issues.apache.org/jira/browse/KAFKA-2161
 Project: Kafka
  Issue Type: Bug
Reporter: Ewen Cheslack-Postava
Assignee: Ewen Cheslack-Postava
Priority: Trivial


I noticed that I accidentally let some incorrect copyright headers slip in with 
the KAFKA-1501 patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2123) Make new consumer offset commit API use callback + future

2015-04-30 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522016#comment-14522016
 ] 

Ewen Cheslack-Postava commented on KAFKA-2123:
--

This version is now ready for review. I ended up going with the simpler 
interface discussed on the mailing list, after running into too many problems 
with synchronization and proper layering of the code dealing with the Future 
implementation, because of the way synchronization is currently handled for 
the consumer. I'll file a separate issue for those problems and think about 
whether there's a path to eventually getting the interface with the Future. In 
the meantime, the worst case is that we end up settling on this interface, 
which was the other candidate discussed.
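For reference, a rough sketch of how the callback-based commit might be
used. The exact class and method names were still under review at this
point, so treat them as assumptions rather than the final API:

    import java.util.Collections
    import org.apache.kafka.clients.consumer.{CommitType, ConsumerCommitCallback, KafkaConsumer}
    import org.apache.kafka.common.TopicPartition

    object CommitSketch {
      // Assumed shape: commit(offsets, commitType, callback) with a
      // ConsumerCommitCallback.onCompletion(Exception), per the JIRA.
      def commitWithCallback(consumer: KafkaConsumer[Array[Byte], Array[Byte]]): Unit = {
        val offsets = Collections.singletonMap(new TopicPartition("my-topic", 0),
                                               java.lang.Long.valueOf(42L))
        consumer.commit(offsets, CommitType.ASYNC, new ConsumerCommitCallback {
          override def onCompletion(exception: Exception): Unit =
            if (exception != null) println(s"offset commit failed: $exception")
        })
      }
    }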

Some notes:

* Added a commit.retries setting. This is specific to offset commit requests, 
which is why it has the commit. prefix instead of just being retries. Also, the 
default setting is currently -1, which gives infinite retries (in both sync and 
async modes). I think that's actually a bad idea, since a default that can 
block indefinitely seems dangerous, but we'd need to discuss the alternatives.
* Added queueing of offset commit requests. I debated how best to handle this. 
At first I was thinking we might be able to do something smarter to combine 
requests, but in the face of errors (especially partial errors that are 
isolated to one topic partition), it gets difficult to figure out how to handle 
callbacks. Simple queuing seems like the right solution to me, and for the vast 
majority of use cases it either has no impact (you used the sync mode) or has 
little impact (you used async to commit all offsets automatically). Only 
unusual cases where you're submitting the offsets map and doing partial commits 
might care about smarter behavior.




> Make new consumer offset commit API use callback + future
> -
>
> Key: KAFKA-2123
> URL: https://issues.apache.org/jira/browse/KAFKA-2123
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, consumer
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
> Fix For: 0.8.3
>
> Attachments: KAFKA-2123.patch, KAFKA-2123_2015-04-30_11:23:05.patch
>
>
> The current version of the offset commit API in the new consumer is
> void commit(offsets, commit type)
> where the commit type is either sync or async. This means you need to use 
> sync if you ever want confirmation that the commit succeeded. Some 
> applications will want to use asynchronous offset commit, but be able to tell 
> when the commit completes.
> This is basically the same problem that had to be fixed going from old 
> consumer -> new consumer and I'd suggest the same fix using a callback + 
> future combination. The new API would be
> Future<Void> commit(Map<TopicPartition, Long> offsets, ConsumerCommitCallback 
> callback);
> where ConsumerCommitCallback contains a single method:
> public void onCompletion(Exception exception);
> We can provide shorthand variants of commit() for eliding the different 
> arguments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] KIP-11- Authorization design for kafka security

2015-04-30 Thread Joe Stein
Gwen << regarding additional authorizers

I think having these in the system tests doubles as both good confidence in
the language independence of the changes. It also makes sure that when we
release we don't go breaking Sentry or Ranger or anyone else that
wants to integrate.

Gwen << Regarding "AuthorizationException"

Yeah so I have two issues. The one you raised yes, 100%. Also I don't
understand how that is not a broker wire protocol response and only a JVM
exception.

Jun << Could you elaborate on why we should not store JSON in ZK? So far,
all existing ZK data are in JSON.

If I have 1,000,000 users in LDAP and 150 get access to Kafka topics
through this mechanism then I have to go and parse and push all of my
changes into zookeeper for it to take effect?

If someone wanted to implement SAML I don't think this would work. Not sure
how it would work with NiFi either (something around here I think maybe
https://git-wip-us.apache.org/repos/asf?p=incubator-nifi.git;a=blob;f=nar-bundles/framework-bundle/framework/web/web-security/src/main/java/org/apache/nifi/web/security/authorization/NiFiAuthorizationService.java;hb=e67eb4f5
).

Parth << All the open issues already have a resolution; I can open a jira
for each one and add the resolution to it and resolve them immediately if
you want this for tracking purposes.

Are those inline to the question with the   I didn't quite get that
section at all. If the open questions are answered then they aren't open;
can you tidy that up then?

Parth <<  We will update system tests to verify that the code works. We
have thorough unit tests for all the new code except for modifications made
to KafkaAPI as that has way too many dependencies to be mocked which I
guess is the reason for no existing unit tests.

Can you update the KIP with some more detail about that please.

Parth << I don’t know if I completely understand the concern. We have
talked with Ranger team (Don Bosco Durai) so we at least have one custom
authorizer implementation that has approved this design and they will be
able to inject their authorization framework with current interfaces. Do
you see any issue with the design which will prevent anyone from providing
a custom implementation?

Maybe a diagram for all of the different parts interacting. I still don't
get why there are no wire protocol changes and just change in the JVM. What
do non-jvm clients do and how do they work with Kafka. Very confusing,
almost obfuscating.

~ Joe Stein


On Thu, Apr 30, 2015 at 1:14 PM, Gwen Shapira  wrote:

> * Regarding additional authorizers:
> Prasad, who is a PMC on Apache Sentry reviewed the design and confirmed
> Sentry can integrate with the current APIs. Dapeng Sun, a committer on
> Sentry had some concerns about the IP privileges and how we prioritize
> privileges - but nothing that prevents Sentry from integrating with the
> existing solution, from what I could see. It seems to me that the design is
> very generic and adapters can be written for other authorization systems
> (after all, you just need to implement setACL, getACL and Authorize - all
> pretty basic), although I can't speak for Oracle's Identity Manager
> specifically.
>
> * Regarding "AuthorizationException to indicate that an operation was not
> authorized": Sorry I missed this in previous reviewed, but now that I look
> at it - Many systems intentionally don't return AuthorizationException when
> READ privilege is missing, since this already gives too much information
> (that the topic exists and that you don't have privileges on it). Instead
> they return a variant of "doesn't exist". I'm wondering if this approach is
> applicable / desirable for Kafka as well.
> Note that this doesn't remove the need for AuthorizationException - I'm
> just suggesting a possible refinement on its use.
>
> Gwen
>
>
>
> On Thu, Apr 30, 2015 at 9:52 AM, Parth Brahmbhatt <
> pbrahmbh...@hortonworks.com> wrote:
>
> > Hi Joe, Thanks for taking the time to review.
> >
> > * All the open issues already have a resolution , I can open a jira for
> > each one and add the resolution to it and resolve them immediately if you
> > want this for tracking purposes.
> > * We will update system tests to verify that the code works. We have
> > thorough unit tests for all the new code except for modifications made to
> > KafkaAPI as that has way too many dependencies to be mocked which I guess
> > is the reason for no existing unit tests.
> > * I don’t know if I completely understand the concern. We have talked
> with
> > Ranger team (Don Bosco Durai) so we at least have one custom authorizer
> > implementation that has approved this design and they will be able to
> > inject their authorization framework with current interfaces. Do you see
> > any issue with the design which will prevent anyone from providing a
> > custom implementation?
> > * Did not understand the concern around wire protocol, we are adding
> > AuthorizationException to indicate that an operation was not authori

[jira] [Updated] (KAFKA-2123) Make new consumer offset commit API use callback + future

2015-04-30 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-2123:
-
Status: Patch Available  (was: In Progress)

> Make new consumer offset commit API use callback + future
> -
>
> Key: KAFKA-2123
> URL: https://issues.apache.org/jira/browse/KAFKA-2123
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, consumer
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
> Fix For: 0.8.3
>
> Attachments: KAFKA-2123.patch, KAFKA-2123_2015-04-30_11:23:05.patch
>
>
> The current version of the offset commit API in the new consumer is
> void commit(offsets, commit type)
> where the commit type is either sync or async. This means you need to use 
> sync if you ever want confirmation that the commit succeeded. Some 
> applications will want to use asynchronous offset commit, but be able to tell 
> when the commit completes.
> This is basically the same problem that had to be fixed going from old 
> consumer -> new consumer and I'd suggest the same fix using a callback + 
> future combination. The new API would be
> Future<Void> commit(Map<TopicPartition, Long> offsets, ConsumerCommitCallback 
> callback);
> where ConsumerCommitCallback contains a single method:
> public void onCompletion(Exception exception);
> We can provide shorthand variants of commit() for eliding the different 
> arguments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2123) Make new consumer offset commit API use callback + future

2015-04-30 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-2123:
-
Attachment: KAFKA-2123_2015-04-30_11:23:05.patch

> Make new consumer offset commit API use callback + future
> -
>
> Key: KAFKA-2123
> URL: https://issues.apache.org/jira/browse/KAFKA-2123
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, consumer
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
> Fix For: 0.8.3
>
> Attachments: KAFKA-2123.patch, KAFKA-2123_2015-04-30_11:23:05.patch
>
>
> The current version of the offset commit API in the new consumer is
> void commit(offsets, commit type)
> where the commit type is either sync or async. This means you need to use 
> sync if you ever want confirmation that the commit succeeded. Some 
> applications will want to use asynchronous offset commit, but be able to tell 
> when the commit completes.
> This is basically the same problem that had to be fixed going from old 
> consumer -> new consumer and I'd suggest the same fix using a callback + 
> future combination. The new API would be
> Future<Void> commit(Map<TopicPartition, Long> offsets, ConsumerCommitCallback 
> callback);
> where ConsumerCommitCallback contains a single method:
> public void onCompletion(Exception exception);
> We can provide shorthand variants of commit() for eliding the different 
> arguments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2123) Make new consumer offset commit API use callback + future

2015-04-30 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14521992#comment-14521992
 ] 

Ewen Cheslack-Postava commented on KAFKA-2123:
--

Updated reviewboard https://reviews.apache.org/r/33196/diff/
 against branch origin/trunk

> Make new consumer offset commit API use callback + future
> -
>
> Key: KAFKA-2123
> URL: https://issues.apache.org/jira/browse/KAFKA-2123
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, consumer
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
> Fix For: 0.8.3
>
> Attachments: KAFKA-2123.patch, KAFKA-2123_2015-04-30_11:23:05.patch
>
>
> The current version of the offset commit API in the new consumer is
> void commit(offsets, commit type)
> where the commit type is either sync or async. This means you need to use 
> sync if you ever want confirmation that the commit succeeded. Some 
> applications will want to use asynchronous offset commit, but be able to tell 
> when the commit completes.
> This is basically the same problem that had to be fixed going from old 
> consumer -> new consumer and I'd suggest the same fix using a callback + 
> future combination. The new API would be
> Future<Void> commit(Map<TopicPartition, Long> offsets, ConsumerCommitCallback 
> callback);
> where ConsumerCommitCallback contains a single method:
> public void onCompletion(Exception exception);
> We can provide shorthand variants of commit() for eliding the different 
> arguments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 33196: Patch for KAFKA-2123

2015-04-30 Thread Ewen Cheslack-Postava

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33196/
---

(Updated April 30, 2015, 6:23 p.m.)


Review request for kafka.


Bugs: KAFKA-2123
https://issues.apache.org/jira/browse/KAFKA-2123


Repository: kafka


Description (updated)
---

KAFKA-2123: Add queuing of offset commit requests.


Diffs (updated)
-

  clients/src/main/java/org/apache/kafka/clients/consumer/Consumer.java 
8f587bc0705b65b3ef37c86e0c25bb43ab8803de 
  
clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerCommitCallback.java
 PRE-CREATION 
  clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerConfig.java 
bdff518b732105823058e6182f445248b45dc388 
  clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java 
d301be4709f7b112e1f3a39f3c04cfa65f00fa60 
  clients/src/main/java/org/apache/kafka/clients/consumer/MockConsumer.java 
f50da825756938c193d7f07bee953e000e2627d9 
  
clients/src/main/java/org/apache/kafka/clients/consumer/internals/Coordinator.java
 e55ab11df4db0b0084f841a74cbcf819caf780d5 
  
clients/src/main/java/org/apache/kafka/common/errors/ConsumerCoordinatorNotAvailableException.java
 PRE-CREATION 
  
clients/src/main/java/org/apache/kafka/common/errors/NotCoordinatorForConsumerException.java
 PRE-CREATION 
  
clients/src/main/java/org/apache/kafka/common/errors/OffsetLoadInProgressException.java
 PRE-CREATION 
  clients/src/main/java/org/apache/kafka/common/protocol/Errors.java 
36aa412404ff1458c7bef0feecaaa8bc45bed9c7 
  
clients/src/test/java/org/apache/kafka/clients/consumer/internals/CoordinatorTest.java
 b06c4a73e2b4e9472cd772c8bc32bf4a29f431bb 
  core/src/test/scala/integration/kafka/api/ConsumerTest.scala 
ffbdf5dc106e2a59563768280074696c76491337 

Diff: https://reviews.apache.org/r/33196/diff/


Testing
---


Thanks,

Ewen Cheslack-Postava



[jira] [Created] (KAFKA-2160) DelayedOperationPurgatory should remove the pair in watchersForKey with empty watchers list

2015-04-30 Thread Guozhang Wang (JIRA)
Guozhang Wang created KAFKA-2160:


 Summary: DelayedOperationPurgatory should remove the pair in 
watchersForKey with empty watchers list
 Key: KAFKA-2160
 URL: https://issues.apache.org/jira/browse/KAFKA-2160
 Project: Kafka
  Issue Type: Bug
Reporter: Guozhang Wang
Assignee: Guozhang Wang


With purgatory usage in the consumer coordinator, it will be common that 
watcher lists are very short and live only for a short time. So we'd better 
clean them from the watchersForKey Pool once the list becomes empty in 
checkAndComplete() calls.
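A rough sketch of the proposed cleanup; the method and field names below
are approximations, not the actual patch:

    import java.util.concurrent.ConcurrentHashMap

    object PurgatoryCleanupSketch {
      class Watchers { // stand-in for the purgatory's per-key watcher list
        def tryCompleteWatched(): Int = 0 // completes operations that can complete
        def watched: Int = 0              // operations still being watched
      }

      val watchersForKey = new ConcurrentHashMap[Any, Watchers]()

      def checkAndComplete(key: Any): Int = {
        val watchers = watchersForKey.get(key)
        if (watchers == null) 0
        else {
          val completed = watchers.tryCompleteWatched()
          if (watchers.watched == 0)             // proposed cleanup step
            watchersForKey.remove(key, watchers) // drop only this exact entry
          completed
        }
      }
    }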



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] KIP-11- Authorization design for kafka security

2015-04-30 Thread Gwen Shapira
* Regarding additional authorizers:
Prasad, who is a PMC on Apache Sentry reviewed the design and confirmed
Sentry can integrate with the current APIs. Dapeng Sun, a committer on
Sentry had some concerns about the IP privileges and how we prioritize
privileges - but nothing that prevents Sentry from integrating with the
existing solution, from what I could see. It seems to me that the design is
very generic and adapters can be written for other authorization systems
(after all, you just need to implement setACL, getACL and Authorize - all
pretty basic), although I can't speak for Oracle's Identity Manager
specifically.
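For readers following along, a rough sketch of that surface as described
above; the type and method names follow the wording here (setACL, getACL,
authorize) and are placeholders, not necessarily the KIP's exact interface:

    object AuthorizerSurfaceSketch {
      case class Session(principal: String, host: String)
      case class Resource(resourceType: String, name: String)
      case class Acl(principal: String, host: String, operation: String, permissionType: String)

      trait Authorizer {
        def authorize(session: Session, operation: String, resource: Resource): Boolean
        def setACL(acls: Set[Acl], resource: Resource): Unit
        def getACL(resource: Resource): Set[Acl]
      }
    }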

* Regarding "AuthorizationException to indicate that an operation was not
authorized": Sorry I missed this in previous reviewed, but now that I look
at it - Many systems intentionally don't return AuthorizationException when
READ privilege is missing, since this already gives too much information
(that the topic exists and that you don't have privileges on it). Instead
they return a variant of "doesn't exist". I'm wondering if this approach is
applicable / desirable for Kafka as well.
Note that this doesn't remove the need for AuthorizationException - I'm
just suggesting a possible refinement on its use.

Gwen



On Thu, Apr 30, 2015 at 9:52 AM, Parth Brahmbhatt <
pbrahmbh...@hortonworks.com> wrote:

> Hi Joe, Thanks for taking the time to review.
>
> * All the open issues already have a resolution , I can open a jira for
> each one and add the resolution to it and resolve them immediately if you
> want this for tracking purposes.
> * We will update system tests to verify that the code works. We have
> thorough unit tests for all the new code except for modifications made to
> KafkaAPI as that has way too many dependencies to be mocked which I guess
> is the reason for no existing unit tests.
> * I don’t know if I completely understand the concern. We have talked with
> Ranger team (Don Bosco Durai) so we at least have one custom authorizer
> implementation that has approved this design and they will be able to
> inject their authorization framework with current interfaces. Do you see
> any issue with the design which will prevent anyone from providing a
> custom implementation?
> * Did not understand the concern around wire protocol, we are adding
> AuthorizationException to indicate that an operation was not authorized.
>
> Thanks
> Parth
>
> On 4/30/15, 5:59 AM, "Jun Rao"  wrote:
>
> >Joe,
> >
> >Could you elaborate on why we should not store JSON in ZK? So far, all
> >existing ZK data are in JSON.
> >
> >Thanks,
> >
> >Jun
> >
> >On Thu, Apr 30, 2015 at 2:06 AM, Joe Stein  wrote:
> >
> >> Hi, sorry I am coming in late to chime back in on this thread and
> >>haven't
> >> been able to make the KIP hangouts the last few weeks. Sorry if any of
> >>this
> >> was brought up already or I missed it.
> >>
> >> I read through the KIP and the thread(s) and a couple of things jumped
> >>out.
> >>
> >>
> >>- Can we break out the open issues in JIRA (maybe during the hangout)
> >>that are in the KIP and resolve/flesh those out more?
> >>
> >>
> >>
> >>- I don't see any updates with the system tests or how we can know
> >>that the code works.
> >>
> >>
> >>
> >>- We need some implementation/example/sample that we know can work in
> >>all different existing entitlement servers, and not just ones that run
> >>in certain types of data centers too. I am not saying we should support
> >>everything,
> >> but
> >>if someone had to implement
> >>https://docs.oracle.com/cd/E19225-01/820-6551/bzafm/index.html with
> >>Kafka it has to work for them out of the box.
> >>
> >>
> >>
> >>- We should shy away from storing JSON in Zookeeper. Let's store
> >>bytes in
> >>Storage.
> >>
> >>
> >>
> >>- We should spend some time thinking through exceptions in the wire
> >>protocol maybe as part of this so it can keep moving forward.
> >>
> >>
> >> ~ Joe Stein
> >>
> >> On Tue, Apr 28, 2015 at 3:33 AM, Sun, Dapeng 
> >>wrote:
> >>
> >> > Thank you for your reply, Gwen.
> >> >
> >> > >1. Complex rule systems can be difficult to reason about and
> >>therefore
> >> > end up being less secure. The rule "Deny always wins" is very easy to
> >> grasp.
> >> > Yes, I agree with your point: we should not make the rule complex.
> >> >
> >> > >2. We currently don't have any mechanism for specifying IP ranges (or
> >> host
> >> > >ranges) at all. I think its a pretty significant deficiency, but it
> >>does
> >> > mean that we don't need to worry about the issue of blocking a large
> >> range
> >> > while unblocking few servers in the range.
> >> > Supporting ranges sounds reasonable. If this feature will be in the
> >> > development plan, I also don't think we can put "the best matching
> >> > acl" and "Support ip ranges" together.
> >> >
> >> > >We have a call tomorrow (Tuesday, April 28) at 3pm PST - to discuss
> >>this
> >> > and other outstanding design issues (not all related to 

Re: [VOTE] KIP-11- Authorization design for kafka security

2015-04-30 Thread Parth Brahmbhatt
I also wanted to send a ping to all the committers. This voting thread has
been open for > 1 week and has 2 non-binding +1s. I would appreciate it if the
committers raised their concerns or cast their votes.

Thanks
Parth

On 4/30/15, 9:52 AM, "Parth Brahmbhatt" 
wrote:

>Hi Joe, Thanks for taking the time to review.
> 
>* All the open issues already have a resolution , I can open a jira for
>each one and add the resolution to it and resolve them immediately if you
>want this for tracking purposes.
>* We will update system tests to verify that the code works. We have
>thorough unit tests for all the new code except for modifications made to
>KafkaAPI as that has way too many dependencies to be mocked which I guess
>is the reason for no existing unit tests.
>* I don’t know if I completely understand the concern. We have talked with
>Ranger team (Don Bosco Durai) so we at least have one custom authorizer
>implementation that has approved this design and they will be able to
>inject their authorization framework with current interfaces. Do you see
>any issue with the design which will prevent anyone from providing a
>custom implementation?
>* Did not understand the concern around wire protocol, we are adding
>AuthorizationException to indicate that an operation was not authorized.
>
>Thanks
>Parth
>
>On 4/30/15, 5:59 AM, "Jun Rao"  wrote:
>
>>Joe,
>>
>>Could you elaborate on why we should not store JSON in ZK? So far, all
>>existing ZK data are in JSON.
>>
>>Thanks,
>>
>>Jun
>>
>>On Thu, Apr 30, 2015 at 2:06 AM, Joe Stein  wrote:
>>
>>> Hi, sorry I am coming in late to chime back in on this thread and
>>>haven't
>>> been able to make the KIP hangouts the last few weeks. Sorry if any of
>>>this
>>> was brought up already or I missed it.
>>>
>>> I read through the KIP and the thread(s) and a couple of things jumped
>>>out.
>>>
>>>
>>>- Can we break out the open issues in JIRA (maybe during the
>>>hangout)
>>>that are in the KIP and resolve/flesh those out more?
>>>
>>>
>>>
>>>- I don't see any updates with the systems test or how we can know
>>>the
>>>code works.
>>>
>>>
>>>
>>>- We need some implementation/example/sample that we know can work
>>>in
>>>all different existing entitlement servers and not just ones that
>>>run in
>>>types of data centers too. I am not saying we should support
>>>everything,
>>> but
>>>if someone had to implement
>>>https://docs.oracle.com/cd/E19225-01/820-6551/bzafm/index.html with
>>>Kafka, it has to work for them out of the box.
>>>
>>>
>>>
>>>- We should shy away from storing JSON in ZooKeeper. Let's store
>>>bytes in
>>>Storage.
>>>
>>>
>>>
>>>- We should spend some time thinking through exceptions in the wire
>>>protocol maybe as part of this so it can keep moving forward.
>>>
>>>
>>> ~ Joe Stein
>>>
>>> On Tue, Apr 28, 2015 at 3:33 AM, Sun, Dapeng 
>>>wrote:
>>>
>>> > Thank you for your reply, Gwen.
>>> >
>>> > >1. Complex rule systems can be difficult to reason about and
>>>therefore
>>> > end up being less secure. The rule "Deny always wins" is very easy to
>>> grasp.
>>> > Yes, I agree with your point: we should not make the rule complex.
>>> >
>>> > >2. We currently don't have any mechanism for specifying IP ranges
>>>(or
>>> host
>>> > >ranges) at all. I think it's a pretty significant deficiency, but it
>>>does
>>> > mean that we don't need to worry about the issue of blocking a large
>>> range
>>> > while unblocking few servers in the range.
>>> > Supporting ranges sounds reasonable. If this feature is on the
>>>development
>>> > plan, I also don't think we can put "the best matching acl" and "
>>>Support
>>> > ip ranges" together.
>>> >
>>> > >We have a call tomorrow (Tuesday, April 28) at 3pm PST - to discuss
>>>this
>>> > and other outstanding design issues (not all related to security). If
>>>you
>>> > are interested in joining - let me know and I'll forward you the
>>>invite.
>>> > Thank you, Gwen. I have the invite and I should be at home at that
>>>time.
>>> > But due to a network issue, I may not be able to join the meeting smoothly.
>>> >
>>> > Regards
>>> > Dapeng
>>> >
>>> > -Original Message-
>>> > From: Gwen Shapira [mailto:gshap...@cloudera.com]
>>> > Sent: Tuesday, April 28, 2015 1:31 PM
>>> > To: dev@kafka.apache.org
>>> > Subject: Re: [VOTE] KIP-11- Authorization design for kafka security
>>> >
>>> > While I see the advantage of being able to say something like: "deny
>>>user
>>> > X from hosts h1...h200" and also "allow user X from host h189", there are
>>>two
>>> > issues here:
>>> >
>>> > 1. Complex rule systems can be difficult to reason about and
>>>therefore
>>> end
>>> > up being less secure. The rule "Deny always wins" is very easy to
>>>grasp.
>>> >
>>> > 2. We currently don't have any mechanism for specifying IP ranges (or
>>> host
>>> > ranges) at all. I think it's a pretty significant deficiency, but it
>>>does
>>> > mean that we don't need to worry about the issue of blocking a large
>>> range
>>> > whil

Re: [VOTE] KIP-11- Authorization design for kafka security

2015-04-30 Thread Parth Brahmbhatt
Hi Joe, thanks for taking the time to review.
 
* All the open issues already have a resolution; I can open a JIRA for
each one and add the resolution to it and resolve them immediately if you
want this for tracking purposes.
* We will update system tests to verify that the code works. We have
thorough unit tests for all the new code except for modifications made to
KafkaAPI as that has way too many dependencies to be mocked which I guess
is the reason for no existing unit tests.
* I don’t know if I completely understand the concern. We have talked with
Ranger team (Don Bosco Durai) so we at least have one custom authorizer
implementation that has approved this design and they will be able to
inject their authorization framework with current interfaces. Do you see
any issue with the design which will prevent anyone from providing a
custom implementation?
* Did not understand the concern around the wire protocol; we are adding
AuthorizationException to indicate that an operation was not authorized.

Thanks
Parth
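
A minimal, illustrative sketch of the moving parts discussed in this thread: a pluggable Authorizer hook consulted by the request handlers, the AuthorizationException surfaced as a wire-protocol error, and the "deny always wins" evaluation described in the quoted discussion below. Every name here is an approximation of the KIP-11 design under discussion, not the final Kafka API.

{noformat}
import java.util.List;

// Illustrative sketch only; all names approximate the KIP-11 design.
public class Kip11Sketch {

    enum PermissionType { ALLOW, DENY }
    enum Operation { READ, WRITE, CREATE, DELETE }

    static class Session {
        final String principal;
        final String host;
        Session(String principal, String host) { this.principal = principal; this.host = host; }
    }

    static class Resource {
        final String name;
        Resource(String name) { this.name = name; }
    }

    static class Acl {
        final String principal;
        final String host;
        final Operation op;
        final PermissionType type;
        Acl(String principal, String host, Operation op, PermissionType type) {
            this.principal = principal; this.host = host; this.op = op; this.type = type;
        }
        boolean matches(Session s, Operation o) {
            return principal.equals(s.principal) && host.equals(s.host) && op == o;
        }
    }

    static class AuthorizationException extends RuntimeException {
        AuthorizationException(String m) { super(m); }
    }

    // The pluggable hook; Ranger or any other entitlement system sits behind it.
    interface Authorizer {
        boolean authorize(Session session, Operation op, Resource resource);
    }

    // "Deny always wins": any matching DENY short-circuits, so rule order never matters.
    static boolean denyWins(List<Acl> acls, Session s, Operation o) {
        boolean allowed = false;
        for (Acl acl : acls) {
            if (acl.matches(s, o)) {
                if (acl.type == PermissionType.DENY) return false;
                allowed = true;
            }
        }
        return allowed;
    }

    // How a request handler might gate an operation and surface the new wire error.
    static void check(Authorizer auth, Session s, Operation o, Resource r) {
        if (!auth.authorize(s, o, r)) {
            throw new AuthorizationException(s.principal + " may not " + o + " " + r.name);
        }
    }
}
{noformat}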

On 4/30/15, 5:59 AM, "Jun Rao"  wrote:

>Joe,
>
>Could you elaborate on why we should not store JSON in ZK? So far, all
>existing ZK data are in JSON.
>
>Thanks,
>
>Jun
>
>On Thu, Apr 30, 2015 at 2:06 AM, Joe Stein  wrote:
>
>> Hi, sorry I am coming in late to chime back in on this thread and
>>haven't
>> been able to make the KIP hangouts the last few weeks. Sorry if any of
>>this
>> was brought up already or I missed it.
>>
>> I read through the KIP and the thread(s) and a couple of things jumped
>>out.
>>
>>
>>- Can we break out the open issues in JIRA (maybe during the hangout)
>>that are in the KIP and resolve/flesh those out more?
>>
>>
>>
>>- I don't see any updates with the systems test or how we can know
>>the
>>code works.
>>
>>
>>
>>- We need some implementation/example/sample that we know can work in
>>all different existing entitlement servers and not just ones that
>>run in
>>types of data centers too. I am not saying we should support
>>everything,
>> but
>>if someone had to implement
>>https://docs.oracle.com/cd/E19225-01/820-6551/bzafm/index.html with
>>Kafka, it has to work for them out of the box.
>>
>>
>>
>>- We should shy away from storing JSON in ZooKeeper. Let's store
>>bytes in
>>Storage.
>>
>>
>>
>>- We should spend some time thinking through exceptions in the wire
>>protocol maybe as part of this so it can keep moving forward.
>>
>>
>> ~ Joe Stein
>>
>> On Tue, Apr 28, 2015 at 3:33 AM, Sun, Dapeng 
>>wrote:
>>
>> > Thank you for your reply, Gwen.
>> >
>> > >1. Complex rule systems can be difficult to reason about and
>>therefore
>> > end up being less secure. The rule "Deny always wins" is very easy to
>> grasp.
>> > Yes, I agree with your point: we should not make the rule complex.
>> >
>> > >2. We currently don't have any mechanism for specifying IP ranges (or
>> host
>> > >ranges) at all. I think it's a pretty significant deficiency, but it
>>does
>> > mean that we don't need to worry about the issue of blocking a large
>> range
>> > while unblocking few servers in the range.
>> > Supporting ranges sounds reasonable. If this feature is on the
>>development
>> > plan, I also don't think we can put "the best matching acl" and "
>>Support
>> > ip ranges" together.
>> >
>> > >We have a call tomorrow (Tuesday, April 28) at 3pm PST - to discuss
>>this
>> > and other outstanding design issues (not all related to security). If
>>you
>> > are interested in joining - let me know and I'll forward you the
>>invite.
>> > Thank you, Gwen. I have the invite and I should be at home at that
>>time.
>> > But due to a network issue, I may not be able to join the meeting smoothly.
>> >
>> > Regards
>> > Dapeng
>> >
>> > -Original Message-
>> > From: Gwen Shapira [mailto:gshap...@cloudera.com]
>> > Sent: Tuesday, April 28, 2015 1:31 PM
>> > To: dev@kafka.apache.org
>> > Subject: Re: [VOTE] KIP-11- Authorization design for kafka security
>> >
>> > While I see the advantage of being able to say something like: "deny
>>user
>> > X from hosts h1...h200" and also "allow user X from host h189", there are
>>two
>> > issues here:
>> >
>> > 1. Complex rule systems can be difficult to reason about and therefore
>> end
>> > up being less secure. The rule "Deny always wins" is very easy to
>>grasp.
>> >
>> > 2. We currently don't have any mechanism for specifying IP ranges (or
>> host
>> > ranges) at all. I think it's a pretty significant deficiency, but it
>>does
>> > mean that we don't need to worry about the issue of blocking a large
>> range
>> > while unblocking few servers in the range.
>> >
>> > Gwen
>> >
>> > P.S
>> > We have a call tomorrow (Tuesday, April 28) at 3pm PST - to discuss
>>this
>> > and other outstanding design issues (not all related to security). If
>>you
>> > are interested in joining - let me know and I'll forward you the
>>invite.
>> >
>> > Gwen
>> >
>> > On Mon, Apr 27, 2015 at 10:15 PM, Sun, Dapeng 
>> > wrote:
>> >
>> > > Attach th

[jira] [Commented] (KAFKA-824) java.lang.NullPointerException in commitOffsets

2015-04-30 Thread John Humphreys (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14521766#comment-14521766
 ] 

John Humphreys commented on KAFKA-824:
--

Awesome, thank you for the quick follow-up :).

> java.lang.NullPointerException in commitOffsets 
> 
>
> Key: KAFKA-824
> URL: https://issues.apache.org/jira/browse/KAFKA-824
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.7.2
>Reporter: Yonghui Zhao
>Assignee: Neha Narkhede
> Attachments: ZkClient.0.3.txt, ZkClient.0.4.txt, screenshot-1.jpg
>
>
> Neha Narkhede
> "Yes, I have. Unfortunately, I never quite around to fixing it. My guess is
> that it is caused due to a race condition between the rebalance thread and
> the offset commit thread when a rebalance is triggered or the client is
> being shutdown. Do you mind filing a bug ?"
> 2013/03/25 12:08:32.020 WARN [ZookeeperConsumerConnector] [] 
> 0_lu-ml-test10.bj-1364184411339-7c88f710 exception during commitOffsets
> java.lang.NullPointerException
> at org.I0Itec.zkclient.ZkConnection.writeData(ZkConnection.java:111)
> at org.I0Itec.zkclient.ZkClient$10.call(ZkClient.java:813)
> at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
> at org.I0Itec.zkclient.ZkClient.writeData(ZkClient.java:809)
> at org.I0Itec.zkclient.ZkClient.writeData(ZkClient.java:777)
> at kafka.utils.ZkUtils$.updatePersistentPath(ZkUtils.scala:103)
> at 
> kafka.consumer.ZookeeperConsumerConnector$$anonfun$commitOffsets$2$$anonfun$apply$4.apply(ZookeeperConsumerConnector.scala:251)
> at 
> kafka.consumer.ZookeeperConsumerConnector$$anonfun$commitOffsets$2$$anonfun$apply$4.apply(ZookeeperConsumerConnector.scala:248)
> at scala.collection.Iterator$class.foreach(Iterator.scala:631)
> at 
> scala.collection.JavaConversions$JIteratorWrapper.foreach(JavaConversions.scala:549)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:79)
> at 
> scala.collection.JavaConversions$JCollectionWrapper.foreach(JavaConversions.scala:570)
> at 
> kafka.consumer.ZookeeperConsumerConnector$$anonfun$commitOffsets$2.apply(ZookeeperConsumerConnector.scala:248)
> at 
> kafka.consumer.ZookeeperConsumerConnector$$anonfun$commitOffsets$2.apply(ZookeeperConsumerConnector.scala:246)
> at scala.collection.Iterator$class.foreach(Iterator.scala:631)
> at kafka.utils.Pool$$anon$1.foreach(Pool.scala:53)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:79)
> at kafka.utils.Pool.foreach(Pool.scala:24)
> at 
> kafka.consumer.ZookeeperConsumerConnector.commitOffsets(ZookeeperConsumerConnector.scala:246)
> at 
> kafka.consumer.ZookeeperConsumerConnector.autoCommit(ZookeeperConsumerConnector.scala:232)
> at 
> kafka.consumer.ZookeeperConsumerConnector$$anonfun$1.apply$mcV$sp(ZookeeperConsumerConnector.scala:126)
> at kafka.utils.Utils$$anon$2.run(Utils.scala:58)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at 
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:722)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-2147) Unbalanced replication can cause extreme purgatory growth

2015-04-30 Thread Evan Huus (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14521670#comment-14521670
 ] 

Evan Huus edited comment on KAFKA-2147 at 4/30/15 4:13 PM:
---

When I enable TRACE request logging on a single node (broker ID 1 in the 
problematic cluster) I see:
- approximately 850 FetchRequest messages per second with ClientId of 
"ReplicaFetcherThread-0-1"
- approximately evenly distributed between the various ReplicaIds
- each one with "MaxWait: 500 ms; MinBytes: 1 bytes" which seems reasonable
- an average of 31 partitions per request without too much variation

When I enable DEBUG logging generally the problem seems to go away on that 
node... however I still collected stats on the "Begin purging watch lists" 
message. Over a five minute window:
- exactly 100 occurrences of that line
- heavily grouped (75 of them within a single 10-second window, otherwise a 
handful every ~30 seconds)
- heavily variable number of elements purged; low of 38, high of 210073. 
Correlates pretty strongly with the time since the previous purge

fetch.purgatory.purge.interval.requests is still set to 200 on this node.


was (Author: eapache):
When I enable TRACE request logging on a single node (broker ID 1 in the 
problematic cluster) I see:
- approximately 850 FetchRequest messages per second with ClientId of 
"ReplicaFetcherThread-0-1"
- approximately evenly distributed between the various ReplicaIds
- each one with "MaxWait: 500 ms; MinBytes: 1 bytes" which seems reasonable
- an average of 31 partitions per request without too much variation

When I enable DEBUG logging generally the problem seems to go away on that 
node... however I still collected stats on the "Begin purging watch lists" 
message. Over a five minute window:
- exactly 100 occurrences of that line
- heavily grouped (75 of them within a single 10-second window, otherwise a 
handful every ~30 seconds)
- heavily variable number of elements purged; low of 38, high of 210073. 
Correlates pretty strongly with the time since the previous purge

> Unbalanced replication can cause extreme purgatory growth
> -
>
> Key: KAFKA-2147
> URL: https://issues.apache.org/jira/browse/KAFKA-2147
> Project: Kafka
>  Issue Type: Bug
>  Components: purgatory, replication
>Affects Versions: 0.8.2.1
>Reporter: Evan Huus
>Assignee: Joel Koshy
>
> Apologies in advance, this is going to be a bit of a complex description, 
> mainly because we've seen this issue several different ways and we're still 
> tying them together in terms of root cause and analysis.
> It is worth noting now that we have all our producers set up to send 
> RequiredAcks==-1, and that this includes all our MirrorMakers.
> I understand the purgatory is being rewritten (again) for 0.8.3. Hopefully 
> that will incidentally fix this issue, or at least render it moot.
> h4. Symptoms
> Fetch request purgatory on a broker or brokers grows rapidly and steadily at 
> a rate of roughly 1-5K requests per second. Heap memory used also grows to 
> keep pace. When 4-5 million requests have accumulated in purgatory, the 
> purgatory is drained, causing a substantial latency spike. The node will tend 
> to drop leadership, replicate, and recover.
> h5. Case 1 - MirrorMaker
> We first noticed this case when enabling mirrormaker. We had one primary 
> cluster already, with many producers and consumers. We created a second, 
> identical cluster and enabled replication from the original to the new 
> cluster on some topics using mirrormaker. This caused all six nodes in the 
> new cluster to exhibit the symptom in lockstep - their purgatories would all 
> grow together, and get drained within about 20 seconds of each other. The 
> cluster-wide latency spikes at this time caused several problems for us.
> Turning MM on and off turned the problem on and off very precisely. When we 
> stopped MM, the purgatories would all drop to normal levels immediately, and 
> would start climbing again when we restarted it.
> Note that this is the *fetch* purgatories on the brokers that MM was 
> *producing* to, which indicates fairly strongly that this is a replication 
> issue, not a MM issue.
> This particular cluster and MM setup was abandoned for other reasons before 
> we could make much progress debugging.
> h5. Case 2 - Broker 6
> The second time we saw this issue was on the newest broker (broker 6) in the 
> original cluster. For a long time we were running with five nodes, and 
> eventually added a sixth to handle the increased load. At first, we moved 
> only a handful of higher-volume partitions to this broker. Later, we created 
> a group of new topics (totalling around 100 partitions) for testing purposes 
> that were spread automatically across all six node

[jira] [Comment Edited] (KAFKA-2147) Unbalanced replication can cause extreme purgatory growth

2015-04-30 Thread Evan Huus (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14521670#comment-14521670
 ] 

Evan Huus edited comment on KAFKA-2147 at 4/30/15 4:12 PM:
---

When I enable TRACE request logging on a single node (broker ID 1 in the 
problematic cluster) I see:
- approximately 850 FetchRequest messages per second with ClientId of 
"ReplicaFetcherThread-0-1"
- approximately evenly distributed between the various ReplicaIds
- each one with "MaxWait: 500 ms; MinBytes: 1 bytes" which seems reasonable
- an average of 31 partitions per request without too much variation

When I enable DEBUG logging generally the problem seems to go away on that 
node... however I still collected stats on the "Begin purging watch lists" 
message. Over a five minute window:
- exactly 100 occurrences of that line
- heavily grouped (75 of them within a single 10-second window, otherwise a 
handful every ~30 seconds)
- heavily variable number of elements purged; low of 38, high of 210073. 
Correlates pretty strongly with the time since the previous purge


was (Author: eapache):
When I enable TRACE request logging on a single node (broker ID 1 in the 
problematic cluster) I see:
- approximately 850 FetchRequest messages per second with ClientId of 
"ReplicaFetcherThread-0-1"
- approximately evenly distributed between the various ReplicaIds
- each one with "MaxWait: 500 ms; MinBytes: 1 bytes" which seems reasonable
- an average of 31 partitions per request without too much variation

> Unbalanced replication can cause extreme purgatory growth
> -
>
> Key: KAFKA-2147
> URL: https://issues.apache.org/jira/browse/KAFKA-2147
> Project: Kafka
>  Issue Type: Bug
>  Components: purgatory, replication
>Affects Versions: 0.8.2.1
>Reporter: Evan Huus
>Assignee: Joel Koshy
>
> Apologies in advance, this is going to be a bit of a complex description, 
> mainly because we've seen this issue several different ways and we're still 
> tying them together in terms of root cause and analysis.
> It is worth noting now that we have all our producers set up to send 
> RequiredAcks==-1, and that this includes all our MirrorMakers.
> I understand the purgatory is being rewritten (again) for 0.8.3. Hopefully 
> that will incidentally fix this issue, or at least render it moot.
> h4. Symptoms
> Fetch request purgatory on a broker or brokers grows rapidly and steadily at 
> a rate of roughly 1-5K requests per second. Heap memory used also grows to 
> keep pace. When 4-5 million requests have accumulated in purgatory, the 
> purgatory is drained, causing a substantial latency spike. The node will tend 
> to drop leadership, replicate, and recover.
> h5. Case 1 - MirrorMaker
> We first noticed this case when enabling mirrormaker. We had one primary 
> cluster already, with many producers and consumers. We created a second, 
> identical cluster and enabled replication from the original to the new 
> cluster on some topics using mirrormaker. This caused all six nodes in the 
> new cluster to exhibit the symptom in lockstep - their purgatories would all 
> grow together, and get drained within about 20 seconds of each other. The 
> cluster-wide latency spikes at this time caused several problems for us.
> Turning MM on and off turned the problem on and off very precisely. When we 
> stopped MM, the purgatories would all drop to normal levels immediately, and 
> would start climbing again when we restarted it.
> Note that this is the *fetch* purgatories on the brokers that MM was 
> *producing* to, which indicates fairly strongly that this is a replication 
> issue, not a MM issue.
> This particular cluster and MM setup was abandoned for other reasons before 
> we could make much progress debugging.
> h5. Case 2 - Broker 6
> The second time we saw this issue was on the newest broker (broker 6) in the 
> original cluster. For a long time we were running with five nodes, and 
> eventually added a sixth to handle the increased load. At first, we moved 
> only a handful of higher-volume partitions to this broker. Later, we created 
> a group of new topics (totalling around 100 partitions) for testing purposes 
> that were spread automatically across all six nodes. These topics saw 
> occasional traffic, but were generally unused. At this point broker 6 had 
> leadership for about an equal number of high-volume and unused partitions, 
> about 15-20 of each.
> Around this time (we don't have detailed enough data to prove real 
> correlation unfortunately), the issue started appearing on this broker as 
> well, but not on any of the other brokers in the cluster.
> h4. Debugging
> The first thing we tried was to reduce the 
> `fetch.purgatory.purge.interval.requests` from the default of 1000 to a much 

[jira] [Commented] (KAFKA-824) java.lang.NullPointerException in commitOffsets

2015-04-30 Thread Johannes Zillmann (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14521688#comment-14521688
 ] 

Johannes Zillmann commented on KAFKA-824:
-

Hi guys,

just had a look at this.
I think there are only two ways such an exception can occur:
- 1) a null zkConnection is passed in
- 2) a retryUntilConnected action wakes up and the client was closed in the meantime

I could reproduce the NPE for case 2 and changed the code to throw a clear
exception instead of risking unclear follow-up exceptions like the NPEs.
See 
https://github.com/sgroschupf/zkclient/commit/0630c9c6e67ab49a51e80bfd939e4a0d01a69dfe

HTH

PS: this is part of the zkclient-0.5 release which should be online in a few 
hours!
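
A minimal sketch of that case-2 guard, with assumed field and method names (the linked commit above is the authoritative change):

{noformat}
import java.util.concurrent.Callable;

// Assumed shape: check a 'closed' flag on each retry so a closed client
// fails fast with a clear IllegalStateException instead of surfacing an
// NPE from a null connection.
class RetrySketch {
    private volatile boolean closed = false;

    <T> T retryUntilConnected(Callable<T> action) throws Exception {
        while (true) {
            if (closed) {
                throw new IllegalStateException("ZkClient already closed!");
            }
            try {
                return action.call();
            } catch (RuntimeException e) {
                // assumed policy: treat connection loss as retryable
                waitUntilConnected();
            }
        }
    }

    void waitUntilConnected() { /* block until the session is re-established */ }

    void close() { closed = true; }
}
{noformat}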

> java.lang.NullPointerException in commitOffsets 
> 
>
> Key: KAFKA-824
> URL: https://issues.apache.org/jira/browse/KAFKA-824
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.7.2
>Reporter: Yonghui Zhao
>Assignee: Neha Narkhede
> Attachments: ZkClient.0.3.txt, ZkClient.0.4.txt, screenshot-1.jpg
>
>
> Neha Narkhede
> "Yes, I have. Unfortunately, I never quite around to fixing it. My guess is
> that it is caused due to a race condition between the rebalance thread and
> the offset commit thread when a rebalance is triggered or the client is
> being shutdown. Do you mind filing a bug ?"
> 2013/03/25 12:08:32.020 WARN [ZookeeperConsumerConnector] [] 
> 0_lu-ml-test10.bj-1364184411339-7c88f710 exception during commitOffsets
> java.lang.NullPointerException
> at org.I0Itec.zkclient.ZkConnection.writeData(ZkConnection.java:111)
> at org.I0Itec.zkclient.ZkClient$10.call(ZkClient.java:813)
> at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
> at org.I0Itec.zkclient.ZkClient.writeData(ZkClient.java:809)
> at org.I0Itec.zkclient.ZkClient.writeData(ZkClient.java:777)
> at kafka.utils.ZkUtils$.updatePersistentPath(ZkUtils.scala:103)
> at 
> kafka.consumer.ZookeeperConsumerConnector$$anonfun$commitOffsets$2$$anonfun$apply$4.apply(ZookeeperConsumerConnector.scala:251)
> at 
> kafka.consumer.ZookeeperConsumerConnector$$anonfun$commitOffsets$2$$anonfun$apply$4.apply(ZookeeperConsumerConnector.scala:248)
> at scala.collection.Iterator$class.foreach(Iterator.scala:631)
> at 
> scala.collection.JavaConversions$JIteratorWrapper.foreach(JavaConversions.scala:549)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:79)
> at 
> scala.collection.JavaConversions$JCollectionWrapper.foreach(JavaConversions.scala:570)
> at 
> kafka.consumer.ZookeeperConsumerConnector$$anonfun$commitOffsets$2.apply(ZookeeperConsumerConnector.scala:248)
> at 
> kafka.consumer.ZookeeperConsumerConnector$$anonfun$commitOffsets$2.apply(ZookeeperConsumerConnector.scala:246)
> at scala.collection.Iterator$class.foreach(Iterator.scala:631)
> at kafka.utils.Pool$$anon$1.foreach(Pool.scala:53)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:79)
> at kafka.utils.Pool.foreach(Pool.scala:24)
> at 
> kafka.consumer.ZookeeperConsumerConnector.commitOffsets(ZookeeperConsumerConnector.scala:246)
> at 
> kafka.consumer.ZookeeperConsumerConnector.autoCommit(ZookeeperConsumerConnector.scala:232)
> at 
> kafka.consumer.ZookeeperConsumerConnector$$anonfun$1.apply$mcV$sp(ZookeeperConsumerConnector.scala:126)
> at kafka.utils.Utils$$anon$2.run(Utils.scala:58)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at 
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:722)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2147) Unbalanced replication can cause extreme purgatory growth

2015-04-30 Thread Evan Huus (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14521670#comment-14521670
 ] 

Evan Huus commented on KAFKA-2147:
--

When I enable TRACE request logging on a single node (broker ID 1 in the 
problematic cluster) I see:
- approximately 850 FetchRequest messages per second with ClientId of 
"ReplicaFetcherThread-0-1"
- approximately evenly distributed between the various ReplicaIds
- each one with "MaxWait: 500 ms; MinBytes: 1 bytes" which seems reasonable
- an average of 31 partitions per request without too much variation
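
The mitigation described in the quoted issue text below (regularly producing small messages to otherwise-idle topics) can be sketched roughly as follows with the 0.8-era producer API; the broker address, topic list, and interval are invented:

{noformat}
import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

// Keepalive daemon sketch: periodically send a small message to topics that
// would otherwise receive no traffic, so (per the hypothesis below) replica
// fetch requests for them return data instead of parking in purgatory.
public class KeepaliveProducer {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("metadata.broker.list", "broker1:9092");  // assumed address
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        Producer<String, String> producer =
            new Producer<String, String>(new ProducerConfig(props));
        String[] idleTopics = {"idle-topic-1", "idle-topic-2"};  // assumed names
        while (true) {
            for (String topic : idleTopics) {
                producer.send(new KeyedMessage<String, String>(topic, "keepalive"));
            }
            Thread.sleep(30000);  // assumed interval
        }
    }
}
{noformat}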

> Unbalanced replication can cause extreme purgatory growth
> -
>
> Key: KAFKA-2147
> URL: https://issues.apache.org/jira/browse/KAFKA-2147
> Project: Kafka
>  Issue Type: Bug
>  Components: purgatory, replication
>Affects Versions: 0.8.2.1
>Reporter: Evan Huus
>Assignee: Joel Koshy
>
> Apologies in advance, this is going to be a bit of a complex description, 
> mainly because we've seen this issue several different ways and we're still 
> tying them together in terms of root cause and analysis.
> It is worth noting now that we have all our producers set up to send 
> RequiredAcks==-1, and that this includes all our MirrorMakers.
> I understand the purgatory is being rewritten (again) for 0.8.3. Hopefully 
> that will incidentally fix this issue, or at least render it moot.
> h4. Symptoms
> Fetch request purgatory on a broker or brokers grows rapidly and steadily at 
> a rate of roughly 1-5K requests per second. Heap memory used also grows to 
> keep pace. When 4-5 million requests have accumulated in purgatory, the 
> purgatory is drained, causing a substantial latency spike. The node will tend 
> to drop leadership, replicate, and recover.
> h5. Case 1 - MirrorMaker
> We first noticed this case when enabling mirrormaker. We had one primary 
> cluster already, with many producers and consumers. We created a second, 
> identical cluster and enabled replication from the original to the new 
> cluster on some topics using mirrormaker. This caused all six nodes in the 
> new cluster to exhibit the symptom in lockstep - their purgatories would all 
> grow together, and get drained within about 20 seconds of each other. The 
> cluster-wide latency spikes at this time caused several problems for us.
> Turning MM on and off turned the problem on and off very precisely. When we 
> stopped MM, the purgatories would all drop to normal levels immediately, and 
> would start climbing again when we restarted it.
> Note that this is the *fetch* purgatories on the brokers that MM was 
> *producing* to, which indicates fairly strongly that this is a replication 
> issue, not a MM issue.
> This particular cluster and MM setup was abandoned for other reasons before 
> we could make much progress debugging.
> h5. Case 2 - Broker 6
> The second time we saw this issue was on the newest broker (broker 6) in the 
> original cluster. For a long time we were running with five nodes, and 
> eventually added a sixth to handle the increased load. At first, we moved 
> only a handful of higher-volume partitions to this broker. Later, we created 
> a group of new topics (totalling around 100 partitions) for testing purposes 
> that were spread automatically across all six nodes. These topics saw 
> occasional traffic, but were generally unused. At this point broker 6 had 
> leadership for about an equal number of high-volume and unused partitions, 
> about 15-20 of each.
> Around this time (we don't have detailed enough data to prove real 
> correlation unfortunately), the issue started appearing on this broker as 
> well, but not on any of the other brokers in the cluster.
> h4. Debugging
> The first thing we tried was to reduce the 
> `fetch.purgatory.purge.interval.requests` from the default of 1000 to a much 
> lower value of 200. This had no noticeable effect at all.
> We then enabled debug logging on broker06 and started looking through that. I 
> can attach complete log samples if necessary, but the thing that stood out 
> for us was a substantial number of the following lines:
> {noformat}
> [2015-04-23 20:05:15,196] DEBUG [KafkaApi-6] Putting fetch request with 
> correlation id 49939 from client ReplicaFetcherThread-0-6 into purgatory 
> (kafka.server.KafkaApis)
> {noformat}
> The volume of these lines seemed to match (approximately) the fetch purgatory 
> growth on that broker.
> At this point we developed a hypothesis (detailed below) which guided our 
> subsequent debugging tests:
> - Setting a daemon up to produce regular random data to all of the topics led 
> by kafka06 (specifically the ones which otherwise would receive no data) 
> substantially alleviated the problem.
> - Doing an additional rebalance of the cluster in order to move

[jira] [Commented] (KAFKA-2152) Console producer fails to start when server running with broker.id != 0

2015-04-30 Thread Lior Gonnen (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14521655#comment-14521655
 ] 

Lior Gonnen commented on KAFKA-2152:


Thanks, Gwen. This was very helpful.

> Console producer fails to start when server running with broker.id != 0
> ---
>
> Key: KAFKA-2152
> URL: https://issues.apache.org/jira/browse/KAFKA-2152
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.2.1
>Reporter: Lior Gonnen
>Priority: Critical
>
> Scenario to reproduce:
> 1. Start zookeeper as usual: bin/zookeeper-server-start.sh 
> config/zookeeper.properties
> 2. Start local server as usual: bin/kafka-server-start.sh 
> config/server.properties
> 3. Start console producer: bin/kafka-console-producer.sh --broker-list 
> localhost:9092 --topic test
> 4. Producer starts as usual, and allows sending messages
> 5. Stop the producer and server
> 6. In config/server.properties, change broker.id to 2 (or any other non-zero 
> number)
> 7. Repeat steps 2 and 3
> 8. The received output is as follows:
> [2015-04-26 20:00:47,571] WARN Error while fetching metadata  partition 0 
> leader: none  replicas:   isr:   isUnderReplicated: false for topic 
> partition [test,0]: [class kafka.common.LeaderNotAvailableException] 
> (kafka.producer.BrokerPartitionInfo)
> [2015-04-26 20:00:47,575] WARN Failed to collate messages by topic,partition 
> due to: No leader for any partition in topic test 
> (kafka.producer.async.DefaultEventHandler)
> [2015-04-26 20:00:47,682] WARN Error while fetching metadata  partition 0 
> leader: none  replicas:   isr:   isUnderReplicated: false for topic 
> partition [test,0]: [class kafka.common.LeaderNotAvailableException] 
> (kafka.producer.BrokerPartitionInfo)
> [2015-04-26 20:00:47,682] WARN Failed to collate messages by topic,partition 
> due to: No leader for any partition in topic test 
> (kafka.producer.async.DefaultEventHandler)
> [2015-04-26 20:00:47,789] WARN Error while fetching metadata  partition 0 
> leader: none  replicas:   isr:   isUnderReplicated: false for topic 
> partition [test,0]: [class kafka.common.LeaderNotAvailableException] 
> (kafka.producer.BrokerPartitionInfo)
> [2015-04-26 20:00:47,790] WARN Failed to collate messages by topic,partition 
> due to: No leader for any partition in topic test 
> (kafka.producer.async.DefaultEventHandler)
> [2015-04-26 20:00:47,897] WARN Error while fetching metadata  partition 0 
> leader: none  replicas:   isr:   isUnderReplicated: false for topic 
> partition [test,0]: [class kafka.common.LeaderNotAvailableException] 
> (kafka.producer.BrokerPartitionInfo)
> [2015-04-26 20:00:47,897] WARN Failed to collate messages by topic,partition 
> due to: No leader for any partition in topic test 
> (kafka.producer.async.DefaultEventHandler)
> [2015-04-26 20:00:48,002] WARN Error while fetching metadata  partition 0 
> leader: none  replicas:   isr:   isUnderReplicated: false for topic 
> partition [test,0]: [class kafka.common.LeaderNotAvailableException] 
> (kafka.producer.BrokerPartitionInfo)
> [2015-04-26 20:00:48,004] ERROR Failed to send requests for topics test with 
> correlation ids in [0,8] (kafka.producer.async.DefaultEventHandler)
> [2015-04-26 20:00:48,004] ERROR Error in handling batch of 1 events 
> (kafka.producer.async.ProducerSendThread)
> kafka.common.FailedToSendMessageException: Failed to send messages after 3 
> tries.
>   at 
> kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:90)
>   at 
> kafka.producer.async.ProducerSendThread.tryToHandle(ProducerSendThread.scala:105)
>   at 
> kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(ProducerSendThread.scala:88)
>   at 
> kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(ProducerSendThread.scala:68)
>   at scala.collection.immutable.Stream.foreach(Stream.scala:594)
>   at 
> kafka.producer.async.ProducerSendThread.processEvents(ProducerSendThread.scala:67)
>   at 
> kafka.producer.async.ProducerSendThread.run(ProducerSendThread.scala:45)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-824) java.lang.NullPointerException in commitOffsets

2015-04-30 Thread John Humphreys (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14521596#comment-14521596
 ] 

John Humphreys commented on KAFKA-824:
--

Has any progress been made on this, or do any workarounds exist yet?  Has 
anyone determined what causes it in particular?

We're getting it fairly regularly and I would love to know how to mitigate the 
issue.

> java.lang.NullPointerException in commitOffsets 
> 
>
> Key: KAFKA-824
> URL: https://issues.apache.org/jira/browse/KAFKA-824
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.7.2
>Reporter: Yonghui Zhao
>Assignee: Neha Narkhede
> Attachments: ZkClient.0.3.txt, ZkClient.0.4.txt, screenshot-1.jpg
>
>
> Neha Narkhede
> "Yes, I have. Unfortunately, I never quite around to fixing it. My guess is
> that it is caused due to a race condition between the rebalance thread and
> the offset commit thread when a rebalance is triggered or the client is
> being shutdown. Do you mind filing a bug ?"
> 2013/03/25 12:08:32.020 WARN [ZookeeperConsumerConnector] [] 
> 0_lu-ml-test10.bj-1364184411339-7c88f710 exception during commitOffsets
> java.lang.NullPointerException
> at org.I0Itec.zkclient.ZkConnection.writeData(ZkConnection.java:111)
> at org.I0Itec.zkclient.ZkClient$10.call(ZkClient.java:813)
> at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
> at org.I0Itec.zkclient.ZkClient.writeData(ZkClient.java:809)
> at org.I0Itec.zkclient.ZkClient.writeData(ZkClient.java:777)
> at kafka.utils.ZkUtils$.updatePersistentPath(ZkUtils.scala:103)
> at 
> kafka.consumer.ZookeeperConsumerConnector$$anonfun$commitOffsets$2$$anonfun$apply$4.apply(ZookeeperConsumerConnector.scala:251)
> at 
> kafka.consumer.ZookeeperConsumerConnector$$anonfun$commitOffsets$2$$anonfun$apply$4.apply(ZookeeperConsumerConnector.scala:248)
> at scala.collection.Iterator$class.foreach(Iterator.scala:631)
> at 
> scala.collection.JavaConversions$JIteratorWrapper.foreach(JavaConversions.scala:549)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:79)
> at 
> scala.collection.JavaConversions$JCollectionWrapper.foreach(JavaConversions.scala:570)
> at 
> kafka.consumer.ZookeeperConsumerConnector$$anonfun$commitOffsets$2.apply(ZookeeperConsumerConnector.scala:248)
> at 
> kafka.consumer.ZookeeperConsumerConnector$$anonfun$commitOffsets$2.apply(ZookeeperConsumerConnector.scala:246)
> at scala.collection.Iterator$class.foreach(Iterator.scala:631)
> at kafka.utils.Pool$$anon$1.foreach(Pool.scala:53)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:79)
> at kafka.utils.Pool.foreach(Pool.scala:24)
> at 
> kafka.consumer.ZookeeperConsumerConnector.commitOffsets(ZookeeperConsumerConnector.scala:246)
> at 
> kafka.consumer.ZookeeperConsumerConnector.autoCommit(ZookeeperConsumerConnector.scala:232)
> at 
> kafka.consumer.ZookeeperConsumerConnector$$anonfun$1.apply$mcV$sp(ZookeeperConsumerConnector.scala:126)
> at kafka.utils.Utils$$anon$2.run(Utils.scala:58)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at 
> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
> at java.lang.Thread.run(Thread.java:722)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2159) offsets.topic.segment.bytes and offsets.topic.retention.minutes are ignored

2015-04-30 Thread JIRA

[ 
https://issues.apache.org/jira/browse/KAFKA-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14521518#comment-14521518
 ] 

Rafał Boniecki commented on KAFKA-2159:
---

Possible reasons: offsets.topic.retention.minutes is actually called
offsets.retention.minutes (a documentation bug?), and the offsetsTopicSegmentBytes
parameter is not passed in KafkaServer.createOffsetManager().
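
Assuming the naming point above is correct, a configuration along these lines should at least make the retention setting take effect (the segment-bytes problem looks like a code-path bug, so no rename would help there):

{noformat}
# offsets.retention.minutes is the key the broker appears to read;
# offsets.topic.retention.minutes seems to be a documentation-only name.
offsets.topic.num.partitions=20
offsets.topic.segment.bytes=10485760
offsets.retention.minutes=10080
{noformat}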

> offsets.topic.segment.bytes and offsets.topic.retention.minutes are ignored
> ---
>
> Key: KAFKA-2159
> URL: https://issues.apache.org/jira/browse/KAFKA-2159
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.1
>Reporter: Rafał Boniecki
>
> My broker configuration:
> {quote}offsets.topic.num.partitions=20
> offsets.topic.segment.bytes=10485760
> offsets.topic.retention.minutes=10080{quote}
> Describe of __consumer_offsets topic:
> {quote}Topic:__consumer_offsets   PartitionCount:20   
> ReplicationFactor:3 Configs:segment.bytes=104857600,cleanup.policy=compact
>   Topic: __consumer_offsets   Partition: 0Leader: 112 
> Replicas: 112,212,312   Isr: 212,312,112
>   Topic: __consumer_offsets   Partition: 1Leader: 212 
> Replicas: 212,312,412   Isr: 212,312,412
>   Topic: __consumer_offsets   Partition: 2Leader: 312 
> Replicas: 312,412,512   Isr: 312,412,512
>   Topic: __consumer_offsets   Partition: 3Leader: 412 
> Replicas: 412,512,112   Isr: 412,512,112
>   Topic: __consumer_offsets   Partition: 4Leader: 512 
> Replicas: 512,112,212   Isr: 512,212,112
>   Topic: __consumer_offsets   Partition: 5Leader: 112 
> Replicas: 112,312,412   Isr: 312,412,112
>   Topic: __consumer_offsets   Partition: 6Leader: 212 
> Replicas: 212,412,512   Isr: 212,412,512
>   Topic: __consumer_offsets   Partition: 7Leader: 312 
> Replicas: 312,512,112   Isr: 312,512,112
>   Topic: __consumer_offsets   Partition: 8Leader: 412 
> Replicas: 412,112,212   Isr: 412,212,112
>   Topic: __consumer_offsets   Partition: 9Leader: 512 
> Replicas: 512,212,312   Isr: 512,212,312
>   Topic: __consumer_offsets   Partition: 10   Leader: 112 
> Replicas: 112,412,512   Isr: 412,512,112
>   Topic: __consumer_offsets   Partition: 11   Leader: 212 
> Replicas: 212,512,112   Isr: 212,512,112
>   Topic: __consumer_offsets   Partition: 12   Leader: 312 
> Replicas: 312,112,212   Isr: 312,212,112
>   Topic: __consumer_offsets   Partition: 13   Leader: 412 
> Replicas: 412,212,312   Isr: 412,212,312
>   Topic: __consumer_offsets   Partition: 14   Leader: 512 
> Replicas: 512,312,412   Isr: 512,312,412
>   Topic: __consumer_offsets   Partition: 15   Leader: 112 
> Replicas: 112,512,212   Isr: 512,212,112
>   Topic: __consumer_offsets   Partition: 16   Leader: 212 
> Replicas: 212,112,312   Isr: 212,312,112
>   Topic: __consumer_offsets   Partition: 17   Leader: 312 
> Replicas: 312,212,412   Isr: 312,212,412
>   Topic: __consumer_offsets   Partition: 18   Leader: 412 
> Replicas: 412,312,512   Isr: 412,312,512
>   Topic: __consumer_offsets   Partition: 19   Leader: 512 
> Replicas: 512,412,112   Isr: 512,412,112{quote}
> OffsetManager logs:
> {quote}2015-04-29 17:58:43:403 CEST DEBUG 
> [kafka-scheduler-3][kafka.server.OffsetManager] Compacting offsets cache.
> 2015-04-29 17:58:43:403 CEST DEBUG 
> [kafka-scheduler-3][kafka.server.OffsetManager] Found 1 stale offsets (older 
> than 8640 ms).
> 2015-04-29 17:58:43:404 CEST TRACE 
> [kafka-scheduler-3][kafka.server.OffsetManager] Removing stale offset and 
> metadata for [drafts,tasks,1]: OffsetAndMetadata[824,consumer_id = drafts, 
> time = 1430322433,0]
> 2015-04-29 17:58:43:404 CEST TRACE 
> [kafka-scheduler-3][kafka.server.OffsetManager] Marked 1 offsets in 
> [__consumer_offsets,2] for deletion.
> 2015-04-29 17:58:43:404 CEST DEBUG 
> [kafka-scheduler-3][kafka.server.OffsetManager] Removed 1 stale offsets in 1 
> milliseconds.{quote}
> Parameters are ignored and default values are used instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-2159) offsets.topic.segment.bytes and offsets.topic.retention.minutes are ignored

2015-04-30 Thread - (JIRA)
- created KAFKA-2159:


 Summary: offsets.topic.segment.bytes and 
offsets.topic.retention.minutes are ignored
 Key: KAFKA-2159
 URL: https://issues.apache.org/jira/browse/KAFKA-2159
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.2.1
Reporter: -


My broker configuration:

{quote}offsets.topic.num.partitions=20
offsets.topic.segment.bytes=10485760
offsets.topic.retention.minutes=10080{quote}

Describe of __consumer_offsets topic:

{quote}Topic:__consumer_offsets PartitionCount:20   ReplicationFactor:3 
Configs:segment.bytes=104857600,cleanup.policy=compact
Topic: __consumer_offsets   Partition: 0Leader: 112 
Replicas: 112,212,312   Isr: 212,312,112
Topic: __consumer_offsets   Partition: 1Leader: 212 
Replicas: 212,312,412   Isr: 212,312,412
Topic: __consumer_offsets   Partition: 2Leader: 312 
Replicas: 312,412,512   Isr: 312,412,512
Topic: __consumer_offsets   Partition: 3Leader: 412 
Replicas: 412,512,112   Isr: 412,512,112
Topic: __consumer_offsets   Partition: 4Leader: 512 
Replicas: 512,112,212   Isr: 512,212,112
Topic: __consumer_offsets   Partition: 5Leader: 112 
Replicas: 112,312,412   Isr: 312,412,112
Topic: __consumer_offsets   Partition: 6Leader: 212 
Replicas: 212,412,512   Isr: 212,412,512
Topic: __consumer_offsets   Partition: 7Leader: 312 
Replicas: 312,512,112   Isr: 312,512,112
Topic: __consumer_offsets   Partition: 8Leader: 412 
Replicas: 412,112,212   Isr: 412,212,112
Topic: __consumer_offsets   Partition: 9Leader: 512 
Replicas: 512,212,312   Isr: 512,212,312
Topic: __consumer_offsets   Partition: 10   Leader: 112 
Replicas: 112,412,512   Isr: 412,512,112
Topic: __consumer_offsets   Partition: 11   Leader: 212 
Replicas: 212,512,112   Isr: 212,512,112
Topic: __consumer_offsets   Partition: 12   Leader: 312 
Replicas: 312,112,212   Isr: 312,212,112
Topic: __consumer_offsets   Partition: 13   Leader: 412 
Replicas: 412,212,312   Isr: 412,212,312
Topic: __consumer_offsets   Partition: 14   Leader: 512 
Replicas: 512,312,412   Isr: 512,312,412
Topic: __consumer_offsets   Partition: 15   Leader: 112 
Replicas: 112,512,212   Isr: 512,212,112
Topic: __consumer_offsets   Partition: 16   Leader: 212 
Replicas: 212,112,312   Isr: 212,312,112
Topic: __consumer_offsets   Partition: 17   Leader: 312 
Replicas: 312,212,412   Isr: 312,212,412
Topic: __consumer_offsets   Partition: 18   Leader: 412 
Replicas: 412,312,512   Isr: 412,312,512
Topic: __consumer_offsets   Partition: 19   Leader: 512 
Replicas: 512,412,112   Isr: 512,412,112{quote}

OffsetManager logs:

{quote}2015-04-29 17:58:43:403 CEST DEBUG 
[kafka-scheduler-3][kafka.server.OffsetManager] Compacting offsets cache.
2015-04-29 17:58:43:403 CEST DEBUG 
[kafka-scheduler-3][kafka.server.OffsetManager] Found 1 stale offsets (older 
than 8640 ms).
2015-04-29 17:58:43:404 CEST TRACE 
[kafka-scheduler-3][kafka.server.OffsetManager] Removing stale offset and 
metadata for [drafts,tasks,1]: OffsetAndMetadata[824,consumer_id = drafts, time 
= 1430322433,0]
2015-04-29 17:58:43:404 CEST TRACE 
[kafka-scheduler-3][kafka.server.OffsetManager] Marked 1 offsets in 
[__consumer_offsets,2] for deletion.
2015-04-29 17:58:43:404 CEST DEBUG 
[kafka-scheduler-3][kafka.server.OffsetManager] Removed 1 stale offsets in 1 
milliseconds.{quote}

Parameters are ignored and default values are used instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-4 - Command line and centralized administrative operations (Thread 2)

2015-04-30 Thread Jun Rao
The following is a description of some of my concerns about allowing the same
topic multiple times in AlterTopicRequest.

ATR has an array of entries, each corresponding to a topic. We allow
multiple changes to a topic in a single entry. Those changes may fail to
apply independently (e.g., the config change may succeed, but the replica
assignment change may fail). If there is an issue applying one of the
changes, we will set an error code for that entry in the response.
If we allow the same topic to be specified multiple times in ATR, it can
happen that the first entry succeeds, but the second entry fails partially.
Now, from the admin's perspective, it's a bit hard to do the verification.
Ideally, you want to wait for the changes in the first entry to be applied.
However, the second entry may have part of the changes applied successfully.

Regarding putting restrictions on the requests: currently, we effectively
expect a topic-partition to be specified only once in a FetchRequest.
Allowing the same topic-partition to be specified multiple times in a
FetchRequest would be confusing and would complicate the implementation (e.g.,
putting the request in purgatory). A few other requests probably have
similar implicit assumptions on topic or topic-partition being unique in
each request.

Thanks,

Jun
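
A sketch of the per-entry validation this restriction implies, with invented names. One possible policy, shown here, flags every repeat of a topic after its first entry; flagging all entries of a duplicated topic would be an equally defensible choice:

{noformat}
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Invented names throughout; illustrates rejecting duplicate topics in a
// batch alter request with a per-entry error code instead of applying
// ambiguous duplicate mutations.
final class BatchValidation {
    enum EntryError { NONE, DUPLICATE_TOPIC }

    static Map<Integer, EntryError> validate(List<String> topicEntries) {
        Map<Integer, EntryError> errors = new HashMap<Integer, EntryError>();
        Set<String> seen = new HashSet<String>();
        for (int i = 0; i < topicEntries.size(); i++) {
            boolean firstOccurrence = seen.add(topicEntries.get(i));
            errors.put(i, firstOccurrence ? EntryError.NONE : EntryError.DUPLICATE_TOPIC);
        }
        return errors;
    }
}
{noformat}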


On Tue, Apr 28, 2015 at 5:26 PM, Andrii Biletskyi <
andrii.bilets...@stealth.ly> wrote:

> Guys,
>
> A quick summary of our today's meeting.
>
> There were no additional issues/questions. The only item about which
> we are not 100% sure is the "multiple instructions for one topic in one
> request" case.
> Jun proposed explaining again, here on the mailing list, the reasons for not
> allowing this, and, if we implement it in the final version, documenting it
> well so API clients understand what exactly is not allowed and why.
>
> In the meantime I will update the KIP. After that I will start the voting
> thread.
>
> Thanks,
> Andrii Biletskyi
>
> On Tue, Apr 28, 2015 at 10:33 PM, Andrii Biletskyi <
> andrii.bilets...@stealth.ly> wrote:
>
> > Guys,
> >
> > It seems that there are no open questions left so prior to our weekly
> call
> > let me summarize what I'm going to implement as part of phase one for
> > KIP-4.
> >
> > 1. Add 3 new Wire Protocol requests - Create-, Alter- and
> > DeleteTopicRequest
> >
> > 2. Topic requests are batch requests; errors are returned per topic as
> part
> > of batch response.
> >
> > 3. Topic requests are asynchronous - respective commands are only
> > started and the server is not blocked until the command is finished.
> >
> > 4. It will not be allowed to specify multiple mutations for the same
> topic
> > in scope of one batch request - a special error will be returned for such
> > topic.
> >
> > 5. There will be no dedicated request for reassign-partitions - it is
> > simulated
> > with AlterTopicRequest.ReplicaAssignment field.
> >
> > 6. Preferred-replica-leader-election is not supported since there is no
> > need to have
> > a public API to trigger such operation.
> >
> > 7. TopicMetadataRequest will be evolved to version 1 - topic-level
> > configuration
> > per topic will be included and ISR field will be removed. Automatic
> > topic-creation
> > logic will be removed (we will use CreateTopicRequest for that).
> >
> > Thanks,
> > Andrii Biletskyi
> >
> >
> > On Tue, Apr 28, 2015 at 12:23 AM, Jun Rao  wrote:
> >
> >> Yes, to verify if a partition reassignment completes or not, we just
> need
> >> to make sure AR == RAR. So, we don't need ISR for this. It's probably
> >> still
> >> useful to know ISR for monitoring in general though.
> >>
> >> Thanks,
> >>
> >> Jun
> >>
> >> On Mon, Apr 27, 2015 at 4:15 AM, Andrii Biletskyi <
> >> andrii.bilets...@stealth.ly> wrote:
> >>
> >> > Okay, I had some doubts about the reassign-partitions case. I was
> >> > not sure whether we need ISR to check the post-condition of partition
> >> > reassignment. But I think we can rely on assigned replicas - the
> >> workflow
> >> > in reassignPartitions is the following:
> >> > 1. Update AR in ZK with OAR + RAR.
> >> > ...
> >> > 10. Update AR in ZK with RAR.
> >> > 11. Update the /admin/reassign_partitions path in ZK to remove this
> >> > partition.
> >> > 12. After electing leader, the replicas and isr information changes.
> So
> >> > resend the update metadata request to every broker.
> >> >
> >> > In other words AR becomes RAR right before removing partitions from
> the
> >> > admin path. I think we can consider (with a little approximation)
> >> > reassignment
> >> > completed if AR == RAR.
> >> >
> >> > If it's okay, I will remove ISR and add topic config in one change as
> >> > discussed
> >> > earlier.
> >> >
> >> > Thanks,
> >> > Andrii Biletskyi
> >> >
> >> >
> >> > On Mon, Apr 27, 2015 at 1:50 AM, Jun Rao  wrote:
> >> >
> >> > > Andrii,
> >> > >
> >> > > Another thing. We decided not to add the lag info in TMR. To be
> >> > consistent,
> >> > > we probably also want to remove ISR from TMR since only th

[DISCUSS] Using GitHub Pull Requests for contributions and code review

2015-04-30 Thread Ismael Juma
Hi all,

Kafka currently uses a combination of Review Board and JIRA for
contributions and code review. In my opinion, this makes contribution and
code review a bit harder than it has to be.

I think the approach used by Spark would improve the current situation:

"Generally, Spark uses JIRA to track logical issues, including bugs and
improvements, and uses Github pull requests to manage the review and merge
of specific code changes. That is, JIRAs are used to describe what should
be fixed or changed, and high-level approaches, and pull requests describe
how to implement that change in the project's source code. For example,
major design decisions are discussed in JIRA."[1]

It's worth reading the wiki page for all the details, but I will summarise
the suggested workflow for code changes:

   1. Fork the Github repository at http://github.com/apache/kafka (if you
   haven't already)
   2. git checkout -b kafka-XXX
   3. Make one or more commits (smaller commits can be easier to review and
   reviewboard makes that hard)
   4. git push origin kafka-XXX
   5. Create PR against upstream/trunk (this will update JIRA
   automatically[2] and it will send an email to the dev mailing list too)
   6. A CI build will be triggered[3]
   7. Review process happens on GitHub (it's quite handy to be able to
   comment on both commit or PR-level, unlike Review Board)
   8. Once all feedback has been addressed and the build is green, a
   variant of the `merge_spark_pr.py`[4] script is used to squash, merge,
   push, close the PR and JIRA issue. The squashed commit generated by the
   script includes a bunch of useful information including links to the
   original commits[5] (in the future, I think it's worth reconsidering the
   squashing of commits, but retaining the information in the commit is
   already an improvement)

Neha merged a couple of commits via GitHub already and it went smoothly
although we are still missing a few of the pieces described above:

   1. CI builds triggered by GitHub PRs (this is supported by Apache Infra,
   we need to request it for Kafka and provide whatever configuration is
   needed)
   2. Adapting Spark's merge_park_pr script and integrating it into the
   kafka Git repository
   3. Updating the Kafka contribution wiki and adding a CONTRIBUTING.md to
   the Git repository (this is shown when someone is creating a pull request)
   4. Go through existing GitHub pull requests and close the ones that are
   no longer relevant (there are quite a few as people have been opening them
   over the years, but nothing was done about most of them)
   5. Other things I may be missing

I am volunteering to help with the above if people agree that this is the
right direction for Kafka. Thoughts?

Best.
Ismael

P.S. I was told in the Apache Infra HipChat that it's not currently
possible (and there are no plans to change that in the near future) to use
the GitHub merge button to merge PRs. The merge script does quite a few
useful things that the merge button does not in any case.

[1] https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
[2]
https://issues.apache.org/jira/browse/KAFKA-1054?focusedCommentId=14513614&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14513614
[3] https://blogs.apache.org/infra/entry/github_pull_request_builds_now
[4] https://github.com/apache/spark/blob/master/dev/merge_spark_pr.py
[5]
https://github.com/apache/spark/commit/59b7cfc41b2c06fbfbf6aca16c1619496a8d1d00


Re: [VOTE] KIP-11- Authorization design for kafka security

2015-04-30 Thread Jun Rao
Joe,

Could you elaborate on why we should not store JSON in ZK? So far, all
existing ZK data are in JSON.

Thanks,

Jun
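
For concreteness, both options come down to what goes into a znode's byte array: a human-readable JSON document or an opaque serialized blob. The paths and payloads below are invented:

{noformat}
import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// ZooKeeper only ever stores byte arrays; the choice is about what the
// bytes contain.
class ZkStorageSketch {
    // Option 1 (status quo): JSON, readable with zkCli and standard tooling.
    static void writeJson(ZooKeeper zk, String path) throws Exception {
        byte[] json = "{\"version\":1,\"acls\":[]}".getBytes(StandardCharsets.UTF_8);
        zk.create(path, json, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    }

    // Option 2 (the suggestion in the quoted thread): opaque serialized bytes,
    // format-agnostic but not inspectable without custom tools.
    static void writeOpaqueBytes(ZooKeeper zk, String path, byte[] serialized) throws Exception {
        zk.create(path, serialized, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    }
}
{noformat}

Either way ZooKeeper stores bytes; the practical difference is whether zkCli and other standard tooling can read them back.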

On Thu, Apr 30, 2015 at 2:06 AM, Joe Stein  wrote:

> Hi, sorry I am coming in late to chime back in on this thread and haven't
> been able to make the KIP hangouts the last few weeks. Sorry if any of this
> was brought up already or I missed it.
>
> I read through the KIP and the thread(s) and a couple of things jumped out.
>
>
>- Can we break out the open issues in JIRA (maybe during the hangout)
>that are in the KIP and resolve/flesh those out more?
>
>
>
>- I don't see any updates with the systems test or how we can know the
>code works.
>
>
>
>- We need some implementation/example/sample that we know can work in
>all different existing entitlement servers and not just ones that run in
>types of data centers too. I am not saying we should support everything,
> but
>if someone had to implement
>https://docs.oracle.com/cd/E19225-01/820-6551/bzafm/index.html with
>Kafka, it has to work for them out of the box.
>
>
>
>- We should shy away from storing JSON in ZooKeeper. Let's store bytes in
>Storage.
>
>
>
>- We should spend some time thinking through exceptions in the wire
>protocol maybe as part of this so it can keep moving forward.
>
>
> ~ Joe Stein
>
> On Tue, Apr 28, 2015 at 3:33 AM, Sun, Dapeng  wrote:
>
> > Thank you for your reply, Gwen.
> >
> > >1. Complex rule systems can be difficult to reason about and therefore
> > >end up being less secure. The rule "Deny always wins" is very easy to grasp.
> > Yes, I agree with your point: we should not make the rule complex.
> >
> > >2. We currently don't have any mechanism for specifying IP ranges (or host
> > >ranges) at all. I think it's a pretty significant deficiency, but it does
> > >mean that we don't need to worry about the issue of blocking a large range
> > >while unblocking a few servers in the range.
> > Supporting ranges sounds reasonable. If this feature is on the development
> > plan, I also don't think we can put "the best matching acl" and "support
> > IP ranges" together.
> >
> > >We have a call tomorrow (Tuesday, April 28) at 3pm PST - to discuss this
> > and other outstanding design issues (not all related to security). If you
> > are interested in joining - let me know and I'll forward you the invite.
> > Thank you, Gwen. I have the invite and I should be at home at that time.
> > But due to network issues, I may not be able to join the meeting smoothly.
> >
> > Regards
> > Dapeng
> >
> > -Original Message-
> > From: Gwen Shapira [mailto:gshap...@cloudera.com]
> > Sent: Tuesday, April 28, 2015 1:31 PM
> > To: dev@kafka.apache.org
> > Subject: Re: [VOTE] KIP-11- Authorization design for kafka security
> >
> > While I see the advantage of being able to say something like "deny user
> > X from hosts h1...h200" and also "allow user X from host h189", there are
> > two issues here:
> >
> > 1. Complex rule systems can be difficult to reason about and therefore end
> > up being less secure. The rule "Deny always wins" is very easy to grasp
> > (see the sketch after this message).
> >
> > 2. We currently don't have any mechanism for specifying IP ranges (or host
> > ranges) at all. I think it's a pretty significant deficiency, but it does
> > mean that we don't need to worry about the issue of blocking a large range
> > while unblocking a few servers in the range.
> >
> > Gwen
> >
> > P.S
> > We have a call tomorrow (Tuesday, April 28) at 3pm PST - to discuss this
> > and other outstanding design issues (not all related to security). If you
> > are interested in joining - let me know and I'll forward you the invite.
> >
> > Gwen
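
(A minimal sketch of the "deny always wins" evaluation discussed above; the
Acl shape is a hypothetical stand-in, not the KIP-11 interface:)

sealed trait PermissionType
case object Allow extends PermissionType
case object Deny extends PermissionType

// Hypothetical acl shape, for illustration only.
case class Acl(principal: String, host: String, permission: PermissionType)

// Deny always wins: a request is authorized only if at least one matching
// acl allows it and no matching acl denies it.
def authorize(matchingAcls: Seq[Acl]): Boolean =
  matchingAcls.exists(_.permission == Allow) &&
    !matchingAcls.exists(_.permission == Deny)

(Under this rule, the case raised later in the thread, one acl denying an
operation from all hosts plus one allowing it from host1, comes out denied:
the deny matches host1 too, so it wins.)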
> >
> > On Mon, Apr 27, 2015 at 10:15 PM, Sun, Dapeng 
> > wrote:
> >
> > > Attach the image.
> > >
> > > https://raw.githubusercontent.com/sundapeng/attachment/master/kafka-acl1.png
> > >
> > > Regards
> > > Dapeng
> > >
> > > From: Sun, Dapeng [mailto:dapeng@intel.com]
> > > Sent: Tuesday, April 28, 2015 11:44 AM
> > > To: dev@kafka.apache.org
> > > Subject: RE: [VOTE] KIP-11- Authorization design for kafka security
> > >
> > >
> > > Thank you for your rapid reply, Parth.
> > >
> > >
> > >
> > > >* I think the wiki already describes the precedence order as Deny taking
> > > >precedence over allow when conflicting acls are found:
> > > >https://cwiki.apache.org/confluence/display/KAFKA/KIP-11+-+Authorization+Interface#KIP-11-AuthorizationInterface-PermissionType
> > >
> > > Got it, thank you.
> > >
> > >
> > >
> > > >* In the first version that I am currently writing there is no group
> > > >support. Even when we add it I don't see the need to add a precedence
> > > >for evaluation. It does not matter which principal matches as long as
> > > >we have a match.
> > >
> > >
> > >
> > > About this part, I think we should choose the best matching acl for
> > > authorization, no matter whether we support groups or not.
> > >
> > >
> > >
> > > For the case
> > 

[jira] [Updated] (KAFKA-2158) Close all fetchers in AbstractFetcherManager without blocking

2015-04-30 Thread Jiasheng Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiasheng Wang updated KAFKA-2158:
-
Affects Version/s: 0.8.2.0
   Status: Patch Available  (was: Open)

> Close all fetchers in AbstractFetcherManager without blocking
> -
>
> Key: KAFKA-2158
> URL: https://issues.apache.org/jira/browse/KAFKA-2158
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.8.2.0
>Reporter: Jiasheng Wang
> Attachments: KAFKA-2158.patch
>
>
> def closeAllFetchers() {
>   mapLock synchronized {
>     for ((_, fetcher) <- fetcherThreadMap) {
>       fetcher.shutdown()
>     }
>     fetcherThreadMap.clear()
>   }
> }
> closeAllFetchers() in AbstractFetcherManager.scala is time consuming
> because each fetcher's shutdown() call blocks until awaitShutdown()
> returns. As a result it slows down the restart of the Kafka service.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
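
(A sketch of one way to remove the serial blocking described above,
assuming, as kafka.utils.ShutdownableThread allows, that shutdown() can be
split into initiateShutdown() and awaitShutdown(). This illustrates the
idea and is not necessarily identical to the attached patch:)

def closeAllFetchers() {
  mapLock synchronized {
    // First pass: signal every fetcher to stop, without waiting.
    for ((_, fetcher) <- fetcherThreadMap)
      fetcher.initiateShutdown()
    // Second pass: wait for each one; the shutdowns now overlap instead
    // of completing strictly one after another.
    for ((_, fetcher) <- fetcherThreadMap)
      fetcher.awaitShutdown()
    fetcherThreadMap.clear()
  }
}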


[jira] [Updated] (KAFKA-2158) Close all fetchers in AbstractFetcherManager without blocking

2015-04-30 Thread Jiasheng Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiasheng Wang updated KAFKA-2158:
-
Attachment: KAFKA-2158.patch

> Close all fetchers in AbstractFetcherManager without blocking
> -
>
> Key: KAFKA-2158
> URL: https://issues.apache.org/jira/browse/KAFKA-2158
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.8.2.0
>Reporter: Jiasheng Wang
> Attachments: KAFKA-2158.patch
>
>
> def closeAllFetchers() {
>   mapLock synchronized {
>     for ((_, fetcher) <- fetcherThreadMap) {
>       fetcher.shutdown()
>     }
>     fetcherThreadMap.clear()
>   }
> }
> closeAllFetchers() in AbstractFetcherManager.scala is time consuming
> because each fetcher's shutdown() call blocks until awaitShutdown()
> returns. As a result it slows down the restart of the Kafka service.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2158) Close all fetchers in AbstractFetcherManager without blocking

2015-04-30 Thread Jiasheng Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiasheng Wang updated KAFKA-2158:
-
Status: Open  (was: Patch Available)

> Close all fetchers in AbstractFetcherManager without blocking
> -
>
> Key: KAFKA-2158
> URL: https://issues.apache.org/jira/browse/KAFKA-2158
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Reporter: Jiasheng Wang
>
> def closeAllFetchers() {
>   mapLock synchronized {
>     for ((_, fetcher) <- fetcherThreadMap) {
>       fetcher.shutdown()
>     }
>     fetcherThreadMap.clear()
>   }
> }
> closeAllFetchers() in AbstractFetcherManager.scala is time consuming
> because each fetcher's shutdown() call blocks until awaitShutdown()
> returns. As a result it slows down the restart of the Kafka service.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-2158) Close all fetchers in AbstractFetcherManager without blocking

2015-04-30 Thread Jiasheng Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiasheng Wang updated KAFKA-2158:
-
Status: Patch Available  (was: Open)

> Close all fetchers in AbstractFetcherManager without blocking
> -
>
> Key: KAFKA-2158
> URL: https://issues.apache.org/jira/browse/KAFKA-2158
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Reporter: Jiasheng Wang
>
> def closeAllFetchers() {
>   mapLock synchronized {
>     for ((_, fetcher) <- fetcherThreadMap) {
>       fetcher.shutdown()
>     }
>     fetcherThreadMap.clear()
>   }
> }
> closeAllFetchers() in AbstractFetcherManager.scala is time consuming
> because each fetcher's shutdown() call blocks until awaitShutdown()
> returns. As a result it slows down the restart of the Kafka service.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Review Request 33719: Patch for KAFKA-2158

2015-04-30 Thread Jiasheng Wang

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33719/
---

Review request for kafka.


Bugs: KAFKA-2158
https://issues.apache.org/jira/browse/KAFKA-2158


Repository: kafka


Description
---

improve closeAllFetchers


Diffs
-

  core/src/main/scala/kafka/server/AbstractFetcherManager.scala 
20c00cb8cc2351950edbc8cb1752905a0c26e79f 
  core/src/main/scala/kafka/server/AbstractFetcherThread.scala 
8c281d4668f92eff95a4a5df3c03c4b5b20e7095 

Diff: https://reviews.apache.org/r/33719/diff/


Testing
---


Thanks,

Jiasheng Wang



[jira] [Commented] (KAFKA-2156) Possibility to plug in custom MetricRegistry

2015-04-30 Thread Andras Sereny (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14521229#comment-14521229
 ] 

Andras Sereny commented on KAFKA-2156:
--

What is KM? 

> Possibility to plug in custom MetricRegistry
> 
>
> Key: KAFKA-2156
> URL: https://issues.apache.org/jira/browse/KAFKA-2156
> Project: Kafka
>  Issue Type: Improvement
>  Components: producer 
>Affects Versions: 0.8.1.2
>Reporter: Andras Sereny
>Assignee: Jun Rao
>
> The trait KafkaMetricsGroup refers to Metrics.defaultRegistry() throughout. 
> It would be nice to be able to inject any MetricsRegistry instead of the 
> default one. 
> (My use case is that I'd like to channel Kafka metrics into our application's 
> metrics system, for which I'd need custom implementations of 
> com.yammer.metrics.core.Metric.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
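
(A hedged sketch of what Andras is asking for: resolve the registry through
an overridable member instead of calling Metrics.defaultRegistry() directly.
The trait below is illustrative; the real KafkaMetricsGroup surface is
larger:)

import com.yammer.metrics.Metrics
import com.yammer.metrics.core.{Gauge, MetricName, MetricsRegistry}

trait KafkaMetricsGroup {
  // Embedding applications could override this to channel Kafka metrics
  // into their own MetricsRegistry instead of the yammer default.
  protected def metricsRegistry: MetricsRegistry = Metrics.defaultRegistry()

  def newGauge[T](name: MetricName, gauge: Gauge[T]): Gauge[T] =
    metricsRegistry.newGauge(name, gauge)
}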


[jira] [Created] (KAFKA-2158) Close all fetchers in AbstractFetcherManager without blocking

2015-04-30 Thread Jiasheng Wang (JIRA)
Jiasheng Wang created KAFKA-2158:


 Summary: Close all fetchers in AbstractFetcherManager without 
blocking
 Key: KAFKA-2158
 URL: https://issues.apache.org/jira/browse/KAFKA-2158
 Project: Kafka
  Issue Type: Improvement
  Components: core
Reporter: Jiasheng Wang


def closeAllFetchers() {
  mapLock synchronized {
    for ((_, fetcher) <- fetcherThreadMap) {
      fetcher.shutdown()
    }
    fetcherThreadMap.clear()
  }
}
closeAllFetchers() in AbstractFetcherManager.scala is time consuming
because each fetcher's shutdown() call blocks until awaitShutdown()
returns. As a result it slows down the restart of the Kafka service.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] KIP-11- Authorization design for kafka security

2015-04-30 Thread Joe Stein
Hi, sorry I am coming in late to chime back in on this thread and haven't
been able to make the KIP hangouts the last few weeks. Sorry if any of this
was brought up already or I missed it.

I read through the KIP and the thread(s) and a couple of things jumped out.


   - Can we break out the open issues that are in the KIP into JIRA (maybe
   during the hangout) and resolve/flesh those out more?



   - I don't see any updates on the system tests or how we can know the
   code works.



   - We need some implementation/example/sample that we know can work with
   all the different existing entitlement servers, not just ones that run in
   particular types of data centers. I am not saying we should support
   everything, but if someone had to implement
   https://docs.oracle.com/cd/E19225-01/820-6551/bzafm/index.html with
   Kafka it has to work for them out of the box.



   - We should shy away from storing JSON in ZooKeeper. Let's store bytes in
   storage.



   - We should spend some time thinking through exceptions in the wire
   protocol, maybe as part of this, so it can keep moving forward.


~ Joe Stein

On Tue, Apr 28, 2015 at 3:33 AM, Sun, Dapeng  wrote:

> Thank you for your reply, Gwen.
>
> >1. Complex rule systems can be difficult to reason about and therefore
> >end up being less secure. The rule "Deny always wins" is very easy to grasp.
> Yes, I agree with your point: we should not make the rule complex.
>
> >2. We currently don't have any mechanism for specifying IP ranges (or host
> >ranges) at all. I think it's a pretty significant deficiency, but it does
> >mean that we don't need to worry about the issue of blocking a large range
> >while unblocking a few servers in the range.
> Supporting ranges sounds reasonable. If this feature is on the development
> plan, I also don't think we can put "the best matching acl" and "support
> IP ranges" together.
>
> >We have a call tomorrow (Tuesday, April 28) at 3pm PST - to discuss this
> and other outstanding design issues (not all related to security). If you
> are interested in joining - let me know and I'll forward you the invite.
> Thank you, Gwen. I have the invite and I should be at home at that time.
> But due to network issues, I may not be able to join the meeting smoothly.
>
> Regards
> Dapeng
>
> -Original Message-
> From: Gwen Shapira [mailto:gshap...@cloudera.com]
> Sent: Tuesday, April 28, 2015 1:31 PM
> To: dev@kafka.apache.org
> Subject: Re: [VOTE] KIP-11- Authorization design for kafka security
>
> While I see the advantage of being able to say something like "deny user
> X from hosts h1...h200" and also "allow user X from host h189", there are
> two issues here:
>
> 1. Complex rule systems can be difficult to reason about and therefore end
> up being less secure. The rule "Deny always wins" is very easy to grasp.
>
> 2. We currently don't have any mechanism for specifying IP ranges (or host
> ranges) at all. I think it's a pretty significant deficiency, but it does
> mean that we don't need to worry about the issue of blocking a large range
> while unblocking a few servers in the range.
>
> Gwen
>
> P.S
> We have a call tomorrow (Tuesday, April 28) at 3pm PST - to discuss this
> and other outstanding design issues (not all related to security). If you
> are interested in joining - let me know and I'll forward you the invite.
>
> Gwen
>
> On Mon, Apr 27, 2015 at 10:15 PM, Sun, Dapeng 
> wrote:
>
> > Attach the image.
> >
> > https://raw.githubusercontent.com/sundapeng/attachment/master/kafka-acl1.png
> >
> > Regards
> > Dapeng
> >
> > From: Sun, Dapeng [mailto:dapeng@intel.com]
> > Sent: Tuesday, April 28, 2015 11:44 AM
> > To: dev@kafka.apache.org
> > Subject: RE: [VOTE] KIP-11- Authorization design for kafka security
> >
> >
> > Thank you for your rapid reply, Parth.
> >
> >
> >
> > >* I think the wiki already describes the precedence order as Deny taking
> > >precedence over allow when conflicting acls are found:
> > >https://cwiki.apache.org/confluence/display/KAFKA/KIP-11+-+Authorization+Interface#KIP-11-AuthorizationInterface-PermissionType
> >
> > Got it, thank you.
> >
> >
> >
> > >* In the first version that I am currently writing there is no group
> > >support. Even when we add it I don't see the need to add a precedence
> > >for evaluation. It does not matter which principal matches as long as
> > >we have a match.
> >
> >
> >
> > About this part, I think we should choose the best matching acl for
> > authorization, no matter whether we support groups or not.
> >
> >
> >
> > For the case
> >
> >
> > https://raw.githubusercontent.com/sundapeng/attachment/master/kafka-acl1.png
> >
> >
> >
> > if 2 Acls are defined, one that denies an operation from all hosts and
> > one that allows the operation from host1, will the operation from host1
> > be denied or allowed?
> >
> > According to the wiki, "Deny will take precedence over Allow in competing
> > acls.", it seems acl_1 will win the competit