[DISCUSS] KIP-39 Pinning controller to a broker

2015-10-20 Thread Abhishek Nigam
Hi,
Can we please discuss this KIP? It allows us to pin the controller to a
broker. This is useful in a couple of scenarios:
a) If we want to do a rolling bounce, we can reduce the number of controller
moves to 1.
b) We can pick a designated broker, reduce the number of partitions on it
through admin partition reassignment, and designate it as the controller.
c) We can dynamically move the controller if we see any problems on the
broker on which it is running.

Here is the wiki page
https://cwiki.apache.org/confluence/display/KAFKA/KIP-39+Pinning+controller+to+broker

-Abhishek


Re: [DISCUSS] KIP-39 Pinning controller to a broker

2015-10-20 Thread Abhishek Nigam
Hi Jay/Neha,
I just subscribed to the mailing list, so I read your responses but did not
receive your emails. I am adding the context to this email thread.

"

Agree with Jay on staying away from pinning roles to brokers. This is
actually harder to operate and monitor.

Regarding the problems you mentioned-
1. Reducing the controller moves during rolling bounce is useful but really
something that should be handled by the tooling. The root cause is that
currently the controller move is expensive. I think we'd be better off
investing time and effort in thinning out the controller. Just moving to
the batch write APIs in ZooKeeper will make a huge difference.
2. I'm not sure I understood the motivation behind moving partitions out of
the controller broker. That seems like a proposal for a solution, but can
you describe the problems you saw that affected controller functionality?

Regarding the location of the controller, it seems there are 2 things you
are suggesting:
1. Optimizing the strategy of picking a broker as the controller (e.g.
least loaded node)
2. Moving the controller if a broker soft fails.

I don't think #1 is worth the effort involved. The better way of addressing
it is to make the controller thinner and faster. #2 is interesting, since
the problem is that while a broker is failing, all state changes fail or are
queued up, which impacts the whole cluster. There are 2 alternatives:
have a tool that allows you to move the controller, or just kill the broker
so the controller moves. I prefer the latter, since it is simple and also
because a misbehaving broker is better off shut down anyway.

Having said that, it will be helpful to know details of the problems you
saw while operating the controller. I think understanding those will help
guide the solution better.

On Tue, Oct 20, 2015 at 12:49 PM, Jay Kreps <j...@confluent.io> wrote:

> This seems like a step backwards--we really don't want people to manually
> manage the location of the controller and try to manually balance
> partitions off that broker.
>
> I think it might make sense to consider directly fixing the things you
> actually want to fix:
> 1. Too many controller moves--we could either just make this cheaper or
> make the controller location more deterministic, e.g. having the election
> prefer the node with the smallest node id so there are fewer failovers in
> rolling bounces.
> 2. You seem to think having the controller on a normal node is a problem.
> Can you elaborate on the negative consequences you've observed? Let's
> focus on fixing those.
>
> In general we've worked very hard to avoid having a bunch of dedicated
> roles for different nodes and I would be very very loath to see us move
> away from that philosophy. I have a fair amount of experience with both
> homogeneous systems that have a single role and also systems with many
> differentiated roles and I really think that the differentiated approach
> causes more problems than it solves for most deployments due to the added
> complexity.
>
> I think we could also fix up this KIP a bit. For example it says there are
> no public interfaces involved but surely there are new admin commands to
> control the location? There are also some minor things like listing it as
> released in 0.8.3 that seem wrong.
>
> -Jay
>
> On Tue, Oct 20, 2015 at 12:18 PM, Abhishek Nigam <
> ani...@linkedin.com.invalid> wrote:
>
> > Hi,
> > Can we please discuss this KIP. The background for this is that it allows
> > us to pin controller to a broker. This is useful in a couple of scenarios:
> > a) If we want to do a rolling bounce we can reduce the number of controller
> > moves down to 1.
> > b) Again pick a designated broker and reduce the number of partitions on it
> > through admin reassign partitions and designate it as a controller.
> > c) Dynamically move controller if we see any problems on the broker which
> > it is running.
> >
> > Here is the wiki page
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-39+Pinning+controller+to+broker
> >
> > -Abhishek
> >
>

"

Based on the feedback, I think we can limit the discussion to the rolling
upgrade scenario and how best to address it. The only scenario I have heard
of where we wanted to move the controller off a broker was a bug, since
fixed, that resulted in multiple controllers.

I will update the KIP to describe how we can optimize the placement of the
controller (pinning it to a preferred broker id, potentially enabled via
config), if that sounds reasonable.
Many of the ideas of the original KIP can still apply in the limited scope.
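To make the rolling-bounce argument concrete, here is a toy Java simulation (not Kafka code) that counts controller moves. It assumes, as with ZooKeeper-based election, that the controller only moves when the current controller shuts down. It uses the smallest-live-id preference Jay suggested as the election policy. `RollingBounceSim` and its methods are hypothetical names, and the model assumes at least two brokers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.function.Function;

/** Toy model of controller failover during a rolling bounce (illustrative only). */
class RollingBounceSim {

    /** Election policy suggested on the thread: prefer the live broker with the smallest id. */
    static int electSmallestId(SortedSet<Integer> live) {
        return live.first();
    }

    /** Counts controller moves while each broker in bounceOrder is stopped and restarted in turn. */
    static int countControllerMoves(List<Integer> brokers,
                                    List<Integer> bounceOrder,
                                    Function<SortedSet<Integer>, Integer> elect) {
        SortedSet<Integer> live = new TreeSet<>(brokers);
        int controller = elect.apply(live);
        int moves = 0;
        for (int broker : bounceOrder) {
            live.remove(broker);              // broker shuts down
            if (broker == controller) {       // only losing the controller triggers an election
                controller = elect.apply(live);
                moves++;
            }
            live.add(broker);                 // broker restarts; the controller does not move back
        }
        return moves;
    }

    /** Builds a bounce order that leaves the current controller for last. */
    static List<Integer> controllerLast(List<Integer> brokers, int controller) {
        List<Integer> order = new ArrayList<>();
        for (int b : brokers) {
            if (b != controller) {
                order.add(b);
            }
        }
        order.add(controller);
        return order;
    }
}
```

With brokers 1 to 5, bouncing in ascending id order costs two moves under this policy (1 to 2, then back to 1). Bouncing the current controller last costs exactly one. That second result holds for any election policy, since no non-controller bounce triggers an election.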

-Abhishek


[jira] [Commented] (KAFKA-1599) Change preferred replica election admin command to handle large clusters

2015-09-21 Thread Abhishek Nigam (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901279#comment-14901279
 ] 

Abhishek Nigam commented on KAFKA-1599:
---

Copying this content verbatim from a newly created ticket (KAFKA-2552), which is a 
duplicate of this one and details approach 4). I think chaining is unavoidable, 
because even with a more compact representation we might still run into this 
issue with a larger JSON.

"Essentially, a generic approach, which would require changes on both the read 
and write sides, would be as follows:
We designate a ZooKeeper path as scratch space, e.g. /admin/scratch.
Write side
When writing JSON to ZooKeeper, we chunk it into 1 MB units and store every 
chunk except the first in a separate ZooKeeper node under the scratch path.
The first chunk lives in the original location, as it does today, e.g. 
/admin/reassign_partitions.
Each chunk has the following format:
a "JSON-incompatible" header, i.e. something other than "{"
the length of the ZooKeeper path of the next JSON chunk (0 means that this is 
the last chunk)
the ZooKeeper path of the next JSON chunk
the length of this chunk's JSON data blob
this chunk's JSON data blob
We write this conceptual linked list back to front.
Read side
The ZooKeeper watch fires as before. While reading, if we detect that there 
are more chunks, we do synchronous reads from ZooKeeper."
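Here is a minimal sketch of the single-chunk layout described above. It assumes a 1-byte marker, 4-byte big-endian length fields and UTF-8 paths, because the proposal does not fix these widths. `JsonChunk` and the marker value are hypothetical. The non-`{` marker is what lets a reader tell a chunked znode apart from legacy plain-JSON contents.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** One chunk in the proposed chained layout; field widths are assumptions. */
final class JsonChunk {
    static final byte MARKER = 0x01;   // hypothetical marker; anything other than '{' works

    final String nextPath;             // "" means this is the last chunk
    final byte[] data;                 // this chunk's slice of the JSON blob

    JsonChunk(String nextPath, byte[] data) {
        this.nextPath = nextPath;
        this.data = data;
    }

    /** Serializes as: marker, path length, path, data length, data. */
    byte[] encode() {
        byte[] path = nextPath.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(1 + 4 + path.length + 4 + data.length);
        buf.put(MARKER);
        buf.putInt(path.length);       // 0 marks the last chunk
        buf.put(path);
        buf.putInt(data.length);
        buf.put(data);
        return buf.array();
    }

    /** Legacy znode contents start with '{' (plain JSON); chunked contents do not. */
    static boolean isChunked(byte[] znodeData) {
        return znodeData.length > 0 && znodeData[0] != '{';
    }

    static JsonChunk decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        if (buf.get() == '{') {
            throw new IllegalArgumentException("plain JSON, not a chunk");
        }
        byte[] path = new byte[buf.getInt()];
        buf.get(path);
        byte[] data = new byte[buf.getInt()];
        buf.get(data);
        return new JsonChunk(new String(path, StandardCharsets.UTF_8), data);
    }
}
```

Header bytes count against the ZooKeeper size limit too. In practice, each chunk's payload would therefore have to be somewhat smaller than 1 MB.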



> Change preferred replica election admin command to handle large clusters
> 
>
> Key: KAFKA-1599
> URL: https://issues.apache.org/jira/browse/KAFKA-1599
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.8.2.0
>    Reporter: Todd Palino
>Assignee: Abhishek Nigam
>  Labels: newbie++
>
> We ran into a problem with a cluster that has 70k partitions where we could 
> not trigger a preferred replica election for all topics and partitions using 
> the admin tool. Upon investigation, it was determined that this was because 
> the JSON object that was being written to the admin znode to tell the 
> controller to start the election was 1.8 MB in size. As the default Zookeeper 
> data size limit is 1MB, and it is non-trivial to change, we should come up 
> with a better way to represent the list of topics and partitions for this 
> admin command.
> I have several thoughts on this so far:
> 1) Trigger the command for all topics and partitions with a JSON object that 
> does not include an explicit list of them (i.e. a flag that says "all 
> partitions")
> 2) Use a more compact JSON representation. Currently, the JSON contains a 
> 'partitions' key which holds a list of dictionaries that each have a 'topic' 
> and 'partition' key, and there must be one list item for each partition. This 
> results in a lot of unnecessary repetition of key names. Changing this 
> to a format like this would be much more compact:
> {"topics": {"topicName1": [0, 1, 2, 3], "topicName2": [0, 1]}, "version": 1}
> 3) Use a representation other than JSON. Strings are inefficient. A binary 
> format would be the most compact. This does put a greater burden on tools and 
> scripts that do not use the inbuilt libraries, but it is not too high.
> 4) Use a representation that involves multiple znodes. A structured tree in 
> the admin command would probably provide the most complete solution. However, 
> we would need to make sure to not exceed the data size limit with a wide tree 
> (the list of children for any single znode cannot exceed the ZK data size of 
> 1MB)
> Obviously, there could be a combination of #1 with a change in the 
> representation, which would likely be appropriate as well.
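A small sketch can check the size difference between the current layout and option 2). It builds both strings by hand, without a JSON library, and assumes topic names need no escaping. `ElectionJson` is a hypothetical helper. The verbose layout follows the description above rather than any particular Kafka class.

```java
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

final class ElectionJson {
    /** Current layout: one {"topic":..,"partition":..} object per partition. */
    static String verbose(Map<String, List<Integer>> partitions) {
        StringJoiner items = new StringJoiner(",", "[", "]");
        for (Map.Entry<String, List<Integer>> e : partitions.entrySet()) {
            for (int p : e.getValue()) {
                items.add("{\"topic\":\"" + e.getKey() + "\",\"partition\":" + p + "}");
            }
        }
        return "{\"version\":1,\"partitions\":" + items + "}";
    }

    /** Compact layout from option 2): topic name -> list of partition ids. */
    static String compact(Map<String, List<Integer>> partitions) {
        StringJoiner topics = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, List<Integer>> e : partitions.entrySet()) {
            StringJoiner ids = new StringJoiner(",", "[", "]");
            for (int p : e.getValue()) {
                ids.add(Integer.toString(p));
            }
            topics.add("\"" + e.getKey() + "\":" + ids);
        }
        return "{\"topics\":" + topics + ",\"version\":1}";
    }
}
```

Each topic name is written once instead of once per partition, so the saving grows with the number of partitions per topic. The compact form shrinks the payload but does not remove the size limit itself.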



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-1599) Change preferred replica election admin command to handle large clusters

2015-09-21 Thread Abhishek Nigam (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Nigam reassigned KAFKA-1599:
-

Assignee: Abhishek Nigam



[jira] [Commented] (KAFKA-2552) Certain admin commands such as partition assignment fail on large clusters

2015-09-17 Thread Abhishek Nigam (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804560#comment-14804560
 ] 

Abhishek Nigam commented on KAFKA-2552:
---

Essentially, a generic approach, which would require changes on both the read 
and write sides, would be as follows:
We designate a ZooKeeper path as scratch space, e.g. /admin/scratch.

Write side
When writing JSON to ZooKeeper, we chunk it into 1 MB units and store every 
chunk except the first in a separate ZooKeeper node under the scratch path.
The first chunk lives in the original location, as it does today, e.g. 
/admin/reassign_partitions.

Each chunk has the following format:
a "JSON-incompatible" header, i.e. something other than "{"
the length of the ZooKeeper path of the next JSON chunk (0 means that this is 
the last chunk)
the ZooKeeper path of the next JSON chunk
the length of this chunk's JSON data blob
this chunk's JSON data blob

We write this conceptual linked list back to front.

Read side
The ZooKeeper watch fires as before. While reading, if we detect that there 
are more chunks, we do synchronous reads from ZooKeeper.
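The ordering argument can be sketched with a `HashMap` standing in for ZooKeeper. Writing back to front presumably ensures that, by the time the watched head znode changes, every chunk it points to already exists. The read side can then follow the links with plain synchronous reads. The sketch keeps chunks as objects rather than serializing them with the header. `ChainedZnodes` and the `chunk-N` naming are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Write and read sides of the chained layout over an in-memory stand-in for ZooKeeper. */
final class ChainedZnodes {
    static final class Chunk {
        final String nextPath;   // null for the last chunk
        final String data;

        Chunk(String nextPath, String data) {
            this.nextPath = nextPath;
            this.data = data;
        }
    }

    final Map<String, Chunk> store = new HashMap<>();   // stand-in for ZooKeeper
    final List<String> writeOrder = new ArrayList<>();  // records the order of writes

    /** Writes the chain back to front, so the watched head znode is written last. */
    void write(String headPath, String scratchPrefix, String json, int maxChunkChars) {
        int count = Math.max(1, (json.length() + maxChunkChars - 1) / maxChunkChars);
        String next = null;
        for (int i = count - 1; i >= 0; i--) {
            int end = Math.min(json.length(), (i + 1) * maxChunkChars);
            String slice = json.substring(i * maxChunkChars, end);
            String path = (i == 0) ? headPath : scratchPrefix + "/chunk-" + i;  // hypothetical naming
            store.put(path, new Chunk(next, slice));
            writeOrder.add(path);
            next = path;
        }
    }

    /** Called when the head watch fires: follows nextPath links with synchronous reads. */
    String read(String headPath) {
        StringBuilder json = new StringBuilder();
        for (String path = headPath; path != null; ) {
            Chunk c = store.get(path);
            json.append(c.data);
            path = c.nextPath;
        }
        return json.toString();
    }
}
```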


> Certain admin commands such as partition assignment fail on large clusters
> --
>
> Key: KAFKA-2552
> URL: https://issues.apache.org/jira/browse/KAFKA-2552
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Abhishek Nigam
>Assignee: Abhishek Nigam
>
> This happens because the json generated is greater than 1 MB and exceeds the 
> default data limit of zookeeper nodes.





[jira] [Commented] (KAFKA-1387) Kafka getting stuck creating ephemeral node it has already created when two zookeeper sessions are established in a very short period of time

2015-08-14 Thread Abhishek Nigam (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697427#comment-14697427
 ] 

Abhishek Nigam commented on KAFKA-1387:
---

Thanks a lot for digging into this. Not sure if it helps, but in the past
when I saw this issue it went like this:
a) Say the session timeout is 30 seconds.
b) If we kill the instance that created the ZooKeeper ephemeral node and
bring it back up quickly (in less than 30 seconds), we find that the previous
session's data (the ephemeral node) still exists.

The solution was to assume that the existing data was from an old session,
and to delete and re-create it during startup. However, we were processing the
ZooKeeper events on a single thread.

On Fri, Aug 14, 2015 at 6:34 AM, Flavio Junqueira (JIRA) j...@apache.org



 Kafka getting stuck creating ephemeral node it has already created when two 
 zookeeper sessions are established in a very short period of time
 -

 Key: KAFKA-1387
 URL: https://issues.apache.org/jira/browse/KAFKA-1387
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.1.1
Reporter: Fedor Korotkiy
Priority: Blocker
  Labels: newbie, patch, zkclient-problems
 Attachments: kafka-1387.patch


 The Kafka broker re-registers itself in ZooKeeper every time the handleNewSession() 
 callback is invoked.
 https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/server/KafkaHealthcheck.scala
  
 Now imagine the following sequence of events:
 1) The ZooKeeper session is re-established. The handleNewSession() callback is queued by 
 the zkClient but not yet invoked.
 2) The ZooKeeper session is re-established again, queueing the callback a second time.
 3) The first callback is invoked, creating the /brokers/ids/[id] ephemeral path.
 4) The second callback is invoked and tries to create the /brokers/ids/[id] path using the 
 createEphemeralPathExpectConflictHandleZKBug() function. But the path 
 already exists, so createEphemeralPathExpectConflictHandleZKBug() gets 
 stuck in an infinite loop.
 It seems the controller election code has the same issue.
 I'm able to reproduce this issue on the 0.8.1 branch from GitHub using the 
 following configs:
 # zookeeper
 tickTime=10
 dataDir=/tmp/zk/
 clientPort=2101
 maxClientCnxns=0
 # kafka
 broker.id=1
 log.dir=/tmp/kafka
 zookeeper.connect=localhost:2101
 zookeeper.connection.timeout.ms=100
 zookeeper.session.timeout.ms=100
 Just start Kafka and ZooKeeper, and then pause ZooKeeper several times using 
 Ctrl-Z.
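One way to avoid the loop combines the "delete and re-create on startup" fix from the comment above with a session-ownership check. The sketch below runs against a toy in-memory store, not the ZooKeeper or zkClient API. In the real ZooKeeper Java client, the owner of an ephemeral node is available as `Stat.getEphemeralOwner()` and the current session as `ZooKeeper.getSessionId()`. `EphemeralRegistry` is hypothetical. A real implementation would also have to confirm that a stale node belongs to the same broker before deleting it, since delete-then-create is not atomic.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy stand-in for ephemeral znodes: each node records the session that owns it. */
final class EphemeralRegistry {
    static final class Node {
        final String data;
        final long ownerSession;

        Node(String data, long ownerSession) {
            this.data = data;
            this.ownerSession = ownerSession;
        }
    }

    final Map<String, Node> nodes = new HashMap<>();

    /**
     * Registration that cannot loop forever: a node already owned by this session
     * counts as success, and a node left over from an older session is replaced.
     */
    String register(String path, String data, long currentSession) {
        Node existing = nodes.get(path);
        if (existing == null) {
            nodes.put(path, new Node(data, currentSession));
            return "created";
        }
        if (existing.ownerSession == currentSession) {
            return "already-registered";   // e.g. the second queued handleNewSession() callback
        }
        nodes.remove(path);                 // stale node from a previous session
        nodes.put(path, new Node(data, currentSession));
        return "replaced";
    }
}
```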





Re: Can someone review ticket 1778

2015-08-11 Thread Abhishek Nigam
Hi Guozhang,
Can you please re-review the KAFKA-1778 design?

To provide some background: this ticket was a sub-ticket of KIP-4 (Kafka
admin commands). The goal was to avoid cascading controller moves, for
example during a rolling broker bounce.

The approaches discussed were as follows:
a) Use a preferred controller admin command, which can be used to
dynamically designate a preferred controller.
b) Use configuration to set a whitelist or blacklist of brokers that are
eligible to become the controller.

Can we reach consensus on how we want to resolve this issue?

-Abhishek

On Sun, May 17, 2015 at 10:55 PM, Abhishek Nigam ani...@linkedin.com
wrote:

 Hi,
 For pinning the controller to a broker I have proposed a design. Can
 someone review the design and let me know if it looks ok.
 I can then submit a patch for this ticket within the next couple of weeks.

 -Abhishek




[jira] [Commented] (KAFKA-1778) Create new re-elect controller admin function

2015-08-11 Thread Abhishek Nigam (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692370#comment-14692370
 ] 

Abhishek Nigam commented on KAFKA-1778:
---

Hi Guozhang,
I agree 100% with you. Can you tell me the best way to move forward
on this on the open-source side?

-Abhishek

On Tue, Aug 11, 2015 at 2:30 PM, Guozhang Wang (JIRA) j...@apache.org



 Create new re-elect controller admin function
 -

 Key: KAFKA-1778
 URL: https://issues.apache.org/jira/browse/KAFKA-1778
 Project: Kafka
  Issue Type: Sub-task
Reporter: Joe Stein
Assignee: Abhishek Nigam
 Fix For: 0.8.3


 kafka --controller --elect





[jira] [Commented] (KAFKA-1778) Create new re-elect controller admin function

2015-08-11 Thread Abhishek Nigam (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692628#comment-14692628
 ] 

Abhishek Nigam commented on KAFKA-1778:
---

Thanks Guozhang,
I will write it up in a nice proposal.

-Abhishek

On Tue, Aug 11, 2015 at 3:28 PM, Guozhang Wang (JIRA) j...@apache.org





[jira] [Commented] (KAFKA-1778) Create new re-elect controller admin function

2015-05-29 Thread Abhishek Nigam (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565837#comment-14565837
 ] 

Abhishek Nigam commented on KAFKA-1778:
---

I believe what you are suggesting is that we can have a group of brokers 
flagged as potential controllers, and all controller elections will be limited to 
that subset of brokers. Do I need to provide any failsafe in case none of the 
flagged brokers is able to participate in the election and we are left 
controller-less?
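Here is a sketch of what such a failsafe could look like. The election is restricted to the flagged brokers but falls back to any live broker if none of them is available. For simplicity it picks the smallest id, whereas ZooKeeper-based election is first-come-first-served. `EligibleControllerElection` is a hypothetical name, and the fallback is one possible answer to the question, not an agreed design.

```java
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

final class EligibleControllerElection {
    /**
     * Picks the smallest live id among the flagged (eligible) brokers. If none of them
     * is live, falls back to the smallest live broker so the cluster is never left
     * without a controller. Returns -1 only if no broker is live at all.
     */
    static int elect(Set<Integer> liveBrokers, Set<Integer> eligible) {
        SortedSet<Integer> candidates = new TreeSet<>(liveBrokers);
        candidates.retainAll(eligible);
        if (!candidates.isEmpty()) {
            return candidates.first();
        }
        SortedSet<Integer> anyLive = new TreeSet<>(liveBrokers);
        return anyLive.isEmpty() ? -1 : anyLive.first();   // failsafe path
    }
}
```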

-Abhishek



[jira] [Commented] (KAFKA-1778) Create new re-elect controller admin function

2015-05-27 Thread Abhishek Nigam (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561833#comment-14561833
 ] 

Abhishek Nigam commented on KAFKA-1778:
---

Joel,
What I was proposing was that all the brokers watch the 
ready-to-serve-as-controller ephemeral node. In the scenario you outlined, where the 
preferred controller dies after the election is over but before it can write to 
the /controller node, all the brokers will get this notification, and there 
will be another round of elections.

The controller is the one that reads the /admin/next_controller persistent 
ZooKeeper node and also keeps a watch on it. If it detects that the node has 
changed and the chosen broker id differs from its own, it will start the 
preferred controller move process.

Also, can we avoid the message from the current controller to the preferred 
controller by having all brokers simply watch the /admin/next_controller znode? 
Using a ZooKeeper node to achieve this messaging is definitely the better approach.
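The controller-side check described above might look like the following sketch. The `STAY`/`RESIGN` decision, the handler name, and the idea of resigning by deleting `/controller` are assumptions for illustration. The extra liveness check avoids resigning in favor of a preferred broker that cannot take over.

```java
import java.util.Set;

final class PreferredControllerWatch {
    enum Action { STAY, RESIGN }

    /**
     * Hypothetical handler run when the watch on /admin/next_controller fires.
     * The current controller resigns only if a different broker is preferred and
     * that broker is live; otherwise it keeps the role.
     */
    static Action onNextControllerChanged(int myBrokerId, Integer preferredId, Set<Integer> liveBrokers) {
        if (preferredId == null || preferredId == myBrokerId) {
            return Action.STAY;      // no preference, or this broker is already preferred
        }
        if (!liveBrokers.contains(preferredId)) {
            return Action.STAY;      // resigning now would only trigger a pointless election
        }
        return Action.RESIGN;        // e.g. delete /controller so a new election runs (assumption)
    }
}
```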

Jun,
In my opinion, static assignment has a problem: it is unclear what happens if the 
predetermined controller goes down or runs into any other issues.









[jira] [Commented] (KAFKA-1778) Create new re-elect controller admin function

2015-05-19 Thread Abhishek Nigam (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550781#comment-14550781
 ] 

Abhishek Nigam commented on KAFKA-1778:
---

Jun,
The way I see it, pinning the controller gives us multiple benefits:
a) If SREs are doing rolling upgrades, they can set aside the broker on which 
the controller is pinned as the broker they touch last.
This way there is only a limited number of controller moves, and as a result 
the controller is more available than with an unpredictable 
number of controller moves.

b) More importantly, I think, if we do manual partition assignment we can set 
aside a broker to have very few partitions, which would reduce the impact on 
the controller of serving too many produce and consume requests. To summarize, 
it enables us to isolate the controller from the broker functionality, 
potentially enabling us to push the brokers harder. 

Joel,
You are spot on. Since all the brokers will now be watching the preferred 
controller node, we can have the following situations:
a) All of them know about the preferred controller (the ZooKeeper metadata has 
propagated to everyone). In this case the preferred controller would become the 
leader right away.

b) If only some of them know about the preferred controller, they will participate in 
the election, and it is possible that a broker other than the preferred 
controller becomes the leader. In this case, this new controller will eventually 
discover (through a ZooKeeper watch) that the preferred controller is available 
to serve traffic, and it will resign and trigger another round of elections.
c) If none of them know about the preferred controller, the behavior will be 
similar to the above.

  



Can someone review ticket 1778

2015-05-17 Thread Abhishek Nigam
Hi,
For pinning the controller to a broker I have proposed a design. Can someone 
review the design and let me know if it looks ok.
I can then submit a patch for this ticket within the next couple of weeks.

-Abhishek



[jira] [Commented] (KAFKA-1888) Add a rolling upgrade system test

2015-05-17 Thread Abhishek Nigam (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14547566#comment-14547566
 ] 

Abhishek Nigam commented on KAFKA-1888:
---

Geoffrey,
Thanks for the heads up. I saw in a related article that you are planning to work 
on API compatibility testing as well.
I am taking myself off this ticket, as it looks like it will be 
subsumed by your work.

-Abhishek

 Add a rolling upgrade system test
 ---

 Key: KAFKA-1888
 URL: https://issues.apache.org/jira/browse/KAFKA-1888
 Project: Kafka
  Issue Type: Improvement
  Components: system tests
Reporter: Gwen Shapira
Assignee: Abhishek Nigam
 Fix For: 0.9.0

 Attachments: KAFKA-1888_2015-03-23_11:54:25.patch


 To help test upgrades and compatibility between versions, it will be cool to 
 add a rolling-upgrade test to system tests:
 Given two versions (just a path to the jars?), check that you can do a
 rolling upgrade of the brokers from one version to another (using clients 
 from the old version) without losing data.





[jira] [Updated] (KAFKA-1888) Add a rolling upgrade system test

2015-05-17 Thread Abhishek Nigam (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Nigam updated KAFKA-1888:
--
Assignee: (was: Abhishek Nigam)

 Add a rolling upgrade system test
 ---

 Key: KAFKA-1888
 URL: https://issues.apache.org/jira/browse/KAFKA-1888
 Project: Kafka
  Issue Type: Improvement
  Components: system tests
Reporter: Gwen Shapira
 Fix For: 0.9.0

 Attachments: KAFKA-1888_2015-03-23_11:54:25.patch


 To help test upgrades and compatibility between versions, it will be cool to 
 add a rolling-upgrade test to system tests:
 Given two versions (just a path to the jars?), check that you can do a
 rolling upgrade of the brokers from one version to another (using clients 
 from the old version) without losing data.





[jira] [Commented] (KAFKA-1387) Kafka getting stuck creating ephemeral node it has already created when two zookeeper sessions are established in a very short period of time

2015-05-07 Thread Abhishek Nigam (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533402#comment-14533402
 ] 

Abhishek Nigam commented on KAFKA-1387:
---

I have seen the ephemeral node issue before, and the fix made there was exactly 
what Thomas mentioned:
"It seems the simplest thing to do would be to just delete the conflicted node 
and write the truth about the process environment it knows."

Is there a reason the approach outlined by Thomas does not work for Kafka?



[jira] [Commented] (KAFKA-1888) Add a rolling upgrade system test

2015-04-02 Thread Abhishek Nigam (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393806#comment-14393806
 ] 

Abhishek Nigam commented on KAFKA-1888:
---

Hi Gwen/Ashish,
I need to finish up something else, and I will only be able to come back to this 
ticket in 2-3 weeks. 



Re: Review Request 30809: Patch for KAFKA-1888

2015-04-02 Thread Abhishek Nigam


 On March 31, 2015, 9:20 p.m., Joel Koshy wrote:
  core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 1
  https://reviews.apache.org/r/30809/diff/4/?file=903374#file903374line1
 
  This should definitely not be in tools - this should probably live 
  somewhere under clients/test. I don't think those are currently exported 
  though, so we will need to modify build.gradle. However, per other comments 
  below, I'm not sure this should be part of the system tests, since it is (by 
  definition) long running.

Will do.


 On March 31, 2015, 9:20 p.m., Joel Koshy wrote:
  core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 49
  https://reviews.apache.org/r/30809/diff/4/?file=903374#file903374line49
 
  It would help a lot if you could add comments describing what 
  validation is done. For example, I'm unclear on why we need the complicated 
  file-based signaling mechanism, so a high-level description would help a 
  lot.
  
  More importantly, I really think we should separate continuous 
  validation from the broker upgrade, which is the focus of KAFKA-1888.
  
  In order to do a broker upgrade test, we don't need any additional 
  code. We just instantiate the producer performance and consumer via system 
  test utils. Keep those on the old jar. The cluster will start with the old 
  jar as well and during the test we bounce in the latest jar (the system 
  test utils will need to be updated to support this). We then do the 
  standard system test validation - that all messages sent were received.

I wanted to have two (topic, partition) tuples with a leader on each broker. I 
decided to use a single topic with multiple partitions rather than two 
topics, which would also have worked. The reason for picking the first 
approach was that I wanted to be able to run the continuous validation 
test outside of the system test framework, in a test cluster with other topics. 
To illustrate why the second approach won't work in that scenario: if we have 
3 brokers and I create 3 topics with one partition each (T1P1, T2P1, T3P1), 
then the following would be a valid assignment under the existing broker 
assignment algorithm:

B1     B2     B3
T1P1   TXP1   TXP2
T2P1   TYP1   TYP2
T3P1

where TX and TY are other production topics running in that cluster. In this 
case the leaders of all three test partitions have landed on the same broker 
(B1). The first approach precludes this possibility.
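The placement argument can be made concrete with a simplified model. It assumes, roughly as in the 0.8.x AdminUtils assignment, that partition p of a topic gets its preferred leader at broker index (startIndex + p) mod numBrokers, where startIndex is chosen at random per topic unless it is fixed. Follower placement is ignored, and the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of preferred-leader placement: partition p of a topic
// gets broker (startIndex + p) mod numBrokers. Follower placement is ignored.
final class LeaderPlacement {
    static List<Integer> leadersForTopic(int numPartitions, int numBrokers, int startIndex) {
        if (numBrokers <= 0) {
            throw new IllegalArgumentException("need at least one broker");
        }
        List<Integer> leaders = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            leaders.add(Math.floorMod(startIndex + p, numBrokers));
        }
        return leaders;
    }

    public static void main(String[] args) {
        // Approach 1: one topic with 3 partitions on 3 brokers gives three distinct leaders.
        System.out.println(leadersForTopic(3, 3, 0)); // [0, 1, 2]

        // Approach 2: three single-partition topics, each with its own start index.
        // If all three happen to draw 0, every test leader lands on broker 0.
        List<Integer> separateTopics = new ArrayList<>();
        for (int t = 0; t < 3; t++) {
            separateTopics.addAll(leadersForTopic(1, 3, 0));
        }
        System.out.println(separateTopics); // [0, 0, 0]
    }
}
```

Each separately created topic draws its own start index, so nothing prevents those topics from sharing a leader. By contrast, the partitions of a single topic are spread round-robin.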


The file signalling is there to work around the fact that the most commonly used 
client cannot consume from a particular partition. As I have set it up, the 
file signalling acts as a barrier. We make sure all the producer/consumer pairs 
have been instantiated, in the hope that they have talked to ZooKeeper and 
reserved their partition. Once both consumers have been instantiated, we expect 
them to have bound themselves to a particular partition, so we can now let the 
producers run in both instances; this way we are assured that the consumer 
should never receive data from the same producer.
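The barrier idea can be sketched as follows. This is not the patch's code. The class name, the `.ready` marker-file naming, the polling interval and the timeout are all assumptions. Each JVM instance writes a marker into a shared directory, then waits until the expected number of markers is present before starting its producer.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical file-based barrier: each participant drops a marker file into a shared
// directory, then polls until `parties` markers exist or the timeout expires.
final class FileBarrier {
    private final Path dir;
    private final int parties;

    FileBarrier(Path dir, int parties) {
        this.dir = dir;
        this.parties = parties;
    }

    void arriveAndAwait(String participantId, long timeoutMs) throws IOException, InterruptedException {
        Files.createDirectories(dir);
        // Announce arrival: writing an empty file creates it (or truncates an existing one).
        Files.write(dir.resolve(participantId + ".ready"), new byte[0]);
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (countMarkers() < parties) {
            if (System.currentTimeMillis() > deadline) {
                throw new IllegalStateException("barrier timed out waiting for " + parties + " participants");
            }
            Thread.sleep(50); // poll interval (an arbitrary choice)
        }
    }

    private long countMarkers() throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            return files.filter(p -> p.getFileName().toString().endsWith(".ready")).count();
        }
    }
}
```

Polling a directory is crude but needs no coordination service. The timeout keeps a missing instance from hanging the test forever.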


 On March 31, 2015, 9:20 p.m., Joel Koshy wrote:
  core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 52
  https://reviews.apache.org/r/30809/diff/4/?file=903374#file903374line52
 
  This appears to be for rate-limiting the producer but can be more 
  general than that.
  
  It would help to add a comment describing its purpose.
  
  Also, should probably be private

This is a poor man's rate limiter compared with Guava's rate limiter. I will 
make it private.
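The patch's limiter is not reproduced in this thread. As an illustration only, a fixed-interval limiter in the same spirit might look like the sketch below. The class name and its behavior are assumptions. It is far simpler than Guava's RateLimiter, which can also bank unused permits for bursts and supports a warm-up period.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical fixed-interval rate limiter: at most one permit every
// 1/permitsPerSecond seconds, with no bursting and no warm-up.
final class SimpleRateLimiter {
    private final long intervalNanos;
    private long nextFreeNanos;

    SimpleRateLimiter(double permitsPerSecond) {
        if (!(permitsPerSecond > 0)) { // also rejects NaN
            throw new IllegalArgumentException("permitsPerSecond must be positive");
        }
        this.intervalNanos = (long) (TimeUnit.SECONDS.toNanos(1) / permitsPerSecond);
        this.nextFreeNanos = System.nanoTime();
    }

    // Blocks until the next permit is available. Callers are serialized by the lock,
    // which is acceptable for a single producer thread.
    synchronized void acquire() throws InterruptedException {
        long now = System.nanoTime();
        long waitNanos = nextFreeNanos - now;
        if (waitNanos > 0) {
            TimeUnit.NANOSECONDS.sleep(waitNanos);
        }
        // Schedule the following permit one interval after this one.
        nextFreeNanos = Math.max(nextFreeNanos, now) + intervalNanos;
    }
}
```

Usage: the producer loop calls `acquire()` before each send. For example, `new SimpleRateLimiter(100.0)` allows roughly one send every 10 ms.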


 On March 31, 2015, 9:20 p.m., Joel Koshy wrote:
  system_test/broker_upgrade/bin/test-broker-upgrade.sh, line 1
  https://reviews.apache.org/r/30809/diff/4/?file=903376#file903376line1
 
  This appears to be a one-off script to set up the test. This needs to 
  be done within the system test framework which already has a number of 
  utilities that do similar things.
  
  One other comment is that the patch is for an upgrade test, but I think 
  it is a bit confusing to mix this with CVT.

The continuous validation test will be useful outside of the system test 
framework. This was an attempt to leverage CVT in the system test setting.

Since strong objections have been raised against adopting this approach, I 
will leave a comment on this patch accordingly.


- Abhishek


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30809/#review78270
---


On March 23, 2015, 6:54 p.m., Abhishek Nigam wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/30809/
 ---
 
 (Updated March 23, 2015, 6:54 p.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA

Re: Review Request 30809: Patch for KAFKA-1888

2015-04-02 Thread Abhishek Nigam


 On April 2, 2015, 1:38 a.m., Jun Rao wrote:
  core/src/main/scala/kafka/tools/ContinuousValidationTest.java, lines 431-437
  https://reviews.apache.org/r/30809/diff/4/?file=903374#file903374line431
 
  Could we add a description of the test (what kind of data is generated, 
  how the consumer does the verification, what kind of output is generated, 
  etc.)?

The data which is generated is very simple: an increasing sequence of longs with 
timestamps. The producer keeps track of the newest sequence number and timestamp 
it has sent. The consumer keeps track of the last sequence number and timestamp 
it has received. The system test will interrupt the CVT and compare the sequence 
numbers between the producer and the consumer. If they do not line up, it is 
an error. (If either the producer or the consumer thread terminates unexpectedly 
before being interrupted, that is also flagged as an error.) If the test 
fails, the data logs from the producer and consumer are not removed and can 
be inspected.
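The bookkeeping described above can be sketched as follows. The class names are hypothetical and this is not the patch's code. Each side remembers the newest (sequence number, timestamp) pair it has sent or received. After the run the two pairs are compared, and any mismatch is reported with both values so it can be logged.

```java
import java.util.Optional;

// Hypothetical bookkeeping for one side of the test: the newest (sequence, timestamp)
// pair the producer has sent, or the consumer has received.
final class EndpointState {
    private long lastSeq = -1;       // -1 means nothing recorded yet
    private long lastTimestamp = -1;

    // synchronized so the two fields are always updated and read as a pair
    synchronized void record(long seq, long timestamp) {
        lastSeq = seq;
        lastTimestamp = timestamp;
    }

    synchronized long[] snapshot() {
        return new long[] {lastSeq, lastTimestamp};
    }
}

final class CvtCheck {
    // Empty if both sides agree; otherwise a message carrying both values for the log.
    static Optional<String> compare(EndpointState producer, EndpointState consumer) {
        long[] p = producer.snapshot();
        long[] c = consumer.snapshot();
        if (p[0] == c[0] && p[1] == c[1]) {
            return Optional.empty();
        }
        return Optional.of("mismatch: producer last sent seq=" + p[0] + " ts=" + p[1]
                + ", consumer last received seq=" + c[0] + " ts=" + c[1]);
    }
}
```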

The idea behind putting the consumer and producer in the same JVM was 
orthogonal to the system test: if the CVT is used in a test cluster hosting 
other topics, it makes it easy to get hold of things like the delta. However, 
I think there is very strong objection to adopting this for system tests, 
which are short-lived in nature. Unless there is support for the approach I 
have taken so far, I plan to revert to the existing approach of spawning 
separate JVMs for the producer and consumer.

I will rewrite the bash script in Python, similar to what the other system 
tests do.


 On April 2, 2015, 1:38 a.m., Jun Rao wrote:
  core/src/main/scala/kafka/tools/ContinuousValidationTest.java, lines 440-454
  https://reviews.apache.org/r/30809/diff/4/?file=903374#file903374line440
 
  Could we add a description of each command line option?

I need to add more documentation. I will add this in.


- Abhishek


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30809/#review78630
---


On March 23, 2015, 6:54 p.m., Abhishek Nigam wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/30809/
 ---
 
 (Updated March 23, 2015, 6:54 p.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-1888
 https://issues.apache.org/jira/browse/KAFKA-1888
 
 
 Repository: kafka
 
 
 Description
 ---
 
 Updated the RB with Gwen's comments, Beckett's comments and a subset of 
 Guozhang's comments
 
 
 Diffs
 -
 
   bin/kafka-run-class.sh 881f578a8f5c796fe23415b978c1ad35869af76e 
   core/src/main/scala/kafka/tools/ContinuousValidationTest.java PRE-CREATION 
   core/src/main/scala/kafka/utils/ShutdownableThread.scala 
 fc226c863095b7761290292cd8755cd7ad0f155c 
   system_test/broker_upgrade/bin/test-broker-upgrade.sh PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/30809/diff/
 
 
 Testing
 ---
 
 Scripted it to run 20 times without any failures.
 Command-line: broker-upgrade/bin/test.sh dir1 dir2
 
 
 Thanks,
 
 Abhishek Nigam
 




Re: Review Request 30809: Patch for KAFKA-1888

2015-03-23 Thread Abhishek Nigam

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30809/
---

(Updated March 23, 2015, 6:54 p.m.)


Review request for kafka.


Bugs: KAFKA-1888
https://issues.apache.org/jira/browse/KAFKA-1888


Repository: kafka


Description (updated)
---

Updated the RB with Gwen's comments, Beckett's comments and a subset of 
Guozhang's comments


Diffs (updated)
-

  bin/kafka-run-class.sh 881f578a8f5c796fe23415b978c1ad35869af76e 
  core/src/main/scala/kafka/tools/ContinuousValidationTest.java PRE-CREATION 
  core/src/main/scala/kafka/utils/ShutdownableThread.scala 
fc226c863095b7761290292cd8755cd7ad0f155c 
  system_test/broker_upgrade/bin/test-broker-upgrade.sh PRE-CREATION 

Diff: https://reviews.apache.org/r/30809/diff/


Testing
---

Scripted it to run 20 times without any failures.
Command-line: broker-upgrade/bin/test.sh dir1 dir2


Thanks,

Abhishek Nigam



[jira] [Commented] (KAFKA-1888) Add a rolling upgrade system test

2015-03-23 Thread Abhishek Nigam (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14376399#comment-14376399
 ] 

Abhishek Nigam commented on KAFKA-1888:
---

Updated reviewboard https://reviews.apache.org/r/30809/diff/
 against branch origin/trunk

 Add a rolling upgrade system test
 ---

 Key: KAFKA-1888
 URL: https://issues.apache.org/jira/browse/KAFKA-1888
 Project: Kafka
  Issue Type: Improvement
  Components: system tests
Reporter: Gwen Shapira
Assignee: Abhishek Nigam
 Fix For: 0.9.0

 Attachments: KAFKA-1888_2015-03-23_11:54:25.patch


 To help test upgrades and compatibility between versions, it will be cool to 
 add a rolling-upgrade test to system tests:
 Given two versions (just a path to the jars?), check that you can do a
 rolling upgrade of the brokers from one version to another (using clients 
 from the old version) without losing data.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1888) Add a rolling upgrade system test

2015-03-23 Thread Abhishek Nigam (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Nigam updated KAFKA-1888:
--
Status: Patch Available  (was: Open)



[jira] [Updated] (KAFKA-1888) Add a rolling upgrade system test

2015-03-23 Thread Abhishek Nigam (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Nigam updated KAFKA-1888:
--
Attachment: KAFKA-1888_2015-03-23_11:54:25.patch



[jira] [Commented] (KAFKA-1778) Create new re-elect controller admin function

2015-03-19 Thread Abhishek Nigam (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14370385#comment-14370385
 ] 

Abhishek Nigam commented on KAFKA-1778:
---

I have a design for pinning the controller to a broker:
Suppose we want to pin the controller to broker id x.

Handling the admin request in the controller:
a) We send the admin request to the controller.
b) It will create a persistent zookeeper node /admin/next_controller with data 
x.
c) It will then check the alive broker list to see whether broker id x is up 
and running.
d) If the broker is up and running, it will start a 3-way handshake with x.
e) It will set a watch on the /admin/ready_to_serve_as_controller zookeeper node.
f) It will send a message to the broker telling it to become ready 
to serve as next_controller.
g) Broker x, on receiving this message, will create the ephemeral node 
/admin/ready_to_serve_as_controller.
h) The controller observes this change.
i) At this point the current controller will resign.
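The controller-side steps above can be compressed into a single-threaded sketch. It assumes an in-memory Map stands in for ZooKeeper and a callback stands in for the message sent to broker x. Watches, ephemeral-node semantics and failure handling are left out, and all class and enum names are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.function.IntConsumer;

// Hypothetical, synchronous model of "handling the admin request in the controller".
final class PinControllerFlow {
    static final String NEXT_CONTROLLER = "/admin/next_controller";
    static final String READY = "/admin/ready_to_serve_as_controller";

    enum Result { TARGET_NOT_ALIVE, TARGET_NOT_READY, RESIGN }

    // zk: znode path -> data (stand-in for ZooKeeper).
    // askBrokerToPrepare: stand-in for the message in step f; in the design, broker x
    // reacts by creating the READY node containing its own id (step g).
    static Result handlePinRequest(Map<String, String> zk, Set<Integer> aliveBrokers,
                                   int target, IntConsumer askBrokerToPrepare) {
        zk.put(NEXT_CONTROLLER, Integer.toString(target)); // step b: persistent intent
        if (!aliveBrokers.contains(target)) {
            return Result.TARGET_NOT_ALIVE;                  // step c: x is not up
        }
        askBrokerToPrepare.accept(target);                   // steps d-g: handshake
        String ready = zk.get(READY);                        // step h: observe the node
        if (ready != null && Integer.parseInt(ready) == target) {
            return Result.RESIGN;                            // step i: current controller resigns
        }
        return Result.TARGET_NOT_READY;
    }
}
```

Writing the persistent intent before checking liveness matches step b). The request survives even if x is currently down.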

Changes in the election code:
a) All the brokers will read /admin/ready_to_serve_as_controller and set a 
watch on it.
b) If a broker finds that this znode exists and its broker.id does not 
match the id specified in this ephemeral node, it will simply not participate 
in the leader election.
c) Broker x will then rightfully take its place as the next controller.
d) The watches will be used in case broker x comes back to life.
e) In that case, if I am the controller, I will resign.
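The election-side rule in step b) reduces to a small predicate. In this sketch the znode data format (the broker id as a UTF-8 decimal string) and all names are assumptions, because the design does not specify an encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.OptionalInt;

// Hypothetical helpers for the election change: decide whether this broker may
// take part in the controller election.
final class PinnedElection {
    // znodeData == null means the ready_to_serve_as_controller znode does not exist.
    // Malformed data throws NumberFormatException.
    static OptionalInt pinnedBroker(byte[] znodeData) {
        if (znodeData == null) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(Integer.parseInt(new String(znodeData, StandardCharsets.UTF_8).trim()));
    }

    // Sit out the election only if some *other* broker is pinned.
    static boolean shouldParticipate(int myBrokerId, OptionalInt pinned) {
        return !pinned.isPresent() || pinned.getAsInt() == myBrokerId;
    }
}
```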

Changes in the controller startup code:
a) Always read /admin/next_controller, watching it for data changes as well as 
for new data.
b) If there is any change, try to set up the next broker as described in 
"Handling the admin request in the controller" above.

 Create new re-elect controller admin function
 -

 Key: KAFKA-1778
 URL: https://issues.apache.org/jira/browse/KAFKA-1778
 Project: Kafka
  Issue Type: Sub-task
Reporter: Joe Stein
Assignee: Abhishek Nigam
 Fix For: 0.8.3


 kafka --controller --elect





[jira] [Work started] (KAFKA-1778) Create new re-elect controller admin function

2015-03-19 Thread Abhishek Nigam (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-1778 started by Abhishek Nigam.
-


[jira] [Commented] (KAFKA-2003) Add upgrade tests

2015-03-11 Thread Abhishek Nigam (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357325#comment-14357325
 ] 

Abhishek Nigam commented on KAFKA-2003:
---

Hi Gwen,
I did not realize that my patch got uploaded to RB but the link was not attached 
to the JIRA.
I just added it in the comments section of KAFKA-1888. 
https://reviews.apache.org/r/30809/



 Add upgrade tests
 -

 Key: KAFKA-2003
 URL: https://issues.apache.org/jira/browse/KAFKA-2003
 Project: Kafka
  Issue Type: Improvement
Reporter: Gwen Shapira
Assignee: Ashish K Singh

 To test protocol changes, compatibility and upgrade process, we need a good 
 way to test different versions of the product together and to test end-to-end 
 upgrade process.
 For example, for 0.8.2 to 0.8.3 test we want to check:
 * Can we start a cluster with a mix of 0.8.2 and 0.8.3 brokers?
 * Can a cluster of 0.8.3 brokers bump the protocol level one broker at a time?
 * Can 0.8.2 clients run against a cluster of 0.8.3 brokers?
 There are probably more questions. But an automated framework that can test 
 those and report results will be a good start.





Re: Review Request 30809: Patch for KAFKA-1888

2015-03-11 Thread Abhishek Nigam


 On March 12, 2015, 12:13 a.m., Jiangjie Qin wrote:
  core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 183
  https://reviews.apache.org/r/30809/diff/3/?file=889855#file889855line183
 
  This is essentially a sync approach; can we use a callback to do this?

This is intentional. We want to make sure the event has successfully reached 
the brokers. This enables us to form a reasonable expectation of what the 
consumer should receive.
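A synchronous send loop of this kind can be sketched without depending on Kafka. Here `send` is a hypothetical stand-in for a call like `producer.send(record)`, which in the Java producer returns a `Future<RecordMetadata>`. Blocking on each Future means the last acknowledged sequence number advances only after the broker has accepted the send. Stopping at the first failure is an assumption of this sketch.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.function.LongFunction;

final class SyncSender {
    // Sends sequence numbers 0..count-1 one at a time and blocks on each Future.
    // Returns the last acknowledged sequence number, or -1 if none was acknowledged.
    static long sendAllSync(LongFunction<Future<?>> send, long count) throws InterruptedException {
        long lastAcked = -1;
        for (long seq = 0; seq < count; seq++) {
            try {
                send.apply(seq).get(); // wait for the acknowledgement (or failure)
                lastAcked = seq;
            } catch (ExecutionException e) {
                // Stop at the first failure and log its sequence number, so lastAcked
                // stays an exact statement of what the consumer should receive.
                System.err.println("send failed for seq=" + seq + ": " + e.getCause());
                break;
            }
        }
        return lastAcked;
    }
}
```

The cost is throughput, since only one send is in flight at a time. A callback-based version would need extra bookkeeping to know which sends had been acknowledged when the test is interrupted.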


 On March 12, 2015, 12:13 a.m., Jiangjie Qin wrote:
  core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 184
  https://reviews.apache.org/r/30809/diff/3/?file=889855#file889855line184
 
  When a send fails, should we at least log the sequence number?

I log the exception, and the logger gives me the timestamp in the logs.
Maybe I am missing something. Can you explain the rationale for logging the 
sequence number on the producer side when a send fails?


 On March 12, 2015, 12:13 a.m., Jiangjie Qin wrote:
  core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 321
  https://reviews.apache.org/r/30809/diff/3/?file=889855#file889855line321
 
  Similar to producer, can we log the expected sequence number and the 
  seq we actually saw?

Sure, in the cases where there is a mismatch I can do that.


 On March 12, 2015, 12:13 a.m., Jiangjie Qin wrote:
  core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 386
  https://reviews.apache.org/r/30809/diff/3/?file=889855#file889855line386
 
  Can we use KafkaThread here?

I will take a look at that.


- Abhishek


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30809/#review76173
---


On March 9, 2015, 11:55 p.m., Abhishek Nigam wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/30809/
 ---
 
 (Updated March 9, 2015, 11:55 p.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-1888
 https://issues.apache.org/jira/browse/KAFKA-1888
 
 
 Repository: kafka
 
 
 Description
 ---
 
 Fixing the tests based on Mayuresh comments, code cleanup after proper IDE 
 setup
 
 
 Diffs
 -
 
   build.gradle 0f0fe60a74542efa91a0e727146e896edcaa38af 
   core/src/main/scala/kafka/tools/ContinuousValidationTest.java PRE-CREATION 
   system_test/broker_upgrade/bin/kafka-run-class.sh PRE-CREATION 
   system_test/broker_upgrade/bin/test.sh PRE-CREATION 
   system_test/broker_upgrade/configs/server1.properties PRE-CREATION 
   system_test/broker_upgrade/configs/server2.properties PRE-CREATION 
   system_test/broker_upgrade/configs/zookeeper_source.properties PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/30809/diff/
 
 
 Testing
 ---
 
 Scripted it to run 20 times without any failures.
 Command-line: broker-upgrade/bin/test.sh dir1 dir2
 
 
 Thanks,
 
 Abhishek Nigam
 




Re: Review Request 30809: Patch for KAFKA-1888

2015-03-11 Thread Abhishek Nigam


 On March 11, 2015, 11:12 p.m., Gwen Shapira wrote:
  This looks like a very good start. I think the framework is flexible enough 
  to allow us to add a variety of upgrade tests. I'm looking forward to it.
  
  
  I have a few comments, but mostly I'm still confused about how this will be 
  used. Perhaps more comments or even a README is in order.
  
  You wrote that we invoke test.sh dir1 dir2, what should each 
  directory contain? just the kafka jar of different versions? or an entire 
  installation (including bin/ and conf/)?
  Which one of the directories should be the newer and which is older? does 
  it matter?
  Which version of clients will be used?
  
  Perhaps a more descriptive name for test.sh can help too. I'm guessing 
  we'll have a whole collection of those test scripts soon.
  
  Gwen

Each directory contains the Kafka jars:
kafka_2.10-0.8.3-SNAPSHOT.jar
kafka-clients-0.8.3-SNAPSHOT.jar
The other jars are shared between both Kafka brokers.


 On March 11, 2015, 11:12 p.m., Gwen Shapira wrote:
  build.gradle, line 209
  https://reviews.apache.org/r/30809/diff/3/?file=889854#file889854line209
 
  This should probably be a test dependency (if needed at all)
  
  Packaging Guava will be a pain, since so many systems use different 
  versions of Guava and they are all incompatible.

Guava provides an excellent rate limiter, which I am using in the test and have 
used in the past.
Regarding packaging, we already pull in other external libraries, such as 
ZooKeeper, at specific versions that applications might also be using 
extensively, and those could similarly run into conflicts.

If you can suggest a less popular library than Guava that provides rate 
limiting, I can use that instead; otherwise I will move this dependency to the 
test scope for now.


 On March 11, 2015, 11:12 p.m., Gwen Shapira wrote:
  core/src/main/scala/kafka/tools/ContinuousValidationTest.java, lines 409-440
  https://reviews.apache.org/r/30809/diff/3/?file=889855#file889855line409
 
  Do we really want to do this? 
  
  We are using joptsimple for a bunch of other tools. It is easier to 
  read, maintain, nice error messages, help screen, etc.

Thanks, I will switch to joptsimple.


 On March 11, 2015, 11:12 p.m., Gwen Shapira wrote:
  system_test/broker_upgrade/bin/kafka-run-class.sh, lines 152-156
  https://reviews.apache.org/r/30809/diff/3/?file=889856#file889856line152
 
  Why did we decide to duplicate this entire file?

The only difference is that it takes an additional argument which contains the 
directory from which the kafka jars should be pulled.
Would you recommend adding it to the original script as an optional argument?


- Abhishek


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30809/#review76157
---






[jira] [Commented] (KAFKA-1888) Add a rolling upgrade system test

2015-03-10 Thread Abhishek Nigam (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14355921#comment-14355921
 ] 

Abhishek Nigam commented on KAFKA-1888:
---

Hi Gwen,
I am not sure why the link is not showing up. Here you go:
https://reviews.apache.org/r/30809/

 Add a rolling upgrade system test
 ---

 Key: KAFKA-1888
 URL: https://issues.apache.org/jira/browse/KAFKA-1888
 Project: Kafka
  Issue Type: Improvement
  Components: system tests
Reporter: Gwen Shapira
Assignee: Abhishek Nigam
 Fix For: 0.9.0


 To help test upgrades and compatibility between versions, it will be cool to 
 add a rolling-upgrade test to system tests:
 Given two versions (just a path to the jars?), check that you can do a
 rolling upgrade of the brokers from one version to another (using clients 
 from the old version) without losing data.





Re: Review Request 30809: Patch for KAFKA-1888

2015-03-09 Thread Abhishek Nigam


 On Feb. 18, 2015, 12:06 a.m., Mayuresh Gharat wrote:
  core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 400
  https://reviews.apache.org/r/30809/diff/1/?file=859055#file859055line400
 
  The common format of commenting is :
  
  // this is a comment
  
  Personally I don't mind, but that's the standard I understood 
  from the reviews I got.

My IDE setup was a little messed up. I looked up the Kafka coding guidelines, 
and there does not seem to be anything about comments, so I made the 
indentation consistent using the IDE.


- Abhishek


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30809/#review72786
---






Re: Review Request 30809: Patch for KAFKA-1888

2015-03-09 Thread Abhishek Nigam

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30809/
---

(Updated March 9, 2015, 11:55 p.m.)


Review request for kafka.


Bugs: KAFKA-1888
https://issues.apache.org/jira/browse/KAFKA-1888


Repository: kafka


Description (updated)
---

Fixing the tests based on Mayuresh comments, code cleanup after proper IDE setup


Diffs (updated)
-

  build.gradle 0f0fe60a74542efa91a0e727146e896edcaa38af 
  core/src/main/scala/kafka/tools/ContinuousValidationTest.java PRE-CREATION 
  system_test/broker_upgrade/bin/kafka-run-class.sh PRE-CREATION 
  system_test/broker_upgrade/bin/test.sh PRE-CREATION 
  system_test/broker_upgrade/configs/server1.properties PRE-CREATION 
  system_test/broker_upgrade/configs/server2.properties PRE-CREATION 
  system_test/broker_upgrade/configs/zookeeper_source.properties PRE-CREATION 

Diff: https://reviews.apache.org/r/30809/diff/


Testing
---

Scripted it to run 20 times without any failures.
Command-line: broker-upgrade/bin/test.sh dir1 dir2


Thanks,

Abhishek Nigam



[jira] [Commented] (KAFKA-2003) Add upgrade tests

2015-03-09 Thread Abhishek Nigam (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353922#comment-14353922
 ] 

Abhishek Nigam commented on KAFKA-2003:
---

Hi Ashish/Gwen,
Can you review the KAFKA-1888 patch? I have just uploaded a cleaned-up patch 
addressing all the comments I have received so far.
I would prefer that we keep these patches distinct:

a) We could, for example, commit KAFKA-1888.
b) Once Ashish is ready with the new test, which covers all 8 combinations of 
producer, consumer and broker versions plus anything else he is planning, it 
can simply subsume KAFKA-1888.

This would let us use this patch locally until we have the working superset 
represented by this ticket.
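The count of 8 comes from choosing an old or new version independently for each of the three components (2 × 2 × 2). A small enumeration makes the matrix explicit. The version strings below are illustrative and are borrowed from the 0.8.2 to 0.8.3 example in KAFKA-2003. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Enumerates every (producer, consumer, broker) version triple for a list of versions.
final class UpgradeMatrix {
    static List<String> combinations(List<String> versions) {
        List<String> combos = new ArrayList<>();
        for (String producer : versions) {
            for (String consumer : versions) {
                for (String broker : versions) {
                    combos.add("producer=" + producer + " consumer=" + consumer + " broker=" + broker);
                }
            }
        }
        return combos;
    }
}
```

Usage: `UpgradeMatrix.combinations(Arrays.asList("0.8.2", "0.8.3"))` yields 8 entries, starting with `producer=0.8.2 consumer=0.8.2 broker=0.8.2`.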



Re: Review Request 30809: Patch for KAFKA-1888

2015-03-09 Thread Abhishek Nigam

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30809/#review73061
---



core/src/main/scala/kafka/tools/ContinuousValidationTest.java
https://reviews.apache.org/r/30809/#comment119244

flip() is needed to reset the position to the beginning of the byte buffer (it also sets the limit to the end of the data written).
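The need for `flip()` can be shown with a small codec. The event layout here (an 8-byte sequence number followed by an 8-byte timestamp) and the class name are assumptions for illustration, not the patch's actual format. `ByteBuffer.flip()` sets the limit to the current position and then resets the position to zero. Without it, the reads below would start at position 16 and throw `BufferUnderflowException`.

```java
import java.nio.ByteBuffer;

// Hypothetical event layout: 8-byte sequence number, then 8-byte produce timestamp.
final class EventCodec {
    static final int SIZE = 2 * Long.BYTES;

    static ByteBuffer encode(long seq, long timestampMs) {
        ByteBuffer buf = ByteBuffer.allocate(SIZE);
        buf.putLong(seq);
        buf.putLong(timestampMs);
        // position is now 16; flip() sets limit = 16 and position = 0, so a reader
        // sees exactly the bytes that were written.
        buf.flip();
        return buf;
    }

    // Returns {sequence, timestamp}; advances the buffer's position by 16 bytes.
    static long[] decode(ByteBuffer buf) {
        return new long[] {buf.getLong(), buf.getLong()};
    }
}
```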


- Abhishek Nigam






Re: Review Request 30809: Patch for KAFKA-1888

2015-03-09 Thread Abhishek Nigam


 On Feb. 18, 2015, 12:06 a.m., Mayuresh Gharat wrote:
  core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 168
  https://reviews.apache.org/r/30809/diff/1/?file=859055#file859055line168
 
  same here can we use isInterrupted()?
  
  
  http://docs.oracle.com/javase/tutorial/essential/concurrency/interrupt.html

I want to check if the current thread is interrupted. The link you sent out is 
useful if I wanted to query whether another thread was interrupted.
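For reference, both calls can check the current thread. The static `Thread.interrupted()` tests the current thread and clears its interrupted status. The instance method `isInterrupted()` can be called on any thread, including `Thread.currentThread()`, and tests the status without clearing it. The demo below exercises both (the class name is illustrative).

```java
// Demonstrates the difference between Thread.interrupted() and isInterrupted()
// on the current thread.
final class InterruptDemo {
    // Returns the four observations in order; leaves the interrupt flag cleared.
    static boolean[] observe() {
        Thread.currentThread().interrupt();                 // set our own interrupt flag
        boolean a = Thread.currentThread().isInterrupted(); // true, flag left set
        boolean b = Thread.currentThread().isInterrupted(); // still true
        boolean c = Thread.interrupted();                   // true, and clears the flag
        boolean d = Thread.interrupted();                   // false, flag was cleared
        return new boolean[] {a, b, c, d};
    }
}
```

A loop that exits on interrupt but should leave the status visible to its caller would typically test `Thread.currentThread().isInterrupted()`.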


 On Feb. 18, 2015, 12:06 a.m., Mayuresh Gharat wrote:
  core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 280
  https://reviews.apache.org/r/30809/diff/1/?file=859055#file859055line280
 
  Can we put this in a separate method like init()?
  The constructor can be used mainly for assignment. What do you think?

Moved the launching of the threads to an init method.
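The constructor/init() split can be sketched as follows. The class and thread names are hypothetical. The constructor only assigns fields, and init() starts the worker threads. Suppose instead that a constructor started threads whose tasks referred to the object under construction. Those threads could then observe it before construction finished, and keeping start-up in init() avoids that.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical runner: the constructor only assigns; init() launches the workers.
final class ValidationRunner {
    private final List<Thread> workers = new ArrayList<>();
    private final Runnable producerTask;
    private final Runnable consumerTask;

    ValidationRunner(Runnable producerTask, Runnable consumerTask) {
        this.producerTask = producerTask;
        this.consumerTask = consumerTask;
    }

    // Intended to be called once, after construction has completed.
    void init() {
        workers.add(new Thread(producerTask, "cvt-producer"));
        workers.add(new Thread(consumerTask, "cvt-consumer"));
        for (Thread t : workers) {
            t.start();
        }
    }

    void awaitTermination() throws InterruptedException {
        for (Thread t : workers) {
            t.join();
        }
    }
}
```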


 On Feb. 18, 2015, 12:06 a.m., Mayuresh Gharat wrote:
  core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 298
  https://reviews.apache.org/r/30809/diff/1/?file=859055#file859055line298
 
  When is the blockingCallInterrupted set to true?

Got rid of this.


 On Feb. 18, 2015, 12:06 a.m., Mayuresh Gharat wrote:
  core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 324
  https://reviews.apache.org/r/30809/diff/1/?file=859055#file859055line324
 
  formatting spaces.
  
  Will there be a case where :
  (evt.sequenceId < lastEventSeenSequenceId.get() && 
  evt.eventProducedTimestamp > lastEventSeenTimeProduced.get())

This will happen when the sequence numbers wrap around.
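The wraparound case can be expressed as a small classifier. This is a hypothetical helper following the condition discussed above, not the patch's code. A smaller sequence number paired with a newer timestamp is read as the counter having wrapped. Anything else that does not move forward is treated as out of order.

```java
// Hypothetical classification of an incoming event relative to the last one seen.
final class SequenceOrder {
    enum Kind { IN_ORDER, WRAPPED_AROUND, OUT_OF_ORDER }

    static Kind classify(long seq, long timestamp, long lastSeq, long lastTimestamp) {
        if (seq > lastSeq) {
            return Kind.IN_ORDER;
        }
        if (seq < lastSeq && timestamp > lastTimestamp) {
            return Kind.WRAPPED_AROUND; // e.g. Long.MAX_VALUE + 1 overflows to Long.MIN_VALUE
        }
        return Kind.OUT_OF_ORDER;       // older timestamp, or a duplicate sequence number
    }
}
```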


 On Feb. 18, 2015, 12:06 a.m., Mayuresh Gharat wrote:
  core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 61
  https://reviews.apache.org/r/30809/diff/1/?file=859055#file859055line61
 
  Any reason for not making this final?
  
  static variables should come before Instance variables. 
  
  It's a common standard to prefix instance variables with _, like: 
  _groupId.

Fixed this.


- Abhishek


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30809/#review72786
---






RE: Review Request 31385: Patch for KAFKA-1978

2015-02-25 Thread Abhishek Nigam
2015-02-06 00:51:30,975 - INFO - 
==
2015-02-06 00:51:30,975 - INFO - Exception while running test list index out of range
2015-02-06 00:51:30,975 - INFO - 
==
Traceback (most recent call last):
  File "/mnt/u001/kafka_replication_system_test/system_test/replication_testsuite/replica_basic_test.py", line 434, in runTest
    kafka_system_test_utils.validate_simple_consumer_data_matched_across_replicas(self.systemTestEnv, self.testcaseEnv)
  File "/mnt/u001/kafka_replication_system_test/system_test/utils/kafka_system_test_utils.py", line 2223, in validate_simple_consumer_data_matched_across_replicas
    replicaIdxMsgIdList[replicaIdx - 1][topicPartition] = consumerMsgIdList
IndexError: list index out of range

-Abhishek

From: Guozhang Wang [nore...@reviews.apache.org] on behalf of Guozhang Wang [wangg...@gmail.com]
Sent: Tuesday, February 24, 2015 6:36 PM
To: Abhishek Nigam; Guozhang Wang; kafka
Subject: Re: Review Request 31385: Patch for KAFKA-1978

This is an automatically generated e-mail. To reply, visit: 
https://reviews.apache.org/r/31385/


From the diff file itself it's a bit hard to understand the issue and the 
solution as well. Could you elaborate on an out of bounds exception due to 
mis-configuration on the ticket?


- Guozhang Wang


On February 24th, 2015, 11:31 p.m. UTC, Abhishek Nigam wrote:

Review request for kafka.
By Abhishek Nigam.

Updated Feb. 24, 2015, 11:31 p.m.

Bugs: KAFKA-1978 https://issues.apache.org/jira/browse/KAFKA-1978
Repository: kafka
Description

Fixing configuration for testcase 0131


Diffs

  * system_test/replication_testsuite/testcase_0131/testcase_0131_properties.json (0324b6f327cb75389f9f851fa3ca744d22a5d915)

View Diff: https://reviews.apache.org/r/31385/diff/



RE: Review Request 31385: Patch for KAFKA-1978

2015-02-25 Thread Abhishek Nigam
I added debug information, and the reason is that the size of the array
replicaIdxMsgIdList is 2, but the number of files being validated is 3.


2015-02-25 00:53:17,882 - INFO - array size: 2 (kafka_system_test_utils)
2015-02-25 00:53:17,882 - INFO - replicaFactor: 2 (kafka_system_test_utils)
2015-02-25 00:53:17,882 - INFO - replicaIdx: 3 (kafka_system_test_utils)

The code indexes into the list with replicaIdx - 1 = 2, but since the list only
has 2 elements (valid indices 0 and 1), we get an IndexError (list index out of range).
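The same off-by-one arithmetic can be sketched in Java for illustration. This is a hypothetical re-creation, not the Python system-test code: the names mirror replicaIdxMsgIdList, replicaIdx, topicPartition and consumerMsgIdList, and the up-front range check is an assumed guard that reports the misconfiguration instead of failing with a bare index error.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical Java re-creation of the failing assignment
//   replicaIdxMsgIdList[replicaIdx - 1][topicPartition] = consumerMsgIdList
class ReplicaIndexSketch {
    // One map (topicPartition -> message ids) per replica, sized by the replica factor.
    static List<Map<String, List<Integer>>> newReplicaIdxMsgIdList(int replicaFactor) {
        List<Map<String, List<Integer>>> replicaIdxMsgIdList = new ArrayList<>();
        for (int i = 0; i < replicaFactor; i++) {
            replicaIdxMsgIdList.add(new HashMap<>());
        }
        return replicaIdxMsgIdList;
    }

    // replicaIdx is 1-based, so valid values are 1..replicaFactor. With a replica
    // factor of 2 and replicaIdx 3, the raw list access would use index 2 and fail.
    static void record(List<Map<String, List<Integer>>> replicaIdxMsgIdList,
                       int replicaIdx, String topicPartition, List<Integer> consumerMsgIdList) {
        if (replicaIdx < 1 || replicaIdx > replicaIdxMsgIdList.size()) {
            throw new IllegalArgumentException("replicaIdx " + replicaIdx
                    + " is outside 1.." + replicaIdxMsgIdList.size()
                    + "; the testcase configuration likely has more replica log files than the replica factor");
        }
        replicaIdxMsgIdList.get(replicaIdx - 1).put(topicPartition, consumerMsgIdList);
    }
}
```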

-Abhishek

From: Abhishek Nigam
Sent: Wednesday, February 25, 2015 9:43 AM
To: Guozhang Wang; kafka
Subject: RE: Review Request 31385: Patch for KAFKA-1978




[jira] [Updated] (KAFKA-1978) Replication test_0131 system test has been failing.

2015-02-24 Thread Abhishek Nigam (JIRA)

 [ https://issues.apache.org/jira/browse/KAFKA-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Nigam updated KAFKA-1978:
--
Attachment: KAFKA-1978.patch

 Replication test_0131 system test has been failing.
 ---

 Key: KAFKA-1978
 URL: https://issues.apache.org/jira/browse/KAFKA-1978
 Project: Kafka
  Issue Type: Bug
  Components: system tests
Reporter: Abhishek Nigam
Assignee: Abhishek Nigam
 Attachments: KAFKA-1978.patch


 Issue is an out of bounds exception due to mis-configuration of the test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-1978) Replication test_0131 system test has been failing.

2015-02-24 Thread Abhishek Nigam (JIRA)

 [ https://issues.apache.org/jira/browse/KAFKA-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Nigam updated KAFKA-1978:
--
Status: Patch Available  (was: Open)

 Replication test_0131 system test has been failing.
 ---

 Key: KAFKA-1978
 URL: https://issues.apache.org/jira/browse/KAFKA-1978
 Project: Kafka
  Issue Type: Bug
  Components: system tests
Reporter: Abhishek Nigam
Assignee: Abhishek Nigam
 Attachments: KAFKA-1978.patch


 Issue is an out of bounds exception due to mis-configuration of the test.





Review Request 31385: Patch for KAFKA-1978

2015-02-24 Thread Abhishek Nigam

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31385/
---

Review request for kafka.


Bugs: KAFKA-1978
https://issues.apache.org/jira/browse/KAFKA-1978


Repository: kafka


Description
---

Fixing configuration for testcase 0131


Diffs
-

  system_test/replication_testsuite/testcase_0131/testcase_0131_properties.json 
0324b6f327cb75389f9f851fa3ca744d22a5d915 

Diff: https://reviews.apache.org/r/31385/diff/


Testing
---


Thanks,

Abhishek Nigam



[jira] [Commented] (KAFKA-1978) Replication test_0131 system test has been failing.

2015-02-24 Thread Abhishek Nigam (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335663#comment-14335663 ]

Abhishek Nigam commented on KAFKA-1978:
---

Created reviewboard https://reviews.apache.org/r/31385/diff/
 against branch origin/trunk

 Replication test_0131 system test has been failing.
 ---

 Key: KAFKA-1978
 URL: https://issues.apache.org/jira/browse/KAFKA-1978
 Project: Kafka
  Issue Type: Bug
  Components: system tests
Reporter: Abhishek Nigam
Assignee: Abhishek Nigam
 Attachments: KAFKA-1978.patch


 Issue is an out of bounds exception due to mis-configuration of the test.





[jira] [Created] (KAFKA-1978) Replication test_0131 system test has been failing.

2015-02-23 Thread Abhishek Nigam (JIRA)
Abhishek Nigam created KAFKA-1978:
-

 Summary: Replication test_0131 system test has been failing.
 Key: KAFKA-1978
 URL: https://issues.apache.org/jira/browse/KAFKA-1978
 Project: Kafka
  Issue Type: Bug
  Components: system tests
Reporter: Abhishek Nigam
Assignee: Abhishek Nigam


Issue is an out of bounds exception due to mis-configuration of the test.





[jira] [Updated] (KAFKA-1978) Replication test_0131 system test has been failing.

2015-02-23 Thread Abhishek Nigam (JIRA)

 [ https://issues.apache.org/jira/browse/KAFKA-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Nigam updated KAFKA-1978:
--
Reviewer: Guozhang Wang
  Status: Patch Available  (was: Open)

 Replication test_0131 system test has been failing.
 ---

 Key: KAFKA-1978
 URL: https://issues.apache.org/jira/browse/KAFKA-1978
 Project: Kafka
  Issue Type: Bug
  Components: system tests
Reporter: Abhishek Nigam
Assignee: Abhishek Nigam

 Issue is an out of bounds exception due to mis-configuration of the test.





[jira] [Updated] (KAFKA-1978) Replication test_0131 system test has been failing.

2015-02-23 Thread Abhishek Nigam (JIRA)

 [ https://issues.apache.org/jira/browse/KAFKA-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Nigam updated KAFKA-1978:
--
Status: Open  (was: Patch Available)

 Replication test_0131 system test has been failing.
 ---

 Key: KAFKA-1978
 URL: https://issues.apache.org/jira/browse/KAFKA-1978
 Project: Kafka
  Issue Type: Bug
  Components: system tests
Reporter: Abhishek Nigam
Assignee: Abhishek Nigam

 Issue is an out of bounds exception due to mis-configuration of the test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 30809: Patch for KAFKA-1888

2015-02-18 Thread Abhishek Nigam


 On Feb. 18, 2015, 12:06 a.m., Mayuresh Gharat wrote:
  core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 207
  https://reviews.apache.org/r/30809/diff/1/?file=859055#file859055line207
 
  This might end up in an infinite loop if something goes wrong with
  the cluster, right?
  Should we have a maximum number of retries?
  What do you think?

This will not be an issue: for timed runs, we interrupt the thread after a
fixed time anyway. This is the mode used in the upgrade test.
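A minimal sketch combining both options discussed: a retry loop that stops when the operation succeeds, when the thread is interrupted (as in the timed runs), or, if maxRetries is positive, after a bounded number of attempts (the reviewer's suggestion). The class, method and parameter names are hypothetical; this is not the code from the patch.

```java
import java.util.concurrent.Callable;

// Hypothetical retry helper: ends on success, on interruption, or (when
// maxRetries > 0) after maxRetries failed attempts.
class RetrySketch {
    static <T> T retry(Callable<T> op, int maxRetries, long backoffMs) throws InterruptedException {
        int attempt = 0;
        while (true) {
            // A timed run interrupts the worker thread; honour that before each attempt.
            if (Thread.currentThread().isInterrupted()) {
                throw new InterruptedException("interrupted after " + attempt + " attempts");
            }
            try {
                return op.call();
            } catch (InterruptedException e) {
                throw e;
            } catch (Exception e) {
                attempt++;
                // maxRetries <= 0 means "retry until interrupted".
                if (maxRetries > 0 && attempt >= maxRetries) {
                    throw new IllegalStateException("giving up after " + attempt + " attempts", e);
                }
                // Throws InterruptedException if the thread is interrupted while waiting.
                Thread.sleep(backoffMs);
            }
        }
    }
}
```

With maxRetries set to 0, the loop relies on interruption alone, which matches the timed-run mode described in the reply; a positive value adds the bound the reviewer asked about for untimed runs.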


 On Feb. 18, 2015, 12:06 a.m., Mayuresh Gharat wrote:
  core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 424
  https://reviews.apache.org/r/30809/diff/1/?file=859055#file859055line424
 
  Are you assuming that the first argument will be some key?

If you take a look at the script, I am expecting alternating key/value parameters like
-timedRun  -timeToSpawn 
The key is essentially the parameter name.
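A hypothetical sketch of parsing alternating parameters of this shape: each even position holds a parameter name such as -timedRun or -timeToSpawn, and the following position holds its value. The parser and the example values in the check below are assumptions for illustration, not the tool's actual argument handling.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical parser for alternating "-name value" arguments. The key stored
// in the map is the parameter name without its leading dash.
class ArgsSketch {
    static Map<String, String> parse(String[] args) {
        if (args.length % 2 != 0) {
            throw new IllegalArgumentException("expected name/value pairs, got " + args.length + " arguments");
        }
        Map<String, String> params = new HashMap<>();
        for (int i = 0; i < args.length; i += 2) {
            String key = args[i];
            if (!key.startsWith("-")) {
                throw new IllegalArgumentException("expected a parameter name at position " + i + ": " + key);
            }
            params.put(key.substring(1), args[i + 1]);
        }
        return params;
    }
}
```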


 On Feb. 18, 2015, 12:06 a.m., Mayuresh Gharat wrote:
  core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 474
  https://reviews.apache.org/r/30809/diff/1/?file=859055#file859055line474
 
  What do you mean by "rebuild state later"?

What I meant was that between two runs of the rolling upgrade test we do not
reuse any state from ZooKeeper or the brokers, so I do not need to worry about
a clean shutdown.


 On Feb. 18, 2015, 12:06 a.m., Mayuresh Gharat wrote:
  core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 77
  https://reviews.apache.org/r/30809/diff/1/?file=859055#file859055line77
 
  Why do we need a flip?

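The flip is needed after writing to the ByteBuffer: it sets the limit to the end
of the written data and resets the position (the read pointer) to the beginning
of the buffer. Otherwise reading fails with a BufferUnderflowException.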
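A small illustration with java.nio.ByteBuffer, assuming (hypothetically) that a payload carries a sequence number and a timestamp as two longs. After the two putLong calls the position equals the limit, so flip() is required before the same buffer can be read. The class and method names are illustrative only.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

// Hypothetical encode/decode of a (sequence, timestamp) payload, showing why
// flip() is needed between writing and reading the same buffer.
class FlipSketch {
    static ByteBuffer encode(long sequence, long timestampMs) {
        ByteBuffer buf = ByteBuffer.allocate(2 * Long.BYTES);
        buf.putLong(sequence);    // position: 0 -> 8
        buf.putLong(timestampMs); // position: 8 -> 16, which equals the limit
        buf.flip();               // limit = 16, position = 0: ready to read
        return buf;
    }

    static long[] decode(ByteBuffer buf) {
        return new long[] {buf.getLong(), buf.getLong()};
    }

    // Without flip(), reading starts at position 16 where no bytes remain.
    static boolean readWithoutFlipUnderflows() {
        ByteBuffer buf = ByteBuffer.allocate(2 * Long.BYTES);
        buf.putLong(1L);
        buf.putLong(2L);
        try {
            buf.getLong();
            return false;
        } catch (BufferUnderflowException e) {
            return true;
        }
    }
}
```

The underflow shows up here because the buffer is filled to capacity. If it had spare capacity, reading without flip() would instead silently return the bytes after the written data (zeros for a freshly allocated buffer), which is arguably worse.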


- Abhishek


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30809/#review72786
---


On Feb. 18, 2015, 1:59 a.m., Abhishek Nigam wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/30809/
 ---
 
 (Updated Feb. 18, 2015, 1:59 a.m.)
 
 
 Review request for kafka.
 
 
 Bugs: KAFKA-1888
 https://issues.apache.org/jira/browse/KAFKA-1888
 
 
 Repository: kafka
 
 
 Description
 ---
 
 patch for KAFKA-1888
 
 
 Diffs
 -
 
   build.gradle 0f0fe60a74542efa91a0e727146e896edcaa38af 
   core/src/main/scala/kafka/tools/ContinuousValidationTest.java PRE-CREATION 
   system_test/broker_upgrade/bin/kafka-run-class.sh PRE-CREATION 
   system_test/broker_upgrade/bin/test.sh PRE-CREATION 
   system_test/broker_upgrade/configs/server1.properties PRE-CREATION 
   system_test/broker_upgrade/configs/server2.properties PRE-CREATION 
   system_test/broker_upgrade/configs/zookeeper_source.properties PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/30809/diff/
 
 
 Testing
 ---
 
 Scripted it to run 20 times without any failures.
 Command-line: broker-upgrade/bin/test.sh dir1 dir2
 
 
 Thanks,
 
 Abhishek Nigam
 




Re: Review Request 30809: Patch for KAFKA-1888

2015-02-17 Thread Abhishek Nigam

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30809/
---

(Updated Feb. 18, 2015, 1:59 a.m.)


Review request for kafka.


Bugs: KAFKA-1888
https://issues.apache.org/jira/browse/KAFKA-1888


Repository: kafka


Description (updated)
---

patch for KAFKA-1888


Diffs (updated)
-

  build.gradle 0f0fe60a74542efa91a0e727146e896edcaa38af 
  core/src/main/scala/kafka/tools/ContinuousValidationTest.java PRE-CREATION 
  system_test/broker_upgrade/bin/kafka-run-class.sh PRE-CREATION 
  system_test/broker_upgrade/bin/test.sh PRE-CREATION 
  system_test/broker_upgrade/configs/server1.properties PRE-CREATION 
  system_test/broker_upgrade/configs/server2.properties PRE-CREATION 
  system_test/broker_upgrade/configs/zookeeper_source.properties PRE-CREATION 

Diff: https://reviews.apache.org/r/30809/diff/


Testing
---

Scripted it to run 20 times without any failures.
Command-line: broker-upgrade/bin/test.sh dir1 dir2


Thanks,

Abhishek Nigam



[jira] [Assigned] (KAFKA-1778) Create new re-elect controller admin function

2015-02-17 Thread Abhishek Nigam (JIRA)

 [ https://issues.apache.org/jira/browse/KAFKA-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Nigam reassigned KAFKA-1778:
-

Assignee: Abhishek Nigam

 Create new re-elect controller admin function
 -

 Key: KAFKA-1778
 URL: https://issues.apache.org/jira/browse/KAFKA-1778
 Project: Kafka
  Issue Type: Sub-task
Reporter: Joe Stein
Assignee: Abhishek Nigam
 Fix For: 0.8.3


 kafka --controller --elect





Re: Review Request 30809: Patch for KAFKA-1888

2015-02-09 Thread Abhishek Nigam

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30809/
---

(Updated Feb. 9, 2015, 11:53 p.m.)


Review request for kafka.


Bugs: KAFKA-1888
https://issues.apache.org/jira/browse/KAFKA-1888


Repository: kafka


Description (updated)
---

Essentially this test does the following:
a) Starts a Java process with 3 threads:
   Producer - produces continuously
   Consumer - consumes from the latest offset
   Bootstrap consumer - started after a pause to bootstrap from the beginning
   
   It uses sequentially increasing numbers and timestamps to verify that
   messages are not received out of order, validating in real time.
   
b) A script which wraps this and takes two directories containing the
   version-specific Kafka jars:
   kafka_2.10-0.8.3-SNAPSHOT-test.jar
   kafka_2.10-0.8.3-SNAPSHOT.jar

   The first argument is the directory containing the older version of the jars.
   The second argument is the directory containing the newer version of the jars.

   Directories were chosen rather than individual jars because each version
   consists of these two jars.
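The real-time check in a) can be sketched as a per-consumer validator. This is a hypothetical illustration, assuming each message carries the producer's sequence number and timestamp; the class and method names are not from the patch. The consumer reading from the latest offset takes its first message as the baseline, while the bootstrap consumer knows the first sequence number in advance.

```java
// Hypothetical per-consumer validator; assumes a single producer thread that
// assigns consecutive sequence numbers and non-decreasing timestamps.
class SequenceValidatorSketch {
    private boolean _started;
    private long _expectedSequence;
    private long _lastTimestampMs = Long.MIN_VALUE;

    private SequenceValidatorSketch() {
    }

    // Bootstrap consumer: reads from the beginning, so the first sequence is known.
    static SequenceValidatorSketch fromBeginning(long firstSequence) {
        SequenceValidatorSketch v = new SequenceValidatorSketch();
        v._started = true;
        v._expectedSequence = firstSequence;
        return v;
    }

    // Consumer from latest: the first message received sets the baseline.
    static SequenceValidatorSketch fromLatest() {
        return new SequenceValidatorSketch();
    }

    // Returns null if the message is in order, otherwise a description of the problem.
    String validate(long sequence, long timestampMs) {
        String problem = null;
        if (_started && sequence != _expectedSequence) {
            problem = "expected sequence " + _expectedSequence + " but received " + sequence;
        } else if (timestampMs < _lastTimestampMs) {
            problem = "timestamp " + timestampMs + " is earlier than previous " + _lastTimestampMs;
        }
        // Resynchronize on the received message so that a single gap is reported once.
        _started = true;
        _expectedSequence = sequence + 1;
        _lastTimestampMs = Math.max(_lastTimestampMs, timestampMs);
        return problem;
    }
}
```

Resynchronizing after a reported gap is a design choice for this sketch: it keeps one lost or reordered message from flagging every message after it.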


Diffs
-

  build.gradle c3e6bb839ad65c512c9db4695d2bb49b82c80da5 
  core/src/main/scala/kafka/tools/ContinuousValidationTest.java PRE-CREATION 
  system_test/broker_upgrade/bin/kafka-run-class.sh PRE-CREATION 
  system_test/broker_upgrade/bin/test.sh PRE-CREATION 
  system_test/broker_upgrade/configs/server1.properties PRE-CREATION 
  system_test/broker_upgrade/configs/server2.properties PRE-CREATION 
  system_test/broker_upgrade/configs/zookeeper_source.properties PRE-CREATION 

Diff: https://reviews.apache.org/r/30809/diff/


Testing (updated)
---

Scripted it to run 20 times without any failures.
Command-line: broker-upgrade/bin/test.sh dir1 dir2


Thanks,

Abhishek Nigam



Review Request 30809: Patch for KAFKA-1888

2015-02-09 Thread Abhishek Nigam

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30809/
---

Review request for kafka.


Bugs: KAFKA-1888
https://issues.apache.org/jira/browse/KAFKA-1888


Repository: kafka


Description
---

Cleaning up the scripts; I had forgotten to add the build file change that pulls in the Guava library

Fixing build.gradle


Diffs
-

  build.gradle c3e6bb839ad65c512c9db4695d2bb49b82c80da5 
  core/src/main/scala/kafka/tools/ContinuousValidationTest.java PRE-CREATION 
  system_test/broker_upgrade/bin/kafka-run-class.sh PRE-CREATION 
  system_test/broker_upgrade/bin/test.sh PRE-CREATION 
  system_test/broker_upgrade/configs/server1.properties PRE-CREATION 
  system_test/broker_upgrade/configs/server2.properties PRE-CREATION 
  system_test/broker_upgrade/configs/zookeeper_source.properties PRE-CREATION 

Diff: https://reviews.apache.org/r/30809/diff/


Testing
---


Thanks,

Abhishek Nigam



[jira] [Assigned] (KAFKA-1888) Add a rolling upgrade system test

2015-01-27 Thread Abhishek Nigam (JIRA)

 [ https://issues.apache.org/jira/browse/KAFKA-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Nigam reassigned KAFKA-1888:
-

Assignee: Abhishek Nigam

 Add a rolling upgrade system test
 ---

 Key: KAFKA-1888
 URL: https://issues.apache.org/jira/browse/KAFKA-1888
 Project: Kafka
  Issue Type: Improvement
  Components: system tests
Reporter: Gwen Shapira
Assignee: Abhishek Nigam
 Fix For: 0.9.0


 To help test upgrades and compatibility between versions, it will be cool to 
 add a rolling-upgrade test to system tests:
 Given two versions (just a path to the jars?), check that you can do a
 rolling upgrade of the brokers from one version to another (using clients 
 from the old version) without losing data.





[jira] [Commented] (KAFKA-1888) Add a rolling upgrade system test

2015-01-22 Thread Abhishek Nigam (JIRA)

[ https://issues.apache.org/jira/browse/KAFKA-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288206#comment-14288206 ]

Abhishek Nigam commented on KAFKA-1888:
---

Gwen,
If you are not actively working on it, can I pick it up?

 Add a rolling upgrade system test
 ---

 Key: KAFKA-1888
 URL: https://issues.apache.org/jira/browse/KAFKA-1888
 Project: Kafka
  Issue Type: Improvement
  Components: system tests
Reporter: Gwen Shapira
Assignee: Gwen Shapira
 Fix For: 0.9.0


 To help test upgrades and compatibility between versions, it will be cool to 
 add a rolling-upgrade test to system tests:
 Given two versions (just a path to the jars?), check that you can do a
 rolling upgrade of the brokers from one version to another (using clients 
 from the old version) without losing data.


