[DISCUSS] KIP-39 Pinning controller to a broker
Hi, Can we please discuss this KIP? The background for this is that it allows us to pin the controller to a broker. This is useful in a couple of scenarios: a) If we want to do a rolling bounce, we can reduce the number of controller moves down to 1. b) We can pick a designated broker, reduce the number of partitions on it through admin partition reassignment, and designate it as the controller. c) We can dynamically move the controller if we see any problems on the broker on which it is running. Here is the wiki page: https://cwiki.apache.org/confluence/display/KAFKA/KIP-39+Pinning+controller+to+broker -Abhishek
Re: [DISCUSS] KIP-39 Pinning controller to a broker
Hi Jay/Neha, I just subscribed to the mailing list, so I read your response but did not receive your email; adding the context into this email thread. " Agree with Jay on staying away from pinning roles to brokers. This is actually harder to operate and monitor. Regarding the problems you mentioned: 1. Reducing the controller moves during a rolling bounce is useful, but really something that should be handled by the tooling. The root cause is that currently the controller move is expensive. I think we'd be better off investing time and effort in thinning out the controller. Just moving to the batch write APIs in ZooKeeper will make a huge difference. 2. I'm not sure I understood the motivation behind moving partitions out of the controller broker. That seems like a proposal for a solution, but can you describe the problems you saw that affected controller functionality? Regarding the location of the controller, it seems there are 2 things you are suggesting: 1. Optimizing the strategy of picking a broker as the controller (e.g. least loaded node). 2. Moving the controller if a broker soft fails. I don't think #1 is worth the effort involved. The better way of addressing it is to make the controller thinner and faster. #2 is interesting, since the problem is that while a broker is failing, all state changes fail or are queued up, which globally impacts the cluster. There are 2 alternatives - have a tool that allows you to move the controller, or just kill the broker so the controller moves. I prefer the latter since it is simple, and also because a misbehaving broker is better off shut down anyway. Having said that, it will be helpful to know details of the problems you saw while operating the controller. I think understanding those will help guide the solution better. 
On Tue, Oct 20, 2015 at 12:49 PM, Jay Kreps <j...@confluent.io> wrote: > This seems like a step backwards--we really don't want people to manually > manage the location of the controller and try to manually balance > partitions off that broker. > > I think it might make sense to consider directly fixing the things you > actually want to fix: > 1. Too many controller moves--we could either just make this cheaper or > make the controller location more deterministic, e.g. having the election > prefer the node with the smallest node id so there were fewer failovers in > rolling bounces. > 2. You seem to think having the controller on a normal node is a problem. > Can you elaborate on the negative consequences you've observed? Let's > focus on fixing those. > > In general we've worked very hard to avoid having a bunch of dedicated > roles for different nodes and I would be very very loath to see us move > away from that philosophy. I have a fair amount of experience with both > homogeneous systems that have a single role and also systems with many > differentiated roles, and I really think that the differentiated approach > causes more problems than it solves for most deployments due to the added > complexity. > > I think we could also fix up this KIP a bit. For example, it says there are > no public interfaces involved, but surely there are new admin commands to > control the location? There are also some minor things like listing it as > released in 0.8.3 that seem wrong. > > -Jay > > On Tue, Oct 20, 2015 at 12:18 PM, Abhishek Nigam < > ani...@linkedin.com.invalid> wrote: > > > Hi, > > Can we please discuss this KIP. The background for this is that it allows > > us to pin the controller to a broker. This is useful in a couple of > scenarios: > > a) If we want to do a rolling bounce we can reduce the number of > controller > > moves down to 1. 
> > b) Again pick a designated broker and reduce the number of partitions on > it > > through admin reassign partitions and designate it as a controller. > > c) Dynamically move controller if we see any problems on the broker which > > it is running. > > > > Here is the wiki page > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-39+Pinning+controller+to+broker > > > > -Abhishek > > > " I think based on the feedback we can limit the discussion to the rolling upgrade scenario and how best to address it. The only scenario I have heard of where we wanted to move the controller off a broker was due to a bug (since fixed) where we ended up with multiple controllers. I will update the KIP on how we can optimize the placement of the controller (pinning it to a preferred broker id, potentially config-enabled) if that sounds reasonable. Many of the ideas of the original KIP can still apply in the limited scope. -Abhishek
[jira] [Commented] (KAFKA-1599) Change preferred replica election admin command to handle large clusters
[ https://issues.apache.org/jira/browse/KAFKA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901279#comment-14901279 ] Abhishek Nigam commented on KAFKA-1599: --- Copying this content verbatim from a newly created ticket (KAFKA-2552), which is a duplicate of this one and details approach 4). I think it is unavoidable to do chaining, because even with a more compact representation we might still run into this issue, perhaps with a larger json. "Essentially, a generic approach to this, which would require both the read and write side to change, would be as follows: We designate a zookeeper path as scratch, e.g. /admin/scratch.
Write side: When writing json to zookeeper we will chunk it into 1 MB units and store all but the first chunk in separate zookeeper nodes under the scratch path. The first chunk will live in the original location as we have it today, e.g. /admin/reassign_partitions. Each chunk will have the following format:
- "json incompatible header" (something other than "{")
- length of the zookeeper path to the next json chunk (0 means that this is the last chunk)
- zookeeper path of the next json chunk
- length of the chunk of json data
- chunk of json data
We will write this conceptual linked list back to front.
Read side: The zookeeper watch will be fired as before. While reading, if we detect there are more chunks, we will do synced reads from zookeeper." > Change preferred replica election admin command to handle large clusters > > > Key: KAFKA-1599 > URL: https://issues.apache.org/jira/browse/KAFKA-1599 > Project: Kafka > Issue Type: Improvement > Affects Versions: 0.8.2.0 > Reporter: Todd Palino > Assignee: Abhishek Nigam > Labels: newbie++ > > We ran into a problem with a cluster that has 70k partitions where we could > not trigger a preferred replica election for all topics and partitions using > the admin tool. 
Upon investigation, it was determined that this was because > the JSON object that was being written to the admin znode to tell the > controller to start the election was 1.8 MB in size. As the default Zookeeper > data size limit is 1MB, and it is non-trivial to change, we should come up > with a better way to represent the list of topics and partitions for this > admin command. > I have several thoughts on this so far: > 1) Trigger the command for all topics and partitions with a JSON object that > does not include an explicit list of them (i.e. a flag that says "all > partitions") > 2) Use a more compact JSON representation. Currently, the JSON contains a > 'partitions' key which holds a list of dictionaries that each have a 'topic' > and 'partition' key, and there must be one list item for each partition. This > results in a lot of repetition of key names that is unneeded. Changing this > to a format like this would be much more compact: > {'topics': {'topicName1': [0, 1, 2, 3], 'topicName2': [0,1]}, 'version': 1} > 3) Use a representation other than JSON. Strings are inefficient. A binary > format would be the most compact. This does put a greater burden on tools and > scripts that do not use the inbuilt libraries, but it is not too high. > 4) Use a representation that involves multiple znodes. A structured tree in > the admin command would probably provide the most complete solution. However, > we would need to make sure to not exceed the data size limit with a wide tree > (the list of children for any single znode cannot exceed the ZK data size of > 1MB) > Obviously, there could be a combination of #1 with a change in the > representation, which would likely be appropriate as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
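The savings from option #2 are easy to see concretely. Below is a minimal sketch, with made-up topic names and counts, comparing the size of the current per-partition JSON against the proposed compact form:

```python
import json

# Hypothetical cluster: 100 topics with 8 partitions each.
topics = {f"topic-{i}": list(range(8)) for i in range(100)}

# Current verbose format: one {'topic': ..., 'partition': ...} dict per partition.
verbose = {
    "version": 1,
    "partitions": [
        {"topic": t, "partition": p} for t, parts in topics.items() for p in parts
    ],
}

# Proposed compact format (option 2): one key per topic, value is the partition list.
compact = {"version": 1, "topics": topics}

v_size = len(json.dumps(verbose))
c_size = len(json.dumps(compact))
print(f"verbose: {v_size} bytes, compact: {c_size} bytes")
assert v_size > c_size  # the compact form avoids repeating the key names
```

Because the verbose form repeats the 'topic' and 'partition' key names for every single partition, the compact form shrinks the payload by several times for clusters with many partitions per topic.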
[jira] [Assigned] (KAFKA-1599) Change preferred replica election admin command to handle large clusters
[ https://issues.apache.org/jira/browse/KAFKA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Nigam reassigned KAFKA-1599: --- Assignee: Abhishek Nigam -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2552) Certain admin commands such as partition assignment fail on large clusters
[ https://issues.apache.org/jira/browse/KAFKA-2552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14804560#comment-14804560 ] Abhishek Nigam commented on KAFKA-2552: --- Essentially, a generic approach to this, which would require both the read and write side to change, would be as follows: We designate a zookeeper path as scratch, e.g. /admin/scratch.
Write side: When writing json to zookeeper we will chunk it into 1 MB units and store all but the first chunk in separate zookeeper nodes under the scratch path. The first chunk will live in the original location as we have it today, e.g. /admin/reassign_partitions. Each chunk will have the following format:
- "json incompatible header" (something other than "{")
- length of the zookeeper path to the next json chunk (0 means that this is the last chunk)
- zookeeper path of the next json chunk
- length of the chunk of json data
- chunk of json data
We will write this conceptual linked list back to front.
Read side: The zookeeper watch will be fired as before. While reading, if we detect there are more chunks, we will do synced reads from zookeeper. > Certain admin commands such as partition assignment fail on large clusters > -- > > Key: KAFKA-2552 > URL: https://issues.apache.org/jira/browse/KAFKA-2552 > Project: Kafka > Issue Type: Improvement > Reporter: Abhishek Nigam > Assignee: Abhishek Nigam > > This happens because the json generated is greater than 1 MB and exceeds the > default data limit of zookeeper nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
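As a rough illustration of the chunking proposal, here is a Python sketch that simulates ZooKeeper with an in-memory dict. The chunk size, paths, and exact byte layout are illustrative stand-ins for the 1 MB units described above, not the actual implementation:

```python
# In-memory stand-in for ZooKeeper: path -> bytes. The chunk size is tiny
# here for illustration; the proposal uses 1 MB units.
zk = {}
CHUNK = 16

def encode(chunk: bytes, next_path: str) -> bytes:
    # Chunk layout from the proposal: a json-incompatible header byte
    # (anything other than "{"), the length and path of the next chunk
    # (length 0 means this is the last chunk), then the data length and data.
    np = next_path.encode()
    return b"#" + len(np).to_bytes(4, "big") + np + len(chunk).to_bytes(4, "big") + chunk

def write_chunked(path: str, scratch: str, data: bytes) -> None:
    """Write side: chunk the blob and write the linked list back to front.
    All but the first chunk live under the scratch path."""
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)] or [b""]
    next_path = ""
    for i in range(len(chunks) - 1, 0, -1):
        p = f"{scratch}/chunk-{i}"
        zk[p] = encode(chunks[i], next_path)
        next_path = p
    zk[path] = encode(chunks[0], next_path)  # head stays at the original location

def read_chunked(path: str) -> bytes:
    """Read side: follow the chain from the head node and reassemble the blob."""
    out, p = b"", path
    while p:
        raw = zk[p]
        assert raw[:1] == b"#"  # json-incompatible header
        nlen = int.from_bytes(raw[1:5], "big")
        p = raw[5:5 + nlen].decode()  # empty path means no further chunks
        clen = int.from_bytes(raw[5 + nlen:9 + nlen], "big")
        out += raw[9 + nlen:9 + nlen + clen]
    return out

blob = b'{"partitions": [' + b'{"t": 1}, ' * 20 + b"]}"
write_chunked("/admin/reassign_partitions", "/admin/scratch", blob)
assert read_chunked("/admin/reassign_partitions") == blob
```

Writing back to front means the head node at the original path is written last, so a watcher firing on it always finds a complete chain to follow.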
[jira] [Commented] (KAFKA-1387) Kafka getting stuck creating ephemeral node it has already created when two zookeeper sessions are established in a very short period of time
[ https://issues.apache.org/jira/browse/KAFKA-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14697427#comment-14697427 ] Abhishek Nigam commented on KAFKA-1387: --- Thanks a lot for digging into this. Not sure if it helps, but in the past when I saw this issue it went like this: a) Say the session timeout is 30 seconds. b) If we kill the instance which created the zookeeper ephemeral node and bring it back up quickly (in less than 30 seconds), we would find the previous session's data (the ephemeral node) still existed. The solution was to assume the existing data was from an old session, delete it, and re-create it during startup. However, we were processing the zookeeper events on a single thread. On Fri, Aug 14, 2015 at 6:34 AM, Flavio Junqueira (JIRA) j...@apache.org Kafka getting stuck creating ephemeral node it has already created when two zookeeper sessions are established in a very short period of time - Key: KAFKA-1387 URL: https://issues.apache.org/jira/browse/KAFKA-1387 Project: Kafka Issue Type: Bug Affects Versions: 0.8.1.1 Reporter: Fedor Korotkiy Priority: Blocker Labels: newbie, patch, zkclient-problems Attachments: kafka-1387.patch Kafka broker re-registers itself in zookeeper every time the handleNewSession() callback is invoked. https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/server/KafkaHealthcheck.scala Now imagine the following sequence of events. 1) Zookeeper session reestablishes. The handleNewSession() callback is queued by the zkClient, but not invoked yet. 2) Zookeeper session reestablishes again, queueing the callback a second time. 3) The first callback is invoked, creating the /broker/[id] ephemeral path. 4) The second callback is invoked and tries to create the /broker/[id] path using the createEphemeralPathExpectConflictHandleZKBug() function. But the path already exists, so createEphemeralPathExpectConflictHandleZKBug() gets stuck in an infinite loop. It seems like the controller election code has the same issue. 
I'm able to reproduce this issue on the 0.8.1 branch from github using the following configs.
# zookeeper
tickTime=10
dataDir=/tmp/zk/
clientPort=2101
maxClientCnxns=0
# kafka
broker.id=1
log.dir=/tmp/kafka
zookeeper.connect=localhost:2101
zookeeper.connection.timeout.ms=100
zookeeper.sessiontimeout.ms=100
Just start kafka and zookeeper and then pause zookeeper several times using Ctrl-Z. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
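A toy model of the race and of the delete-and-recreate fix described in the comment above might look like the following sketch. The node bookkeeping is simulated in memory, and the function names only loosely mirror the zkclient helpers:

```python
# In-memory model: ephemeral nodes record the session that created them.
# All names here are illustrative; this is not the actual Kafka/zkclient code.
ephemerals = {}  # path -> owning session id

def create_ephemeral_naive(path, session):
    # Like createEphemeralPathExpectConflictHandleZKBug: if the node exists
    # under another (stale) session, it retries forever -- modeled here as
    # an exception instead of an infinite loop.
    if path in ephemerals and ephemerals[path] != session:
        raise RuntimeError("stuck: node exists under a stale session")
    ephemerals[path] = session

def create_ephemeral_fixed(path, session):
    # The fix described above: assume existing data is from an old session,
    # delete it, and re-create the node under the current session.
    if path in ephemerals and ephemerals[path] != session:
        del ephemerals[path]
    ephemerals[path] = session

# Session 1's queued handleNewSession() callback registers the broker...
create_ephemeral_naive("/brokers/ids/1", session=1)
# ...then session 2's queued callback finds the stale node and gets stuck.
try:
    create_ephemeral_naive("/brokers/ids/1", session=2)
    stuck = False
except RuntimeError:
    stuck = True
assert stuck

# With the fix, the stale node is replaced and the broker re-registers.
create_ephemeral_fixed("/brokers/ids/1", session=2)
assert ephemerals["/brokers/ids/1"] == 2
```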
Re: Can someone review ticket 1778
Hi Guozhang, Can you please re-review the KAFKA-1778 design? Just to provide background for this ticket: it was a sub-ticket of the kafka admin commands KIP-4. The goal was to avoid cascading controller moves, e.g. during a rolling broker bounce. The approaches discussed were as follows: a) Use a preferred-controller admin command which can be used to dynamically indicate a preferred controller. b) Use configuration to set a whitelist or blacklist of brokers which are eligible to become the controller. Can we reach consensus on how we want to resolve this issue? -Abhishek On Sun, May 17, 2015 at 10:55 PM, Abhishek Nigam ani...@linkedin.com wrote: Hi, For pinning the controller to a broker I have proposed a design. Can someone review the design and let me know if it looks ok. I can then submit a patch for this ticket within the next couple of weeks. -Abhishek
[jira] [Commented] (KAFKA-1778) Create new re-elect controller admin function
[ https://issues.apache.org/jira/browse/KAFKA-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692370#comment-14692370 ] Abhishek Nigam commented on KAFKA-1778: --- Hi Guozhang, I agree 100% with you. Can you tell me what is the best way to move forward on this on the open source side. -Abhishek On Tue, Aug 11, 2015 at 2:30 PM, Guozhang Wang (JIRA) j...@apache.org Create new re-elect controller admin function - Key: KAFKA-1778 URL: https://issues.apache.org/jira/browse/KAFKA-1778 Project: Kafka Issue Type: Sub-task Reporter: Joe Stein Assignee: Abhishek Nigam Fix For: 0.8.3 kafka --controller --elect -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1778) Create new re-elect controller admin function
[ https://issues.apache.org/jira/browse/KAFKA-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692628#comment-14692628 ] Abhishek Nigam commented on KAFKA-1778: --- Thanks Guozhang, I will write it up in a nice proposal. -Abhishek On Tue, Aug 11, 2015 at 3:28 PM, Guozhang Wang (JIRA) j...@apache.org -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1778) Create new re-elect controller admin function
[ https://issues.apache.org/jira/browse/KAFKA-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565837#comment-14565837 ] Abhishek Nigam commented on KAFKA-1778: --- I believe what you are suggesting is that we can have a group of brokers flagged as potential controllers, and all controller elections will be limited to that subset of brokers. Do I need to provide any failsafe in case none of the flagged brokers are able to participate in the required election and we are left controller-less? -Abhishek -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1778) Create new re-elect controller admin function
[ https://issues.apache.org/jira/browse/KAFKA-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14561833#comment-14561833 ] Abhishek Nigam commented on KAFKA-1778: --- Joel, What I was proposing was that all the brokers will watch the ready-to-serve-as-controller ephemeral node. In the scenario outlined, where the preferred controller dies after the election is over but before it can write to the /controller node, all the brokers will get this notification, and there will be another round of elections in that case. The controller is the one which reads from the /admin/next_controller persistent zookeeper node and also keeps a watch on it. If it detects this has been changed and the chosen broker id is different from its own, it will start the preferred-controller move process. Also, can we avoid the message from the current controller to the preferred controller by having all brokers just watch the /admin/next_controller znode? This is definitely a better approach, where a zookeeper node can be used to achieve this messaging. Jun, In my opinion, static assignment suffers from the question of what happens if the pre-determined controller goes down or runs into any issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1778) Create new re-elect controller admin function
[ https://issues.apache.org/jira/browse/KAFKA-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550781#comment-14550781 ] Abhishek Nigam commented on KAFKA-1778: --- Jun, The way I see it, pinning the controller gives us multiple benefits: a) If SREs are doing rolling upgrades, they can set aside the broker on which the controller is pinned as the broker which they touch last. This way there are only a limited number of controller moves, and we get more availability of the controller as a result, as opposed to an unpredictable number of controller moves. b) More importantly, I think, if we do manual partition assignment we can set aside a broker to have very few partitions, and this would reduce the impact on the controller from serving too many produce and consume requests. To summarize, it enables us to isolate the controller from the broker functionality, potentially enabling us to push the brokers harder. Joel, You are spot on. Since now all the brokers will be watching for the preferred controller node, we can have the following situations: a) All of them know about the preferred controller (the zookeeper metadata has flowed to everyone). In this case the preferred controller becomes the leader right away. b) If only some of them know about the preferred controller, they will participate in the election and it is possible that somebody other than the preferred controller becomes the leader. What will happen in this case is that eventually this new controller will figure out (through a zookeeper watch) that the preferred controller is available to serve traffic; it will resign and trigger another round of elections. c) If none of them know about the preferred controller, the behavior will be similar to the above. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
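The three scenarios above can be sketched as a toy election model. The znode path, broker ids, and the lowest-id tie-breaking rule are illustrative assumptions, not the actual controller election code:

```python
# Toy model of the preferred-controller hand-off; all names are made up.
state = {"/controller": None}

def elect(candidates, preferred=None):
    """Scenario (a): if the electorate knows the preferred broker, it wins
    outright; otherwise the usual race (lowest id here, for determinism)
    decides the winner."""
    winner = preferred if preferred in candidates else min(candidates)
    state["/controller"] = winner
    return winner

def maybe_resign(preferred, live_brokers):
    """Scenarios (b)/(c): a non-preferred winner resigns once it learns via
    a zookeeper watch that the preferred broker is available, triggering a
    second election round that the preferred broker wins."""
    if state["/controller"] != preferred and preferred in live_brokers:
        state["/controller"] = None
        return elect(live_brokers, preferred=preferred)
    return state["/controller"]

# Metadata about preferred broker 3 hasn't reached brokers 1 and 2 yet,
# so broker 1 wins the first round...
assert elect([1, 2]) == 1
# ...and later resigns when it learns broker 3 is the preferred controller.
assert maybe_resign(preferred=3, live_brokers={1, 2, 3}) == 3
```

The key property the model illustrates is convergence: whichever broker wins an uninformed round eventually hands off to the preferred broker, so at most one extra election round is needed.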
Can someone review ticket 1778
Hi, For pinning the controller to a broker I have proposed a design. Can someone review the design and let me know if it looks ok. I can then submit a patch for this ticket within the next couple of weeks. -Abhishek
[jira] [Commented] (KAFKA-1888) Add a rolling upgrade system test
[ https://issues.apache.org/jira/browse/KAFKA-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14547566#comment-14547566 ] Abhishek Nigam commented on KAFKA-1888: --- Geoffrey, Thanks for the heads up. I saw a related article that you are planning to work on the API compatibility testing as well. I am taking myself off of this ticket as it looks like this ticket will be subsumed by your work. -Abhishek Add a rolling upgrade system test --- Key: KAFKA-1888 URL: https://issues.apache.org/jira/browse/KAFKA-1888 Project: Kafka Issue Type: Improvement Components: system tests Reporter: Gwen Shapira Assignee: Abhishek Nigam Fix For: 0.9.0 Attachments: KAFKA-1888_2015-03-23_11:54:25.patch To help test upgrades and compatibility between versions, it will be cool to add a rolling-upgrade test to system tests: Given two versions (just a path to the jars?), check that you can do a rolling upgrade of the brokers from one version to another (using clients from the old version) without losing data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1888) Add a rolling upgrade system test
[ https://issues.apache.org/jira/browse/KAFKA-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Nigam updated KAFKA-1888: -- Assignee: (was: Abhishek Nigam) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1387) Kafka getting stuck creating ephemeral node it has already created when two zookeeper sessions are established in a very short period of time
[ https://issues.apache.org/jira/browse/KAFKA-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533402#comment-14533402 ] Abhishek Nigam commented on KAFKA-1387: --- I have seen the ephemeral node issue before and the fix made there was exactly what Thomas mentioned: It seems the simplest thing to do would be to just delete the conflicted node and write the truth about the process environment it knows. Is there a reason why the approach outlined by Thomas does not work for kafka? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1888) Add a rolling upgrade system test
[ https://issues.apache.org/jira/browse/KAFKA-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393806#comment-14393806 ] Abhishek Nigam commented on KAFKA-1888: --- Hi Gwen/Ashish, I need to finish up something else, and I will only be able to come back to this ticket in 2-3 weeks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 30809: Patch for KAFKA-1888
On March 31, 2015, 9:20 p.m., Joel Koshy wrote: core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 1 https://reviews.apache.org/r/30809/diff/4/?file=903374#file903374line1 This should definitely not be in tools - this should probably live somewhere under clients/test. I don't think those are currently exported though, so we will need to modify build.gradle. However, per other comments below, I'm not sure this should be part of system tests since it is (by definition) long-running. Will do. On March 31, 2015, 9:20 p.m., Joel Koshy wrote: core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 49 https://reviews.apache.org/r/30809/diff/4/?file=903374#file903374line49 It would help a lot if you could add comments describing what validation is done. For e.g., I'm unclear on why we need the complicated file-based signaling mechanism. So a high-level description would help a lot. More importantly, I really think we should separate continuous validation from broker upgrade, which is the focus of KAFKA-1888. In order to do a broker upgrade test, we don't need any additional code. We just instantiate the producer performance and consumer via the system test utils. Keep those on the old jar. The cluster will start with the old jar as well, and during the test we bounce in the latest jar (the system test utils will need to be updated to support this). We then do the standard system test validation - that all messages sent were received. I wanted to have two (topic, partition) tuples with a leader on each broker. I have decided to use a single topic with multiple partitions rather than using two topics, which could also have worked. The reason for picking the first approach was essentially that I wanted to be able to leverage the continuous validation test outside of the system test framework, in a test cluster with other topics. 
To illustrate why the second approach won't work in that scenario: if we have 3 brokers and I create 3 topics with one partition each (T1P1, T2P1, T3P1), then the following would be a valid assignment based on the existing broker assignment algorithm:
B1    B2    B3
T1P1  TXP1  TXP2
T2P1  TYP1  TYP2
T3P1
where TX and TY are other production topics running in that cluster. In this case all the leaders have landed on the same broker. However, the first approach precludes this possibility. The file signalling was to work around the fact that the most commonly used client does not have the capability to consume from a particular partition. The way I have set it up, the file signalling acts as a barrier. We make sure all the producer/consumer pairs have been instantiated, with the hope being that they have talked to zookeeper and reserved their partition. Once both consumers have been instantiated, and we expect them to have bound themselves to a particular partition, we can let the producers run in both instances; this way we are assured that a consumer never receives data from the wrong producer. On March 31, 2015, 9:20 p.m., Joel Koshy wrote: core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 52 https://reviews.apache.org/r/30809/diff/4/?file=903374#file903374line52 This appears to be for rate-limiting the producer but can be more general than that. It would help to add a comment describing its purpose. Also, should probably be private. This is a poor man's rate limiter as compared to the guava rate limiter. I will make it private. On March 31, 2015, 9:20 p.m., Joel Koshy wrote: system_test/broker_upgrade/bin/test-broker-upgrade.sh, line 1 https://reviews.apache.org/r/30809/diff/4/?file=903376#file903376line1 This appears to be a one-off script to set up the test. This needs to be done within the system test framework, which already has a number of utilities that do similar things. 
One other comment is that the patch is for an upgrade test, but I think it is a bit confusing to mix this with CVT. The continuous validation test will be useful outside of the system test framework; this was an attempt to leverage CVT in the system test setting. Since strong objections have been raised against adopting this approach, I will leave a comment on this patch accordingly. - Abhishek --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30809/#review78270 --- On March 23, 2015, 6:54 p.m., Abhishek Nigam wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30809/ --- (Updated March 23, 2015, 6:54 p.m.) Review request for kafka. Bugs: KAFKA
Re: Review Request 30809: Patch for KAFKA-1888
On April 2, 2015, 1:38 a.m., Jun Rao wrote: core/src/main/scala/kafka/tools/ContinuousValidationTest.java, lines 431-437 https://reviews.apache.org/r/30809/diff/4/?file=903374#file903374line431 Could we add a description of the test (what kind of data is generated, how the consumer does the verification, what kind of output is generated, etc.)? The data which is generated is very simple - an increasing sequence of longs with timestamps. The producer keeps track of the newest sequence number and timestamp it has sent. The consumer keeps track of the last sequence number and timestamp it has received. The system test will interrupt the CVT and compare the sequence numbers between the producer and the consumer. If they do not line up then it is an error. (If either the producer or consumer thread terminates unexpectedly before being interrupted, it will be flagged as an error.) If the test fails, the data logs from the producer and consumer are not removed and can be inspected. The idea behind putting the consumer and producer in the same JVM was orthogonal to the system test: if it is used in a test cluster hosting other topics, it makes it easy to get at things like the producer/consumer delta. However, I think there is very strong objection to adopting this for system tests, which are short-lived in nature. Unless there is support for the approach I have taken so far, I plan to revert to the existing approach of spawning separate JVMs for the producer and consumer. I will change the bash script to Python, similar to what the other system tests do. On April 2, 2015, 1:38 a.m., Jun Rao wrote: core/src/main/scala/kafka/tools/ContinuousValidationTest.java, lines 440-454 https://reviews.apache.org/r/30809/diff/4/?file=903374#file903374line440 Could we add a description of each command line option? I need to add more documentation. I will add this in. - Abhishek --- This is an automatically generated e-mail. 
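The validation described above - the producer records the newest sequence number it has sent, the consumer records the last one it has received, and the test compares the two on interruption - can be sketched like this. The class and field names are illustrative, not the actual CVT internals:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the CVT-style check: producer side records the newest sequence
// number successfully sent, consumer side records the last one received, and
// on interruption the test verifies that the two line up. Names are
// illustrative, not from the patch.
public class SequenceCheck {
    private final AtomicLong lastSent = new AtomicLong(-1);
    private final AtomicLong lastReceived = new AtomicLong(-1);

    public void recordSent(long seq) { lastSent.set(seq); }
    public void recordReceived(long seq) { lastReceived.set(seq); }

    // Called when the system test interrupts the producer/consumer pair:
    // everything that was sent must also have been received.
    public boolean sequencesLineUp() {
        return lastSent.get() == lastReceived.get();
    }
}
```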
To reply, visit: https://reviews.apache.org/r/30809/#review78630 --- On March 23, 2015, 6:54 p.m., Abhishek Nigam wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30809/ --- (Updated March 23, 2015, 6:54 p.m.) Review request for kafka. Bugs: KAFKA-1888 https://issues.apache.org/jira/browse/KAFKA-1888 Repository: kafka Description --- Updated the RB with Gwen's comments, Beckett's comments and a subset of Guozhang's comments Diffs - bin/kafka-run-class.sh 881f578a8f5c796fe23415b978c1ad35869af76e core/src/main/scala/kafka/tools/ContinuousValidationTest.java PRE-CREATION core/src/main/scala/kafka/utils/ShutdownableThread.scala fc226c863095b7761290292cd8755cd7ad0f155c system_test/broker_upgrade/bin/test-broker-upgrade.sh PRE-CREATION Diff: https://reviews.apache.org/r/30809/diff/ Testing --- Scripted it to run 20 times without any failures. Command-line: broker-upgrade/bin/test.sh dir1 dir2 Thanks, Abhishek Nigam
Re: Review Request 30809: Patch for KAFKA-1888
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30809/ --- (Updated March 23, 2015, 6:54 p.m.) Review request for kafka. Bugs: KAFKA-1888 https://issues.apache.org/jira/browse/KAFKA-1888 Repository: kafka Description (updated) --- Updated the RB with Gwen's comments, Beckett's comments and a subset of Guozhang's comments Diffs (updated) - bin/kafka-run-class.sh 881f578a8f5c796fe23415b978c1ad35869af76e core/src/main/scala/kafka/tools/ContinuousValidationTest.java PRE-CREATION core/src/main/scala/kafka/utils/ShutdownableThread.scala fc226c863095b7761290292cd8755cd7ad0f155c system_test/broker_upgrade/bin/test-broker-upgrade.sh PRE-CREATION Diff: https://reviews.apache.org/r/30809/diff/ Testing --- Scripted it to run 20 times without any failures. Command-line: broker-upgrade/bin/test.sh dir1 dir2 Thanks, Abhishek Nigam
[jira] [Commented] (KAFKA-1888) Add a rolling upgrade system test
[ https://issues.apache.org/jira/browse/KAFKA-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14376399#comment-14376399 ] Abhishek Nigam commented on KAFKA-1888: --- Updated reviewboard https://reviews.apache.org/r/30809/diff/ against branch origin/trunk Add a rolling upgrade system test --- Key: KAFKA-1888 URL: https://issues.apache.org/jira/browse/KAFKA-1888 Project: Kafka Issue Type: Improvement Components: system tests Reporter: Gwen Shapira Assignee: Abhishek Nigam Fix For: 0.9.0 Attachments: KAFKA-1888_2015-03-23_11:54:25.patch To help test upgrades and compatibility between versions, it will be cool to add a rolling-upgrade test to system tests: Given two versions (just a path to the jars?), check that you can do a rolling upgrade of the brokers from one version to another (using clients from the old version) without losing data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1888) Add a rolling upgrade system test
[ https://issues.apache.org/jira/browse/KAFKA-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Nigam updated KAFKA-1888: -- Status: Patch Available (was: Open) Add a rolling upgrade system test --- Key: KAFKA-1888 URL: https://issues.apache.org/jira/browse/KAFKA-1888 Project: Kafka Issue Type: Improvement Components: system tests Reporter: Gwen Shapira Assignee: Abhishek Nigam Fix For: 0.9.0 Attachments: KAFKA-1888_2015-03-23_11:54:25.patch To help test upgrades and compatibility between versions, it will be cool to add a rolling-upgrade test to system tests: Given two versions (just a path to the jars?), check that you can do a rolling upgrade of the brokers from one version to another (using clients from the old version) without losing data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (KAFKA-1888) Add a rolling upgrade system test
[ https://issues.apache.org/jira/browse/KAFKA-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhishek Nigam updated KAFKA-1888: -- Attachment: KAFKA-1888_2015-03-23_11:54:25.patch Add a rolling upgrade system test --- Key: KAFKA-1888 URL: https://issues.apache.org/jira/browse/KAFKA-1888 Project: Kafka Issue Type: Improvement Components: system tests Reporter: Gwen Shapira Assignee: Abhishek Nigam Fix For: 0.9.0 Attachments: KAFKA-1888_2015-03-23_11:54:25.patch To help test upgrades and compatibility between versions, it will be cool to add a rolling-upgrade test to system tests: Given two versions (just a path to the jars?), check that you can do a rolling upgrade of the brokers from one version to another (using clients from the old version) without losing data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-1778) Create new re-elect controller admin function
[ https://issues.apache.org/jira/browse/KAFKA-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14370385#comment-14370385 ] Abhishek Nigam commented on KAFKA-1778: --- I have a design for pinning the controller to a broker. Suppose we want to pin the controller to broker id x.

Handling the admin request in the controller:
a) We send the admin request to the controller.
b) It will create a persistent ZooKeeper node /admin/next_controller with data x.
c) It will then check the alive broker list to see whether broker x is up and running.
d) If the broker is up and running, it will start a 3-way handshake with x.
e) It will set a watch on the /admin/ready_to_serve_as_controller ZooKeeper node.
f) It will send a message telling broker x that it should become ready to serve as the next controller.
g) Broker x, on receiving this message, will create the ephemeral node /admin/ready_to_serve_as_controller.
h) The controller observes this change.
i) At this point the current controller will resign.

Changes in the election code:
a) All the brokers will read /admin/ready_to_serve_as_controller with a watch.
b) If a broker finds that this znode exists and its broker.id does not match the id specified in the ephemeral node, it will simply not participate in the leader election.
c) Broker x will rightfully take its place as the next controller.
d) The watches will be used in case broker x comes back to life.
e) In that case, if I am the controller, then I will resign.

Changes in the controller startup code:
a) Always read /admin/next_controller, watching for data changes as well as new data.
b) If there is any change, try to set up the next broker as described above under handling the admin request in the controller. 
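The election-side check in this design (a broker may only join the election if no pinning znode exists, or if its own id matches the pinned id) can be sketched as a pure function. This helper is hypothetical, not code from any patch; the znode semantics follow the description above:

```java
import java.util.Optional;

// Sketch of the election-eligibility rule from the design above: if the
// /admin/ready_to_serve_as_controller ephemeral node exists, only the broker
// whose id matches its payload may take part in controller election.
// This helper is hypothetical.
public class ControllerElection {
    // readyNodeData is the payload of /admin/ready_to_serve_as_controller,
    // or empty if the znode does not exist.
    public static boolean mayParticipate(int myBrokerId, Optional<Integer> readyNodeData) {
        return readyNodeData
                .map(pinnedId -> pinnedId == myBrokerId)  // znode exists: only the pinned broker
                .orElse(true);                            // no znode: normal open election
    }
}
```

In a real implementation the Optional would come from a ZooKeeper exists()/getData() call with a watch, so brokers re-evaluate eligibility when the ephemeral node appears or disappears.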
Create new re-elect controller admin function - Key: KAFKA-1778 URL: https://issues.apache.org/jira/browse/KAFKA-1778 Project: Kafka Issue Type: Sub-task Reporter: Joe Stein Assignee: Abhishek Nigam Fix For: 0.8.3 kafka --controller --elect -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (KAFKA-1778) Create new re-elect controller admin function
[ https://issues.apache.org/jira/browse/KAFKA-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on KAFKA-1778 started by Abhishek Nigam. - Create new re-elect controller admin function - Key: KAFKA-1778 URL: https://issues.apache.org/jira/browse/KAFKA-1778 Project: Kafka Issue Type: Sub-task Reporter: Joe Stein Assignee: Abhishek Nigam Fix For: 0.8.3 kafka --controller --elect -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (KAFKA-2003) Add upgrade tests
[ https://issues.apache.org/jira/browse/KAFKA-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14357325#comment-14357325 ] Abhishek Nigam commented on KAFKA-2003: --- Hi Gwen, I did not realize my patch got uploaded to RB but the link was not attached to the JIRA. I just added it in the comments section of 1888. https://reviews.apache.org/r/30809/ Add upgrade tests - Key: KAFKA-2003 URL: https://issues.apache.org/jira/browse/KAFKA-2003 Project: Kafka Issue Type: Improvement Reporter: Gwen Shapira Assignee: Ashish K Singh To test protocol changes, compatibility and upgrade process, we need a good way to test different versions of the product together and to test end-to-end upgrade process. For example, for 0.8.2 to 0.8.3 test we want to check: * Can we start a cluster with a mix of 0.8.2 and 0.8.3 brokers? * Can a cluster of 0.8.3 brokers bump the protocol level one broker at a time? * Can 0.8.2 clients run against a cluster of 0.8.3 brokers? There are probably more questions. But an automated framework that can test those and report results will be a good start. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 30809: Patch for KAFKA-1888
On March 12, 2015, 12:13 a.m., Jiangjie Qin wrote: core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 183 https://reviews.apache.org/r/30809/diff/3/?file=889855#file889855line183 This is essentially a sync approach, can we use a callback to do this? This is intentional. We want to make sure the event has successfully reached the brokers. This enables us to form a reasonable expectation of what the consumer should expect. On March 12, 2015, 12:13 a.m., Jiangjie Qin wrote: core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 184 https://reviews.apache.org/r/30809/diff/3/?file=889855#file889855line184 When a send fails, should we at least log the sequence number? I log the exception, and the logger gives me the timestamp in the logs. Maybe I am missing something. Can you explain the rationale for why we would want to log the sequence number on the producer side when a send fails? On March 12, 2015, 12:13 a.m., Jiangjie Qin wrote: core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 321 https://reviews.apache.org/r/30809/diff/3/?file=889855#file889855line321 Similar to the producer, can we log the expected sequence number and the seq we actually saw? Sure, in cases where there is a mismatch I could do that. On March 12, 2015, 12:13 a.m., Jiangjie Qin wrote: core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 386 https://reviews.apache.org/r/30809/diff/3/?file=889855#file889855line386 Can we use KafkaThread here? I will take a look at that. - Abhishek --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30809/#review76173 --- On March 9, 2015, 11:55 p.m., Abhishek Nigam wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30809/ --- (Updated March 9, 2015, 11:55 p.m.) Review request for kafka. 
Bugs: KAFKA-1888 https://issues.apache.org/jira/browse/KAFKA-1888 Repository: kafka Description --- Fixing the tests based on Mayuresh comments, code cleanup after proper IDE setup Diffs - build.gradle 0f0fe60a74542efa91a0e727146e896edcaa38af core/src/main/scala/kafka/tools/ContinuousValidationTest.java PRE-CREATION system_test/broker_upgrade/bin/kafka-run-class.sh PRE-CREATION system_test/broker_upgrade/bin/test.sh PRE-CREATION system_test/broker_upgrade/configs/server1.properties PRE-CREATION system_test/broker_upgrade/configs/server2.properties PRE-CREATION system_test/broker_upgrade/configs/zookeeper_source.properties PRE-CREATION Diff: https://reviews.apache.org/r/30809/diff/ Testing --- Scripted it to run 20 times without any failures. Command-line: broker-upgrade/bin/test.sh dir1 dir2 Thanks, Abhishek Nigam
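The sync-vs-callback question debated above comes down to blocking on the send's future versus registering a completion callback. A generic sketch, with CompletableFuture standing in for the future a producer's send would return (the class and method names are illustrative):

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.LongConsumer;

// Generic illustration of the two send styles discussed in the review, with
// CompletableFuture standing in for the producer's returned future.
public class SendStyles {
    // Sync style: block until the send is acknowledged, so the test knows
    // exactly what the consumer should expect next before sending more.
    public static long sendSync(CompletableFuture<Long> ack) {
        return ack.join();  // blocks until the acknowledgment arrives
    }

    // Callback style: higher throughput, but the test loses the synchronous
    // point at which its expectation is known to be valid.
    public static void sendAsync(CompletableFuture<Long> ack, LongConsumer onAck) {
        ack.thenAccept(offset -> onAck.accept(offset));
    }
}
```

The reply above argues for the sync style precisely because the validation logic needs a confirmed "high-water mark" before forming an expectation for the consumer.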
Re: Review Request 30809: Patch for KAFKA-1888
On March 11, 2015, 11:12 p.m., Gwen Shapira wrote: This looks like a very good start. I think the framework is flexible enough to allow us to add a variety of upgrade tests. I'm looking forward to it. I have a few comments, but mostly I'm still confused about how this will be used. Perhaps more comments or even a README are in order. You wrote that we invoke test.sh dir1 dir2; what should each directory contain? Just the kafka jars of different versions, or an entire installation (including bin/ and conf/)? Which of the directories should be the newer and which the older - does it matter? Which version of clients will be used? Perhaps a more descriptive name for test.sh would help too. I'm guessing we'll have a whole collection of those test scripts soon. Gwen Each directory contains the kafka jars: kafka_2.10-0.8.3-SNAPSHOT.jar kafka-clients-0.8.3-SNAPSHOT.jar The other jars are shared between both kafka brokers. On March 11, 2015, 11:12 p.m., Gwen Shapira wrote: build.gradle, line 209 https://reviews.apache.org/r/30809/diff/3/?file=889854#file889854line209 This should probably be a test dependency (if needed at all). Packaging Guava will be a pain, since so many systems use different versions of Guava and they are all incompatible. Guava provides an excellent rate limiter which I am using in the test and have used in the past. As for packaging, we already pull in other external libraries, like ZooKeeper, with a specific version which applications might be using extensively and might similarly run into conflicts with. If you have a suggestion for a less popular library than Guava which provides rate limiting, I can use that instead; otherwise I will move this dependency to test for now. On March 11, 2015, 11:12 p.m., Gwen Shapira wrote: core/src/main/scala/kafka/tools/ContinuousValidationTest.java, lines 409-440 https://reviews.apache.org/r/30809/diff/3/?file=889855#file889855line409 Do we really want to do this? 
We are using joptsimple for a bunch of other tools. It is easier to read and maintain, with nice error messages, a help screen, etc. Thanks, I will switch to joptsimple. On March 11, 2015, 11:12 p.m., Gwen Shapira wrote: system_test/broker_upgrade/bin/kafka-run-class.sh, lines 152-156 https://reviews.apache.org/r/30809/diff/3/?file=889856#file889856line152 Why did we decide to duplicate this entire file? The only difference is that it takes an additional argument which contains the directory from which the kafka jars should be pulled. Would you recommend adding it to the original script as an optional argument? - Abhishek --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30809/#review76157 --- On March 9, 2015, 11:55 p.m., Abhishek Nigam wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30809/ --- (Updated March 9, 2015, 11:55 p.m.) Review request for kafka. Bugs: KAFKA-1888 https://issues.apache.org/jira/browse/KAFKA-1888 Repository: kafka Description --- Fixing the tests based on Mayuresh comments, code cleanup after proper IDE setup Diffs - build.gradle 0f0fe60a74542efa91a0e727146e896edcaa38af core/src/main/scala/kafka/tools/ContinuousValidationTest.java PRE-CREATION system_test/broker_upgrade/bin/kafka-run-class.sh PRE-CREATION system_test/broker_upgrade/bin/test.sh PRE-CREATION system_test/broker_upgrade/configs/server1.properties PRE-CREATION system_test/broker_upgrade/configs/server2.properties PRE-CREATION system_test/broker_upgrade/configs/zookeeper_source.properties PRE-CREATION Diff: https://reviews.apache.org/r/30809/diff/ Testing --- Scripted it to run 20 times without any failures. Command-line: broker-upgrade/bin/test.sh dir1 dir2 Thanks, Abhishek Nigam
[jira] [Commented] (KAFKA-1888) Add a rolling upgrade system test
[ https://issues.apache.org/jira/browse/KAFKA-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14355921#comment-14355921 ] Abhishek Nigam commented on KAFKA-1888: --- Hi Gwen, I am not sure why the link is not showing up. Here you go: https://reviews.apache.org/r/30809/ Add a rolling upgrade system test --- Key: KAFKA-1888 URL: https://issues.apache.org/jira/browse/KAFKA-1888 Project: Kafka Issue Type: Improvement Components: system tests Reporter: Gwen Shapira Assignee: Abhishek Nigam Fix For: 0.9.0 To help test upgrades and compatibility between versions, it will be cool to add a rolling-upgrade test to system tests: Given two versions (just a path to the jars?), check that you can do a rolling upgrade of the brokers from one version to another (using clients from the old version) without losing data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 30809: Patch for KAFKA-1888
On Feb. 18, 2015, 12:06 a.m., Mayuresh Gharat wrote: core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 400 https://reviews.apache.org/r/30809/diff/1/?file=859055#file859055line400 The common format of commenting is: // this is a comment. Personally I don't mind, but that's kind of a standard that I understood from the reviews I got. My IDE setup was a little messed up. I looked up the kafka coding guidelines and there does not seem to be anything about comments, so I made the indentation consistent using the IDE. - Abhishek --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30809/#review72786 --- On March 9, 2015, 11:55 p.m., Abhishek Nigam wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30809/ --- (Updated March 9, 2015, 11:55 p.m.) Review request for kafka. Bugs: KAFKA-1888 https://issues.apache.org/jira/browse/KAFKA-1888 Repository: kafka Description --- Fixing the tests based on Mayuresh comments, code cleanup after proper IDE setup Diffs - build.gradle 0f0fe60a74542efa91a0e727146e896edcaa38af core/src/main/scala/kafka/tools/ContinuousValidationTest.java PRE-CREATION system_test/broker_upgrade/bin/kafka-run-class.sh PRE-CREATION system_test/broker_upgrade/bin/test.sh PRE-CREATION system_test/broker_upgrade/configs/server1.properties PRE-CREATION system_test/broker_upgrade/configs/server2.properties PRE-CREATION system_test/broker_upgrade/configs/zookeeper_source.properties PRE-CREATION Diff: https://reviews.apache.org/r/30809/diff/ Testing --- Scripted it to run 20 times without any failures. Command-line: broker-upgrade/bin/test.sh dir1 dir2 Thanks, Abhishek Nigam
Re: Review Request 30809: Patch for KAFKA-1888
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30809/ --- (Updated March 9, 2015, 11:55 p.m.) Review request for kafka. Bugs: KAFKA-1888 https://issues.apache.org/jira/browse/KAFKA-1888 Repository: kafka Description (updated) --- Fixing the tests based on Mayuresh comments, code cleanup after proper IDE setup Diffs (updated) - build.gradle 0f0fe60a74542efa91a0e727146e896edcaa38af core/src/main/scala/kafka/tools/ContinuousValidationTest.java PRE-CREATION system_test/broker_upgrade/bin/kafka-run-class.sh PRE-CREATION system_test/broker_upgrade/bin/test.sh PRE-CREATION system_test/broker_upgrade/configs/server1.properties PRE-CREATION system_test/broker_upgrade/configs/server2.properties PRE-CREATION system_test/broker_upgrade/configs/zookeeper_source.properties PRE-CREATION Diff: https://reviews.apache.org/r/30809/diff/ Testing --- Scripted it to run 20 times without any failures. Command-line: broker-upgrade/bin/test.sh dir1 dir2 Thanks, Abhishek Nigam
[jira] [Commented] (KAFKA-2003) Add upgrade tests
[ https://issues.apache.org/jira/browse/KAFKA-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353922#comment-14353922 ] Abhishek Nigam commented on KAFKA-2003: --- Hi Ashish/Gwen, Can you review the KAFKA-1888 patch? I have just updated a cleaned-up patch with all the comments I have gotten so far. I would prefer if we keep these patches distinct: a) We could, for example, commit KAFKA-1888. b) Once Ashish is ready with the new test which covers all 8 combinations of different versions of producers/consumers and brokers, and any additional stuff he is planning to do, it can simply subsume KAFKA-1888. This would enable us to use this patch locally until we have the working superset represented by this ticket. Add upgrade tests - Key: KAFKA-2003 URL: https://issues.apache.org/jira/browse/KAFKA-2003 Project: Kafka Issue Type: Improvement Reporter: Gwen Shapira Assignee: Ashish K Singh To test protocol changes, compatibility and upgrade process, we need a good way to test different versions of the product together and to test end-to-end upgrade process. For example, for 0.8.2 to 0.8.3 test we want to check: * Can we start a cluster with a mix of 0.8.2 and 0.8.3 brokers? * Can a cluster of 0.8.3 brokers bump the protocol level one broker at a time? * Can 0.8.2 clients run against a cluster of 0.8.3 brokers? There are probably more questions. But an automated framework that can test those and report results will be a good start. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Review Request 30809: Patch for KAFKA-1888
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30809/#review73061 --- core/src/main/scala/kafka/tools/ContinuousValidationTest.java https://reviews.apache.org/r/30809/#comment119244 Flip is needed to reset the pointer to the beginning of the byte buffer. - Abhishek Nigam On March 9, 2015, 11:55 p.m., Abhishek Nigam wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30809/ --- (Updated March 9, 2015, 11:55 p.m.) Review request for kafka. Bugs: KAFKA-1888 https://issues.apache.org/jira/browse/KAFKA-1888 Repository: kafka Description --- Fixing the tests based on Mayuresh comments, code cleanup after proper IDE setup Diffs - build.gradle 0f0fe60a74542efa91a0e727146e896edcaa38af core/src/main/scala/kafka/tools/ContinuousValidationTest.java PRE-CREATION system_test/broker_upgrade/bin/kafka-run-class.sh PRE-CREATION system_test/broker_upgrade/bin/test.sh PRE-CREATION system_test/broker_upgrade/configs/server1.properties PRE-CREATION system_test/broker_upgrade/configs/server2.properties PRE-CREATION system_test/broker_upgrade/configs/zookeeper_source.properties PRE-CREATION Diff: https://reviews.apache.org/r/30809/diff/ Testing --- Scripted it to run 20 times without any failures. Command-line: broker-upgrade/bin/test.sh dir1 dir2 Thanks, Abhishek Nigam
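The flip() comment above refers to the standard ByteBuffer idiom: after writing, the buffer's position sits at the end of the written data, and flip() sets the limit to that position and resets the position to zero so the same bytes can be read back:

```java
import java.nio.ByteBuffer;

// Demonstrates why flip() is needed: after putLong(), position is at the end
// of the written data; flip() sets limit = position and position = 0 so the
// subsequent read starts from the beginning of what was written.
public class FlipDemo {
    public static long roundTrip(long value) {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
        buf.putLong(value);   // position is now 8
        buf.flip();           // limit = 8, position = 0
        return buf.getLong(); // reads back the value just written
    }
}
```

Without the flip(), getLong() would read past the written data and throw a BufferUnderflowException.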
Re: Review Request 30809: Patch for KAFKA-1888
On Feb. 18, 2015, 12:06 a.m., Mayuresh Gharat wrote: core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 168 https://reviews.apache.org/r/30809/diff/1/?file=859055#file859055line168 Same here - can we use isInterrupted()? http://docs.oracle.com/javase/tutorial/essential/concurrency/interrupt.html I want to check if the current thread is interrupted. The link you sent out is useful if I wanted to query whether another thread was interrupted. On Feb. 18, 2015, 12:06 a.m., Mayuresh Gharat wrote: core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 280 https://reviews.apache.org/r/30809/diff/1/?file=859055#file859055line280 Can we put this in a separate method like init()? The constructor can be used mainly for assignment. What do you think? Moved the launching of the threads to an init method. On Feb. 18, 2015, 12:06 a.m., Mayuresh Gharat wrote: core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 298 https://reviews.apache.org/r/30809/diff/1/?file=859055#file859055line298 When is blockingCallInterrupted set to true? Got rid of this. On Feb. 18, 2015, 12:06 a.m., Mayuresh Gharat wrote: core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 324 https://reviews.apache.org/r/30809/diff/1/?file=859055#file859055line324 Formatting spaces. Will there be a case where (evt.sequenceId < lastEventSeenSequenceId.get() && evt.eventProducedTimestamp < lastEventSeenTimeProduced.get())? This will happen when the sequence numbers wrap around. On Feb. 18, 2015, 12:06 a.m., Mayuresh Gharat wrote: core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 61 https://reviews.apache.org/r/30809/diff/1/?file=859055#file859055line61 Any reason for not making this final? Static variables should come before instance variables. It's a common standard to prefix instance variables with _, like _groupId. Fixed this. - Abhishek --- This is an automatically generated e-mail. 
To reply, visit: https://reviews.apache.org/r/30809/#review72786 --- On March 9, 2015, 11:55 p.m., Abhishek Nigam wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/30809/ --- (Updated March 9, 2015, 11:55 p.m.) Review request for kafka. Bugs: KAFKA-1888 https://issues.apache.org/jira/browse/KAFKA-1888 Repository: kafka Description --- Fixing the tests based on Mayuresh comments, code cleanup after proper IDE setup Diffs - build.gradle 0f0fe60a74542efa91a0e727146e896edcaa38af core/src/main/scala/kafka/tools/ContinuousValidationTest.java PRE-CREATION system_test/broker_upgrade/bin/kafka-run-class.sh PRE-CREATION system_test/broker_upgrade/bin/test.sh PRE-CREATION system_test/broker_upgrade/configs/server1.properties PRE-CREATION system_test/broker_upgrade/configs/server2.properties PRE-CREATION system_test/broker_upgrade/configs/zookeeper_source.properties PRE-CREATION Diff: https://reviews.apache.org/r/30809/diff/ Testing --- Scripted it to run 20 times without any failures. Command-line: broker-upgrade/bin/test.sh dir1 dir2 Thanks, Abhishek Nigam
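The wraparound case mentioned in the review above can be made explicit: a check for out-of-order events must tolerate the point where the sequence counter wraps back to zero. A hedged sketch - the wrap threshold and all names are illustrative, not from the patch:

```java
// Sketch of an out-of-order check that tolerates sequence-number wraparound:
// a new sequence id smaller than the last one seen normally signals an error,
// unless the counter has just wrapped from MAX_SEQUENCE back to zero.
// MAX_SEQUENCE and this helper are illustrative, not from the patch.
public class SequenceOrder {
    static final long MAX_SEQUENCE = 1_000_000L; // counter wraps to 0 after this

    public static boolean isOutOfOrder(long lastSeen, long next) {
        if (next > lastSeen) return false;            // normal increasing case
        boolean wrapped = lastSeen == MAX_SEQUENCE && next == 0;
        return !wrapped;                              // smaller id is only OK at the wrap point
    }
}
```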
RE: Review Request 31385: Patch for KAFKA-1978
2015-02-06 00:51:30,975 - INFO - == 2015-02-06 00:51:30,975 - INFO - Exception while running test list index out of range 2015-02-06 00:51:30,975 - INFO - == Traceback (most recent call last): File /mnt/u001/kafka_replication_system_test/system_test/replication_testsuite/replica_basic_test.py, line 434, in runTest kafka_system_test_utils.validate_simple_consumer_data_matched_across_replicas(self.systemTestEnv, self.testcaseEnv) File /mnt/u001/kafka_replication_system_test/system_test/utils/kafka_system_test_utils.py, line 2223, in validate_simple_consumer_data_matched_across_replicas replicaIdxMsgIdList[replicaIdx - 1][topicPartition] = consumerMsgIdList IndexError: list index out of range -Abhishek From: Guozhang Wang [nore...@reviews.apache.org] on behalf of Guozhang Wang [wangg...@gmail.com] Sent: Tuesday, February 24, 2015 6:36 PM To: Abhishek Nigam; Guozhang Wang; kafka Subject: Re: Review Request 31385: Patch for KAFKA-1978 This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31385/ From the diff file itself it's a bit hard to understand the issue and the solution as well. Could you elaborate on an out of bounds exception due to mis-configuration on the ticket? - Guozhang Wang On February 24th, 2015, 11:31 p.m. UTC, Abhishek Nigam wrote: Review request for kafka. By Abhishek Nigam. Updated Feb. 24, 2015, 11:31 p.m. Bugs: KAFKA-1978https://issues.apache.org/jira/browse/KAFKA-1978 Repository: kafka Description Fixing configuration for testcase 0131 Diffs * system_test/replication_testsuite/testcase_0131/testcase_0131_properties.json (0324b6f327cb75389f9f851fa3ca744d22a5d915) View Diffhttps://reviews.apache.org/r/31385/diff/
RE: Review Request 31385: Patch for KAFKA-1978
I added debug information, and the reason is that the size of the array replicaIdxMsgIdList is 2 but the number of files being validated is 3. 2015-02-25 00:53:17,882 - INFO - array size: 2 (kafka_system_test_utils) 2015-02-25 00:53:17,882 - INFO - replicaFactor: 2 (kafka_system_test_utils) 2015-02-25 00:53:17,882 - INFO - replicaIdx: 3 (kafka_system_test_utils) The code tries to index into the array using index 2, but since the size of the array is only 2 we get an array out of bounds exception. -Abhishek From: Abhishek Nigam Sent: Wednesday, February 25, 2015 9:43 AM To: Guozhang Wang; kafka Subject: RE: Review Request 31385: Patch for KAFKA-1978 2015-02-06 00:51:30,975 - INFO - == 2015-02-06 00:51:30,975 - INFO - Exception while running test list index out of range 2015-02-06 00:51:30,975 - INFO - == Traceback (most recent call last): File /mnt/u001/kafka_replication_system_test/system_test/replication_testsuite/replica_basic_test.py, line 434, in runTest kafka_system_test_utils.validate_simple_consumer_data_matched_across_replicas(self.systemTestEnv, self.testcaseEnv) File /mnt/u001/kafka_replication_system_test/system_test/utils/kafka_system_test_utils.py, line 2223, in validate_simple_consumer_data_matched_across_replicas replicaIdxMsgIdList[replicaIdx - 1][topicPartition] = consumerMsgIdList IndexError: list index out of range -Abhishek From: Guozhang Wang [nore...@reviews.apache.org] on behalf of Guozhang Wang [wangg...@gmail.com] Sent: Tuesday, February 24, 2015 6:36 PM To: Abhishek Nigam; Guozhang Wang; kafka Subject: Re: Review Request 31385: Patch for KAFKA-1978 This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/31385/ From the diff file itself it's a bit hard to understand the issue and the solution as well. Could you elaborate on an out of bounds exception due to mis-configuration on the ticket? - Guozhang Wang On February 24th, 2015, 11:31 p.m. UTC, Abhishek Nigam wrote: Review request for kafka. 
By Abhishek Nigam. Updated Feb. 24, 2015, 11:31 p.m. Bugs: KAFKA-1978 https://issues.apache.org/jira/browse/KAFKA-1978 Repository: kafka Description Fixing configuration for testcase 0131 Diffs * system_test/replication_testsuite/testcase_0131/testcase_0131_properties.json (0324b6f327cb75389f9f851fa3ca744d22a5d915) View Diff https://reviews.apache.org/r/31385/diff/
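The off-by-one failure described in this thread boils down to indexing a list sized by the replication factor with a 1-based replica index that exceeds it (size 2, replicaIdx 3, so index 2 is out of bounds). A minimal illustration of the check the misconfiguration violated - this helper is hypothetical, merely mirroring the Python utility's indexing:

```java
// Minimal illustration of the failure mode above: the per-replica list is
// sized by the replication factor (2), but the misconfigured test produced a
// third replica log to validate, so index replicaIdx - 1 == 2 was out of
// bounds. This helper is hypothetical, mirroring the Python utility's indexing.
public class ReplicaIndexCheck {
    // replicaIdx is 1-based; valid list indices are 0 .. replicationFactor - 1.
    public static boolean isValidReplicaIdx(int replicationFactor, int replicaIdx) {
        return replicaIdx >= 1 && replicaIdx - 1 < replicationFactor;
    }
}
```

The fix in the patch adjusts the testcase_0131 configuration so the number of replica logs validated matches the replication factor.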
[jira] [Updated] (KAFKA-1978) Replication test_0131 system test has been failing.
[ https://issues.apache.org/jira/browse/KAFKA-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Nigam updated KAFKA-1978:
----------------------------------
    Attachment: KAFKA-1978.patch

> Replication test_0131 system test has been failing.
> ---------------------------------------------------
>
>                 Key: KAFKA-1978
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1978
>             Project: Kafka
>          Issue Type: Bug
>          Components: system tests
>            Reporter: Abhishek Nigam
>            Assignee: Abhishek Nigam
>         Attachments: KAFKA-1978.patch
>
> Issue is an out of bounds exception due to mis-configuration of the test.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (KAFKA-1978) Replication test_0131 system test has been failing.
[ https://issues.apache.org/jira/browse/KAFKA-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Nigam updated KAFKA-1978:
----------------------------------
    Status: Patch Available  (was: Open)
Review Request 31385: Patch for KAFKA-1978
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31385/
-----------------------------------------------------------

Review request for kafka.

Bugs: KAFKA-1978
    https://issues.apache.org/jira/browse/KAFKA-1978

Repository: kafka

Description
-----------
Fixing configuration for testcase 0131

Diffs
-----
  system_test/replication_testsuite/testcase_0131/testcase_0131_properties.json 0324b6f327cb75389f9f851fa3ca744d22a5d915

Diff: https://reviews.apache.org/r/31385/diff/

Testing
-------

Thanks,
Abhishek Nigam
[jira] [Commented] (KAFKA-1978) Replication test_0131 system test has been failing.
[ https://issues.apache.org/jira/browse/KAFKA-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335663#comment-14335663 ]

Abhishek Nigam commented on KAFKA-1978:
---------------------------------------
Created reviewboard https://reviews.apache.org/r/31385/diff/ against branch origin/trunk
[jira] [Created] (KAFKA-1978) Replication test_0131 system test has been failing.
Abhishek Nigam created KAFKA-1978:
-------------------------------------
             Summary: Replication test_0131 system test has been failing.
                 Key: KAFKA-1978
                 URL: https://issues.apache.org/jira/browse/KAFKA-1978
             Project: Kafka
          Issue Type: Bug
          Components: system tests
            Reporter: Abhishek Nigam
            Assignee: Abhishek Nigam

Issue is an out of bounds exception due to mis-configuration of the test.
[jira] [Updated] (KAFKA-1978) Replication test_0131 system test has been failing.
[ https://issues.apache.org/jira/browse/KAFKA-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Nigam updated KAFKA-1978:
----------------------------------
    Reviewer: Guozhang Wang
      Status: Patch Available  (was: Open)
[jira] [Updated] (KAFKA-1978) Replication test_0131 system test has been failing.
[ https://issues.apache.org/jira/browse/KAFKA-1978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Nigam updated KAFKA-1978:
----------------------------------
    Status: Open  (was: Patch Available)
Re: Review Request 30809: Patch for KAFKA-1888
On Feb. 18, 2015, 12:06 a.m., Mayuresh Gharat wrote:
> core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 207
> https://reviews.apache.org/r/30809/diff/1/?file=859055#file859055line207
> This might end up in an infinite loop if something goes wrong with the cluster, right? Should we have a maximum number of retries? What do you think?

This will not be an issue, since for timed runs we interrupt the thread after a fixed time anyway. This is the mode being used in the upgrade test.

On Feb. 18, 2015, 12:06 a.m., Mayuresh Gharat wrote:
> core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 424
> https://reviews.apache.org/r/30809/diff/1/?file=859055#file859055line424
> Are you assuming that the first argument will be some key?

If you take a look at the script, I am expecting alternating parameters like -timedRun -timeToSpawn. The key is essentially the parameter name.

On Feb. 18, 2015, 12:06 a.m., Mayuresh Gharat wrote:
> core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 474
> https://reviews.apache.org/r/30809/diff/1/?file=859055#file859055line474
> What do you mean by "rebuild state later"?

What I meant was that between two runs of the rolling upgrade test we will not re-use any state from ZooKeeper or the brokers, so I do not need to worry about clean shutdown.

On Feb. 18, 2015, 12:06 a.m., Mayuresh Gharat wrote:
> core/src/main/scala/kafka/tools/ContinuousValidationTest.java, line 77
> https://reviews.apache.org/r/30809/diff/1/?file=859055#file859055line77
> Why do we need a flip?

The flip is needed to reset the get pointer of the byte buffer to the beginning of the buffer; otherwise we will get an underflow exception.

- Abhishek

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30809/#review72786
-----------------------------------------------------------

On Feb. 18, 2015, 1:59 a.m., Abhishek Nigam wrote:

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30809/
-----------------------------------------------------------

(Updated Feb. 18, 2015, 1:59 a.m.)

Review request for kafka.
Bugs: KAFKA-1888
    https://issues.apache.org/jira/browse/KAFKA-1888

Repository: kafka

Description
-----------
Patch for KAFKA-1888

Diffs
-----
  build.gradle 0f0fe60a74542efa91a0e727146e896edcaa38af
  core/src/main/scala/kafka/tools/ContinuousValidationTest.java PRE-CREATION
  system_test/broker_upgrade/bin/kafka-run-class.sh PRE-CREATION
  system_test/broker_upgrade/bin/test.sh PRE-CREATION
  system_test/broker_upgrade/configs/server1.properties PRE-CREATION
  system_test/broker_upgrade/configs/server2.properties PRE-CREATION
  system_test/broker_upgrade/configs/zookeeper_source.properties PRE-CREATION

Diff: https://reviews.apache.org/r/30809/diff/

Testing
-------
Scripted it to run 20 times without any failures.
Command-line: broker-upgrade/bin/test.sh dir1 dir2

Thanks,
Abhishek Nigam
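The "flip" question in the review above concerns Java NIO semantics: ByteBuffer.flip() sets the buffer's limit to the current position and resets the position to zero, so a freshly written buffer can be read back; without it, relative reads start past the written data and can throw BufferUnderflowException. A rough Python analogy of the same read-pointer reset, using io.BytesIO (this is illustrative only, not the reviewed Java code; BytesIO returns empty bytes rather than throwing):

```python
import io

buf = io.BytesIO()
buf.write(b"payload")

# Without rewinding, the read pointer sits at the end of what was just
# written, so a read returns nothing (Java's ByteBuffer would instead
# eventually raise BufferUnderflowException on relative gets).
assert buf.read() == b""

# seek(0) plays the role of ByteBuffer.flip() here: reset the read
# position to the start so the written bytes can be consumed.
buf.seek(0)
assert buf.read() == b"payload"
```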
Re: Review Request 30809: Patch for KAFKA-1888
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30809/
-----------------------------------------------------------

(Updated Feb. 18, 2015, 1:59 a.m.)

Review request for kafka.

Bugs: KAFKA-1888
    https://issues.apache.org/jira/browse/KAFKA-1888

Repository: kafka

Description (updated)
---------------------
Patch for KAFKA-1888

Diffs (updated)
---------------
  build.gradle 0f0fe60a74542efa91a0e727146e896edcaa38af
  core/src/main/scala/kafka/tools/ContinuousValidationTest.java PRE-CREATION
  system_test/broker_upgrade/bin/kafka-run-class.sh PRE-CREATION
  system_test/broker_upgrade/bin/test.sh PRE-CREATION
  system_test/broker_upgrade/configs/server1.properties PRE-CREATION
  system_test/broker_upgrade/configs/server2.properties PRE-CREATION
  system_test/broker_upgrade/configs/zookeeper_source.properties PRE-CREATION

Diff: https://reviews.apache.org/r/30809/diff/

Testing
-------
Scripted it to run 20 times without any failures.
Command-line: broker-upgrade/bin/test.sh dir1 dir2

Thanks,
Abhishek Nigam
[jira] [Assigned] (KAFKA-1778) Create new re-elect controller admin function
[ https://issues.apache.org/jira/browse/KAFKA-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Nigam reassigned KAFKA-1778:
-------------------------------------
    Assignee: Abhishek Nigam

> Create new re-elect controller admin function
> ---------------------------------------------
>
>                 Key: KAFKA-1778
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1778
>             Project: Kafka
>          Issue Type: Sub-task
>            Reporter: Joe Stein
>            Assignee: Abhishek Nigam
>             Fix For: 0.8.3
>
> kafka --controller --elect
Re: Review Request 30809: Patch for KAFKA-1888
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30809/
-----------------------------------------------------------

(Updated Feb. 9, 2015, 11:53 p.m.)

Review request for kafka.

Bugs: KAFKA-1888
    https://issues.apache.org/jira/browse/KAFKA-1888

Repository: kafka

Description (updated)
---------------------
Essentially this test does the following:

a) Start a Java process with three threads:
   - Producer: producing continuously.
   - Consumer: consuming from latest.
   - Bootstrap consumer: started after a pause, to bootstrap from the beginning.
   It uses sequentially increasing numbers and timestamps to make sure we are not receiving out-of-order messages and to do real-time validation.

b) A script which wraps this and takes two directories containing the Kafka version-specific jars:
   kafka_2.10-0.8.3-SNAPSHOT-test.jar
   kafka_2.10-0.8.3-SNAPSHOT.jar
   The first argument is the directory containing the older version of the jars; the second argument is the directory containing the newer version. The reason for choosing directories is that there are two jars in each of these directories.

Diffs
-----
  build.gradle c3e6bb839ad65c512c9db4695d2bb49b82c80da5
  core/src/main/scala/kafka/tools/ContinuousValidationTest.java PRE-CREATION
  system_test/broker_upgrade/bin/kafka-run-class.sh PRE-CREATION
  system_test/broker_upgrade/bin/test.sh PRE-CREATION
  system_test/broker_upgrade/configs/server1.properties PRE-CREATION
  system_test/broker_upgrade/configs/server2.properties PRE-CREATION
  system_test/broker_upgrade/configs/zookeeper_source.properties PRE-CREATION

Diff: https://reviews.apache.org/r/30809/diff/

Testing (updated)
-----------------
Scripted it to run 20 times without any failures.
Command-line: broker-upgrade/bin/test.sh dir1 dir2

Thanks,
Abhishek Nigam
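The real-time validation described in (a) above, using sequentially increasing numbers and timestamps to detect loss and reordering, can be sketched as follows. This is an illustrative Python outline under stated assumptions (validate_stream is a hypothetical helper, not the actual ContinuousValidationTest.java logic):

```python
def validate_stream(messages):
    """Check that consumed (seq, timestamp) pairs arrive in order.

    Sequence numbers are expected to increase by exactly one, and
    timestamps to be non-decreasing. Returns a list of error strings;
    an empty list means the stream passed validation.
    """
    errors = []
    expected_seq, last_ts = None, None
    for seq, ts in messages:
        if expected_seq is not None and seq != expected_seq:
            errors.append("expected seq %d but got %d (lost or reordered)"
                          % (expected_seq, seq))
        if last_ts is not None and ts < last_ts:
            errors.append("timestamp went backwards at seq %d" % seq)
        expected_seq, last_ts = seq + 1, ts
    return errors
```

Running such a check continuously on the consumer side is what lets the test flag data loss during the rolling bounce as it happens, rather than only in a post-run comparison.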
Review Request 30809: Patch for KAFKA-1888
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30809/
-----------------------------------------------------------

Review request for kafka.

Bugs: KAFKA-1888
    https://issues.apache.org/jira/browse/KAFKA-1888

Repository: kafka

Description
-----------
Cleaning up the scripts; forgot to add the build file pulling in the guava library. Fixing build.gradle.

Diffs
-----
  build.gradle c3e6bb839ad65c512c9db4695d2bb49b82c80da5
  core/src/main/scala/kafka/tools/ContinuousValidationTest.java PRE-CREATION
  system_test/broker_upgrade/bin/kafka-run-class.sh PRE-CREATION
  system_test/broker_upgrade/bin/test.sh PRE-CREATION
  system_test/broker_upgrade/configs/server1.properties PRE-CREATION
  system_test/broker_upgrade/configs/server2.properties PRE-CREATION
  system_test/broker_upgrade/configs/zookeeper_source.properties PRE-CREATION

Diff: https://reviews.apache.org/r/30809/diff/

Testing
-------

Thanks,
Abhishek Nigam
[jira] [Assigned] (KAFKA-1888) Add a rolling upgrade system test
[ https://issues.apache.org/jira/browse/KAFKA-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Nigam reassigned KAFKA-1888:
-------------------------------------
    Assignee: Abhishek Nigam

> Add a rolling upgrade system test
> ---------------------------------
>
>                 Key: KAFKA-1888
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1888
>             Project: Kafka
>          Issue Type: Improvement
>          Components: system tests
>            Reporter: Gwen Shapira
>            Assignee: Abhishek Nigam
>             Fix For: 0.9.0
>
> To help test upgrades and compatibility between versions, it will be cool to add a rolling-upgrade test to system tests: Given two versions (just a path to the jars?), check that you can do a rolling upgrade of the brokers from one version to another (using clients from the old version) without losing data.
[jira] [Commented] (KAFKA-1888) Add a rolling upgrade system test
[ https://issues.apache.org/jira/browse/KAFKA-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288206#comment-14288206 ]

Abhishek Nigam commented on KAFKA-1888:
---------------------------------------
Gwen, if you are not actively working on this, can I pick it up?