Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-04-26 Thread Dong Lin
Hey Jun, Ismael, Thanks for all the review! Can you vote for KIP-112 if you are OK with the latest design doc? Thanks, Dong On Thu, Mar 30, 2017 at 3:29 PM, Dong Lin wrote: > Hi all, > > Thanks for all the comments. I am going to open the voting thread if > there is no

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-03-30 Thread Dong Lin
Hi all, Thanks for all the comments. I am going to open the voting thread if there is no further concern with the KIP. Dong On Wed, Mar 15, 2017 at 5:25 PM, Ismael Juma wrote: > Thanks for the updates Dong, they look good to me. > > Ismael > > On Wed, Mar 15, 2017 at 5:50

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-03-15 Thread Ismael Juma
Thanks for the updates Dong, they look good to me. Ismael On Wed, Mar 15, 2017 at 5:50 PM, Dong Lin wrote: > Hey Ismael, > > Sure, I have updated "Changes in Operational Procedures" section in KIP-113 > to specify the problem and solution with known disk failure. And I

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-03-15 Thread Dong Lin
Hey Ismael, Sure, I have updated the "Changes in Operational Procedures" section in KIP-113 to specify the problem and solution with known disk failure. And I updated the "Test Plan" section to note that we have a test in KIP-113 to verify that replicas already created on the good log directories will

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-03-15 Thread Ismael Juma
Hi Dong, Yes, that sounds good to me. I'd list option 2 first since that is safe and, as you said, no worse than what happens today. The file approach is a bit hacky as you said, so it may be a bit fragile. Not sure if we really want to mention that. :) About the note in KIP-112 versus adding

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-03-14 Thread Dong Lin
Hey Ismael, I get your concern that it is more likely for a disk to be slow, or exhibit other forms of non-fatal symptoms, after some known fatal error. Then it is weird for the user to start the broker with the likely-problematic disk in the broker config. In that case, I think there are two things the user

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-03-14 Thread Dong Lin
Hey Ismael, Thanks for the comment. Please see my reply below. On Tue, Mar 14, 2017 at 10:31 AM, Ismael Juma wrote: > Thanks Dong. Comments inline. > > On Fri, Mar 10, 2017 at 6:25 PM, Dong Lin wrote: > > > > I get your point. But I am not sure we

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-03-14 Thread Ismael Juma
Thanks Dong. Comments inline. On Fri, Mar 10, 2017 at 6:25 PM, Dong Lin wrote: > > I get your point. But I am not sure we should recommend user to simply > remove disk from the broker config. If user simply does this without > checking the utilization of good disks, replica

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-03-10 Thread Dong Lin
Hey Ismael, Thanks for your comments. Please see my reply below. On Fri, Mar 10, 2017 at 9:12 AM, Ismael Juma wrote: > Hi Dong, > > Thanks for the updates, they look good. A couple of comments below. > > On Tue, Mar 7, 2017 at 7:30 PM, Dong Lin wrote: >

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-03-10 Thread Ismael Juma
Hi Dong, Thanks for the updates, they look good. A couple of comments below. On Tue, Mar 7, 2017 at 7:30 PM, Dong Lin wrote: > > > > > 3. Another point regarding operational procedures, with a large enough > > cluster, disk failures may not be that uncommon. It may be worth

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-03-07 Thread Dong Lin
Hey Becket, Thanks for the review. 1. I have thought about this before. I think it is fine to delete the node after controller reads it. On controller failover, the new controller will always send LeaderAndIsrRequest for all partitions to each broker in order to learn about offline replicas. 2.

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-03-07 Thread Becket Qin
Hi Dong, Thanks for the KIP, a few more comments: 1. In the KIP wiki section "A log directory stops working on a broker during runtime", the controller deletes the notification node right after it reads the znode. It seems safer to do this last so that even if the controller fails, the new
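To make the ordering concern concrete, here is a minimal controller-side sketch (hypothetical class and method names, not the actual controller code) in which the notification znode is deleted only after the event has been processed, so a controller failover between the two steps simply re-reads the still-present notification:

    import org.apache.zookeeper.ZooKeeper;

    // Hypothetical sketch: process the log-dir-failure notification first and delete
    // the znode last, as suggested above, so that a controller failover between the
    // two steps cannot lose the event.
    class LogDirNotificationHandler {
        private final ZooKeeper zk;

        LogDirNotificationHandler(ZooKeeper zk) { this.zk = zk; }

        void handle(String notificationPath) throws Exception {
            byte[] data = zk.getData(notificationPath, false, null); // which broker reported a failed log dir
            processOfflineReplicas(data);                            // e.g. send LeaderAndIsrRequest, update replica state
            zk.delete(notificationPath, -1);                         // delete only after processing has succeeded
        }

        private void processOfflineReplicas(byte[] notification) {
            // placeholder for the controller-side handling described in the KIP
        }
    }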

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-03-07 Thread Dong Lin
Hey Ismael, Thanks much for taking time to review the KIP and read through all the discussion! Please see my reply inline. On Tue, Mar 7, 2017 at 9:47 AM, Ismael Juma wrote: > Hi Dong, > > It took me a while, but I finally went through the whole thread. I have a > few minor

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-03-07 Thread Ismael Juma
Hi Dong, It took me a while, but I finally went through the whole thread. I have a few minor comments: 1. Regarding the metrics, can we include the full name (e.g. kafka.cluster:type=Partition,name=InSyncReplicasCount,topic={topic},partition={partition} was defined in KIP-96)? 2. We talk about

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-03-05 Thread Dong Lin
Hey Jun, I am happy to work for a few days if that is what it takes to discuss KIP-113. But if it takes 2+ weeks to discuss KIP-113, I am wondering if we can vote for KIP-112 first. We aim to start the JBOD test in a test cluster by end of Q1 and there are only three weeks until then. If we can miss the

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-03-02 Thread Jun Rao
Hi, Dong, Ok. We can keep LeaderAndIsrRequest as it is in the wiki. Since we need both KIP-112 and KIP-113 to make a compelling case for JBOD, perhaps we should discuss KIP-113 before voting for both? I left some comments in the other thread. Thanks, Jun On Wed, Mar 1, 2017 at 1:58 PM, Dong

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-03-01 Thread Dong Lin
Hey Jun, Do you think it is OK to keep the existing wire protocol in the KIP? I am wondering if we can initiate vote for this KIP. Thanks, Dong On Tue, Feb 28, 2017 at 2:41 PM, Dong Lin wrote: > Hey Jun, > > I just realized that StopReplicaRequest itself doesn't specify

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-28 Thread Dong Lin
Hey Jun, I just realized that StopReplicaRequest itself doesn't specify the replicaId in the wire protocol. Thus the controller would need to log the brokerId with StopReplicaRequest in the log, and it may be reasonable for the controller to do the same with LeaderAndIsrRequest and only specify the

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-28 Thread Dong Lin
Hi Jun, Yeah, there is a tradeoff between the controller's implementation complexity vs. wire-protocol complexity. I personally think it is more important to keep the wire protocol concise and only add information to the wire protocol if necessary. It seems fine to add a little bit of complexity to the controller's

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-28 Thread Jun Rao
Hi, Dong, 52. What you suggested would work. However, I am thinking that it's probably simpler to just set isNewReplica at the replica level. That way, the LeaderAndIsrRequest can be created a bit simpler. When reading a LeaderAndIsrRequest in the controller log, it's easier to see which replicas
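Purely for illustration, a sketch of what carrying the flag at the replica level could look like; the class and field names below are assumptions, not the actual LeaderAndIsrRequest schema:

    import java.util.List;

    // Hypothetical per-replica entry: each replica carries its own isNewReplica flag,
    // so a logged LeaderAndIsrRequest directly shows which replicas are newly created.
    class ReplicaStateSketch {
        final int brokerId;
        final boolean isNewReplica; // true if the replica was in the NewReplica state when the request was built

        ReplicaStateSketch(int brokerId, boolean isNewReplica) {
            this.brokerId = brokerId;
            this.isNewReplica = isNewReplica;
        }
    }

    // Hypothetical partition entry referencing the per-replica states above.
    class PartitionStateSketch {
        final int leader;
        final int leaderEpoch;
        final List<Integer> isr;
        final List<ReplicaStateSketch> replicas;

        PartitionStateSketch(int leader, int leaderEpoch, List<Integer> isr, List<ReplicaStateSketch> replicas) {
            this.leader = leader;
            this.leaderEpoch = leaderEpoch;
            this.isr = isr;
            this.replicas = replicas;
        }
    }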

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-28 Thread Eno Thereska
Thanks Todd for the explanation. Eno > On 28 Feb 2017, at 18:15, Todd Palino wrote: > > We have tested RAID 5/6 in the past (and recently) and found it to be > lacking. So, as noted, rebuild takes more time than RAID 10 because all the > disks need to be accessed to

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-28 Thread Dong Lin
Hey Jun, Certainly, I have added Todd to reply to the thread. And I have updated the item in the wiki. 50. The full statement is "Broker assumes a log directory to be good after it starts, and mark log directory as bad once there is IOException when broker attempts to access (i.e. read or

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-28 Thread Todd Palino
We have tested RAID 5/6 in the past (and recently) and found it to be lacking. So, as noted, rebuild takes more time than RAID 10 because all the disks need to be accessed to recalculate parity. In addition, there’s a significant performance loss just in normal operations. It’s been a while since

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-28 Thread Jun Rao
Hi, Dong, RAID6 is an improvement over RAID5 and can tolerate 2 disk failures. Eno's point is that the rebuild of RAID5/RAID6 requires reading more data compared with RAID10, which increases the probability of error during rebuild. This makes sense. In any case, do you think you could ask the

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-28 Thread Eno Thereska
Makes sense, thank you Dong. Eno > On 28 Feb 2017, at 01:51, Dong Lin wrote: > > Hi Jun, > > In addition to Eno's reference of why rebuild time with RAID-5 is more > expensive, another concern is that RAID-5 will fail if more than one disk > fails. JBOD still works

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-27 Thread Dong Lin
Hi Jun, In addition to Eno's reference of why rebuild time with RAID-5 is more expensive, another concern is that RAID-5 will fail if more than one disk fails. JBOD still works with 1+ disk failures and has better performance with one disk failure. These seem like good arguments for using

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-27 Thread Eno Thereska
RAID-10's code is much simpler (just stripe plus mirror) and under failure the recovery is much faster since it just has to read from a mirror, not several disks to reconstruct the data. Of course, the price paid is that mirroring is more expensive in terms of storage space. E.g., see

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-27 Thread Jun Rao
Hi, Eno, Thanks for the pointers. Doesn't RAID-10 have a similar issue during rebuild? In both cases, all data on existing disks has to be read during rebuild? RAID-10 seems to still be used widely. Jun On Mon, Feb 27, 2017 at 1:38 PM, Eno Thereska wrote: >

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-27 Thread Eno Thereska
Unfortunately RAID-5/6 is not typically advised anymore due to failure issues, as Dong mentions, e.g.: http://www.zdnet.com/article/why-raid-6-stops-working-in-2019/ Eno > On 27 Feb 2017, at 21:16, Jun Rao

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-27 Thread Jun Rao
Hi, Dong, For RAID5, I am not sure the rebuild cost is a big concern. If a disk fails, typically an admin has to bring down the broker, replace the failed disk with a new one, trigger the RAID rebuild, and bring up the broker. This way, there is no performance impact at runtime due to rebuild.

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-25 Thread Dong Lin
Hey Jun, Thanks for the suggestion. I think it is a good idea to not put the created flag in ZK and simply specify isNewReplica=true in LeaderAndIsrRequest if the replica was in NewReplica state. It will only fail the replica creation in the scenario that the controller fails after

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-25 Thread Jun Rao
Hi, Dong, Thanks for the reply. Personally, I'd prefer not to write the created flag per replica in ZK. Your suggestion of disabling replica creation if there is a bad log directory on the broker could work. The only thing is that it may delay the creation of new replicas. I was thinking that an

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-24 Thread Dong Lin
Hey Jun, I don't think we should allow failed replicas to be re-created on the good disks. Say there are 2 disks and each of them is 51% loaded. If any disk fails and we allow its replicas to be re-created on the other disk, both disks will fail. Alternatively we can disable replica creation if

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-23 Thread Dong Lin
Hey Jun, I think there is one simpler design that doesn't need to add "create" flag in LeaderAndIsrRequest and also remove the need for controller to track/update which replicas are created. The idea is for each broker to persist the created replicas in per-broker-per-topic znode. When a replica

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-23 Thread Dong Lin
Hey Jun, Sure, here is my explanation. Design B would not work if it doesn't store created replicas in ZK. For example, say broker B is healthy when it is shut down. At this moment no offline replica is written in ZK for this broker. Suppose a log directory is damaged while the broker is offline,

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-23 Thread Dong Lin
Hey Jun, Thanks for your reply! Let me first comment on the things that you listed as advantages of B over A. 1) No change in LeaderAndIsrRequest protocol. I agree with this. 2) Step 1. One less round of LeaderAndIsrRequest and no additional ZK writes to record the created flag. I don't think

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-23 Thread Jun Rao
Hi, Dong, Just so that we are on the same page. Let me spec out the alternative design a bit more and then compare. Let's call the current design A and the alternative design B. Design B: New ZK path: failed log directory path (persistent). This is created by a broker when a log directory fails

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-22 Thread Dong Lin
Hey Jun, Thanks much for the explanation. I have some questions about 21 but that is less important than 20. 20 would require considerable change to the KIP and probably requires weeks to discuss again. Thus I would like to be very sure that we agree on the problems with the current design as you

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-22 Thread Jun Rao
Hi, Dong, Jiangjie, 20. (1) I agree that ideally we'd like to use direct RPC for broker-to-broker communication instead of ZK. However, in the alternative design, the failed log directory path also serves as the persistent state for remembering the offline partitions. This is similar to the

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-21 Thread Dong Lin
Hey Jun, Motivated by your suggestion, I think we can also store the information of created replicas in per-broker znode at /brokers/created_replicas/ids/[id]. Does this sound good? Regards, Dong On Tue, Feb 21, 2017 at 2:37 PM, Dong Lin wrote: > Hey Jun, > > Thanks much
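A minimal sketch of writing such a per-broker znode with the plain ZooKeeper client; the path comes from the proposal above, but the payload format and class name are assumptions for illustration only:

    import java.nio.charset.StandardCharsets;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    // Hypothetical sketch: persist the replicas a broker has created under
    // /brokers/created_replicas/ids/[brokerId], as proposed above. The payload
    // format is an assumption, not a format defined in the KIP.
    class CreatedReplicasStore {
        private final ZooKeeper zk;

        CreatedReplicasStore(ZooKeeper zk) { this.zk = zk; }

        void write(int brokerId, String createdReplicasJson) throws Exception {
            String path = "/brokers/created_replicas/ids/" + brokerId;
            byte[] data = createdReplicasJson.getBytes(StandardCharsets.UTF_8);
            if (zk.exists(path, false) == null) {
                zk.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            } else {
                zk.setData(path, data, -1); // -1 matches any znode version
            }
        }
    }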

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-21 Thread Dong Lin
Hey Jun, Thanks much for your comments. I actually proposed the design to store both offline replicas and created replicas in per-broker znode before switching to the design in the current KIP. The current design stores created replicas in per-partition znode and transmits offline replicas via

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-21 Thread Becket Qin
Hey Jun, Using Zookeeper to propagate the offline replica state as you suggest sounds like a simpler approach. However, I am wondering if we want to avoid using zookeeper to propagate information. In the past this caused a lot of problems for us, including missing notifications, performance issues,

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-20 Thread Jun Rao
Hi, Dong, Sorry for the delay. A few more comments. 20. One complexity that I found in the current KIP is that the way the broker communicates failed replicas to the controller is inefficient. When a log directory fails, the broker only sends an indication through ZK to the controller and the
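For context, a minimal broker-side sketch of the ZK indication being discussed; the znode path, payload, and class name are illustrative assumptions rather than the exact scheme in the KIP:

    import java.nio.charset.StandardCharsets;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    // Hypothetical sketch: when a log directory fails, the broker creates a sequential
    // notification znode that the controller watches; the controller then has to query
    // the broker (e.g. via LeaderAndIsrRequest) to learn which replicas are offline.
    class LogDirFailureNotifier {
        private final ZooKeeper zk;

        LogDirFailureNotifier(ZooKeeper zk) { this.zk = zk; }

        void notifyController(int brokerId) throws Exception {
            byte[] payload = Integer.toString(brokerId).getBytes(StandardCharsets.UTF_8);
            // Sequential node so concurrent failure events queue up instead of overwriting each other.
            zk.create("/log_dir_event_notification/event_", payload,
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT_SEQUENTIAL);
        }
    }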

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-18 Thread Dong Lin
Hey Jun, Could you please let me know if the solutions above could address your concern? I really want to move the discussion forward. Thanks, Dong On Tue, Feb 14, 2017 at 8:17 PM, Dong Lin wrote: > Hey Jun, > > Thanks for all your help and time to discuss this KIP. When

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-14 Thread Dong Lin
Hey Jun, Thanks for all your help and time to discuss this KIP. When you get the time, could you let me know if the previous answers address the concern? I think the more interesting question in your last email is where we should store the "created" flag in ZK. I proposed the solution that I

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-14 Thread Dong Lin
Hey Jun, I just realized that you may be suggesting that a tool for listing offline directories is necessary for KIP-112 by asking whether KIP-112 and KIP-113 will be in the same release. I think such a tool is useful but doesn't have to be included in KIP-112. This is because as of now admin

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-13 Thread Dong Lin
And the test plan has also been updated to simulate disk failure by changing log directory permission to 000. On Mon, Feb 13, 2017 at 5:50 PM, Dong Lin wrote: > Hi Jun, > > Thanks for the reply. These comments are very helpful. Let me answer them > inline. > > > On Mon, Feb
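A minimal sketch of how a test could simulate this (hypothetical helper, not the actual test code): revoking all permissions on the log directory so the broker's next access attempt fails with an IOException.

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.attribute.PosixFilePermission;
    import java.util.Collections;
    import java.util.Set;

    // Hypothetical test helper: set the log directory permissions to 000 so that
    // subsequent reads/writes by the broker fail with an IOException.
    class DiskFailureSimulator {
        static void makeLogDirInaccessible(Path logDir) throws IOException {
            Set<PosixFilePermission> none = Collections.emptySet();
            Files.setPosixFilePermissions(logDir, none); // equivalent to chmod 000 on POSIX systems
        }
    }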

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-13 Thread Dong Lin
Hi Jun, Thanks for the reply. These comments are very helpful. Let me answer them inline. On Mon, Feb 13, 2017 at 3:25 PM, Jun Rao wrote: > Hi, Dong, > > Thanks for the reply. A few more replies and new comments below. > > On Fri, Feb 10, 2017 at 4:27 PM, Dong Lin

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-13 Thread Jun Rao
Hi, Dong, Thanks for the reply. A few more replies and new comments below. On Fri, Feb 10, 2017 at 4:27 PM, Dong Lin wrote: > Hi Jun, > > Thanks for the detailed comments. Please see answers inline: > > On Fri, Feb 10, 2017 at 3:08 PM, Jun Rao wrote: >

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-10 Thread Dong Lin
Hi Jun, Thanks for the detailed comments. Please see answers inline: On Fri, Feb 10, 2017 at 3:08 PM, Jun Rao wrote: > Hi, Dong, > > Thanks for the updated wiki. A few comments below. > > 1. Topics get created > 1.1 Instead of storing successfully created replicas in ZK,

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-10 Thread Jun Rao
Hi, Dong, Thanks for the updated wiki. A few comments below. 1. Topics get created 1.1 Instead of storing successfully created replicas in ZK, could we store unsuccessfully created replicas in ZK? Since the latter is less common, it probably reduces the load on ZK. 1.2 If an error is received

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-10 Thread Dong Lin
Hi Jun, Can I replace zookeeper access with direct RPC for both ISR notification and disk failure notification in a future KIP, or do you feel we should do it in this KIP? Hi Eno, Grant and everyone, Is there further improvement you would like to see with this KIP? Thank you all for the

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-09 Thread Dong Lin
On Thu, Feb 9, 2017 at 3:37 PM, Colin McCabe wrote: > On Thu, Feb 9, 2017, at 11:40, Dong Lin wrote: > > Thanks for all the comments Colin! > > > > To answer your questions: > > - Yes, a broker will shutdown if all its log directories are bad. > > That makes sense. Can you

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-09 Thread Colin McCabe
On Thu, Feb 9, 2017, at 11:40, Dong Lin wrote: > Thanks for all the comments Colin! > > To answer your questions: > - Yes, a broker will shutdown if all its log directories are bad. That makes sense. Can you add this to the writeup? > - I updated the KIP to explicitly state that a log

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-09 Thread Dong Lin
Thanks for all the comments Colin! To answer your questions: - Yes, a broker will shut down if all its log directories are bad. - I updated the KIP to explicitly state that a log directory will be assumed to be good until the broker sees an IOException when it tries to access the log directory. -
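A minimal broker-side sketch of that rule (hypothetical names, not the actual broker code): every log directory starts out as good, and is marked bad only once an IOException is observed while accessing it.

    import java.io.IOException;
    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical sketch: a directory is assumed good until an IOException is seen
    // while accessing it; after that it is treated as offline until the broker restarts.
    class LogDirHealthTracker {
        private final Set<String> offlineLogDirs = ConcurrentHashMap.newKeySet();

        boolean isOnline(String logDir) {
            return !offlineLogDirs.contains(logDir);
        }

        // Called from the I/O path when a read or write against logDir throws.
        void markOffline(String logDir, IOException cause) {
            if (offlineLogDirs.add(logDir)) {
                // First failure for this directory: stop serving its replicas and
                // notify the controller, as described in the KIP.
            }
        }
    }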

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-09 Thread Colin McCabe
On Thu, Feb 9, 2017, at 11:03, Colin McCabe wrote: > Thanks, Dong L. > > Do we plan on bringing down the broker process when all log directories > are offline? > > Can you explicitly state on the KIP that the log dirs are all considered > good after the broker process is bounced? It seems like

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-09 Thread Colin McCabe
Thanks, Dong L. Do we plan on bringing down the broker process when all log directories are offline? Can you explicitly state on the KIP that the log dirs are all considered good after the broker process is bounced? It seems like an important thing to be clear about. Also, perhaps discuss how

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-08 Thread Dong Lin
Hi all, Thank you all for the helpful suggestions. I have updated the KIP to address the comments received so far. See here to read the changes to the KIP. Here is a summary of the changes: - Updated the Proposed

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-07 Thread Dong Lin
Hey Eno, Thanks much for the comment! I still think the complexity added to Kafka is justified by its benefit. Let me provide my reasons below. 1) The additional logic is easy to understand and thus its complexity should be reasonable. On the broker side, it needs to catch exception when

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-07 Thread Dong Lin
Hey Jun, Thanks for all the comments. I should have written the summary earlier but got delayed. I think Grant has summarized pretty much every major issue we discussed in the KIP meeting. I have provided an answer to each issue. Let me try to address your questions here. I will update the KIP

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-07 Thread Dong Lin
Hey Grant, Thanks much for the detailed summary! Yes, this is pretty much my understanding of the KIP meeting. I also think everyone agreed on the point you outlined in the email. Here is my reply to the five issues you mentioned. 1) Automatic vs Manual Recovery In the case where a disk is

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-07 Thread Jun Rao
Hi, Dong, Thanks for the discussion in the KIP meeting today. A few comments inlined below. On Mon, Feb 6, 2017 at 7:22 PM, Dong Lin wrote: > Hey Jun, > > Thanks for the review! Please see reply inline. > > On Mon, Feb 6, 2017 at 6:21 PM, Jun Rao wrote:

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-07 Thread Grant Henke
Hi Dong, Thanks for proposing the KIP and all the hard work on it! In order to help summarize the discussion from the KIP call today I wanted to list the things I heard as the main discussion points that people would like to be considered or discussed. However, this is strictly from memory so

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-07 Thread Eno Thereska
Hi Dong, To simplify the discussion today, on my part I'll zoom into one thing only: - I'll discuss the options below: "one-broker-per-disk" or "one-broker-per-few-disks". - I completely buy the JBOD vs RAID arguments so there is no need to discuss that part for me. I buy it that

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-06 Thread Dong Lin
Hey Jun, Thanks for the review! Please see reply inline. On Mon, Feb 6, 2017 at 6:21 PM, Jun Rao wrote: > Hi, Dong, > > Thanks for the proposal. A few quick questions/comments. > > 1. Do you know why your stress test loses 15% of the throughput with the > one-broker-per-disk

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-06 Thread Jun Rao
Hi, Dong, Thanks for the proposal. A few quick questions/comments. 1. Do you know why your stress test loses 15% of the throughput with the one-broker-per-disk setup? 2. In the KIP, it wasn't super clear to me what /broker/topics/[topic]/partitions/[partitionId]/controller_managed_state

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-02 Thread Dong Lin
Hey Eno, I forgot that. Sure, that works for us. Thanks, Dong On Thu, Feb 2, 2017 at 2:03 AM, Eno Thereska wrote: > Hi Dong, > > The KIP meetings are traditionally held at 11am. Would that also work? So > Tuesday 7th at 11am? > > Thanks > Eno > > > On 2 Feb 2017, at

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-02 Thread Eno Thereska
Hi Dong, The KIP meetings are traditionally held at 11am. Would that also work? So Tuesday 7th at 11am? Thanks Eno > On 2 Feb 2017, at 02:53, Dong Lin wrote: > > Hey Eno, Colin, > > Would you have time next Tuesday morning to discuss the KIP? How about 10 - > 11 am? >

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-01 Thread Dong Lin
Sorry for the typo. I mean that before the KIP meeting, please feel free to provide comments in this email thread so that discussion in the KIP meeting can be more efficient. On Wed, Feb 1, 2017 at 6:53 PM, Dong Lin wrote: > Hey Eno, Colin, > > Would you have time next

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-01 Thread Dong Lin
Hey Eno, Colin, Would you have time next Tuesday morning to discuss the KIP? How about 10 - 11 am? To make the best use of our time, can you please invite one or more committers from Confluent to join the meeting? I hope the KIP can receive one or more +1 from committers at Confluent if we have no

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-01 Thread Dong Lin
Hey Colin, Thanks much for the comment. Please see my reply inline. On Wed, Feb 1, 2017 at 1:54 PM, Colin McCabe wrote: > On Wed, Feb 1, 2017, at 11:35, Dong Lin wrote: > > Hey Grant, Colin, > > > > My bad, I misunderstood Grant's suggestion initially. Indeed this is a > >

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-01 Thread Colin McCabe
On Wed, Feb 1, 2017, at 11:35, Dong Lin wrote: > Hey Grant, Colin, > > My bad, I misunderstood Grant's suggestion initially. Indeed this is a > very > interesting idea to just wait for replica.max.lag.ms for the replica on > the > bad disk to drop out of ISR instead of having broker actively

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-01 Thread Dong Lin
Hey Grant, Colin, My bad, I misunderstood Grant's suggestion initially. Indeed this is a very interesting idea to just wait for replica.max.lag.ms for the replica on the bad disk to drop out of ISR instead of having the broker actively report this to the controller. I have several concerns with

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-01 Thread Eno Thereska
Hi Dong, Would it make sense to do a discussion over video/voice about this? I think it's sufficiently complex that we can probably make quicker progress that way? So shall we do a KIP meeting soon? I can do this week (Thu/Fri) or next week. Thanks Eno > On 1 Feb 2017, at 18:29, Colin McCabe

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-01 Thread Colin McCabe
Hmm. Maybe I misinterpreted, but I got the impression that Grant was suggesting that we avoid introducing this concept of "offline replicas" for now. Is that feasible? What is the strategy for declaring a log directory bad? Is it an administrative action? Or is the broker itself going to be

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-01 Thread Dong Lin
Hey Grant, Yes, this KIP does exactly what you described:) Thanks, Dong On Wed, Feb 1, 2017 at 9:45 AM, Grant Henke wrote: > Hi Dong, > > Thanks for putting this together. > > Since we are discussing alternative/simplified options. Have you considered > handling the disk

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-01 Thread Grant Henke
Hi Dong, Thanks for putting this together. Since we are discussing alternative/simplified options. Have you considered handling the disk failures broker side to prevent a crash, marking the disk as "bad" to that individual broker, and continuing as normal? I imagine the broker would then fall

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-01 Thread Dong Lin
Hey Eno, Thanks much for the review. I think your suggestion is to split the disks of a machine into multiple disk sets and run one broker per disk set. Yeah, this is similar to Colin's suggestion of one-broker-per-disk, which we have evaluated at LinkedIn and consider to be a good short term

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-01 Thread Eno Thereska
I'm coming somewhat late to the discussion, apologies for that. I'm worried about this proposal. It's moving Kafka to a world where it manages disks. So in a sense, the scope of the KIP is limited, but the direction it sets for Kafka is quite a big step change. Fundamentally this is about

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-01-31 Thread Dong Lin
Hi all, I am going to initiate the vote If there is no further concern with the KIP. Thanks, Dong On Fri, Jan 27, 2017 at 8:08 PM, radai wrote: > a few extra points: > > 1. broker per disk might also incur more client <--> broker sockets: > suppose every producer

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-01-27 Thread radai
a few extra points: 1. broker per disk might also incur more client <--> broker sockets: suppose every producer / consumer "talks" to >1 partition, there's a very good chance that partitions that were co-located on a single 10-disk broker would now be split between several single-disk broker

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-01-26 Thread Dong Lin
Hey Colin, Thanks much for the comment. Please see my comment inline. On Thu, Jan 26, 2017 at 10:15 AM, Colin McCabe wrote: > On Wed, Jan 25, 2017, at 13:50, Dong Lin wrote: > > Hey Colin, > > > > Good point! Yeah we have actually considered and tested this solution, > >

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-01-26 Thread Colin McCabe
On Wed, Jan 25, 2017, at 13:50, Dong Lin wrote: > Hey Colin, > > Good point! Yeah we have actually considered and tested this solution, > which we call one-broker-per-disk. It would work and should require no > major change in Kafka as compared to this JBOD KIP. So it would be a good > short term

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-01-25 Thread Dong Lin
Hey Colin, Good point! Yeah we have actually considered and tested this solution, which we call one-broker-per-disk. It would work and should require no major change in Kafka as compared to this JBOD KIP. So it would be a good short term solution. But it has a few drawbacks which make it less

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-01-25 Thread Colin McCabe
Hi Dong, Thanks for the writeup! It's very interesting. I apologize in advance if this has been discussed somewhere else. But I am curious if you have considered the solution of running multiple brokers per node. Clearly there is a memory overhead with this solution because of the fixed cost

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-01-24 Thread Guozhang Wang
Thanks for the detailed explanations Dong. That makes sense to me. Guozhang On Sun, Jan 22, 2017 at 4:00 PM, Dong Lin wrote: > Hey Guozhang, > > Thanks for the review! Yes we have considered this approach and briefly > explained why we don't do it in the rejected

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-01-22 Thread Dong Lin
Hey Guozhang, Thanks for the review! Yes we have considered this approach and briefly explained why we don't do it in the rejected alternatives section. Here is my concern with this approach in more detail: - This approach introduces tight coupling between Kafka's logical leader election and

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-01-22 Thread Guozhang Wang
I think it also affects the KIP-113 design, but I'll just leave it as a single comment here. On Sun, Jan 22, 2017 at 10:50 AM, Guozhang Wang wrote: > Hello Dong, > > Thanks for the very well written KIP. I had a general thought on the ZK > path management, wondering if the following

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-01-22 Thread Guozhang Wang
Hello Dong, Thanks for the very well written KIP. I had a general thought on the ZK path management, wondering if the following alternative would work: 1. Bump up versions in "brokers/topics/[topic]" and "/brokers/topics/[topic]/partitions/[partitionId]/state" to 2, in which the replica id is no

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-01-12 Thread Ismael Juma
Thanks for the KIP. Just wanted to quickly say that it's great to see proposals for improving JBOD (KIP-113 too). More feedback soon, hopefully. Ismael On Thu, Jan 12, 2017 at 6:46 PM, Dong Lin wrote: > Hi all, > > We created KIP-112: Handle disk failure for JBOD. Please

[DISCUSS] KIP-112: Handle disk failure for JBOD

2017-01-12 Thread Dong Lin
Hi all, We created KIP-112: Handle disk failure for JBOD. Please find the KIP wiki in the link https://cwiki.apache.org/confluence/display/KAFKA/KIP-112%3A+Handle+disk+failure+for+JBOD. This KIP is related to KIP-113