Re: Fine Grained Scaling and Hadoop-2.7.
Ah, I'm understanding better now. Leaving the (2G,1CPU) unused is certainly flawed and undesirable. I'm not opposed to the idea of an initial/minimum profile size that grows and shrinks but never goes below its initial/minimum capacity.

As for your concern: a recently completed task gives up its unnamed resources, like CPU and memory, without knowing or caring where they go. There is no distinction between the CPU from one task and the CPU from another. First priority goes to maintaining the minimum capacity; anything beyond that can be offered back to Mesos (perhaps after some timeout, to promote reuse). The only concern might be with named resources like ports or persistent volumes.

Why do you worry that Myriad needs to figure out which container is associated with which offer/profile? Isn't it already tracking the YARN containers? How else does it know when to release resources?

That said, a zero profile also makes sense, as does mixing profiles of different sizes (including zero and non-zero) within a cluster. You could restrict dynamic NM resizing to zero-profile NMs for starters, but I'd imagine we'd want them all to be resizable in the future.

On Fri, Jul 10, 2015 at 6:47 PM, Santosh Marella smare...@maprtech.com wrote:

> a) Give the executor at least a minimal 0.01cpu, 1MB RAM

Myriad does this already. The problem is not with the executor's capacity.

> b) ... I don't think I understand your zero profile use case

Let's take an example. Say the low profile corresponds to (2G,1CPU). When Myriad wants to launch an NM with the low profile, it waits for a Mesos offer that can hold an executor, a Java process for the NM, and the (2G,1CPU) capacity that the NM can advertise to the RM for launching future YARN containers.

With CGS, when the NM registers with the RM, the YARN scheduler believes the NM has (2G,1CPU) and can therefore allocate containers worth (2G,1CPU) when apps require them.

With FGS, the YARN scheduler believes the NM has (0G,0CPU). This is because FGS intercepts the NM's registration with the RM and sets the NM's advertised capacity to (0G,0CPU), although the NM originally started with (2G,1CPU). At this point, the YARN scheduler cannot allocate containers to this NM. Subsequently, when Mesos offers resources on the same slave node, FGS increases the NM's capacity and notifies the RM that the NM now has capacity available. For example, if (5G,4CPU) is offered to Myriad, FGS notifies the RM that the NM now has (5G,4CPU), and the RM can then allocate containers worth (5G,4CPU) to this NM.

If you now count the total resources Myriad has consumed from the given slave node, you'll see that Myriad never utilizes the (2G,1CPU) [the low profile size] that was obtained at the NM's launch time. The notion of a zero profile tries to eliminate this wastage by allowing the NM to be launched with an advertisable capacity of (0G,0CPU) in the first place.

Why does FGS change the NM's initial capacity from (2G,1CPU) to (0G,0CPU)? That's how it has worked until now, but it need not stay that way. FGS could choose not to reset the NM's capacity to (0G,0CPU), and instead allow the NM to grow beyond its initial capacity of (2G,1CPU) and shrink back to (2G,1CPU). I tried this approach recently, but it leads to other problems (mentioned under option #1 in my first email) that seemed more complex than going with a zero profile.

> c) ... We should still investigate pushing a disable flag into YARN.

Absolutely. It makes total sense to turn off admission restriction for auto-scaling YARN clusters.
FWIW, I will be sending out a PR shortly from my private issue_14 branch with the changes I've made so far. Comments/suggestions are welcome!

Thanks,
Santosh

On Fri, Jul 10, 2015 at 11:44 AM, Adam Bordelon a...@mesosphere.io wrote:

a) Give the executor at least a minimal 0.01cpu, 1MB RAM, since the executor itself will use some resources, and Mesos gets confused when the executor claims no resources. See https://issues.apache.org/jira/browse/MESOS-1807

b) I agree 100% with needing a way to enable/disable FGS vs. CGS, but I don't think I understand your zero profile use case. I'd recommend going with a simple enable/disable flag for the MVP; we can extend it later if/when necessary.

c) Interesting. It seems like a hacky workaround for the admission control problem, but I'm intrigued by its complexities and capabilities for other scenarios. We should still investigate pushing a disable flag into YARN (YARN-2604, YARN-3079). YARN-2604 seems to have been added for a genuine problem, where an app's AM container size exceeds the size of the largest NM node in the cluster. It still needs a way to be disabled, though, because an auto-scaling Hadoop cluster wouldn't worry about insufficient capacity. It would just make more.

On Fri, Jul 10, 2015 at 11:13 AM, Santosh Marella smare...@maprtech.com wrote:

Good point. YARN seems to have added this admission control as part of YARN-2604, YARN-3079.
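For concreteness, the grow/shrink behavior discussed above can be sketched in a few lines of self-contained Java. This is a hypothetical illustration only (the class and method names are invented, and real Myriad would also have to track the placeholder Mesos tasks): an advertised capacity that starts at a floor, grows with Mesos offers, and never shrinks below the floor when containers finish.

    // Hypothetical sketch -- names are illustrative, not Myriad's internals.
    final class NodeCapacity {
        private final int minMemMb;  // floor: 2048 for the "low" profile, 0 for zero profile
        private final int minCpus;
        private int memMb;           // capacity currently advertised to the RM
        private int cpus;

        NodeCapacity(int minMemMb, int minCpus) {
            this.minMemMb = minMemMb;
            this.minCpus = minCpus;
            // At registration the NM advertises only its floor: (0,0) for a
            // zero-profile NM, or the minimum profile in the grow/shrink variant.
            this.memMb = minMemMb;
            this.cpus = minCpus;
        }

        // A Mesos offer on the same slave grows the advertised capacity.
        void onOffer(int offerMemMb, int offerCpus) {
            memMb += offerMemMb;
            cpus += offerCpus;
        }

        // A finished container returns resources; the floor is maintained
        // first, and only the surplus is released back to Mesos (perhaps
        // after a timeout, to promote reuse). Returns {memMb, cpus} released.
        int[] onContainerFinished(int freedMemMb, int freedCpus) {
            int relMem = Math.max(0, Math.min(freedMemMb, memMb - minMemMb));
            int relCpu = Math.max(0, Math.min(freedCpus, cpus - minCpus));
            memMb -= relMem;
            cpus -= relCpu;
            return new int[] { relMem, relCpu };
        }

        public static void main(String[] args) {
            NodeCapacity nm = new NodeCapacity(0, 0); // zero-profile NM
            nm.onOffer(5120, 4);                      // Mesos offers (5G,4CPU)
            // RM now sees (5G,4CPU) and can allocate containers worth that much.
            int[] released = nm.onContainerFinished(5120, 4);
            System.out.println(released[0] + "MB, " + released[1] + "CPU back to Mesos");
        }
    }

With a (2048,1) floor instead of (0,0), the same code models the grow/shrink variant Santosh tried: the NM expands with offers but never advertises less than its launch profile.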
Re: Fine Grained Scaling and Hadoop-2.7.
> The only options I can imagine are to a) use fixed-size NMs that cannot grow, alongside the elastic zero-profile NMs; or b) disable admission control in the RM so this isn't a problem. I'd vote for b), but depending on how long that takes, you may want to implement a) in the meantime.

Agreed. (a) is implemented in PR: https://github.com/mesos/myriad/pull/116

Santosh

On Tue, Jul 14, 2015 at 4:10 PM, Adam Bordelon a...@mesosphere.io wrote:

Ok, this makes sense now. With a zero profile, tracking these will be much easier, since each YARN container would have a placeholder task of the same size. But with an initial/minimum capacity, you'd need to do extra bookkeeping to know how many resources belong to each task, what the initial NM capacity was, and what its current size is. Then, when a task completes, you'd see how many resources it was using and determine whether some or all of those resources should be freed and given back to Mesos, or whether they just go back to the NM's idle minimum capacity. However, since Mesos doesn't (yet) support resizeTask, you'd have to kill the placeholder task that best matches the size of the completed task (even though that task may have originally launched within the minimum capacity). Tricky indeed.

So I like the idea of the zero-profile NM in that case, but it still doesn't solve the problem of admission control for AMs/containers that are bigger than the current cluster capacity. If we keep some minimum-capacity NMs that can resize with placeholder tasks, we run into the same problem as above. The only options I can imagine are to a) use fixed-size NMs that cannot grow, alongside the elastic zero-profile NMs; or b) disable admission control in the RM so this isn't a problem. I'd vote for b), but depending on how long that takes, you may want to implement a) in the meantime.

On Tue, Jul 14, 2015 at 2:02 PM, Santosh Marella smare...@maprtech.com wrote:

> Why do you worry that Myriad needs to figure out which container is associated with which offer/profile?

The framework needs to figure out the size of the placeholder task to launch for a given YARN container. The placeholder's size is not always 1:1 with the YARN container's size (the zero profile is trying to make it 1:1). Let's take an example flow:

1. Say the NM's initial capacity is (4G,4CPU) and YARN wants to launch a container of size (2G,2CPU). No problem: the NM already has the capacity to accommodate it. There is no need to wait for more offers or to launch placeholder Mesos tasks; just launch the YARN container via the NM's heartbeat.

2. Say the NM's initial capacity is (4G,4CPU) and (2G,2CPU) is in use by a previously launched YARN container. If the RM's next request requires a (3G,3CPU) container, that container doesn't get allocated to this NM, since the NM doesn't have enough capacity. No problem here either.

3. Say Mesos offers (1G,1CPU) at this point. The NM has (2G,2CPU) available, and Myriad adds the (1G,1CPU) to it. The RM thus believes the NM now has (3G,3CPU) and allocates a (3G,3CPU) container on the NM. At this point, Myriad needs to use the launchTasks() API to launch a placeholder task with (1G,1CPU).

Thanks,
Santosh

On Tue, Jul 14, 2015 at 1:12 AM, Adam Bordelon a...@mesosphere.io wrote:
[...]
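The sizing rule in Santosh's step 3 can be illustrated with a small, self-contained Java sketch (the names here are made up; this is not Myriad's actual code): the placeholder task only needs to cover the slice of the container that came from the fresh Mesos offer, not the capacity the NM already had free. With a zero-profile NM that free capacity starts at zero, so the placeholder and the container are 1:1.

    // Illustrative sketch only, not Myriad's actual code.
    final class PlaceholderSizer {
        // containerMemMb/containerCpus: what the RM allocated, e.g. (3G,3CPU)
        // freeMemMb/freeCpus: what the NM already had unused, e.g. (2G,2CPU)
        // Returns {memMb, cpus} for the placeholder passed to launchTasks().
        static int[] placeholderFor(int containerMemMb, int containerCpus,
                                    int freeMemMb, int freeCpus) {
            return new int[] {
                Math.max(0, containerMemMb - freeMemMb),
                Math.max(0, containerCpus - freeCpus)
            };
        }

        public static void main(String[] args) {
            // Santosh's step 3: NM has (2G,2CPU) free, RM allocates (3G,3CPU).
            int[] p = placeholderFor(3072, 3, 2048, 2);
            System.out.println(p[0] + "MB, " + p[1] + "CPU"); // 1024MB, 1CPU
            // Zero-profile NM: nothing free at launch, so placeholder == container.
            int[] q = placeholderFor(3072, 3, 0, 0);
            System.out.println(q[0] + "MB, " + q[1] + "CPU"); // 3072MB, 3CPU
        }
    }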
Re: Fine Grained Scaling and Hadoop-2.7.
Ok, this makes sense now. With a zero profile, tracking these will be much easier, since each YARN container would have a placeholder task of the same size. But with an initial/minimum capacity, you'd need to do extra bookkeeping to know how many resources belong to each task, what the initial NM capacity was, and what its current size is. Then, when a task completes, you'd see how many resources it was using and determine whether some or all of those resources should be freed and given back to Mesos, or whether they just go back to the NM's idle minimum capacity. However, since Mesos doesn't (yet) support resizeTask, you'd have to kill the placeholder task that best matches the size of the completed task (even though that task may have originally launched within the minimum capacity). Tricky indeed.

So I like the idea of the zero-profile NM in that case, but it still doesn't solve the problem of admission control for AMs/containers that are bigger than the current cluster capacity. If we keep some minimum-capacity NMs that can resize with placeholder tasks, we run into the same problem as above. The only options I can imagine are to a) use fixed-size NMs that cannot grow, alongside the elastic zero-profile NMs; or b) disable admission control in the RM so this isn't a problem. I'd vote for b), but depending on how long that takes, you may want to implement a) in the meantime.

On Tue, Jul 14, 2015 at 2:02 PM, Santosh Marella smare...@maprtech.com wrote:
[...]
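Since Mesos has no resizeTask, the workaround Adam describes is to kill the live placeholder whose size best matches the completed container. Below is a rough, self-contained Java sketch of that selection; the class names and the distance metric are assumptions for illustration, not Myriad's actual internals, and the caller would still have to issue the actual Mesos kill for the chosen task.

    import java.util.List;

    final class PlaceholderReaper {
        static final class Placeholder {
            final String taskId;
            final int memMb;
            final int cpus;
            Placeholder(String taskId, int memMb, int cpus) {
                this.taskId = taskId; this.memMb = memMb; this.cpus = cpus;
            }
        }

        // Pick the live placeholder closest in size to the resources the
        // finished YARN container held, so killing it frees roughly the
        // right amount back to Mesos.
        static Placeholder bestMatch(List<Placeholder> live,
                                     int freedMemMb, int freedCpus) {
            Placeholder best = null;
            double bestDist = Double.MAX_VALUE;
            for (Placeholder p : live) {
                // Crude metric: weigh 1 CPU roughly like 1GB of memory.
                double dist = Math.abs(p.cpus - freedCpus)
                            + Math.abs(p.memMb - freedMemMb) / 1024.0;
                if (dist < bestDist) { bestDist = dist; best = p; }
            }
            return best; // null if no placeholders are live
        }
    }

As the thread notes, the mismatch cases are what make this tricky: the freed resources may straddle the minimum capacity and several placeholders, so no single kill frees exactly the right amount.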
Establish versions to target for first incubator release?
The FGS discussion made me wonder if we've put a line in the sand about what versions of YARN and Mesos we're going to target for the first Myriad incubator release. Might be nice to start getting some kind of mileage on specific versions. Perhaps 0.23 and 2.7? Is this premature?
Re: Establish versions to target for first incubator release?
Would that make it so older versions of YARN wouldn't work with the incubator release, or would there still be a way to run on older versions, just with FGS gracefully disabled?

On Tuesday, July 14, 2015, yuliya Feldman yufeld...@yahoo.com.invalid wrote:

+1 on 2.7

From: Jim Klucar klu...@gmail.com
To: dev@myriad.incubator.apache.org
Sent: Tuesday, July 14, 2015 5:36 PM
Subject: Establish versions to target for first incubator release?
[...]

--
Sent from my iThing
Re: Establish versions to target for first incubator release?
+1 on 2.7

From: Jim Klucar klu...@gmail.com
To: dev@myriad.incubator.apache.org
Sent: Tuesday, July 14, 2015 5:36 PM
Subject: Establish versions to target for first incubator release?
[...]