Re: [VOTE] Merge YARN-3926 (resource profile) to trunk
Hi all,

Given that we have 3 binding +1s, the vote passes. I have just pushed the changes to trunk and will update the JIRAs accordingly.

Thanks to everybody for helping with this feature and for voting!

Best,
Wangda
Re: [VOTE] Merge YARN-3926 (resource profile) to trunk
Hi Daniel,

Thank you very much for the support.

* When you say that the feature can be turned off, do you mean resource types or resource profiles? I know there's an off-by-default property that governs resource profiles, but I didn't see any way to turn off resource types.

Yes, *yarn.resourcemanager.resource-profiles.enabled* is false by default and controls whether this feature is on or off. New resource types are loaded from *resource-types.xml*, and by default this XML file is not included in the package, which prevents any issues in the default case. Once this file is added to a cluster, the new resources will be loaded from it.

* Even if only CPU and memory are configured, i.e. no additional resource types, the code path is different than it was.

Earlier, primitive data types were used to represent vcores and memory. With the resource profile work, every resource in YARN is represented as a ResourceInformation and placed under the existing Resource object. So memory and vcores remain accessible and operable through the same set of public APIs from Resources or ResourceCalculator (DRC) as before, even when the feature is off. (The code path is the same, but improved to support a unified ResourceInformation class instead of memory/vcores primitive types.)

Thanks,
Sunil
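A minimal sketch of the off-by-default switch described in the reply above. It uses plain java.util.Properties rather than Hadoop's own configuration classes, and the class and method names (ProfileFlagSketch, profilesEnabled) are hypothetical; only the property name comes from the thread.

```java
import java.util.Properties;

// Hypothetical helper, not part of YARN: models a boolean switch that stays
// off unless an operator explicitly sets it.
final class ProfileFlagSketch {
    // Property name as quoted in the thread.
    static final String PROFILES_ENABLED =
        "yarn.resourcemanager.resource-profiles.enabled";

    // Returns false when the property is absent or set to anything but "true".
    static boolean profilesEnabled(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(PROFILES_ENABLED, "false"));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(profilesEnabled(conf)); // property not set

        conf.setProperty(PROFILES_ENABLED, "true");
        System.out.println(profilesEnabled(conf)); // explicitly turned on
    }
}
```

The point of the default argument is that a cluster with untouched configuration never enters the profile code, which matches the opt-in behaviour claimed for the feature.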
Re: [VOTE] Merge YARN-3926 (resource profile) to trunk
Thanks, Arun, for checking the feature.

* Can you folks point me to any test application / framework or has this been integrated with MapReduce?

Currently this feature is integrated with Distributed Shell (reference: YARN-5588). It is not yet integrated with MapReduce; that work is ongoing in YARN-6504.

* Can you maybe comment a bit on the type of scale testing done?

We have done scale testing using SLS with this feature turned off, and also turned on with only memory and vcores. Performance was on par with trunk, with a variance of ~2%. I will let Wangda add more color here with data.

* Is there a plan to merge this with branch-2?

We had a discussion with a few folks here in Bangalore from MS and Huawei, and we will look into it once this branch is merged into trunk.

- Sunil
Re: [VOTE] Merge YARN-3926 (resource profile) to trunk
Quick question, Wangda. When you say that the feature can be turned off, do you mean resource types or resource profiles? I know there's an off-by-default property that governs resource profiles, but I didn't see any way to turn off resource types. Even if only CPU and memory are configured, i.e. no additional resource types, the code path is different than it was. Specifically, where CPU and memory were primitives before, they're now entries in an array whose indexes have to be looked up through the ResourceUtils class. Did I miss something?

For those who haven't followed the feature closely, there are really two features here. Resource types allows for declarative extension of the resource system in YARN. Resource profiles builds on top of resource types to allow a user to request a group of resources as a profile, much like EC2 instance types, e.g. "fast-compute" might mean 32GB RAM, 8 vcores, and 2 GPUs.

Daniel
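The array-with-name-lookup layout Daniel describes might look roughly like the following. This is an illustrative sketch, not the YARN-3926 code: ResourceSketch and its methods are hypothetical, and the type names "memory-mb", "vcores" and "gpu" are assumptions chosen for the example. It shows memory and vcores keeping dedicated getters while living in the same array as any extra types.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: every resource value lives in one array, and a
// name-to-index table says which slot belongs to which resource type.
final class ResourceSketch {
    private final Map<String, Integer> index = new HashMap<>();
    private final long[] values;

    // Memory and vcores always occupy slots 0 and 1; extra types follow.
    ResourceSketch(List<String> extraTypes) {
        index.put("memory-mb", 0);
        index.put("vcores", 1);
        for (String type : extraTypes) {
            index.putIfAbsent(type, index.size());
        }
        values = new long[index.size()];
    }

    // Resolves a type name to its array slot, rejecting unknown names.
    private int indexOf(String type) {
        Integer i = index.get(type);
        if (i == null) {
            throw new IllegalArgumentException("unknown resource type: " + type);
        }
        return i;
    }

    long get(String type) { return values[indexOf(type)]; }

    void set(String type, long value) { values[indexOf(type)] = value; }

    // Fixed-slot getters: callers that only know memory/vcores skip the lookup.
    long getMemoryMb() { return values[0]; }

    long getVcores() { return values[1]; }

    public static void main(String[] args) {
        // Mirrors the "fast-compute" example: 32GB RAM, 8 vcores, 2 GPUs.
        ResourceSketch r = new ResourceSketch(List.of("gpu"));
        r.set("memory-mb", 32 * 1024);
        r.set("vcores", 8);
        r.set("gpu", 2);
        System.out.println(r.getMemoryMb() + " MB, " + r.getVcores()
            + " vcores, " + r.get("gpu") + " gpu");
    }
}
```

With an empty list of extra types the object still carries the index table and array, which is the "code path is different even with only CPU and memory" concern raised above.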
Re: [VOTE] Merge YARN-3926 (resource profile) to trunk
Really looking forward to getting this in.

A couple of questions:
* Can you maybe comment a bit on the type of scale testing done? Specifically, the number of resources tested with and any point where it is discovered that performance might take a hit. Also, given that we do not have AMs that currently use this feature, can you folks point me to any test application / framework, or has this been integrated with MapReduce?
* Is there a plan to merge this with branch-2? We would like to see this in 2.9.0 as well.

Just to clarify, I am a +1 for merging, irrespective of the above, given that this is an opt-in feature after all. I am just eager to start using it :)

Cheers
-Arun
Re: [VOTE] Merge YARN-3926 (resource profile) to trunk
Thank you very much Varun Vasudev, Wangda Tan, Daniel and all the folks who helped in getting this feature to this level.

Starting with my +1 (binding).

# Tested a 5-node cluster with resource profiles enabled/disabled (the feature is disabled by default)
# All APIs added are marked as Unstable/Evolving (very few)
# There is no compatibility break with older versions (we have also added UT cases to ensure this)
# Performance tests were done using SLS and also with some tight-loop unit tests. There is not much regression against current trunk.
# Latest Jenkins +1 on YARN-7013 for the whole branch code.
# Verified old RM UI and new YARN UI (newly added resources could be seen easily)

Once again, thanks to all the folks who helped in getting this feature in. Kudos!

Thanks
- Sunil
[VOTE] Merge YARN-3926 (resource profile) to trunk
Hi folks,

Per earlier discussion [1], I'd like to start a formal vote to merge feature branch YARN-3926 (Resource profile) to trunk. The vote will run for 7 days and will end August 30 10:00 AM PDT.

Briefly, YARN-3926 extends the resource model of YARN to support resource types other than CPU and memory, so it will be a cornerstone of features like GPU support (YARN-6223), disk scheduling/isolation (YARN-2139), FPGA support (YARN-5983), and network IO scheduling/isolation (YARN-2140). In addition, YARN-3926 allows admins to preconfigure resource profiles in the cluster. For example, m3.large means <2 vcores, 8 GB memory, 64 GB disk>, so applications can request the "m3.large" profile instead of specifying values for all resource types.

There are 32 subtasks that were completed as part of this effort.

This feature needs to be explicitly turned on before use. We paid close attention to the compatibility, performance, and scalability of this feature. As mentioned in [1], we didn't see any observable performance regression in large-scale SLS (scheduler load simulator) executions, and saw less than 5% performance regression using the micro benchmark added by YARN-6775.

This feature works end-to-end (including UI/CLI/application/server). We have set up a cluster with this feature turned on that has been running for several weeks, and we haven't seen any issues so far.

Merge JIRA: YARN-7013 (Jenkins gave +1 already).
Documentation: YARN-7056

Special thanks to the team of folks who worked hard and contributed towards this effort, including design discussion/development/reviews, etc.: Varun Vasudev, Sunil Govind, Daniel Templeton, Vinod Vavilapalli, Yufei Gu, Karthik Kambatla, Jason Lowe, Arun Suresh.

Regards,
Wangda Tan

[1] http://mail-archives.apache.org/mod_mbox/hadoop-yarn-dev/201708.mbox/%3CCAD%2B%2BeCnjEHU%3D-M33QdjnND0ZL73eKwxRua4%3DBbp4G8inQZmaMg%40mail.gmail.com%3E
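To make the profile idea concrete, here is a small sketch of how a profile name could expand into per-type amounts. It is an assumption-laden illustration rather than the branch's actual implementation: ProfileSketch, resolve, and the keys "vcores", "memory-gb" and "disk-gb" are hypothetical, and only the m3.large figures are taken from the message above.

```java
import java.util.Map;

// Hypothetical sketch: an admin-defined table maps a profile name to the
// amount of each resource type, so a request only needs to name the profile.
final class ProfileSketch {
    // m3.large as described in the vote mail: 2 vcores, 8 GB memory, 64 GB disk.
    static final Map<String, Map<String, Long>> PROFILES = Map.of(
        "m3.large", Map.of("vcores", 2L, "memory-gb", 8L, "disk-gb", 64L));

    // Expands a profile name into its resource amounts; unknown names fail fast.
    static Map<String, Long> resolve(String profile) {
        Map<String, Long> amounts = PROFILES.get(profile);
        if (amounts == null) {
            throw new IllegalArgumentException("unknown profile: " + profile);
        }
        return amounts;
    }

    public static void main(String[] args) {
        Map<String, Long> request = resolve("m3.large");
        System.out.println("vcores=" + request.get("vcores")
            + " memory-gb=" + request.get("memory-gb")
            + " disk-gb=" + request.get("disk-gb"));
    }
}
```

Keeping the table on the cluster side means a profile's contents can be changed by an admin without applications changing what they ask for.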