[jira] [Updated] (MAPREDUCE-6622) Add capability to set JHS job cache to a task-based limit
     [ https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junping Du updated MAPREDUCE-6622:
----------------------------------
    Fix Version/s:     (was: 2.9.0)

> Add capability to set JHS job cache to a task-based limit
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-6622
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6622
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobhistoryserver
>    Affects Versions: 2.7.2
>            Reporter: Ray Chiang
>            Assignee: Ray Chiang
>            Priority: Critical
>              Labels: supportability
>             Fix For: 2.8.0, 2.7.3, 2.6.5, 3.0.0-alpha1
>
>         Attachments: MAPREDUCE-6622.001.patch, MAPREDUCE-6622.002.patch,
> MAPREDUCE-6622.003.patch, MAPREDUCE-6622.004.patch, MAPREDUCE-6622.005.patch,
> MAPREDUCE-6622.006.patch, MAPREDUCE-6622.007.patch, MAPREDUCE-6622.008.patch,
> MAPREDUCE-6622.009.patch, MAPREDUCE-6622.010.patch, MAPREDUCE-6622.011.patch,
> MAPREDUCE-6622.012.patch, MAPREDUCE-6622.013.patch, MAPREDUCE-6622.014.patch
>
>
> When setting the property mapreduce.jobhistory.loadedjobs.cache.size, the jobs
> can be of varying size. This is generally not a problem when the job sizes
> are uniform or small, but when the job sizes can be very large (say, greater
> than 250k tasks), the JHS heap size can grow tremendously.
> In cases where multiple jobs are very large, the JHS can lock up and
> spend all its time in GC. However, since the cache is holding on to all the
> jobs, not much heap space can be freed up.
> By setting a property that caps the number of tasks allowed in the
> cache, and since the total number of tasks loaded is directly proportional to
> the amount of heap used, this should help prevent the JHS from locking up.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
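
The issue description above proposes capping the cache by the total number of tasks rather than by the number of jobs. A minimal Python sketch of that idea follows. It is not the code from the attached patches (which are Java); the class name `TaskBoundedJobCache`, its methods, and the choice to always retain the most recently inserted job are assumptions made for illustration only.

```python
from collections import OrderedDict


class TaskBoundedJobCache:
    """LRU cache of loaded jobs, bounded by total task count (hypothetical sketch)."""

    def __init__(self, max_tasks):
        if max_tasks < 1:
            raise ValueError("max_tasks must be positive")
        self.max_tasks = max_tasks
        self.total_tasks = 0
        self._jobs = OrderedDict()  # job_id -> task count, least recently used first

    def put(self, job_id, num_tasks):
        # Replace any existing entry so its weight is not counted twice.
        if job_id in self._jobs:
            self.total_tasks -= self._jobs.pop(job_id)
        self._jobs[job_id] = num_tasks
        self.total_tasks += num_tasks
        # Evict least recently used jobs until the task budget is respected.
        # Assumption: the job that was just inserted is always kept.
        while self.total_tasks > self.max_tasks and len(self._jobs) > 1:
            _, evicted_tasks = self._jobs.popitem(last=False)
            self.total_tasks -= evicted_tasks

    def get(self, job_id):
        if job_id not in self._jobs:
            return None
        self._jobs.move_to_end(job_id)  # mark as most recently used
        return self._jobs[job_id]

    def job_ids(self):
        return list(self._jobs)
```

With a job-count limit, two 250k-task jobs and two 10-task jobs cost the same "two slots"; with a task budget, the large jobs push out older entries sooner, which is what ties the cache size to heap usage.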
[jira] [Updated] (MAPREDUCE-6622) Add capability to set JHS job cache to a task-based limit
     [ https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated MAPREDUCE-6622:
---------------------------------
    Fix Version/s: 2.8.0

> Add capability to set JHS job cache to a task-based limit
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-6622
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6622
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobhistoryserver
>    Affects Versions: 2.7.2
>            Reporter: Ray Chiang
>            Assignee: Ray Chiang
>            Priority: Critical
>              Labels: supportability
>             Fix For: 2.8.0, 2.7.3, 2.9.0, 2.6.5
>
>         Attachments: MAPREDUCE-6622.001.patch, MAPREDUCE-6622.002.patch,
> MAPREDUCE-6622.003.patch, MAPREDUCE-6622.004.patch, MAPREDUCE-6622.005.patch,
> MAPREDUCE-6622.006.patch, MAPREDUCE-6622.007.patch, MAPREDUCE-6622.008.patch,
> MAPREDUCE-6622.009.patch, MAPREDUCE-6622.010.patch, MAPREDUCE-6622.011.patch,
> MAPREDUCE-6622.012.patch, MAPREDUCE-6622.013.patch, MAPREDUCE-6622.014.patch
[jira] [Updated] (MAPREDUCE-6622) Add capability to set JHS job cache to a task-based limit
     [ https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated MAPREDUCE-6622:
---------------------------------
    Fix Version/s: 2.6.5
                   2.7.3

> Add capability to set JHS job cache to a task-based limit
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-6622
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6622
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobhistoryserver
>    Affects Versions: 2.7.2
>            Reporter: Ray Chiang
>            Assignee: Ray Chiang
>            Priority: Critical
>              Labels: supportability
>             Fix For: 2.7.3, 2.9.0, 2.6.5
>
>         Attachments: MAPREDUCE-6622.001.patch, MAPREDUCE-6622.002.patch,
> MAPREDUCE-6622.003.patch, MAPREDUCE-6622.004.patch, MAPREDUCE-6622.005.patch,
> MAPREDUCE-6622.006.patch, MAPREDUCE-6622.007.patch, MAPREDUCE-6622.008.patch,
> MAPREDUCE-6622.009.patch, MAPREDUCE-6622.010.patch, MAPREDUCE-6622.011.patch,
> MAPREDUCE-6622.012.patch, MAPREDUCE-6622.013.patch, MAPREDUCE-6622.014.patch
[jira] [Updated] (MAPREDUCE-6622) Add capability to set JHS job cache to a task-based limit
     [ https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated MAPREDUCE-6622:
---------------------------------
    Target Version/s: 2.8.0, 2.7.3, 2.6.5  (was: 2.8.0)

> Add capability to set JHS job cache to a task-based limit
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-6622
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6622
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobhistoryserver
>    Affects Versions: 2.7.2
>            Reporter: Ray Chiang
>            Assignee: Ray Chiang
>            Priority: Critical
>              Labels: supportability
>             Fix For: 2.9.0
>
>         Attachments: MAPREDUCE-6622.001.patch, MAPREDUCE-6622.002.patch,
> MAPREDUCE-6622.003.patch, MAPREDUCE-6622.004.patch, MAPREDUCE-6622.005.patch,
> MAPREDUCE-6622.006.patch, MAPREDUCE-6622.007.patch, MAPREDUCE-6622.008.patch,
> MAPREDUCE-6622.009.patch, MAPREDUCE-6622.010.patch, MAPREDUCE-6622.011.patch,
> MAPREDUCE-6622.012.patch, MAPREDUCE-6622.013.patch, MAPREDUCE-6622.014.patch
[jira] [Updated] (MAPREDUCE-6622) Add capability to set JHS job cache to a task-based limit
     [ https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhihai xu updated MAPREDUCE-6622:
---------------------------------
    Priority: Critical  (was: Major)

> Add capability to set JHS job cache to a task-based limit
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-6622
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6622
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobhistoryserver
>    Affects Versions: 2.7.2
>            Reporter: Ray Chiang
>            Assignee: Ray Chiang
>            Priority: Critical
>              Labels: supportability
>             Fix For: 2.9.0
>
>         Attachments: MAPREDUCE-6622.001.patch, MAPREDUCE-6622.002.patch,
> MAPREDUCE-6622.003.patch, MAPREDUCE-6622.004.patch, MAPREDUCE-6622.005.patch,
> MAPREDUCE-6622.006.patch, MAPREDUCE-6622.007.patch, MAPREDUCE-6622.008.patch,
> MAPREDUCE-6622.009.patch, MAPREDUCE-6622.010.patch, MAPREDUCE-6622.011.patch,
> MAPREDUCE-6622.012.patch, MAPREDUCE-6622.013.patch, MAPREDUCE-6622.014.patch
[jira] [Updated] (MAPREDUCE-6622) Add capability to set JHS job cache to a task-based limit
     [ https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Kanter updated MAPREDUCE-6622:
-------------------------------------
    Attachment: MAPREDUCE-6622.014.patch

For completeness, patch 14 is the final commit with the trivial changes I made.

> Add capability to set JHS job cache to a task-based limit
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-6622
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6622
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobhistoryserver
>    Affects Versions: 2.7.2
>            Reporter: Ray Chiang
>            Assignee: Ray Chiang
>              Labels: supportability
>             Fix For: 2.9.0
>
>         Attachments: MAPREDUCE-6622.001.patch, MAPREDUCE-6622.002.patch,
> MAPREDUCE-6622.003.patch, MAPREDUCE-6622.004.patch, MAPREDUCE-6622.005.patch,
> MAPREDUCE-6622.006.patch, MAPREDUCE-6622.007.patch, MAPREDUCE-6622.008.patch,
> MAPREDUCE-6622.009.patch, MAPREDUCE-6622.010.patch, MAPREDUCE-6622.011.patch,
> MAPREDUCE-6622.012.patch, MAPREDUCE-6622.013.patch, MAPREDUCE-6622.014.patch
[jira] [Updated] (MAPREDUCE-6622) Add capability to set JHS job cache to a task-based limit
     [ https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Kanter updated MAPREDUCE-6622:
-------------------------------------
       Resolution: Fixed
     Hadoop Flags: Reviewed
    Fix Version/s: 2.9.0
           Status: Resolved  (was: Patch Available)

Thanks [~rchiang] and everyone for reviews. Committed to trunk and branch-2!

> Add capability to set JHS job cache to a task-based limit
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-6622
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6622
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobhistoryserver
>    Affects Versions: 2.7.2
>            Reporter: Ray Chiang
>            Assignee: Ray Chiang
>              Labels: supportability
>             Fix For: 2.9.0
>
>         Attachments: MAPREDUCE-6622.001.patch, MAPREDUCE-6622.002.patch,
> MAPREDUCE-6622.003.patch, MAPREDUCE-6622.004.patch, MAPREDUCE-6622.005.patch,
> MAPREDUCE-6622.006.patch, MAPREDUCE-6622.007.patch, MAPREDUCE-6622.008.patch,
> MAPREDUCE-6622.009.patch, MAPREDUCE-6622.010.patch, MAPREDUCE-6622.011.patch,
> MAPREDUCE-6622.012.patch, MAPREDUCE-6622.013.patch
[jira] [Updated] (MAPREDUCE-6622) Add capability to set JHS job cache to a task-based limit
     [ https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ray Chiang updated MAPREDUCE-6622:
----------------------------------
    Attachment: MAPREDUCE-6622.013.patch

- Remove wildcard imports
- Add logging call when HSFileRuntimeException is caught

> Add capability to set JHS job cache to a task-based limit
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-6622
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6622
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobhistoryserver
>    Affects Versions: 2.7.2
>            Reporter: Ray Chiang
>            Assignee: Ray Chiang
>              Labels: supportability
>         Attachments: MAPREDUCE-6622.001.patch, MAPREDUCE-6622.002.patch,
> MAPREDUCE-6622.003.patch, MAPREDUCE-6622.004.patch, MAPREDUCE-6622.005.patch,
> MAPREDUCE-6622.006.patch, MAPREDUCE-6622.007.patch, MAPREDUCE-6622.008.patch,
> MAPREDUCE-6622.009.patch, MAPREDUCE-6622.010.patch, MAPREDUCE-6622.011.patch,
> MAPREDUCE-6622.012.patch, MAPREDUCE-6622.013.patch
[jira] [Updated] (MAPREDUCE-6622) Add capability to set JHS job cache to a task-based limit
     [ https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ray Chiang updated MAPREDUCE-6622:
----------------------------------
    Attachment: MAPREDUCE-6622.012.patch

- Add unit test to verify getFullJob() return value with exceptions is backwards compatible
- Fix loadJob() and getFullJob() to catch/pass certain exceptions

> Add capability to set JHS job cache to a task-based limit
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-6622
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6622
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobhistoryserver
>    Affects Versions: 2.7.2
>            Reporter: Ray Chiang
>            Assignee: Ray Chiang
>              Labels: supportability
>         Attachments: MAPREDUCE-6622.001.patch, MAPREDUCE-6622.002.patch,
> MAPREDUCE-6622.003.patch, MAPREDUCE-6622.004.patch, MAPREDUCE-6622.005.patch,
> MAPREDUCE-6622.006.patch, MAPREDUCE-6622.007.patch, MAPREDUCE-6622.008.patch,
> MAPREDUCE-6622.009.patch, MAPREDUCE-6622.010.patch, MAPREDUCE-6622.011.patch,
> MAPREDUCE-6622.012.patch
[jira] [Updated] (MAPREDUCE-6622) Add capability to set JHS job cache to a task-based limit
     [ https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ray Chiang updated MAPREDUCE-6622:
----------------------------------
    Status: Open  (was: Patch Available)

No jenkins run. Relaunching.

> Add capability to set JHS job cache to a task-based limit
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-6622
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6622
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobhistoryserver
>    Affects Versions: 2.7.2
>            Reporter: Ray Chiang
>            Assignee: Ray Chiang
>              Labels: supportability
>         Attachments: MAPREDUCE-6622.001.patch, MAPREDUCE-6622.002.patch,
> MAPREDUCE-6622.003.patch, MAPREDUCE-6622.004.patch, MAPREDUCE-6622.005.patch,
> MAPREDUCE-6622.006.patch, MAPREDUCE-6622.007.patch, MAPREDUCE-6622.008.patch,
> MAPREDUCE-6622.009.patch, MAPREDUCE-6622.010.patch, MAPREDUCE-6622.011.patch
[jira] [Updated] (MAPREDUCE-6622) Add capability to set JHS job cache to a task-based limit
     [ https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ray Chiang updated MAPREDUCE-6622:
----------------------------------
    Status: Patch Available  (was: Open)

Re-launch test.

> Add capability to set JHS job cache to a task-based limit
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-6622
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6622
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobhistoryserver
>    Affects Versions: 2.7.2
>            Reporter: Ray Chiang
>            Assignee: Ray Chiang
>              Labels: supportability
>         Attachments: MAPREDUCE-6622.001.patch, MAPREDUCE-6622.002.patch,
> MAPREDUCE-6622.003.patch, MAPREDUCE-6622.004.patch, MAPREDUCE-6622.005.patch,
> MAPREDUCE-6622.006.patch, MAPREDUCE-6622.007.patch, MAPREDUCE-6622.008.patch,
> MAPREDUCE-6622.009.patch, MAPREDUCE-6622.010.patch, MAPREDUCE-6622.011.patch
[jira] [Updated] (MAPREDUCE-6622) Add capability to set JHS job cache to a task-based limit
     [ https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ray Chiang updated MAPREDUCE-6622:
----------------------------------
    Attachment: MAPREDUCE-6622.011.patch

- Change loadJob() to throw YarnRuntimeException instead of Exception
- Fix try/catch to distinguish between different error cases

> Add capability to set JHS job cache to a task-based limit
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-6622
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6622
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobhistoryserver
>    Affects Versions: 2.7.2
>            Reporter: Ray Chiang
>            Assignee: Ray Chiang
>              Labels: supportability
>         Attachments: MAPREDUCE-6622.001.patch, MAPREDUCE-6622.002.patch,
> MAPREDUCE-6622.003.patch, MAPREDUCE-6622.004.patch, MAPREDUCE-6622.005.patch,
> MAPREDUCE-6622.006.patch, MAPREDUCE-6622.007.patch, MAPREDUCE-6622.008.patch,
> MAPREDUCE-6622.009.patch, MAPREDUCE-6622.010.patch, MAPREDUCE-6622.011.patch
[jira] [Updated] (MAPREDUCE-6622) Add capability to set JHS job cache to a task-based limit
     [ https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ray Chiang updated MAPREDUCE-6622:
----------------------------------
    Attachment: MAPREDUCE-6622.010.patch

- Fixes that allow TestJobHistoryServer to pass.
-- Fix loadJob() so that it never returns null. Throw an exception if the history for the job cannot be found.
-- Fix getFullJob() so that the exception thrown in loadJob() is caught.

> Add capability to set JHS job cache to a task-based limit
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-6622
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6622
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobhistoryserver
>    Affects Versions: 2.7.2
>            Reporter: Ray Chiang
>            Assignee: Ray Chiang
>              Labels: supportability
>         Attachments: MAPREDUCE-6622.001.patch, MAPREDUCE-6622.002.patch,
> MAPREDUCE-6622.003.patch, MAPREDUCE-6622.004.patch, MAPREDUCE-6622.005.patch,
> MAPREDUCE-6622.006.patch, MAPREDUCE-6622.007.patch, MAPREDUCE-6622.008.patch,
> MAPREDUCE-6622.009.patch, MAPREDUCE-6622.010.patch
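
The loadJob()/getFullJob() change described for patch 010 follows a common pattern: the inner loader raises instead of returning null, and the outer accessor catches that exception so callers keep seeing the old contract. The Python sketch below only illustrates the pattern. `HistoryNotFoundError`, `HISTORY_FILES`, and the choice to return `None` from the outer function are assumptions, not details taken from the patch.

```python
class HistoryNotFoundError(RuntimeError):
    """Hypothetical stand-in for the exception raised when no history exists."""


# Illustrative in-memory "history store"; the real JHS reads history files.
HISTORY_FILES = {"job_1": {"tasks": 3}}


def load_job(job_id):
    """Never returns None: raises if the job's history cannot be found."""
    try:
        return HISTORY_FILES[job_id]
    except KeyError:
        raise HistoryNotFoundError(f"no history for {job_id}") from None


def get_full_job(job_id):
    """Catches the loader's exception so callers still get a plain return value.

    Assumption: the backwards-compatible result for an unknown job is None.
    """
    try:
        return load_job(job_id)
    except HistoryNotFoundError:
        return None
```

Catching only the specific exception type, rather than every exception, keeps unrelated failures visible to the caller.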
[jira] [Updated] (MAPREDUCE-6622) Add capability to set JHS job cache to a task-based limit
     [ https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ray Chiang updated MAPREDUCE-6622:
----------------------------------
    Attachment: MAPREDUCE-6622.009.patch

- Fix up tasks cache initialization
- Add unit tests for testing cache type and size

> Add capability to set JHS job cache to a task-based limit
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-6622
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6622
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobhistoryserver
>    Affects Versions: 2.7.2
>            Reporter: Ray Chiang
>            Assignee: Ray Chiang
>              Labels: supportability
>         Attachments: MAPREDUCE-6622.001.patch, MAPREDUCE-6622.002.patch,
> MAPREDUCE-6622.003.patch, MAPREDUCE-6622.004.patch, MAPREDUCE-6622.005.patch,
> MAPREDUCE-6622.006.patch, MAPREDUCE-6622.007.patch, MAPREDUCE-6622.008.patch,
> MAPREDUCE-6622.009.patch
[jira] [Updated] (MAPREDUCE-6622) Add capability to set JHS job cache to a task-based limit
     [ https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ray Chiang updated MAPREDUCE-6622:
----------------------------------
    Attachment: MAPREDUCE-6622.008.patch

- Keep forgetting to fix whitespace

> Add capability to set JHS job cache to a task-based limit
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-6622
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6622
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobhistoryserver
>    Affects Versions: 2.7.2
>            Reporter: Ray Chiang
>            Assignee: Ray Chiang
>              Labels: supportability
>         Attachments: MAPREDUCE-6622.001.patch, MAPREDUCE-6622.002.patch,
> MAPREDUCE-6622.003.patch, MAPREDUCE-6622.004.patch, MAPREDUCE-6622.005.patch,
> MAPREDUCE-6622.006.patch, MAPREDUCE-6622.007.patch, MAPREDUCE-6622.008.patch
[jira] [Updated] (MAPREDUCE-6622) Add capability to set JHS job cache to a task-based limit
     [ https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ray Chiang updated MAPREDUCE-6622:
----------------------------------
    Release Note:
Two recommendations for the mapreduce.jobhistory.loadedtasks.cache.size property:
1) For every 100k of cache size, set the heap size of the Job History Server to 1.2GB. For example, mapreduce.jobhistory.loadedtasks.cache.size=50, heap size=6GB.
2) Make sure that the cache size is larger than the number of tasks required for the largest job run on the cluster. It might be a good idea to set the value slightly higher (say, 20%) in order to allow for job size growth.

  was:
Two recommendations for the mapreduce.jobhistory.loadedtasks.cache.size property:
1) For every 100k of cache size, set the heap size of the Job History Server to 1.2GB. For example, mapreduce.jobhistory.loadedtasks.cache.size=500, heap size=6GB.
2) Make sure that the cache size is larger than the number of tasks required for the largest job run on the cluster. It might be a good idea to set the value slightly higher (say, 20%) in order to allow for job size growth.

> Add capability to set JHS job cache to a task-based limit
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-6622
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6622
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobhistoryserver
>    Affects Versions: 2.7.2
>            Reporter: Ray Chiang
>            Assignee: Ray Chiang
>              Labels: supportability
>         Attachments: MAPREDUCE-6622.001.patch, MAPREDUCE-6622.002.patch,
> MAPREDUCE-6622.003.patch, MAPREDUCE-6622.004.patch, MAPREDUCE-6622.005.patch,
> MAPREDUCE-6622.006.patch, MAPREDUCE-6622.007.patch
[jira] [Updated] (MAPREDUCE-6622) Add capability to set JHS job cache to a task-based limit
     [ https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ray Chiang updated MAPREDUCE-6622:
----------------------------------
    Attachment: MAPREDUCE-6622.007.patch

- Further refactoring based on Karthik's feedback

> Add capability to set JHS job cache to a task-based limit
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-6622
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6622
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobhistoryserver
>    Affects Versions: 2.7.2
>            Reporter: Ray Chiang
>            Assignee: Ray Chiang
>              Labels: supportability
>         Attachments: MAPREDUCE-6622.001.patch, MAPREDUCE-6622.002.patch,
> MAPREDUCE-6622.003.patch, MAPREDUCE-6622.004.patch, MAPREDUCE-6622.005.patch,
> MAPREDUCE-6622.006.patch, MAPREDUCE-6622.007.patch
[jira] [Updated] (MAPREDUCE-6622) Add capability to set JHS job cache to a task-based limit
     [ https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ray Chiang updated MAPREDUCE-6622:
----------------------------------
    Attachment: MAPREDUCE-6622.006.patch

- Fix documentation error in property.

> Add capability to set JHS job cache to a task-based limit
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-6622
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6622
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobhistoryserver
>    Affects Versions: 2.7.2
>            Reporter: Ray Chiang
>            Assignee: Ray Chiang
>              Labels: supportability
>         Attachments: MAPREDUCE-6622.001.patch, MAPREDUCE-6622.002.patch,
> MAPREDUCE-6622.003.patch, MAPREDUCE-6622.004.patch, MAPREDUCE-6622.005.patch,
> MAPREDUCE-6622.006.patch
[jira] [Updated] (MAPREDUCE-6622) Add capability to set JHS job cache to a task-based limit
[ https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ray Chiang updated MAPREDUCE-6622:
----------------------------------
    Attachment: MAPREDUCE-6622.005.patch

- Remove unused member variable cleanUpThread
- Change mapreduce.jobhistory.loadedtasks.cache.size default value in mapred-default.xml to empty
- Add sanity checking for mapreduce.jobhistory.loadedtasks.cache.size property and only use the new cache if sanity checking passes
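The sanity check described in the last item above could look roughly like the following. This is only a sketch in Python, not the code from the patch: the function name `choose_cache_mode` is hypothetical, and the assumption that an empty, non-numeric, or non-positive value fails the check (so the server falls back to the job-count cache) is mine rather than something stated in this update.

```python
def choose_cache_mode(raw_value):
    """Decide which cache limit to use (hypothetical helper).

    raw_value is the string configured for
    mapreduce.jobhistory.loadedtasks.cache.size, or None if unset.
    Returns ("tasks", limit) when the value passes the sanity check,
    otherwise ("jobs", None) to keep the job-count-based cache.
    """
    if raw_value is None:
        return ("jobs", None)
    text = raw_value.strip()
    if not text:
        # The default in mapred-default.xml is empty: feature disabled.
        return ("jobs", None)
    try:
        limit = int(text)
    except ValueError:
        # Assumption: a non-numeric value fails the sanity check.
        return ("jobs", None)
    if limit <= 0:
        # Assumption: zero or negative limits are rejected as well.
        return ("jobs", None)
    return ("tasks", limit)
```

With this shape, leaving the property empty keeps the existing behavior, and only a positive integer switches the server to the task-based limit.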
[jira] [Updated] (MAPREDUCE-6622) Add capability to set JHS job cache to a task-based limit
[ https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ray Chiang updated MAPREDUCE-6622:
----------------------------------
    Attachment: MAPREDUCE-6622.004.patch

Update patch based on Robert's feedback
- Copy Release Notes into mapred-default.xml comments
- Remove "*" imports
- Add comment about Guava Cache concurrency level setting
- Remove -1 from cache size. Leftover legacy from earlier testing.
- Change configuration calls to setInt()
[jira] [Updated] (MAPREDUCE-6622) Add capability to set JHS job cache to a task-based limit
[ https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ray Chiang updated MAPREDUCE-6622:
----------------------------------
    Release Note:
Two recommendations for the mapreduce.jobhistory.loadedtasks.cache.size property:
1) For every 100k of cache size, set the heap size of the Job History Server to 1.2GB. For example, mapreduce.jobhistory.loadedtasks.cache.size=500000, heap size=6GB.
2) Make sure that the cache size is larger than the number of tasks required for the largest job run on the cluster. It might be a good idea to set the value slightly higher (say, 20%) in order to allow for job size growth.
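The two recommendations in the release note reduce to simple arithmetic, sketched below in Python. The function and constant names are hypothetical and exist only for illustration; the 1.2GB per 100k tasks ratio and the 20% headroom come from the release note itself.

```python
# Ratio from the release note: 1.2GB of JHS heap per 100k of cache size.
HEAP_GB_PER_100K_TASKS = 1.2


def recommended_cache_size(largest_job_tasks, headroom_percent=20):
    """Task limit that fits the largest job plus headroom for growth.

    Uses integer ceiling division so the result never rounds down below
    the requested headroom.
    """
    scaled = largest_job_tasks * (100 + headroom_percent)
    return -(-scaled // 100)


def recommended_heap_gb(cache_size):
    """Approximate JHS heap size in GB for a given task limit."""
    return cache_size * HEAP_GB_PER_100K_TASKS / 100_000
```

For instance, a cluster whose largest job has 250,000 tasks would get a limit of 300,000 tasks under the second recommendation, and a limit of 500,000 tasks works out to about 6GB of heap under the first.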
[jira] [Updated] (MAPREDUCE-6622) Add capability to set JHS job cache to a task-based limit
[ https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ray Chiang updated MAPREDUCE-6622:
----------------------------------
    Attachment: MAPREDUCE-6622.003.patch

- Remove references to using Cache#cleanUp()
- Add new unit test for adding a job that is larger than the set task limit
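To make the idea of a task-based limit concrete, here is a small Python sketch of a cache that is bounded by total task count rather than by job count. It is not the implementation in the patch (which, per the other updates on this issue, is built on a Guava Cache); the class name `TaskLimitedJobCache` is hypothetical, and the policy shown for a job larger than the whole limit (refuse to cache it and leave existing entries alone) is an assumption chosen for illustration, not the behavior the new unit test verifies.

```python
from collections import OrderedDict


class TaskLimitedJobCache:
    """LRU cache of jobs bounded by total task count (illustrative sketch)."""

    def __init__(self, max_tasks):
        if max_tasks <= 0:
            raise ValueError("max_tasks must be positive")
        self.max_tasks = max_tasks
        self.total_tasks = 0
        self._jobs = OrderedDict()  # job_id -> task count, oldest first

    def put(self, job_id, num_tasks):
        """Cache a job; return False if it cannot fit at all."""
        if num_tasks < 0:
            raise ValueError("num_tasks must be non-negative")
        # Drop any previous entry for this job so the total stays accurate.
        if job_id in self._jobs:
            self.total_tasks -= self._jobs.pop(job_id)
        # Assumed policy: a job bigger than the whole limit is not cached.
        if num_tasks > self.max_tasks:
            return False
        # Evict least recently used jobs until the new job fits.
        while self.total_tasks + num_tasks > self.max_tasks:
            _, evicted_tasks = self._jobs.popitem(last=False)
            self.total_tasks -= evicted_tasks
        self._jobs[job_id] = num_tasks
        self.total_tasks += num_tasks
        return True

    def get(self, job_id):
        """Return the cached task count, or None on a miss."""
        if job_id not in self._jobs:
            return None
        self._jobs.move_to_end(job_id)  # mark as most recently used
        return self._jobs[job_id]

    def __contains__(self, job_id):
        return job_id in self._jobs
```

Because eviction is driven by the summed task counts, one very large job can displace many small ones, which is the property that ties cache occupancy to heap usage.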
[jira] [Updated] (MAPREDUCE-6622) Add capability to set JHS job cache to a task-based limit
[ https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ray Chiang updated MAPREDUCE-6622:
----------------------------------
    Attachment: MAPREDUCE-6622.002.patch

- Update documentation in mapred-default.xml
- Update behavior of cache sleep property
- Fix cache variable name
- Make value checking for loadedtasks property more robust
[jira] [Updated] (MAPREDUCE-6622) Add capability to set JHS job cache to a task-based limit
[ https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ray Chiang updated MAPREDUCE-6622:
----------------------------------
    Status: Patch Available  (was: Open)
[jira] [Updated] (MAPREDUCE-6622) Add capability to set JHS job cache to a task-based limit
[ https://issues.apache.org/jira/browse/MAPREDUCE-6622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ray Chiang updated MAPREDUCE-6622:
----------------------------------
    Attachment: MAPREDUCE-6622.001.patch