Re: Update interval of default counters
I think the reason for hard-coding it is to protect the Hadoop cluster from high network traffic. If the value is too small, there is too much network traffic between the Map/Reduce tasks and the MRAppMaster. Please also see https://issues.apache.org/jira/browse/MAPREDUCE-4381. That's why you need to be very careful if you really want to change the value.

The source code is at hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Task.java (lines 532-533):

    /** The number of milliseconds between progress reports. */
    public static final int PROGRESS_INTERVAL = 3000;

Regards,
Akira

(2014/04/16 22:17), Dharmesh Kakadia wrote:
> Hi Akira,
>
> Thanks for the quick reply. Is there any particular reason for hard-coding it? Is there a workaround? I want to be able to get the counters at as fine a granularity as possible. Also, can you point me to the relevant source code? I am willing to take the issue and contribute if required.
>
> Thanks,
> Dharmesh
>
> On Wed, Apr 16, 2014 at 3:14 PM, Akira AJISAKA wrote:
>> Moved mapreduce-dev@ to Bcc.
>>
>> Hi Dharmesh,
>>
>> The parameter sets the interval at which the client polls the MRAppMaster for progress; it does not control the Map/Reduce tasks. The tasks send their progress (including the counter information) to the MRAppMaster every 3000 milliseconds, which is hard-coded.
>>
>> That's why a sudden big change in counter values happens even if the parameter is set to a small value.
>>
>> Regards,
>> Akira
>>
>> (2014/04/16 15:42), Dharmesh Kakadia wrote:
>>> Hi Akira,
>>>
>>> Thanks for the reply, but as I understand it, this is the interval of console counter printing. What I am trying to do is:
>>>
>>>     while (!job.isComplete()) {
>>>         Counters counters = job.getCounters();
>>>         // ... do some processing on the counters ...
>>>     }
>>>
>>> This runs fine, but I get the same counter values repeatedly and then suddenly a big change. For example, getCounters for REDUCE_INPUT_RECORDS returns values like 0, 0, ..., 0, 280, 280, ..., 280, 516, 516, ..., 516, and so on.
>>>
>>> I want finer-grained values instead of a direct jump from 280 to 516. Does that make sense? mapreduce.client.progressmonitor.pollinterval does not seem to affect it. Is there any workaround?
>>>
>>> Thanks,
>>> Dharmesh
>>>
>>> On Tue, Apr 15, 2014 at 7:51 PM, Akira AJISAKA wrote:
>>>> Moved to u...@hadoop.apache.org.
>>>>
>>>> You can configure the interval by setting the "mapreduce.client.progressmonitor.pollinterval" parameter. The default value is 1000 ms.
>>>>
>>>> For more details, please see http://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml.
>>>>
>>>> Regards,
>>>> Akira
>>>>
>>>> (2014/04/15 15:29), Dharmesh Kakadia wrote:
>>>>> Hi,
>>>>>
>>>>> What is the update interval of the built-in framework counters? Is it configurable? I am trying to collect very fine-grained information about the job execution and am using counters for that. It would be great if someone could point me to documentation or code for it. Thanks in advance.
>>>>>
>>>>> Thanks,
>>>>> Dharmesh
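The staircase pattern Dharmesh saw follows directly from the two intervals Akira describes. Below is a minimal simulation (not Hadoop code) under stated assumptions: a single task processing records at a steady, hypothetical rate, reporting exactly on 3000 ms boundaries (PROGRESS_INTERVAL from Task.java), and a client polling at the 1000 ms default of mapreduce.client.progressmonitor.pollinterval. The class and method names are hypothetical. It also estimates the report rate the MRAppMaster would see for a hypothetical number of running tasks, which is the traffic concern behind the hard-coded value.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical simulation; not part of Hadoop. */
public class CounterPollingSimulation {
    // Task.PROGRESS_INTERVAL quoted above: tasks report progress every 3000 ms.
    static final long PROGRESS_INTERVAL_MS = 3000;
    // Default of mapreduce.client.progressmonitor.pollinterval.
    static final long CLIENT_POLL_MS = 1000;

    /** Counter value inside the task at time tMs, assuming a steady record rate. */
    static long liveCount(long tMs, long recordsPerSec) {
        return tMs * recordsPerSec / 1000;
    }

    /** Value the MRAppMaster holds: the counter as of the task's most recent report. */
    static long reportedCount(long tMs, long recordsPerSec) {
        long lastReportMs = (tMs / PROGRESS_INTERVAL_MS) * PROGRESS_INTERVAL_MS;
        return liveCount(lastReportMs, recordsPerSec);
    }

    /** Values a client sees when polling every CLIENT_POLL_MS from 0 to durationMs. */
    public static List<Long> poll(long durationMs, long recordsPerSec) {
        List<Long> seen = new ArrayList<>();
        for (long t = 0; t <= durationMs; t += CLIENT_POLL_MS) {
            seen.add(reportedCount(t, recordsPerSec));
        }
        return seen;
    }

    /** Progress reports per second reaching the MRAppMaster from runningTasks tasks. */
    public static double reportsPerSecond(int runningTasks, long intervalMs) {
        return runningTasks * 1000.0 / intervalMs;
    }

    public static void main(String[] args) {
        System.out.println(poll(9000, 100));
        System.out.println(reportsPerSecond(1000, PROGRESS_INTERVAL_MS));
        System.out.println(reportsPerSecond(1000, 100));
    }
}
```

With 100 records/s, the poll prints [0, 0, 0, 300, 300, 300, 600, 600, 600, 900]: lowering CLIENT_POLL_MS only adds more repeats, because the step size is set by the task-side interval. For a hypothetical 1,000 running tasks, a 3000 ms interval gives roughly 333 reports/s at the MRAppMaster, while a 100 ms interval would give 10,000 reports/s.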
Re: Policy on adding timeouts to tests
The other advantage of a timeout is early failure: earlier than the overall 10-minute timeout that seems to exist in the build files. Usually the test writer has a general idea of how long the test should run, and if it runs longer than that, we can fail early. Clearly, this involves choosing a timeout generous enough that the test can pass on local machines, on different OSes, and in VMs.

+Vinod

On Apr 15, 2014, at 11:37 AM, Chris Nauroth wrote:

> +common-dev, hdfs-dev
>
> My understanding of the current situation is that we had a period where we tried to enforce adding timeouts on all new tests in patches, but it caused trouble, and now we're back to not requiring it. Jenkins test-patch isn't checking for it anymore.
>
> I don't think patches are getting rejected for using timeouts, though.
>
> The difficulty is that execution time is quite sensitive to the build environment. (Consider top-of-the-line server hardware used in build infrastructure vs. a dev running a VirtualBox VM with 1 dedicated CPU, 2 GB RAM and a slow virtualized disk.) When we were enforcing timeouts, it was quite common to see follow-up patches tuning up the timeout settings to make tests work reliably in a greater variety of environments. At that point, the benefit of using the timeout becomes questionable, because now the fast machine is running with the longer timeout too.
>
> Chris Nauroth
> Hortonworks
> http://hortonworks.com/
>
> On Mon, Apr 14, 2014 at 9:41 AM, Karthik Kambatla wrote:
>
>> Hi folks,
>>
>> Just wanted to check what our policy for adding timeouts to tests is. Do we encourage or discourage using timeouts for tests? If we discourage using timeouts for tests in general, are we okay with adding timeouts for a few tests where we explicitly want the test to fail if it takes longer than a particular amount of time?
>>
>> Thanks
>> Karthik
>>
>
> --
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
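Vinod's early-failure point can be illustrated with a small sketch. In JUnit 4 the usual tools are `@Test(timeout = ...)` or the `Timeout` rule; the helper below instead uses only `java.util.concurrent` so it runs standalone. `DeadlineRunner`, `runWithTimeout` and `DeadlineExceededException` are hypothetical names, and the deadlines in `main` are arbitrary.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper; not a JUnit or Hadoop API. */
public class DeadlineRunner {
    /** Thrown when the test body exceeds its deadline. */
    public static class DeadlineExceededException extends RuntimeException {
        DeadlineExceededException(String message) { super(message); }
    }

    /** Runs body on a separate thread and fails fast if it takes longer than timeoutMs. */
    public static <T> T runWithTimeout(Callable<T> body, long timeoutMs) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "test-body");
            t.setDaemon(true); // a hung body must not keep the JVM (and the build) alive
            return t;
        });
        Future<T> future = executor.submit(body);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            throw new DeadlineExceededException("test body exceeded " + timeoutMs + " ms");
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause()); // surface the body's own failure
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } finally {
            executor.shutdownNow(); // interrupt the body if it is still running
        }
    }

    public static void main(String[] args) {
        Integer answer = runWithTimeout(() -> 42, 1000);
        System.out.println(answer);
        try {
            runWithTimeout(() -> { Thread.sleep(10_000); return 0; }, 100);
        } catch (DeadlineExceededException e) {
            System.out.println("failed early: " + e.getMessage());
        }
    }
}
```

The sleeping body fails after about 100 ms instead of waiting for a build-wide timeout. This is exactly Chris's trade-off: the deadline has to be generous enough for the slowest environment the test must pass in.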
[jira] [Created] (MAPREDUCE-5841) uber job doesn't terminate on getting mapred job kill
Sangjin Lee created MAPREDUCE-5841:
--
Summary: uber job doesn't terminate on getting mapred job kill
Key: MAPREDUCE-5841
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5841
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 2.3.0
Reporter: Sangjin Lee

If you issue a "mapred job -kill" against an uberized job, the job (and the YARN application) state transitions to KILLED, but the application master process continues to run, and the job actually runs to completion.

This can be easily reproduced by running a sleep job:

{noformat}
hadoop jar hadoop-mapreduce-client-jobclient-2.3.0-tests.jar sleep -m 1 -r 0 -mt 30
{noformat}

Issue a kill with "mapred job -kill \[job-id\]". The UI will show that the job (app) is in the KILLED state. However, you can see that the application master is still running.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Resolved] (MAPREDUCE-3825) MR should not be getting duplicate tokens for a MR Job.
[ https://issues.apache.org/jira/browse/MAPREDUCE-3825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mit Desai resolved MAPREDUCE-3825.
--
Resolution: Duplicate
Target Version/s: 2.0.0-alpha, 0.23.3, 3.0.0 (was: 0.23.3, 2.0.0-alpha, 3.0.0)

I see that HADOOP-7967 was fixed and the change needed for this fix has already been checked in with it. Resolving this issue as a duplicate.

> MR should not be getting duplicate tokens for a MR Job.
> ---
>
> Key: MAPREDUCE-3825
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3825
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: security
> Affects Versions: 0.23.1, 0.24.0
> Reporter: Daryn Sharp
> Assignee: Daryn Sharp
> Attachments: MAPREDUCE-3825.patch, TokenCache.pdf, solution4.patch
>
> This is the counterpart to HADOOP-7967.
> MR gets tokens for all input, output and the default filesystem when an MR job is submitted. The APIs in FileSystem make it challenging to avoid duplicate tokens when there are filesystems that have embedded filesystems.
> Here is the original description that Daryn wrote:
> The token cache currently tries to assume a filesystem's token service key. The assumption generally worked while there was a one-to-one mapping of filesystem to token. With the advent of multi-token filesystems like viewfs, the token cache will try to use a service key (i.e., for viewfs) that will never exist (because it really gets the mounted fs tokens).
> The descriop
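The duplicate-token problem described above can be illustrated with a toy model: instead of assuming one service key per top-level filesystem, walk each filesystem together with its embedded children and key the credentials by each filesystem's actual token service, fetching a token only the first time a service is seen. `Fs`, the service strings and the in-memory credentials map are hypothetical stand-ins, not Hadoop's FileSystem, Token or Credentials classes, and this is not the change made in HADOOP-7967.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model; these are not Hadoop's FileSystem, Token or Credentials classes. */
public class TokenDedupSketch {
    /** A filesystem with an optional token service of its own plus embedded child filesystems. */
    public static final class Fs {
        final String tokenService; // null for a pure mount table such as viewfs
        final List<Fs> children;

        public Fs(String tokenService, Fs... children) {
            this.tokenService = tokenService;
            this.children = Arrays.asList(children);
        }
    }

    /** Walks fs and its children, fetching a token only for services not yet in credentials. */
    static void collect(Fs fs, Map<String, String> credentials, List<String> fetched) {
        if (fs.tokenService != null && !credentials.containsKey(fs.tokenService)) {
            fetched.add(fs.tokenService); // stands in for an RPC that obtains a delegation token
            credentials.put(fs.tokenService, "token:" + fs.tokenService);
        }
        for (Fs child : fs.children) {
            collect(child, credentials, fetched);
        }
    }

    /** Returns the services actually fetched for a job's default, input and output filesystems. */
    public static List<String> fetchFor(Fs... jobFilesystems) {
        Map<String, String> credentials = new LinkedHashMap<>();
        List<String> fetched = new ArrayList<>();
        for (Fs fs : jobFilesystems) {
            collect(fs, credentials, fetched);
        }
        return fetched;
    }

    public static void main(String[] args) {
        Fs nn1 = new Fs("nn1:8020");
        Fs nn2 = new Fs("nn2:8020");
        Fs viewfs = new Fs(null, nn1, nn2); // input path on a viewfs mounting both namenodes
        // default fs = nn1, input = viewfs, output = nn2
        System.out.println(fetchFor(nn1, viewfs, nn2));
    }
}
```

Although nn1 and nn2 are each reachable twice, only one token per service is fetched, and the viewfs mount table never needs a service key of its own.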
[jira] [Resolved] (MAPREDUCE-3704) Yarn client goes into tight loop upon connection failure
[ https://issues.apache.org/jira/browse/MAPREDUCE-3704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daryn Sharp resolved MAPREDUCE-3704.

Resolution: Not a Problem

I think this old issue has already been fixed.

> Yarn client goes into tight loop upon connection failure
>
> Key: MAPREDUCE-3704
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3704
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: client, mrv2
> Affects Versions: 0.23.0, 0.24.0
> Reporter: Daryn Sharp
>
> If the client fails to connect to the AM or HS, it will go into a tight loop retrying the connection. The log rapidly grows, with multiple log lines per attempt. Based on one of the logs, the client was pounding on the AM ~1000 times/sec.
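A common remedy for the tight retry loop described in MAPREDUCE-3704 is capped exponential backoff. The sketch below only computes the delay schedule; it is a generic technique, not the change that resolved this issue, and the base delay, cap and class name are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: capped exponential backoff between reconnect attempts. */
public class ReconnectBackoff {
    /** Delay (ms) to wait before each of the given number of retry attempts. */
    public static List<Long> delays(int attempts, long baseMs, long maxMs) {
        List<Long> schedule = new ArrayList<>();
        long delay = Math.min(baseMs, maxMs);
        for (int i = 0; i < attempts; i++) {
            schedule.add(delay);
            delay = Math.min(delay * 2, maxMs); // double each time, but never exceed the cap
        }
        return schedule;
    }

    public static void main(String[] args) {
        for (long delayMs : delays(8, 100, 5000)) {
            // A real client would attempt the connection here and stop on success.
            System.out.println("next retry in " + delayMs + " ms");
        }
    }
}
```

`delays(8, 100, 5000)` yields [100, 200, 400, 800, 1600, 3200, 5000, 5000], so eight attempts are spread over about 16.3 s of waiting instead of being made in a few milliseconds, which also keeps the log from growing by several lines per millisecond.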
Re: Policy on adding timeouts to tests
Hi Karthik,

Without timeouts, some tests that start servers such as MiniCluster or ZK can never finish because of an unexpected busy loop or something similar. That can block other jobs on the Jenkins server. Therefore, IMHO, we should add timeouts when we write tests that use such servers.

Thanks,
- Tsuyoshi

On Wed, Apr 16, 2014 at 6:11 PM, Steve Loughran wrote:

> There's a JIRA somewhere that's never gone in, to add a timeout rule to a base class; this rule gets picked up in that test class and all its children to specify the timeout:
>
>     @Rule
>     public final Timeout testTimeout = new Timeout(TEST_TIMEOUT);
>
> 1. If we are going to have a timeout everywhere, it should be configurable to different delays. For Maven, that means system properties being passed down and extracted.
> 2. We don't want that in every @Test method.
> 3. So... we should have an AbstractYarnTest, AbstractMapReduce test, etc., each picking up the timeout option for their part of the suite.
> 4. Then cut out all the other timeouts.
> 5. And finally, document this somewhere.
> 6. Object store tests need extra-long timeouts: execution time for multi-GB uploads to S3 and OpenStack object stores is a function of your upload bandwidth, not machine speed.
>
> -steve
>
> On 15 April 2014 21:20, Karthik Kambatla wrote:
>
>> - hwx-hdfs-dev
>> + hdfs-dev
>>
>> Agree with all the points Chris makes.
>>
>> I asked this question in the context of a fix that bumps up the timeout to make the test pass on slower machines. If the timeout is not central to the test, is the recommended approach to get rid of the timeout?
[jira] [Created] (MAPREDUCE-5840) Update MapReduce calls to ProxyUsers#authorize.
Chris Nauroth created MAPREDUCE-5840:

Summary: Update MapReduce calls to ProxyUsers#authorize.
Key: MAPREDUCE-5840
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5840
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 3.0.0, 2.4.0
Reporter: Chris Nauroth
Assignee: Benoy Antony
Priority: Minor

HADOOP-10499 will remove an unnecessary overload of {{ProxyUsers#authorize}}. This issue tracks updating call sites in the MapReduce code.
[jira] [Resolved] (MAPREDUCE-3406) Add node information to bin/mapred job -list-attempt-ids and other improvements
[ https://issues.apache.org/jira/browse/MAPREDUCE-3406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chen He resolved MAPREDUCE-3406.

Resolution: Duplicate
Target Version/s: 2.0.0-alpha, 0.23.3, 3.0.0 (was: 0.23.3, 2.0.0-alpha, 3.0.0)

> Add node information to bin/mapred job -list-attempt-ids and other improvements
> ---
>
> Key: MAPREDUCE-3406
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3406
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mrv2
> Affects Versions: 0.23.0
> Reporter: Ravi Prakash
> Assignee: Ravi Prakash
> Fix For: 0.24.0
>
> From [~rramya]
> Providing the NM information where the containers are scheduled in bin/mapred job -list-attempt-ids would help with automation and debugging, and would avoid grepping through the AM logs.
> From my own observation, -list-attempt-ids should list the attempt IDs without requiring the arguments. The arguments, if given, can be used to filter the results. From the usage:
> bq. [-list-attempt-ids ]. Valid values for are MAP REDUCE JOB_SETUP JOB_CLEANUP TASK_CLEANUP. Valid values for are running, completed
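The behaviour requested in MAPREDUCE-3406 (list every attempt by default, treat the task-type and state arguments as optional filters, and show the NodeManager host) can be sketched as a plain filter. The `Attempt` fields, the sample attempt IDs and node names, and the tab-separated output format are all hypothetical; this is not the bin/mapred implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch of the proposed behaviour; not the actual bin/mapred CLI code. */
public class ListAttemptIdsSketch {
    public static final class Attempt {
        final String id, taskType, state, node;

        public Attempt(String id, String taskType, String state, String node) {
            this.id = id;
            this.taskType = taskType;
            this.state = state;
            this.node = node;
        }
    }

    /** A null filter means "don't filter", so calling with no arguments lists every attempt. */
    public static List<String> list(List<Attempt> attempts, String taskType, String state) {
        return attempts.stream()
                .filter(a -> taskType == null || a.taskType.equalsIgnoreCase(taskType))
                .filter(a -> state == null || a.state.equalsIgnoreCase(state))
                .map(a -> a.id + "\t" + a.node) // include the NM host so nobody has to grep AM logs
                .collect(Collectors.toList());
    }

    /** Made-up attempts used only for illustration. */
    public static List<Attempt> sample() {
        return Arrays.asList(
                new Attempt("attempt_1_0001_m_000000_0", "MAP", "running", "node-a"),
                new Attempt("attempt_1_0001_m_000001_0", "MAP", "completed", "node-b"),
                new Attempt("attempt_1_0001_r_000000_0", "REDUCE", "running", "node-c"));
    }

    public static void main(String[] args) {
        list(sample(), null, null).forEach(System.out::println);      // no arguments: everything
        list(sample(), "MAP", "running").forEach(System.out::println); // both filters applied
    }
}
```

Treating each filter independently also allows giving only a task type or only a state.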
Re: Update interval of default counters
Hi Akira,

Thanks for the quick reply. Is there any particular reason for hard-coding it? Is there a workaround? I want to be able to get the counters at as fine a granularity as possible. Also, can you point me to the relevant source code? I am willing to take the issue and contribute if required.

Thanks,
Dharmesh
Re: Update interval of default counters
Moved mapreduce-dev@ to Bcc.

Hi Dharmesh,

The parameter sets the interval at which the client polls the MRAppMaster for progress; it does not control the Map/Reduce tasks. The tasks send their progress (including the counter information) to the MRAppMaster every 3000 milliseconds, which is hard-coded.

That's why a sudden big change in counter values happens even if the parameter is set to a small value.

Regards,
Akira
Re: Policy on adding timeouts to tests
There's a JIRA somewhere that's never gone in, to add a timeout rule to a base class; this rule gets picked up in that test class and all its children to specify the timeout:

    @Rule
    public final Timeout testTimeout = new Timeout(TEST_TIMEOUT);

1. If we are going to have a timeout everywhere, it should be configurable to different delays. For Maven, that means system properties being passed down and extracted.
2. We don't want that in every @Test method.
3. So... we should have an AbstractYarnTest, AbstractMapReduce test, etc., each picking up the timeout option for their part of the suite.
4. Then cut out all the other timeouts.
5. And finally, document this somewhere.
6. Object store tests need extra-long timeouts: execution time for multi-GB uploads to S3 and OpenStack object stores is a function of your upload bandwidth, not machine speed.

-steve
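Steve's points 1, 3 and 6 amount to resolving one timeout per part of the suite from properties passed down by the build. The sketch below shows one way to do that, with an extra global scale factor for slow VMs. The property names (`test.timeout.<suite>.ms`, `test.timeout.scale`) and the `TestTimeouts` class are hypothetical, not existing Hadoop settings. In a real suite, a base class such as the proposed AbstractYarnTest could pass `TestTimeouts.timeoutMs(System.getProperties(), "yarn", ...)` to the JUnit `Timeout` rule, and Maven Surefire could supply the properties, for example through `systemPropertyVariables`.

```java
import java.util.Properties;

/** Hypothetical helper; the property names are illustrative, not existing Hadoop settings. */
public final class TestTimeouts {
    private TestTimeouts() {}

    /**
     * Resolves the timeout for one part of the suite (e.g. "yarn", "mapreduce", "s3"):
     * the per-suite property if present, else defaultMs, then multiplied by an optional
     * global scale factor so a slow VM can stretch every timeout at once.
     */
    public static long timeoutMs(Properties props, String suite, long defaultMs) {
        long base = parseLong(props.getProperty("test.timeout." + suite + ".ms"), defaultMs);
        double scale = parseDouble(props.getProperty("test.timeout.scale"), 1.0);
        return Math.round(base * scale);
    }

    private static long parseLong(String raw, long fallback) {
        if (raw == null) return fallback;
        try { return Long.parseLong(raw.trim()); } catch (NumberFormatException e) { return fallback; }
    }

    private static double parseDouble(String raw, double fallback) {
        if (raw == null) return fallback;
        try { return Double.parseDouble(raw.trim()); } catch (NumberFormatException e) { return fallback; }
    }

    public static void main(String[] args) {
        // In a real run these would come from System.getProperties(), set by the build.
        Properties props = new Properties();
        props.setProperty("test.timeout.s3.ms", "1800000"); // object store uploads need longer
        props.setProperty("test.timeout.scale", "2");        // e.g. a slow developer VM
        System.out.println(timeoutMs(props, "yarn", 60_000)); // suite default, scaled
        System.out.println(timeoutMs(props, "s3", 60_000));   // per-suite override, scaled
    }
}
```

Keeping the scale factor separate from the per-suite values addresses Chris's concern: fast build machines keep the tight default, and only environments that opt in get the longer timeouts.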