Re: Policy on adding timeouts to tests

2014-04-16 Thread Steve Loughran
There's a JIRA somewhere that's never gone in, to add a timeout rule to a
base class; that rule gets picked up in the test class and all its children
to specify the timeout:

  @Rule
  public final Timeout testTimeout = new Timeout(TEST_TIMEOUT);


   1. If we are going to have a timeout everywhere, it should be
   configurable to different delays. For Maven, that means system properties
   being passed down and extracted (see the sketch below this list).
   2. We don't want that in every @Test method.
   3. So... we should have an AbstractYarnTest, an AbstractMapReduceTest, etc.,
   each picking up the timeout option for their part of the suite.
   4. Then cut out all the other timeouts.
   5. And finally document this somewhere.
   6. Object store tests need extra-long timeouts; execution times for multi-GB
   uploads to S3 and OpenStack object stores are a function of your upload
   bandwidth, not machine speed.
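
As a rough illustration of points 1-3, the base class could look something like
this (the system property name, default value, and class name are assumptions
for the sketch, not anything that's in the JIRA):

  import org.junit.Rule;
  import org.junit.rules.Timeout;

  // Sketch only: assumes JUnit 4 and a surefire run that forwards
  // -Dtest.timeout.millis=... to the forked test JVM.
  public abstract class AbstractTimedTest {

    // Fall back to 100 seconds if the property isn't set.
    public static final int TEST_TIMEOUT =
        Integer.getInteger("test.timeout.millis", 100000);

    // A @Rule field is inherited, so every @Test method in this class
    // and in all subclasses gets the same cap.
    @Rule
    public final Timeout testTimeout = new Timeout(TEST_TIMEOUT);
  }

An AbstractYarnTest or AbstractMapReduceTest would then just extend it, and a
Maven profile could supply a larger -Dtest.timeout.millis for slow VMs or
object-store runs.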

-steve



On 15 April 2014 21:20, Karthik Kambatla ka...@cloudera.com wrote:

 - hwx-hdfs-dev
 + hdfs-dev

 Agree with all the points Chris makes.

 I asked this question in the context of a fix that bumps up the timeout to
 make the test pass on slower machines. If the timeout is not central to the
 test, is the recommended approach to get rid of the timeout?



 On Tue, Apr 15, 2014 at 11:37 AM, Chris Nauroth cnaur...@hortonworks.com
 wrote:

  +common-dev, hdfs-dev
 
  My understanding of the current situation is that we had a period where
 we
  tried to enforce adding timeouts on all new tests in patches, but it
 caused
  trouble, and now we're back to not requiring it.  Jenkins test-patch
 isn't
  checking for it anymore.
 
  I don't think patches are getting rejected for using timeouts though.
 
  The difficulty is that execution time is quite sensitive to the build
  environment.  (Consider top-of-the-line server hardware used in build
  infrastructure vs. a dev running a VirtualBox VM with 1 dedicated CPU, 2
 GB
  RAM and slow virtualized disk.)  When we were enforcing timeouts, it was
  quite common to see follow-up patches tuning up the timeout settings to
  make tests work reliably in a greater variety of environments.  At that
  point, the benefit of using the timeout becomes questionable, because now
  the fast machine is running with the longer timeout too.
 
  Chris Nauroth
  Hortonworks
  http://hortonworks.com/
 
 
 
  On Mon, Apr 14, 2014 at 9:41 AM, Karthik Kambatla ka...@cloudera.com
  wrote:
 
   Hi folks
  
   Just wanted to check what our policy for adding timeouts to tests is.
 Do
  we
   encourage/discourage using timeouts for tests? If we discourage using
   timeouts for tests in general, are we okay with adding timeouts for a
 few
   tests where we explicitly want the test to fail if it takes longer
 than a
   particular amount of time?
  
   Thanks
   Karthik
  
 
 




Re: Update interval of default counters

2014-04-16 Thread Akira AJISAKA

Moved mapreduce-dev@ to Bcc.

Hi Dharmesh,

The parameter sets the interval at which the client polls the progress
of the MRAppMaster, not the Map/Reduce tasks. The tasks send
their progress (including the counter information) to the MRAppMaster
every 3000 milliseconds, which is hard-coded.

That's why a sudden big change in counter values happens
even if the parameter is set to a small value.

Regards,
Akira

(2014/04/16 15:42), Dharmesh Kakadia wrote:

Hi Akira,

Thanks for the reply, but as I understand it, this is the interval of console
counter printing. What I am trying to do is:

while(!job.isComplete()){
  getcounters() and do some processing on that.
}
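
For concreteness, a self-contained version of that loop might look like this
(the counter choice and one-second poll are only illustrative):

  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.TaskCounter;

  public class CounterPoller {
    public static void poll(Job job) throws Exception {
      while (!job.isComplete()) {
        long reduced = job.getCounters()
            .findCounter(TaskCounter.REDUCE_INPUT_RECORDS).getValue();
        System.out.println("REDUCE_INPUT_RECORDS = " + reduced);
        Thread.sleep(1000);  // poll once a second
      }
    }
  }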

Now this is running fine, but I get the same counter values repeatedly and
then suddenly a big change in the counter values.
For example, getcounters for REDUCE_INPUT_RECORDS returns values like

0
0
..
0
280
280
...
280
516
516
...
516

etc.

I want to get finer-grained values, instead of jumping directly from 280 to
516.
Does that make sense? mapreduce.client.progressmonitor.pollinterval does not
seem to affect it. Any workaround?

Thanks,
Dharmesh




On Tue, Apr 15, 2014 at 7:51 PM, Akira AJISAKA
ajisa...@oss.nttdata.co.jpwrote:


Moved to u...@hadoop.apache.org.

You can configure the interval by setting
mapreduce.client.progressmonitor.pollinterval parameter.
The default value is 1000 ms.

For more details, please see http://hadoop.apache.org/docs/
stable/hadoop-mapreduce-client/hadoop-mapreduce-
client-core/mapred-default.xml.

Regards,
Akira


(2014/04/15 15:29), Dharmesh Kakadia wrote:


Hi,

What is the update interval of inbuilt framework counters? Is that
configurable?
I am trying to collect very fine grained information about the job
execution and using counters for that. It would be great if someone can
point me to documentation/code for it. Thanks in advance.

Thanks,
Dharmesh










Re: Update interval of default counters

2014-04-16 Thread Dharmesh Kakadia
Hi Akira,

Thanks for the quick reply.
Any particular reason for hard-coding it? Is there a workaround? I want to
be able to get the counter values at as fine a granularity as possible. Also,
can you point me to the relevant source code? I am willing to take up the
issue and contribute if required.

Thanks,
Dharmesh


On Wed, Apr 16, 2014 at 3:14 PM, Akira AJISAKA
ajisa...@oss.nttdata.co.jpwrote:

 Moved mapreduce-dev@ to Bcc.

 Hi Dharmesh,

 The parameter sets the interval at which the client polls the progress
 of the MRAppMaster, not the Map/Reduce tasks. The tasks send
 their progress (including the counter information) to the MRAppMaster
 every 3000 milliseconds, which is hard-coded.

 That's why a sudden big change in counter values happens
 even if the parameter is set to a small value.

 Regards,
 Akira


 (2014/04/16 15:42), Dharmesh Kakadia wrote:

 Hi Akira,

 Thanks for the reply, but as I understand it, this is the interval of console
 counter printing. What I am trying to do is:

 while(!job.isComplete()){
   getcounters() and do some processing on that.
 }

 Now this is running fine, but I get the same counter values repeatedly and
 then suddenly a big change in the counter values.
 For example, getcounters for REDUCE_INPUT_RECORDS returns values like

 0
 0
 ..
 0
 280
 280
 ...
 280
 516
 516
 ...
 516

 etc.

 I want to get finer-grained values, instead of jumping directly from 280 to
 516.
 Does that make sense? mapreduce.client.progressmonitor.pollinterval does not
 seem to affect it. Any workaround?

 Thanks,
 Dharmesh




 On Tue, Apr 15, 2014 at 7:51 PM, Akira AJISAKA
 ajisa...@oss.nttdata.co.jpwrote:

  Moved to u...@hadoop.apache.org.

 You can configure the interval by setting
 mapreduce.client.progressmonitor.pollinterval parameter.
 The default value is 1000 ms.

 For more details, please see http://hadoop.apache.org/docs/
 stable/hadoop-mapreduce-client/hadoop-mapreduce-
 client-core/mapred-default.xml.

 Regards,
 Akira


 (2014/04/15 15:29), Dharmesh Kakadia wrote:

  Hi,

 What is the update interval of inbuilt framework counters? Is that
 configurable?
 I am trying to collect very fine grained information about the job
 execution and using counters for that. It would be great if someone can
 point me to documentation/code for it. Thanks in advance.

 Thanks,
 Dharmesh








[jira] [Resolved] (MAPREDUCE-3406) Add node information to bin/mapred job -list-attempt-ids and other improvements

2014-04-16 Thread Chen He (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen He resolved MAPREDUCE-3406.


  Resolution: Duplicate
Target Version/s: 2.0.0-alpha, 0.23.3, 3.0.0  (was: 0.23.3, 2.0.0-alpha, 
3.0.0)

 Add node information to bin/mapred job -list-attempt-ids and other 
 improvements
 ---

 Key: MAPREDUCE-3406
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3406
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Ravi Prakash
Assignee: Ravi Prakash
 Fix For: 0.24.0


 From [~rramya]:
 Providing the NM information about where the containers are scheduled in bin/mapred 
 job -list-attempt-ids will be helpful for automation and debugging, and to avoid 
 grepping through the AM logs.
 From my own observation, list-attempt-ids should list the attempt ids and 
 not require the arguments. The arguments, if given, can be used to filter the 
 results. From the usage:
 bq. [-list-attempt-ids job-id task-type task-state]. Valid values for 
 task-type are MAP REDUCE JOB_SETUP JOB_CLEANUP TASK_CLEANUP. Valid values 
 for task-state are running, completed



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (MAPREDUCE-5840) Update MapReduce calls to ProxyUsers#authorize.

2014-04-16 Thread Chris Nauroth (JIRA)
Chris Nauroth created MAPREDUCE-5840:


 Summary: Update MapReduce calls to ProxyUsers#authorize.
 Key: MAPREDUCE-5840
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5840
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 3.0.0, 2.4.0
Reporter: Chris Nauroth
Assignee: Benoy Antony
Priority: Minor


HADOOP-10499 will remove an unnecessary overload of {{ProxyUsers#authorize}}. 
This issue tracks updating call sites in the MapReduce code.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Policy on adding timeouts to tests

2014-04-16 Thread Tsuyoshi OZAWA
Hi Karthik,

Some tests that start servers, like a MiniCluster or ZK, can run forever
because of an unexpected busy loop or similar if the tests don't have timeouts.
That can block other jobs on the Jenkins server. Therefore, IMHO, we
should add timeouts when we write tests with them, for example:
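
(A minimal sketch; the MiniDFSCluster usage and the 3-minute cap are only
illustrative, not a recommended value.)

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hdfs.MiniDFSCluster;
  import org.junit.Test;

  public class TestWithMiniCluster {

    @Test(timeout = 180000)  // fail after 3 minutes instead of hanging Jenkins
    public void testClusterStartsAndStops() throws Exception {
      MiniDFSCluster cluster = new MiniDFSCluster.Builder(new Configuration()).build();
      try {
        cluster.waitActive();  // would block indefinitely if startup wedges
      } finally {
        cluster.shutdown();
      }
    }
  }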

Thanks,
- Tsuyoshi

On Wed, Apr 16, 2014 at 6:11 PM, Steve Loughran ste...@hortonworks.com wrote:
 There's a JIRA somewhere that's never gone in, to add a timeout rule to a
 base class; that rule gets picked up in the test class and all its children
 to specify the timeout:

   @Rule
   public final Timeout testTimeout = new Timeout(TEST_TIMEOUT);


1. If we are going to have a timeout everywhere, it should be
configurable to different delays. For Maven, that means system properties
being passed down and extracted.
2. We don't want that in every @Test method.
3. So... we should have an AbstractYarnTest, an AbstractMapReduceTest, etc.,
each picking up the timeout option for their part of the suite.
4. Then cut out all the other timeouts.
5. And finally document this somewhere.
6. Object store tests need extra-long timeouts; execution times for multi-GB
uploads to S3 and OpenStack object stores are a function of your upload
bandwidth, not machine speed.

 -steve



 On 15 April 2014 21:20, Karthik Kambatla ka...@cloudera.com wrote:

 - hwx-hdfs-dev
 + hdfs-dev

 Agree with all the points Chris makes.

 I asked this question in the context of a fix that bumps up the timeout to
 make the test pass on slower machines. If the timeout is not central to the
 test, is the recommended approach to get rid of the timeout?



 On Tue, Apr 15, 2014 at 11:37 AM, Chris Nauroth cnaur...@hortonworks.com
 wrote:

  +common-dev, hdfs-dev
 
  My understanding of the current situation is that we had a period where
 we
  tried to enforce adding timeouts on all new tests in patches, but it
 caused
  trouble, and now we're back to not requiring it.  Jenkins test-patch
 isn't
  checking for it anymore.
 
  I don't think patches are getting rejected for using timeouts though.
 
  The difficulty is that execution time is quite sensitive to the build
  environment.  (Consider top-of-the-line server hardware used in build
  infrastructure vs. a dev running a VirtualBox VM with 1 dedicated CPU, 2
 GB
  RAM and slow virtualized disk.)  When we were enforcing timeouts, it was
  quite common to see follow-up patches tuning up the timeout settings to
  make tests work reliably in a greater variety of environments.  At that
  point, the benefit of using the timeout becomes questionable, because now
  the fast machine is running with the longer timeout too.
 
  Chris Nauroth
  Hortonworks
  http://hortonworks.com/
 
 
 
  On Mon, Apr 14, 2014 at 9:41 AM, Karthik Kambatla ka...@cloudera.com
  wrote:
 
   Hi folks
  
   Just wanted to check what our policy for adding timeouts to tests is.
 Do
  we
   encourage/discourage using timeouts for tests? If we discourage using
   timeouts for tests in general, are we okay with adding timeouts for a
 few
   tests where we explicitly want the test to fail if it takes longer
 than a
   particular amount of time?
  
   Thanks
   Karthik
  
 
 





-- 
- Tsuyoshi


[jira] [Resolved] (MAPREDUCE-3704) Yarn client goes into tight loop upon connection failure

2014-04-16 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp resolved MAPREDUCE-3704.


Resolution: Not a Problem

I think this old issue has already been fixed.

 Yarn client goes into tight loop upon connection failure
 

 Key: MAPREDUCE-3704
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3704
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client, mrv2
Affects Versions: 0.23.0, 0.24.0
Reporter: Daryn Sharp

 If the client fails to connect to the AM or HS, it will go into a tight loop 
 retrying the connection.  The log rapidly grows with multiple log lines per 
 attempt.  Based on one of the logs, the client was pounding the AM at ~1000 requests/sec.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (MAPREDUCE-3825) MR should not be getting duplicate tokens for a MR Job.

2014-04-16 Thread Mit Desai (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mit Desai resolved MAPREDUCE-3825.
--

  Resolution: Duplicate
Target Version/s: 2.0.0-alpha, 0.23.3, 3.0.0  (was: 0.23.3, 2.0.0-alpha, 
3.0.0)

I see that HADOOP-7967 was fixed and the change needed for this fix has already 
been checked in with it.
Resolving this issue as a duplicate.

 MR should not be getting duplicate tokens for a MR Job.
 ---

 Key: MAPREDUCE-3825
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3825
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: security
Affects Versions: 0.23.1, 0.24.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
 Attachments: MAPREDUCE-3825.patch, TokenCache.pdf, solution4.patch


 This is the counterpart to HADOOP-7967.  
 MR gets tokens for all input, output and the default filesystem when a MR job 
 is submitted. 
 The APIs in FileSystem make it challenging to avoid duplicate tokens when 
 there are file systems that have embedded
 filesystems.
 Here is the original description that Daryn wrote: 
 The token cache currently tries to assume a filesystem's token service key.  
 The assumption generally worked while there was a one to one mapping of 
 filesystem to token.  With the advent of multi-token filesystems like viewfs, 
 the token cache will try to use a service key (ie. for viewfs) that will 
 never exist (because it really gets the mounted fs tokens).
 The descriop



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (MAPREDUCE-5841) uber job doesn't terminate on getting mapred job kill

2014-04-16 Thread Sangjin Lee (JIRA)
Sangjin Lee created MAPREDUCE-5841:
--

 Summary: uber job doesn't terminate on getting mapred job kill
 Key: MAPREDUCE-5841
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5841
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.3.0
Reporter: Sangjin Lee


If you issue a mapred job -kill against an uberized job, the job (and the YARN 
application) state transitions to KILLED, but the application master process 
continues to run. The job actually runs to completion.

This can be easily reproduced by running a sleep job:

{noformat}
hadoop jar hadoop-mapreduce-client-jobclient-2.3.0-tests.jar sleep -m 1 -r 0 
-mt 30
{noformat}

Issue a kill with mapred job -kill \[job-id\]. The UI will show the job (app) 
is in the KILLED state. However, you can see the application master is still 
running.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Policy on adding timeouts to tests

2014-04-16 Thread Vinod Kumar Vavilapalli
The other advantage of a timeout is early failure - earlier than the uber 10 min 
timeout that seems to exist in the build files. Usually the test writer has a 
general idea of how long the test is supposed to run, and if that doesn't 
happen, we can fail early. Clearly, this involves choosing a reasonable timeout 
so that the test can still pass on local machines, on different OSes and/or in VMs.

+Vinod

On Apr 15, 2014, at 11:37 AM, Chris Nauroth cnaur...@hortonworks.com wrote:

 +common-dev, hdfs-dev
 
 My understanding of the current situation is that we had a period where we
 tried to enforce adding timeouts on all new tests in patches, but it caused
 trouble, and now we're back to not requiring it.  Jenkins test-patch isn't
 checking for it anymore.
 
 I don't think patches are getting rejected for using timeouts though.
 
 The difficulty is that execution time is quite sensitive to the build
 environment.  (Consider top-of-the-line server hardware used in build
 infrastructure vs. a dev running a VirtualBox VM with 1 dedicated CPU, 2 GB
 RAM and slow virtualized disk.)  When we were enforcing timeouts, it was
 quite common to see follow-up patches tuning up the timeout settings to
 make tests work reliably in a greater variety of environments.  At that
 point, the benefit of using the timeout becomes questionable, because now
 the fast machine is running with the longer timeout too.
 
 Chris Nauroth
 Hortonworks
 http://hortonworks.com/
 
 
 
 On Mon, Apr 14, 2014 at 9:41 AM, Karthik Kambatla ka...@cloudera.comwrote:
 
 Hi folks
 
 Just wanted to check what our policy for adding timeouts to tests is. Do we
 encourage/discourage using timeouts for tests? If we discourage using
 timeouts for tests in general, are we okay with adding timeouts for a few
 tests where we explicitly want the test to fail if it takes longer than a
 particular amount of time?
 
 Thanks
 Karthik
 
 






Re: Update interval of default counters

2014-04-16 Thread Akira AJISAKA

I'm thinking the reason for hard-coding it is to protect the Hadoop cluster
from high network traffic. If the value is too small, there is too much
network traffic between the Map/Reduce tasks and the MRAppMaster.

Please see https://issues.apache.org/jira/browse/MAPREDUCE-4381 also.

That's why you need to be very careful if you really want to change
the value.

The source code is at
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Task.java 
(line 532-533)


  /** The number of milliseconds between progress reports. */
  public static final int PROGRESS_INTERVAL = 3000;
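
One client-side pattern (a sketch only - it does not give finer granularity) is
to poll at roughly that 3-second cadence and act only when a value actually
changes; anything finer would need PROGRESS_INTERVAL itself to change, with the
traffic trade-off noted above. The counter and class name are just for
illustration:

  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.TaskCounter;

  public class ChangeOnlyPoller {
    public static void poll(Job job) throws Exception {
      long last = -1;
      while (!job.isComplete()) {
        long current = job.getCounters()
            .findCounter(TaskCounter.REDUCE_INPUT_RECORDS).getValue();
        if (current != last) {   // counters only move when a task report arrives
          System.out.println("REDUCE_INPUT_RECORDS -> " + current);
          last = current;
        }
        Thread.sleep(3000);      // matches the hard-coded PROGRESS_INTERVAL
      }
    }
  }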

Regards,
Akira

(2014/04/16 22:17), Dharmesh Kakadia wrote:

Hi Akira,

Thanks for the quick reply.
Any particular reason for hard-coding it? Is there a workaround? I want to
be able to get the counter values at as fine a granularity as possible. Also,
can you point me to the relevant source code? I am willing to take up the
issue and contribute if required.

Thanks,
Dharmesh


On Wed, Apr 16, 2014 at 3:14 PM, Akira AJISAKA
ajisa...@oss.nttdata.co.jpwrote:


Moved mapreduce-dev@ to Bcc.

Hi Dharmesh,

The parameter sets the interval at which the client polls the progress
of the MRAppMaster, not the Map/Reduce tasks. The tasks send
their progress (including the counter information) to the MRAppMaster
every 3000 milliseconds, which is hard-coded.

That's why a sudden big change in counter values happens
even if the parameter is set to a small value.

Regards,
Akira


(2014/04/16 15:42), Dharmesh Kakadia wrote:


Hi Akira,

Thanks for the reply, but as I understand it, this is the interval of console
counter printing. What I am trying to do is:

while(!job.isComplete()){
   getcounters() and do some processing on that.
}

Now this is running fine, but I get the same counter values repeatedly and
then suddenly a big change in the counter values.
For example, getcounters for REDUCE_INPUT_RECORDS returns values like

0
0
..
0
280
280
...
280
516
516
...
516

etc.

I want to get finer-grained values, instead of jumping directly from 280 to
516.
Does that make sense? mapreduce.client.progressmonitor.pollinterval does
not
seem to affect it. Any workaround?

Thanks,
Dharmesh




On Tue, Apr 15, 2014 at 7:51 PM, Akira AJISAKA
ajisa...@oss.nttdata.co.jpwrote:

  Moved to u...@hadoop.apache.org.


You can configure the interval by setting
mapreduce.client.progressmonitor.pollinterval parameter.
The default value is 1000 ms.

For more details, please see http://hadoop.apache.org/docs/
stable/hadoop-mapreduce-client/hadoop-mapreduce-
client-core/mapred-default.xml.

Regards,
Akira


(2014/04/15 15:29), Dharmesh Kakadia wrote:

  Hi,


What is the update interval of inbuilt framework counters? Is that
configurable?
I am trying to collect very fine grained information about the job
execution and using counters for that. It would be great if someone can
point me to documentation/code for it. Thanks in advance.

Thanks,
Dharmesh