[jira] Created: (MAPREDUCE-1685) Debug statements affecting performance of JobTracker.heartbeat

2010-04-08 Thread Vinod K V (JIRA)
Debug statements affecting performance of JobTracker.heartbeat
--

 Key: MAPREDUCE-1685
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1685
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Vinod K V


Several debug statements in the critical section of JobTracker.heartbeat() are 
not protected by a LOG.isDebugEnabled() check and so incur non-trivial costs, 
on the order of 15% of the total heartbeat processing time.
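The fix is the standard guard pattern: only build the debug message when debug logging is enabled. A minimal self-contained sketch using java.util.logging as a stand-in for the commons-logging LOG in JobTracker (class and method names here are illustrative, not Hadoop's):

```java
// Sketch of the isDebugEnabled-style guard; JobTracker internals are
// illustrative only. java.util.logging's FINE plays the role of debug.
import java.util.logging.Level;
import java.util.logging.Logger;

public class HeartbeatLoggingSketch {
  private static final Logger LOG = Logger.getLogger("JobTracker");

  // Simulates a costly message build that should not run when debug is off.
  static String expensiveSummary(int taskCount) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < taskCount; i++) {
      sb.append("task-").append(i).append(' ');
    }
    return sb.toString().trim();
  }

  public static void main(String[] args) {
    LOG.setLevel(Level.INFO);
    // Unguarded: expensiveSummary() runs even though the message is discarded.
    // LOG.fine(expensiveSummary(1000));
    // Guarded: the argument is only evaluated when debug (FINE) is enabled.
    if (LOG.isLoggable(Level.FINE)) {
      LOG.fine(expensiveSummary(1000));
    }
    System.out.println(LOG.isLoggable(Level.FINE));
  }
}
```

The cost being saved is the string concatenation itself, which happens before the logger ever checks the level; that is why the guard must wrap the call rather than rely on the logger's internal filtering.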

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1526) Cache the job related information while submitting the job , this would avoid many RPC calls to JobTracker.

2010-04-08 Thread rahul k singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rahul k singh updated MAPREDUCE-1526:
-

Attachment: 1526-yahadoop-20-101.patch

Attaching the first cut of the feature.

Since Job.getJobId returns null in yhadoop 20.1xx, we are using 
GridmixJob.ORIGNAME as the configuration variable to set the key for the job. 
The GridmixJob class contains the related code.

 Cache the job related information while submitting the job , this would avoid 
 many RPC calls to JobTracker.
 ---

 Key: MAPREDUCE-1526
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1526
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/gridmix
Reporter: rahul k singh
 Attachments: 1526-yahadoop-20-101.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1685) Debug statements affecting performance of JobTracker.heartbeat

2010-04-08 Thread Xiao Kang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854879#action_12854879
 ] 

Xiao Kang commented on MAPREDUCE-1685:
--

Any more details? Which version of Hadoop was the 15% profiled on?

 Debug statements affecting performance of JobTracker.heartbeat
 --

 Key: MAPREDUCE-1685
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1685
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Vinod K V

 Several debug statements in the critical section of JobTracker.heartbeat() 
 are not protected by a LOG.isDebugEnabled() check and so incur non-trivial 
 costs, on the order of 15% of the total heartbeat processing time.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1684) ClusterStatus can be cached in CapacityTaskScheduler.assignTasks()

2010-04-08 Thread Xiao Kang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854884#action_12854884
 ] 

Xiao Kang commented on MAPREDUCE-1684:
--

What's the reason to cache ClusterStatus, performance improvement?

 ClusterStatus can be cached in CapacityTaskScheduler.assignTasks()
 --

 Key: MAPREDUCE-1684
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1684
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/capacity-sched
Reporter: Amareshwari Sriramadasu

 Currently,  CapacityTaskScheduler.assignTasks() calls getClusterStatus() 
 thrice: once in assignTasks(), once in MapTaskScheduler and once in 
 ReduceTaskScheduler. It can be cached in assignTasks() and re-used.
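The change amounts to fetching the status once per assignTasks() call and passing the snapshot down, so the sub-schedulers stop making their own RPCs. A hypothetical standalone sketch (the class and method names are simplified stand-ins, not the actual CapacityTaskScheduler API):

```java
// Hypothetical sketch of caching a cluster snapshot for one assignTasks()
// call; the counter stands in for the RPC to the JobTracker.
import java.util.concurrent.atomic.AtomicInteger;

public class ClusterStatusCachingSketch {
  static final AtomicInteger rpcCalls = new AtomicInteger();

  static class ClusterStatus {
    final int maxMapTasks = 100;
  }

  // Stands in for the RPC-backed JobTracker.getClusterStatus() call.
  static ClusterStatus getClusterStatus() {
    rpcCalls.incrementAndGet();
    return new ClusterStatus();
  }

  // Sub-schedulers receive the cached snapshot instead of re-fetching it.
  static int assignMaps(ClusterStatus status) { return status.maxMapTasks / 2; }
  static int assignReduces(ClusterStatus status) { return status.maxMapTasks / 4; }

  static int assignTasks() {
    // Fetch once and reuse within this scheduling pass.
    ClusterStatus status = getClusterStatus();
    return assignMaps(status) + assignReduces(status);
  }

  public static void main(String[] args) {
    assignTasks();
    System.out.println(rpcCalls.get());
  }
}
```

One fetch per scheduling pass is safe here because the status is only a point-in-time snapshot anyway; three separate fetches within one pass can even disagree with each other.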

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1646) Task Killing tests

2010-04-08 Thread Vinay Kumar Thota (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854904#action_12854904
 ] 

Vinay Kumar Thota commented on MAPREDUCE-1646:
--

Shall the limit be raised slightly? Say 2 mins? 

I agree with increasing the time slightly to 2 mins. However, if a task has 
not started after almost 1 min, there might be some issue with the task 
tracker. In my opinion, whenever a job starts, at least one task should start 
within a fraction of a second. Overall, I am saying that in these situations 
we should fail the test and give a proper message instead of increasing the 
time.

A 'good citizen' test should be as unrelated to the environment as possible. 
If you can provide special settings in the test config, you should do so. If 
the test requires a particular environment to exist at the moment of test 
execution and such an env setting doesn't exist, then the test in question 
should fail with a proper and meaningful error message. E.g., a person who 
runs tests shouldn't have to guess the required environment.

I agree with you; however, I have already checked all the possible conditions 
in the test. Your log says some class is missing 
(com.hadoop.compression.lzo.LzoCodec not found), and it seems to me some 
library file is missing in your environment, which is why the test is failing.

I have run the test a couple of times on a 5-node cluster and it passes 
consistently every time. Please check the attached log file 
(TEST-org.apache.hadoop.mapred.TestTaskKilling.txt).


 Task Killing tests
 --

 Key: MAPREDUCE-1646
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1646
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: test
Reporter: Vinay Kumar Thota
Assignee: Vinay Kumar Thota
 Attachments: TaskKilling_1646.patch, TaskKilling_1646.patch, 
 TaskKilling_1646.patch, TEST-org.apache.hadoop.mapred.TestTaskKilling.txt, 
 TEST-org.apache.hadoop.mapred.TestTaskKilling.txt


 The following tasks are covered in the test.
 1. In a running job, kill a task and verify the job succeeds.
 2. Set up a job with long-running tasks that write some output to HDFS. When 
 one of the tasks is running, ensure that the output/_temporary/_attempt-id 
 directory is created. Kill the task. After the task is killed, make sure 
 that the output/_temporary/_attempt-id directory is cleaned up.
 3. Set up a job with long-running tasks that write some output to HDFS. When 
 one of the tasks is running, ensure that the output/_temporary/_attempt-id 
 directory is created. Fail the task by simulating the map. After the task is 
 failed, make sure that the output/_temporary/_attempt-id directory is 
 cleaned up. The important difference we are trying to check is between kill 
 and fail; there would be a subtle difference.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1523) Sometimes rumen trace generator fails to extract the job finish time.

2010-04-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854980#action_12854980
 ] 

Hudson commented on MAPREDUCE-1523:
---

Integrated in Hadoop-Mapreduce-trunk #280 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/280/])
MAPREDUCE-1523. Sometimes rumen trace generator fails to extract the job 
finish time. (dick king via mahadev)


 Sometimes rumen trace generator fails to extract the job finish time.
 -

 Key: MAPREDUCE-1523
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1523
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Hong Tang
Assignee: Dick King
 Fix For: 0.22.0

 Attachments: mapreduce-1523--2010-03-31a-1612PDT.patch


 We saw sometimes (not very often) that rumen may fail to extract the job 
 finish time from Hadoop 0.20 history log.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1644) Remove Sqoop from Apache Hadoop (moving to github)

2010-04-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854978#action_12854978
 ] 

Hudson commented on MAPREDUCE-1644:
---

Integrated in Hadoop-Mapreduce-trunk #280 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/280/])
MAPREDUCE-1644. Remove Sqoop contrib module. Contributed by Aaron Kimball


 Remove Sqoop from Apache Hadoop (moving to github)
 --

 Key: MAPREDUCE-1644
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1644
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: contrib/sqoop
Reporter: Aaron Kimball
Assignee: Aaron Kimball
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1644.patch


 Sqoop is moving to github! All code for sqoop is already live at 
 http://github.com/cloudera/sqoop - this issue removes the duplicate code from 
 the Apache Hadoop repository before the 0.21 release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1585) Create Hadoop Archives version 2 with filenames URL-encoded

2010-04-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854981#action_12854981
 ] 

Hudson commented on MAPREDUCE-1585:
---

Integrated in Hadoop-Mapreduce-trunk #280 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/280/])


 Create Hadoop Archives version 2 with filenames URL-encoded
 ---

 Key: MAPREDUCE-1585
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1585
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: harchive
Reporter: Rodrigo Schmidt
Assignee: Rodrigo Schmidt
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1585.1.patch, MAPREDUCE-1585.2.patch, 
 MAPREDUCE-1585.patch


 Hadoop Archives version 1 doesn't cope with files that have spaces in their 
 names.
 One proposal is to URL-encode filenames inside the index file (version 2; 
 see HADOOP-6591).
 This task is to allow the creation of version 2 files that have file names 
 encoded appropriately. It currently depends on HADOOP-6591.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1686) ClassNotFoundException for custom format classes provided in libjars

2010-04-08 Thread Paul Burkhardt (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Burkhardt updated MAPREDUCE-1686:
--

Priority: Minor  (was: Trivial)

 ClassNotFoundException for custom format classes provided in libjars
 

 Key: MAPREDUCE-1686
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1686
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/streaming
Affects Versions: 0.20.2
Reporter: Paul Burkhardt
Priority: Minor

 The StreamUtil::goodClassOrNull method assumes user-provided classes have 
 package names and, if not, that they are part of the Hadoop Streaming 
 package. For example, using custom InputFormat or OutputFormat classes 
 without package names will fail with a ClassNotFoundException, which is not 
 indicative given that the classes are provided in the libjars option. 
 Admittedly, most Java classes should have a package name, so this should 
 rarely come up.
 Possible resolution options:
 1) modify the error message to include the actual classname that was 
 attempted in the goodClassOrNull method
 2) call the Configuration::getClassByName method first and if class not found 
 check for default package name and try the call again
 {code}
 public static Class goodClassOrNull(Configuration conf, String className,
     String defaultPackage) {
   Class clazz = null;
   try {
     clazz = conf.getClassByName(className);
   } catch (ClassNotFoundException cnf) {
     // fall through and retry with the default package
   }
   if (clazz == null) {
     if (className.indexOf('.') == -1 && defaultPackage != null) {
       className = defaultPackage + "." + className;
       try {
         clazz = conf.getClassByName(className);
       } catch (ClassNotFoundException cnf) {
         // give up; the caller handles the null return
       }
     }
   }
   return clazz;
 }
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1625) Improve grouping of packages in Javadoc

2010-04-08 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated MAPREDUCE-1625:
-

 Priority: Blocker  (was: Major)
Fix Version/s: 0.21.0

 Improve grouping of packages in Javadoc
 ---

 Key: MAPREDUCE-1625
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1625
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: documentation
Reporter: Tom White
Assignee: Tom White
Priority: Blocker
 Fix For: 0.21.0

 Attachments: MAPREDUCE-1625.patch, MAPREDUCE-1625.patch


 There are a couple of problems with the current Javadoc:
 * The main MapReduce package documentation on the index page appears under 
 "Other Packages", below the fold.
 * Some contrib classes and packages are interspersed with the main MapReduce 
 documentation, which is very confusing for users.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1623) Apply audience and stability annotations to classes in mapred package

2010-04-08 Thread Tom White (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855222#action_12855222
 ] 

Tom White commented on MAPREDUCE-1623:
--

The classes in the mapreduce.lib packages should be marked public evolving too.
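For illustration, marking a class "public evolving" is just a matter of applying the two classification annotations. A self-contained sketch (the annotation types below are simplified stand-ins for org.apache.hadoop.classification.InterfaceAudience.Public and InterfaceStability.Evolving, and ExampleInputFormat is hypothetical):

```java
// Simplified stand-ins for the Hadoop classification annotations, applied
// to a hypothetical mapreduce.lib-style class.
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class AnnotationSketch {
  @Retention(RetentionPolicy.RUNTIME) @interface Public {}
  @Retention(RetentionPolicy.RUNTIME) @interface Evolving {}

  // Marked public/evolving: part of the user-facing API, may still change.
  @Public @Evolving
  static class ExampleInputFormat {}

  public static void main(String[] args) {
    // Tools (e.g. Javadoc filters) can inspect the annotations at runtime.
    System.out.println(
        ExampleInputFormat.class.isAnnotationPresent(Public.class));
  }
}
```

Because the annotations are retained, doclet tooling can read them and include or exclude classes from the generated user Javadoc accordingly.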

 Apply audience and stability annotations to classes in mapred package
 -

 Key: MAPREDUCE-1623
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1623
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: documentation
Reporter: Tom White
Assignee: Tom White
 Attachments: MAPREDUCE-1623.patch, MAPREDUCE-1623.patch, 
 MAPREDUCE-1623.patch, MAPREDUCE-1623.patch


 There are lots of implementation classes in org.apache.hadoop.mapred which 
 makes it difficult to see the user-level MapReduce API classes in the 
 Javadoc. (See 
 http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/package-summary.html
  for example.) By marking these implementation classes with the 
 InterfaceAudience.Private annotation we can exclude them from user Javadoc 
 (using HADOOP-6658).
 Later work will move the implementation classes into o.a.h.mapreduce.server 
 and related packages (see MAPREDUCE-561), but applying the annotations is a 
 good first step. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1526) Cache the job related information while submitting the job , this would avoid many RPC calls to JobTracker.

2010-04-08 Thread Hong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855242#action_12855242
 ] 

Hong Tang commented on MAPREDUCE-1526:
--

Mostly good. Detailed comments:

Statistics.java:
- Use the job seq id instead of ORIGNAME + random number as the key to track 
the job stats.
- JobStats.setNoOfMaps should not be public. Better to have a JobStats 
constructor that takes two parameters, eliminating the need for the setters 
and the empty constructor.
- You should avoid returning the whole jobMaps in ClusterStats; it becomes 
worse when you save the returned map in a member field of StressJobFactory. 
It seems sufficient to avoid this if you have the following methods in 
ClusterStatus:
-- int getNumRunningJobs()
-- Collection<JobStats> getRunningJobStats()
- The following code looks wrong. Why is an empty JobStats object created? 
Shouldn't you search the internal JobStats map and only call listeners if an 
instance of JobStats is found? Also, the if statement seems redundant.
{noformat}
  try {
    // Job is completed; notify all the listeners.
    if (jobStatListeners.size() > 0) {
      for (StatListener<JobStats> l : jobStatListeners) {
        JobStats stats = new JobStats();
        l.update(stats);
      }
    }
{noformat}
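A corrected loop along the lines of the review comment might look like the following sketch (the JobStats fields, the listener interface, and the lookup map are hypothetical simplifications of the Gridmix code under review):

```java
// Hypothetical sketch: notify listeners with the JobStats actually tracked
// for the completed job, instead of constructing a fresh empty one.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class JobCompletionNotifySketch {
  static class JobStats {
    final int noOfMaps;
    JobStats(int noOfMaps) { this.noOfMaps = noOfMaps; }
  }

  interface StatListener<T> { void update(T stats); }

  static final Map<String, JobStats> jobStats = new HashMap<>();
  static final List<StatListener<JobStats>> listeners = new ArrayList<>();

  static void jobCompleted(String jobSeqId) {
    // Look up (and drop) the stats tracked for this job; only notify the
    // listeners if an instance was actually found.
    JobStats stats = jobStats.remove(jobSeqId);
    if (stats == null) {
      return;
    }
    for (StatListener<JobStats> l : listeners) {
      l.update(stats);
    }
  }

  public static void main(String[] args) {
    jobStats.put("job-1", new JobStats(42));
    final int[] seen = {0};
    listeners.add(s -> seen[0] = s.noOfMaps);
    jobCompleted("job-1");
    System.out.println(seen[0]);
  }
}
```

Note there is no size() check before the loop: iterating an empty listener list is already a no-op, which is the "redundant if" the review points out.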

StressJobFactory.java:
- Why do we need to add volatile to loadStatus?
- I think the following statement should be LOG.debug() instead of LOG.info() 
(and be protected by a check of LOG.isDebugEnabled()):
{noformat}
-if (LOG.isDebugEnabled()) {
-  LOG.info(
+LOG.info(
   System.currentTimeMillis() + " Overloaded is " + Boolean.toString(
     overloaded) + " incompleteMapTasks " + relOp + " " +
     OVERLAOD_MAPTASK_MAPSLOT_RATIO + "*mapSlotCapacity (" +
     incompleteMapTasks + " " + relOp + " " +
     OVERLAOD_MAPTASK_MAPSLOT_RATIO + "*" +
     clusterStatus.getMaxMapTasks() + ")");
-}
+
{noformat}

Misc:
- Some indentation problem in JobSubmitter.java

 Cache the job related information while submitting the job , this would avoid 
 many RPC calls to JobTracker.
 ---

 Key: MAPREDUCE-1526
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1526
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/gridmix
Reporter: rahul k singh
 Attachments: 1526-yahadoop-20-101.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1535) Replace usage of FileStatus#isDir()

2010-04-08 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated MAPREDUCE-1535:
---

Attachment: mapreduce-1535-1.patch

Patch attached. Modulo a couple of cases, this is a pretty straightforward 
change from isDir() to isDirectory() and from !isDir() to isFile().

 Replace usage of FileStatus#isDir()
 ---

 Key: MAPREDUCE-1535
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1535
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.22.0
Reporter: Eli Collins
Assignee: Eli Collins
 Fix For: 0.22.0

 Attachments: mapreduce-1535-1.patch


 HADOOP-6585 will deprecate FileStatus#isDir(). This jira is for replacing all 
 uses of isDir() in MR with checks of isDirectory() or isFile() as needed.
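The mechanical substitution can be shown with a standalone analogue (FileStatus itself lives in hadoop-common, so the class below is a simplified stand-in, ignoring details such as symlinks):

```java
// Standalone analogue of the FileStatus#isDir() deprecation: the old method
// is kept for compatibility while callers migrate to the replacements.
public class FileStatusSketch {
  private final boolean directory;

  FileStatusSketch(boolean directory) { this.directory = directory; }

  @Deprecated
  boolean isDir() { return directory; }       // old API, to be removed

  boolean isDirectory() { return directory; } // replaces isDir()
  boolean isFile() { return !directory; }     // replaces !isDir()

  public static void main(String[] args) {
    FileStatusSketch st = new FileStatusSketch(false);
    // Before: if (!st.isDir()) { processFile(st); }
    // After:  if (st.isFile())  { processFile(st); }
    System.out.println(st.isFile());
  }
}
```

The point of splitting one boolean into two named predicates is readability at call sites: `isFile()` states intent directly, where `!isDir()` forces the reader to negate.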

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1646) Task Killing tests

2010-04-08 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855263#action_12855263
 ] 

Konstantin Boudnik commented on MAPREDUCE-1646:
---

Ok, sounds good to me. Most probably the issue I'm seeing is caused by the 
lack of the LZO codec. If there are no other comments from anyone, please 
commit it to the internal Y20 branch and I'll commit it to trunk as soon as 
HADOOP-6332 is ready.

 Task Killing tests
 --

 Key: MAPREDUCE-1646
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1646
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: test
Reporter: Vinay Kumar Thota
Assignee: Vinay Kumar Thota
 Attachments: TaskKilling_1646.patch, TaskKilling_1646.patch, 
 TaskKilling_1646.patch, TEST-org.apache.hadoop.mapred.TestTaskKilling.txt, 
 TEST-org.apache.hadoop.mapred.TestTaskKilling.txt


 The following tasks are covered in the test.
 1. In a running job, kill a task and verify the job succeeds.
 2. Set up a job with long-running tasks that write some output to HDFS. When 
 one of the tasks is running, ensure that the output/_temporary/_attempt-id 
 directory is created. Kill the task. After the task is killed, make sure 
 that the output/_temporary/_attempt-id directory is cleaned up.
 3. Set up a job with long-running tasks that write some output to HDFS. When 
 one of the tasks is running, ensure that the output/_temporary/_attempt-id 
 directory is created. Fail the task by simulating the map. After the task is 
 failed, make sure that the output/_temporary/_attempt-id directory is 
 cleaned up. The important difference we are trying to check is between kill 
 and fail; there would be a subtle difference.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.