date:20100811

[jira] Commented: (MAPREDUCE-1780) AccessControlList.toString() is used for serialization of ACL in JobStatus.java

2010-08-11 Thread Vinod K V (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897159#action_12897159
 ] 

Vinod K V commented on MAPREDUCE-1780:
--

Some comments:
 - JobSubmittedEvent is used to persist job-acls to JobHistory, but the acls are
incorrectly written through the toString() method. Please add a test, or modify
the existing test in TestJobHistory, to verify this bug.
 - Minor: Not directly related to the patch, but it can be fixed here. In
QueueManager.dumpConfiguration(), we don't need aclsSubmitJobValue to be a
StringBuilder. We can drop the getAclsInfo() method altogether.

 AccessControlList.toString() is used for serialization of ACL in 
 JobStatus.java
 ---

 Key: MAPREDUCE-1780
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1780
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Ravi Gummadi
Assignee: Ravi Gummadi
 Attachments: 1780.patch


 HADOOP-6715 was created to fix AccessControlList.toString() for the WILDCARD
 case. JobStatus.write() and readFields() assume that toString() returns the
 serialized String of the AccessControlList object, which is not true. Once
 HADOOP-6715 is fixed in COMMON, JobStatus.write() and JobStatus.readFields()
 should be fixed in line with that fix.
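
 The distinction drawn in the description can be sketched as follows. This is
 a minimal, self-contained illustration and not Hadoop's AccessControlList: the
 class SimpleAcl, its fields, and its display format are all hypothetical. It
 only shows why a human-readable toString() makes a fragile wire format, and
 what an explicit write()/readFields() pair in the style of the Writable
 contract might look like.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for an ACL; NOT Hadoop's AccessControlList.
final class SimpleAcl {
    private boolean allAllowed;                              // the WILDCARD case
    private final Set<String> users = new LinkedHashSet<>();

    SimpleAcl() { }

    SimpleAcl(boolean allAllowed, String... users) {
        this.allAllowed = allAllowed;
        this.users.addAll(Arrays.asList(users));
    }

    // Meant for logs and UIs only; nothing should parse this text.
    @Override
    public String toString() {
        return allAllowed ? "All users are allowed" : "Users " + users + " are allowed";
    }

    // Explicit wire format: a wildcard flag, a count, then each user name.
    void write(DataOutput out) throws IOException {
        out.writeBoolean(allAllowed);
        out.writeInt(users.size());
        for (String user : users) {
            out.writeUTF(user);
        }
    }

    // Mirror image of write(); clears old state before reading.
    void readFields(DataInput in) throws IOException {
        allAllowed = in.readBoolean();
        users.clear();
        int count = in.readInt();
        for (int i = 0; i < count; i++) {
            users.add(in.readUTF());
        }
    }

    boolean isAllAllowed() { return allAllowed; }

    Set<String> getUsers() { return users; }

    // Serializes an ACL and reads it back through the explicit format.
    static SimpleAcl roundTrip(SimpleAcl acl) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            acl.write(new DataOutputStream(bytes));
            SimpleAcl copy = new SimpleAcl();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

 With this shape, the wildcard case survives a round trip because it is
 carried as its own flag rather than being inferred from display text.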

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (MAPREDUCE-1881) Improve TaskTrackerInstrumentation

2010-08-11 Thread Luke Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897178#action_12897178
 ] 

Luke Lu commented on MAPREDUCE-1881:


I have no issue with the statusUpdate method. I get where you're coming from :)
But I question whether many users will want to do the same thing, and I'm
curious whether many useful instrumentation classes are being written. Adding
features (especially redundant ones), IMO, doesn't necessarily make Hadoop
better, but rather more bloated and harder to maintain. You know, perfection is
attained not when no more can be added, but when no more can be removed.

Another thing about the patch is that if the instrumentation class is specified
as an empty string, it silently defaults to the composite class with an empty
list (essentially a noop instrumentation). That is a change from the existing
behavior, where an exception would be thrown.
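
The behavior change described above concerns how an empty property value is
treated. As a rough sketch of the stricter alternative (failing loudly instead
of falling back to an empty composite), the parsing step could look like the
following. The class InstrumentationConfig and the method parseClassNames are
hypothetical names used only for illustration; they are not taken from the
patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of the MAPREDUCE-1881 patch.
final class InstrumentationConfig {

    // Splits a comma-separated list of class names.
    // An empty value, or an empty entry in the list, is rejected with an
    // exception rather than silently producing a no-op configuration.
    static List<String> parseClassNames(String value) {
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("No instrumentation class specified");
        }
        List<String> names = new ArrayList<>();
        // A limit of -1 keeps trailing empty entries so that "a," is rejected too.
        for (String part : value.split(",", -1)) {
            String name = part.trim();
            if (name.isEmpty()) {
                throw new IllegalArgumentException("Empty class name in list: " + value);
            }
            names.add(name);
        }
        return names;
    }
}
```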

 Improve TaskTrackerInstrumentation
 --

 Key: MAPREDUCE-1881
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1881
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Matei Zaharia
Assignee: Matei Zaharia
Priority: Minor
 Attachments: mapreduce-1881-v2.patch, mapreduce-1881-v2b.patch, 
 mapreduce-1881.patch


 The TaskTrackerInstrumentation class provides a useful way to capture key 
 events at the TaskTracker for use in various reporting tools, but it is 
 currently rather limited, because only one TaskTrackerInstrumentation can be 
 added to a given TaskTracker and this object receives minimal information
 about tasks (only their IDs). I propose enhancing the functionality through 
 two changes:
 # Support a comma-separated list of TaskTrackerInstrumentation classes rather 
 than just a single one in the JobConf, and report events to all of them.
 # Make the reportTaskLaunch and reportTaskEnd methods in 
 TaskTrackerInstrumentation receive a reference to a whole Task object rather 
 than just its TaskAttemptID. It might also be useful to make the latter 
 receive the task's final state, i.e. failed, killed, or successful.
 I'm just posting this here to get a sense of whether this is a good idea. If 
 people think it's okay, I will make a patch against trunk that implements 
 these changes.


[jira] Updated: (MAPREDUCE-1780) AccessControlList.toString() is used for serialization of ACL in JobStatus.java

2010-08-11 Thread Ravi Gummadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Gummadi updated MAPREDUCE-1780:


Attachment: 1780.v1.patch

Attaching patch incorporating review comments.

Validation of the job acls that are logged to the history file is now added to
TestJobHistory. This was somehow missing from the trunk patch for
MAPREDUCE-1493, although it was present in the Y! dist patch.


[jira] Commented: (MAPREDUCE-1920) Job.getCounters() returns null when using a cluster

2010-08-11 Thread Amareshwari Sriramadasu (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897188#action_12897188
 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-1920:


Tests that timed out till now:
TestAdminOperationsProtocolWithServiceAuthorization
TestClusterMRNotification
TestDebugScript
TestEmptyJob
TestIsolationRunner
TestJobCleanup
TestJobHistory
TestJobHistoryParsing
TestJobInProgress
TestJobInProgressListener
TestJobKillAndFail
TestJobQueueClient
TestJvmReuse
TestKillSubProcesses
TestMRWithDistributedCache
TestMapredHeartbeat
TestMiniMRBringup

Tests that failed:
TestJobTrackerStart
TestKillCompletedJob

My local ant test run is still in progress, so more tests may be added to the
lists above.

Shall we fix MiniMRCluster to set a persist dir in the local file system if the
fileSystem passed is local, instead of fixing these individual tests?
Or shall we disable the completed job store for the unit tests by adding conf in
src/test/mapred-site.xml (similar to disabling retire jobs), since
TestJobStatusPersistency tests the functionality of completedJobStore anyway?

 Job.getCounters() returns null when using a cluster
 ---

 Key: MAPREDUCE-1920
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1920
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.21.0
Reporter: Aaron Kimball
Assignee: Tom White
Priority: Critical
 Attachments: MAPREDUCE-1920.patch, MAPREDUCE-1920.patch, 
 MAPREDUCE-1920.patch


 Calling Job.getCounters() after the job has completed (successfully) returns 
 null.


[jira] Updated: (MAPREDUCE-1979) Output directory already exists error in gridmix when gridmix.output.directory is not defined

2010-08-11 Thread Ravi Gummadi (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Gummadi updated MAPREDUCE-1979:


Attachment: 1979.v1.patch

With the earlier patch, TestGridmixSubmission was failing. Attaching a new patch
with the correct fix. Also added a test case.

 Output directory already exists error in gridmix when 
 gridmix.output.directory is not defined
 ---

 Key: MAPREDUCE-1979
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1979
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/gridmix
Reporter: Ravi Gummadi
Assignee: Ravi Gummadi
 Attachments: 1979.patch, 1979.v1.patch


 An "Output directory already exists" error is seen in gridmix when
 gridmix.output.directory is not defined. In that case, gridmix uses
 inputDir/gridmix/ as the output path for the gridmix run. Because gridmix
 creates outputPath (here, inputDir/gridmix/) at the beginning, the output path
 of the generate-data mapreduce job (i.e. inputDir) already exists, and
 mapreduce reports an error.
 There is no need to create this outputPath in any case (whether the user
 specifies the path using gridmix.output.directory or gridmix itself falls back
 to inputDir/gridmix/), because output paths of mapreduce jobs are created
 automatically (like mkdir -p).
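
 The failure mode can be reproduced outside Hadoop with plain java.nio.file
 calls. The sketch below is only an analogy: OutputDirSketch, runJob, and
 generateData are hypothetical names, and runJob merely imitates a job that
 refuses to start when its output directory already exists. The point is that
 pre-creating inputDir/gridmix/ also creates inputDir as a side effect.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration on the local file system; not gridmix code.
final class OutputDirSketch {

    // Imitates a job that refuses to run when its output directory exists,
    // and otherwise creates it together with any missing parents.
    static void runJob(Path output) {
        if (Files.exists(output)) {
            throw new IllegalStateException("Output directory already exists: " + output);
        }
        try {
            Files.createDirectories(output); // behaves like mkdir -p
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns true if a data-generation job writing to inputDir can start.
    static boolean generateData(boolean precreateOutputPath) {
        try {
            Path root = Files.createTempDirectory("gridmix-sketch");
            Path inputDir = root.resolve("inputDir");
            Path outputPath = inputDir.resolve("gridmix");
            if (precreateOutputPath) {
                // Side effect: inputDir now exists as the parent of outputPath.
                Files.createDirectories(outputPath);
            }
            try {
                runJob(inputDir);
                return true;
            } catch (IllegalStateException e) {
                return false;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```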


[jira] Commented: (MAPREDUCE-1959) Should use long name for token renewer on the client side

2010-08-11 Thread Hadoop QA (JIRA)

[
https://issues.apache.org/jira/browse/MAPREDUCE-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897193#action_12897193
]

Hadoop QA commented on MAPREDUCE-1959:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12451719/m1959-02.patch
against trunk revision 983815.

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified
tests.
Please justify why no new tests are needed for this
patch.
Also please list what manual steps were performed to
verify this patch.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results:
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/353/testReport/
Findbugs warnings:
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/353/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results:
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/353/artifact/trunk/build/test/checkstyle-errors.html
Console output:
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/353/console

This message is automatically generated.

Should use long name for token renewer on the client side
-

Key: MAPREDUCE-1959
URL: https://issues.apache.org/jira/browse/MAPREDUCE-1959
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: job submission, security
Reporter: Kan Zhang
Assignee: Kan Zhang
Attachments: m1959-01.patch, m1959-02.patch

When getting a delegation token from a NN, a client needs to specify the
renewer for the token. For use on a MapRed cluster, JT should be specified as
the renewer. However, in the current code, the client maps JT's long name
(Kerberos principal name) to cluster-internal short name and then sets the
short name as the renewer. This is undesirable for 2 reasons. 1) It's
unnecessary since NN (or JT) converts client-supplied renewer from long to
short name anyway. 2) In principle, the mapping from long to short name
should be done on the server. This is consistent with the authentication
case, where the client uses the same long name to authenticate to multiple
servers and servers map client's long name to their own internal short names.
It facilitates using the same job client to get delegation tokens from
multiple NN's, which may have different mapping rules for JT.
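
As a rough illustration of why the mapping belongs on the server, the sketch
below lets two servers apply different rules to the same client-supplied long
name. It assumes Kerberos-style principals of the form primary/instance@REALM.
The class RenewerMapping and its methods are hypothetical, and the rule format
is deliberately much simpler than Hadoop's real name-mapping rules.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-server mapping from long (principal) names to short names.
final class RenewerMapping {
    private final Map<String, String> rules = new HashMap<>();

    // Registers a server-specific override for one principal.
    void addRule(String principal, String shortName) {
        rules.put(principal, shortName);
    }

    // Applies the server's own rule if one exists; otherwise falls back to
    // the primary component (the text before the first '/' or '@').
    String toShortName(String principal) {
        String mapped = rules.get(principal);
        if (mapped != null) {
            return mapped;
        }
        int end = principal.length();
        for (char separator : new char[] {'/', '@'}) {
            int index = principal.indexOf(separator);
            if (index >= 0 && index < end) {
                end = index;
            }
        }
        return principal.substring(0, end);
    }
}
```

A client that always sends the long name does not need to know which of
these rules a given server applies.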


[jira] Commented: (MAPREDUCE-1780) AccessControlList.toString() is used for serialization of ACL in JobStatus.java

2010-08-11 Thread Vinod K V (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897200#action_12897200
 ] 

Vinod K V commented on MAPREDUCE-1780:
--

Looks good, +1 for the patch.


[jira] Created: (MAPREDUCE-2003) It should be able to specify different jvm settings for map and reduce child process (via mapred.child.map.java.opts and mapred.child.reduce.java.opts options)

2010-08-11 Thread Vladimir Klimontovich (JIRA)

It should be able to specify different jvm settings for map and reduce child 
process (via mapred.child.map.java.opts and mapred.child.reduce.java.opts 
options) 


 Key: MAPREDUCE-2003
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2003
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tasktracker
Reporter: Vladimir Klimontovich
 Fix For: 0.20.3, 0.21.0, 0.22.0


Sometimes the mapper child process requires different JVM settings than the
reducer, for example when the mapper requires much more memory than the reducer.
Currently it's only possible to set options for both using
mapred.child.java.opts.

Proposed solution: mapred.child.java.opts could be overridden by
mapred.child.map.java.opts or mapred.child.reduce.java.opts. Thus, we add
flexibility while keeping compatibility with old configurations.

The same should be done for mapred.child.env.
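
The proposed lookup order can be sketched with java.util.Properties standing
in for the job configuration. The three property names are the ones given in
this issue; the class ChildJavaOpts, the method resolve, and the empty-string
default are assumptions made only for this example.

```java
import java.util.Properties;

// Hypothetical resolver; uses Properties in place of a real JobConf.
final class ChildJavaOpts {
    static final String GENERAL = "mapred.child.java.opts";
    static final String MAP = "mapred.child.map.java.opts";
    static final String REDUCE = "mapred.child.reduce.java.opts";

    // Prefers the task-type-specific option and falls back to the general
    // one, so configurations that only set the old key keep working.
    static String resolve(Properties conf, boolean isMapTask) {
        String specific = conf.getProperty(isMapTask ? MAP : REDUCE);
        if (specific != null) {
            return specific;
        }
        return conf.getProperty(GENERAL, ""); // empty default is an assumption
    }
}
```

The same fallback pattern would apply to mapred.child.env.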


[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-08-11 Thread Arun C Murthy (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897328#action_12897328
 ] 

Arun C Murthy commented on MAPREDUCE-220:
-

Scott, sorry for coming in late. 

I have a nit: we seem to create a new ProcfsBasedProcessTree each time - 
wouldn't it be easier to re-use the object? Create it once and re-use it each 
time?

 Collecting cpu and memory usage for MapReduce tasks
 ---

 Key: MAPREDUCE-220
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: task, tasktracker
Reporter: Hong Tang
Assignee: Scott Chen
 Fix For: 0.22.0

 Attachments: MAPREDUCE-220-20100616.txt, MAPREDUCE-220-20100804.txt, 
 MAPREDUCE-220-20100806.txt, MAPREDUCE-220-20100809.txt, MAPREDUCE-220-v1.txt, 
 MAPREDUCE-220.txt


 It would be nice for TaskTracker to collect cpu and memory usage for 
 individual Map or Reduce tasks over time.


[jira] Commented: (MAPREDUCE-1920) Job.getCounters() returns null when using a cluster

2010-08-11 Thread Tom White (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897330#action_12897330
 ] 

Tom White commented on MAPREDUCE-1920:
--

 Or shall we disable completed job store for the unit tests by adding conf in 
 src/test/mapred-site.xml (similar to disabling retire jobs) as 
 TestJobStatusPersistency anyways tests the functionality of completedJobStore?

I think this is a much better way of doing it. Thanks for the suggestion. I'll 
prepare a patch.


[jira] Commented: (MAPREDUCE-1881) Improve TaskTrackerInstrumentation

2010-08-11 Thread Philip Zeyliger (JIRA)

[
https://issues.apache.org/jira/browse/MAPREDUCE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897334#action_12897334
]

Philip Zeyliger commented on MAPREDUCE-1881:

I'll chime in that I'm using the instrumentation classes and find them a useful
way to listen to some events that are otherwise hard to get at.


[jira] Created: (MAPREDUCE-2004) IP address vs host name in updating Counter.DATA_LOCAL_MAPS

2010-08-11 Thread Rares Vernica (JIRA)

IP address vs host name in updating Counter.DATA_LOCAL_MAPS
---

 Key: MAPREDUCE-2004
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2004
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 0.20.2
Reporter: Rares Vernica
Priority: Minor


Hello,

 I set mapred.task.cache.levels to 1 so that I have only
data-local-map tasks. Still, by looking at the data-local-maps
counter it seems not all map tasks are local. I checked each map task
to see where it ran and what split had been assigned to it, and all the
maps were actually processing only local data. (BTW, replication was
set to 1.)

I looked into the JobClient to see what information is there for each
split. For each file, the first n-1 splits have an IP address as
location while the n-th split has a host name as location. The reason
for this is that there is a different code path for deciding the
location of the first n-1 splits versus the n-th split. The maps that
processed the splits where the location was a host name were counted
as data-local-maps while the others were not.

So, regardless of whether the JobClient gives IP addresses or host names
for splits, the job works fine. The problem is that the data-local-maps
counter does not take this into consideration.

Cheers,
Rares
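
One way to picture the mismatch described in this report is a locality check
that normalizes both sides before comparing them. The sketch below is not
JobTracker code: LocalityCheck, register, and isDataLocal are hypothetical
names, and a small in-memory table stands in for real address resolution.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical locality check that treats an IP address and its host name
// as the same location.
final class LocalityCheck {
    private final Map<String, String> hostByAddress = new HashMap<>();

    // Records which host name a given IP address belongs to.
    void register(String ipAddress, String hostName) {
        hostByAddress.put(ipAddress, hostName);
    }

    // Maps a known IP address to its host name; leaves other values as-is.
    String normalize(String location) {
        return hostByAddress.getOrDefault(location, location);
    }

    // Compares the normalized forms instead of the raw strings.
    boolean isDataLocal(String splitLocation, String taskTrackerHost) {
        return normalize(splitLocation).equals(normalize(taskTrackerHost));
    }
}
```

A plain string comparison of "10.0.0.5" against "node5" reports a non-local
task even when both names refer to the same machine, which matches the
undercounting described above.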


[jira] Commented: (MAPREDUCE-1881) Improve TaskTrackerInstrumentation

2010-08-11 Thread Arun C Murthy (JIRA)

[
https://issues.apache.org/jira/browse/MAPREDUCE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897348#action_12897348
]

Arun C Murthy commented on MAPREDUCE-1881:
--

I'm trying to understand the proposal... please help me.

Currently you can define multiple 'sinks' for the same data via
CompositeContext. Thus you can define multiple listeners and each will get the
same data. Is that sufficient for this use case?


[jira] Commented: (MAPREDUCE-1881) Improve TaskTrackerInstrumentation

2010-08-11 Thread Luke Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897367#action_12897367
 ] 

Luke Lu commented on MAPREDUCE-1881:


The instrumentation class is related to but not dependent on metrics 
frameworks. Some of the events are actually not collected in the regular 
metrics, so there is an expert level config property 
mapreduce.tasktracker.instrumentation to specify a subclass for 
TaskTrackerInstrumentation which contains all the overridable callbacks. The 
default value for the property is the TaskTrackerMetricsInst class which 
currently implements the Updater interface to collect tasktracker metrics in 
the mapred metrics context. Similarly for metrics v2, 
TaskTrackerMetricsSource would be the default.

Matei and others want to use the overridable instrumentation property to hook 
in other listeners, for things that're not strictly metrics related, like 
statusUpdate, which is useful for his project which does two-level scheduling 
:) He can achieve this with the addition of the statusUpdate method in 
TaskTrackerInstrumentation. To make adding more instrumentation classes (while 
preserving the existing instrumentation like metrics) slightly easier (IMO, a 
user-defined composite class is just as easy), he wants to make the property a
list of classes so that the events are fired for each instance of the
specified classes.

The latter part of the patch would add a composite instrumentation class that
dispatches all the events to all the instances of the specified instrumentation
classes. Currently the patch lacks unit tests for the composite class. I can
see problems down the road in maintaining the class, like making sure it
doesn't block in one of the classes that can potentially do RPCs etc., and that
it properly handles exceptions in the delegate objects.
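
To make the exception-handling concern concrete, here is a minimal sketch of a
composite that keeps one failing delegate from hiding an event from the
others. TaskListener and CompositeListener are hypothetical, simplified
stand-ins rather than TaskTrackerInstrumentation or the class in the patch,
and the sketch does nothing about the blocking concern.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified callback interface.
interface TaskListener {
    void reportTaskLaunch(String taskId);
}

// Hypothetical composite that forwards each event to every delegate.
final class CompositeListener implements TaskListener {
    private final List<TaskListener> delegates;
    private int failures;

    CompositeListener(List<TaskListener> delegates) {
        this.delegates = new ArrayList<>(delegates);
    }

    @Override
    public void reportTaskLaunch(String taskId) {
        for (TaskListener delegate : delegates) {
            try {
                delegate.reportTaskLaunch(taskId);
            } catch (RuntimeException e) {
                // Count the failure and keep going; a real class would log it.
                failures++;
            }
        }
    }

    // Number of delegate calls that threw so far.
    int getFailures() {
        return failures;
    }
}
```

A delegate that blocks would still stall this loop, so a fuller design would
need timeouts or asynchronous dispatch on top of this.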




[jira] Commented: (MAPREDUCE-1253) Making Mumak work with Capacity-Scheduler

2010-08-11 Thread Hong Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897380#action_12897380
 ] 

Hong Tang commented on MAPREDUCE-1253:
--

I reviewed Anirban's earlier patch and forgot to comment with +1.

 Making Mumak work with Capacity-Scheduler
 -

 Key: MAPREDUCE-1253
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1253
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/mumak
Affects Versions: 0.21.0, 0.22.0
Reporter: Anirban Dasgupta
Assignee: Anirban Dasgupta
 Attachments: MAPREDUCE-1253-20100406.patch, 
 MAPREDUCE-1253-20100726-2.patch, MAPREDUCE-1253-20100804.patch

   Original Estimate: 672h
  Remaining Estimate: 672h

 In order to make the capacity-scheduler work in the mumak simulation 
 environment, we have to replace the job-initialization threads of the 
 capacity scheduler with classes that perform event-based initialization. We 
 propose to use aspectj to disable the threads of the JobInitializationPoller
 class used by the Capacity Scheduler, and then perform the corresponding 
 initialization tasks through a simulation job-initialization class that 
 receives periodic wake-up calls from the simulator engine.


[jira] Updated: (MAPREDUCE-1253) Making Mumak work with Capacity-Scheduler

2010-08-11 Thread Mahadev konar (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated MAPREDUCE-1253:
-

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

+1 for the patch. I just committed this. The ant tests for mumak pass.




[jira] Updated: (MAPREDUCE-1253) Making Mumak work with Capacity-Scheduler

2010-08-11 Thread Mahadev konar (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated MAPREDUCE-1253:
-

Fix Version/s: 0.22.0


[jira] Assigned: (MAPREDUCE-2005) TestDelegationTokenRenewal fails

2010-08-11 Thread Boris Shkolnik (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boris Shkolnik reassigned MAPREDUCE-2005:
-

Assignee: Boris Shkolnik

 TestDelegationTokenRenewal fails
 

 Key: MAPREDUCE-2005
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2005
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Boris Shkolnik
Assignee: Boris Shkolnik
 Attachments: MAPREDUCE-2005-YH20.patch


 It looks like the problem is in host resolution.
 The test uses localhost:0, but in DelegationTokenRenewal we use
 getCannonicalName() for localhost, and on some machines that is not localhost.
 Fix: change the test to use getCannonicalName too.
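
 The idea behind the fix is that both sides should derive the host part the
 same way. The sketch below uses the JDK's
 InetAddress.getCanonicalHostName(); whether that is the exact call behind the
 method named above is an assumption, and CanonicalHost and serviceKey are
 hypothetical names.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper that builds "host:port" keys from canonical names.
final class CanonicalHost {

    // Resolves a host and returns the name the local resolver considers
    // canonical; this may differ from the input, even for "localhost".
    static String canonical(String host) {
        try {
            return InetAddress.getByName(host).getCanonicalHostName();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Cannot resolve " + host, e);
        }
    }

    // Builds the key from the canonical name, so that production code and
    // tests agree as long as both go through this method.
    static String serviceKey(String host, int port) {
        return canonical(host) + ":" + port;
    }
}
```

 A test that hard-codes "localhost:0" can disagree with such a key on
 machines where the canonical name of localhost is something else.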


[jira] Updated: (MAPREDUCE-2005) TestDelegationTokenRenewal fails

2010-08-11 Thread Boris Shkolnik (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boris Shkolnik updated MAPREDUCE-2005:
--

Attachment: MAPREDUCE-2005-YH20.patch

This patch is for the previous version, not for commit.
I've also updated some comments and debug lines.


[jira] Commented: (MAPREDUCE-152) getMapOutput() keeps failing too many times before the tasktracker fails

2010-08-11 Thread Krishna Ramachandran (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897395#action_12897395
 ] 

Krishna Ramachandran commented on MAPREDUCE-152:


It has been more than 2 years. Is this still an issue?

 getMapOutput() keeps failing too many times before the tasktracker fails
 

 Key: MAPREDUCE-152
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-152
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Yiping Han
Priority: Critical

 We are running a big job on our cluster. There are about 400 reducers. Around 
 361 reducers finished successfully while the last batch of 39 reducers all 
 failed roughly around the same time. After examining the log files, the 
 following error info was found 858 times for a single tasktracker:
 2008-04-21 02:42:45,368 WARN org.apache.hadoop.mapred.TaskTracker: 
 getMapOutput(task_200804101742_0001_m_032077_2,396) failed :
 2008-04-21 02:42:49,468 WARN org.apache.hadoop.mapred.TaskTracker: 
 getMapOutput(task_200804101742_0001_m_032077_2,396) failed :
 2008-04-21 02:43:03,717 WARN org.apache.hadoop.mapred.TaskTracker: 
 getMapOutput(task_200804101742_0001_m_032077_2,396) failed :
 Shouldn't the task tracker fail early without trying so many times?
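
One way to picture the "fail early" behaviour asked for above is a per-task failure counter with a limit. This is only an illustrative sketch: FetchFailureTracker, recordFailure and the limit are hypothetical names, not TaskTracker code.

```java
import java.util.HashMap;
import java.util.Map;

class FetchFailureTracker {
    private final int maxFailures;
    private final Map<String, Integer> failures = new HashMap<>();

    FetchFailureTracker(int maxFailures) {
        this.maxFailures = maxFailures;
    }

    /** Records one failure; returns true once the task has reached the limit. */
    boolean recordFailure(String taskId) {
        int count = failures.merge(taskId, 1, Integer::sum);
        return count >= maxFailures;
    }

    public static void main(String[] args) {
        FetchFailureTracker tracker = new FetchFailureTracker(3);
        String task = "task_200804101742_0001_m_032077_2";
        for (int i = 0; i < 3; i++) {
            if (tracker.recordFailure(task)) {
                System.out.println("giving up on " + task);
            }
        }
    }
}
```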


[jira] Commented: (MAPREDUCE-223) JobClient should work with -1/+1 version of JobTracker

2010-08-11 Thread Krishna Ramachandran (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897407#action_12897407
 ] 

Krishna Ramachandran commented on MAPREDUCE-223:


It has been sitting for over 2 years and I do not believe anything has changed. 
I believe HDFS provides a read-only interface for listing/retrieving data over 
HTTP. 

Is it still critical to have a similar interface to the embedded JT HTTP 
server, on top of what the web interface already provides (for accessing task 
or job logs)?


 JobClient should work with -1/+1 version of JobTracker
 --

 Key: MAPREDUCE-223
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-223
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
 Environment: all
Reporter: Alejandro Abdelnur
Priority: Critical

 Currently there is a version check on the RPC calls that enforces the same 
 Hadoop version on the client and the server.
 To enable phased upgrades of systems using Hadoop and of Hadoop itself, the 
 {{JobClient}} should be able to interact with a {{JobTracker}} of the 
 previous and the next version of Hadoop (or with a range).
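
The requested relaxation can be sketched as a range check in place of strict equality. Everything here is an assumption for illustration: versions are modelled as plain integers, and VersionRangeCheck and isCompatible are hypothetical names, not Hadoop RPC code.

```java
class VersionRangeCheck {
    /**
     * Returns true if the client and server versions differ by at most
     * the given tolerance (tolerance 0 is the strict check, 1 allows -1/+1).
     */
    static boolean isCompatible(int clientVersion, int serverVersion, int tolerance) {
        return Math.abs((long) clientVersion - serverVersion) <= tolerance;
    }

    public static void main(String[] args) {
        System.out.println(isCompatible(20, 21, 1)); // true
        System.out.println(isCompatible(20, 22, 1)); // false
    }
}
```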


[jira] Updated: (MAPREDUCE-1920) Job.getCounters() returns null when using a cluster

2010-08-11 Thread Tom White (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated MAPREDUCE-1920:
-

Attachment: MAPREDUCE-1920.patch

This patch (based on the first one) sets 
mapreduce.jobtracker.persist.jobstatus.active to false in the test 
mapred-site.xml. It passes all unit tests (I ran it on Linux). Here's the 
output of test-patch:

{noformat}
 [exec] +1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
{noformat}

 Job.getCounters() returns null when using a cluster
 ---

 Key: MAPREDUCE-1920
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1920
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.21.0
Reporter: Aaron Kimball
Assignee: Tom White
Priority: Critical
 Attachments: MAPREDUCE-1920.patch, MAPREDUCE-1920.patch, 
 MAPREDUCE-1920.patch, MAPREDUCE-1920.patch


 Calling Job.getCounters() after the job has completed (successfully) returns 
 null.


[jira] Updated: (MAPREDUCE-1980) TaskAttemptUnsuccessfulCompletionEvent.java incorrectly logs MAP_ATTEMPT_KILLED as event type for reduce tasks

2010-08-11 Thread Amar Kamat (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat updated MAPREDUCE-1980:
--

Attachment: mapreduce-1980-v1.0.patch

Attaching a patch that fixes the bug. test-patch and ant-tests passed on my box.

 TaskAttemptUnsuccessfulCompletionEvent.java incorrectly logs 
 MAP_ATTEMPT_KILLED as event type for reduce tasks
 --

 Key: MAPREDUCE-1980
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1980
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Amar Kamat
Assignee: Amar Kamat
 Attachments: mapreduce-1980-v1.0.patch


 TaskAttemptUnsuccessfulCompletionEvent is used to log unsuccessful map and 
 reduce task attempts to JobHistory. The following is the implementation of 
 the getEventType() method of TaskAttemptUnsuccessfulCompletionEvent:
   /** Get the event type */
   public EventType getEventType() {
     return EventType.MAP_ATTEMPT_KILLED;
   }
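
A sketch of the kind of fix the description implies: choose the event type from the task type instead of hard-coding the map variant. This is not the attached patch. The enums below are local stand-ins, and REDUCE_ATTEMPT_KILLED is assumed by analogy with the constant quoted above.

```java
class UnsuccessfulAttemptEventSketch {
    enum TaskType { MAP, REDUCE }

    // Stand-in for the real EventType; only the constants needed here.
    enum EventType { MAP_ATTEMPT_KILLED, REDUCE_ATTEMPT_KILLED }

    private final TaskType taskType;

    UnsuccessfulAttemptEventSketch(TaskType taskType) {
        this.taskType = taskType;
    }

    /** Get the event type, depending on whether this is a map or reduce attempt. */
    EventType getEventType() {
        return taskType == TaskType.MAP
            ? EventType.MAP_ATTEMPT_KILLED
            : EventType.REDUCE_ATTEMPT_KILLED;
    }
}
```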


[jira] Commented: (MAPREDUCE-1980) TaskAttemptUnsuccessfulCompletionEvent.java incorrectly logs MAP_ATTEMPT_KILLED as event type for reduce tasks

2010-08-11 Thread Hong Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897427#action_12897427
 ] 

Hong Tang commented on MAPREDUCE-1980:
--

Patch looks good. +1.

 TaskAttemptUnsuccessfulCompletionEvent.java incorrectly logs 
 MAP_ATTEMPT_KILLED as event type for reduce tasks
 --

 Key: MAPREDUCE-1980
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1980
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Amar Kamat
Assignee: Amar Kamat
 Attachments: mapreduce-1980-v1.0.patch


 TaskAttemptUnsuccessfulCompletionEvent is used to log unsuccessful map and 
 reduce task attempts to JobHistory. The following is the implementation of 
 the getEventType() method of TaskAttemptUnsuccessfulCompletionEvent:
   /** Get the event type */
   public EventType getEventType() {
     return EventType.MAP_ATTEMPT_KILLED;
   }


[jira] Commented: (MAPREDUCE-1881) Improve TaskTrackerInstrumentation

2010-08-11 Thread Luke Lu (JIRA)

[
https://issues.apache.org/jira/browse/MAPREDUCE-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897442#action_12897442
]

Luke Lu commented on MAPREDUCE-1881:

The jobtracker and tasktracker instrumentation was introduced in HADOOP-3772,
which contains more background info.

Improve TaskTrackerInstrumentation
--


[jira] Updated: (MAPREDUCE-1981) Improve getSplits performance by using listFiles, the new FileSystem API

2010-08-11 Thread Hairong Kuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang updated MAPREDUCE-1981:
-

Attachment: mapredListFiles2.patch

Now that HDFS-202 is in, mapredListFiles2.patch is the last piece of code that 
completes the improvement of getSplits performance.

Could a warm heart give it a review? Thanks.

 Improve getSplits performance by using listFiles, the new FileSystem API
 

 Key: MAPREDUCE-1981
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1981
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: job submission
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.22.0

 Attachments: mapredListFiles.patch, mapredListFiles1.patch, 
 mapredListFiles2.patch


 This jira will make FileInputFormat and CombineFileInputFormat use the new 
 API, thus reducing the number of RPCs to the HDFS NameNode.


[jira] Updated: (MAPREDUCE-1981) Improve getSplits performance by using listFiles, the new FileSystem API

2010-08-11 Thread Hairong Kuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang updated MAPREDUCE-1981:
-

Status: Patch Available  (was: Open)

 Improve getSplits performance by using listFiles, the new FileSystem API
 

 Key: MAPREDUCE-1981
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1981
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: job submission
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.22.0

 Attachments: mapredListFiles.patch, mapredListFiles1.patch, 
 mapredListFiles2.patch


 This jira will make FileInputFormat and CombineFileInputFormat use the new 
 API, thus reducing the number of RPCs to the HDFS NameNode.


[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-08-11 Thread Scott Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897492#action_12897492
 ] 

Scott Chen commented on MAPREDUCE-220:
--

Thanks, Arun. I will update the patch soon.

 Collecting cpu and memory usage for MapReduce tasks
 ---

 Key: MAPREDUCE-220
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: task, tasktracker
Reporter: Hong Tang
Assignee: Scott Chen
 Fix For: 0.22.0

 Attachments: MAPREDUCE-220-20100616.txt, MAPREDUCE-220-20100804.txt, 
 MAPREDUCE-220-20100806.txt, MAPREDUCE-220-20100809.txt, MAPREDUCE-220-v1.txt, 
 MAPREDUCE-220.txt


 It would be nice for TaskTracker to collect cpu and memory usage for 
 individual Map or Reduce tasks over time.


[jira] Updated: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-08-11 Thread Scott Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Chen updated MAPREDUCE-220:
-

Attachment: MAPREDUCE-220-20100811.txt

 Collecting cpu and memory usage for MapReduce tasks
 ---

 Key: MAPREDUCE-220
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: task, tasktracker
Reporter: Hong Tang
Assignee: Scott Chen
 Fix For: 0.22.0

 Attachments: MAPREDUCE-220-20100616.txt, MAPREDUCE-220-20100804.txt, 
 MAPREDUCE-220-20100806.txt, MAPREDUCE-220-20100809.txt, 
 MAPREDUCE-220-20100811.txt, MAPREDUCE-220-v1.txt, MAPREDUCE-220.txt


 It would be nice for TaskTracker to collect cpu and memory usage for 
 individual Map or Reduce tasks over time.


[jira] Updated: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-08-11 Thread Scott Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Chen updated MAPREDUCE-220:
-

Status: Open  (was: Patch Available)

 Collecting cpu and memory usage for MapReduce tasks
 ---

 Key: MAPREDUCE-220
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: task, tasktracker
Reporter: Hong Tang
Assignee: Scott Chen
 Fix For: 0.22.0

 Attachments: MAPREDUCE-220-20100616.txt, MAPREDUCE-220-20100804.txt, 
 MAPREDUCE-220-20100806.txt, MAPREDUCE-220-20100809.txt, 
 MAPREDUCE-220-20100811.txt, MAPREDUCE-220-v1.txt, MAPREDUCE-220.txt


 It would be nice for TaskTracker to collect cpu and memory usage for 
 individual Map or Reduce tasks over time.


[jira] Updated: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-08-11 Thread Scott Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Chen updated MAPREDUCE-220:
-

Status: Patch Available  (was: Open)

 Collecting cpu and memory usage for MapReduce tasks
 ---

 Key: MAPREDUCE-220
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: task, tasktracker
Reporter: Hong Tang
Assignee: Scott Chen
 Fix For: 0.22.0

 Attachments: MAPREDUCE-220-20100616.txt, MAPREDUCE-220-20100804.txt, 
 MAPREDUCE-220-20100806.txt, MAPREDUCE-220-20100809.txt, 
 MAPREDUCE-220-20100811.txt, MAPREDUCE-220-v1.txt, MAPREDUCE-220.txt


 It would be nice for TaskTracker to collect cpu and memory usage for 
 individual Map or Reduce tasks over time.


[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-08-11 Thread Scott Chen (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897501#action_12897501
 ] 

Scott Chen commented on MAPREDUCE-220:
--

Updated the patch to address Arun's comment.

 Collecting cpu and memory usage for MapReduce tasks
 ---

 Key: MAPREDUCE-220
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: task, tasktracker
Reporter: Hong Tang
Assignee: Scott Chen
 Fix For: 0.22.0

 Attachments: MAPREDUCE-220-20100616.txt, MAPREDUCE-220-20100804.txt, 
 MAPREDUCE-220-20100806.txt, MAPREDUCE-220-20100809.txt, 
 MAPREDUCE-220-20100811.txt, MAPREDUCE-220-v1.txt, MAPREDUCE-220.txt


 It would be nice for TaskTracker to collect cpu and memory usage for 
 individual Map or Reduce tasks over time.


[jira] Commented: (MAPREDUCE-220) Collecting cpu and memory usage for MapReduce tasks

2010-08-11 Thread Eli Collins (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897509#action_12897509
 ] 

Eli Collins commented on MAPREDUCE-220:
---

Does caching the process tree this way work with JVM reuse?

 Collecting cpu and memory usage for MapReduce tasks
 ---

 Key: MAPREDUCE-220
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-220
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: task, tasktracker
Reporter: Hong Tang
Assignee: Scott Chen
 Fix For: 0.22.0

 Attachments: MAPREDUCE-220-20100616.txt, MAPREDUCE-220-20100804.txt, 
 MAPREDUCE-220-20100806.txt, MAPREDUCE-220-20100809.txt, 
 MAPREDUCE-220-20100811.txt, MAPREDUCE-220-v1.txt, MAPREDUCE-220.txt


 It would be nice for TaskTracker to collect cpu and memory usage for 
 individual Map or Reduce tasks over time.


[jira] Commented: (MAPREDUCE-1496) org.apache.hadoop.mapred.lib.FieldSelectionMapReduce removes empty fields from key/value end

2010-08-11 Thread Krishna Ramachandran (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897515#action_12897515
 ] 

Krishna Ramachandran commented on MAPREDUCE-1496:
-

Can you provide more details?
For example, do your key fields have a space at the end?

Like

map.output.key.value.fields.spec, 6 ,5,1-3:0-
instead of 
map.output.key.value.fields.spec, 6,5,1-3:0-


 org.apache.hadoop.mapred.lib.FieldSelectionMapReduce removes empty fields 
 from key/value end
 

 Key: MAPREDUCE-1496
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1496
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.1
Reporter: Maxim Zizin
Priority: Critical

 If an input record's key and/or value has empty fields at the end, these 
 fields are cut off by org.apache.hadoop.mapred.lib.FieldSelectionMapReduce.
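
One plausible cause of this behaviour is the default String.split, which discards trailing empty strings. That the class splits records this way is an assumption here, not something stated in the report. The sketch below only demonstrates the JDK behaviour; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class TrailingFieldsExample {
    /** Default split: trailing empty fields are dropped. */
    static String[] splitDroppingTrailing(String record, String separator) {
        return record.split(Pattern.quote(separator));
    }

    /** A negative limit keeps trailing empty fields. */
    static String[] splitKeepingTrailing(String record, String separator) {
        return record.split(Pattern.quote(separator), -1);
    }

    public static void main(String[] args) {
        String record = "a\tb\t\t"; // two empty fields at the end
        System.out.println(splitDroppingTrailing(record, "\t").length); // 2
        System.out.println(splitKeepingTrailing(record, "\t").length);  // 4
    }
}
```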


[jira] Commented: (MAPREDUCE-2002) MRUnit driver classes should provide ability to set a configuration object to be passed into the mapper/reducer

2010-08-11 Thread Aaron Kimball (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897523#action_12897523
 ] 

Aaron Kimball commented on MAPREDUCE-2002:
--

Does this duplicate MAPREDUCE-1569?

 MRUnit driver classes should provide ability to set a configuration object to 
 be passed into the mapper/reducer
 ---

 Key: MAPREDUCE-2002
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2002
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/mrunit
Affects Versions: 0.20.2
Reporter: David Rosenstrauch
Priority: Minor

 Short description:
 Enhance the org.apache.hadoop.mrunit.mapreduce.MapDriver, ReduceDriver, and 
 MapReduceDriver unit test driver classes to contain setConfiguration and 
 withConfiguration methods for passing in user-supplied 
 org.apache.hadoop.conf.Configuration objects, and have those configuration 
 objects eventually get passed on to the Context objects that are passed in to 
 the mapper/reducer setup methods.  (Rather than passing in an empty 
 Configuration object, as is being done now.)
 Long description:
 The MRUnit driver classes (i.e., MapDriver, ReduceDriver, and 
 MapReduceDriver) ought to be enhanced to contain methods for setting a 
 Configuration object to be used by the mapper/reducer being tested - i.e., 
 setConfiguration() and withConfiguration().
 The only way to effectively pass parameters into a mapper or reducer is by 
 setting properties on a configuration object, which the mapper/reducer can 
 then retrieve in its setup step and use to customize its operation.  As 
 a result, specific mappers/reducers may require the presence of specific 
 configuration properties/parameters in order to function correctly (or at 
 all).  (I am currently coding such a reducer right now.)
 Testing such a mapper/reducer thus requires that the unit testing framework 
 used provide the ability to pass in user-supplied Configuration objects to 
 them so that they can be tested with appropriate parameter values.  However, 
 MRUnit currently does not provide this ability.  (All mappers/reducers are 
 always passed an empty configuration object.)  And there is not even 
 currently any (easy) way for the end-user to fix this problem by creating a 
 simple sub-class that supplies this functionality, as such subclasses would 
 require a substantial reimplementation/override of several MRUnit framework 
 classes.
 I believe this is something that is not too difficult to fix in the MRUnit 
 framework code, however, and would greatly help the usability of MRUnit.
 Although I don't have time to code this enhancement right now, if 
 needed/preferred I could squeeze out some time to code up a patch for this.  
 If that's needed, please let me know.
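
The shape of the requested API could look roughly like the sketch below. It is not MRUnit code: ConfigurableDriverSketch is a hypothetical stand-in for the driver classes, and a plain Map stands in for org.apache.hadoop.conf.Configuration so the example has no Hadoop dependency.

```java
import java.util.HashMap;
import java.util.Map;

class ConfigurableDriverSketch {
    // Stand-in for the Configuration handed to the mapper/reducer setup step.
    private Map<String, String> configuration = new HashMap<>();

    /** Setter form: replaces the configuration used by the driver. */
    void setConfiguration(Map<String, String> conf) {
        this.configuration = new HashMap<>(conf);
    }

    /** Fluent form: same as setConfiguration, but returns the driver for chaining. */
    ConfigurableDriverSketch withConfiguration(Map<String, String> conf) {
        setConfiguration(conf);
        return this;
    }

    /** What the code under test would read during setup. */
    String getProperty(String key, String defaultValue) {
        return configuration.getOrDefault(key, defaultValue);
    }
}
```

A test would then chain the call, for example new ConfigurableDriverSketch().withConfiguration(conf), before supplying inputs, so that the mapper or reducer sees the user-supplied properties instead of an empty configuration.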


[jira] Commented: (MAPREDUCE-1118) Capacity Scheduler scheduling information is hard to read / should be tabular format

2010-08-11 Thread Dick King (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897532#action_12897532
 ] 

Dick King commented on MAPREDUCE-1118:
--

This comment is a review.

First, let me say that I didn't review {{sorttable.js}} .  It would be bad to 
have subtly different versions of this code flying around.

{{CapacitySchedulerServlet.java}} 

near end of {{doGet()}} :

*This is serious*:  {{ByteArrayOutputStream.writeTo(OutputStream)}} can throw 
an {{IOException}}.  Please revise this call to something like

{noformat}
  OutputStream servletOut = null;
  try {
    servletOut = response.getOutputStream();
    baos.writeTo(servletOut);
  } finally {
    if (servletOut != null) {
      servletOut.close();
    }
  }
{noformat}

.

*This is semi-serious*:  In {{showQueues}} , where queues are printed, the code

{noformat}
  out.printf(
    "<td><a href=\"jobqueue_details.jsp?queueName=%s\">%s</a></td>\n",
    name, name);
{noformat}

deposits the name right in the middle of hard-core HTML.  If the queue 
names contain obnoxious characters such as a quote or an angle bracket we could 
have a bad day.  These characters should be escaped with HTML escape sequences 
such as {{&lt;}} , etc.

Don't forget to escape the ampersands :-) .  I believe that only quote marks 
and angle brackets need to be escaped in the URL, but everything needs to be 
escaped in the rendered text.
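
A minimal sketch of the escaping suggested above, using only the JDK. HtmlEscapeSketch, escapeHtml and queueCell are hypothetical names, and for simplicity the same escaping is applied to the name in the URL and in the rendered text.

```java
class HtmlEscapeSketch {
    /** Replaces the characters that are significant in HTML with entities. */
    static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds the table cell for one queue with the name escaped in both places. */
    static String queueCell(String name) {
        String safe = escapeHtml(name);
        return String.format(
            "<td><a href=\"jobqueue_details.jsp?queueName=%s\">%s</a></td>\n",
            safe, safe);
    }
}
```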

*This is a nit*:  In

{noformat}
  out.printf("<td>%s</td>\n", queuesManager.getJobQueue(name)
      .getRunningJobs().size());
  out.printf("<td>%s</td>\n", qsc.getNumOfWaitingJobs());
{noformat}

I can't condone dropping numeric data onto a {{%s}} .  I realize that it works 
but it looks ugly to my eye.

*This is potentially serious*: I don't see where {{showQueues}} does the needed 
locking.  You allude to this by defensively dumping into a 
{{ByteArrayOutputStream}} , but the code doesn't lock anything.  I can see why 
it should.  Can queues disappear or appear?

*This is a potential omission*: The block comment before the {{class}} 
declaration claims to implement an advanced mode, but I don't see any footprint 
of such a thing in the code.  In any event, I'm not a big fan of magic URLs.  
The servlet should include a button to bring itself into advanced mode.  If 
there are users that shouldn't be able to go into advanced mode, this should be 
handled in some other manner than hidden URLs.

I don't see the code to get into the scheduler manager servlet.  Perhaps there 
should be a button in the job tracker administration page when the capacity 
scheduler is in use?

{{TaskSchedulingMgr}}

{{infoServer.setAttribute("scheduler", this);}}

*This is a nit*: I would prefer 
{{infoServer.setAttribute("scheduler.scheduler", this);}} .  All of the 
servlets share an attribute namespace.  However, this one isn't bad as such 
things go, since it's hard to imagine another servlet code author putting 
anything except the ambient scheduler into that attribute.

{{TestCapacitySchedulerServlet}}

This is a minor nit.

I can't condone {{assertTrue(queueData.contains("50.0%"));}} .  That's the 
moral equivalent of floating point equality.  I do realize that 1/2 can be 
represented exactly in most float systems, but you might want to do something 
else, even if only allowing the value to be {{49.9}} which is okay because the 
servlet does print it out as a {{%.1f}} .
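
One way to make such an assertion tolerant is to parse the printed percentage and compare it numerically. This is a sketch under assumptions: PercentAssertSketch and containsPercentNear are hypothetical helpers, not part of the test under review.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PercentAssertSketch {
    // Matches values such as "50.0%" or "7%".
    private static final Pattern PERCENT = Pattern.compile("(\\d+(?:\\.\\d+)?)%");

    /** True if any percentage in the text is within the tolerance of the expected value. */
    static boolean containsPercentNear(String text, double expected, double tolerance) {
        Matcher m = PERCENT.matcher(text);
        while (m.find()) {
            double value = Double.parseDouble(m.group(1));
            if (Math.abs(value - expected) <= tolerance) {
                return true;
            }
        }
        return false;
    }
}
```

The test could then assert containsPercentNear(queueData, 50.0, 0.11), which accepts both 50.0 and 49.9 at the {{%.1f}} precision the servlet prints.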

 Capacity Scheduler scheduling information is hard to read / should be tabular 
 format
 

 Key: MAPREDUCE-1118
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1118
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.2
Reporter: Allen Wittenauer
Assignee: Krishna Ramachandran
 Attachments: mapred-1118-1.patch, mapred-1118-2.patch, 
 mapred-1118.20S.patch, mapred-1118.patch


 The scheduling information provided by the capacity scheduler is extremely 
 hard to read on the job tracker web page.  Instead of just flat text, it 
 should be presenting the information in a tabular format, similar to what the 
 fair share scheduler provides.  This makes it much easier to compare what 
 different queues are doing.


[jira] Commented: (MAPREDUCE-1598) Wrongly configured 'hadoop.job.history.user.location' can cause jobs to be pinned in JobTracker's memory forever

2010-08-11 Thread Dick King (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897535#action_12897535
 ] 

Dick King commented on MAPREDUCE-1598:
--

This comment is a code review.

*This is a minor nit* :  

{noformat}

 throw new IOException("Mkdirs failed to create " + done.toString());
   }
-}
+  } else { // directory exists. Check permissions
+checkDirectoryPermissions(doneDirFs, done,
+"mapreduce.jobtracker.jobhistory.completed.location");
+  }

{noformat}

The last {{checkDirectoryPermissions(...)}} call will cruddy up the 
indentation.  The patch otherwise looks right.

 Wrongly configured 'hadoop.job.history.user.location' can cause jobs to be 
 pinned in JobTracker's memory forever
 

 Key: MAPREDUCE-1598
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1598
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 0.20.1
Reporter: Amar Kamat
 Fix For: 0.20.3

 Attachments: mapred-1598


 A wrongly configured 'hadoop.job.history.user.location' can disable 
 job-history. Jobs retire when JobHistory notifies the JobTracker after 
 moving the history file to the done folder (i.e., 
 mapreduce.jobtracker.jobhistory.completed.location). If JobHistory gets 
 disabled, the JobTracker does not receive any notification and thus jobs are 
 pinned in the JobTracker's memory forever.


[jira] Resolved: (MAPREDUCE-166) Remove distcp from hadoop core libraries, and publish documentation

2010-08-11 Thread Krishna Ramachandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krishna Ramachandran resolved MAPREDUCE-166.


Resolution: Won't Fix

Based on Owen's comment, I am closing this.


 Remove distcp from hadoop core libraries, and publish documentation
 ---

 Key: MAPREDUCE-166
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-166
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Marco Nicosia
Priority: Critical

 Every time we want to ship a change in distcp, not only do we have to replace 
 the entire version of map-reduce deployed to the clusters, we also have to 
 update internal documentation to reflect those changes.


[jira] Commented: (MAPREDUCE-2002) MRUnit driver classes should provide ability to set a configuration object to be passed into the mapper/reducer

2010-08-11 Thread David Rosenstrauch (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897547#action_12897547
 ] 

David Rosenstrauch commented on MAPREDUCE-2002:
---

Yes, it does look like a dupe.  (I did do a search through Jira before filing 
this bug, but that bug didn't turn up in my search for some reason.)

 MRUnit driver classes should provide ability to set a configuration object to 
 be passed into the mapper/reducer
 ---

 Key: MAPREDUCE-2002
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2002
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/mrunit
Affects Versions: 0.20.2
Reporter: David Rosenstrauch
Priority: Minor

 Short description:
 Enhance the org.apache.hadoop.mrunit.mapreduce.MapDriver, ReduceDriver, and 
 MapReduceDriver unit test driver classes to contain setConfiguration and 
 withConfiguration methods for passing in user-supplied 
 org.apache.hadoop.conf.Configuration objects, and have those configuration 
 objects eventually get passed on to the Context objects that are passed in to 
 the mapper/reducer setup methods.  (Rather than passing in an empty 
 Configuration object, as is being done now.)
 Long description:
 The MRUnit driver classes (i.e., MapDriver, ReduceDriver, and 
 MapReduceDriver) ought to be enhanced to contain methods for setting a 
 Configuration object to be used by the mapper/reducer being tested - i.e., 
 setConfiguration() and withConfiguration().
 The only way to effectively pass parameters into a mapper or reducer is by 
 setting properties on a configuration object, which the mapper/reducer can 
 then retrieve in its setup step and use to customize its operation.  As 
 a result, specific mappers/reducers may require the presence of specific 
 configuration properties/parameters in order to function correctly (or at 
 all).  (I am currently coding such a reducer right now.)
 Testing such a mapper/reducer thus requires that the unit testing framework 
 used provide the ability to pass in user-supplied Configuration objects to 
 them so that they can be tested with appropriate parameter values.  However, 
 MRUnit currently does not provide this ability.  (All mappers/reducers are 
 always passed an empty configuration object.)  And there is not even 
 currently any (easy) way for the end-user to fix this problem by creating a 
 simple sub-class that supplies this functionality, as such subclasses would 
 require a substantial reimplementation/override of several MRUnit framework 
 classes.
 I believe this is something that is not too difficult to fix in the MRUnit 
 framework code, however, and would greatly help the usability of MRUnit.
 Although I don't have time to code this enhancement right now, if 
 needed/preferred I could squeeze out some time to code up a patch for this.  
 If that's needed, please let me know.


[jira] Commented: (MAPREDUCE-223) JobClient should work with -1/+1 version of JobTracker

2010-08-11 Thread Alejandro Abdelnur (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897558#action_12897558
 ] 

Alejandro Abdelnur commented on MAPREDUCE-223:
--

Wasn't the idea that Avro would help fix this?

Yes, doing things over HTTP (assuming you take care not to break things at the 
payload level) works. 

Still, Hadoop does not support HTTP natively for client-side calls, so this is 
not an option without add-on protocol adapter systems fronting the JT and 
NN/DNs. In other words, a JT proxy and an HDFS proxy. 

FYI, Oozie is planning to provide JT proxy capabilities.


 JobClient should work with -1/+1 version of JobTracker
 --

 Key: MAPREDUCE-223
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-223
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
 Environment: all
Reporter: Alejandro Abdelnur
Priority: Critical

 Currently there is a version check on the RPC calls that enforces the same 
 Hadoop version on the client and the server.
 To enable phased upgrades of systems using Hadoop and of Hadoop itself, the 
 {{JobClient}} should be able to interact with a {{JobTracker}} of the 
 previous and the next version of Hadoop (or with a range).


[jira] Commented: (MAPREDUCE-1980) TaskAttemptUnsuccessfulCompletionEvent.java incorrectly logs MAP_ATTEMPT_KILLED as event type for reduce tasks

2010-08-11 Thread Amareshwari Sriramadasu (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12897567#action_12897567
 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-1980:


The same problem is also present in TaskAttemptFinishedEvent. Setup and cleanup 
tasks are always logged as MAP_ATTEMPT_FINISHED. Can you fix that as well?

 TaskAttemptUnsuccessfulCompletionEvent.java incorrectly logs 
 MAP_ATTEMPT_KILLED as event type for reduce tasks
 --

 Key: MAPREDUCE-1980
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1980
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Amar Kamat
Assignee: Amar Kamat
 Attachments: mapreduce-1980-v1.0.patch


 TaskAttemptUnsuccessfulCompletionEvent is used to log unsuccessful map and 
 reduce task attempts to JobHistory. The following is the implementation of 
 the getEventType() method of TaskAttemptUnsuccessfulCompletionEvent:
   /** Get the event type */
   public EventType getEventType() {
     return EventType.MAP_ATTEMPT_KILLED;
   }
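
 One possible shape for a fix, as a sketch only: derive the event type from 
 the task type and from whether the attempt was killed or failed. The types 
 below are simplified, hypothetical stand-ins, not the real Hadoop classes, 
 and this is not the content of the attached patch. How setup and cleanup 
 tasks should be classified is left open here.

```java
// Hypothetical, simplified stand-ins for the Hadoop types.
enum TaskType { MAP, REDUCE }

enum EventType {
  MAP_ATTEMPT_FAILED, MAP_ATTEMPT_KILLED,
  REDUCE_ATTEMPT_FAILED, REDUCE_ATTEMPT_KILLED
}

// Sketch of an unsuccessful-attempt event that reports its own type
// instead of always returning MAP_ATTEMPT_KILLED.
class UnsuccessfulAttemptSketch {
  private final TaskType taskType;
  private final boolean killed; // true = killed, false = failed

  UnsuccessfulAttemptSketch(TaskType taskType, boolean killed) {
    this.taskType = taskType;
    this.killed = killed;
  }

  /** Get the event type, taking the task type and outcome into account. */
  EventType getEventType() {
    if (taskType == TaskType.MAP) {
      return killed ? EventType.MAP_ATTEMPT_KILLED
                    : EventType.MAP_ATTEMPT_FAILED;
    }
    return killed ? EventType.REDUCE_ATTEMPT_KILLED
                  : EventType.REDUCE_ATTEMPT_FAILED;
  }
}
```

 In this sketch a killed reduce attempt is reported as REDUCE_ATTEMPT_KILLED 
 rather than MAP_ATTEMPT_KILLED.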

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (MAPREDUCE-1920) Job.getCounters() returns null when using a cluster

2010-08-11 Thread Amareshwari Sriramadasu (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-1920:
---

   Status: Resolved  (was: Patch Available)
 Hadoop Flags: [Reviewed]
Fix Version/s: 0.21.0
   Resolution: Fixed

I just committed this to trunk and branch 0.21.

Thanks Tom!

 Job.getCounters() returns null when using a cluster
 ---

 Key: MAPREDUCE-1920
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1920
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.21.0
Reporter: Aaron Kimball
Assignee: Tom White
Priority: Critical
 Fix For: 0.21.0

 Attachments: MAPREDUCE-1920.patch, MAPREDUCE-1920.patch, 
 MAPREDUCE-1920.patch, MAPREDUCE-1920.patch


 Calling Job.getCounters() after the job has completed (successfully) returns 
 null.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (MAPREDUCE-1856) Extract a subset of tests for smoke (DOA) validation

2010-08-11 Thread Konstantin Boudnik (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated MAPREDUCE-1856:
--

Status: Open  (was: Patch Available)

 Extract a subset of tests for smoke (DOA) validation
 

 Key: MAPREDUCE-1856
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1856
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: build
Affects Versions: 0.21.0
Reporter: Konstantin Boudnik
Assignee: Konstantin Boudnik
 Attachments: MAPREDUCE-1856.patch, MAPREDUCE-1856.patch, 
 MAPREDUCE-1856.patch, MAPREDUCE-1856.patch, MAPREDUCE-1856.patch, 
 MAPREDUCE-1856.patch


 Similar to HDFS-1199, but for MapReduce.
 Adds the ability to run up to 30 minutes of tests to 'smoke' the MapReduce 
 build (i.e. find possible issues faster than the full test cycle does).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (MAPREDUCE-1856) Extract a subset of tests for smoke (DOA) validation

2010-08-11 Thread Konstantin Boudnik (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated MAPREDUCE-1856:
--

Status: Patch Available  (was: Open)

The patch has not been picked up in 6 days. Resubmitting.

 Extract a subset of tests for smoke (DOA) validation
 

 Key: MAPREDUCE-1856
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1856
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: build
Affects Versions: 0.21.0
Reporter: Konstantin Boudnik
Assignee: Konstantin Boudnik
 Attachments: MAPREDUCE-1856.patch, MAPREDUCE-1856.patch, 
 MAPREDUCE-1856.patch, MAPREDUCE-1856.patch, MAPREDUCE-1856.patch, 
 MAPREDUCE-1856.patch


 Similar to HDFS-1199, but for MapReduce.
 Adds the ability to run up to 30 minutes of tests to 'smoke' the MapReduce 
 build (i.e. find possible issues faster than the full test cycle does).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
