[jira] Updated: (MAPREDUCE-1018) Document changes to the memory management and scheduling model

2009-11-15 Thread rahul k singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rahul k singh updated MAPREDUCE-1018:
-

Attachment: MAPRED-1018-2.patch

 Document changes to the memory management and scheduling model
 --

 Key: MAPREDUCE-1018
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1018
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.21.0
Reporter: Hemanth Yamijala
Priority: Blocker
 Fix For: 0.21.0

 Attachments: MAPRED-1018-1.patch, MAPRED-1018-2.patch, 
 MAPRED-1018-commons.patch


 Changes were made to the configuration, monitoring and scheduling of 
 high-RAM jobs. These must be documented in mapred-default.xml and also in 
 the Forrest documentation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1215) Counter deprecation warnings in jobtracker log are excessive

2009-11-15 Thread Chris Douglas (JIRA)
Counter deprecation warnings in jobtracker log are excessive


 Key: MAPREDUCE-1215
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1215
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.21.0
Reporter: Chris Douglas


In a recent test, the log message
{noformat}
WARN org.apache.hadoop.mapred.Counters: Group 
org.apache.hadoop.mapred.Task$Counter is deprecated. \
Use org.apache.hadoop.mapreduce.TaskCounter instead
{noformat}
was nearly a third of a 1.3GB jobtracker log.
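Anticipating the fix, the usual remedy for this kind of flood is to emit each distinct deprecation warning only once rather than once per lookup. A minimal sketch in Python (illustrative only; not the actual org.apache.hadoop.mapred.Counters code):

```python
# Emit each distinct deprecation warning only once, instead of on every
# counter lookup. Illustrative sketch, not the real Counters implementation.

_warned_groups = set()
messages = []  # stand-in for the jobtracker log


def warn_deprecated_group(old_group, new_group):
    """Log the deprecation warning only the first time a group is seen."""
    if old_group in _warned_groups:
        return
    _warned_groups.add(old_group)
    messages.append(
        f"WARN Counters: Group {old_group} is deprecated. "
        f"Use {new_group} instead"
    )


# Simulate many counter lookups against the same deprecated group:
# the log gets one line instead of thousands.
for _ in range(10000):
    warn_deprecated_group("org.apache.hadoop.mapred.Task$Counter",
                          "org.apache.hadoop.mapreduce.TaskCounter")
```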




[jira] Commented: (MAPREDUCE-1018) Document changes to the memory management and scheduling model

2009-11-15 Thread Vinod K V (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778231#action_12778231
 ] 

Vinod K V commented on MAPREDUCE-1018:
--

Already, many other MapReduce issues have modified only cluster_setup.xml, the 
one in the MapReduce project. Rahul mentioned offline that Forrest 
documentation is not being generated in the MapReduce sub-project. Assuming 
we'll address that in a separate issue, I propose we have only one patch - the 
mapred one.

 - The mapred patch has git prefixes, which need to be removed.

 - Monitoring/scheduling based on RAM is completely removed, so remove the 
references to it too. Just add a note saying that (quoting from HADOOP-5881) 
there isn't any need for distinguishing vmem from physical memory w.r.t. 
configuration. Depending on a site's requirements, the configuration items can 
reflect whether one wants tasks to go beyond physical memory or not.

cluster_setup.html
 - All config names should be renamed to the new names. Of course this means a 
slightly different patch for 0.20, which we will come to after the patch for 
trunk is done.
 - mapred.{map|reduce}.child.ulimit also needs to be renamed.
 - What happens when monitoring is enabled, but the job has -1?
 - Memory monitoring is no longer defined in terms of a per-task limit and a 
per-node limit. It is now driven by per-slot size and number of slots. We 
should use these new terms throughout.
 - The line "Before getting into details, consider the following additional 
memory-related parameters that can be configured to enable better scheduling:" 
is no longer needed.
 
capacity_scheduler.html
 - The feature for monitoring RAM is gone. Remove all references to it.
 - Working of scheduling:
   -- Point 1: four parameters, not three. The parameters are described in 
cluster_setup. vmem.reserved is no longer used.
   -- Point 2: This has changed completely. There are no more offsets:
   Total = numSlots * PerSlotMemSize
   Used = Sigma(numSlotsPerTask * PerSlotMemSize)
   -- Point 3: The JT now rejects such jobs, not the scheduler.
 - "See the MapReduce Tutorial for details on how the TT monitors memory 
usage." Should this point to cluster_setup instead?

 - Need to update mapred_tutorial.html's memory management section. Also need a 
reference to this in both cluster_setup.html as well as capacity_scheduler.html.

 - Another point I've already mentioned on the JIRA: along with everything 
else, we should document that the job-setup and job-cleanup tasks of all jobs, 
whether or not their maps and reduces require high memory, still run on a 
single slot.
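The slot-based accounting in Point 2 above can be made concrete with a short sketch. The names below mirror the formulas in the comment and are hypothetical; the real scheduler code is Java and differs:

```python
# Slot-based memory accounting on a tasktracker, per Point 2 above.
# Hypothetical sketch: names mirror the formulas, not the real scheduler code.

def total_memory(num_slots, per_slot_mem_size):
    # Total = numSlots * PerSlotMemSize
    return num_slots * per_slot_mem_size


def used_memory(slots_per_running_task, per_slot_mem_size):
    # Used = Sigma(numSlotsPerTask * PerSlotMemSize)
    return sum(s * per_slot_mem_size for s in slots_per_running_task)


def task_fits(num_slots, per_slot_mem_size, slots_per_running_task,
              slots_needed):
    """Can a task needing `slots_needed` slots be scheduled on this node?"""
    free = (total_memory(num_slots, per_slot_mem_size)
            - used_memory(slots_per_running_task, per_slot_mem_size))
    return slots_needed * per_slot_mem_size <= free


# Example: 4 map slots of 2 GB each; one normal task (1 slot) and one
# high-RAM task (2 slots) are running, so 1 slot (2 GB) remains free.
```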




[jira] Commented: (MAPREDUCE-1143) runningMapTasks counter is not properly decremented in case of failed Tasks.

2009-11-15 Thread Hemanth Yamijala (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778233#action_12778233
 ] 

Hemanth Yamijala commented on MAPREDUCE-1143:
-

I spoke to Amarsri and Rahul about my comments and found out some explanations:

bq. For instance, even after this patch, I see that the number of running tasks 
is decremented under different checks when a task completes and when a task 
fails. I assume this is for good reason, but still it is difficult to review.

So, the different checks are as follows:

{code}
completedTask() {
  if (this tip is complete) {
return;
  }
  update counters
}

failedTask() {
  if (any attempt was running for this tip before status update) {
update counters
  }
}
{code}

It appears completedTask doesn't need the check for the TIP being complete at 
all, as that can never happen. A TIP is marked complete only if at least one 
attempt has completed, and it remains so. If another attempt comes in reporting 
success now, we fail it in the status update and do not follow the 
completedTask code path at all. So, for all practical purposes, counters are 
updated unconditionally in completedTask. Further, in the same code path, the 
task is removed from the active tasks as well. Hence no further check is 
necessary.

The check in failedTask is required, though, because a task can fail *after* it 
has been marked as succeeded, e.g. if there are fetch failures for a map, or if 
a tracker is lost. In this case, we should not update counters again, because 
they would already have been updated when the task succeeded.

However, in this context, I am a little worried that we are checking for *any* 
attempt being running before the status update, rather than this specific 
attempt. At least in theory, this can result in some inconsistency.

Consider this sequence of events:
- A task is scheduled
- It is speculated
- It completes - Counters are decremented here.
- It fails (lost TT, fetch failures) - The current patch will decrement 
counters here again.
- The speculated attempt succeeds.

In practice, though, this scenario may not be very likely. Apparently fetch 
failures and lost TTs are the only extreme cases where this is possible, and 
there can be a considerable time lag between a task completing and it having to 
be failed. In most cases the lag will be large enough that the speculative 
attempt is killed as well.

With this background, is it worth changing the current patch to:

{code}
failedTask() {
  if (this task was running before status update) {
update counters
  }
}
{code}

This seems more correct to me, but I was wondering if it is worth the change. 
Thoughts?
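The speculation scenario above can be simulated directly. Below is a sketch (plain Python, not the actual JobInProgress code; all names are made up) contrasting the "any attempt running" check in the current patch with the proposed "this attempt running" check:

```python
# Simulate the double-decrement scenario for the runningMapTasks counter.
# Hypothetical model, not the actual JobInProgress code.

def run_scenario(check_this_attempt):
    running = 2                    # attempt 0 and its speculative attempt 1
    attempt_running = {0: True, 1: True}

    def completed(att):
        nonlocal running
        attempt_running[att] = False
        running -= 1               # counters updated unconditionally

    def failed(att):
        nonlocal running
        if check_this_attempt:
            # proposed: was *this* attempt running before the status update?
            decrement = attempt_running[att]
        else:
            # current patch: was *any* attempt of the TIP running?
            decrement = any(attempt_running.values())
        attempt_running[att] = False
        if decrement:
            running -= 1

    completed(0)   # the task completes: counter 2 -> 1
    failed(0)      # it later fails (lost TT / fetch failures)
    completed(1)   # the speculative attempt succeeds
    return running # should end at 0; the "any attempt" check gives -1
```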

 runningMapTasks counter is not properly decremented in case of failed Tasks.
 

 Key: MAPREDUCE-1143
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1143
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: rahul k singh
Priority: Blocker
 Attachments: MAPRED-1143-1.patch, MAPRED-1143-2.patch, 
 MAPRED-1143-2.patch, MAPRED-1143-3.patch, MAPRED-1143-4.patch, 
 MAPRED-1143-ydist-1.patch, MAPRED-1143-ydist-2.patch, 
 MAPRED-1143-ydist-3.patch, MAPRED-1143-ydist-4.patch, 
 MAPRED-1143-ydist-5.patch, MAPRED-1143-ydist-6.patch







[jira] Updated: (MAPREDUCE-896) Users can set non-writable permissions on temporary files for TT and can abuse disk usage.

2009-11-15 Thread Ravi Gummadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Gummadi updated MAPREDUCE-896:
---

Attachment: MR-896.patch

Attaching patch with the fix. Please review and provide your comments.

Patch does the following:

(1) When deleting $jobId/$attemptId/work or $jobId/$attemptId, the TT uses the 
task-controller to enable the path for deletion (by changing permissions). 
(a) LinuxTaskController sets 770 as the permissions for all files/directories 
within this dir recursively, and then the TT deletes the dir. (b) 
DefaultTaskController sets rwx for the user (same as the TT) on this dir 
recursively, and then the TT deletes the dir.

(2) Deletion of $jobId is done as before, because the user can't create any 
files in this dir.

(3) Deletion of the work dir in TaskRunner (useful with JVM reuse) is also done 
by changing the permissions first (no task-controller is needed in this case).

(4) With JVM reuse, after the final task of a JVM is finished, workDir is 
deleted by the task-controller using the procedure mentioned in (1) above.

(5) Modified TestTaskTrackerLocalization and 
TestLocalizationWithLinuxTaskController to create directories/files in 
$attemptId with non-writable permissions, which tests whether (a) 
DefaultTaskController and (b) LinuxTaskController can remove these 
new files from $attemptId.

 Users can set non-writable permissions on temporary files for TT and can 
 abuse disk usage.
 --

 Key: MAPREDUCE-896
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-896
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tasktracker
Affects Versions: 0.21.0
Reporter: Vinod K V
Assignee: Ravi Gummadi
 Fix For: 0.21.0

 Attachments: MR-896.patch


 As of now, irrespective of the TaskController in use, the TT itself does a 
 full delete on local files created by itself or by job tasks. Depending on 
 the TT's umask and the permissions the user sets on files, e.g. in 
 job-work/task-work or child.tmp directories, this step may not complete 
 successfully. This leaves an opportunity for disk usage to be abused, either 
 accidentally or intentionally, by the TT or users.




[jira] Updated: (MAPREDUCE-1140) Per cache-file refcount can become negative when tasks release distributed-cache files

2009-11-15 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-1140:
---

Status: Open  (was: Patch Available)

 Per cache-file refcount can become negative when tasks release 
 distributed-cache files
 --

 Key: MAPREDUCE-1140
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1140
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tasktracker
Affects Versions: 0.20.2, 0.21.0, 0.22.0
Reporter: Vinod K V
Assignee: Amareshwari Sriramadasu
 Attachments: patch-1140-1.txt, patch-1140-2.txt, 
 patch-1140-ydist.txt, patch-1140.txt







[jira] Commented: (MAPREDUCE-1147) Map output records counter missing for map-only jobs in new API

2009-11-15 Thread Amar Kamat (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778235#action_12778235
 ] 

Amar Kamat commented on MAPREDUCE-1147:
---

For branch 20, all tests except the ones listed below have passed:
# hdfs.TestDatanodeBlockScanner FAILED (timeout)
# hdfs.TestDistributedFileSystem FAILED
# hdfs.server.namenode.TestFsck FAILED (timeout)
# mapred.TestReduceFetch FAILED

None of these failures seems related to this issue.

 Map output records counter missing for map-only jobs in new API
 ---

 Key: MAPREDUCE-1147
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1147
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.1, 0.21.0
Reporter: Chris Douglas
Assignee: Amar Kamat
Priority: Blocker
 Fix For: 0.20.2

 Attachments: mapred-1147-v1.3.patch, mapred-1147-v1.4-y20.patch, 
 mapred-1147-v1.4.patch


 In the new API, the counter for map output records is not incremented for 
 map-only jobs




[jira] Updated: (MAPREDUCE-1140) Per cache-file refcount can become negative when tasks release distributed-cache files

2009-11-15 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-1140:
---

Attachment: patch-1140-2.txt

Patch with review comments incorporated.




[jira] Updated: (MAPREDUCE-1140) Per cache-file refcount can become negative when tasks release distributed-cache files

2009-11-15 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-1140:
---

Status: Patch Available  (was: Open)




[jira] Commented: (MAPREDUCE-1185) URL to JT webconsole for running job and job history should be the same

2009-11-15 Thread Jothi Padmanabhan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778242#action_12778242
 ] 

Jothi Padmanabhan commented on MAPREDUCE-1185:
--

In the test case, does it also make sense to add one more check: do not set 
{{conn.setInstanceFollowRedirects(false)}}, and then ensure that 
{{conn.connect()}} is successful? If not the contents, we would at least 
verify that the redirection works as expected, no?

 URL to JT webconsole for running job and job history should be the same
 ---

 Key: MAPREDUCE-1185
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1185
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobtracker
Reporter: Sharad Agarwal
Assignee: Sharad Agarwal
 Attachments: 1185_v1.patch, 1185_v2.patch, 1185_v3.patch


 The tracking URL for a running job and for a retired job is different. This 
 creates problems for clients that cache the running job's URL, because it 
 becomes invalid as soon as the job is retired.
