[jira] Updated: (MAPREDUCE-1018) Document changes to the memory management and scheduling model
[ https://issues.apache.org/jira/browse/MAPREDUCE-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

rahul k singh updated MAPREDUCE-1018:

    Attachment: MAPRED-1018-2.patch

Document changes to the memory management and scheduling model

    Key: MAPREDUCE-1018
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-1018
    Project: Hadoop Map/Reduce
    Issue Type: Bug
    Components: documentation
    Affects Versions: 0.21.0
    Reporter: Hemanth Yamijala
    Priority: Blocker
    Fix For: 0.21.0
    Attachments: MAPRED-1018-1.patch, MAPRED-1018-2.patch, MAPRED-1018-commons.patch

Changes were made to the configuration, monitoring and scheduling of high-RAM jobs. These must be documented in mapred-defaults.xml and also in the forrest documentation.

--
This message is automatically generated by JIRA. You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1215) Counter deprecation warnings in jobtracker log are excessive
Counter deprecation warnings in jobtracker log are excessive

    Key: MAPREDUCE-1215
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-1215
    Project: Hadoop Map/Reduce
    Issue Type: Bug
    Affects Versions: 0.21.0
    Reporter: Chris Douglas

In a recent test, the log message

{noformat}
WARN org.apache.hadoop.mapred.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. \
Use org.apache.hadoop.mapreduce.TaskCounter instead
{noformat}

accounted for nearly a third of a 1.3GB jobtracker log.
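One way to address a flood like this is to gate the warning so it is emitted at most once per deprecated group. A minimal sketch of that guard, with invented class and method names (this is not Hadoop's actual Counters code):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical "warn once per group" guard: the caller logs the deprecation
// warning only when shouldWarn() returns true, i.e. the first time a given
// counter group name is seen.
class DeprecationWarner {
    private static final Set<String> warned = ConcurrentHashMap.newKeySet();

    // Thread-safe: Set.add returns true only for the first caller
    // to register a given group name.
    static boolean shouldWarn(String group) {
        return warned.add(group);
    }
}
```

With this in place, the message above would appear once per deprecated group per JVM instead of once per lookup.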
[jira] Commented: (MAPREDUCE-1018) Document changes to the memory management and scheduling model
[ https://issues.apache.org/jira/browse/MAPREDUCE-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778231#action_12778231 ]

Vinod K V commented on MAPREDUCE-1018:

Already many other mapreduce issues have modified only cluster-setup.xml, the one in the mapreduce project. Rahul mentioned offline that forrest documentation is not getting generated in the mapreduce sub-project. Assuming we'll address that in a separate issue, I propose we have only one patch - the mapred one.

- The mapred patch has git prefixes which need to be removed.
- Monitoring/scheduling based on RAM is completely removed, so remove the references too. Just add a note saying that (quoting from HADOOP-5881) there isn't any need for distinguishing vmem from physical memory w.r.t. configuration. Depending on a site's requirements, the configuration items can reflect whether one wants tasks to go beyond physical memory or not.

cluster_setup.html
- All config names should be renamed to the new names. Of course this means a slightly different patch for 0.20, which we will come to after the patch for trunk is done.
- mapred.{map|reduce}.child.ulimit also needs to be renamed.
- What happens when monitoring is enabled, but the job has -1?
- Memory monitoring is no longer defined in terms of a per-task limit and a per-node limit. It is now driven by per-slot size and number of slots. We should use these new terms throughout.
- "Before getting into details, consider the following additional memory-related parameters that can be configured to enable better scheduling:" - this line is no longer needed.

capacity_scheduler.html
- The feature for monitoring RAM is gone. Remove all references.
- Working of scheduling:
-- Point 1: 4 parameters, not three. Parameters are described in cluster_setup. vmem.reserved is no longer used.
-- Point 2: This is changed completely. No more offsets.
   Total = numSlots * PerSlotMemSize
   Used = Sigma(numSlotsPerTask * PerSlotMemSize)
-- Point 3: The JT now rejects the jobs, not the scheduler.
- "See the MapReduce Tutorial for details on how the TT monitors memory usage." See cluster_setup instead?
- Need to update mapred_tutorial.html's memory management section. Also need a reference to this in both cluster_setup.html and capacity_scheduler.html.
- Another point I've already mentioned on the JIRA: along with everything else, we should document that job-setup and job-cleanup tasks of all jobs, whether or not their maps and reduces require high memory, still run on a single slot.

Document changes to the memory management and scheduling model

    Key: MAPREDUCE-1018
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-1018
    Project: Hadoop Map/Reduce
    Issue Type: Bug
    Components: documentation
    Affects Versions: 0.21.0
    Reporter: Hemanth Yamijala
    Priority: Blocker
    Fix For: 0.21.0
    Attachments: MAPRED-1018-1.patch, MAPRED-1018-2.patch, MAPRED-1018-commons.patch

Changes were made to the configuration, monitoring and scheduling of high-RAM jobs. These must be documented in mapred-defaults.xml and also in the forrest documentation.
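The slot-based accounting in Point 2 is plain arithmetic, and a tiny sketch makes the two formulas concrete (class and method names are invented, not the scheduler's actual code):

```java
// Illustrative arithmetic only: total memory a tasktracker offers is the
// number of slots times the per-slot size; used memory is the sum over
// running tasks of the slots each task occupies times the per-slot size.
class SlotMemory {
    static long totalMem(int numSlots, long perSlotMemSize) {
        return numSlots * perSlotMemSize;
    }

    static long usedMem(int[] slotsPerTask, long perSlotMemSize) {
        long used = 0;
        for (int s : slotsPerTask) {
            used += s * perSlotMemSize;  // each task holds a whole number of slots
        }
        return used;
    }
}
```

For example, a tracker with 4 map slots of 1024MB each offers 4096MB total; a normal task (1 slot) plus a high-RAM task (2 slots) uses 3072MB, leaving room for one more normal task.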
[jira] Commented: (MAPREDUCE-1143) runningMapTasks counter is not properly decremented in case of failed Tasks.
[ https://issues.apache.org/jira/browse/MAPREDUCE-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778233#action_12778233 ]

Hemanth Yamijala commented on MAPREDUCE-1143:

I spoke to Amarsri and Rahul about my comments and found out some explanations:

bq. For instance, even after this patch, I see that the number of running tasks is decremented under different checks when a task completes and when a task fails. I assume this is for good reason, but still it is difficult to review.

So, the different checks are as follows:

{code}
completedTask() {
  if (this tip is complete) {
    return;
  }
  update counters
}

failedTask() {
  if (any attempt was running for this tip before status update) {
    update counters
  }
}
{code}

It appears completedTask doesn't need the check for the TIP being complete at all, as that can never happen. A TIP is marked complete only if at least one attempt has completed, and it remains so. If another attempt comes in reporting success now, we fail it in the status update and do not follow the completedTask code path at all. So, for all practical purposes, counters are being updated unconditionally in completedTask. Further, in the same code path, the task is removed from the active tasks as well. Hence no further check is necessary.

The check in failedTask is required, though. This is because a task can fail *after* it has been marked as succeeded, for example if there are fetch failures for a map, or if a tracker is lost. In this case, we should not update counters again because they would already have been updated when the task succeeded. However, in this context, I am a little worried that we are checking for any attempt being running before the status update, rather than this specific attempt. At least in theory, it is possible this results in some inconsistency. Consider this sequence of events:

- A task is scheduled.
- It is speculated.
- It completes. Counters are decremented here.
- It fails (lost TT, fetch failures). The current patch will decrement counters here again.
- The speculated attempt succeeds.

In practice, though, this scenario may not be very likely. Apparently fetch failures and lost TTs are the only extreme cases where this is possible, and there is a considerable time lag that can occur between a task completing and it having to be failed. The time lag will in most cases be large enough to kill the speculative attempt as well. With this background, is it worth changing the current patch to:

{code}
failedTask() {
  if (this task was running before status update) {
    update counters
  }
}
{code}

This seems more correct to me, but I was wondering if it was worth the change. Thoughts?

runningMapTasks counter is not properly decremented in case of failed Tasks.

    Key: MAPREDUCE-1143
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-1143
    Project: Hadoop Map/Reduce
    Issue Type: Bug
    Reporter: rahul k singh
    Priority: Blocker
    Attachments: MAPRED-1143-1.patch, MAPRED-1143-2.patch, MAPRED-1143-2.patch, MAPRED-1143-3.patch, MAPRED-1143-4.patch, MAPRED-1143-ydist-1.patch, MAPRED-1143-ydist-2.patch, MAPRED-1143-ydist-3.patch, MAPRED-1143-ydist-4.patch, MAPRED-1143-ydist-5.patch, MAPRED-1143-ydist-6.patch
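The attempt-specific check proposed above can be illustrated with a toy model (class, field, and method names are invented; the real bookkeeping lives in the jobtracker's task-tracking classes). The point is that a failure arriving after success no longer decrements the counter a second time, because the attempt is no longer in the running set:

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of per-attempt counter bookkeeping: both completion and failure
// decrement only if *this* attempt was still recorded as running.
class RunningCounter {
    int runningMapTasks;
    private final Set<String> runningAttempts = new HashSet<>();

    void attemptStarted(String attemptId) {
        runningAttempts.add(attemptId);
        runningMapTasks++;
    }

    void completedTask(String attemptId) {
        if (runningAttempts.remove(attemptId)) {
            runningMapTasks--;
        }
    }

    // Proposed check: decrement only if this specific attempt was running,
    // so a fail-after-success (fetch failures, lost TT) is a no-op.
    void failedTask(String attemptId) {
        if (runningAttempts.remove(attemptId)) {
            runningMapTasks--;
        }
    }
}
```

Replaying the problematic sequence (schedule, speculate, complete, late failure, speculative success) against this model leaves the counter at zero rather than negative.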
[jira] Updated: (MAPREDUCE-896) Users can set non-writable permissions on temporary files for TT and can abuse disk usage.
[ https://issues.apache.org/jira/browse/MAPREDUCE-896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Gummadi updated MAPREDUCE-896:

    Attachment: MR-896.patch

Attaching a patch with the fix. Please review and provide your comments. The patch does the following:

(1) When deleting $jobId/$attemptId/work or $jobId/$attemptId, the TT uses the task-controller to enable the path for deletion (by changing permissions).
    (a) LinuxTaskController sets 770 as the permissions for all files/directories within this dir recursively, and then the TT deletes the dir.
    (b) DefaultTaskController sets rwx for the user (same as the TT) on this dir recursively, and then the TT deletes the dir.
(2) Deletion of $jobId is done as before, because the user can't create any files in this dir.
(3) Deletion of the work dir in TaskRunner (useful with JVM reuse) is also done by changing the permissions first (no task-controller is needed in this case).
(4) With JVM reuse, after the final task of a JVM is finished, workDir is deleted by the task-controller using the procedure mentioned in (1) above.
(5) Modified TestTaskTrackerLocalization and TestLocalizationWithLinuxTaskController to create directories/files in $attemptId and set non-writable permissions, testing whether (a) DefaultTaskController and (b) LinuxTaskController are able to remove these new files from $attemptId.

Users can set non-writable permissions on temporary files for TT and can abuse disk usage.

    Key: MAPREDUCE-896
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-896
    Project: Hadoop Map/Reduce
    Issue Type: Bug
    Components: tasktracker
    Affects Versions: 0.21.0
    Reporter: Vinod K V
    Assignee: Ravi Gummadi
    Fix For: 0.21.0
    Attachments: MR-896.patch

As of now, irrespective of the TaskController in use, the TT itself does a full delete on local files created by itself or by job tasks. This step, depending upon the TT's umask and the permissions set on files by the user (for example in job-work/task-work or child.tmp directories), may or may not complete successfully. This leaves an opportunity for abusing disk space, either accidentally or intentionally, by the TT/users.
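The "fix permissions, then delete" cleanup the patch describes can be sketched in a few lines (names are invented, and the real TT delegates the permission change to the task-controller rather than doing it in-process as this sketch does):

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical sketch: recursively grant the owner rwx so user-set
// non-writable permissions cannot block cleanup, then delete bottom-up.
class ForcedDelete {
    static boolean deleteRecursively(File f) {
        f.setWritable(true);
        f.setExecutable(true);
        f.setReadable(true);
        File[] children = f.listFiles();
        if (children != null) {
            for (File c : children) {
                deleteRecursively(c);
            }
        }
        return f.delete();
    }

    // Demo: a non-writable "work" dir containing a file, like an attempt dir
    // a task left behind; a plain delete of the child would fail on POSIX,
    // but the forced delete succeeds because it restores write permission
    // on each directory before descending into it.
    static boolean demo() {
        try {
            File attemptDir = Files.createTempDirectory("attempt").toFile();
            File workDir = new File(attemptDir, "work");
            File tmp = new File(workDir, "child.tmp");
            if (!workDir.mkdir() || !tmp.createNewFile() || !workDir.setWritable(false)) {
                return false;
            }
            return deleteRecursively(attemptDir) && !attemptDir.exists();
        } catch (IOException e) {
            return false;
        }
    }
}
```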
[jira] Updated: (MAPREDUCE-1140) Per cache-file refcount can become negative when tasks release distributed-cache files
[ https://issues.apache.org/jira/browse/MAPREDUCE-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated MAPREDUCE-1140:

    Status: Open  (was: Patch Available)

Per cache-file refcount can become negative when tasks release distributed-cache files

    Key: MAPREDUCE-1140
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-1140
    Project: Hadoop Map/Reduce
    Issue Type: Bug
    Components: tasktracker
    Affects Versions: 0.20.2, 0.21.0, 0.22.0
    Reporter: Vinod K V
    Assignee: Amareshwari Sriramadasu
    Attachments: patch-1140-1.txt, patch-1140-2.txt, patch-1140-ydist.txt, patch-1140.txt
[jira] Commented: (MAPREDUCE-1147) Map output records counter missing for map-only jobs in new API
[ https://issues.apache.org/jira/browse/MAPREDUCE-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778235#action_12778235 ]

Amar Kamat commented on MAPREDUCE-1147:

For branch 20, all tests except the ones mentioned below have passed:
# hdfs.TestDatanodeBlockScanner FAILED (timeout)
# hdfs.TestDistributedFileSystem FAILED
# hdfs.server.namenode.TestFsck FAILED (timeout)
# mapred.TestReduceFetch FAILED

None of these seems related to this issue.

Map output records counter missing for map-only jobs in new API

    Key: MAPREDUCE-1147
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-1147
    Project: Hadoop Map/Reduce
    Issue Type: Bug
    Affects Versions: 0.20.1, 0.21.0
    Reporter: Chris Douglas
    Assignee: Amar Kamat
    Priority: Blocker
    Fix For: 0.20.2
    Attachments: mapred-1147-v1.3.patch, mapred-1147-v1.4-y20.patch, mapred-1147-v1.4.patch

In the new API, the counter for map output records is not incremented for map-only jobs.
[jira] Updated: (MAPREDUCE-1140) Per cache-file refcount can become negative when tasks release distributed-cache files
[ https://issues.apache.org/jira/browse/MAPREDUCE-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated MAPREDUCE-1140:

    Attachment: patch-1140-2.txt

Patch with review comments incorporated.

Per cache-file refcount can become negative when tasks release distributed-cache files

    Key: MAPREDUCE-1140
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-1140
    Project: Hadoop Map/Reduce
    Issue Type: Bug
    Components: tasktracker
    Affects Versions: 0.20.2, 0.21.0, 0.22.0
    Reporter: Vinod K V
    Assignee: Amareshwari Sriramadasu
    Attachments: patch-1140-1.txt, patch-1140-2.txt, patch-1140-ydist.txt, patch-1140.txt
[jira] Updated: (MAPREDUCE-1140) Per cache-file refcount can become negative when tasks release distributed-cache files
[ https://issues.apache.org/jira/browse/MAPREDUCE-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amareshwari Sriramadasu updated MAPREDUCE-1140:

    Status: Patch Available  (was: Open)

Per cache-file refcount can become negative when tasks release distributed-cache files

    Key: MAPREDUCE-1140
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-1140
    Project: Hadoop Map/Reduce
    Issue Type: Bug
    Components: tasktracker
    Affects Versions: 0.20.2, 0.21.0, 0.22.0
    Reporter: Vinod K V
    Assignee: Amareshwari Sriramadasu
    Attachments: patch-1140-1.txt, patch-1140-2.txt, patch-1140-ydist.txt, patch-1140.txt
[jira] Commented: (MAPREDUCE-1185) URL to JT webconsole for running job and job history should be the same
[ https://issues.apache.org/jira/browse/MAPREDUCE-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778242#action_12778242 ]

Jothi Padmanabhan commented on MAPREDUCE-1185:

In the test case, does it also make sense to add one more check: do not set {{conn.setInstanceFollowRedirects(false)}}, and then ensure that {{conn.connect()}} is successful? If not the contents, we would at least verify that the redirection works as expected, no?

URL to JT webconsole for running job and job history should be the same

    Key: MAPREDUCE-1185
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-1185
    Project: Hadoop Map/Reduce
    Issue Type: Improvement
    Components: jobtracker
    Reporter: Sharad Agarwal
    Assignee: Sharad Agarwal
    Attachments: 1185_v1.patch, 1185_v2.patch, 1185_v3.patch

The tracking URL for running jobs and for retired jobs is different. This creates a problem for clients that cache the job's running URL, because it soon becomes invalid when the job is retired.
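The suggested extra check can be sketched as a self-contained example against a local redirecting server (illustrative only, not the patch's actual test; paths and names are made up). With follow-redirects left at its default of true, connecting to the old URL should transparently land on the redirect target with a 200:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;

// Sketch: a tiny in-process server answers the old-style URL with a 302
// pointing at a "history" page; HttpURLConnection follows redirects by
// default, so the client sees the final 200 without any extra code.
class RedirectCheck {
    static int fetchResponseCode() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
            int port = server.getAddress().getPort();
            server.createContext("/jobdetails", ex -> {
                ex.getResponseHeaders().add("Location",
                        "http://localhost:" + port + "/history");
                ex.sendResponseHeaders(302, -1);  // -1: no response body
                ex.close();
            });
            server.createContext("/history", ex -> {
                ex.sendResponseHeaders(200, -1);
                ex.close();
            });
            server.start();
            try {
                HttpURLConnection conn = (HttpURLConnection)
                        new URL("http://localhost:" + port + "/jobdetails")
                                .openConnection();
                // Follow-redirects deliberately left at its default (true):
                // we assert the connection succeeds, not the page contents.
                conn.connect();
                return conn.getResponseCode();
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            return -1;
        }
    }
}
```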