[
https://issues.apache.org/jira/browse/HADOOP-4939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661503#action_12661503
]
Vinod K V commented on HADOOP-4939:
-----------------------------------
A few comments:
- The patch breaks compilation because of the change to the ClusterStatus
constructor:
-- src/mapred/org/apache/hadoop/mapred/LocalJobRunner.java +389
-- src/test/org/apache/hadoop/mapred/TestJobQueueTaskScheduler.java:138
- When SleepJob (or rather, examples) is not on the path, it fails, but with
the following output:
{code}
JOB org.apache.hadoop.examples.SleepJob failed to run
Waiting for the job org.apache.hadoop.examples.SleepJob to start
{code}
We should avoid the last line, *if* we can.
- We can report the progress of the jobs every once in a while when running
the tests. Right now it just stays silent until the progress reaches the
threshold values.
- I think writing statements to a LOG is better than printing on standard
output.
- With a HOD allocation, the testcase simulating lost TaskTrackers fails even
though keys are set up. This is because hadoop-daemons.sh tries, on the remote
nodes, to change to the non-existent directory HADOOP_HOME.
{code}
exec "$bin/slaves.sh" --config $HADOOP_CONF_DIR cd "$HADOOP_HOME"
\; "$bin/hadoop-daemon.sh" --config $HADOOP_CONF_DIR "$@"
{code}
A simple solution would be to throw away all changes to hadoop-daemon.sh
and hadoop-daemons.sh and simply use slaves.sh as follows:
{code}
HOSTLIST=conf/_reliability_test_slaves_file_ ./bin/slaves.sh ls
{code}
- The -ww flag to ps (ps auxw -ww) is not available on Cygwin. It only
modifies screen output and can be avoided. A side nit I observed is that
SIGCONT doesn't seem to work on Cygwin. That would make the lost TaskTracker
simulation test completely useless on Cygwin.
- The randomness of failures in the tests is pretty peculiar, though admittedly
it can be changed later if need be.
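On the SIGCONT point above, one option might be to probe the platform before
running the lost TaskTracker simulation. The following is only a minimal sketch
and not part of the patch: it assumes the simulation suspends a process with
SIGSTOP and resumes it with SIGCONT, and it uses a background "ticker" loop as a
hypothetical stand-in for a TaskTracker. The names sigcont_works, count_ticks
and probe_result are made up for illustration.

```shell
#!/bin/sh
# Probe whether SIGSTOP/SIGCONT suspend and resume a process on this platform.
# The ticker loop below is a hypothetical stand-in for a TaskTracker.

count_ticks() {
  # Number of lines the ticker has written so far.
  wc -l < "$1" | tr -d ' '
}

sigcont_works() {
  tick_file=$(mktemp) || return 1

  # Ticker: append one line per second until killed.
  ( while :; do echo tick >> "$tick_file"; sleep 1; done ) &
  ticker_pid=$!

  sleep 1
  kill -STOP "$ticker_pid"
  sleep 1                               # give the stop time to take effect
  before=$(count_ticks "$tick_file")
  sleep 2
  during=$(count_ticks "$tick_file")    # should not have moved while stopped

  kill -CONT "$ticker_pid"
  sleep 2
  after=$(count_ticks "$tick_file")     # should have grown after resuming

  # Clean up the ticker and its scratch file.
  kill "$ticker_pid" 2>/dev/null || true
  wait "$ticker_pid" 2>/dev/null || true
  rm -f "$tick_file"

  # Success only if ticks paused while stopped and resumed after SIGCONT.
  [ "$before" -eq "$during" ] && [ "$after" -gt "$during" ]
}

if sigcont_works; then
  probe_result=supported
else
  probe_result=unsupported
fi
echo "SIGSTOP/SIGCONT: $probe_result"
```

If the probe reports "unsupported", the lost TaskTracker test could be skipped
with a clear message instead of failing. The probe deliberately avoids ps, so it
does not depend on which ps flags the platform offers.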
> Create a test that would inject random failures for tasks in large jobs and
> would also inject TaskTracker failures
> ------------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-4939
> URL: https://issues.apache.org/jira/browse/HADOOP-4939
> Project: Hadoop Core
> Issue Type: Sub-task
> Components: mapred, test
> Reporter: Devaraj Das
> Assignee: Devaraj Das
> Fix For: 0.20.0
>
> Attachments: 4939.1.patch, 4939.patch
>
>
> Create a test that would inject random failures for tasks in large jobs and
> would also inject TaskTracker failures