Large-scale reliability tests
-----------------------------
Key: HADOOP-2483
URL: https://issues.apache.org/jira/browse/HADOOP-2483
Project: Hadoop
Issue Type: Test
Components: test
Reporter: Arun C Murthy
The fact that we do not have any large-scale reliability tests bothers me. I'll
be the first to admit that it isn't the easiest of tasks, but I'd like to start
a discussion around this... especially given that the code-base is growing to
the point where the interactions caused by small changes are very hard to
predict.
One of the simple scripts I run for every patch I work on does something very
simple: it runs sort500 (or greater), randomly picks n tasktrackers from
${HADOOP_CONF_DIR}/conf/slaves, and kills them; a similar script kills and
then restarts the tasktrackers.
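A minimal sketch of what such a kill script might look like (assumptions, not
the actual script: passwordless ssh to the slaves, the stock
bin/hadoop-daemon.sh on each of them, and GNU sort for the -R flag; N
defaulting to 3 is a placeholder):

  #!/bin/sh
  # Sketch: stop N randomly chosen tasktrackers while a large sort runs.
  # Assumes passwordless ssh and bin/hadoop-daemon.sh on every slave.
  N=${1:-3}
  for host in `sort -R ${HADOOP_CONF_DIR}/conf/slaves | head -n $N`; do
    echo "stopping tasktracker on $host"
    ssh $host "${HADOOP_HOME}/bin/hadoop-daemon.sh stop tasktracker"
  done

The kill-and-restart variant would be the same loop with a "start tasktracker"
invocation (after a short sleep) following the "stop".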
This helps in checking a fair number of reliability stories: lost
tasktrackers, task failures, etc. Clearly this isn't enough to cover
everything, but it's a start.
Let's discuss - what do we do for HDFS? We need more for Map-Reduce!