[ https://issues.apache.org/jira/browse/HDFS-708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852501#action_12852501 ]
Joshua Harlow commented on HDFS-708:
------------------------------------

Looks good to me as well. Just a couple of thoughts/questions.

1. Would it make sense to have a "create" set of jobs that ensures the files exist before any reads/deletes/writes run, instead of generating them in a previous job? That way the data is created on demand, with no need for a separate job that runs beforehand just to populate data. This stage would not count against the overall allotted time and could be done at the start of the testing.

2. It would probably be useful to add a seed number so that the tests can be "mostly" repeated (writes and deletes can't truly be repeated, since they modify the underlying storage).

3. Might it be useful, in the future, to add the ability to specify your own distribution "objects" that "generate" operation objects, so that the current set of operations can be expanded without core changes? That is, a plugin-like framework for generating the distribution and for generating the actual set of operations that will occur. This would allow, for example, something like an AppendReadDelete operation whose occurrences are distributed according to a square wave.

> A stress-test tool for HDFS.
> ----------------------------
>
> Key: HDFS-708
> URL: https://issues.apache.org/jira/browse/HDFS-708
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: test, tools
> Affects Versions: 0.22.0
> Reporter: Konstantin Shvachko
> Fix For: 0.22.0
>
> Attachments: SLiveTest.pdf
>
>
> It would be good to have a tool for automatic stress testing HDFS, which
> would provide IO-intensive load on HDFS cluster.
> The idea is to start the tool, let it run overnight, and then be able to
> analyze possible failures.
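
As a rough illustration of points 2 and 3 in the comment above, here is a minimal sketch of how a seed and a pluggable distribution might fit together. Every name in it (OpType, OperationDistribution, UniformDistribution, OperationPlanner) is hypothetical and not taken from the HDFS-708 patch or SLiveTest.pdf; it only assumes that the tool picks operations from some random source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical operation kinds; the real tool may define a different set.
enum OpType { CREATE, READ, WRITE, DELETE }

// Hypothetical plugin point: a distribution decides which operation comes next.
interface OperationDistribution {
    OpType next(Random rng);
}

// One possible plugin: every operation type is equally likely.
final class UniformDistribution implements OperationDistribution {
    @Override
    public OpType next(Random rng) {
        OpType[] all = OpType.values();
        return all[rng.nextInt(all.length)];
    }
}

// Builds the sequence of operations from a seed and a distribution.
final class OperationPlanner {
    private final Random rng;
    private final OperationDistribution distribution;

    OperationPlanner(long seed, OperationDistribution distribution) {
        // The same seed always yields the same stream of random numbers,
        // so the planned sequence of operations is repeatable.
        this.rng = new Random(seed);
        this.distribution = distribution;
    }

    List<OpType> plan(int count) {
        List<OpType> ops = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            ops.add(distribution.next(rng));
        }
        return ops;
    }
}
```

With this shape, two runs constructed with the same seed plan the same sequence of operations, although their effects on the file system can still differ because writes and deletes change the underlying storage. A new distribution (a square wave, say) would be another implementation of the interface, with no change to the planner.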