[ 
https://issues.apache.org/jira/browse/HDFS-708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12858038#action_12858038
 ] 

Konstantin Shvachko commented on HDFS-708:
------------------------------------------

Joshua asked what random file generation means, as per this sentence from the 
design doc:
2. Randomly chooses a file name. File names are enumerated, so choosing a file 
means choosing its sequence number, which defines the entire file path.

I mean by this that we have a static enumeration of files. We choose a random 
number, and then calculate a full path for the corresponding file using that 
number.
The static enumeration is like a heap structure. We have an array f0, f1, f2, 
... There is a root r. The root's children are the files f0 and f1 and the two 
directories d0 and d1. The children of d0 are the files f2, f3 and the 
directories d2, d3. The children of d1 are the files f4, f5 and the 
directories d4, d5. And so on. This provides 2 files per directory. 
We can generalize it to p files per directory for a fixed p. Here the root's 
children will be p files f0,...,f(p-1) and p directories d0,...,d(p-1). And so 
on. Importantly, if you have a file fz, then its parent is always the directory 
dz', where z' = z/p - 1 (integer division). A negative z' means the parent is 
the root.
I don't want to use long numbers for file names. So within a directory its 
child files are named {{file_i}} and its sub-directories are named {{dir_i}} 
for i = 0,...,p-1.
Then given a number z the path of file fz is calculated recursively. The file 
name of fz is {{file_(z%p)}}. Its parent is the directory dz', where 
z' = z/p - 1, and the name of dz' is {{dir_(z'%p)}}. We keep going further up 
the tree while the indexes are non-negative.
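
To make the recursion concrete, here is a minimal sketch of the path 
calculation. The class name {{EnumeratedPath}} and the method {{pathOf}} are 
hypothetical names chosen for illustration; this is not the actual SLive or 
{{FileNameGenerator}} code. It assumes integer division and treats a negative 
directory index as the root.

```java
// Hypothetical helper for illustration only; not the real SLive implementation.
final class EnumeratedPath {
    private EnumeratedPath() {}

    /** Builds the path of file f_z when each directory holds p files and p sub-directories. */
    static String pathOf(long z, int p) {
        if (z < 0 || p < 1) {
            throw new IllegalArgumentException("need z >= 0 and p >= 1");
        }
        // Leaf component: the name of the file inside its parent directory.
        StringBuilder path = new StringBuilder("/file_" + (z % p));
        // Walk up the tree: the parent of index i is directory i / p - 1.
        // A negative index means we have reached the root.
        for (long dir = z / p - 1; dir >= 0; dir = dir / p - 1) {
            path.insert(0, "/dir_" + (dir % p));
        }
        return path.toString();
    }

    public static void main(String[] args) {
        // Print the first few paths for p = 2.
        for (long z = 0; z < 8; z++) {
            System.out.println("f" + z + " -> " + pathOf(z, 2));
        }
    }
}
```

For p = 2 this places f0 and f1 directly under the root, f2 at 
{{/dir_0/file_0}}, f5 at {{/dir_1/file_1}}, and f6 at {{/dir_0/dir_0/file_0}}.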

In the test we choose a random z and build a path out of it. If the operation 
is create we create a file with this path. In HDFS all missing directories 
along the path will be created automatically. If fz already exists the create 
fails. 
For read we do the same, but the operation fails if the file does not exist.
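
The create and read behavior above can be modeled roughly as follows. This is 
only an in-memory sketch: the class {{RandomFileOps}}, the bound 
{{totalFiles}} on z, and the seed are assumptions made for illustration, and a 
real test would call the HDFS client rather than a set. Because z determines 
the whole path, the sketch tracks existing files by their index.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Hypothetical in-memory model; a real test would issue HDFS operations instead.
final class RandomFileOps {
    private final Set<Long> existing = new HashSet<>(); // indices z of files that exist
    private final Random random;
    private final int totalFiles; // assumed upper bound on the file index z

    RandomFileOps(int totalFiles, long seed) {
        this.totalFiles = totalFiles;
        this.random = new Random(seed);
    }

    /** Picks a random file index z in [0, totalFiles). */
    long chooseFile() {
        return random.nextInt(totalFiles);
    }

    /** Create succeeds only if f_z does not exist yet. */
    boolean create(long z) {
        return existing.add(z);
    }

    /** Read succeeds only if f_z already exists. */
    boolean read(long z) {
        return existing.contains(z);
    }
}
```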

A similar approach is used in the class {{FileNameGenerator}}. 

> A stress-test tool for HDFS.
> ----------------------------
>
>                 Key: HDFS-708
>                 URL: https://issues.apache.org/jira/browse/HDFS-708
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: test, tools
>    Affects Versions: 0.22.0
>            Reporter: Konstantin Shvachko
>             Fix For: 0.22.0
>
>         Attachments: SLiveTest.pdf
>
>
> It would be good to have a tool for automatic stress testing of HDFS, which 
> would provide IO-intensive load on an HDFS cluster.
> The idea is to start the tool, let it run overnight, and then be able to 
> analyze possible failures.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
