[ https://issues.apache.org/jira/browse/HDFS-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972970#action_12972970 ]
M. C. Srivas commented on HDFS-1445:
------------------------------------

If no one really uses hardlinks, why don't you get rid of this altogether?

> Batch the calls in DataStorage to FileUtil.createHardLink(), so we call it
> once per directory instead of once per file
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-1445
>                 URL: https://issues.apache.org/jira/browse/HDFS-1445
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: data-node
>    Affects Versions: 0.20.2
>            Reporter: Matt Foley
>            Assignee: Matt Foley
>             Fix For: 0.22.0
>
>
> It was a bit of a puzzle why we can do a full scan of a disk in about 30
> seconds during FSDir() or getVolumeMap(), but the same disk took 11 minutes
> to do Upgrade replication via hardlinks. It turns out that the
> org.apache.hadoop.fs.FileUtil.createHardLink() method does an outcall to
> Runtime.getRuntime().exec(), to utilize native filesystem hardlink
> capability. So it is forking a full-weight external process, and we call it
> on each individual file to be replicated.
> As a simple check on the possible cost of this approach, I built a Perl test
> script (under Linux on a production-class datanode). Perl also uses a
> compiled and optimized p-code engine, and it has both native support for
> hardlinks and the ability to do "exec".
> - A simple script to create 256,000 files in a directory tree organized like
> the Datanode's took 10 seconds to run.
> - Replicating that directory tree using hardlinks, the same way as the
> Datanode, took 12 seconds using native hardlink support.
> - The same replication using outcalls to exec, one per file, took 256
> seconds!
> - Batching the calls, and doing 'exec' once per directory instead of once
> per file, took 16 seconds.
> Obviously, your mileage will vary based on the number of blocks per volume.
> A volume with fewer than about 4000 blocks will have only 65 directories. A
> volume with more than 4K and less than about 250K blocks will have 4200
> directories (more or less). And there are two files per block (the data file
> and the .meta file). So the average number of files per directory may vary
> from 2:1 to 500:1. A node with 50K blocks and four volumes will have 25K
> files per volume, or an average of about 6:1. So this change may be expected
> to take it down from, say, 12 minutes per volume to 2.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
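The batching idea in the description can be sketched as follows. This is not the actual HDFS-1445 patch: `createHardLinkMult` is a hypothetical helper name, and it assumes a POSIX system where the `ln src1 src2 ... targetDir` multi-source form is available, so that one forked process links every file in a directory instead of one fork per file.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class BatchHardLink {

    // Hypothetical batched variant of FileUtil.createHardLink(): hard-link
    // every file in srcDir into dstDir with a SINGLE forked process, using
    // the POSIX "ln src1 src2 ... targetDir" form, instead of forking
    // Runtime.exec() once per file.
    static void createHardLinkMult(File srcDir, File dstDir)
            throws IOException, InterruptedException {
        String[] names = srcDir.list();
        if (names == null || names.length == 0) {
            return;                       // nothing to link
        }
        List<String> cmd = new ArrayList<>();
        cmd.add("ln");                    // hard link (no -s)
        for (String name : names) {
            cmd.add(new File(srcDir, name).getPath());
        }
        cmd.add(dstDir.getPath());        // target directory comes last
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        if (p.waitFor() != 0) {
            throw new IOException("ln failed for directory " + srcDir);
        }
    }

    public static void main(String[] args) throws Exception {
        // Tiny demo: two files, one exec, both linked.
        File src = new File("srcdir");
        File dst = new File("dstdir");
        src.mkdirs();
        dst.mkdirs();
        new File(src, "blk_1").createNewFile();
        new File(src, "blk_1.meta").createNewFile();
        createHardLinkMult(src, dst);
        System.out.println(new File(dst, "blk_1").exists()
                && new File(dst, "blk_1.meta").exists());
    }
}
```

With ~500 files per directory in the worst case described above, the argument list stays well under typical ARG_MAX limits, but a production version would chunk very large directories into several exec calls.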