[ https://issues.apache.org/jira/browse/HADOOP-8564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Suresh Srinivas reopened HADOOP-8564:
-------------------------------------

Reopened because I had reverted the patch earlier due to a build issue.

> Port and extend Hadoop native libraries for Windows to address datanode
> concurrent reading and writing issue
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-8564
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8564
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: io
>    Affects Versions: 1-win
>            Reporter: Chuan Liu
>            Assignee: Chuan Liu
>             Fix For: 1-win
>
>         Attachments: HADOOP-8564-branch-1-win.patch,
>                      HADOOP-8564-branch-1-win.patch
>
>
> HDFS files are made up of blocks. First, let's look at writing. When data is
> written to a datanode, an active or temporary file is created to receive
> packets. After the last packet for the block is received, the block is
> finalized. One step during finalization is to rename the block file into a
> new directory. The relevant code can be found via the call sequence
> FSDataSet.finalizeBlockInternal -> FSDir.addBlock:
> {code}
> if ( ! metaData.renameTo( newmeta ) ||
>      ! src.renameTo( dest ) ) {
>     throw new IOException( "could not move files for " + b +
>                            " from tmp to " +
>                            dest.getAbsolutePath() );
> }
> {code}
> Now let's switch to reading. On HDFS, clients are expected to be able to read
> these unfinished blocks as well. So when a client's read calls reach the
> datanode, the datanode opens an input stream on the unfinished block file.
> The problem arises when that file is open for reading while the datanode
> receives the last packet from the client and tries to rename the finished
> block file. This rename succeeds on Linux, but fails on Windows (a standalone
> repro sketch is included at the end of this description). The behavior can be
> changed on Windows by opening the file with the FILE_SHARE_DELETE flag, i.e.
> sharing the delete (including rename) permission with other processes while
> the file is open. There is also a Java bug ([id
> 6357433|http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6357433]) reported
> a while back on this. However, since Java has behaved this way on Windows
> since JDK 1.0, the Java developers do not want to break backward
> compatibility. Instead, a new file system API is proposed in JDK 7.
> As outlined in the [Java forum|http://www.java.net/node/645421] by the Java
> developer (kbr), there are three ways to fix the problem:
> # Use a different mechanism in the application for dealing with files.
> # Create a new implementation of the InputStream abstract class using
> Windows native code.
> # Patch the JDK with a private patch that alters FileInputStream behavior.
> The third option cannot fix the problem for users running the stock Oracle
> JDK. We discussed some options for the first approach. One example is
> two-phase renaming, i.e. first create a hard link, then remove the old link
> when the read is finished. This option was thought to require rather
> pervasive changes. Another option discussed was to change HDFS behavior on
> Windows by not allowing clients to read unfinished blocks. However, that
> behavior change is thought to be problematic and may affect other
> applications built on top of HDFS.
> For all the reasons discussed above, we will use the second approach to
> address the problem; a rough sketch of its Java-side shape is also included
> below.
> If there are better options to fix the problem, we would also like to hear
> about them.
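>
> For illustration, the failure mode can be reproduced outside Hadoop with a
> few lines of plain Java. This is only a standalone repro sketch (the file
> names are made up, and it is not datanode code):
> {code}
> import java.io.File;
> import java.io.FileInputStream;
> import java.io.FileOutputStream;
> import java.io.IOException;
>
> public class RenameWhileReading {
>   public static void main(String[] args) throws IOException {
>     File src = new File("blk_12345.tmp");
>     File dest = new File("blk_12345");
>
>     // Simulate the datanode writing packets into the temporary block file.
>     FileOutputStream out = new FileOutputStream(src);
>     out.write(new byte[] { 1, 2, 3, 4 });
>     out.close();
>
>     // Simulate a client reading the unfinished block. FileInputStream on
>     // Windows opens the file without FILE_SHARE_DELETE, so the file cannot
>     // be renamed or deleted by anyone until this stream is closed.
>     FileInputStream in = new FileInputStream(src);
>     try {
>       boolean renamed = src.renameTo(dest);
>       // Prints "true" on Linux; "false" on Windows, which is exactly the
>       // condition that makes FSDir.addBlock throw its IOException.
>       System.out.println("renameTo while open for read: " + renamed);
>     } finally {
>       in.close();
>     }
>   }
> }
> {code}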
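>
> A rough sketch of what the second approach could look like on the Java side
> follows. Everything named here (the class, the native library, and the JNI
> entry points) is hypothetical and is not taken from the attached patch; it
> only shows the general shape of wrapping a Windows-native open, which would
> pass FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE to CreateFile,
> behind the standard InputStream interface:
> {code}
> import java.io.IOException;
> import java.io.InputStream;
>
> // Sketch only: all names below are hypothetical, not the actual patch.
> public class SharedDeleteFileInputStream extends InputStream {
>
>   static {
>     // Hypothetical native library holding the JNI implementations below.
>     System.loadLibrary("hadoopwinio");
>   }
>
>   // Native OS file handle returned by CreateFile.
>   private final long fd;
>
>   public SharedDeleteFileInputStream(String path) throws IOException {
>     fd = open0(path);
>   }
>
>   @Override
>   public int read() throws IOException {
>     byte[] one = new byte[1];
>     int n = read0(fd, one, 0, 1);
>     return (n == -1) ? -1 : (one[0] & 0xff);
>   }
>
>   @Override
>   public int read(byte[] b, int off, int len) throws IOException {
>     return read0(fd, b, off, len);
>   }
>
>   @Override
>   public void close() throws IOException {
>     close0(fd);
>   }
>
>   // JNI entry points (hypothetical). The C side would call CreateFile with
>   // the FILE_SHARE_DELETE sharing mode and ReadFile; read0 returns -1 at
>   // end of stream, matching the InputStream contract.
>   private static native long open0(String path) throws IOException;
>   private static native int read0(long fd, byte[] b, int off, int len)
>       throws IOException;
>   private static native void close0(long fd) throws IOException;
> }
> {code}
> With a stream like this serving client reads of in-progress blocks, the
> datanode no longer holds a rename-blocking handle, so the renameTo calls in
> FSDir.addBlock can succeed during block finalization on Windows.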
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira