[ https://issues.apache.org/jira/browse/HADOOP-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Konstantin Shvachko updated HADOOP-2073:
----------------------------------------

    Attachment: versionFileSize.patch

Killing data-nodes immediately after they started turned out to be a good crash test. Thanks, Michael. I am attaching a patch that changes the file size after writing the data rather than before, so that VERSION never gets emptied. This solves Michael's current problem. More generally, I agree with Raghu that we should audit our code for the inconsistent file-system states that different crash scenarios can produce. I think this patch should go into 0.15.

> Datanode corruption if machine dies while writing VERSION file
> --------------------------------------------------------------
>
>                 Key: HADOOP-2073
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2073
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.14.0
>            Reporter: Michael Bieniosek
>            Assignee: Raghu Angadi
>         Attachments: versionFileSize.patch
>
>
> Yesterday, due to a bad mapreduce job, some of my machines went on OOM
> killing sprees and killed a bunch of datanodes, among other processes. Since
> my monitoring software kept trying to bring up the datanodes, only to have
> the kernel kill them off again, each machine's datanode was probably killed
> many times. A large percentage of these datanodes will not come up now, and
> write this message to the logs:
>
> 2007-10-18 00:23:28,076 ERROR org.apache.hadoop.dfs.DataNode:
> org.apache.hadoop.dfs.InconsistentFSStateException: Directory
> /hadoop/dfs/data is in an inconsistent state: file VERSION is invalid.
>
> When I check, /hadoop/dfs/data/current/VERSION is an empty file.
> Consequently, I have to delete all the blocks on the datanode and start over.
> Since the OOM killing sprees happened simultaneously on several datanodes in
> my DFS cluster, this could have crippled my DFS cluster.
>
> I checked the hadoop code, and in org.apache.hadoop.dfs.Storage, I see this:
> {{{
> /**
>  * Write version file.
>  *
>  * @throws IOException
>  */
> void write() throws IOException {
>   corruptPreUpgradeStorage(root);
>   write(getVersionFile());
> }
>
> void write(File to) throws IOException {
>   Properties props = new Properties();
>   setFields(props, this);
>   RandomAccessFile file = new RandomAccessFile(to, "rws");
>   FileOutputStream out = null;
>   try {
>     file.setLength(0);
>     file.seek(0);
>     out = new FileOutputStream(file.getFD());
>     props.store(out, null);
>   } finally {
>     if (out != null) {
>       out.close();
>     }
>     file.close();
>   }
> }
> }}}
> So if the datanode dies after file.setLength(0), but before props.store(out,
> null), the VERSION file will get trashed in the corrupted state I see. Maybe
> it would be better if this method created a temporary file VERSION.tmp, and
> then copied it to VERSION, then deleted VERSION.tmp? That way, if VERSION
> was detected to be corrupt, the datanode could look at VERSION.tmp to recover
> the data.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
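A minimal standalone sketch of the reordering the patch describes (write the new properties first, then truncate to the bytes just written). The class and method names here are hypothetical stand-ins; the real method lives in org.apache.hadoop.dfs.Storage. Because RandomAccessFile and the FileOutputStream share one file descriptor, the file pointer after props.store() marks the end of the new content, so a crash mid-write leaves stale-but-nonempty data rather than a zero-length VERSION file:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Properties;

public class VersionFileWriter {
    // Hypothetical helper illustrating the patched ordering.
    static void write(File to, Properties props) throws IOException {
        RandomAccessFile file = new RandomAccessFile(to, "rws");
        FileOutputStream out = null;
        try {
            file.seek(0);
            // Write the new properties over the old contents first.
            out = new FileOutputStream(file.getFD());
            props.store(out, null);
            // Truncate only after the data is written. The buggy version
            // called setLength(0) *before* store(), so a crash in between
            // left an empty file.
            file.setLength(file.getFilePointer());
        } finally {
            if (out != null) {
                out.close();
            }
            file.close();
        }
    }
}
```

The alternative suggested in the report, writing VERSION.tmp and then renaming, is also a standard atomic-update pattern; the attached patch takes the simpler reordering above.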