[ https://issues.apache.org/jira/browse/HDFS-4936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Harsh J resolved HDFS-4936.
---------------------------
    Resolution: Not A Problem

> Handle overflow condition for txid going over Long.MAX_VALUE
> ------------------------------------------------------------
>
>                 Key: HDFS-4936
>                 URL: https://issues.apache.org/jira/browse/HDFS-4936
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.0.0-alpha
>            Reporter: Harsh J
>            Priority: Minor
>
> Hat tip to [~fengdon...@gmail.com] for the question (on the mailing lists) that led to this.
> I hacked up my local NN's txids manually to go very large (close to the max) and decided to try out whether this causes any harm. I bumped the freshly formatted files' starting txid up to 9223372036854775805 (and ensured the image references the same by hex-editing it):
> {code}
> ➜ current ls
> VERSION
> fsimage_9223372036854775805.md5
> fsimage_9223372036854775805
> seen_txid
> ➜ current cat seen_txid
> 9223372036854775805
> {code}
> The NameNode started up as expected:
> {code}
> 13/06/25 18:30:08 INFO namenode.FSImage: Image file of size 129 loaded in 0 seconds.
> 13/06/25 18:30:08 INFO namenode.FSImage: Loaded image for txid 9223372036854775805 from /temp-space/tmp-default/dfs-cdh4/name/current/fsimage_9223372036854775805
> 13/06/25 18:30:08 INFO namenode.FSEditLog: Starting log segment at 9223372036854775806
> {code}
> I could create a bunch of files and do regular ops (counting to well past the long max). I created over 10 files, just to make it go well over Long.MAX_VALUE.
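The reason operations appear to keep working past the maximum is that Java {{long}} arithmetic wraps around silently rather than failing. A minimal sketch of the wrap-around the reporter observed, starting from the same hacked txid (class name is illustrative only):

```java
public class TxidWrapDemo {
    public static void main(String[] args) {
        // 9223372036854775805, the starting txid used in the hacked fsimage
        long txid = Long.MAX_VALUE - 2;
        for (int i = 0; i < 5; i++) {
            txid++; // no exception is thrown; the value wraps silently past MAX_VALUE
            System.out.println(txid);
        }
    }
}
```

After two increments the counter reaches Long.MAX_VALUE (9223372036854775807); the third increment wraps to Long.MIN_VALUE (-9223372036854775808), which matches the negative txid that later appears in the restart error below.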
> Quitting the NameNode and restarting fails, though, with the following error:
> {code}
> 13/06/25 18:31:08 INFO namenode.FileJournalManager: Recovering unfinalized segments in /Users/harshchouraria/Work/installs/temp-space/tmp-default/dfs-cdh4/name/current
> 13/06/25 18:31:08 INFO namenode.FileJournalManager: Finalizing edits file /Users/harshchouraria/Work/installs/temp-space/tmp-default/dfs-cdh4/name/current/edits_inprogress_9223372036854775806 -> /Users/harshchouraria/Work/installs/temp-space/tmp-default/dfs-cdh4/name/current/edits_9223372036854775806-9223372036854775807
> 13/06/25 18:31:08 FATAL namenode.NameNode: Exception in namenode join
> java.io.IOException: Gap in transactions. Expected to be able to read up until at least txid 9223372036854775806 but unable to find any edit logs containing txid -9223372036854775808
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLog.checkForGaps(FSEditLog.java:1194)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1152)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:616)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:267)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:592)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:435)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:397)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:399)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:433)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:609)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:590)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1141)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1205)
> {code}
> It looks like we also lose some edits on restart, as shown by the finalized edits filename:
> {code}
> VERSION
> edits_9223372036854775806-9223372036854775807
> fsimage_9223372036854775805
> fsimage_9223372036854775805.md5
> seen_txid
> {code}
> It seems we won't be able to handle the case where the txid overflows. It's a very large number, so this isn't an immediate concern, but it seemed worthy of a report.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
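If a counter like this ever did need overflow handling, one way to detect the wrap instead of silently suffering it would be {{java.lang.Math.addExact}}, which throws {{ArithmeticException}} on long overflow. A hedged sketch only; the class and method below are hypothetical illustrations, not Hadoop APIs:

```java
// Hypothetical txid counter that fails fast on overflow instead of wrapping.
public class SafeTxid {
    private long txid;

    public SafeTxid(long start) {
        this.txid = start;
    }

    // Math.addExact throws ArithmeticException rather than wrapping
    // to Long.MIN_VALUE, so the overflow is surfaced immediately.
    public long next() {
        txid = Math.addExact(txid, 1L);
        return txid;
    }

    public static void main(String[] args) {
        SafeTxid t = new SafeTxid(Long.MAX_VALUE - 1);
        System.out.println(t.next()); // reaches Long.MAX_VALUE safely
        try {
            t.next(); // one more increment would wrap; throws instead
        } catch (ArithmeticException e) {
            System.out.println("txid overflow detected");
        }
    }
}
```

Whether detection (crash with a clear message) or some renumbering scheme is the right response is a design question; as the report notes, the counter is large enough that this is not a practical concern.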