NFS seems to be problematic here, since NFS locking can cause the NameNode to hang. Couldn't there be another way? For example, if the NameNode wrote its edits synchronously to the secondary NameNode in addition to its local directories, then on a primary NameNode failure we could simply start the NameNode process on the secondary NameNode machine, where the latest checkpointed fsimage would already be present.
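(For context, the multi-directory transaction logging that Dhruba describes below is driven by the dfs.name.dir property, which takes a comma-separated list of directories; the NameNode replicates fsimage and edits across all of them. A minimal sketch for hadoop-site.xml, assuming /mnt/nfs/namenode is an NFS mount -- the paths here are placeholders, not a recommendation:

    <property>
      <name>dfs.name.dir</name>
      <!-- local directory first, NFS-mounted directory second;
           the NameNode writes to each of them synchronously -->
      <value>/data/0/dfs/name,/mnt/nfs/namenode</value>
    </property>
)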
This also raises a fundamental question: can we run the secondary NameNode process on the same node as the primary NameNode process without any out-of-memory / heap exceptions? And ideally, how much memory should the primary NameNode have when running alone, and how much when co-located with the secondary NameNode process?

Andrzej Bialecki wrote:
>
> Dhruba Borthakur wrote:
>> A good way to implement failover is to make the Namenode log
>> transactions to more than one directory, typically a local directory
>> and an NFS-mounted directory. The Namenode writes transactions to
>> both directories synchronously.
>>
>> If the Namenode machine dies, copy the fsimage and fsedits from the
>> NFS server and you will have recovered *all* committed transactions.
>>
>> The SecondaryNamenode pulls the fsimage and fsedits once every
>> configured period, typically ranging from a few minutes to an hour.
>> If you use the image from the SecondaryNamenode, you might lose the
>> last few minutes of transactions.
>
> That's a good idea. But then, what's the purpose of running a
> secondary namenode, if it can't guarantee that the data loss is
> minimal? Shouldn't edits be written synchronously to a secondary
> namenode, and the fsimage updated synchronously whenever a primary
> namenode performs a checkpoint?
>
> --
> Best regards,
> Andrzej Bialecki <><
> http://www.sigram.com  Contact: info at sigram dot com
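To make the recovery procedure Dhruba describes above concrete, here is a rough sketch of the manual failover, assuming the dead primary's dfs.name.dir included an NFS directory and the standby machine carries the same Hadoop configuration (hostnames and paths are made up for illustration):

    # On the standby machine: move any stale local image aside.
    mv /data/0/dfs/name /data/0/dfs/name.old

    # Pull the latest fsimage and edits from the NFS copy, which the
    # dead primary had been writing to synchronously.
    mkdir -p /data/0/dfs/name
    cp -r /mnt/nfs/namenode/* /data/0/dfs/name/

    # Repoint clients at this machine (e.g. via DNS or fs.default.name),
    # then start the NameNode; it replays the edits log on startup.
    bin/hadoop-daemon.sh start namenode

Since the NFS directory holds every committed transaction, this recovers more than restarting from the secondary NameNode's last checkpoint would.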