Hi, can you please explain the difference between fs.default.name and dfs.http.address (i.e., how and when the SecondaryNameNode uses fs.default.name, and how and when it uses dfs.http.address)? I have set them both to the same (the NameNode's) hostname:port. Is this correct, or does dfs.http.address need a different port?
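Konstantin's reply below says the secondary needs both of these addresses from the primary and transfers the image and edits over HTTP. Assuming the usual Hadoop arrangement (as in the 0.18 releases this thread refers to), fs.default.name is the NameNode's RPC URI (hdfs://host:port) and dfs.http.address is the address of its separate embedded web server (default 0.0.0.0:50070). They are two different listeners, so they should not share a host:port. The minimal sketch below flags such a clash in a hadoop-site.xml. The helper names and the sample hostname and ports are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical sample config: hostname and ports are illustrative only.
SAMPLE_SITE_XML = """
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode.example.com:9000</value>
  </property>
  <property>
    <name>dfs.http.address</name>
    <value>namenode.example.com:50070</value>
  </property>
</configuration>
"""

# A wildcard bind address listens on every interface, including the RPC host.
WILDCARD_HOSTS = {"0.0.0.0", ""}


def read_properties(xml_text):
    """Return {name: value} from a Hadoop-style <configuration> document."""
    root = ET.fromstring(xml_text.strip())
    return {
        prop.findtext("name").strip(): prop.findtext("value").strip()
        for prop in root.findall("property")
    }


def rpc_endpoint(fs_default_name):
    """Split an hdfs://host:port URI (NameNode RPC address) into (host, port)."""
    parsed = urlparse(fs_default_name)
    return parsed.hostname, parsed.port


def http_endpoint(http_address):
    """Split a plain host:port value (NameNode HTTP address) into (host, port)."""
    host, _, port = http_address.rpartition(":")
    return host, int(port)


def endpoints_clash(props):
    """True if the RPC and HTTP settings would need the same host:port."""
    rpc_host, rpc_port = rpc_endpoint(props["fs.default.name"])
    http_host, http_port = http_endpoint(props["dfs.http.address"])
    if rpc_port != http_port:
        return False
    return http_host in WILDCARD_HOSTS or http_host == rpc_host


if __name__ == "__main__":
    props = read_properties(SAMPLE_SITE_XML)
    print("clash" if endpoints_clash(props) else "distinct endpoints")
```

The sketch assumes both values carry an explicit port; an fs.default.name without a port parses to `None` and is never reported as a clash.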
Thanks,
Tomislav

On Wed, 2008-10-29 at 16:10 -0700, Konstantin Shvachko wrote:
> The SecondaryNameNode uses the HTTP protocol to transfer the image and the edits from the primary name-node and vice versa, so the secondary does not access local files on the primary directly.
> The primary NN should know the secondary's HTTP address, and the secondary NN needs to know both fs.default.name and dfs.http.address of the primary.
>
> In general, we create one configuration file, hadoop-site.xml, and copy it to all other machines, so you don't need to set up different values for each server.
>
> Regards,
> --Konstantin
>
> Tomislav Poljak wrote:
> > Hi,
> > I'm not clear on how the SecondaryNameNode communicates with the NameNode (if deployed on a separate machine). Does the SecondaryNameNode use a direct connection (over some port and protocol), or is it enough for the SecondaryNameNode to have access to the data which the NameNode writes locally on disk?
> >
> > Tomislav
> >
> > On Wed, 2008-10-29 at 09:08 -0400, Jean-Daniel Cryans wrote:
> >> I think a lot of the confusion comes from this thread:
> >> http://www.nabble.com/NameNode-failover-procedure-td11711842.html
> >>
> >> This is particularly because the wiki was updated with wrong information (not maliciously, I'm sure). This information is now gone for good.
> >>
> >> Otis, your solution is pretty much like the one given by Dhruba Borthakur and augmented by Konstantin Shvachko later in the thread, but I never did it myself.
> >>
> >> One thing should be clear, though: the NN is and will remain a SPOF (just like HBase's Master) as long as a distributed manager service (like ZooKeeper) is not plugged into Hadoop to help with failover.
> >>
> >> J-D
> >>
> >> On Wed, Oct 29, 2008 at 2:12 AM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> >>
> >>> Hi,
> >>> So what is the "recipe" for avoiding the NN SPOF using only what comes with Hadoop?
> >>>
> >>> From what I can tell, I think one has to do the following two things:
> >>>
> >>> 1) Configure the primary NN to save the namespace and xa logs to multiple dirs, one of which is actually on a remotely mounted disk, so that the data actually lives on a separate disk on a separate box. This keeps the namespace and xa logs on multiple boxes in case of primary NN hardware failure.
> >>>
> >>> 2) Configure the secondary NN to periodically merge fsimage+edits and create the fsimage checkpoint. This really is a second NN process running on another box. It sounds like this secondary NN has to somehow have access to the fsimage & edits files from the primary NN server.
> >>> http://hadoop.apache.org/core/docs/r0.18.1/hdfs_user_guide.html#Secondary+NameNode
> >>> does not describe the best practice around that, i.e. the recommended way to give the secondary NN access to the primary NN's fsimage and edits files. Should one mount a disk from the primary NN box on the secondary NN box to get access to those files? Or is there a simpler way?
> >>> In any case, this checkpoint is just a merge of the fsimage+edits files and, again, is there in case the box with the primary NN dies. That's, more or less, what's described on
> >>> http://hadoop.apache.org/core/docs/r0.18.1/hdfs_user_guide.html#Secondary+NameNode
> >>>
> >>> Is this sufficient, or are there other things one has to do to eliminate the NN SPOF?
> >>>
> >>> Thanks,
> >>> Otis
> >>> --
> >>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >>>
> >>> ----- Original Message ----
> >>>> From: Jean-Daniel Cryans <[EMAIL PROTECTED]>
> >>>> To: core-user@hadoop.apache.org
> >>>> Sent: Tuesday, October 28, 2008 8:14:44 PM
> >>>> Subject: Re: SecondaryNameNode on separate machine
> >>>>
> >>>> Tomislav,
> >>>>
> >>>> Contrary to popular belief, the secondary namenode does not provide failover; it's only used to do what is described here:
> >>>> http://hadoop.apache.org/core/docs/r0.18.1/hdfs_user_guide.html#Secondary+NameNode
> >>>> So the term "secondary" does not mean "a second one" but is more like "a second part of".
> >>>>
> >>>> J-D
> >>>>
> >>>> On Tue, Oct 28, 2008 at 9:44 AM, Tomislav Poljak wrote:
> >>>>
> >>>>> Hi,
> >>>>> I'm trying to implement NameNode failover (or at least a NameNode local data backup), but it is hard since there is no official documentation. Pages on this subject have been created, but are still empty:
> >>>>>
> >>>>> http://wiki.apache.org/hadoop/NameNodeFailover
> >>>>> http://wiki.apache.org/hadoop/SecondaryNameNode
> >>>>>
> >>>>> I have been browsing the web and the Hadoop mailing list to see how this should be implemented, but I got even more confused. People are asking whether we even need the SecondaryNameNode, etc. (since the NameNode can write its local data to multiple locations, one of which can be a disk mounted from another machine). I think I understand the motivation for the SecondaryNameNode (to create a snapshot of the NameNode data every n seconds/hours), but setting up (deploying and running) the SecondaryNameNode on a different machine than the NameNode is not as trivial as I expected. First I found that if I need to run the SecondaryNameNode on a different machine than the NameNode, I should change the masters file on the NameNode (change localhost to the SecondaryNameNode host) and set some properties in hadoop-site.xml on the SecondaryNameNode (fs.default.name, fs.checkpoint.dir, fs.checkpoint.period, etc.).
> >>>>>
> >>>>> This was enough to start the SecondaryNameNode when starting the NameNode with bin/start-dfs.sh, but it didn't create an image on the SecondaryNameNode. Then I found that I need to set dfs.http.address to the NameNode address (so now I have the NameNode address in both fs.default.name and dfs.http.address).
> >>>>>
> >>>>> Now I get the following exception:
> >>>>>
> >>>>> 2008-10-28 09:18:00,098 ERROR NameNode.Secondary - Exception in doCheckpoint:
> >>>>> 2008-10-28 09:18:00,098 ERROR NameNode.Secondary - java.net.SocketException: Unexpected end of file from server
> >>>>>
> >>>>> My questions are the following:
> >>>>> How do I resolve this problem (this exception)?
> >>>>> Do I need an additional property in the SecondaryNameNode's hadoop-site.xml or the NameNode's hadoop-site.xml?
> >>>>>
> >>>>> How should NameNode failover work, ideally? Is it like this:
> >>>>>
> >>>>> The SecondaryNameNode runs on a separate machine from the NameNode and stores the NameNode's data (fsimage and edits) locally in fs.checkpoint.dir. When the NameNode machine crashes, we start a NameNode on the machine where the SecondaryNameNode was running and set dfs.name.dir to fs.checkpoint.dir. We also need to change how DNS resolves the NameNode hostname (change it from the primary to the secondary).
> >>>>>
> >>>>> Is this correct?
> >>>>>
> >>>>> Tomislav
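Konstantin's advice to keep one hadoop-site.xml and copy it to every machine can be combined with the properties named in this thread: fs.default.name and dfs.http.address of the primary, dfs.name.dir holding several comma-separated directories (Otis's step 1), and fs.checkpoint.dir and fs.checkpoint.period for the secondary. The following is a minimal Python sketch of a generator for such a shared file, not a vetted configuration. Every hostname, port, and path in it is a hypothetical placeholder, and it does not cover the masters file change Tomislav mentions.

```python
import xml.etree.ElementTree as ET

# Hypothetical values for illustration only: hostnames, ports and paths are
# placeholders, not settings recommended anywhere in this thread.
SHARED_SETTINGS = {
    # NameNode RPC endpoint (an hdfs:// URI).
    "fs.default.name": "hdfs://namenode.example.com:9000",
    # NameNode HTTP server; per Konstantin, image/edits transfer goes over HTTP.
    "dfs.http.address": "namenode.example.com:50070",
    # Comma-separated list: the name table is written to every directory,
    # here one local disk plus one hypothetical remotely mounted disk.
    "dfs.name.dir": "/data/hadoop/name,/mnt/remote-nn/name",
    # Local directory where the secondary stores the images it merges.
    "fs.checkpoint.dir": "/data/hadoop/namesecondary",
    # Seconds between checkpoints (3600 = one hour).
    "fs.checkpoint.period": "3600",
}


def build_site_xml(settings):
    """Render a Hadoop-style <configuration> document from a dict of properties."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Print the document; the same output would be copied to every node.
    print(build_site_xml(SHARED_SETTINGS))
```

Generating the file once and copying it unchanged to the NameNode and the secondary keeps fs.default.name and dfs.http.address identical on both machines, which is the point of Konstantin's suggestion.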