Upgrade from 0.19 to 0.20 issue
Hi, I just upgraded from 0.19 to 0.20. Everything seems fine, however the web monitoring tool doesn't work anymore: neither http://mydomain.com:50070/webapps/hdfs/dfshealth.jsp nor http://mydomain.com:50070/dfshealth.jsp. Both give me a 404. The same stands for the job tracking tool. Any idea where to look? -- -MilleBii-
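[Editor's note] A minimal checklist sketch for this kind of 404, assuming a stock 0.20 tarball layout with HADOOP_HOME set (the hostname and paths are placeholders, not from the original report):

  # in 0.20 the UIs are normally served at /dfshealth.jsp (namenode, port 50070)
  # and /jobtracker.jsp (jobtracker, port 50030); see what the daemons answer
  curl -i http://mydomain.com:50070/dfshealth.jsp
  curl -i http://mydomain.com:50030/jobtracker.jsp
  # the pages are served from the webapps/ directory shipped with the release, so make
  # sure the running daemons were started from the 0.20 install that contains it
  ls $HADOOP_HOME/webapps/
  # and check the namenode log for jetty / webapp deployment errors
  grep -i jetty $HADOOP_HOME/logs/hadoop-*-namenode-*.log | tail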
Re: Upgrading namenode/secondary node hardware
I see it is not so obvious and potentially dangerous, so I will do some learning and experimenting first. Thx for the tip.

2011/6/17 Steve Loughran ste...@apache.org
On 16/06/11 14:19, MilleBii wrote: But if my filesystem is up and running fine... do I have to worry at all, or will the copy (ftp transfer) of hdfs be enough?

I'm not going to make any predictions there as to if/when things go wrong.
- you do need to shut down the FS before the move
- you ought to get the edit logs replayed before the move
- you may want to try experimenting with copying the namenode data and bringing up the namenode (without any datanodes connected to it, so it comes up in safe mode), to make sure everything works.
I'd also worry that if you aren't familiar with the edit log, you may need to spend some time learning the subtle details of namenode journalling, replaying, backup and restoration, and what the secondary namenode does. It's easy to bring up a cluster and get overconfident that it works, right up to the moment it stops working. Experiment with your cluster's and team's failure handling before you really need it.

2011/6/16 Steve Loughran ste...@apache.org
On 15/06/11 15:54, MilleBii wrote: Thx. #1 don't understand the edit logs remark.

Well, that's something you need to work on, as it's the key to keeping your cluster working. The edit log is the journal of changes made to a namenode, which gets streamed to HDD and your secondary namenode. After a NN restart, it has to replay all changes since the last checkpoint to get its directory structure up to date. Lose the edit log and you may as well reformat the disks.

-- -MilleBii-
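[Editor's note] A minimal sketch of the dry-run Steve suggests, assuming a 0.20-style install with HADOOP_HOME set and dfs.name.dir at /hdfs/name (both placeholders; use the paths from your own config):

  # stop the cluster cleanly so the edit log gets checkpointed before the copy
  $HADOOP_HOME/bin/stop-all.sh
  # copy the namenode metadata (dfs.name.dir) to the new box, same path
  rsync -a /hdfs/name/ newmaster:/hdfs/name/
  # on the new box, start only the namenode -- no datanodes, so it stays in safe mode
  $HADOOP_HOME/bin/hadoop-daemon.sh start namenode
  # verify the namespace loaded and the directory tree looks right
  $HADOOP_HOME/bin/hadoop dfsadmin -safemode get
  $HADOOP_HOME/bin/hadoop fs -ls /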
Re: Upgrading namenode/secondary node hardware
But if my filesystem is up and running fine... do I have to worry at all, or will the copy (ftp transfer) of hdfs be enough?

2011/6/16 Steve Loughran ste...@apache.org
On 15/06/11 15:54, MilleBii wrote: Thx. #1 don't understand the edit logs remark.

Well, that's something you need to work on, as it's the key to keeping your cluster working. The edit log is the journal of changes made to a namenode, which gets streamed to HDD and your secondary namenode. After a NN restart, it has to replay all changes since the last checkpoint to get its directory structure up to date. Lose the edit log and you may as well reformat the disks.

-- -MilleBii-
Re: Upgrading namenode/secondary node hardware
Thx. #1 don't understand the edit logs remark. #2 good, nice. #3 my provider will give me a server with a different IP, so I will have to change all the /etc/hosts files to point to the new master. But I don't need to change the masters/slaves files indeed.

2011/6/15 Steve Loughran ste...@apache.org
On 14/06/11 22:01, MilleBii wrote: I want/need to upgrade my namenode/secondary node hardware. It actually also acts as one of the datanodes. Could not find any how-to guides. So what is the process to switch from one hardware to the next? 1. For HDFS data: is it just a matter of copying all the hdfs data from the old server to the new server?

Yes, put it in the same place on your HA storage and you may not even need to reconfigure it. If you didn't shut down the filesystem cleanly, you'll need to replay the edit logs.

2. What about the decommissioning procedure of the data node, is it necessary in that case?

You shouldn't need to. This is no different from handling failover of a namenode, which you ought to try from time to time anyway, with two common tactics:
- have ready-to-go replacement servers with the same hostname/IP and shared storage
- have ready-to-go replacement servers with different hostnames, then with your cluster management tools bounce the workers into a new configuration.

3. For MapRed: do I need to change the master in the cluster configuration files?

I'd give the new boxes the same hostnames and IP addresses as before, and nothing else will notice. And I recommend having good cluster management tooling anyway, of course.

-- -MilleBii-
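[Editor's note] A minimal sketch of pushing the updated hosts file to the workers, assuming passwordless ssh from the master, that conf/slaves lists one worker hostname per line, and that the ssh user may sudo on the slaves (all assumptions, adapt to your setup):

  # update /etc/hosts on the master with the new master IP first, then push it out
  for host in $(cat $HADOOP_HOME/conf/slaves); do
    scp /etc/hosts "$host:/tmp/hosts.new"
    ssh "$host" "sudo mv /tmp/hosts.new /etc/hosts"
  done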
Re: Upgrading namenode/secondary node hardware
Do you have a recommendation for good cluster management tooling?

2011/6/15 MilleBii mille...@gmail.com
Thx. #1 don't understand the edit logs remark. #2 good, nice. #3 my provider will give me a server with a different IP, so I will have to change all the /etc/hosts files to point to the new master. But I don't need to change the masters/slaves files indeed.

2011/6/15 Steve Loughran ste...@apache.org
On 14/06/11 22:01, MilleBii wrote: I want/need to upgrade my namenode/secondary node hardware. It actually also acts as one of the datanodes. Could not find any how-to guides. So what is the process to switch from one hardware to the next? 1. For HDFS data: is it just a matter of copying all the hdfs data from the old server to the new server?

Yes, put it in the same place on your HA storage and you may not even need to reconfigure it. If you didn't shut down the filesystem cleanly, you'll need to replay the edit logs.

2. What about the decommissioning procedure of the data node, is it necessary in that case?

You shouldn't need to. This is no different from handling failover of a namenode, which you ought to try from time to time anyway, with two common tactics:
- have ready-to-go replacement servers with the same hostname/IP and shared storage
- have ready-to-go replacement servers with different hostnames, then with your cluster management tools bounce the workers into a new configuration.

3. For MapRed: do I need to change the master in the cluster configuration files?

I'd give the new boxes the same hostnames and IP addresses as before, and nothing else will notice. And I recommend having good cluster management tooling anyway, of course.

-- -MilleBii-
-- -MilleBii-
Upgrading namenode/secondary node hardware
I want/need to upgrade my namenode/secondary node hardware. It actually also acts as one of the datanodes. Could not find any how-to guides. So what is the process to switch from one hardware to the next? 1. For HDFS data: is it just a matter of copying all the hdfs data from the old server to the new server? 2. What about the decommissioning procedure of the data node, is it necessary in that case? 3. For MapRed: do I need to change the master in the cluster configuration files? Any help or pointers welcomed! -- -MilleBii-
Re: Job failing on same map twice no logs
Fixed: the slowness issue was in my Nutch configuration, which I had changed in the meantime. Can anyone help with where to look for potential issues? The logs are desperately empty of errors.

2011/6/3 MilleBii mille...@gmail.com
I have just upgraded my single-node conf with a new one. Seemed to work fine. Did run a balance operation. The first job failed on map63 after 4 attempts. The second job failed on the same map63. Logs are empty; what is strange is that the jobs became slower. In both cases map63 is executed by the master node, which was working before. Suspecting memory leaks, I stopped the cluster and started it again. Fine. Ran a hadoop fsck: fine. A 3rd run is in progress, but it looks even slower now. Any suggestion what to do?

-- -MilleBii-
-- -MilleBii-
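[Editor's note] A sketch of where per-task output usually hides in 0.19/0.20, assuming default log locations under $HADOOP_HOME/logs (the attempt ID below is a made-up example):

  # per-attempt stdout/stderr/syslog, on the node that actually ran map63
  ls $HADOOP_HOME/logs/userlogs/
  # e.g. attempt_201106030000_0001_m_000063_0/{stdout,stderr,syslog}
  # daemon logs for the tasktracker and jobtracker on their respective nodes
  ls $HADOOP_HOME/logs/*tasktracker*.log* $HADOOP_HOME/logs/*jobtracker*.log*
  # the JobTracker web UI (port 50030 by default) also keeps per-attempt failure details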
Re: Job failing on same map twice no logs
This is the best I found. http://www.brics.dk/automaton/doc/index.html?dk/brics/automaton/RegExp.html

2011/6/4 MilleBii mille...@gmail.com
Fixed: the slowness issue was in my Nutch configuration, which I had changed in the meantime. Can anyone help with where to look for potential issues? The logs are desperately empty of errors.

2011/6/3 MilleBii mille...@gmail.com
I have just upgraded my single-node conf with a new one. Seemed to work fine. Did run a balance operation. The first job failed on map63 after 4 attempts. The second job failed on the same map63. Logs are empty; what is strange is that the jobs became slower. In both cases map63 is executed by the master node, which was working before. Suspecting memory leaks, I stopped the cluster and started it again. Fine. Ran a hadoop fsck: fine. A 3rd run is in progress, but it looks even slower now. Any suggestion what to do?

-- -MilleBii-
-- -MilleBii-
-- -MilleBii-
Job failing on same map twice no logs
I have just upgraded my single-node conf with a new one. Seemed to work fine. Did run a balance operation. The first job failed on map63 after 4 attempts. The second job failed on the same map63. Logs are empty; what is strange is that the jobs became slower. In both cases map63 is executed by the master node, which was working before. Suspecting memory leaks, I stopped the cluster and started it again. Fine. Ran a hadoop fsck: fine. A 3rd run is in progress, but it looks even slower now. Any suggestion what to do? -- -MilleBii-
Re: Adding first datanode isn't working
Firewall of the Ubuntu box.

2011/6/2 jagaran das jagaran_...@yahoo.co.in
ufw

From: MilleBii mille...@gmail.com
To: common-user@hadoop.apache.org
Sent: Wed, 1 June, 2011 3:37:23 PM
Subject: Re: Adding first datanode isn't working

OK, found my issue. Turned off ufw and it sees the datanode. So I need to fix my ufw setup.

2011/6/1 MilleBii mille...@gmail.com
Thx, already did that, so I can ssh passphraselessly master to master and master to slave1. Same as before: datanode and tasktracker are starting up/shutting down fine on slave1.

2011/6/1 jagaran das jagaran_...@yahoo.co.in
Check whether passwordless SSH is working or not. Regards, Jagaran

From: MilleBii mille...@gmail.com
To: common-user@hadoop.apache.org
Sent: Wed, 1 June, 2011 12:28:54 PM
Subject: Adding first datanode isn't working

Newbie on hadoop clusters. I have set up my two-node conf as described by M. G. Noll http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/ The data node has datanode and tasktracker running (the jps command shows them), which means start-dfs.sh and start-mapred.sh worked fine. I can also shut them down gracefully. However, in the web UI I only see one node for the DFS: Live Nodes: 1, Dead Nodes: 0. Same thing on the MapRed web interface. Datanode logs on the slave are just empty. Did check the network settings; both nodes have access to each other on the relevant ports. Did make sure the namespaceIDs are the same (https://issues.apache.org/jira/browse/HDFS-107). I did try to put data into the DFS; it worked, but no data seemed to arrive on the slave datanode. Also tried a small MapRed job; only the master node has actually been working, but that could be because there is only data on the master. Right?

-- -MilleBii-
-- -MilleBii-
-- -MilleBii-
-- -MilleBii-
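[Editor's note] For the record, a sketch of the kind of ufw rules involved, assuming slave1 sits at 192.168.0.2 and the IPC port numbers from the Noll tutorial plus the stock datanode ports (all placeholders; match them to your core-site/mapred-site values):

  # on the master, let the slave reach the namenode and jobtracker IPC ports
  sudo ufw allow from 192.168.0.2 to any port 54310   # fs.default.name
  sudo ufw allow from 192.168.0.2 to any port 54311   # mapred.job.tracker
  # on each node, let the other node reach the datanode ports
  sudo ufw allow from 192.168.0.2 to any port 50010   # datanode data transfer
  sudo ufw allow from 192.168.0.2 to any port 50020   # datanode IPC
  sudo ufw status verbose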
Adding first datanode isn't working
Newbie on hadoop clusters. I have set up my two-node conf as described by M. G. Noll http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/ The data node has datanode and tasktracker running (the jps command shows them), which means start-dfs.sh and start-mapred.sh worked fine. I can also shut them down gracefully. However, in the web UI I only see one node for the DFS: Live Nodes: 1, Dead Nodes: 0. Same thing on the MapRed web interface. Datanode logs on the slave are just empty. Did check the network settings; both nodes have access to each other on the relevant ports. Did make sure the namespaceIDs are the same (https://issues.apache.org/jira/browse/HDFS-107). I did try to put data into the DFS; it worked, but no data seemed to arrive on the slave datanode. Also tried a small MapRed job; only the master node has actually been working, but that could be because there is only data on the master. Right? -- -MilleBii-
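[Editor's note] A quick sketch of the namespaceID check mentioned above, assuming the Noll-tutorial layout with hadoop.tmp.dir at /app/hadoop/tmp (a placeholder; use the dfs.name.dir / dfs.data.dir paths from your own config):

  # on the master (namenode):
  grep namespaceID /app/hadoop/tmp/dfs/name/current/VERSION
  # on the slave (datanode):
  grep namespaceID /app/hadoop/tmp/dfs/data/current/VERSION
  # the two values must match; if they don't, see HDFS-107 for the workaround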
Re: Adding first datanode isn't working
Thx, already did that, so I can ssh passphraselessly master to master and master to slave1. Same as before: datanode and tasktracker are starting up/shutting down fine on slave1.

2011/6/1 jagaran das jagaran_...@yahoo.co.in
Check whether passwordless SSH is working or not. Regards, Jagaran

From: MilleBii mille...@gmail.com
To: common-user@hadoop.apache.org
Sent: Wed, 1 June, 2011 12:28:54 PM
Subject: Adding first datanode isn't working

Newbie on hadoop clusters. I have set up my two-node conf as described by M. G. Noll http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/ The data node has datanode and tasktracker running (the jps command shows them), which means start-dfs.sh and start-mapred.sh worked fine. I can also shut them down gracefully. However, in the web UI I only see one node for the DFS: Live Nodes: 1, Dead Nodes: 0. Same thing on the MapRed web interface. Datanode logs on the slave are just empty. Did check the network settings; both nodes have access to each other on the relevant ports. Did make sure the namespaceIDs are the same (https://issues.apache.org/jira/browse/HDFS-107). I did try to put data into the DFS; it worked, but no data seemed to arrive on the slave datanode. Also tried a small MapRed job; only the master node has actually been working, but that could be because there is only data on the master. Right?

-- -MilleBii-
-- -MilleBii-
Re: Adding first datanode isn't working
OK, found my issue. Turned off ufw and it sees the datanode. So I need to fix my ufw setup.

2011/6/1 MilleBii mille...@gmail.com
Thx, already did that, so I can ssh passphraselessly master to master and master to slave1. Same as before: datanode and tasktracker are starting up/shutting down fine on slave1.

2011/6/1 jagaran das jagaran_...@yahoo.co.in
Check whether passwordless SSH is working or not. Regards, Jagaran

From: MilleBii mille...@gmail.com
To: common-user@hadoop.apache.org
Sent: Wed, 1 June, 2011 12:28:54 PM
Subject: Adding first datanode isn't working

Newbie on hadoop clusters. I have set up my two-node conf as described by M. G. Noll http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/ The data node has datanode and tasktracker running (the jps command shows them), which means start-dfs.sh and start-mapred.sh worked fine. I can also shut them down gracefully. However, in the web UI I only see one node for the DFS: Live Nodes: 1, Dead Nodes: 0. Same thing on the MapRed web interface. Datanode logs on the slave are just empty. Did check the network settings; both nodes have access to each other on the relevant ports. Did make sure the namespaceIDs are the same (https://issues.apache.org/jira/browse/HDFS-107). I did try to put data into the DFS; it worked, but no data seemed to arrive on the slave datanode. Also tried a small MapRed job; only the master node has actually been working, but that could be because there is only data on the master. Right?

-- -MilleBii-
-- -MilleBii-
-- -MilleBii-
Re: Could not obtain block
Increased the ulimit to 64000... same problem. stop/start-all... same problem, but on a different block, which of course is present, so it looks like there is nothing wrong with the actual data in the hdfs. I use the Nutch default Hadoop 0.19.x; anything related?

2010/1/30 Ken Goodhope kengoodh...@gmail.com
Could not obtain block errors are often caused by running out of available file handles. You can confirm this by going to the shell and entering ulimit -n. If it says 1024, the default, then you will want to increase it to about 64,000.

On Fri, Jan 29, 2010 at 4:06 PM, MilleBii mille...@gmail.com wrote:
X-POST with Nutch mailing list. HEEELP !!! Kind of stuck on this one. I backed up my hdfs data, reformatted the hdfs, put the data back, tried to merge my segments together, and it explodes again. Exception in thread Lucene Merge Thread #0 org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException: Could not obtain block: blk_4670839132945043210_1585 file=/user/nutch/crawl/indexed-segments/20100113003609/part-0/_ym.frq at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:309) If I go into the hdfs/data directory I DO find the faulty block. Could it be a synchro problem in the segment merger code?

2010/1/29 MilleBii mille...@gmail.com
I'm looking for some help. I'm a Nutch user; everything was working fine, but now I get the following error when indexing. I have a single-node pseudo-distributed setup. Some people on the Nutch list indicated to me that it could be full, so I removed many things and hdfs is far from full. This file directory was perfectly OK the day before. I did a hadoop fsck... the report says healthy. What can I do? Is it safe to do a Linux fsck just in case? Caused by: java.io.IOException: Could not obtain block: blk_8851198258748412820_9031 file=/user/nutch/crawl/indexed-segments/20100111233601/part-0/_103.frq

-- -MilleBii-
-- -MilleBii-

-- Ken Goodhope Cell: 425-750-5616 362 Bellevue Way NE Apt N415 Bellevue WA, 98004

-- -MilleBii-
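[Editor's note] One thing worth checking at this point (a sketch, assuming the daemons run on Linux and show up in jps): the limit the already-running datanode actually has, since ulimit -n in a fresh shell only changes that shell, not daemons that were started earlier.

  # read the effective open-files limit of the running DataNode process
  DN_PID=$(jps | awk '/DataNode/ {print $1}')
  grep -i 'open files' /proc/$DN_PID/limits
  # if this still says 1024, restart the daemons from a session that has the new limit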
Re: Could not obtain block
Ken, FIXED!!! SO MUCH THANKS. A command-prompt ulimit wasn't enough; one needs to hard-set it and reboot, as explained here: http://posidev.com/blog/2009/06/04/set-ulimit-parameters-on-ubuntu/

2010/1/30 MilleBii mille...@gmail.com
Increased the ulimit to 64000... same problem. stop/start-all... same problem, but on a different block, which of course is present, so it looks like there is nothing wrong with the actual data in the hdfs. I use the Nutch default Hadoop 0.19.x; anything related?

2010/1/30 Ken Goodhope kengoodh...@gmail.com
Could not obtain block errors are often caused by running out of available file handles. You can confirm this by going to the shell and entering ulimit -n. If it says 1024, the default, then you will want to increase it to about 64,000.

On Fri, Jan 29, 2010 at 4:06 PM, MilleBii mille...@gmail.com wrote:
X-POST with Nutch mailing list. HEEELP !!! Kind of stuck on this one. I backed up my hdfs data, reformatted the hdfs, put the data back, tried to merge my segments together, and it explodes again. Exception in thread Lucene Merge Thread #0 org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException: Could not obtain block: blk_4670839132945043210_1585 file=/user/nutch/crawl/indexed-segments/20100113003609/part-0/_ym.frq at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:309) If I go into the hdfs/data directory I DO find the faulty block. Could it be a synchro problem in the segment merger code?

2010/1/29 MilleBii mille...@gmail.com
I'm looking for some help. I'm a Nutch user; everything was working fine, but now I get the following error when indexing. I have a single-node pseudo-distributed setup. Some people on the Nutch list indicated to me that it could be full, so I removed many things and hdfs is far from full. This file directory was perfectly OK the day before. I did a hadoop fsck... the report says healthy. What can I do? Is it safe to do a Linux fsck just in case? Caused by: java.io.IOException: Could not obtain block: blk_8851198258748412820_9031 file=/user/nutch/crawl/indexed-segments/20100111233601/part-0/_103.frq

-- -MilleBii-
-- -MilleBii-

-- Ken Goodhope Cell: 425-750-5616 362 Bellevue Way NE Apt N415 Bellevue WA, 98004

-- -MilleBii-
-- -MilleBii-
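[Editor's note] For reference, a sketch of the hard-set approach from the linked post, assuming the Hadoop daemons run as a user named hadoop (a placeholder):

  # /etc/security/limits.conf
  hadoop  soft  nofile  64000
  hadoop  hard  nofile  64000
  # make sure PAM applies it to sessions, e.g. /etc/pam.d/common-session contains:
  # session required pam_limits.so
  # then reboot (or at least restart the Hadoop daemons) so they pick up the new limit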
Re: Could not obtain block
X-POST with Nutch mailing list. HEEELP !!! Kind of stuck on this one. I backed up my hdfs data, reformatted the hdfs, put the data back, tried to merge my segments together, and it explodes again. Exception in thread Lucene Merge Thread #0 org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException: Could not obtain block: blk_4670839132945043210_1585 file=/user/nutch/crawl/indexed-segments/20100113003609/part-0/_ym.frq at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:309) If I go into the hdfs/data directory I DO find the faulty block. Could it be a synchro problem in the segment merger code?

2010/1/29 MilleBii mille...@gmail.com
I'm looking for some help. I'm a Nutch user; everything was working fine, but now I get the following error when indexing. I have a single-node pseudo-distributed setup. Some people on the Nutch list indicated to me that it could be full, so I removed many things and hdfs is far from full. This file directory was perfectly OK the day before. I did a hadoop fsck... the report says healthy. What can I do? Is it safe to do a Linux fsck just in case? Caused by: java.io.IOException: Could not obtain block: blk_8851198258748412820_9031 file=/user/nutch/crawl/indexed-segments/20100111233601/part-0/_103.frq

-- -MilleBii-
-- -MilleBii-