Re: Performance / cluster scaling question
FYI, just ran a 50-node cluster using one of the new kernels for Fedora with all nodes forced onto the same 'availability zone', and there were no timeouts or failed writes.

On Mar 27, 2008, at 4:16 PM, Chris K Wensel wrote:

If it's any consolation, I'm seeing similar behaviors on 0.16.0 when running on EC2 when I push the number of nodes in the cluster past 40.

On Mar 24, 2008, at 6:31 AM, André Martin wrote:

Thanks for the clarification, dhruba :-) Anyway, what can cause those other exceptions, such as "Could not get block locations" and "DataXceiver: java.io.EOFException"? Can anyone give me a little more insight into those exceptions? And does anyone have a similar workload (frequent writes and deletions of small files), and what could cause the performance degradation (see first post)? I think HDFS should be able to handle two million and more files/blocks... Also, I observed that some of my datanodes do not "heartbeat" to the namenode for several seconds (up to 400 :-() from time to time - when I check those specific datanodes and do a "top", I see the "du" command running, which seems to have gotten stuck. Thanks and Happy Easter :-) Cu on the 'net, Bye - bye, André

dhruba Borthakur wrote:

The namenode lazily instructs a Datanode to delete blocks. As a response to every heartbeat from a Datanode, the Namenode instructs it to delete a maximum of 100 blocks. Typically, the heartbeat periodicity is 3 seconds. The heartbeat thread in the Datanode deletes the block files synchronously before it can send the next heartbeat. That's the reason a small number (like 100) was chosen. If you have 8 datanodes, your system will probably delete about 800 blocks every 3 seconds.
Thanks, dhruba

-----Original Message-----
From: André Martin [mailto:[EMAIL PROTECTED]]
Sent: Friday, March 21, 2008 3:06 PM
To: core-user@hadoop.apache.org
Subject: Re: Performance / cluster scaling question

After waiting a few hours (without having any load), the block number and "DFS Used" space seem to go down... My question is: is the hardware simply too weak/slow to send the block deletion requests to the datanodes in a timely manner, or do those "crappy" HDDs simply cause the delay? I noticed that it can take up to 40 minutes to delete ~400,000 files at once manually using "rm -r"... Actually, my main concern is why the performance, i.e. the throughput, goes down - any ideas?

Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/
http://www.cascading.org/
Re: Reduce Hangs
On Fri, Mar 28, 2008 at 12:31 AM, 朱盛凯 <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I met this problem in my cluster before, so I think I can share some of my
> experience with you, though it may not apply in your case.
>
> The job in my cluster always hung at 16% of reduce. It occurred because the
> reduce task could not fetch the map output from other nodes.
>
> In my case, two factors could cause this failure of communication between
> two task trackers.
>
> One is that the firewall blocks the trackers from communicating. I solved this
> by disabling the firewall.
> The other factor is that trackers refer to other nodes by host name only,
> not by IP address. I solved this by editing the file /etc/hosts
> with mappings from hostname to IP address for all nodes in the cluster.
>
> I hope my experience will be helpful for you.

I met this problem for the same reason too. Try adding the host names of all your nodes to all your /etc/hosts files.

> On 3/27/08, Natarajan, Senthil <[EMAIL PROTECTED]> wrote:
> >
> > Hi,
> > I have a small Hadoop cluster, one master and three slaves.
> > When I try the example wordcount on one of our log files (size ~350 MB),
> > map runs fine but reduce always hangs (sometimes around 19%, 60%, ...); after a
> > very long time it finishes.
> > I am seeing this error:
> > Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out
> > In the log I am seeing this:
> > INFO org.apache.hadoop.mapred.TaskTracker:
> > task_200803261535_0001_r_00_0 0.1834% reduce > copy (11 of 20 at 0.02 MB/s)
> >
> > Do you know what might be the problem?
> > Thanks,
> > Senthil

--
[EMAIL PROTECTED]
Institute of Computing Technology, Chinese Academy of Sciences, Beijing.
Re: Problems with 0.16.1
never mind. i found archive.apache.org.

On Mar 27, 2008, at 6:00 PM, Chris K Wensel wrote:

> is this still true? if so, can we restore 0.16.0 at http://www.apache.org/dist/hadoop/core/ ?
> just realized it was missing as I am rebuilding my ec2 ami's.

On Mar 17, 2008, at 8:43 PM, Owen O'Malley wrote:

> We believe that there has been a regression in release 0.16.1 with respect to the reliability of HDFS. In particular, tasks get stuck talking to the data nodes when they are under load. It seems to be HADOOP-3033. We are currently testing it further. In the meantime, I would suggest holding off on 0.16.1. -- Owen

Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/
http://www.cascading.org/
Re: Problems with 0.16.1
is this still true? if so, can we restore 0.16.0 at http://www.apache.org/dist/hadoop/core/ ? just realized it was missing as I am rebuilding my ec2 ami's.

On Mar 17, 2008, at 8:43 PM, Owen O'Malley wrote:

> We believe that there has been a regression in release 0.16.1 with respect to the reliability of HDFS. In particular, tasks get stuck talking to the data nodes when they are under load. It seems to be HADOOP-3033. We are currently testing it further. In the meantime, I would suggest holding off on 0.16.1. -- Owen

Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/
http://www.cascading.org/
Re: Performance / cluster scaling question
If it's any consolation, I'm seeing similar behaviors on 0.16.0 when running on EC2 when I push the number of nodes in the cluster past 40.

On Mar 24, 2008, at 6:31 AM, André Martin wrote:

Thanks for the clarification, dhruba :-) Anyway, what can cause those other exceptions, such as "Could not get block locations" and "DataXceiver: java.io.EOFException"? Can anyone give me a little more insight into those exceptions? And does anyone have a similar workload (frequent writes and deletions of small files), and what could cause the performance degradation (see first post)? I think HDFS should be able to handle two million and more files/blocks... Also, I observed that some of my datanodes do not "heartbeat" to the namenode for several seconds (up to 400 :-() from time to time - when I check those specific datanodes and do a "top", I see the "du" command running, which seems to have gotten stuck. Thanks and Happy Easter :-) Cu on the 'net, Bye - bye, André

dhruba Borthakur wrote:

The namenode lazily instructs a Datanode to delete blocks. As a response to every heartbeat from a Datanode, the Namenode instructs it to delete a maximum of 100 blocks. Typically, the heartbeat periodicity is 3 seconds. The heartbeat thread in the Datanode deletes the block files synchronously before it can send the next heartbeat. That's the reason a small number (like 100) was chosen. If you have 8 datanodes, your system will probably delete about 800 blocks every 3 seconds.

Thanks, dhruba

-----Original Message-----
From: André Martin [mailto:[EMAIL PROTECTED]]
Sent: Friday, March 21, 2008 3:06 PM
To: core-user@hadoop.apache.org
Subject: Re: Performance / cluster scaling question

After waiting a few hours (without having any load), the block number and "DFS Used" space seem to go down...
My question is: is the hardware simply too weak/slow to send the block deletion requests to the datanodes in a timely manner, or do those "crappy" HDDs simply cause the delay? I noticed that it can take up to 40 minutes to delete ~400,000 files at once manually using "rm -r"... Actually, my main concern is why the performance, i.e. the throughput, goes down - any ideas?

Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/
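dhruba's numbers above lend themselves to a quick back-of-the-envelope calculation. The sketch below is plain Python, not Hadoop code; the constants are taken straight from his description (at most 100 block deletions per datanode per heartbeat, one heartbeat every 3 seconds), and it estimates how long draining a large block backlog would take in the best case:

```python
# Back-of-the-envelope deletion throughput for 0.16-era HDFS defaults.
BLOCKS_PER_HEARTBEAT = 100   # max deletions the namenode hands out per heartbeat
HEARTBEAT_SECONDS = 3        # typical heartbeat period

def deletion_rate(datanodes):
    """Upper bound on cluster-wide block deletions per second."""
    return datanodes * BLOCKS_PER_HEARTBEAT / HEARTBEAT_SECONDS

def seconds_to_delete(blocks, datanodes):
    """Best-case wall-clock time to lazily delete `blocks` blocks."""
    return blocks / deletion_rate(datanodes)

if __name__ == "__main__":
    # 8 datanodes: about 800 blocks every 3 seconds, as stated in the thread.
    print(deletion_rate(8) * HEARTBEAT_SECONDS)    # 800.0
    # Clearing a backlog of 2 million blocks on 8 datanodes takes on the
    # order of hours, which matches "DFS Used" draining slowly.
    print(seconds_to_delete(2_000_000, 8) / 3600)  # ~2.08 hours
```

This suggests the slow drain is mostly the throttling policy, not (only) slow disks.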
RE: nfs mount hadoop-site?
Shouldn't be any issues with this. It's similar to the setup we have right now. -Xavier -Original Message- From: Colin Freas Sent: Thursday, March 27, 2008 12:05 PM To: Hadoop Subject: nfs mount hadoop-site? are there any issues with having the hadoop-site.xml in .../conf placed on an nfs mounted dir that all my nodes have access to? -colin
Re: Using HDFS as native storage
We looked seriously at HDFS and MogileFS, and considered (and instantly rejected) a *bunch* of others. HDFS was eliminated based on the number of files, lack of HA, and lack of reference implementations serving large-scale web sites directly from it. Mogile had HA (using crude tools), reference implementations, and could obviously be scaled to pretty large levels. It also had an implementation that was simple enough for our guys to fix the defects. At that point, the time window for further evaluation closed and we had to make a decision.

On 3/27/08 11:30 AM, "Robert Krüger" <[EMAIL PROTECTED]> wrote:

> might be off-topic but how would you compare GlusterFS to HDFS and
> MogileFS for such an application? Did you look at that at all and
> decide against it?
>
> Ted Dunning wrote:
>> We evaluated several options for just this problem and eventually settled on
>> MogileFS. That said, Mogile needed several weeks of work to get it ready
>> for prime time. It will work pretty well for modest-sized collections, but
>> for our stuff (many hundreds of millions of files, approaching a PB of
>> storage), it just wasn't ready. The fixes had to do with sharding the name
>> database across many mySQL instances and improving the handling of storage
>> system up-state.
>>
>> On 3/27/08 2:13 AM, "Robert Krüger" <[EMAIL PROTECTED]> wrote:
>>
>>> Hi,
>>>
>>> we're looking for options for creating a scalable storage solution based
>>> on commodity hardware for media files (space-wise dominated by video files
>>> of a few hundred MB, but also storing up to a few million smaller files
>>> such as thumbnails). The system will start with a few TB and should be
>>> able to scale to about a PB.
>>>
>>> Is anyone using HDFS as native storage for critical files, or is it just
>>> common to use HDFS for large amounts of temporary, more or less
>>> non-critical data? What would be the trade-offs to decide whether to use
>>> HDFS or something like GlusterFS? Note that we're currently not planning
>>> on using MapReduce.
>>>
>>> Thanks in advance,
>>>
>>> Robert
[Map/Reduce][HDFS]
Hello, I'm working on a large amount of logs, and I've noticed that the distribution of the data over the network (./hadoop dfs -put input input) takes a lot of time. Let's say that my data is already distributed among the nodes; is there any way to tell Hadoop to use the already existing distribution? Thanks -- Jean-Pierre <[EMAIL PROTECTED]>
nfs mount hadoop-site?
are there any issues with having the hadoop-site.xml in .../conf placed on an nfs mounted dir that all my nodes have access to? -colin
Re: Do multiple small files share one block?
Robert Krüger wrote: this seems like an FAQ but I didn't explicitly see it in the docs: Is the minimum size a file occupies on HDFS controlled by the block size, i.e. would using the default block size of 64 MB result in consumption of 64 MB if I stored a file of 1 byte?

No. The last block in a file is only as long as it needs to be.

Doug
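Doug's answer can be illustrated with a little arithmetic. This is a plain-Python sketch, not HDFS code; it assumes the usual default replication factor of 3 and ignores the namenode metadata cost that each block object still incurs:

```python
# HDFS blocks are not pre-allocated: a file's last (or only) block
# consumes only as many raw bytes as the file actually contains.
import math

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB default block size

def num_blocks(file_size):
    """Number of block objects a file of `file_size` bytes needs."""
    return max(1, math.ceil(file_size / BLOCK_SIZE))

def disk_usage(file_size, replication=3):
    """Raw datanode bytes consumed; the last block is truncated, not padded."""
    return file_size * replication

if __name__ == "__main__":
    print(num_blocks(1))                  # 1 block object, but...
    print(disk_usage(1))                  # ...only 3 bytes on disk, not 3 x 64 MB
    print(num_blocks(200 * 1024 * 1024))  # a 200 MB file spans 4 blocks
```

The real cost of many tiny files is therefore namenode memory (one block object each), not wasted datanode disk.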
Do multiple small files share one block?
Hi, this seems like an FAQ but I didn't explicitly see it in the docs: Is the minimum size a file occupies on HDFS controlled by the block size, i.e. would using the default block size of 64 MB result in consumption of 64 MB if I stored a file of 1 byte? I would assume yes, based on the fact that HDFS was primarily developed for the distribution of large files, and sentences like "Internally, a file is split into one or more blocks and these blocks are stored in a set of Datanodes" seem to imply it, but I can't find a definite answer. If so, and the default block size is 64 MB, it feels like abusing HDFS to set the block size to 10k just because a few hundred thousand files of ours are only a few K in size, but I should probably run some benchmarks. Thanks in advance, Robert
Re: Using HDFS as native storage
might be off-topic but how would you compare GlusterFS to HDFS and MogileFS for such an application? Did you look at that at all and decide against it?

Ted Dunning wrote:

> We evaluated several options for just this problem and eventually settled on MogileFS. That said, Mogile needed several weeks of work to get it ready for prime time. It will work pretty well for modest-sized collections, but for our stuff (many hundreds of millions of files, approaching a PB of storage), it just wasn't ready. The fixes had to do with sharding the name database across many mySQL instances and improving the handling of storage system up-state.

On 3/27/08 2:13 AM, "Robert Krüger" <[EMAIL PROTECTED]> wrote:

> Hi,
>
> we're looking for options for creating a scalable storage solution based on commodity hardware for media files (space-wise dominated by video files of a few hundred MB, but also storing up to a few million smaller files such as thumbnails). The system will start with a few TB and should be able to scale to about a PB.
>
> Is anyone using HDFS as native storage for critical files, or is it just common to use HDFS for large amounts of temporary, more or less non-critical data? What would be the trade-offs to decide whether to use HDFS or something like GlusterFS? Note that we're currently not planning on using MapReduce.
>
> Thanks in advance,
>
> Robert

--
(-) Robert Krüger
(-) SIGNAL 7 Gesellschaft für Informationstechnologie mbH
(-) Landwehrstraße 4 - 64293 Darmstadt
(-) Tel: +49 (0) 6151 969 96 11, Fax: +49 (0) 6151 969 96 29
(-) [EMAIL PROTECTED], www.signal7.de
(-) Amtsgericht Darmstadt, HRB 6833
(-) Geschäftsführer: Robert Krüger, Frank Peters, Jochen Strunk
Re: Using HDFS as native storage
We evaluated several options for just this problem and eventually settled on MogileFS. That said, Mogile needed several weeks of work to get it ready for prime time. It will work pretty well for modest-sized collections, but for our stuff (many hundreds of millions of files, approaching a PB of storage), it just wasn't ready. The fixes had to do with sharding the name database across many mySQL instances and improving the handling of storage system up-state.

On 3/27/08 2:13 AM, "Robert Krüger" <[EMAIL PROTECTED]> wrote:

> Hi,
>
> we're looking for options for creating a scalable storage solution based
> on commodity hardware for media files (space-wise dominated by video files
> of a few hundred MB, but also storing up to a few million smaller files
> such as thumbnails). The system will start with a few TB and should be
> able to scale to about a PB.
>
> Is anyone using HDFS as native storage for critical files, or is it just
> common to use HDFS for large amounts of temporary, more or less
> non-critical data? What would be the trade-offs to decide whether to use
> HDFS or something like GlusterFS? Note that we're currently not planning
> on using MapReduce.
>
> Thanks in advance,
>
> Robert
Re: Append data in hdfs_write
Yes. The present work-arounds for this are pretty complicated.

Option 1) You can write small files relatively frequently, and every time you write some number of them, you can concatenate them and delete them. These concatenations can receive the same treatment. If managed carefully in conjunction with a safe status update mechanism like ZooKeeper, you can have a pretty robust system that reflects new data with fairly low latency (on the order of seconds behind).

Option 2) You can accumulate data in a non-HDFS location until it is big enough to push to HDFS. This can be done in conjunction with option 1. The danger is that you run the risk of losing data if the accumulator fails before burping data to HDFS. This is very commonly used for log files that are consolidated at the hourly level and transferred to HDFS.

On 3/27/08 12:02 AM, "Raghavendra K" <[EMAIL PROTECTED]> wrote:

> Hi,
> Thanks for the reply.
> Does this mean that once I close a file, I can open it only for reading?
> And if I reopen the same file to write any data, then the old data will be
> lost, and again it's as good as a new file being created with the same name?
>
> On Thu, Mar 27, 2008 at 12:23 PM, dhruba Borthakur <[EMAIL PROTECTED]>
> wrote:
>
>> HDFS files, once closed, cannot be reopened for writing. See HADOOP-1700
>> for more details.
>>
>> Thanks,
>> dhruba
>>
>> -----Original Message-----
>> From: Raghavendra K [mailto:[EMAIL PROTECTED]]
>> Sent: Wednesday, March 26, 2008 11:29 PM
>> To: core-user@hadoop.apache.org
>> Subject: Append data in hdfs_write
>>
>> Hi,
>> I am using hdfsWrite to write data to a file.
>> Whenever I close the file and reopen it for writing, it starts writing
>> from position 0 (rewriting the old data).
>> Is there any way to append data to a file using hdfsWrite?
>> I cannot use hdfsTell because it works only when the file is opened in
>> RDONLY mode, and I don't know the number of bytes written to the file
>> previously.
>> Please throw some light on it.
>>
>> --
>> Regards,
>> Raghavendra K
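Option 1 above can be sketched roughly as follows. This is plain Python against the local filesystem as a stand-in; a real version would go through the HDFS client API, the ZooKeeper coordination Ted mentions is omitted, and the length-prefix framing format here is made up purely for illustration:

```python
# Roll many small files into one container file, then delete the originals.
# Records are framed as: name length, payload length, name, payload.
import os
import struct

def concatenate(small_paths, container_path):
    """Append each small file into `container_path` with simple framing
    so the individual records can be recovered later."""
    with open(container_path, "ab") as out:
        for path in small_paths:
            with open(path, "rb") as f:
                data = f.read()
            name = path.encode("utf-8")
            out.write(struct.pack(">II", len(name), len(data)))
            out.write(name)
            out.write(data)
    # Delete the originals only after the container write succeeded.
    for path in small_paths:
        os.remove(path)

def read_records(container_path):
    """Yield (name, payload) pairs back out of the container file."""
    with open(container_path, "rb") as f:
        while header := f.read(8):
            name_len, data_len = struct.unpack(">II", header)
            name = f.read(name_len).decode("utf-8")
            yield name, f.read(data_len)
```

Containers produced this way can themselves be concatenated in a second pass, exactly as the message describes.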
Re: Using HDFS as native storage
I can't offer any insights into other clustering FS solutions, but I think it's a very safe bet to say that Google relies entirely on GFS for their long-term storage. Granted, they almost certainly make offline backups of business-critical data, but I would assume that everything related to GMail, Google Code, Picasa, Google Docs, etc. is stored in, and only in, one or more massive GFS clusters. Take, for instance, their pride in the fact that they claim to have lost only one 64MB block in the history of their modern infrastructure (that is, since 2004). Look at it another way: how would you back up petabytes of data? When you've got multiple data centers consisting of thousands of nodes, and every data block is replicated on at least three machines, what's the point of backups? Again, I'm no expert; I'm just basing this on everything I've read and watched about Google. Hopefully others will have more enlightened comments. n
Re: Reduce Hangs
Hi, I met this problem in my cluster before, so I think I can share some of my experience with you, though it may not apply in your case.

The job in my cluster always hung at 16% of reduce. It occurred because the reduce task could not fetch the map output from other nodes.

In my case, two factors could cause this failure of communication between two task trackers.

One is that the firewall blocks the trackers from communicating. I solved this by disabling the firewall. The other factor is that trackers refer to other nodes by host name only, not by IP address. I solved this by editing the file /etc/hosts with mappings from hostname to IP address for all nodes in the cluster.

I hope my experience will be helpful for you.

On 3/27/08, Natarajan, Senthil <[EMAIL PROTECTED]> wrote:
>
> Hi,
> I have a small Hadoop cluster, one master and three slaves.
> When I try the example wordcount on one of our log files (size ~350 MB),
> map runs fine but reduce always hangs (sometimes around 19%, 60%, ...); after a
> very long time it finishes.
> I am seeing this error:
> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out
> In the log I am seeing this:
> INFO org.apache.hadoop.mapred.TaskTracker:
> task_200803261535_0001_r_00_0 0.1834% reduce > copy (11 of 20 at 0.02 MB/s)
>
> Do you know what might be the problem?
> Thanks,
> Senthil
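For what it's worth, the /etc/hosts fix described above might look something like this on every node (the hostnames and addresses here are made up for illustration; use your cluster's real ones):

```
# /etc/hosts on every node: map each tracker's hostname to its real IP,
# so trackers that advertise themselves by hostname stay reachable.
192.168.1.10   master
192.168.1.11   slave1
192.168.1.12   slave2
192.168.1.13   slave3
```

The same file should be identical on all nodes, and the names must match what each tasktracker reports as its hostname.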
Re: Reduce Hangs
On Thu, 27 Mar 2008, Natarajan, Senthil wrote:

> Hi,
> I have a small Hadoop cluster, one master and three slaves.
> When I try the example wordcount on one of our log files (size ~350 MB),
> map runs fine but reduce always hangs (sometimes around 19%, 60%, ...); after a
> very long time it finishes.
> I am seeing this error:
> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out

This error occurs when the reducer fails to fetch map-task output from 5 unique map tasks. Before considering an attempt as failed, the reducer tries to fetch the map output 7 times in 5 minutes (default config). In case of job failure, check the following:
1. Is this problem common to all the reducers?
2. Are the map tasks the same across all the reducers for which the failure is reported?
3. Is there at least one map task whose output is successfully fetched?
If the job becomes successful, then there might be some problem with the reducer.

Amar

> In the log I am seeing this:
> INFO org.apache.hadoop.mapred.TaskTracker: task_200803261535_0001_r_00_0
> 0.1834% reduce > copy (11 of 20 at 0.02 MB/s)
>
> Do you know what might be the problem?
> Thanks,
> Senthil
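Amar's bail-out rule can be modeled in a few lines. This is a sketch of the described behavior, not Hadoop's actual shuffle code; the constants 7 and 5 are taken from his explanation:

```python
# Each map output is retried up to 7 times; the reduce attempt gives up
# once fetches from 5 *unique* map tasks have been exhausted, producing
# "Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out".
MAX_FETCH_RETRIES = 7
MAX_FAILED_UNIQUE_FETCHES = 5

def reduce_bails_out(failed_attempts_per_map):
    """`failed_attempts_per_map` maps a map-task id to how many of its
    fetch attempts failed; returns True if the reducer would bail out."""
    exhausted = [m for m, fails in failed_attempts_per_map.items()
                 if fails >= MAX_FETCH_RETRIES]
    return len(exhausted) >= MAX_FAILED_UNIQUE_FETCHES

if __name__ == "__main__":
    # 4 completely unreachable maps is survivable; a 5th triggers the error.
    print(reduce_bails_out({m: 7 for m in range(4)}))  # False
    print(reduce_bails_out({m: 7 for m in range(5)}))  # True
```

The practical takeaway: one flaky node can fail many unique fetches at once, so connectivity problems (firewalls, hostname resolution) trip this threshold quickly.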
Reduce Hangs
Hi, I have a small Hadoop cluster, one master and three slaves. When I try the example wordcount on one of our log files (size ~350 MB), map runs fine but reduce always hangs (sometimes around 19%, 60%, ...); after a very long time it finishes. I am seeing this error: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out. In the log I am seeing this: INFO org.apache.hadoop.mapred.TaskTracker: task_200803261535_0001_r_00_0 0.1834% reduce > copy (11 of 20 at 0.02 MB/s) Do you know what might be the problem? Thanks, Senthil
Re: complete documentation of hadoop-site.xml
This is a good place to start: http://wiki.apache.org/hadoop/GettingStartedWithHadoop

Check the articles which describe how to set up a single-node and a two-node cluster: http://wiki.apache.org/hadoop/HadoopArticles
I followed them and it worked! ;) The two-node cluster installation also scales to more nodes - just add the hostnames to the "slaves" file. But first you should make it run on a single host (SSH-related setup).

A full description of the configuration properties is provided with the Hadoop installation: $HADOOP_HOME/conf/hadoop-default.xml
But you shouldn't look at this file until you've made it work with 1 and 2 nodes.

On 27/03/2008, John Menzer <[EMAIL PROTECTED]> wrote:
>
> Hello,
>
> where can I find a complete list of all possible configuration properties
> (e.g. in file hadoop-site.xml)?
> I am experiencing lots of bind errors in my log files when trying to run
> start-dfs.sh, start-mapred.sh, start-all.sh!
>
> That's why I think I have to change some port settings. But I don't know
> which bind error refers to which port setting, and I don't even know which
> settings are relevant at all.
>
> Why can't I find complete documentation or at least a list of these
> properties???
>
> Thank you for any help!
>
> John
>
> --
> View this message in context:
> http://www.nabble.com/complete-documentation-of-hadoop-site.xml-tp16323391p16323391.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
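As a small illustration, overrides in hadoop-site.xml use the same property format as hadoop-default.xml. Something like the following sets the two port-bearing addresses most relevant to bind errors (the host and port values are only examples; check the property list in your own hadoop-default.xml):

```xml
<!-- hadoop-site.xml: values here override hadoop-default.xml -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>master:9000</value>   <!-- namenode address; example value -->
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>   <!-- jobtracker address; example value -->
  </property>
</configuration>
```

A "bind error" on startup usually means one of these addresses points at a port that is already in use, or at a hostname the daemon's machine does not own.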
How to handle bind errors?
I am experiencing bind errors in my log files when trying to start up the Hadoop cluster. However, the log files do not really show me which address/port pair is responsible for the error. I tried some property settings in hadoop-site.xml, but I don't actually know which setting is responsible for which daemon. I am using Hadoop 0.16.1. Is there any systematic way of eliminating port errors? -- View this message in context: http://www.nabble.com/How-to-handle-bind-errors--tp16323398p16323398.html Sent from the Hadoop core-user mailing list archive at Nabble.com.
complete documentation of hadoop-site.xml
Hello, where can I find a complete list of all possible configuration properties (e.g. in file hadoop-site.xml)? I am experiencing lots of bind errors in my log files when trying to run start-dfs.sh, start-mapred.sh, start-all.sh! That's why I think I have to change some port settings. But I don't know which bind error refers to which port setting, and I don't even know which settings are relevant at all. Why can't I find complete documentation or at least a list of these properties??? Thank you for any help! John -- View this message in context: http://www.nabble.com/complete-documentation-of-hadoop-site.xml-tp16323391p16323391.html Sent from the Hadoop core-user mailing list archive at Nabble.com.
Using HDFS as native storage
Hi, we're looking for options for creating a scalable storage solution based on commodity hardware for media files (space-wise dominated by video files of a few hundred MB, but also storing up to a few million smaller files such as thumbnails). The system will start with a few TB and should be able to scale to about a PB. Is anyone using HDFS as native storage for critical files, or is it just common to use HDFS for large amounts of temporary, more or less non-critical data? What would be the trade-offs to decide whether to use HDFS or something like GlusterFS? Note that we're currently not planning on using MapReduce. Thanks in advance, Robert
Re: Append data in hdfs_write
Hi, thanks for the reply. Does this mean that once I close a file, I can open it only for reading? And if I reopen the same file to write any data, then the old data will be lost, and again it's as good as a new file being created with the same name?

On Thu, Mar 27, 2008 at 12:23 PM, dhruba Borthakur <[EMAIL PROTECTED]> wrote:

> HDFS files, once closed, cannot be reopened for writing. See HADOOP-1700
> for more details.
>
> Thanks,
> dhruba
>
> -----Original Message-----
> From: Raghavendra K [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, March 26, 2008 11:29 PM
> To: core-user@hadoop.apache.org
> Subject: Append data in hdfs_write
>
> Hi,
> I am using hdfsWrite to write data to a file.
> Whenever I close the file and reopen it for writing, it starts writing
> from position 0 (rewriting the old data).
> Is there any way to append data to a file using hdfsWrite?
> I cannot use hdfsTell because it works only when the file is opened in
> RDONLY mode, and I don't know the number of bytes written to the file
> previously.
> Please throw some light on it.
>
> --
> Regards,
> Raghavendra K

--
Regards,
Raghavendra K