detecting stalled daemons?
Quick question for the hadoop / linux masters out there: I recently observed a stalled tasktracker daemon on our production cluster, and was wondering if there were common tests to detect failures so that administration tools (e.g. monit) can automatically restart the daemon. The particular observed symptoms were:
- the node was dropped by the jobtracker
- information in /proc listed the tasktracker process as sleeping, not zombie
- the web interface (port 50060) was unresponsive, though telnet did connect
- no error information in the hadoop logs -- they simply were no longer being updated
I certainly cannot be the first person to encounter this - anyone have a neat and tidy solution they could share? (And yes, we will eventually go down the nagios / ganglia / cloudera desktop path, but we're waiting until we're running CDH2.) Many thanks, -James Warren
hadoop startup problem
Hello everyone, I have a problem with Hadoop startup. Every time I try to start Hadoop, the namenode does not start, and when I try to stop the namenode it gives an error: no namenode to stop. I tried formatting the namenode and it works well, but now I have data in Hadoop, and formatting the namenode will erase all of it. What can I do? Thanks in advance, asmaa
Error: INFO ipc.Client: Retrying connect to server: /192.168.100.11:8020. Already tried 0 time(s).
Hi, I am new to Hadoop. I just configured it based on the documentation. While running the example program wordcount.java, I am getting errors. When I give the command $ /bin/hadoop dfs -mkdir santhosh , I get the following:
09/10/08 13:30:12 INFO ipc.Client: Retrying connect to server: /192.168.100.11:8020. Already tried 0 time(s).
09/10/08 13:30:13 INFO ipc.Client: Retrying connect to server: /192.168.100.11:8020. Already tried 1 time(s).
09/10/08 13:30:14 INFO ipc.Client: Retrying connect to server: /192.168.100.11:8020. Already tried 2 time(s).
09/10/08 13:30:15 INFO ipc.Client: Retrying connect to server: /192.168.100.11:8020. Already tried 3 time(s).
09/10/08 13:30:16 INFO ipc.Client: Retrying connect to server: /192.168.100.11:8020. Already tried 4 time(s).
09/10/08 13:30:17 INFO ipc.Client: Retrying connect to server: /192.168.100.11:8020. Already tried 5 time(s).
09/10/08 13:30:18 INFO ipc.Client: Retrying connect to server: /192.168.100.11:8020. Already tried 6 time(s).
09/10/08 13:30:19 INFO ipc.Client: Retrying connect to server: /192.168.100.11:8020. Already tried 7 time(s).
09/10/08 13:30:20 INFO ipc.Client: Retrying connect to server: /192.168.100.11:8020. Already tried 8 time(s).
09/10/08 13:30:21 INFO ipc.Client: Retrying connect to server: /192.168.100.11:8020. Already tried 9 time(s).
Bad connection to FS. command aborted.
I am able to ssh to the server without any password, and I didn't get any errors while formatting HDFS with the command $ bin/hadoop namenode -format . Please help me - what should I do now? Thank you. -- Gandham Santhosh
Recommended file-system for DataNode
Hi. I'm using the stock Ext3 as the most tested one, but I wonder: has someone ever tried, or is even using in production these days, another file system, like JFS, XFS or maybe even Ext4? I'm exploring ways to boost the performance of DataNodes, and this seems like one of the possible avenues. Thanks for any info!
Re: hadoop startup problem
It sounds like the name node is crashing on startup. What kind of errors are there in the name node log?
On Thu, Oct 8, 2009 at 4:01 AM, asmaa.atef sw_as...@hotmail.com wrote: Every time I try to start Hadoop, the namenode does not start ... I tried formatting the namenode and it works well, but now I have data in Hadoop, and formatting the namenode will erase all of it.
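To find those errors, check the namenode's log under the Hadoop log directory. A generic sketch, assuming the stock log file naming (adjust the path for your install):

tail -n 100 $HADOOP_HOME/logs/hadoop-*-namenode-*.log
grep -i 'ERROR\|FATAL' $HADOOP_HOME/logs/hadoop-*-namenode-*.log | tail -n 20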
Re: Recommended file-system for DataNode
I have used xfs pretty extensively, and it seemed to be somewhat faster than ext3. The only trouble we had related to some machines running the PAE 32-bit kernels, where the filesystems would lock up. That is an obscure use case, however. Running JBOD with your dfs.data.dir listing a directory on each device speeds things up, as does keeping other users off of the disks/machine.
On Thu, Oct 8, 2009 at 4:37 AM, Stas Oskin stas.os...@gmail.com wrote: I'm using the stock Ext3 as the most tested one, but I wonder: has someone ever tried another file system, like JFS, XFS or maybe even Ext4? ...
-- Pro Hadoop, a book to guide you from beginner to hadoop mastery, http://www.amazon.com/dp/1430219424?tag=jewlerymall www.prohadoopbook.com a community for Hadoop Professionals
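For anyone setting up the JBOD layout Jason describes, it just means one dfs.data.dir entry per physical disk. A minimal hdfs-site.xml sketch (the /data/N mount points are hypothetical):

<property>
  <name>dfs.data.dir</name>
  <!-- one directory per physical disk; hypothetical mount points -->
  <value>/data/1/dfs/data,/data/2/dfs/data,/data/3/dfs/data</value>
</property>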
Re: Recommended file-system for DataNode
Hi. Thanks for the info; the question is whether XFS performance justifies switching from the more common Ext3. JBOD is a great approach indeed. Regards.
2009/10/8 Jason Venner jason.had...@gmail.com I have used xfs pretty extensively, and it seemed to be somewhat faster than ext3 ... Running JBOD with your dfs.data.dir listing a directory on each device speeds things up, as does keeping other users off of the disks/machine.
Re: Recommended file-system for DataNode
As an aside, there's a short article comparing the two in the latest edition of Linux Journal. It was hardly scientific, but the main points were:
- XFS is faster than ext3, especially for large files
- XFS is currently unsupported on Red Hat Enterprise, but apparently will be soon.
On Thu, Oct 8, 2009 at 12:12 PM, Stas Oskin stas.os...@gmail.com wrote: Thanks for the info; the question is whether XFS performance justifies switching from the more common Ext3.
-- Tom Wheeler http://www.tomwheeler.com/
Re: Recommended file-system for DataNode
Busy datanodes become bound by the metadata lookup times for the directory and inode entries required to open a block. Anything that optimizes that will help substantially. We are thinking of playing with btrfs, using a small SSD for our file system metadata and the spinning disks for the block storage.
On Thu, Oct 8, 2009 at 10:26 AM, Tom Wheeler tomwh...@gmail.com wrote: As an aside, there's a short article comparing the two in the latest edition of Linux Journal ... XFS is faster than ext3, especially for large files.
Re: Recommended file-system for DataNode
Check out the bottom of this page: http://wiki.apache.org/hadoop/DiskSetup noatime is all we've done in our environment. I haven't found it worth the time to optimize further, since we're CPU bound in most of our jobs. -paul
On Thu, Oct 8, 2009 at 3:26 PM, Stas Oskin stas.os...@gmail.com wrote: What about JFS, any idea how well it compares to XFS? From what I read, JFS is considered more stable than XFS but less performant, so I wonder if this is true. Also, Ext4 is around the corner and was recently accepted into the kernel, so I wonder if anyone knows about this one.
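For reference, noatime is set per mount in /etc/fstab, as the DiskSetup page above describes. A hypothetical entry for a single data disk (device, mount point, and filesystem are examples only):

# hypothetical data disk; the mount options are the relevant part
/dev/sdb1  /data/1  ext3  defaults,noatime,nodiratime  0  2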
Re: Recommended file-system for DataNode
I've used XFS on Silicon Graphics machines and JFS on AIX systems -- both were quite fast and extremely reliable, though this long predates my use of Hadoop. To your question, I recently came across a blog that compares performance of several Linux filesystems: http://log.amitshah.net/2009/04/re-comparing-file-systems.html I'd consider his results anecdotal unless the tests reflect the actual workload of a datanode, but since he's made the code available, you could probably adapt it yourself to get a better measure.
On Thu, Oct 8, 2009 at 2:26 PM, Stas Oskin stas.os...@gmail.com wrote: What about JFS, any idea how well it compares to XFS? ...
-- Tom Wheeler http://www.tomwheeler.com/
Re: Recommended file-system for DataNode
On Thu, Oct 8, 2009 at 4:00 PM, Jason Venner jason.had...@gmail.com wrote: noatime is absolutely essential; I forgot to mention it because it is automatic now for me. I have a fun story about atime: I have some Solaris machines with ZFS file systems, and I was doing a find on a 6-level hashed directory tree with 25 leaf nodes. The find on a cold, idle file system was running slowly, and the machine was writing at 5-10MB/sec. Solaris lets you toggle atime at runtime; when I turned it off, the writes went to 0 and the find sped up drastically. This is very representative of a datanode with many blocks.
The good news is that you are not stuck with the file system you pick. Assuming you use the normal replication level of 3, you can pull out a datanode, format its disks with any FS you want, and then stick it back into the cluster. Hadoop should not care, after all. Not suggesting this...but you could theoretically run each node with a different file system, look at the performance, and say THIS is the one for me.
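A sketch of the node swap Edward describes, using the stock decommission mechanism (hostnames and paths are hypothetical, and dfs.hosts.exclude must already point at the exclude file):

echo "datanode1.example.com" >> conf/dfs.exclude   # hypothetical hostname
bin/hadoop dfsadmin -refreshNodes                  # wait for decommission to finish
bin/hadoop-daemon.sh stop datanode                 # on the datanode itself
mkfs.xfs /dev/sdb1 && mount /data/1                # reformat and remount each data disk
# remove the host from conf/dfs.exclude, then:
bin/hadoop dfsadmin -refreshNodes
bin/hadoop-daemon.sh start datanode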
University of Maryland: cloud computing assistant professor position
FYI---the University of Maryland is seeking an assistant professor in cloud computing. See job description below.

College of Information Studies, Maryland's iSchool
University of Maryland, College Park
Assistant Professor in Cloud Computing

The recently-formed Cloud Computing Center in the College of Information Studies (Maryland's iSchool) at the University of Maryland, College Park invites highly qualified individuals to apply for a tenure-track faculty position in the area of cloud computing applications. Expertise is sought in information processing systems, architectures, and applications. Areas of interest include, but are not limited to, information retrieval, natural language processing, machine learning, databases, data mining, data analytics, network analysis, and multimedia processing. An emphasis on scalability and parallel/distributed approaches to processing is especially desirable, as is expertise in working with Web and other large datasets.

The Cloud Computing Center (http://ccc.umiacs.umd.edu/) explores the many aspects of cloud computing as it impacts technology, people, and society. It is an interdisciplinary research center in the iSchool that draws faculty from computer science, engineering, business, and linguistics.

Qualifications. Ph.D. in an appropriate field; demonstrated record of research accomplishments or demonstrated potential for research excellence; demonstrated record of attracting and managing outside funding or demonstrated potential for attracting and managing outside funding; and evidence of effective and innovative teaching or demonstrated potential for excellence in teaching.

Application Submission. Applicants will be reviewed on an ongoing basis. For best consideration, submit applications prior to November 15, 2009. Send application materials, including a CV, a research statement, a teaching statement, and a cover letter that clearly describes your primary area(s) of expertise and the specific contributions that you would make to the College and the Cloud Computing Center, by email to ischoolcloudsea...@umd.edu. All applications will be acknowledged electronically within 3 business days; however, it is the responsibility of the applicant to make sure that the materials were received. Inquiries can be directed to Jimmy Lin, search committee chair, at jimmy...@umd.edu.

The University of Maryland is an equal opportunity, affirmative action employer. Minorities and women are encouraged to apply.
retrieving sequenceFile Position of Key in mapper task
Hi, I need to get the position of the key being processed in a mapper task. My input file is a sequence file. I tried the Context, but the best I could get was the input split position and the file name. My other option is to start recording the position in the key value while generating the sequence file, but that would mean rewriting all the files I already have :( Any thoughts? ishwar
Re: Recommended file-system for DataNode
Hi Jason. Btrfs is cool; I read that it has 10% better performance than any other FS coming next to it. Can you post the results of any of your findings here? Regards.
2009/10/8 Jason Venner jason.had...@gmail.com Busy datanodes become bound by the metadata lookup times for the directory and inode entries required to open a block. Anything that optimizes that will help substantially. We are thinking of playing with btrfs, using a small SSD for our file system metadata and the spinning disks for the block storage.
Re: Recommended file-system for DataNode
On Thu, Oct 8, 2009 at 9:15 PM, Stas Oskin stas.os...@gmail.com wrote: Hi. I heard about this option before, but never actually tried it. There is also another option, called relatime, which is described as being more compatible than noatime. Can anyone comment on this? Regards.
Relatime is like a noatime that gets updated only periodically, not on every read. Hadoop does not use atime, so there is no benefit to relatime. Go with 'noatime nodiratime', although I think nodiratime is a subset of noatime. FYI, almost nothing really uses atime; I heard mutt does, and older versions of vim might have. But I turned it off and never had an issue.
Re: detecting stalled daemons?
Hi James, This doesn't quite answer your original question, but if you want to help track down these kinds of bugs, you should grab a stack trace next time this happens. You can do this using jstack from the command line, by visiting /stacks on the HTTP interface, or by sending the process a SIGQUIT (kill -QUIT pid). If you go the SIGQUIT route, the stack dump will show up in that daemon's stdout log (logs/hadoop-out). Oftentimes the stack trace will be enough for the developers to track down a deadlock, or it may point to some sort of configuration issue on your machine. -Todd
On Wed, Oct 7, 2009 at 11:19 PM, james warren ja...@rockyou.com wrote: I recently observed a stalled tasktracker daemon on our production cluster, and was wondering if there were common tests to detect failures so that administration tools (e.g. monit) can automatically restart the daemon ...
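A quick sketch of capturing that trace from a stalled tasktracker (the jps-based pid lookup is just one of several ways to find the process):

TT_PID=$(jps | awk '/TaskTracker/ {print $1}')   # find the tasktracker JVM
jstack $TT_PID > /tmp/tasktracker-stack.txt      # dump threads to a file
kill -QUIT $TT_PID                               # or dump into the daemon's stdout log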
Re: detecting stalled daemons?
On Thu, Oct 8, 2009 at 9:20 PM, Todd Lipcon t...@cloudera.com wrote: This doesn't quite answer your original question, but if you want to help track down these kinds of bugs, you should grab a stack trace next time this happens ...
James, I am using nagios to run a web_check on each of the components' web interfaces. http://www.jointhegrid.com/svn/hadoop-cacti-jtg/trunk/check_scripts/0_19/ I know there is a Jira open to add lifecycle methods to each Hadoop component that can be polled for progress. I don't know the number offhand. Edward
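Along the same lines, a minimal monit sketch that restarts a tasktracker whose web port stops answering; the pidfile path and daemon-script locations are assumptions for illustration:

check process tasktracker with pidfile /var/run/hadoop/hadoop-tasktracker.pid
  start program = "/usr/lib/hadoop/bin/hadoop-daemon.sh start tasktracker"
  stop program = "/usr/lib/hadoop/bin/hadoop-daemon.sh stop tasktracker"
  # mirrors the observed failure mode: port 50060 accepts connects but HTTP hangs
  if failed port 50060 protocol http for 2 cycles then restart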
Re: Error: INFO ipc.Client: Retrying connect to server: /192.168.100.11:8020. Already tried 0 time(s).
Hi Santosh, Check whether all the datanodes are up and running, using the command 'bin/hadoop dfsadmin -report'.
On Thu, Oct 8, 2009 at 4:24 AM, santosh gandham santhosh...@gmail.com wrote: I am new to Hadoop. I just configured it based on the documentation. When I gave the command $ /bin/hadoop dfs -mkdir santhosh , I got: Retrying connect to server: /192.168.100.11:8020. Already tried 0 time(s). ... Bad connection to FS. command aborted.
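Since the client here cannot reach the namenode at all, dfsadmin -report will likely fail the same way, so it is also worth confirming on the master that the namenode actually started and is listening on port 8020. A generic sketch (the log path follows the stock naming; adjust for your install):

jps                                       # should list NameNode
netstat -tlnp | grep 8020                 # is the RPC port bound, and on which address?
tail -n 50 logs/hadoop-*-namenode-*.log   # look for bind or storage-directory errors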
Re: retrieving sequenceFile Position of Key in mapper task
Hi Ishwar, You can implement a custom MapRunner and retrieve the position from the reader before calling your map function. Be aware, though, that for block-compressed files the position returned represents the block start position, not the individual record position. Ahad.
On Thu, Oct 8, 2009 at 4:23 PM, ishwar ramani rvmish...@gmail.com wrote: I need to get the position of the key being processed in a mapper task. My input file is a sequence file ...
Re: retrieving sequenceFile Position of Key in mapper task
Oops, memory fails me. To correct my previous statement: for block-compressed files, getPosition reflects the position in the input stream of the NEXT compressed block of data, so you have to watch for the change in position after reading the key/value to capture a block transition. Ahad.
On Thu, Oct 8, 2009 at 10:22 PM, Ahad Rana a...@commoncrawl.org wrote: You can implement a custom MapRunner and retrieve the position from the reader before calling your map function ...
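Putting Ahad's two messages together, a rough sketch of the MapRunner approach against the old mapred API (the Text key/value types, the class name, and the block-transition handling are illustrative assumptions, not tested code):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapRunnable;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

public class PositionMapRunner implements MapRunnable<Text, Text, Text, LongWritable> {

  public void configure(JobConf job) { }

  public void run(RecordReader<Text, Text> input,
                  OutputCollector<Text, LongWritable> output,
                  Reporter reporter) throws IOException {
    Text key = input.createKey();
    Text value = input.createValue();
    long pos = input.getPos(); // stream position before reading the first record
    while (input.next(key, value)) {
      // For block-compressed SequenceFiles, getPos() points at the NEXT
      // compressed block, so it only changes at block boundaries; a change
      // relative to the previous value signals a block transition.
      output.collect(key, new LongWritable(pos));
      pos = input.getPos();
    }
  }
}

Enable it with job.setMapRunnerClass(PositionMapRunner.class) in the JobConf.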