Re: java.io.IOException: Could not get block locations. Aborting...
We got the same problem as you when using MultipleOutputFormat, on both hadoop 0.18 and 0.19. On hadoop 0.18, increasing the xceivers count did not fix the problem. On hadoop 0.19, however, we found many error messages in the datanode log complaining that xceiverCount exceeded the limit of concurrent xcievers. After we increased the xceivers count, the problem was gone. I guess you are using hadoop 0.18. Please try 0.19. Good luck.

Scott Whitecross wrote:
I tried modifying the settings, and I'm still running into the same issue. I increased the xceivers count (fs.datanode.max.xcievers) in the hadoop-site.xml file. I also checked to make sure the file handle limits were increased, but they were fairly high to begin with. I don't think I'm dealing with anything out of the ordinary either. I'm processing three large 'log' files, totaling around 5 GB, and producing around 8000 output files after some data processing, probably 6 or 7 GB in total. In the past, I've produced far fewer files, and that has been fine. When I change the process to output to just a few files, there is no problem. Are there any other limits I could be hitting? Is HDFS creating a substantial number of temp files as well?

On Feb 9, 2009, at 8:11 PM, Bryan Duxbury wrote:
Correct. +1 to Jason's more unix file handles suggestion. That's a must-have. -Bryan

On Feb 9, 2009, at 3:09 PM, Scott Whitecross wrote:
This would be an addition to the hadoop-site.xml file, to up dfs.datanode.max.xcievers? Thanks.

On Feb 9, 2009, at 5:54 PM, Bryan Duxbury wrote:
Small files are bad for hadoop. You should avoid keeping a lot of small files if possible. That said, that error is something I've seen a lot. It usually happens when the number of xcievers hasn't been adjusted upwards from the default of 256. We run with 8000 xcievers, and that seems to solve our problems. I think that if you have a lot of open files, this problem happens a lot faster. -Bryan

On Feb 9, 2009, at 1:01 PM, Scott Whitecross wrote:
Hi all - I've been running into this error the past few days:

java.io.IOException: Could not get block locations. Aborting...
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2143)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1400(DFSClient.java:1735)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1889)

It seems to be related to writing too many files to HDFS. I have a class extending org.apache.hadoop.mapred.lib.MultipleOutputFormat, and if I output to a few file names, everything works. However, if I output to thousands of small files, the above error occurs. I'm having trouble isolating the problem, as it unfortunately doesn't occur in the debugger. Is this a memory issue, or is there an upper limit to the number of files HDFS can hold? Any settings to adjust? Thanks.
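As an illustration of the log check described above, here is the sort of datanode log line to grep for. The sample line below is fabricated and the exact log format varies between Hadoop versions, so treat this as a sketch rather than the literal output you will see:

```shell
# Fabricated sample of a datanode log line (real format varies by version)
logline='ERROR org.apache.hadoop.dfs.DataNode: DataXceiver: java.io.IOException: xceiverCount 257 exceeds the limit of concurrent xcievers 256'

# Count lines that report the xceiver limit being exceeded;
# on a real cluster you would grep the datanode .log files instead
hits=$(printf '%s\n' "$logline" | grep -c 'exceeds the limit of concurrent xcievers')
echo "hits=$hits"
```

If this pattern turns up in your datanode logs, raising the xceiver limit (as discussed later in the thread) is the usual remedy.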
Re: java.io.IOException: Could not get block locations. Aborting...
You will have to increase the per-user file descriptor limit. On most linux machines the file /etc/security/limits.conf controls this on a per-user basis. You will need to log in to a fresh shell session after making the changes to see them; any login shells started before the change, and processes started by those shells, will have the old limits. If you are opening vast numbers of files you may also need to increase the system-wide limit, via the fs.file-max parameter in the /etc/sysctl.conf file. This page seems to be a decent reference: http://bloggerdigest.blogspot.com/2006/10/file-descriptors-vs-linux-performance.html

On Mon, Feb 9, 2009 at 1:01 PM, Scott Whitecross sc...@dataxu.com wrote: [quoted text trimmed]
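For concreteness, a sketch of the checks and edits involved. The user name "hadoop" and the limit values below are made-up examples, not recommendations; pick values appropriate for your cluster:

```shell
# Show the current per-process open-file limit for this shell session
ulimit -n

# Hypothetical lines to append to /etc/security/limits.conf
# (the user name "hadoop" and the value 16384 are example choices):
#   hadoop  soft  nofile  16384
#   hadoop  hard  nofile  16384
#
# Hypothetical system-wide ceiling in /etc/sysctl.conf:
#   fs.file-max = 200000
# then apply it without a reboot with: sysctl -p

# As noted above, only shells started AFTER the limits.conf change
# (and processes they spawn) will see the new limit
current=$(ulimit -n)
echo "current nofile limit: $current"
```

Run `ulimit -n` again from a fresh login shell after editing limits.conf to confirm the new value took effect.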
Re: java.io.IOException: Could not get block locations. Aborting...
Small files are bad for hadoop. You should avoid keeping a lot of small files if possible. That said, that error is something I've seen a lot. It usually happens when the number of xcievers hasn't been adjusted upwards from the default of 256. We run with 8000 xcievers, and that seems to solve our problems. I think that if you have a lot of open files, this problem happens a lot faster.

-Bryan

On Feb 9, 2009, at 1:01 PM, Scott Whitecross wrote: [quoted text trimmed]
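In hadoop-site.xml, that suggestion looks like the fragment below. The value 8000 is the one Bryan reports using, not a universal recommendation, and note that in these Hadoop versions the property name really is spelled "xcievers":

```xml
<!-- hadoop-site.xml fragment: raise the datanode's concurrent
     xceiver limit from the default of 256. The value 8000 is the
     figure Bryan mentions; tune it to your own workload. -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>8000</value>
</property>
```

The datanodes need to be restarted for the new limit to take effect.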
Re: java.io.IOException: Could not get block locations. Aborting...
This would be an addition to the hadoop-site.xml file, to up dfs.datanode.max.xcievers? Thanks.

On Feb 9, 2009, at 5:54 PM, Bryan Duxbury wrote: [quoted text trimmed]
Re: java.io.IOException: Could not get block locations. Aborting...
Correct. +1 to Jason's more unix file handles suggestion. That's a must-have.

-Bryan

On Feb 9, 2009, at 3:09 PM, Scott Whitecross wrote: [quoted text trimmed]
Re: java.io.IOException: Could not get block locations. Aborting...
On Feb 9, 2009, at 7:50 PM, jason hadoop wrote:
The other issue you may run into with many files in your HDFS is that you may end up with more than a few hundred thousand blocks on each of your datanodes. At present this can lead to instability, due to the way the periodic block reports to the namenode are handled. The more blocks per datanode, the larger the risk of congestion collapse in your hdfs.

Of course, if you stay below, say, 500k, you don't have much of a risk of congestion. In our experience, 500k blocks or less is going to be fine with decent hardware. Between 500k and 750k, you will hit a wall somewhere depending on your hardware. Good luck getting anything above 750k. The recommendation is that you keep this number as low as possible -- and explore the limits of your system and hardware in testing before you discover them in production :)

Brian

On Mon, Feb 9, 2009 at 5:11 PM, Bryan Duxbury br...@rapleaf.com wrote: [quoted text trimmed]
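To see how block counts add up, here is a back-of-the-envelope count. All numbers are assumptions for illustration: 8000 small output files as in Scott's job, plus a made-up replication factor of 3 and a made-up 16-node cluster:

```shell
# Back-of-the-envelope block count (all numbers are assumptions):
# 8000 output files, each small enough to occupy exactly one block,
# replication factor 3, spread over 16 datanodes.
files=8000
replication=3
datanodes=16

# Each small file costs at least one block per replica,
# no matter how far below the block size it is.
block_replicas=$((files * replication))
per_node=$((block_replicas / datanodes))
echo "total block replicas: $block_replicas"
echo "replicas per datanode: $per_node"
```

Under these assumptions a single such job adds about 1500 block replicas per datanode, well below the ~500k danger zone Brian describes; the concern is cumulative, across many jobs retaining their output.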
Re: java.io.IOException: Could not get block locations. Aborting...
I tried modifying the settings, and I'm still running into the same issue. I increased the xceivers count (fs.datanode.max.xcievers) in the hadoop-site.xml file. I also checked to make sure the file handle limits were increased, but they were fairly high to begin with. I don't think I'm dealing with anything out of the ordinary either. I'm processing three large 'log' files, totaling around 5 GB, and producing around 8000 output files after some data processing, probably 6 or 7 GB in total. In the past, I've produced far fewer files, and that has been fine. When I change the process to output to just a few files, there is no problem. Are there any other limits I could be hitting? Is HDFS creating a substantial number of temp files as well?

On Feb 9, 2009, at 8:11 PM, Bryan Duxbury wrote: [quoted text trimmed]
Re: java.io.IOException: Could not get block locations. Aborting...
Piotr Kozikowski wrote:
The "Could not get block locations" exception was gone after a Hadoop restart, but further down the road our job failed again. I checked the logs for "discarding calls" and found a bunch of them, plus the namenode appeared to have a load spike at that time, so it seems it is getting overloaded. Do you know how we can prevent this? Currently the namenode machine is not running anything but the namenode and the secondary namenode, and the cluster only has 16 machines.

Typically the secondary namenode should be running on a different machine. It requires the same amount of resources as a namenode: if you have, say, an 8G RAM node and your namenode is taking 2-3G of space, your secondary namenode would also take up that much space. To cross-check, look at the secondary namenode logs to see whether the load spike on the namenode occurred while the secondary namenode was checkpointing.

Thanks,
Lohit

- Original Message From: Piotr Kozikowski [EMAIL PROTECTED] To: core-user@hadoop.apache.org Sent: Monday, August 11, 2008 12:20:05 PM Subject: Re: java.io.IOException: Could not get block locations. Aborting... [quoted text trimmed]
Re: java.io.IOException: Could not get block locations. Aborting...
Hi again,

The "Could not get block locations" exception was gone after a Hadoop restart, but further down the road our job failed again. I checked the logs for "discarding calls" and found a bunch of them, plus the namenode appeared to have a load spike at that time, so it seems it is getting overloaded. Do you know how we can prevent this? Currently the namenode machine is not running anything but the namenode and the secondary namenode, and the cluster only has 16 machines.

Thank you,
Piotr

On Fri, 2008-08-08 at 17:31 -0700, Dhruba Borthakur wrote: [quoted text trimmed]
Re: java.io.IOException: Could not get block locations. Aborting...
Piotr Kozikowski wrote:
Hi there: We would like to know the most likely causes of this sort of error:

Exception closing file /data1/hdfs/tmp/person_url_pipe_59984_3405334/_temporary/_task_200807311534_0055_m_22_0/part-00022
java.io.IOException: Could not get block locations. Aborting...
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2080)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1300(DFSClient.java:1702)
    at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1818)

Our map-reduce job does not fail completely, but over 50% of the map tasks fail with this same error. We recently migrated our cluster from 0.16.4 to 0.17.1; previously we didn't have this problem using the same input data in a similar map-reduce job. Thank you, Piotr

When I see this, it's because the filesystem isn't completely up: there are no locations for a specific file, meaning the client isn't getting back the names of any datanodes holding the data from the name nodes. I've got a patch in JIRA that prints out the name of the file in question, as that could be useful.
Re: java.io.IOException: Could not get block locations. Aborting...
I have come across the same issue, also with hadoop 0.17.1. It would be interesting if someone could say what the cause of the issue is.

Alex

2008/8/8 Steve Loughran [EMAIL PROTECTED] wrote: [quoted text trimmed]
Re: java.io.IOException: Could not get block locations. Aborting...
It is possible that your namenode is overloaded and is not able to respond to RPC requests from clients. Please check the namenode logs to see if you see lines of the form "discarding calls".

dhruba

On Fri, Aug 8, 2008 at 3:41 AM, Alexander Aristov [EMAIL PROTECTED] wrote: [quoted text trimmed]
Re: java.io.IOException: Could not get block locations. Aborting...
Thank you for the reply. Apparently whatever it was is now gone after a hadoop restart, but I'll keep that in mind should it happen again.

Piotr

On Fri, 2008-08-08 at 17:31 -0700, Dhruba Borthakur wrote: [quoted text trimmed]