On 3/9/2011 6:27 AM, Evert Lammerts wrote:
We're seeing a lot of IOExceptions coming from HDFS during a job that does nothing
but untar 100 files (1 per Mapper; sizes vary between 5GB and 80GB) from HDFS
to HDFS. The DataNodes are also throwing exceptions that I think are related
(see the stack traces below).
I don't think this job should be able to overload the system... I realize that a
lot of data needs to go over the wire, but HDFS should still remain responsive.
Any ideas / help would be much appreciated!
Some details:
* Hadoop 0.20.2 (CDH3b4)
* 5 node cluster plus 1 node for JT/NN (Sun Thumpers)
* 4 cores/node, 4GB RAM/core
* CentOS 5.5
Job output:
java.io.IOException: java.io.IOException: Could not obtain block:
blk_-3695352030358969086_130839
file=/user/emeij/icwsm-data-test/01-26-SOCIAL_MEDIA.tar.gz
What is the output of:
bin/hadoop dfsadmin -report
And what is the output of:
bin/hadoop fsck /user/emeij/icwsm-data-test/
at ilps.DownloadICWSM$UntarMapper.map(DownloadICWSM.java:449)
at ilps.DownloadICWSM$UntarMapper.map(DownloadICWSM.java:1)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:390)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:324)
at org.apache.hadoop.mapred.Child$4.run(Child.java:240)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
at org.apache.hadoop.mapred.Child.main(Child.java:234)
Caused by: java.io.IOException: Could not obtain block:
blk_-3695352030358969086_130839
file=/user/emeij/icwsm-data-test/01-26-SOCIAL_MEDIA.tar.gz
What is the output of:
bin/hadoop fsck /user/emeij/icwsm-data-test/01-26-SOCIAL_MEDIA.tar.gz -files -blocks -racks
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1977)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1784)
at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1932)
at java.io.DataInputStream.read(DataInputStream.java:83)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:55)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:74)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:335)
at ilps.DownloadICWSM$CopyThread.run(DownloadICWSM.java:149)
Example DataNode exceptions (note that these come from the node at
192.168.28.211):
2011-03-08 19:40:40,297 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
Exception in receiveBlock for block blk_-9222067946733189014_3798233
java.io.EOFException: while trying to read 3067064 bytes
2011-03-08 19:40:41,018 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.28.211:50050, dest: /192.168.28.211:49748, bytes: 0, op: HDFS_READ, cliID: DFSClient_attempt_201103071120_0030_m_000032_0, offset: 3072, srvID: DS-568746059-145.100.2.180-50050-1291128670510, blockid: blk_3596618013242149887_4060598, duration: 2632000
2011-03-08 19:40:41,049 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
Exception in receiveBlock for block blk_-9221028436071074510_2325937
java.io.EOFException: while trying to read 2206400 bytes
2011-03-08 19:40:41,348 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
Exception in receiveBlock for block blk_-9221549395563181322_4024529
java.io.EOFException: while trying to read 3037288 bytes
2011-03-08 19:40:41,357 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
Exception in receiveBlock for block blk_-9221885906633018147_3895876
java.io.EOFException: while trying to read 1981952 bytes
2011-03-08 19:40:41,434 WARN org.apache.hadoop.hdfs.server.datanode.DataNode:
Block blk_-9221885906633018147_3895876 unfinalized and removed.
2011-03-08 19:40:41,434 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
writeBlock blk_-9221885906633018147_3895876 received exception
java.io.EOFException: while trying to read 1981952 bytes
2011-03-08 19:40:41,434 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
DatanodeRegistration(192.168.28.211:50050,
storageID=DS-568746059-145.100.2.180-50050-1291128670510, infoPort=50075,
ipcPort=50020):DataXceiver
java.io.EOFException: while trying to read 1981952 bytes
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:270)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:357)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:378)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:534)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:417)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:122)
2011-03-08 19:40:41,465 WARN org.apache.hadoop.hdfs.server.datanode.DataNode:
Block blk_-9221549395563181322_4024529 unfinalized and removed.
2011-03-08 19:40:41,466 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
writeBlock blk_-9221549395563181322_4024529 received exception
java.io.EOFException: while trying to read 3037288 bytes
2011-03-08 19:40:41,466 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
DatanodeRegistration(192.168.28.211:50050,
storageID=DS-568746059-145.100.2.180-50050-1291128670510, infoPort=50075,
ipcPort=50020):DataXceiver
java.io.EOFException: while trying to read 3037288 bytes
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:270)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:357)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:378)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:534)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:417)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:122)
Cheers,
Evert Lammerts
Consultant eScience & Cloud Services
SARA Computing & Network Services
Operations, Support & Development
Phone: +31 20 888 4101
Email: evert.lamme...@sara.nl
http://www.sara.nl
Then, on the DataNode that holds the particular block
(blk_-3695352030358969086_130839), you can visit the web interface at
http://192.168.28.211:50075/blockScannerReport to see what's happening
on that node.
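For reference, the diagnostics mentioned in this thread can be run together from a shell on the cluster; a sketch, assuming the `bin/hadoop` path, file names, and DataNode address quoted above (these commands need a live 0.20.x cluster, so output will vary):

```shell
# Cluster-wide capacity and per-DataNode status (live/dead nodes, used/remaining space)
bin/hadoop dfsadmin -report

# Health check of the affected directory
bin/hadoop fsck /user/emeij/icwsm-data-test/

# Check the specific file; -files -blocks -racks also lists which
# DataNodes (and racks) hold each replica of each block
bin/hadoop fsck /user/emeij/icwsm-data-test/01-26-SOCIAL_MEDIA.tar.gz -files -blocks -racks

# Block scanner report served by the DataNode's embedded web server (infoPort 50075)
curl http://192.168.28.211:50075/blockScannerReport
```

The fsck output should show whether the block the mappers fail on is missing, corrupt, or under-replicated before digging into the DataNode logs.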
Regards
--
Marcos Luís Ortíz Valmaseda
Software Engineer
Universidad de las Ciencias Informáticas
Linux User # 418229
http://uncubanitolinuxero.blogspot.com
http://www.linkedin.com/in/marcosluis2186