[jira] Commented: (HDFS-770) SocketTimeoutException: timeout while waiting for channel to be ready for read
[ https://issues.apache.org/jira/browse/HDFS-770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793720#action_12793720 ]

Zheng Shao commented on HDFS-770:
---------------------------------

bad machine:
{code}
/proc/sys/vm/block_dump:0
/proc/sys/vm/dirty_background_ratio:10
/proc/sys/vm/dirty_expire_centisecs:3000
/proc/sys/vm/dirty_ratio:40
/proc/sys/vm/dirty_writeback_centisecs:500
/proc/sys/vm/hugetlb_shm_group:0
/proc/sys/vm/laptop_mode:0
/proc/sys/vm/legacy_va_layout:0
/proc/sys/vm/lowmem_reserve_ratio:256 32
/proc/sys/vm/max_map_count:65536
/proc/sys/vm/min_free_kbytes:16383
/proc/sys/vm/nr_hugepages:0
/proc/sys/vm/nr_pdflush_threads:2
/proc/sys/vm/overcommit_memory:0
/proc/sys/vm/overcommit_ratio:50
/proc/sys/vm/page-cluster:3
/proc/sys/vm/swappiness:60
/proc/sys/vm/swap_token_timeout:0 0
/proc/sys/vm/vfs_cache_pressure:100
{code}
good machine:
{code}
/proc/sys/vm/block_dump:0
/proc/sys/vm/dirty_background_ratio:10
/proc/sys/vm/dirty_expire_centisecs:2999
/proc/sys/vm/dirty_ratio:40
/proc/sys/vm/dirty_writeback_centisecs:499
/proc/sys/vm/drop_caches:0
/proc/sys/vm/hugetlb_shm_group:0
/proc/sys/vm/laptop_mode:0
/proc/sys/vm/legacy_va_layout:0
/proc/sys/vm/lowmem_reserve_ratio:256 256
/proc/sys/vm/max_map_count:65536
/proc/sys/vm/min_free_kbytes:23004
/proc/sys/vm/min_slab_ratio:5
/proc/sys/vm/min_unmapped_ratio:1
/proc/sys/vm/nr_hugepages:0
/proc/sys/vm/nr_pdflush_threads:2
/proc/sys/vm/overcommit_memory:0
/proc/sys/vm/overcommit_ratio:50
/proc/sys/vm/page-cluster:3
/proc/sys/vm/panic_on_oom:0
/proc/sys/vm/percpu_pagelist_fraction:0
/proc/sys/vm/swappiness:60
/proc/sys/vm/vfs_cache_pressure:100
/proc/sys/vm/zone_reclaim_mode:0
{code}

> SocketTimeoutException: timeout while waiting for channel to be ready for read
> ------------------------------------------------------------------------------
>
>                 Key: HDFS-770
>                 URL: https://issues.apache.org/jira/browse/HDFS-770
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: contrib/libhdfs, data-node, hdfs client, name-node
>    Affects Versions: 0.20.1
>         Environment: Ubuntu Linux 8.04
>            Reporter: Leon Mergen
>         Attachments: client.txt, datanode.txt, filewriter.cpp, namenode.txt
>
> We're having issues with timeouts occurring in our client: for some reason, a timeout of 63000 milliseconds is triggered while writing HDFS data. Since we currently have a single-server setup, this results in our client terminating with an "All datanodes are bad" IOException.
> We're running all services, including the client, on our single server, so it cannot be a network error. The load on the client was extremely low during this period: only a few kilobytes a minute were being written around the time the error occurred.
> After browsing a bit online, a lot of people suggest setting "dfs.datanode.socket.write.timeout" to 0 as a solution for this problem. Given the low load on our system during this period, however, I believe this is a real error and a timeout that should not be occurring. I have attached 3 logs: of the namenode, datanode and client.
> It could be that this is related to http://issues.apache.org/jira/browse/HDFS-693
> Any pointers on how I can assist in resolving this issue will be greatly appreciated.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-770) SocketTimeoutException: timeout while waiting for channel to be ready for read
[ https://issues.apache.org/jira/browse/HDFS-770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793385#action_12793385 ]

Todd Lipcon commented on HDFS-770:
----------------------------------

Hey Zheng,

Any chance you can run "grep . /proc/sys/vm/*" on the system that does show the problem, and compare the results to the one that doesn't show the problem? I'm thinking this might just be a factor of system-level tuning. See http://www.westnet.com/~gsmith/content/linux-pdflush.htm
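Todd's suggested comparison can be automated. The sketch below is illustrative only (it is not part of the attachments on this issue): it parses two "grep . /proc/sys/vm/*" dumps, where each line looks like "/proc/sys/vm/dirty_ratio:40", and reports every setting that differs between the two machines.

```python
def parse_sysctl_dump(text):
    """Parse 'path:value' lines (grep . /proc/sys/vm/* output) into a dict."""
    settings = {}
    for line in text.strip().splitlines():
        # Split on the first ':' only; values such as '256 32' stay intact.
        path, _, value = line.partition(":")
        settings[path] = value.strip()
    return settings

def diff_sysctls(bad_text, good_text):
    """Return {key: (bad_value, good_value)} for every setting that differs.

    A key missing from one dump (e.g. a sysctl the older kernel lacks)
    shows up with None on that side.
    """
    bad = parse_sysctl_dump(bad_text)
    good = parse_sysctl_dump(good_text)
    diffs = {}
    for key in sorted(set(bad) | set(good)):
        if bad.get(key) != good.get(key):
            diffs[key] = (bad.get(key), good.get(key))
    return diffs
```

Run against the two dumps posted above, this flags dirty_expire_centisecs, dirty_writeback_centisecs, lowmem_reserve_ratio, min_free_kbytes, and the handful of sysctls that exist on only one of the two kernels.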
[jira] Commented: (HDFS-770) SocketTimeoutException: timeout while waiting for channel to be ready for read
[ https://issues.apache.org/jira/browse/HDFS-770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793374#action_12793374 ]

Todd Lipcon commented on HDFS-770:
----------------------------------

I assume the _fbk kernel doesn't have wacky patches backported from the last couple of months of kernel development? I will run your test program on a couple of boxes here as well; thanks for uploading it.
[jira] Commented: (HDFS-770) SocketTimeoutException: timeout while waiting for channel to be ready for read
[ https://issues.apache.org/jira/browse/HDFS-770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793371#action_12793371 ]

Zheng Shao commented on HDFS-770:
---------------------------------

The new machine is running "Linux hadoop0100.xxx.yyy.com 2.6.20-39_fbk #1 SMP Mon Mar 16 20:33:46 PDT 2009 x86_64 x86_64 x86_64 GNU/Linux" (uname -a) with 12 local disks.

You can use the attached filewriter.cpp to see whether your box has the problem mentioned above.
[jira] Commented: (HDFS-770) SocketTimeoutException: timeout while waiting for channel to be ready for read
[ https://issues.apache.org/jira/browse/HDFS-770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793363#action_12793363 ]

Todd Lipcon commented on HDFS-770:
----------------------------------

Hi Zheng,

What kernel is the newer box running? Is it a brand new kernel? 2.6.32 has significant changes to dirty page writeback.

-Todd
[jira] Commented: (HDFS-770) SocketTimeoutException: timeout while waiting for channel to be ready for read
[ https://issues.apache.org/jira/browse/HDFS-770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789514#action_12789514 ]

Zheng Shao commented on HDFS-770:
---------------------------------

In my case, it seems to be a problem with my box. It's running "2.6.12-1.1398_FC4smp #1 SMP Fri Jul 15 01:05:24 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux" (uname -a) with 4 local hard disks.

I wrote a simple program (tried both Java and C) that writes out a huge file, and the writes get blocked frequently after writing out 5~10 GB of data. The blocking was sometimes as long as 18 seconds. I tried it on a newer box and the problem disappeared.

The block is in "FileOutputStream.write" and "fwrite". Each write takes 60 KB of data.

I guess it's not a problem with the hardware, because "iostat -x -n 2" shows that the disk write speed is still non-zero (10-30 MB/s) when this blocking happens.
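The test Zheng describes is attached as filewriter.cpp; a rough Python equivalent can be sketched as follows (this is an illustrative reconstruction, not the attachment itself): write a large file in 60 KB chunks and record every single write() call that blocks longer than a threshold, which is how the 18-second stalls above were observed.

```python
import os
import time

def probe_write_stalls(path, total_bytes, chunk_size=60 * 1024,
                       stall_threshold_s=1.0):
    """Write total_bytes of zeros to path in chunk_size pieces.

    Returns [(byte_offset, seconds)] for each chunk whose write took
    longer than stall_threshold_s. The file is removed afterwards.
    """
    stalls = []
    chunk = b"\0" * chunk_size
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            start = time.monotonic()
            f.write(chunk)  # this is where the multi-second blocking shows up
            elapsed = time.monotonic() - start
            if elapsed > stall_threshold_s:
                stalls.append((written, elapsed))
            written += chunk_size
    os.remove(path)
    return stalls
```

To reproduce the behaviour reported above, total_bytes needs to be large (the stalls only appeared after 5-10 GB had been written, i.e. once dirty-page writeback pressure builds up).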
[jira] Commented: (HDFS-770) SocketTimeoutException: timeout while waiting for channel to be ready for read
[ https://issues.apache.org/jira/browse/HDFS-770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788005#action_12788005 ]

Leon Mergen commented on HDFS-770:
----------------------------------

For what it's worth, the timeout problems I was having are gone after I set "dfs.datanode.socket.write.timeout" to 63, so it definitely appears as if the DN went to sleep for some reason.
[jira] Commented: (HDFS-770) SocketTimeoutException: timeout while waiting for channel to be ready for read
[ https://issues.apache.org/jira/browse/HDFS-770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787900#action_12787900 ]

Zheng Shao commented on HDFS-770:
---------------------------------

I saw the same thing with hadoop 0.19 while doing heavy-weight writes.
{code}
09/10/29 01:05:18 WARN hdfs.DFSClient: DataStreamer Exception: java.net.SocketTimeoutException: 3 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/aa.bb.cc.dd:55040 remote=/ee.ff.gg.hh:50010]
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:162)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2323)
{code}
[jira] Commented: (HDFS-770) SocketTimeoutException: timeout while waiting for channel to be ready for read
[ https://issues.apache.org/jira/browse/HDFS-770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777612#action_12777612 ]

Raghu Angadi commented on HDFS-770:
-----------------------------------

From the datanode log:

> 2009-11-13 06:18:21,965 DEBUG org.apache.hadoop.ipc.RPC: Call: sendHeartbeat 14
> 2009-11-13 06:19:38,081 DEBUG org.apache.hadoop.ipc.Client: IPC Client (47) connection to dfs.hadoop.tsukku.solatis/127.0.0.1:9000 from hadoop: closed

Note that there is no activity on the DataNode for 77 seconds. There are a number of possibilities, a common one being GC, though we haven't seen GC take this long on a DN. Assuming the DN went to sleep for some reason, the rest of the behaviour is expected.

If you do expect such delays, what you need to increase is the read timeout for the "responder thread" in DFSOutputStream (there is a config for a generic read timeout that applies to sockets in many contexts).
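For reference, the timeouts discussed in this thread are client/datanode socket settings configured in hdfs-site.xml. The following fragment is an illustrative sketch only: the values are examples, not recommendations, and should be checked against the defaults shipped with the Hadoop version in use.

```xml
<!-- Illustrative sketch, not a recommendation.
     dfs.socket.timeout: the generic socket read timeout (milliseconds)
       that Raghu refers to above.
     dfs.datanode.socket.write.timeout: the write timeout (milliseconds);
       0 disables the timeout entirely, the workaround mentioned in the
       issue description. -->
<property>
  <name>dfs.socket.timeout</name>
  <value>120000</value>
</property>
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>960000</value>
</property>
```

Disabling the write timeout (value 0) hides stalls rather than fixing them; if the DN genuinely pauses for over a minute, the underlying cause (GC, dirty-page writeback, etc.) is still worth chasing.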