Re: Hadoop cluster hardware details for big data
Hi,

Thanks a lot for your timely help. Your valuable answers helped us understand what kind of hardware to use when it comes to huge data.

With Regards,
Karthik

On 7/6/11, Steve Loughran ste...@apache.org wrote:
> On 06/07/11 13:18, Michel Segel wrote:
>> Wasn't the answer 42? ;-P
>
> 42 = 40 + NN + 2ary NN, assuming the JT runs on the 2ary or on one of the worker nodes
>
>> Looking at your calc... You forgot to factor in the number of slots per node. So the number is only a fraction. Assume 10 slots per node. (10 because it makes the math easier.)
>>
>> I thought something was wrong. Then I thought of the server revenue and decided not to look that hard.
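Spelling out the slot arithmetic from the thread as a quick shell calculation (the 40 workers and 10 slots per node are the illustrative numbers used above, not a recommendation):

```shell
# 42 machines = 40 workers + NameNode + secondary NameNode
# (JobTracker assumed to run on the 2ary NN or on a worker)
workers=40
slots_per_node=10   # illustrative figure from the thread
echo $((workers * slots_per_node))  # total concurrent task slots: 400
```

So a "42-node" cluster of that shape runs up to 400 tasks at once, which is the fraction-of-the-node-count point being made above.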
Hadoop cluster hardware details for big data
Hi,

Has anyone here used Hadoop to process more than 3 TB of data? If so, we would like to know how many machines you used in your cluster and their hardware configuration. The objective is to learn how to handle huge data in a Hadoop cluster.

--
With Regards,
Karthik
Re: Hadoop cluster hardware details for big data
Hi,

I wanted to know the time required to process huge datasets and the number of machines used for them.

On 7/6/11, Harsh J ha...@cloudera.com wrote:
> Have you taken a look at http://wiki.apache.org/hadoop/PoweredBy? It contains information relevant to your question, if not a detailed answer.
>
> On Wed, Jul 6, 2011 at 4:13 PM, Karthik Kumar karthik84ku...@gmail.com wrote:
>> Hi,
>>
>> Has anyone here used Hadoop to process more than 3 TB of data? If so, we would like to know how many machines you used in your cluster and their hardware configuration. The objective is to learn how to handle huge data in a Hadoop cluster.
>>
>> --
>> With Regards,
>> Karthik
>
> --
> Harsh J

--
With Regards,
Karthik
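A rough back-of-envelope estimate of processing time for a scan-heavy job is dataset size divided by aggregate disk throughput. The node count and per-node scan rate below are assumptions for illustration, not figures from any deployment on the PoweredBy page:

```shell
# Hypothetical example: scan 3 TB on 20 nodes at ~50 MB/s effective per node
data_mb=3000000         # ~3 TB expressed in MB
nodes=20                # assumed cluster size
mb_per_sec_per_node=50  # assumed effective scan rate per node
echo $((data_mb / (nodes * mb_per_sec_per_node)))  # seconds: 3000 (~50 min)
```

Real jobs add shuffle, reduce, and scheduling overhead on top of this, so treat it only as a lower bound when sizing a cluster.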
Log files expanding at an alarming rate
Hi,

I am using a small cluster of 1 master node and 2 slaves. The TaskTracker logs on the slaves are growing by approximately 1 MB per second, so disk space runs low over time. This happens even if no job is running. Please suggest ways to slow down the growth of the log files. I would also like to know why the log files are written even when no jobs are running.

--
With Regards,
Karthik
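Daemon logging in this era of Hadoop is controlled by conf/log4j.properties; growth of ~1 MB/s with no jobs running usually means some logger is emitting INFO or DEBUG chatter on every heartbeat. A sketch of settings that raise the level and cap file size via a rolling appender (the property names follow the stock log4j.properties shipped with Hadoop 0.20; verify against your copy before applying):

```properties
# Log only warnings and above from daemons, via the rolling appender
hadoop.root.logger=WARN,RFA

# Rolling file appender: cap each file at 10 MB, keep at most 10 backups,
# so a daemon can never use more than ~110 MB of log space
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.RFA.MaxFileSize=10MB
log4j.appender.RFA.MaxBackupIndex=10
log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
```

Restart the TaskTrackers after editing so the daemons pick up the new configuration.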
Re: Hadoop in Real time applications
Hi,

Thanks for the clarification.

On Thu, Feb 17, 2011 at 2:09 PM, Niels Basjes ni...@basjes.nl wrote:
> 2011/2/17 Karthik Kumar karthik84ku...@gmail.com:
>> Can Hadoop be used for real-time applications such as banking solutions?
>
> Hadoop consists of several components. Components like HDFS and HBase are quite suitable for interactive solutions (as in: I usually get an answer within 0.x seconds). If you really need real time (as in: I want a guarantee that I have an answer within 0.x seconds), the answer is: no, HDFS/HBase cannot guarantee that. Other components like MapReduce (and Hive, which runs on top of MapReduce) are purely batch oriented.
>
> --
> Met vriendelijke groeten,
> Niels Basjes

--
With Regards,
Karthik
Hadoop in Real time applications
Can Hadoop be used for real-time applications such as banking solutions?

--
With Regards,
Karthik
Cannot copy files to HDFS
Hi,

I am new to Hadoop. I am using Hadoop 0.20.2. I tried to copy a file of size 300 MB from local disk to HDFS, and it failed with the error below. Please help me in solving this issue.

11/01/26 13:01:52 WARN hdfs.DFSClient: DataStreamer Exception: java.io.IOException: An existing connection was forcibly closed by the remote host
        at sun.nio.ch.SocketDispatcher.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:33)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104)
        at sun.nio.ch.IOUtil.write(IOUtil.java:75)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
        at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2314)
11/01/26 13:01:52 WARN hdfs.DFSClient: Error Recovery for block blk_4184614741505116937_1012 bad datanode[0] 160.110.184.114:50010
11/01/26 13:01:52 WARN hdfs.DFSClient: Error Recovery for block blk_4184614741505116937_1012 in pipeline 160.110.184.114:50010, 160.110.184.111:50010: bad datanode 160.110.184.114:50010
11/01/26 13:01:55 WARN hdfs.DFSClient: Error Recovery for block blk_4184614741505116937_1012 failed because recovery from primary datanode 160.110.184.111:50010 failed 1 times. Pipeline was 160.110.184.114:50010, 160.110.184.111:50010. Will retry...
11/01/26 13:01:55 WARN hdfs.DFSClient: Error Recovery for block blk_4184614741505116937_1012 bad datanode[0] 160.110.184.114:50010
11/01/26 13:01:55 WARN hdfs.DFSClient: Error Recovery for block blk_4184614741505116937_1012 in pipeline 160.110.184.114:50010, 160.110.184.111:50010: bad datanode 160.110.184.114:50010
11/01/26 13:02:28 WARN hdfs.DFSClient: DataStreamer Exception: java.io.IOException: An existing connection was forcibly closed by the remote host
        at sun.nio.ch.SocketDispatcher.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:33)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104)
        at sun.nio.ch.IOUtil.write(IOUtil.java:75)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
        at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2314)
11/01/26 13:02:28 WARN hdfs.DFSClient: Error Recovery for block blk_4184614741505116937_1013 bad datanode[0] 160.110.184.111:50010
copyFromLocal: All datanodes 160.110.184.111:50010 are bad. Aborting...
11/01/26 13:02:28 ERROR hdfs.DFSClient: Exception closing file /hdfs/data/input/cdr10M.csv : java.io.IOException: All datanodes 160.110.184.111:50010 are bad. Aborting...
java.io.IOException: All datanodes 160.110.184.111:50010 are bad. Aborting...
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2556)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2102)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2265)

--
With Regards,
Karthik
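"An existing connection was forcibly closed by the remote host" is a Windows-side TCP error, so a firewall on port 50010 or an unhealthy DataNode is the usual suspect when the write pipeline declares every DataNode bad. Some standard checks for this Hadoop version; these require a running cluster, so treat them as a checklist rather than a runnable script:

```shell
# Confirm both DataNodes (160.110.184.111 and .114) report as live,
# with free capacity, rather than dead or decommissioned
hadoop dfsadmin -report

# Check the health of blocks already stored in HDFS
hadoop fsck /

# Verify the DataNode transfer port is reachable from the client
# (a host firewall silently dropping this is a common cause on Windows)
telnet 160.110.184.114 50010

# Then retry the copy
hadoop fs -copyFromLocal cdr10M.csv /hdfs/data/input/
```

If a DataNode shows as dead, its own log on the slave machine will usually say why it stopped heartbeating.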
Re: Task tracker and Data node not stopping
Hi Ken,

Thank you for your quick reply. I don't know how to find the process that was overwriting those files. Anyhow, I re-installed Cygwin from scratch and the problem is solved.

On Thu, Jul 15, 2010 at 9:49 PM, Ken Goodhope kengoodh...@gmail.com wrote:
> Inside hadoop-env.sh, you will see a property that sets the directory the pids are written to. Check which directory it is, and then investigate the possibility that some other process is deleting or overwriting those files. If you are using NFS, with all nodes pointing at the same directory, then it might be a matter of each node overwriting the same file. Either way, the stop scripts look for those pid files and use them to stop the correct daemon. If the files are not found, or if a file contains the wrong pid, the script will echo that there is no process to stop.
>
> On Thu, Jul 15, 2010 at 4:51 AM, Karthik Kumar karthik84ku...@gmail.com wrote:
>> Hi,
>>
>> I am using a cluster of two machines, one master and one slave. When I try to stop the cluster using stop-all.sh, it displays the output below. The TaskTracker and DataNode on the slave are also not stopped. Please help me in solving this.
>>
>> stopping jobtracker
>> 160.110.150.29: no tasktracker to stop
>> stopping namenode
>> 160.110.150.29: no datanode to stop
>> localhost: stopping secondarynamenode
>>
>> --
>> With Regards,
>> Karthik

--
With Regards,
Karthik
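The setting Ken refers to can be made explicit in conf/hadoop-env.sh. Pointing it at a node-local directory (not NFS-shared between nodes, and not /tmp, which cleanup jobs may empty) avoids both failure modes he describes. A sketch; /var/hadoop/pids is an illustrative path, any node-local directory works:

```shell
# conf/hadoop-env.sh
# Directory where each daemon writes its pid file. The stop scripts read
# these files to find the process to kill, so the directory must be local
# to each node and must not be cleaned up by anything else.
export HADOOP_PID_DIR=/var/hadoop/pids
```

After changing it, restart the daemons once with start-all.sh so fresh pid files are written to the new location.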
Task tracker and Data node not stopping
Hi,

I am using a cluster of two machines, one master and one slave. When I try to stop the cluster using stop-all.sh, it displays the output below. The TaskTracker and DataNode on the slave are also not stopped. Please help me in solving this.

stopping jobtracker
160.110.150.29: no tasktracker to stop
stopping namenode
160.110.150.29: no datanode to stop
localhost: stopping secondarynamenode

--
With Regards,
Karthik