Re: Best Linux Operating system used for Hadoop
I suggest CentOS 5.7 / RHEL 5.7. CentOS 6.2 also runs stably.

- Alex

--
Alexander Lorenz
http://mapredit.blogspot.com

On Jan 27, 2012, at 10:15 AM, Sujit Dhamale wrote:

Hi All,
I am new to Hadoop. Can anyone tell me which is the best Linux operating
system for installing and running Hadoop? Nowadays I am using Ubuntu 11.04;
I installed Hadoop on it, but it crashes a number of times. Can someone
please help me out?

Kind regards,
Sujit Dhamale
Re: Best Linux Operating system used for Hadoop
Thanks a lot, Alex. I will install RHEL today itself.

-- Sujit Dhamale

On Fri, Jan 27, 2012 at 2:49 PM, alo alt wget.n...@googlemail.com wrote:

I suggest CentOS 5.7 / RHEL 5.7. CentOS 6.2 also runs stably.

- Alex

--
Alexander Lorenz
http://mapredit.blogspot.com

On Jan 27, 2012, at 10:15 AM, Sujit Dhamale wrote:

Hi All,
I am new to Hadoop. Can anyone tell me which is the best Linux operating
system for installing and running Hadoop? Nowadays I am using Ubuntu 11.04;
I installed Hadoop on it, but it crashes a number of times. Can someone
please help me out?

Kind regards,
Sujit Dhamale
Re: NoSuchElementException while Reduce step
Hey, there must be some problem with the key or value; the reducer didn't
find the expected value.

On Fri, Jan 27, 2012 at 1:23 AM, Rajesh Sai T tsairaj...@gmail.com wrote:

Hi, I'm new to Hadoop. I'm trying to write my own custom Writable data type,
so that the Map class produces my structure as the value for a key, and the
Reduce class works on the list of those structure values. Below is my
program; please guide me on what needs to be done to overcome this. It
passes the Map phase, but during the Reduce phase, on the last iteration,
it throws an exception and the job terminates.

import java.io.IOException;
import java.util.*;
import java.io.DataOutput;
import java.io.DataInput;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class InvertedGroupIndex {

    public static class InvertedStruct implements Writable {
        public String location;
        public int count;

        public InvertedStruct(int _count, String _str2) {
            this.count = _count;
            this.location = _str2;
        }

        public InvertedStruct() {
            this(0, null);
        }

        /*public void set(String _str1, String _str2) {
            this.word = _str1;
            this.location = _str2;
        }*/

        public void write(DataOutput out) throws IOException {
            out.writeInt(this.count);
            out.writeChars(this.location);
        }

        public void readFields(DataInput in) throws IOException {
            count = in.readInt();
            location = in.readLine();
        }

        public String toString() {
            return count + ";" + location;
        }

        public int getCount() {
            return count;
        }

        public String getString() {
            return location;
        }
    }

    public static class InvertedMap extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, InvertedStruct> {
        private final static IntWritable count = new IntWritable(1);
        private final static Text word = new Text();

        public void map(LongWritable key, Text val,
                OutputCollector<Text, InvertedStruct> output, Reporter report)
                throws IOException {
            FileSplit filesplit = (FileSplit) report.getInputSplit();
            String fileName = filesplit.getPath().getName();
            //location.set(fileName);
            //InvertedStruct result = new InvertedStruct(1, fileName);
            String line = val.toString();
            StringTokenizer token = new StringTokenizer(line.toLowerCase());
            while (token.hasMoreTokens()) {
                word.set(token.nextToken());
                output.collect(word, new InvertedStruct(1, fileName));
            }
        }
    }

    public static class InvertedReducer extends MapReduceBase
            implements Reducer<Text, InvertedStruct, Text, Text> {
        public void reduce(Text key, Iterator<InvertedStruct> values,
                OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException, NoSuchElementException {
            int sum = 0;
            StringBuilder toReturn = new StringBuilder();
            while (values.hasNext()) {
                sum += values.next().getCount();
                toReturn.append(values.next().getString());
            }
            String s = String.valueOf(sum) + toReturn.toString();
            output.collect(key, new Text(s));
        }
    }

    public static void main(String[] args) throws IOException {
        //JobClient client = new JobClient();
        JobConf conf = new JobConf(InvertedGroupIndex.class);
        conf.setJobName("InvertedGroupIndex");
        conf.setMapperClass(InvertedMap.class);
        //conf.setCombinerClass(InvertedReducer.class);
        conf.setReducerClass(InvertedReducer.class);
        conf.setMapOutputValueClass(InvertedStruct.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}

Thanks,
Sai
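The likely culprit is in the reduce loop above: values.next() is called twice
per iteration, so each pass consumes two values from the iterator. Whenever a
key has an odd number of values, the second next() call runs past the end and
throws NoSuchElementException. There is also a serialization mismatch in
InvertedStruct: write() uses writeChars() while readFields() uses readLine(),
which are not a symmetric pair. A minimal sketch of both fixes, keeping the
names from the program above (only the changed methods shown):

    // In InvertedStruct: use the symmetric writeUTF()/readUTF() pair
    // instead of the mismatched writeChars()/readLine().
    public void write(DataOutput out) throws IOException {
        out.writeInt(this.count);
        out.writeUTF(this.location);
    }

    public void readFields(DataInput in) throws IOException {
        count = in.readInt();
        location = in.readUTF();
    }

    // In InvertedReducer: advance the iterator exactly once per pass
    // and reuse the value, instead of calling values.next() twice.
    public void reduce(Text key, Iterator<InvertedStruct> values,
            OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        int sum = 0;
        StringBuilder toReturn = new StringBuilder();
        while (values.hasNext()) {
            InvertedStruct value = values.next();
            sum += value.getCount();
            toReturn.append(value.getString());
        }
        output.collect(key, new Text(sum + toReturn.toString()));
    }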
jobtracker url(Critical)
Hey folks,

I am facing a problem with the JobTracker URL. I added a node to the cluster,
and after some time I restarted the cluster. I then found that the JobTracker
page shows the recently added node under *nodes*, but the rest of the nodes
are not available, not even in the *blacklist*.

Does anyone have any idea why this is happening?

Thanks and regards,
Vikas Srivastava
Re: jobtracker url(Critical)
Vikas,

Have you ensured that the non-appearing TaskTracker services are
started/alive and carry no communication errors in their logs (example
commands below)? Did you perhaps bring up a firewall accidentally that was
not present before?

On Fri, Jan 27, 2012 at 4:47 PM, hadoop hive hadooph...@gmail.com wrote:

Hey folks,

I am facing a problem with the JobTracker URL. I added a node to the cluster,
and after some time I restarted the cluster. I then found that the JobTracker
page shows the recently added node under *nodes*, but the rest of the nodes
are not available, not even in the *blacklist*.

Does anyone have any idea why this is happening?

Thanks and regards,
Vikas Srivastava

--
Harsh J
Customer Ops. Engineer, Cloudera
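For example, on each missing slave node you could run checks along these
lines. This is only a sketch for a typical Hadoop 1.x layout: the log path,
the $HADOOP_HOME variable, and the 9001 JobTracker RPC port are assumptions,
so substitute the values from your own install and mapred-site.xml.

    # Is the TaskTracker JVM actually running on this slave?
    jps | grep TaskTracker

    # Any connection or registration errors in the TaskTracker log?
    # (log file name/location is an assumption; adjust to your layout)
    grep -iE 'error|exception|retry' $HADOOP_HOME/logs/hadoop-*-tasktracker-*.log | tail -n 50

    # Can this slave reach the JobTracker RPC port at all?
    # (host/port are placeholders; use your mapred.job.tracker value)
    telnet jobtracker-host 9001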
Re: jobtracker url(Critical)
Hey Harsh,

But after some time they do become available, one by one, in the JobTracker
URL. Any idea why they show up so slowly?

Regards,
Vikas

On Fri, Jan 27, 2012 at 5:05 PM, Harsh J ha...@cloudera.com wrote:

Vikas,

Have you ensured that the non-appearing TaskTracker services are
started/alive and carry no communication errors in their logs? Did you
perhaps bring up a firewall accidentally that was not present before?

On Fri, Jan 27, 2012 at 4:47 PM, hadoop hive hadooph...@gmail.com wrote:

Hey folks,

I am facing a problem with the JobTracker URL. I added a node to the cluster,
and after some time I restarted the cluster. I then found that the JobTracker
page shows the recently added node under *nodes*, but the rest of the nodes
are not available, not even in the *blacklist*.

Does anyone have any idea why this is happening?

Thanks and regards,
Vikas Srivastava

--
Harsh J
Customer Ops. Engineer, Cloudera
Re: jobtracker url(Critical)
TaskTrackers sometimes do not clean up their mapred temp directories well.
If that is the case, the TaskTracker on startup can spend many minutes
deleting files. I use find to delete files older than a couple of days (a
sketch follows below).

On Friday, January 27, 2012, hadoop hive hadooph...@gmail.com wrote:

Hey Harsh,

But after some time they do become available, one by one, in the JobTracker
URL. Any idea why they show up so slowly?

Regards,
Vikas

On Fri, Jan 27, 2012 at 5:05 PM, Harsh J ha...@cloudera.com wrote:

Vikas,

Have you ensured that the non-appearing TaskTracker services are
started/alive and carry no communication errors in their logs? Did you
perhaps bring up a firewall accidentally that was not present before?

On Fri, Jan 27, 2012 at 4:47 PM, hadoop hive hadooph...@gmail.com wrote:

Hey folks,

I am facing a problem with the JobTracker URL. I added a node to the cluster,
and after some time I restarted the cluster. I then found that the JobTracker
page shows the recently added node under *nodes*, but the rest of the nodes
are not available, not even in the *blacklist*.

Does anyone have any idea why this is happening?

Thanks and regards,
Vikas Srivastava

--
Harsh J
Customer Ops. Engineer, Cloudera
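A sketch of that cleanup, run while the TaskTracker is stopped. The
/data/mapred/local path is only an example: point it at whatever your
mapred.local.dir is set to in mapred-site.xml.

    # Remove task-attempt leftovers older than two days, then prune
    # the directories that are now empty.
    find /data/mapred/local -type f -mtime +2 -delete
    find /data/mapred/local -mindepth 1 -type d -empty -delete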
Re: Too many open files Error
Hi Harsh and Idris ... so the only drawback of increasing the xcievers value
is the memory issue? In that case I'll set it to a value that doesn't fill
the memory ...

Thanks,
Mark

On Thu, Jan 26, 2012 at 10:37 PM, Idris Ali psychid...@gmail.com wrote:

Hi Mark,

As Harsh pointed out, it is not a good idea to increase the Xceiver count to
an arbitrarily high value; I suggested increasing the xceiver count just to
unblock execution of your program temporarily.

Thanks,
-Idris

On Fri, Jan 27, 2012 at 10:39 AM, Harsh J ha...@cloudera.com wrote:

You are technically allowing the DN to run 1 million block transfer (in/out)
threads by doing that. It does not take up resources by default, sure, but
now it can be abused with requests that make your DN run out of memory and
crash, because it is no longer bound to proper limits.

On Fri, Jan 27, 2012 at 5:49 AM, Mark question markq2...@gmail.com wrote:

Harsh, could you explain briefly why the 1M setting for xceivers is bad? The
job is working now ... About ulimit -u: it shows 200703, so is that why the
connection is reset by peer? How come it's working after the xceiver
modification?

Thanks,
Mark

On Thu, Jan 26, 2012 at 12:21 PM, Harsh J ha...@cloudera.com wrote:

Agree with Raj V here - your problem should not be the # of transfer threads
nor the number of open files, given that stack trace. And the values you've
set for the transfer threads are far beyond the recommendations of 4k/8k - I
would not recommend doing that. The default in 1.0.0 is 256, but set it to
2048/4096, which are good values to have when noticing increased HDFS load,
or when running services like HBase. You should instead focus on why it is
this particular job (or even a particular task, which is important to notice)
that fails, and not other jobs (or other task attempts).

On Fri, Jan 27, 2012 at 1:10 AM, Raj V rajv...@yahoo.com wrote:

Mark,

You have this: "Connection reset by peer". Why do you think this problem is
related to too many open files?

Raj

From: Mark question markq2...@gmail.com
To: common-user@hadoop.apache.org
Sent: Thursday, January 26, 2012 11:10 AM
Subject: Re: Too many open files Error

Hi again, I've tried:

<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>1048576</value>
</property>

but I'm still getting the same error ... How high can I go??

Thanks,
Mark

On Thu, Jan 26, 2012 at 9:29 AM, Mark question markq2...@gmail.com wrote:

Thanks for the reply. I have nothing about dfs.datanode.max.xcievers in my
hdfs-site.xml, so hopefully this will solve the problem. About ulimit -n: I'm
running on an NFS cluster, so usually I just start Hadoop with a single
bin/start-all.sh ... Do you think I can add it via bin/Datanode -ulimit n?

Mark

On Thu, Jan 26, 2012 at 7:33 AM, Mapred Learn mapred.le...@gmail.com wrote:

You need to set ulimit -n to a bigger value on the datanodes and restart the
datanodes.

Sent from my iPhone

On Jan 26, 2012, at 6:06 AM, Idris Ali psychid...@gmail.com wrote:

Hi Mark,

On a lighter note, what is the count of xceivers? The
dfs.datanode.max.xcievers property in hdfs-site.xml?

Thanks,
-idris

On Thu, Jan 26, 2012 at 5:28 PM, Michel Segel michael_se...@hotmail.com wrote:

Sorry, going from memory... As user hadoop or mapred or hdfs, what do you see
when you do a ulimit -a? That should give you the number of open files
allowed for a single user...

Sent from a remote device. Please excuse any typos...

Mike Segel

On Jan 26, 2012, at 5:13 AM, Mark question markq2...@gmail.com wrote:

Hi guys, I get this error from a job trying to process 3 million records.
java.io.IOException: Bad connect ack with firstBadLink 192.168.1.20:50010
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2903)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2826)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)

When I checked the log file of datanode-20, I see:

2012-01-26 03:00:11,827 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
DatanodeRegistration(192.168.1.20:50010,
storageID=DS-97608578-192.168.1.20-50010-1327575205369, infoPort=50075,
ipcPort=50020):DataXceiver
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcher.read0(Native Method)
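For reference, the thread's recommendation translates to a setting like the
following in hdfs-site.xml on each DataNode (followed by a DataNode restart).
The value 4096 is taken from the 2048/4096 guidance above rather than the
1048576 that was tried; note that the property name's odd spelling
("xcievers") is the historical one the config actually uses.

    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>4096</value>
    </property>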
Re: Best Linux Operating system used for Hadoop
Hi,

I suggest Fedora; in my opinion it is more powerful than other
distributions. I have run Hadoop on it without any problem. Good luck!

On 01/27/2012 06:15 PM, Sujit Dhamale wrote:

Hi All,
I am new to Hadoop. Can anyone tell me which is the best Linux operating
system for installing and running Hadoop? Nowadays I am using Ubuntu 11.04;
I installed Hadoop on it, but it crashes a number of times. Can someone
please help me out?

Kind regards,
Sujit Dhamale