I need help building Hadoop 2.0 for windows
I usually run Hadoop on a Linux cluster but do most of my development in single-machine mode under Windows. This was fairly straightforward for 0.2. For 1.0 I needed to copy and fix FileUtils, but for 2.0 I am expected to build two files from source: WinUtils.exe and hadoop.dll. There is really only ONE serious Windows configuration (64-bit Intel) and no good reason why these files could not be available in a binary distribution. Is anyone using Hadoop 2.x and developing under Windows who can help? My configuration builds WinUtils.exe but not hadoop.dll, and it fails trying to set permissions on a staging file: file:/tmp/hadoop-Steve/mapred/staging/Steve2116067144/.staging I could use help and advice. -- Steven M. Lewis PhD 4221 105th Ave NE Kirkland, WA 98033 206-384-1340 (cell) Skype lordjoe_com
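In case it helps anyone hitting the same wall: a common workaround for single-machine development (not a fix for the build itself) is to point Hadoop at a directory that already contains bin\winutils.exe before the first Configuration or Job is created. The sketch below assumes winutils.exe has been placed under C:/hadoop/bin; the path is only an example, and some operations will still want hadoop.dll.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

// Hypothetical local-dev setup: hadoop.home.dir is the property Hadoop's Shell
// class checks (falling back to the HADOOP_HOME environment variable) when it
// looks for bin\winutils.exe on Windows. Must be set before Hadoop classes run.
System.setProperty("hadoop.home.dir", "C:/hadoop");

Configuration conf = new Configuration();
FileSystem localFs = FileSystem.getLocal(conf);   // local, single-machine mode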
accessing hadoop filesystem from Tomcat
Hi all, I just want to confirm whether my understanding of the Hadoop FileSystem object is correct. From the source code of org.apache.hadoop.fs.FileSystem (either version 1.0.4 or 2.2.0), the method public static FileSystem get(URI uri, Configuration conf) throws IOException uses some sort of cache: CACHE.get(uri, conf); My understanding is that Tomcat usually creates multiple threads to handle HTTP requests, and those threads will use the same FileSystem object (because of the cache). This will result in an error, right? The next question is, if I want to disable the cache, should I just introduce a new key fs.hdfs.impl.disable.cache and set the value to true? And another key fs.har.impl.disable.cache for the HAR FileSystem? Best regards, Henry
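For what it is worth, a small sketch of the two options; the property names are the per-scheme cache switches the question mentions, and the namenode URI is only an example. Whether the cache actually needs to be disabled depends on whether any servlet ever closes the shared instance.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

// Option 1: disable the cache for hdfs:// URIs, so every get() returns a fresh
// instance that the calling thread owns and may close safely.
Configuration conf = new Configuration();
conf.setBoolean("fs.hdfs.impl.disable.cache", true);
conf.setBoolean("fs.har.impl.disable.cache", true);   // same pattern for har:// if needed
FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020/"), conf);
try {
    // ... use fs from this request's thread ...
} finally {
    fs.close();   // not shared, so closing it cannot break other requests
}

// Option 2 (at least on 2.x): keep the cache, but take a private, uncached
// instance where one is needed; newInstance() bypasses the cache without any
// configuration change.
FileSystem privateFs = FileSystem.newInstance(URI.create("hdfs://namenode:8020/"), conf);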
Re: XML to TEXT
Hi, Thanks a lot. Ranjini

On Fri, Jan 3, 2014 at 10:40 PM, Diego Gutierrez diego.gutier...@ucsp.edu.pe wrote:

Hi, I suggest using XPath; it is Java's native support for parsing XML and JSON formats. For the main problem, as with the distcp command (http://hadoop.apache.org/docs/r0.19.0/distcp.pdf), there is no need for a reduce function, because you can parse the XML input file and create the file you need in the map function. For example, the following code reads an XML file in HDFS, parses it and creates a new file (/result.txt) with the expected format:

id,name
100,RR

Mapper function:

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;

import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

import com.sun.org.apache.xml.internal.dtm.ref.DTMNodeList;

public class XmlToTextMapper extends Mapper<LongWritable, Text, Text, Text> {

    private static final XPathFactory xpathFactory = XPathFactory.newInstance();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String resultFileName = "/result.txt";
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(resultFileName), conf);
        FSDataOutputStream out = fs.create(new Path(resultFileName));
        InputStream resultIS = new ByteArrayInputStream(new byte[0]);

        String header = "id,name\n";
        out.write(header.getBytes());

        String xmlContent = value.toString();
        InputStream is = new ByteArrayInputStream(xmlContent.getBytes());
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder;
        try {
            builder = factory.newDocumentBuilder();
            Document doc = builder.parse(is);
            // select all data elements under the main element
            DTMNodeList list = (DTMNodeList) getNode("/main/data", doc, XPathConstants.NODESET);
            int size = list.getLength();
            for (int i = 0; i < size; i++) {
                Node node = list.item(i);
                String line = "";
                NodeList nodeList = node.getChildNodes();
                int childNumber = nodeList.getLength();
                for (int j = 0; j < childNumber; j++) {
                    line += nodeList.item(j).getTextContent() + ",";
                }
                if (line.endsWith(","))
                    line = line.substring(0, line.length() - 1);
                line += "\n";
                out.write(line.getBytes());
            }
        } catch (ParserConfigurationException e) {
            MyLogguer.log("error: " + e.getMessage());
            e.printStackTrace();
        } catch (SAXException e) {
            MyLogguer.log("error: " + e.getMessage());
            e.printStackTrace();
        } catch (XPathExpressionException e) {
            MyLogguer.log("error: " + e.getMessage());
            e.printStackTrace();
        }
        IOUtils.copyBytes(resultIS, out, 4096, true);
        out.close();
    }

    public static Object getNode(String xpathStr, Node node, QName returnType)
            throws XPathExpressionException {
        XPath xpath = xpathFactory.newXPath();
        return xpath.evaluate(xpathStr, node, returnType);
    }
}

Main class:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Main {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: XMLtoText <input path> <output path>");
            System.exit(-1);
        }
        Job job = new Job();
        job.setJarByClass(Main.class);
        job.setJobName("XML to Text");
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(XmlToTextMapper.class);
        job.setNumReduceTasks(0);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

To execute the job you can use: bin/hadoop Main /data.xml <output path>
Fine tuning
Hi, I have an input file with 16 fields in it. Using MapReduce code I need to load HBase tables: the first eight fields have to go into one HBase table and the last eight into another. The data gets loaded into HBase in 0.11 sec, but it slows down if any lookup is added in the MapReduce code. For example, the input file has one attribute named currency, and there is a master table currency; the two values need to be matched before writing. The table that needs the lookup takes a long time to load: for 13250 records it takes 59 mins. How can I fine-tune this to reduce the loading time? Please help. Thanks in advance. Ranjini.R
Re: XML to TEXT
Hi Ranjini, Can you use Hive? Then you can just use XPath in your select clause. Cheers, R+

On Mon, Jan 6, 2014 at 2:44 PM, Ranjini Rathinam ranjinibe...@gmail.com wrote: Hi, Thanks a lot. Ranjini
Hadoop permissions issue
I’m trying to run Nutch 2.2.1 on a Hadoop 1.2.1 cluster. The fetch phase runs fine, but in the next job this error comes up:

java.lang.NullPointerException
at org.apache.avro.util.Utf8.<init>(Utf8.java:37)
at org.apache.nutch.crawl.GeneratorReducer.setup(GeneratorReducer.java:100)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

I’m running three nodes, named nutch1, 2 and 3. The first one is in the masters file and all are listed in the slaves file. The /etc/hosts file lists all machines along with their IP addresses. Can someone help me? -- Manikandan Saravanan Architect - Technology TheSocialPeople
Understanding MapReduce source code : Flush operations
Hi, I have been using Hadoop/MapReduce for about 2.5 years. I want to understand the internals of the Hadoop source code. Let me put my requirement very clearly: I want to look at the code for the flush operations that happen after the reduce phase. The reducer writes the output to the OutputFormat, which in turn pushes it to memory, and once it reaches 90% of the chunk size it starts to flush the reducer output. I essentially want to look at the code of that flushing operation. Regards, Nagarjuna K
Fwd: Understanding MapReduce source code : Flush operations
-- Forwarded message -- From: nagarjuna kanamarlapudi nagarjuna.kanamarlap...@gmail.com Date: Mon, Jan 6, 2014 at 8:09 AM Subject: Understanding MapReduce source code : Flush operations To: mapreduce-u...@hadoop.apache.org Hi, I have been using Hadoop/MapReduce for about 2.5 years. I want to understand the internals of the Hadoop source code. Let me put my requirement very clearly: I want to look at the code for the flush operations that happen after the reduce phase. The reducer writes the output to the OutputFormat, which in turn pushes it to memory, and once it reaches 90% of the chunk size it starts to flush the reducer output. I essentially want to look at the code of that flushing operation. Regards, Nagarjuna K
Re: Hadoop permissions issue
Based on the Exception type, it looks like something in your job is looking for a valid value, and not finding it. You will probably need to share the job code for people to help with this - to my eyes, this doesn't appear to be a Hadoop configuration issue, or any kind of problem with how the system is working. Are you using Avro inputs and outputs? If your reduce is trying to parse an Avro record, it may be that the field type is not correct, or maybe there is a reference to an outside schema object that is not available... If you provide more information about the context of the error (use case, program goal, code block, something like that) then it is easier to help you. *Devin Suiter* Jr. Data Solutions Software Engineer 100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212 Google Voice: 412-256-8556 | www.rdx.com On Mon, Jan 6, 2014 at 8:08 AM, Manikandan Saravanan manikan...@thesocialpeople.net wrote: I’m trying to run Nutch 2.2.1 on a Hadoop 1.2.1 cluster. The fetch phase runs fine. But in the next job, this error comes up java.lang.NullPointerException at org.apache.avro.util.Utf8.init(Utf8.java:37) at org.apache.nutch.crawl.GeneratorReducer.setup(GeneratorReducer.java:100) at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174) at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190) at org.apache.hadoop.mapred.Child.main(Child.java:249) I’m running three nodes namely nutch1,2,3. The first one’s in the masters file and all are listed in the slaves file. The /etc/hosts file lists all machines along with their IP addresses. Can someone help me? -- Manikandan Saravanan Architect - Technology TheSocialPeople http://thesocialpeople.net
Re: Hadoop permissions issue
I’m running Nutch 2.2.1 on a Hadoop cluster. I’m running 5000 links from the DMOZ Open Directory Project. The reduce job stops exactly at 33% all the time and it throws this exception. From the nutch mailing list, it seems that my job is stumbling upon a repUrl value that’s null. -- Manikandan Saravanan Architect - Technology TheSocialPeople On 6 January 2014 at 7:14:41 pm, Devin Suiter RDX (dsui...@rdx.com) wrote: Based on the Exception type, it looks like something in your job is looking for a valid value, and not finding it. You will probably need to share the job code for people to help with this - to my eyes, this doesn't appear to be a Hadoop configuration issue, or any kind of problem with how the system is working. Are you using Avro inputs and outputs? If your reduce is trying to parse an Avro record, it may be that the field type is not correct, or maybe there is a reference to an outside schema object that is not available... If you provide more information about the context of the error (use case, program goal, code block, something like that) then it is easier to help you. Devin Suiter Jr. Data Solutions Software Engineer 100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212 Google Voice: 412-256-8556 | www.rdx.com On Mon, Jan 6, 2014 at 8:08 AM, Manikandan Saravanan manikan...@thesocialpeople.net wrote: I’m trying to run Nutch 2.2.1 on a Hadoop 1.2.1 cluster. The fetch phase runs fine. But in the next job, this error comes up java.lang.NullPointerException at org.apache.avro.util.Utf8.init(Utf8.java:37) at org.apache.nutch.crawl.GeneratorReducer.setup(GeneratorReducer.java:100) at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174) at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190) at org.apache.hadoop.mapred.Child.main(Child.java:249) I’m running three nodes namely nutch1,2,3. The first one’s in the masters file and all are listed in the slaves file. The /etc/hosts file lists all machines along with their IP addresses. Can someone help me? -- Manikandan Saravanan Architect - Technology TheSocialPeople
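That reading fits the trace: org.apache.avro.util.Utf8 throws a NullPointerException from its constructor when it is handed a null String, which is exactly the Utf8 constructor frame shown above. Illustration only (the variable name just mirrors the suspected field; the real fix belongs in the Nutch code or its configuration, not in Hadoop):

import org.apache.avro.util.Utf8;

String repUrl = null;                                       // whatever value GeneratorReducer.setup() apparently ends up with
Utf8 asUtf8 = (repUrl == null) ? null : new Utf8(repUrl);   // guard before constructing, instead of new Utf8(null)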
Fwd: Understanding MapReduce source code : Flush operations
-- Forwarded message -- From: nagarjuna kanamarlapudi nagarjuna.kanamarlap...@gmail.com Date: Mon, Jan 6, 2014 at 6:39 PM Subject: Understanding MapReduce source code : Flush operations To: mapreduce-u...@hadoop.apache.org Hi, I have been using Hadoop/MapReduce for about 2.5 years. I want to understand the internals of the Hadoop source code. Let me put my requirement very clearly: I want to look at the code for the flush operations that happen after the reduce phase. The reducer writes the output to the OutputFormat, which in turn pushes it to memory, and once it reaches 90% of the chunk size it starts to flush the reducer output. I essentially want to look at the code of that flushing operation. Regards, Nagarjuna K
Re: Understanding MapReduce source code : Flush operations
Please do not tell me that in the last 2.5 years you have not used a virtual Hadoop environment to debug your MapReduce application before deploying to a production environment. No one can stop you from looking at the code; Hadoop and its ecosystem are open source. On Mon, Jan 6, 2014 at 9:35 AM, nagarjuna kanamarlapudi nagarjuna.kanamarlap...@gmail.com wrote: -- Forwarded message -- From: nagarjuna kanamarlapudi nagarjuna.kanamarlap...@gmail.com Date: Mon, Jan 6, 2014 at 6:39 PM Subject: Understanding MapReduce source code : Flush operations To: mapreduce-u...@hadoop.apache.org Hi, I am using hadoop/ map reduce for aout 2.5 years. I want to understand the internals of the hadoop source code. Let me put my requirement very clear. I want to have a look at the code where of flush operations that happens after the reduce phase. Reducer writes the output to OutputFormat which inturn pushes that to memory and once it reaches 90% of chunk size it starts to flush the reducer output. I essentially want to look at the code of that flushing operation. Regards, Nagarjuna K
Re: Fine tuning
Can you please share how you are doing the lookup? On Mon, Jan 6, 2014 at 4:23 AM, Ranjini Rathinam ranjinibe...@gmail.comwrote: Hi, I have a input File of 16 fields in it. Using Mapreduce code need to load the hbase tables. The first eight has to go into one table in hbase and last eight has to got to another hbase table. The data is being loaded into hbase table in 0.11 sec , but if any lookup is being added in the mapreduce code, For eg, the input file has one attribute named currency , it will have a master table currency. need to match both values to print it. The table which has lookup takes long time to get load. For 13250 records it take 59 mins. How to make fine tune to reduce the time for its loading. Please help. Thanks in advance. Ranjini.R
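In case it is the usual pattern of one HBase Get per input record, a rough sketch of an alternative is below: read the small currency master table once in setup() and look it up from memory in map(). Table, column family, qualifier and field index here are invented for illustration, since the actual lookup code has not been shared.

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class LoadWithLookupMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

    // In-memory copy of the small "currency" master table, filled once per mapper.
    private final Map<String, String> currencyByCode = new HashMap<String, String>();

    @Override
    protected void setup(Context context) throws IOException {
        HTable master = new HTable(context.getConfiguration(), "currency");
        try {
            ResultScanner scanner = master.getScanner(new Scan());
            for (Result row : scanner) {
                String code = Bytes.toString(row.getRow());
                String name = Bytes.toString(row.getValue(Bytes.toBytes("cf"), Bytes.toBytes("name")));
                currencyByCode.put(code, name);
            }
            scanner.close();
        } finally {
            master.close();
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");
        // Resolve the currency from memory instead of issuing an HBase Get per record.
        String currencyName = currencyByCode.get(fields[8]);   // field index is illustrative
        // ... build the two Puts from fields[0..7] and fields[8..15] and write them out ...
    }
}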
Re: Spill Failed Caused by ArrayIndexOutOfBoundsException
The error is happening during the sort-and-spill phase (org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill). It seems like you are trying to compare two int values and it fails during the compare:

Caused by: java.lang.ArrayIndexOutOfBoundsException: 99614720
at org.apache.hadoop.io.WritableComparator.readInt(WritableComparator.java:158)
at org.apache.hadoop.io.BooleanWritable$Comparator.compare(BooleanWritable.java:103)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.compare(MapTask.java:1116)
at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:95)
at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:59)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1404)

On Mon, Jan 6, 2014 at 3:21 PM, Paul Mahon pma...@decarta.com wrote: I have a hadoop program that I'm running with version 1.2.1 which fails in a peculiar place. Most mappers complete without error, but some fail with this stack trace:

java.io.IOException: Spill failed
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1297)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:698)
at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1793)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:779)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 99614720
at org.apache.hadoop.io.WritableComparator.readInt(WritableComparator.java:158)
at org.apache.hadoop.io.BooleanWritable$Comparator.compare(BooleanWritable.java:103)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.compare(MapTask.java:1116)
at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:95)
at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:59)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1404)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:858)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1349)

I've noticed that that array index is exactly the size of the bufvoid, but I'm not sure if that has any significance. The exception isn't happening in my WritableComparable or any of my code, it's all in hadoop. I'm not sure what to do to track down what I'm doing to cause the problem. Has anyone seen a problem like this or have any suggestions of where to look for the problem in my code?
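A hedged observation on the trace itself: BooleanWritable.write() serializes the value as a single byte, yet the failing frame shows its raw comparator calling WritableComparator.readInt(), i.e. reading four bytes, so a BooleanWritable key whose byte lands at the very end of the sort buffer could push the read past bufvoid. If (and only if) BooleanWritable is the map output key here, one workaround to try is a comparator that compares just that one byte; a sketch under that assumption:

import org.apache.hadoop.io.BooleanWritable;
import org.apache.hadoop.io.WritableComparator;

// Raw comparator that looks only at the single serialized byte of a BooleanWritable,
// avoiding the four-byte readInt() seen in the stack trace above.
// Register it on the job with: job.setSortComparatorClass(SingleByteBooleanComparator.class);
public class SingleByteBooleanComparator extends WritableComparator {

    public SingleByteBooleanComparator() {
        super(BooleanWritable.class);
    }

    @Override
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        // writeBoolean() emits exactly one byte (0 or 1), so that byte is the whole key.
        return (b1[s1] & 0xff) - (b2[s2] & 0xff);
    }
}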
Re: Understanding MapReduce source code : Flush operations
This is not in DFSClient. Before the output is written to HDFS, a lot of operations take place, like the reducer output in memory reaching 90% of the HDFS block size and then starting to flush the data. So my requirement is to look at that code, where I want to change the logic a bit to suit my needs. On Tue, Jan 7, 2014 at 12:41 AM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote: Assuming your output is going to HDFS, you want to look at DFSClient. Reducer uses FileSystem to write the output. You need to start looking at how DFSClient chunks the output and sends them across to the remote data-nodes. Thanks +Vinod On Jan 6, 2014, at 11:07 AM, nagarjuna kanamarlapudi nagarjuna.kanamarlap...@gmail.com wrote: I want to have a look at the code where of flush operations that happens after the reduce phase. Reducer writes the output to OutputFormat which inturn pushes that to memory and once it reaches 90% of chunk size it starts to flush the reducer output. I essentially want to look at the code of that flushing operation. What is the class(es) I need to look into On Mon, Jan 6, 2014 at 11:23 PM, Hardik Pandya smarty.ju...@gmail.com wrote: Please do not tell me since last 2.5 years you have not used virtual Hadoop environment to debug your Map Reduce application before deploying to Production environment No one can stop you looking at the code , Hadoop and its ecosystem is open-source On Mon, Jan 6, 2014 at 9:35 AM, nagarjuna kanamarlapudi nagarjuna.kanamarlap...@gmail.com wrote: -- Forwarded message -- From: nagarjuna kanamarlapudi nagarjuna.kanamarlap...@gmail.com Date: Mon, Jan 6, 2014 at 6:39 PM Subject: Understanding MapReduce source code : Flush operations To: mapreduce-u...@hadoop.apache.org Hi, I am using hadoop/ map reduce for aout 2.5 years. I want to understand the internals of the hadoop source code. Let me put my requirement very clear. I want to have a look at the code where of flush operations that happens after the reduce phase. Reducer writes the output to OutputFormat which inturn pushes that to memory and once it reaches 90% of chunk size it starts to flush the reducer output. I essentially want to look at the code of that flushing operation. Regards, Nagarjuna K
Re: unable to compile hadoop source code
On Jan 6, 2014 10:48 PM, nagarjuna kanamarlapudi nagarjuna.kanamarlap...@gmail.com wrote: Hi, I checked out the source code from https://svn.apache.org/repos/asf/hadoop/common/trunk/ I tried to compile the code with mvn. I am compiling this on Mac OS X Mavericks. Any help is appreciated. It failed at the following stage:

[INFO] Apache Hadoop Auth Examples ... SUCCESS [5.017s]
[INFO] Apache Hadoop Common .. FAILURE [1:39.797s]
[INFO] Apache Hadoop NFS . SKIPPED
[INFO] Apache Hadoop Common Project .. SKIPPED
[INFO]
[ERROR] Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:3.0.0-SNAPSHOT:protoc (compile-protoc) on project hadoop-common: org.apache.maven.plugin.MojoExecutionException: 'protoc --version' did not return a version - [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.

Thanks, Nagarjuna K
RE: unable to compile hadoop source code
You can read the build instructions for Hadoop: http://svn.apache.org/repos/asf/hadoop/common/trunk/BUILDING.txt For your problem, protoc is not on the PATH. After setting it, also recheck that the protobuf version is 2.5. From: nagarjuna kanamarlapudi [mailto:nagarjuna.kanamarlap...@gmail.com] Sent: 07 January 2014 09:18 To: user@hadoop.apache.org Subject: unable to compile hadoop source code Hi, I checked out the source code from https://svn.apache.org/repos/asf/hadoop/common/trunk/ I tried to compile the code with mvn. I am compiling this on a mac os X , mavericks. Any help is appreciated. It failed at the following stage [INFO] Apache Hadoop Auth Examples ... SUCCESS [5.017s] [INFO] Apache Hadoop Common .. FAILURE [1:39.797s] [INFO] Apache Hadoop NFS . SKIPPED [INFO] Apache Hadoop Common Project .. SKIPPED [INFO] [ERROR] Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:3.0.0-SNAPSHOT:protoc (compile-protoc) on project hadoop-common: org.apache.maven.plugin.MojoExecutionException: 'protoc --version' did not return a version - [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. Thanks, Nagarjuna K
Re: unable to compile hadoop source code
Thanks all, the following is required:

Download from http://code.google.com/p/protobuf/downloads/list
$ ./configure
$ make
$ make check
$ make install

Then compile the source code.

On Tue, Jan 7, 2014 at 9:46 AM, Rohith Sharma K S rohithsharm...@huawei.com wrote: You can read Build instructions for Hadoop. http://svn.apache.org/repos/asf/hadoop/common/trunk/BUILDING.txt For your problem, proto-buf not set in PATH. After setting, recheck proto-buffer version is 2.5 *From:* nagarjuna kanamarlapudi [mailto:nagarjuna.kanamarlap...@gmail.com] *Sent:* 07 January 2014 09:18 *To:* user@hadoop.apache.org *Subject:* unable to compile hadoop source code Hi, I checked out the source code from https://svn.apache.org/repos/asf/hadoop/common/trunk/ I tried to compile the code with mvn. I am compiling this on a mac os X , mavericks. Any help is appreciated. It failed at the following stage [INFO] Apache Hadoop Auth Examples ... SUCCESS [5.017s] [INFO] Apache Hadoop Common .. FAILURE [1:39.797s] [INFO] Apache Hadoop NFS . SKIPPED [INFO] Apache Hadoop Common Project .. SKIPPED [INFO] [ERROR] Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:3.0.0-SNAPSHOT:protoc (compile-protoc) on project hadoop-common: org.apache.maven.plugin.MojoExecutionException: 'protoc --version' did not return a version - [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. Thanks, Nagarjuna K
Re: Understanding MapReduce source code : Flush operations
What OutputFormat are you using? Once it reaches OutputFormat (specifically RecordWriter) it all depends on what the RecordWriter does. Are you using some OutputFormat with a RecordWriter that buffers like this? Thanks, +Vinod On Jan 6, 2014, at 7:11 PM, nagarjuna kanamarlapudi nagarjuna.kanamarlap...@gmail.com wrote: This is not in DFSClient. Before the output is written on to HDFS, lot of operations take place. Like reducer output in mem reaching 90% of HDFS block size, then starting to flush the data etc.., So, my requirement is to have a look at that code where in I want to change the logic a bit which suits my convenience. On Tue, Jan 7, 2014 at 12:41 AM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote: Assuming your output is going to HDFS, you want to look at DFSClient. Reducer uses FileSystem to write the output. You need to start looking at how DFSClient chunks the output and sends them across to the remote data-nodes. Thanks +Vinod On Jan 6, 2014, at 11:07 AM, nagarjuna kanamarlapudi nagarjuna.kanamarlap...@gmail.com wrote: I want to have a look at the code where of flush operations that happens after the reduce phase. Reducer writes the output to OutputFormat which inturn pushes that to memory and once it reaches 90% of chunk size it starts to flush the reducer output. I essentially want to look at the code of that flushing operation. What is the class(es) I need to look into On Mon, Jan 6, 2014 at 11:23 PM, Hardik Pandya smarty.ju...@gmail.com wrote: Please do not tell me since last 2.5 years you have not used virtual Hadoop environment to debug your Map Reduce application before deploying to Production environment No one can stop you looking at the code , Hadoop and its ecosystem is open-source On Mon, Jan 6, 2014 at 9:35 AM, nagarjuna kanamarlapudi nagarjuna.kanamarlap...@gmail.com wrote: -- Forwarded message -- From: nagarjuna kanamarlapudi nagarjuna.kanamarlap...@gmail.com Date: Mon, Jan 6, 2014 at 6:39 PM Subject: Understanding MapReduce source code : Flush operations To: mapreduce-u...@hadoop.apache.org Hi, I am using hadoop/ map reduce for aout 2.5 years. I want to understand the internals of the hadoop source code. Let me put my requirement very clear. I want to have a look at the code where of flush operations that happens after the reduce phase. Reducer writes the output to OutputFormat which inturn pushes that to memory and once it reaches 90% of chunk size it starts to flush the reducer output. I essentially want to look at the code of that flushing operation. Regards, Nagarjuna K
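To make Vinod's question concrete: the behaviour described (buffer reducer output in memory, flush when a threshold such as 90% of some chunk size is reached) would live in a custom RecordWriter, not in the stock path. Below is a minimal sketch of such a RecordWriter, assuming Text keys and values and an illustrative byte threshold; the built-in TextOutputFormat.LineRecordWriter instead writes each record straight to the task's FSDataOutputStream, and any further chunking happens below that in the HDFS client (DFSClient/DFSOutputStream).

import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

public class BufferingRecordWriter extends RecordWriter<Text, Text> {

    private final FSDataOutputStream out;
    private final int flushThresholdBytes;          // illustrative threshold
    private final StringBuilder buffer = new StringBuilder();

    public BufferingRecordWriter(FSDataOutputStream out, int flushThresholdBytes) {
        this.out = out;
        this.flushThresholdBytes = flushThresholdBytes;
    }

    @Override
    public void write(Text key, Text value) throws IOException {
        buffer.append(key).append('\t').append(value).append('\n');
        if (buffer.length() >= flushThresholdBytes) {
            spill();                                 // threshold reached: push to the stream
        }
    }

    private void spill() throws IOException {
        out.write(buffer.toString().getBytes("UTF-8"));
        buffer.setLength(0);
    }

    @Override
    public void close(TaskAttemptContext context) throws IOException {
        spill();                                     // flush whatever is left
        out.close();
    }
}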
Re: What makes a map to fail?
When I click on an individual map's logs, it says "Aggregation is not enabled. Try the nodemanager at slave1-machine:60933". How can I enable aggregation? On Sun, Jan 5, 2014 at 1:21 PM, Harsh J ha...@cloudera.com wrote: Every failed task typically carries a diagnostic message and a set of logs for you to investigate what caused it to fail. Try visiting the task's logs on the JT UI by clicking through individual failed attempts to find the reason of its failure. On Sun, Jan 5, 2014 at 11:03 PM, Saeed Adel Mehraban s.ade...@gmail.com wrote: Hi all, My task jobs are failing due to many failed maps. I want to know what makes a map to fail? Is it something like exceptions or what? -- Harsh J
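Since that message comes from the YARN web UI, log aggregation is a cluster-side setting: assuming an MR2/YARN setup, it is enabled in yarn-site.xml on the nodes and the NodeManagers (and JobHistoryServer) are restarted afterwards. Until then, the per-container logs are still available on each NodeManager's own web UI at the address shown in the message.

<!-- yarn-site.xml (all nodes); restart the NodeManagers after changing this -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>

Once enabled, aggregated logs for a finished job can also be pulled from the command line with: yarn logs -applicationId <application id>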