I need help building Hadoop 2.0 for windows

2014-01-06 Thread Steve Lewis
I usually run Hadoop on a Linux cluster but do most of my development in
single-machine mode under Windows.
This was fairly straightforward for 0.2. For 1.0 I needed to copy and fix
FileUtils, but for 2.0 I am expected to build two files from source:
WinUtils.exe and hadoop.dll. There is really only ONE serious Windows
configuration (64-bit Intel), and there is no good reason these files could not
be shipped in a binary distribution.
Is anyone using Hadoop 2.X and developing under Windows who can help?
My configuration builds WinUtils.exe but not hadoop.dll, and fails trying to
set permissions on a staging
file file:/tmp/hadoop-Steve/mapred/staging/Steve2116067144/.staging

I could use help and advice

-- 
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com


accessing hadoop filesystem from Tomcat

2014-01-06 Thread Henry Hung
Hi all,

I just want to confirm whether my understanding of the Hadoop FileSystem object is
correct or not.

From the source code of org.apache.hadoop.fs.FileSystem (either version
1.0.4 or 2.2.0), the method
public static FileSystem get(URI uri, Configuration conf) throws IOException

uses some sort of cache:
CACHE.get(uri, conf);

My understanding is that Tomcat usually creates multiple threads to handle HTTP
requests, and those threads will use the same FileSystem object (because of the
cache).
This will result in an error, right?

The next question is: if I want to disable the cache, should I just introduce a
new key, fs.hdfs.impl.disable.cache, and set its value to true?
And another key, fs.har.impl.disable.cache, for the HAR FileSystem?
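
(For reference, a minimal sketch of the two usual ways to get a per-caller
FileSystem instance with a 2.2.0 client. The fs.hdfs.impl.disable.cache property
and FileSystem.newInstance() are standard Hadoop APIs; the wrapper class below is
purely illustrative.)

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class UncachedFileSystems {

    // Option 1: disable the cache for the hdfs:// scheme, so FileSystem.get()
    // returns a fresh instance instead of the shared cached one.
    public static FileSystem getUncached(URI uri) throws Exception {
        Configuration conf = new Configuration();
        conf.setBoolean("fs.hdfs.impl.disable.cache", true);
        return FileSystem.get(uri, conf);
    }

    // Option 2: leave the cache enabled but bypass it explicitly.
    public static FileSystem getNewInstance(URI uri) throws Exception {
        return FileSystem.newInstance(uri, new Configuration());
    }

    // With either option the caller owns the returned instance and must close()
    // it; cached instances are shared and closed by Hadoop's shutdown hook.
}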

Best regards,
Henry




Re: XML to TEXT

2014-01-06 Thread Ranjini Rathinam
Hi,

Thanks a lot .

Ranjini

On Fri, Jan 3, 2014 at 10:40 PM, Diego Gutierrez 
diego.gutier...@ucsp.edu.pe wrote:

  Hi,

 I suggest using XPath, which is natively supported in Java for parsing XML and
 JSON formats.

 For the main problem, as with the distcp command
 ( http://hadoop.apache.org/docs/r0.19.0/distcp.pdf ), there is no need for a
 reduce function, because you can parse the XML input file and create the
 file you need in the map function. For example, the following code reads an
 XML file from HDFS, parses it, and creates a new file ( /result.txt ) with the
 expected format:
 id,name
 100,RR


 Mapper function:

 import java.io.ByteArrayInputStream;
 import java.io.IOException;
 import java.io.InputStream;
 import java.net.URI;

 import javax.xml.namespace.QName;
 import javax.xml.parsers.DocumentBuilder;
 import javax.xml.parsers.DocumentBuilderFactory;
 import javax.xml.parsers.ParserConfigurationException;
 import javax.xml.xpath.XPath;
 import javax.xml.xpath.XPathConstants;
 import javax.xml.xpath.XPathExpressionException;
 import javax.xml.xpath.XPathFactory;

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FSDataOutputStream;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.io.IOUtils;
 import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapreduce.Mapper;
 import org.w3c.dom.Document;
 import org.w3c.dom.Node;
 import org.w3c.dom.NodeList;
 import org.xml.sax.SAXException;

 import com.sun.org.apache.xml.internal.dtm.ref.DTMNodeList;

 public class XmlToTextMapper extends Mapper<LongWritable, Text, Text, Text> {

     private static final XPathFactory xpathFactory = XPathFactory.newInstance();

     @Override
     public void map(LongWritable key, Text value, Context context)
             throws IOException, InterruptedException {

         String resultFileName = "/result.txt";

         Configuration conf = new Configuration();
         FileSystem fs = FileSystem.get(URI.create(resultFileName), conf);
         FSDataOutputStream out = fs.create(new Path(resultFileName));

         InputStream resultIS = new ByteArrayInputStream(new byte[0]);

         String header = "id,name\n";
         out.write(header.getBytes());

         String xmlContent = value.toString();
         InputStream is = new ByteArrayInputStream(xmlContent.getBytes());
         DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
         DocumentBuilder builder;
         try {
             builder = factory.newDocumentBuilder();
             Document doc = builder.parse(is);
             DTMNodeList list = (DTMNodeList) getNode("/main/data", doc,
                     XPathConstants.NODESET);

             int size = list.getLength();
             for (int i = 0; i < size; i++) {
                 Node node = list.item(i);
                 String line = "";
                 NodeList nodeList = node.getChildNodes();
                 int childNumber = nodeList.getLength();
                 for (int j = 0; j < childNumber; j++) {
                     line += nodeList.item(j).getTextContent() + ",";
                 }
                 if (line.endsWith(","))
                     line = line.substring(0, line.length() - 1);
                 line += "\n";
                 out.write(line.getBytes());
             }

         } catch (ParserConfigurationException e) {
             MyLogguer.log("error: " + e.getMessage());
             e.printStackTrace();
         } catch (SAXException e) {
             MyLogguer.log("error: " + e.getMessage());
             e.printStackTrace();
         } catch (XPathExpressionException e) {
             MyLogguer.log("error: " + e.getMessage());
             e.printStackTrace();
         }

         IOUtils.copyBytes(resultIS, out, 4096, true);
         out.close();
     }

     public static Object getNode(String xpathStr, Node node, QName returnType)
             throws XPathExpressionException {
         XPath xpath = xpathFactory.newXPath();
         return xpath.evaluate(xpathStr, node, returnType);
     }
 }



 --
 Main class:


 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapreduce.Job;
 import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
 import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

 public class Main {

     public static void main(String[] args) throws Exception {

         if (args.length != 2) {
             System.err.println("Usage: XMLtoText <input path> <output path>");
             System.exit(-1);
         }

         Job job = new Job();
         job.setJarByClass(Main.class);
         job.setJobName("XML to Text");
         FileInputFormat.addInputPath(job, new Path(args[0]));
         FileOutputFormat.setOutputPath(job, new Path(args[1]));

         job.setMapperClass(XmlToTextMapper.class);
         job.setNumReduceTasks(0);
         job.setMapOutputKeyClass(Text.class);
         job.setMapOutputValueClass(Text.class);
         System.exit(job.waitForCompletion(true) ? 0 : 1);
     }
 }

 To execute the job you can use :

  bin/hadoop Main /data.xml 

Fine tuning

2014-01-06 Thread Ranjini Rathinam
 Hi,

I have an input file with 16 fields in it.

I need to load HBase tables using MapReduce code.

The first eight fields have to go into one HBase table and the last eight have to
go into another HBase table.

The data is loaded into the HBase table in 0.11 sec, but it slows down if any
lookup is added in the MapReduce code.
For example, the input file has one attribute named currency, and there is a
currency master table; both values need to be matched before printing.

The table load that involves the lookup takes a long time: for 13,250 records
it takes 59 mins.

How can I fine-tune this to reduce the loading time?

Please help.

Thanks in advance.

Ranjini.R


Re: XML to TEXT

2014-01-06 Thread Rajesh Nagaraju
Hi Ranjini,

Can you use Hive? Then you can just use XPath functions in your SELECT clause.

cheers
R+


On Mon, Jan 6, 2014 at 2:44 PM, Ranjini Rathinam ranjinibe...@gmail.comwrote:

 Hi,

 Thanks a lot .

 Ranjini


Hadoop permissions issue

2014-01-06 Thread Manikandan Saravanan
I’m trying to run Nutch 2.2.1 on a Hadoop 1.2.1 cluster. The fetch phase runs 
fine. But in the next job, this error comes up

java.lang.NullPointerException
at org.apache.avro.util.Utf8.<init>(Utf8.java:37)
at 
org.apache.nutch.crawl.GeneratorReducer.setup(GeneratorReducer.java:100)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
at 
org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

I’m running three nodes namely nutch1,2,3. The first one’s in the masters file 
and all are listed in the slaves file. The /etc/hosts file lists all machines 
along with their IP addresses. Can someone help me?

-- 
Manikandan Saravanan
Architect - Technology
TheSocialPeople

Understanding MapReduce source code : Flush operations

2014-01-06 Thread nagarjuna kanamarlapudi
Hi,

I have been using Hadoop/MapReduce for about 2.5 years. I want to understand the
internals of the Hadoop source code.

Let me put my requirement very clearly.

I want to look at the code for the flush operations that happen
after the reduce phase.

The reducer writes its output to the OutputFormat, which in turn pushes it to
memory, and once it reaches 90% of the chunk size it starts to flush the
reducer output.

I essentially want to look at the code of that flushing operation.




Regards,
Nagarjuna K


Fwd: Understanding MapReduce source code : Flush operations

2014-01-06 Thread nagarjuna kanamarlapudi
-- Forwarded message --
From: nagarjuna kanamarlapudi nagarjuna.kanamarlap...@gmail.com
Date: Mon, Jan 6, 2014 at 8:09 AM
Subject: Understanding MapReduce source code : Flush operations
To: mapreduce-u...@hadoop.apache.org


Hi,

I have been using Hadoop/MapReduce for about 2.5 years. I want to understand the
internals of the Hadoop source code.

Let me put my requirement very clearly.

I want to look at the code for the flush operations that happen
after the reduce phase.

The reducer writes its output to the OutputFormat, which in turn pushes it to
memory, and once it reaches 90% of the chunk size it starts to flush the
reducer output.

I essentially want to look at the code of that flushing operation.




Regards,
Nagarjuna K


Re: Hadoop permissions issue

2014-01-06 Thread Devin Suiter RDX
Based on the Exception type, it looks like something in your job is looking
for a valid value, and not finding it.

You will probably need to share the job code for people to help with this -
to my eyes, this doesn't appear to be a Hadoop configuration issue, or any
kind of problem with how the system is working.

Are you using Avro inputs and outputs? If your reduce is trying to parse an
Avro record, it may be that the field type is not correct, or maybe there
is a reference to an outside schema object that is not available...

If you provide more information about the context of the error (use case,
program goal, code block, something like that) then it is easier to help
you.



*Devin Suiter*
Jr. Data Solutions Software Engineer
100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
Google Voice: 412-256-8556 | www.rdx.com


On Mon, Jan 6, 2014 at 8:08 AM, Manikandan Saravanan 
manikan...@thesocialpeople.net wrote:

 I’m trying to run Nutch 2.2.1 on a Hadoop 1.2.1 cluster. The fetch phase
 runs fine. But in the next job, this error comes up

 java.lang.NullPointerException

 at org.apache.avro.util.Utf8.<init>(Utf8.java:37)

 at org.apache.nutch.crawl.GeneratorReducer.setup(GeneratorReducer.java:100)

 at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)

 at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)

 at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)

 at org.apache.hadoop.mapred.Child$4.run(Child.java:255)

 at java.security.AccessController.doPrivileged(Native Method)

 at javax.security.auth.Subject.doAs(Subject.java:415)

 at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)

 at org.apache.hadoop.mapred.Child.main(Child.java:249)


 I’m running three nodes namely nutch1,2,3. The first one’s in the masters
 file and all are listed in the slaves file. The /etc/hosts file lists all
 machines along with their IP addresses. Can someone help me?

 --
 Manikandan Saravanan
 Architect - Technology
 TheSocialPeople http://thesocialpeople.net



Re: Hadoop permissions issue

2014-01-06 Thread Manikandan Saravanan
I’m running Nutch 2.2.1 on a Hadoop cluster, crawling 5,000 links from the
DMOZ Open Directory Project. The reduce job stops at exactly 33% every time and
throws this exception. From the Nutch mailing list, it seems that my job
is stumbling on a repUrl value that is null.
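
(For reference only, not Nutch's actual code: the stack trace points at the Avro
Utf8 constructor, which throws a NullPointerException when handed a null String,
so a guard of the following kind is what ultimately avoids this error. The class
and parameter names below are illustrative.)

import org.apache.avro.util.Utf8;

public final class Utf8Guard {

    // new Utf8((String) null) NPEs inside the constructor, which is what the
    // trace at org.apache.avro.util.Utf8.<init>(Utf8.java:37) suggests.
    public static Utf8 toUtf8OrDefault(String value, String fallback) {
        return new Utf8(value != null ? value : fallback);
    }
}
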
-- 
Manikandan Saravanan
Architect - Technology
TheSocialPeople

On 6 January 2014 at 7:14:41 pm, Devin Suiter RDX (dsui...@rdx.com) wrote:

Based on the Exception type, it looks like something in your job is looking for 
a valid value, and not finding it.

You will probably need to share the job code for people to help with this - to 
my eyes, this doesn't appear to be a Hadoop configuration issue, or any kind of 
problem with how the system is working.

Are you using Avro inputs and outputs? If your reduce is trying to parse an 
Avro record, it may be that the field type is not correct, or maybe there is a 
reference to an outside schema object that is not available...

If you provide more information about the context of the error (use case, 
program goal, code block, something like that) then it is easier to help you.



Devin Suiter
Jr. Data Solutions Software Engineer

100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
Google Voice: 412-256-8556 | www.rdx.com


On Mon, Jan 6, 2014 at 8:08 AM, Manikandan Saravanan 
manikan...@thesocialpeople.net wrote:
I’m trying to run Nutch 2.2.1 on a Hadoop 1.2.1 cluster. The fetch phase runs 
fine. But in the next job, this error comes up

java.lang.NullPointerException
at org.apache.avro.util.Utf8.<init>(Utf8.java:37)
at org.apache.nutch.crawl.GeneratorReducer.setup(GeneratorReducer.java:100)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

I’m running three nodes namely nutch1,2,3. The first one’s in the masters file 
and all are listed in the slaves file. The /etc/hosts file lists all machines 
along with their IP addresses. Can someone help me?

-- 
Manikandan Saravanan
Architect - Technology
TheSocialPeople



Fwd: Understanding MapReduce source code : Flush operations

2014-01-06 Thread nagarjuna kanamarlapudi
-- Forwarded message --
From: nagarjuna kanamarlapudi nagarjuna.kanamarlap...@gmail.com
Date: Mon, Jan 6, 2014 at 6:39 PM
Subject: Understanding MapReduce source code : Flush operations
To: mapreduce-u...@hadoop.apache.org


Hi,

I have been using Hadoop/MapReduce for about 2.5 years. I want to understand the
internals of the Hadoop source code.

Let me put my requirement very clearly.

I want to look at the code for the flush operations that happen
after the reduce phase.

The reducer writes its output to the OutputFormat, which in turn pushes it to
memory, and once it reaches 90% of the chunk size it starts to flush the
reducer output.

I essentially want to look at the code of that flushing operation.




Regards,
Nagarjuna K


Re: Understanding MapReduce source code : Flush operations

2014-01-06 Thread Hardik Pandya
Please do not tell me that in the last 2.5 years you have not used a virtual
Hadoop environment to debug your MapReduce applications before deploying to
a production environment.

No one can stop you from looking at the code; Hadoop and its ecosystem are
open source.


On Mon, Jan 6, 2014 at 9:35 AM, nagarjuna kanamarlapudi 
nagarjuna.kanamarlap...@gmail.com wrote:



 -- Forwarded message --
 From: nagarjuna kanamarlapudi nagarjuna.kanamarlap...@gmail.com
 Date: Mon, Jan 6, 2014 at 6:39 PM
 Subject: Understanding MapReduce source code : Flush operations
 To: mapreduce-u...@hadoop.apache.org


 Hi,

 I have been using Hadoop/MapReduce for about 2.5 years. I want to understand
 the internals of the Hadoop source code.

 Let me put my requirement very clearly.

 I want to look at the code for the flush operations that happen
 after the reduce phase.

 The reducer writes its output to the OutputFormat, which in turn pushes it to
 memory, and once it reaches 90% of the chunk size it starts to flush the
 reducer output.

 I essentially want to look at the code of that flushing operation.




 Regards,
 Nagarjuna K




Re: Fine tuning

2014-01-06 Thread Hardik Pandya
Can you please share how you are doing the lookup?
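
(In the meantime, a minimal sketch of the usual fix, assuming the currency master
table is small enough to fit in memory: scan it once in setup() and do the match
against a HashMap instead of issuing an HBase read per record. The table, column
family, qualifier and field index below are assumptions, not taken from the
original post.)

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CachedLookupMapper extends Mapper<LongWritable, Text, Text, Text> {

    private final Map<String, String> currencyByCode = new HashMap<String, String>();

    @Override
    protected void setup(Context context) throws IOException {
        // Scan the small master table once per mapper and keep it in memory.
        HTable master = new HTable(context.getConfiguration(), "currency_master");
        ResultScanner scanner = master.getScanner(new Scan());
        try {
            for (Result r : scanner) {
                currencyByCode.put(Bytes.toString(r.getRow()),
                        Bytes.toString(r.getValue(Bytes.toBytes("d"), Bytes.toBytes("code"))));
            }
        } finally {
            scanner.close();
            master.close();
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");
        String currency = currencyByCode.get(fields[8]);   // in-memory lookup, no RPC per record
        if (currency != null) {
            context.write(new Text(fields[0]), new Text(currency));
        }
    }
}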




On Mon, Jan 6, 2014 at 4:23 AM, Ranjini Rathinam ranjinibe...@gmail.comwrote:

 Hi,

 I have an input file with 16 fields in it.

 I need to load HBase tables using MapReduce code.

 The first eight fields have to go into one HBase table and the last eight
 have to go into another HBase table.

 The data is loaded into the HBase table in 0.11 sec, but it slows down if any
 lookup is added in the MapReduce code.
 For example, the input file has one attribute named currency, and there is a
 currency master table; both values need to be matched before printing.

 The table load that involves the lookup takes a long time: for 13,250 records
 it takes 59 mins.

 How can I fine-tune this to reduce the loading time?

 Please help.

 Thanks in advance.

 Ranjini.R





Re: Spill Failed Caused by ArrayIndexOutOfBoundsException

2014-01-06 Thread Hardik Pandya
The error is happening during Sort And Spill phase

org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill

It seems like you are trying to compare two Int values and it fails during
compare

Caused by: java.lang.ArrayIndexOutOfBoundsException: 99614720
at
org.apache.hadoop.io.WritableComparator.readInt(WritableComparator.java:158)
at
org.apache.hadoop.io.BooleanWritable$Comparator.
compare(BooleanWritable.java:103)
at
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.compare(MapTask.java:1116)
at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:95)
at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:59)
at
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:
1404)
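
(Not a diagnosis of this particular job, just an illustration of the contract that
has to hold at spill time: the bytes written by the map output key's write() must
be exactly what the comparator registered for that key class reads back, otherwise
the compare can index past the serialization buffer. A minimal sketch with assumed
field names follows.)

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

public class FlagAndIdKey implements WritableComparable<FlagAndIdKey> {
    private boolean flag;
    private int id;

    public void set(boolean flag, int id) { this.flag = flag; this.id = id; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeBoolean(flag);   // 1 byte
        out.writeInt(id);         // 4 bytes
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        flag = in.readBoolean();
        id = in.readInt();
    }

    @Override
    public int compareTo(FlagAndIdKey other) {
        if (flag != other.flag) return flag ? 1 : -1;
        return (id < other.id) ? -1 : (id == other.id ? 0 : 1);
    }

    @Override
    public int hashCode() { return 31 * (flag ? 1 : 0) + id; }

    /** Raw comparator that reads exactly the same 1 + 4 byte layout as write(). */
    public static class Comparator extends WritableComparator {
        public Comparator() { super(FlagAndIdKey.class); }

        @Override
        public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
            int c = (b1[s1] & 0xff) - (b2[s2] & 0xff);   // the boolean byte
            if (c != 0) return c;
            int i1 = readInt(b1, s1 + 1);                // the int that follows it
            int i2 = readInt(b2, s2 + 1);
            return (i1 < i2) ? -1 : (i1 == i2 ? 0 : 1);
        }
    }

    static {
        WritableComparator.define(FlagAndIdKey.class, new Comparator());
    }
}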


On Mon, Jan 6, 2014 at 3:21 PM, Paul Mahon pma...@decarta.com wrote:

 I have a hadoop program that I'm running with version 1.2.1 which
 fails in a peculiar place. Most mappers complete without error, but
 some fail with this stack trace:

 java.io.IOException: Spill failed
 at
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1297)
 at
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:698)
 at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1793)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:779)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at

 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
 at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.lang.ArrayIndexOutOfBoundsException: 99614720
 at

 org.apache.hadoop.io.WritableComparator.readInt(WritableComparator.java:158)
 at

 org.apache.hadoop.io.BooleanWritable$Comparator.compare(BooleanWritable.java:103)
 at
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.compare(MapTask.java:1116)
 at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:95)
 at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:59)
 at

 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1404)
 at

 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:858)
 at

 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1349)

 I've noticed that that array index is exactly the size of the bufvoid,
 but I'm not sure if that has any significance.

 The exception isn't happening in my WritableComparable or any of my
 code, it's all in hadoop. I'm not sure what to do to track down what
 I'm doing to cause the problem. Has anyone seen a problem like this or
 have any suggestions of where to look for the problem in my code?



Re: Understanding MapReduce source code : Flush operations

2014-01-06 Thread nagarjuna kanamarlapudi
This is not in DFSClient.

Before the output is written to HDFS, a lot of operations take place,
like the reducer output in memory reaching 90% of the HDFS block size and then
starting to flush the data, and so on.

So my requirement is to look at that code, where I want to change
the logic a bit to suit my needs.


On Tue, Jan 7, 2014 at 12:41 AM, Vinod Kumar Vavilapalli 
vino...@hortonworks.com wrote:

 Assuming your output is going to HDFS, you want to look at DFSClient.

 Reducer uses FileSystem to write the output. You need to start looking at
 how DFSClient chunks the output and sends them across to the remote
 data-nodes.

 Thanks
 +Vinod

 On Jan 6, 2014, at 11:07 AM, nagarjuna kanamarlapudi 
 nagarjuna.kanamarlap...@gmail.com wrote:

 I want to look at the code for the flush operations that happen
 after the reduce phase.

 The reducer writes its output to the OutputFormat, which in turn pushes it to
 memory, and once it reaches 90% of the chunk size it starts to flush the
 reducer output.

 I essentially want to look at the code of that flushing operation.


 What are the class(es) I need to look into?


 On Mon, Jan 6, 2014 at 11:23 PM, Hardik Pandya smarty.ju...@gmail.comwrote:

 Please do not tell me since last 2.5 years you have not used virtual
 Hadoop environment to debug your Map Reduce application before deploying to
 Production environment

 No one can stop you looking at the code , Hadoop and its ecosystem is
 open-source


 On Mon, Jan 6, 2014 at 9:35 AM, nagarjuna kanamarlapudi 
 nagarjuna.kanamarlap...@gmail.com wrote:



 -- Forwarded message --
 From: nagarjuna kanamarlapudi nagarjuna.kanamarlap...@gmail.com
  Date: Mon, Jan 6, 2014 at 6:39 PM
 Subject: Understanding MapReduce source code : Flush operations
 To: mapreduce-u...@hadoop.apache.org


  Hi,

 I am using hadoop/ map reduce for aout 2.5 years. I want to understand
 the internals of the hadoop source code.

 Let me put my requirement very clear.

 I want to have a look at the code where of flush operations that happens
 after the reduce phase.

 Reducer writes the output to OutputFormat which inturn pushes that to
 memory and once it reaches 90% of chunk size it starts to flush the reducer
 output.

 I essentially want to look at the code of that flushing operation.




 Regards,
 Nagarjuna K







Re: unable to compile hadoop source code

2014-01-06 Thread Diego Gutierrez
On Jan 6, 2014 at 10:48 PM, nagarjuna kanamarlapudi
nagarjuna.kanamarlap...@gmail.com wrote:

 Hi,
 I checked out the source code from
https://svn.apache.org/repos/asf/hadoop/common/trunk/

 I tried to compile the code with mvn.

 I am compiling this on a mac os X , mavericks.  Any help is appreciated.

 It failed at the following stage


 [INFO] Apache Hadoop Auth Examples ... SUCCESS
[5.017s]

 [INFO] Apache Hadoop Common .. FAILURE
[1:39.797s]

 [INFO] Apache Hadoop NFS . SKIPPED

 [INFO] Apache Hadoop Common Project .. SKIPPED








 [INFO]


 [ERROR] Failed to execute goal
org.apache.hadoop:hadoop-maven-plugins:3.0.0-SNAPSHOT:protoc
(compile-protoc) on project hadoop-common:
org.apache.maven.plugin.MojoExecutionException: 'protoc --version' did not
return a version - [Help 1]

 [ERROR]

 [ERROR] To see the full stack trace of the errors, re-run Maven with the
-e switch.

 [ERROR] Re-run Maven using the -X switch to enable full debug logging.



 Thanks,
 Nagarjuna K



RE: unable to compile hadoop source code

2014-01-06 Thread Rohith Sharma K S
You can read the build instructions for Hadoop:
http://svn.apache.org/repos/asf/hadoop/common/trunk/BUILDING.txt

For your problem, protoc is not set in the PATH. After setting it, recheck that the
protobuf version is 2.5.

From: nagarjuna kanamarlapudi [mailto:nagarjuna.kanamarlap...@gmail.com]
Sent: 07 January 2014 09:18
To: user@hadoop.apache.org
Subject: unable to compile hadoop source code

Hi,
I checked out the source code from 
https://svn.apache.org/repos/asf/hadoop/common/trunk/

I tried to compile the code with mvn.

I am compiling this on a mac os X , mavericks.  Any help is appreciated.

It failed at the following stage



[INFO] Apache Hadoop Auth Examples ... SUCCESS [5.017s]

[INFO] Apache Hadoop Common .. FAILURE [1:39.797s]

[INFO] Apache Hadoop NFS . SKIPPED

[INFO] Apache Hadoop Common Project .. SKIPPED








[INFO] 

[ERROR] Failed to execute goal 
org.apache.hadoop:hadoop-maven-plugins:3.0.0-SNAPSHOT:protoc (compile-protoc) 
on project hadoop-common: org.apache.maven.plugin.MojoExecutionException: 
'protoc --version' did not return a version - [Help 1]

[ERROR]

[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.

[ERROR] Re-run Maven using the -X switch to enable full debug logging.


Thanks,
Nagarjuna K



Re: unable to compile hadoop source code

2014-01-06 Thread nagarjuna kanamarlapudi
Thanks all, the following is what was required:

Download from http://code.google.com/p/protobuf/downloads/list


$ ./configure
$ make
$ make check
$ make install


Then compile the source code


On Tue, Jan 7, 2014 at 9:46 AM, Rohith Sharma K S rohithsharm...@huawei.com
 wrote:

  You can read Build instructions for Hadoop.

 http://svn.apache.org/repos/asf/hadoop/common/trunk/BUILDING.txt



 For your problem, protoc is not set in the PATH. After setting it, recheck that
 the protobuf version is 2.5.



 *From:* nagarjuna kanamarlapudi [mailto:nagarjuna.kanamarlap...@gmail.com]

 *Sent:* 07 January 2014 09:18
 *To:* user@hadoop.apache.org
 *Subject:* unable to compile hadoop source code



 Hi,

 I checked out the source code from
 https://svn.apache.org/repos/asf/hadoop/common/trunk/



 I tried to compile the code with mvn.



 I am compiling this on a mac os X , mavericks.  Any help is appreciated.



 It failed at the following stage





 [INFO] Apache Hadoop Auth Examples ... SUCCESS [5.017s]

 [INFO] Apache Hadoop Common .. FAILURE
 [1:39.797s]

 [INFO] Apache Hadoop NFS . SKIPPED

 [INFO] Apache Hadoop Common Project .. SKIPPED















 [INFO]
 

 [ERROR] Failed to execute goal
 org.apache.hadoop:hadoop-maven-plugins:3.0.0-SNAPSHOT:protoc
 (compile-protoc) on project hadoop-common:
 org.apache.maven.plugin.MojoExecutionException: 'protoc --version' did not
 return a version - [Help 1]

 [ERROR]

 [ERROR] To see the full stack trace of the errors, re-run Maven with the
 -e switch.

 [ERROR] Re-run Maven using the -X switch to enable full debug logging.





 Thanks,

 Nagarjuna K





Re: Understanding MapReduce source code : Flush operations

2014-01-06 Thread Vinod Kumar Vavilapalli

What OutputFormat are you using?

Once the output reaches the OutputFormat (specifically its RecordWriter), it all
depends on what the RecordWriter does. Are you using some OutputFormat with a
RecordWriter that buffers like this?

Thanks,
+Vinod
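
(For reference, a minimal sketch of where such buffering would live if an
OutputFormat chose to do it. This is illustrative only; the stock TextOutputFormat
writes each record straight through to the FSDataOutputStream it gets from
FileSystem.create(), and the chunking into packets happens below that, inside the
HDFS client output stream.)

import java.io.IOException;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BufferingTextOutputFormat extends FileOutputFormat<Text, Text> {

    @Override
    public RecordWriter<Text, Text> getRecordWriter(TaskAttemptContext context)
            throws IOException {
        Path file = getDefaultWorkFile(context, ".txt");
        FileSystem fs = file.getFileSystem(context.getConfiguration());
        final FSDataOutputStream out = fs.create(file, false);
        final int threshold = 4 * 1024 * 1024;   // illustrative 4 MB buffer

        return new RecordWriter<Text, Text>() {
            private final StringBuilder buffer = new StringBuilder();

            @Override
            public void write(Text key, Text value) throws IOException {
                buffer.append(key).append('\t').append(value).append('\n');
                if (buffer.length() >= threshold) {
                    flushBuffer();               // the "flush operation" in question
                }
            }

            private void flushBuffer() throws IOException {
                out.write(buffer.toString().getBytes("UTF-8"));
                buffer.setLength(0);
            }

            @Override
            public void close(TaskAttemptContext ctx) throws IOException {
                flushBuffer();
                out.close();
            }
        };
    }
}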

On Jan 6, 2014, at 7:11 PM, nagarjuna kanamarlapudi 
nagarjuna.kanamarlap...@gmail.com wrote:

 This is not in DFSClient.

 Before the output is written to HDFS, a lot of operations take place,
 like the reducer output in memory reaching 90% of the HDFS block size and then
 starting to flush the data, and so on.

 So my requirement is to look at that code, where I want to change
 the logic a bit to suit my needs.
 
 
 On Tue, Jan 7, 2014 at 12:41 AM, Vinod Kumar Vavilapalli 
 vino...@hortonworks.com wrote:
 Assuming your output is going to HDFS, you want to look at DFSClient.
 
 Reducer uses FileSystem to write the output. You need to start looking at how 
 DFSClient chunks the output and sends them across to the remote data-nodes.
 
 Thanks
 +Vinod
 
 On Jan 6, 2014, at 11:07 AM, nagarjuna kanamarlapudi 
 nagarjuna.kanamarlap...@gmail.com wrote:
 
 I want to have a look at the code where of flush operations that happens 
 after the reduce phase.
 
 Reducer writes the output to OutputFormat which inturn pushes that to memory 
 and once it reaches 90% of chunk size it starts to flush the reducer output. 
 
 I essentially want to look at the code of that flushing operation.
 
 
 What is the class(es) I need to look into 
 
 
 On Mon, Jan 6, 2014 at 11:23 PM, Hardik Pandya smarty.ju...@gmail.com 
 wrote:
 Please do not tell me since last 2.5 years you have not used virtual Hadoop 
 environment to debug your Map Reduce application before deploying to 
 Production environment
 
 No one can stop you looking at the code , Hadoop and its ecosystem is 
 open-source
 
 
 On Mon, Jan 6, 2014 at 9:35 AM, nagarjuna kanamarlapudi 
 nagarjuna.kanamarlap...@gmail.com wrote:
 
 
 -- Forwarded message --
 From: nagarjuna kanamarlapudi nagarjuna.kanamarlap...@gmail.com
 Date: Mon, Jan 6, 2014 at 6:39 PM
 Subject: Understanding MapReduce source code : Flush operations
 To: mapreduce-u...@hadoop.apache.org
 
 
 Hi,
 
 I am using hadoop/ map reduce for aout 2.5 years. I want to understand the 
 internals of the hadoop source code. 
 
 Let me put my requirement very clear.
 
 I want to have a look at the code where of flush operations that happens 
 after the reduce phase.
 
 Reducer writes the output to OutputFormat which inturn pushes that to memory 
 and once it reaches 90% of chunk size it starts to flush the reducer output. 
 
 I essentially want to look at the code of that flushing operation.
 
 
 
 
 Regards,
 Nagarjuna K
 
 
 
 
 
 




Re: What makes a map to fail?

2014-01-06 Thread Saeed Adel Mehraban
When I click on an individual map's logs, it says "Aggregation is not enabled.
Try the nodemanager at slave1-machine:60933".
How can I enable aggregation?


On Sun, Jan 5, 2014 at 1:21 PM, Harsh J ha...@cloudera.com wrote:

 Every failed task typically carries a diagnostic message and a set of
 logs for you to investigate what caused it to fail. Try visiting the
 task's logs on the JT UI by clicking through individual failed
 attempts to find the reason of its failure.

 On Sun, Jan 5, 2014 at 11:03 PM, Saeed Adel Mehraban s.ade...@gmail.com
 wrote:
  Hi all,
  My jobs are failing due to many failed maps. I want to know what makes
  a map fail. Is it something like exceptions, or what?



 --
 Harsh J