RE: Hadoop - non disk based sorting?

2011-11-29 Thread Ravi teja ch n v
Hi Mingxi,

From your stack trace, I understand that the OutOfMemoryError actually occurred
while copying the map outputs, not while sorting them.

Since your map outputs are huge and your reducer does not have enough heap memory,
you hit the problem.
When you increased the number of reducers to 200, your map outputs were partitioned
among 200 reducers, so each reducer received a smaller share and the problem did not occur.

By raising the maximum heap of your reducer JVM with mapred.child.java.opts, you can
get past this problem.
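
For example, a minimal sketch of setting it per job through the old-API JobConf
(the driver class name and the -Xmx value here are only illustrations, not taken
from your job):

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class LargeShuffleJob {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(LargeShuffleJob.class);
        conf.setJobName("large-shuffle-example");
        // Give every child (map and reduce) task JVM a larger heap; pick a value
        // that fits within the memory available per task slot on your nodes.
        conf.set("mapred.child.java.opts", "-Xmx1024m");
        // ... set mapper, reducer, input/output paths as usual ...
        JobClient.runJob(conf);
      }
    }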

Regards,
Ravi teja



From: Mingxi Wu [mingxi...@turn.com]
Sent: 30 November 2011 05:14:49
To: common-dev@hadoop.apache.org
Subject: Hadoop - non disk based sorting?

Hi,

I have a question regarding the shuffle phase of reducer.

It appears that when the map output is large (in my case, 5 billion records), I
get an out-of-memory error like the one below.

Error: java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1592)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1452)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1301)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1233)

However, I thought the shuffle phase used a disk-based sort, which is not
constrained by memory.
So why would a user run into this OutOfMemoryError? After I increased the number
of reducers from 100 to 200, the problem went away.

Any input regarding this memory issue would be appreciated!

Thanks,

Mingxi


RE: Hadoop - non disk based sorting?

2011-12-01 Thread Ravi teja ch n v
Hi Mingxi,

>So why, when map outputs are huge, is the reducer not able to copy them?

The reducer copies each map output into its in-memory buffer. When the reducer JVM
does not have enough memory to accommodate the map output, this leads to an
OutOfMemoryError.

>Could you please explain the function of mapred.child.java.opts? How does it
>relate to the copy phase?

The map and reduce tasks run in separate child JVMs launched by the TaskTrackers.
When the TaskTracker launches a map or reduce JVM, it passes the value of
mapred.child.java.opts as the JVM arguments for the new child JVM.
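
To make that concrete, here is a simplified, illustrative sketch of what "used as
JVM arguments" means; this is not the real TaskTracker code, and the class and
method names below are made up:

    import java.util.ArrayList;
    import java.util.List;

    public class ChildJvmCommandSketch {
      // Split the property value on whitespace and splice the tokens into the
      // child JVM's command line.
      public static List<String> buildCommand(String childJavaOpts, String taskMainClass) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        for (String opt : childJavaOpts.trim().split("\\s+")) {
          cmd.add(opt);                 // e.g. -Xmx512m, -verbose:gc, ...
        }
        cmd.add(taskMainClass);         // the task runner main class
        return cmd;
      }

      public static void main(String[] args) {
        // With mapred.child.java.opts = "-Xmx512m", the child would be started
        // roughly as: java -Xmx512m org.apache.hadoop.mapred.Child
        System.out.println(buildCommand("-Xmx512m", "org.apache.hadoop.mapred.Child"));
      }
    }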

Regards,
Ravi Teja

From: Mingxi Wu [mingxi...@turn.com]
Sent: 01 December 2011 12:37:54
To: common-dev@hadoop.apache.org
Subject: RE: Hadoop - non disk based sorting?

Thanks Ravi.

So why, when map outputs are huge, is the reducer not able to copy them?

Could you please explain the function of mapred.child.java.opts? How does it
relate to the copy phase?

Thank you,

Mingxi

-----Original Message-----
From: Ravi teja ch n v [mailto:raviteja.c...@huawei.com]
Sent: Tuesday, November 29, 2011 9:46 PM
To: common-dev@hadoop.apache.org
Subject: RE: Hadoop - non disk based sorting?

Hi Mingxi,

From your stack trace, I understand that the OutOfMemoryError actually occurred
while copying the map outputs, not while sorting them.

Since your map outputs are huge and your reducer does not have enough heap memory,
you hit the problem.
When you increased the number of reducers to 200, your map outputs were partitioned
among 200 reducers, so each reducer received a smaller share and the problem did not occur.

By raising the maximum heap of your reducer JVM with mapred.child.java.opts, you can
get past this problem.

Regards,
Ravi teja



From: Mingxi Wu [mingxi...@turn.com]
Sent: 30 November 2011 05:14:49
To: common-dev@hadoop.apache.org
Subject: Hadoop - non disk based sorting?

Hi,

I have a question regarding the shuffle phase of reducer.

It appears that when the map output is large (in my case, 5 billion records), I
get an out-of-memory error like the one below.

Error: java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1592)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1452)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1301)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1233)

However, I thought the shuffle phase used a disk-based sort, which is not
constrained by memory.
So why would a user run into this OutOfMemoryError? After I increased the number
of reducers from 100 to 200, the problem went away.

Any input regarding this memory issue would be appreciated!

Thanks,

Mingxi


RE: Hadoop - non disk based sorting?

2011-12-01 Thread Ravi teja ch n v
Hi Bobby,

You are right that the map outputs, when copied, will be spilled to disk, but only
in the case where the reducer cannot accommodate the copy in memory
(shuffleInMemory and shuffleToDisk are chosen by the RAM manager based on the
in-memory size).

But according to the stack trace provided by Mingxi,
>org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1592)
the problem occurred after the in-memory copy was chosen.
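
A simplified sketch of that in-memory vs. on-disk decision follows. The real logic
lives in ReduceTask's ReduceCopier/ShuffleRamManager; the 25% single-segment limit
and the buffer sizing below are illustrative assumptions, not verified constants:

    public class ShuffleDecisionSketch {
      private final long maxInMemBytes;   // shuffle buffer: a fraction of the reducer heap
      private static final double MAX_SINGLE_SEGMENT_FRACTION = 0.25; // assumed per-output limit

      public ShuffleDecisionSketch(long reducerHeapBytes, double shuffleInputBufferPercent) {
        this.maxInMemBytes = (long) (reducerHeapBytes * shuffleInputBufferPercent);
      }

      /** Returns true if this map output would be shuffled into memory, false if to disk. */
      public boolean shuffleInMemory(long mapOutputBytes) {
        return mapOutputBytes < maxInMemBytes * MAX_SINGLE_SEGMENT_FRACTION;
      }

      public static void main(String[] args) {
        // A 1 GB reducer heap with 70% reserved for shuffle buffering:
        ShuffleDecisionSketch s = new ShuffleDecisionSketch(1L << 30, 0.70);
        System.out.println(s.shuffleInMemory(10L << 20));   // small segment -> in memory
        System.out.println(s.shuffleInMemory(500L << 20));  // large segment -> to disk
      }
    }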

Regards,
Ravi Teja

From: Robert Evans [ev...@yahoo-inc.com]
Sent: 01 December 2011 21:44:50
To: common-dev@hadoop.apache.org
Subject: Re: Hadoop - non disk based sorting?

Mingxi,

My understanding was that, just like with the maps, when a reducer's in-memory
buffer fills up it too will spill to disk as part of the sort.  In fact I think it
uses the exact same code for doing the sort as the map does.  There may be an issue
where your sort buffer is somehow too large for the amount of heap that you
requested as part of mapred.child.java.opts.  I have personally run a reduce that
took in 300 GB of data, which it successfully sorted, to test this very thing.  And
no, the box did not have 300 GB of RAM.
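
For reference, a sketch of the reduce-side copy/merge knobs I am thinking of, with
property names as I recall them from the 0.20/1.x era mapred-default.xml (the
values shown are just the usual defaults; double-check both the names and the
defaults against your release):

    import org.apache.hadoop.mapred.JobConf;

    public class ShuffleTuningSketch {
      public static JobConf tune(JobConf conf) {
        // Fraction of the reducer heap used to buffer map outputs during the copy phase.
        conf.setFloat("mapred.job.shuffle.input.buffer.percent", 0.70f);
        // Usage threshold of that buffer at which an in-memory merge (spill to disk) starts.
        conf.setFloat("mapred.job.shuffle.merge.percent", 0.66f);
        // Number of in-memory map outputs that also triggers a merge.
        conf.setInt("mapred.inmem.merge.threshold", 1000);
        // Fraction of heap allowed to retain map outputs when the reduce begins (0 = spill all).
        conf.setFloat("mapred.job.reduce.input.buffer.percent", 0.0f);
        return conf;
      }
    }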

--Bobby Evans

On 12/1/11 4:12 AM, "Ravi teja ch n v"  wrote:

Hi Mingxi,

>So why, when map outputs are huge, is the reducer not able to copy them?

The reducer copies each map output into its in-memory buffer. When the reducer JVM
does not have enough memory to accommodate the map output, this leads to an
OutOfMemoryError.

>Could you please explain the function of mapred.child.java.opts? How does it
>relate to the copy phase?

The map and reduce tasks run in separate child JVMs launched by the TaskTrackers.
When the TaskTracker launches a map or reduce JVM, it passes the value of
mapred.child.java.opts as the JVM arguments for the new child JVM.

Regards,
Ravi Teja

From: Mingxi Wu [mingxi...@turn.com]
Sent: 01 December 2011 12:37:54
To: common-dev@hadoop.apache.org
Subject: RE: Hadoop - non disk based sorting?

Thanks Ravi.

So why, when map outputs are huge, is the reducer not able to copy them?

Could you please explain the function of mapred.child.java.opts? How does it
relate to the copy phase?

Thank you,

Mingxi

-----Original Message-----
From: Ravi teja ch n v [mailto:raviteja.c...@huawei.com]
Sent: Tuesday, November 29, 2011 9:46 PM
To: common-dev@hadoop.apache.org
Subject: RE: Hadoop - non disk based sorting?

Hi Mingxi,

From your stack trace, I understand that the OutOfMemoryError actually occurred
while copying the map outputs, not while sorting them.

Since your map outputs are huge and your reducer does not have enough heap memory,
you hit the problem.
When you increased the number of reducers to 200, your map outputs were partitioned
among 200 reducers, so each reducer received a smaller share and the problem did not occur.

By raising the maximum heap of your reducer JVM with mapred.child.java.opts, you can
get past this problem.

Regards,
Ravi teja



From: Mingxi Wu [mingxi...@turn.com]
Sent: 30 November 2011 05:14:49
To: common-dev@hadoop.apache.org
Subject: Hadoop - non disk based sorting?

Hi,

I have a question regarding the shuffle phase of reducer.

It appears that when the map output is large (in my case, 5 billion records), I
get an out-of-memory error like the one below.

Error: java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1592)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1452)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1301)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1233)

However, I thought the shuffle phase used a disk-based sort, which is not
constrained by memory.
So why would a user run into this OutOfMemoryError? After I increased the number
of reducers from 100 to 200, the problem went away.

Any input regarding this memory issue would be appreciated!

Thanks,

Mingxi


407 error while building Hadoop

2012-01-12 Thread Ravi teja ch n v


Hi Team,

I have run into a problem building Hadoop behind a proxy.
My Linux machine has the Maven proxy settings configured and working fine, but
the build fails with the following error, in spite of passing the username and
password:

mvn package -Pdist -Dtar -Dhttp.proxyHost=***.com  -Dhttp.proxyPort=8080 
-Dhttp.proxyUser=  -Dhttp.proxyPass=



main:
[INFO] Executed tasks
[INFO]
[INFO] --- maven-antrun-plugin:1.6:run (xprepare-package-hadoop-daemon) @ 
hadoop-hdfs ---
[INFO] Executing tasks

main:
 [get] Getting: 
http://archive.apache.org/dist/commons/daemon/binaries/1.0.3/linux/commons-daemon-1.0.3-bin-linux-i686.tar.gz
 [get] To: 
/home/isap/.hudson/jobs/Hadoop/workspace/hadoop-hdfs-project/hadoop-hdfs/downloads/commons-daemon-1.0.3-bin-linux-i686.tar.gz
 [get] Error opening connection java.io.IOException: Server returned HTTP 
response code: 407 for URL: 
http://archive.apache.org/dist/commons/daemon/binaries/1.0.3/linux/commons-daemon-1.0.3-bin-linux-i686.tar.gz
 [get] Error opening connection java.io.IOException: Server returned HTTP 
response code: 407 for URL: 
http://archive.apache.org/dist/commons/daemon/binaries/1.0.3/linux/commons-daemon-1.0.3-bin-linux-i686.tar.gz
 [get] Error opening connection java.io.IOException: Server returned HTTP 
response code: 407 for URL: 
http://archive.apache.org/dist/commons/daemon/binaries/1.0.3/linux/commons-daemon-1.0.3-bin-linux-i686.tar.gz
 [get] Can't get 
http://archive.apache.org/dist/commons/daemon/binaries/1.0.3/linux/commons-daemon-1.0.3-bin-linux-i686.tar.gz
 to 
/home/isap/.hudson/jobs/Hadoop/workspace/hadoop-hdfs-project/hadoop-hdfs/downloads/commons-daemon-1.0.3-bin-linux-i686.tar.gz

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Hadoop HDFS ................................ FAILURE [1:36.333s]
[INFO] Apache Hadoop HttpFS .............................. SKIPPED
[INFO] Apache Hadoop HDFS BookKeeper Journal ............. SKIPPED





Any help would be highly appreciated.



Thanks and Regards,

Ravi Teja




[jira] [Created] (HADOOP-13124) Support weights/priority for user in Faircallqueue

2016-05-10 Thread Ravi Teja Ch N V (JIRA)
Ravi Teja Ch N V created HADOOP-13124:
-

 Summary: Support weights/priority for user in Faircallqueue
 Key: HADOOP-13124
 URL: https://issues.apache.org/jira/browse/HADOOP-13124
 Project: Hadoop Common
  Issue Type: New Feature
Affects Versions: 2.6.0
Reporter: Ravi Teja Ch N V


FairCallQueue evaluates fairness across all user submissions.
This can be unfair to users who have higher priority/importance than others, and to
users whose usage is higher than others.

Having per-user priorities or weights in FairCallQueue would enable weighted fair
sharing, which would support these scenarios.
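
As an illustration only (this is not existing FairCallQueue code; the class, fields,
and threshold values below are hypothetical), one way weights could feed into the
priority decision is to scale each user's observed call count by a configured weight
before comparing it against the demotion thresholds:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class WeightedUserPrioritySketch {
      private final Map<String, Double> weights = new ConcurrentHashMap<>();    // user -> configured weight
      private final Map<String, Long> callCounts = new ConcurrentHashMap<>();   // user -> recent call count
      private final long[] thresholds = {100, 1000, 10000};                     // illustrative demotion thresholds

      public int priorityLevelFor(String user) {
        double weight = weights.getOrDefault(user, 1.0);
        long calls = callCounts.getOrDefault(user, 0L);
        // A higher weight lowers the effective usage, so the user stays in a
        // higher-priority sub-queue for longer.
        double effectiveUsage = calls / Math.max(weight, 0.001);
        for (int level = 0; level < thresholds.length; level++) {
          if (effectiveUsage < thresholds[level]) {
            return level;                // 0 is the highest-priority queue
          }
        }
        return thresholds.length;        // lowest priority
      }
    }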






[jira] [Created] (HADOOP-7926) Test-patch should have maven.test.failure.ignore,maven.test.error.ignore to run all the tests even in case of failure/error.

2011-12-14 Thread Ravi Teja Ch N V (Created) (JIRA)
Test-patch should have maven.test.failure.ignore,maven.test.error.ignore to run 
all the tests even in case of failure/error.


 Key: HADOOP-7926
 URL: https://issues.apache.org/jira/browse/HADOOP-7926
 Project: Hadoop Common
  Issue Type: Bug
  Components: build
Affects Versions: 0.23.0
Reporter: Ravi Teja Ch N V


This approach will help surface all the failures, even if some test cases fail.
