Re: Which branch for my patch?

2011-12-01 Thread Niels Basjes
Thanks,

I'll get busy creating a new patch over the next few days.

Niels Basjes

On Wed, Nov 30, 2011 at 18:51, Eli Collins e...@cloudera.com wrote:

 Hey Niels,

 Thanks for contributing.  The best place to contribute new features is
 trunk. It's currently an easy merge from trunk to branch-0.23 to get
 it into a 23.x release (you can set the JIRA's target version to 23.1 to
 indicate this).

 Your patch based on the old structure would be useful for backporting
 this feature from trunk to a release with the old structure (e.g. 1.x,
 0.22). To request inclusion in a 1.x release, set the target version to
 1.1.0 (and generate a patch against branch-1). To request inclusion in
 0.22, set the target version to 0.22.0 (and generate a patch against
 branch-0.22).

 Thanks,
 Eli

 On Wed, Nov 30, 2011 at 8:23 AM, Niels Basjes ni...@basjes.nl wrote:
  Hi all,
 
  A while ago I created a feature for Hadoop and submitted it for
  inclusion (HADOOP-7076).
  Around the same time the MRv2 work started and the entire source tree
  was restructured.
 
  At this moment I'm prepared to update the patch I created earlier so I
  can submit it again for your consideration.
 
  Because of the email about the new branches (branch-1 and branch-1.0), I'm
  a bit puzzled about where to start.
 
  I see the mentioned branches and trunk as probable starting points.
 
  As far as I understand the repository structure, branch-1 is the basis
  for the old-style Hadoop and trunk is the basis for the YARN-based Hadoop.
 
  For which branch of the source tree should I make my changes so you can
  reevaluate it for inclusion?
 
  Thanks.
 
  --
  Best regards / Met vriendelijke groeten,
 
  Niels Basjes




-- 
Best regards / Met vriendelijke groeten,

Niels Basjes


[jira] [Created] (HADOOP-7876) Allow access to BlockKey/DelegationKey encoded key for RPC over protobuf

2011-12-01 Thread Suresh Srinivas (Created) (JIRA)
Allow access to BlockKey/DelegationKey encoded key for RPC over protobuf


 Key: HADOOP-7876
 URL: https://issues.apache.org/jira/browse/HADOOP-7876
 Project: Hadoop Common
  Issue Type: New Feature
  Components: ipc
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
 Fix For: 0.24.0


In order to support RPC over protobuf, the BlockKey needs to provide access to 
the encoded key. The encoded key will be transported over protobuf as a byte[] 
instead of as a SecretKey.
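
A minimal sketch of the kind of accessor this implies (illustrative only; the
class shape, field names, and method name here are assumptions, not the
committed API):

  public class DelegationKeySketch {
    private int keyId;
    private long expiryDate;
    private byte[] keyBytes; // serialized form of the javax.crypto.SecretKey

    public DelegationKeySketch(int keyId, long expiryDate, byte[] keyBytes) {
      this.keyId = keyId;
      this.expiryDate = expiryDate;
      this.keyBytes = keyBytes;
    }

    // Expose the already-encoded key so an RPC translator can copy it
    // straight into a protobuf "bytes" field.
    public byte[] getEncodedKey() {
      return keyBytes;
    }
  }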

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Build failed in Jenkins: Hadoop-Common-0.23-Build #82

2011-12-01 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hadoop-Common-0.23-Build/82/changes

Changes:

[mahadev] MAPREDUCE-3452. fifoscheduler web ui page always shows 0% used for 
the queue. (Jonathan Eagles via mahadev) - Merging r1208999 from trunk

[mahadev] MAPREDUCE-3463. Second AM fails to recover properly when first AM is 
killed with java.lang.IllegalArgumentException causing lost job. (Siddharth 
Seth via mahadev) - Merging r1208994 from trunk

[jitendra] Merged r1208926 from trunk for HADOOP-7854.

[mahadev] MAPREDUCE-3488. Streaming jobs are failing because the main class 
isn't set in the pom files. (mahadev) - Merging r1208796 from trunk

[tucu] Merge -r 1208767:1208768 from trunk to branch. FIXES: MAPREDUCE-3477

[tucu] Merge -r 1208750:1208751 from trunk to branch. FIXES: HADOOP-7853

--
[...truncated 8043 lines...]
[INFO] Installing 
https://builds.apache.org/job/Hadoop-Common-0.23-Build/ws/trunk/hadoop-dist/target/hadoop-dist-0.23.1-SNAPSHOT.jar
 to 
/home/jenkins/.m2/repository/org/apache/hadoop/hadoop-dist/0.23.1-SNAPSHOT/hadoop-dist-0.23.1-SNAPSHOT.jar
[INFO] Installing 
https://builds.apache.org/job/Hadoop-Common-0.23-Build/ws/trunk/hadoop-dist/pom.xml
 to 
/home/jenkins/.m2/repository/org/apache/hadoop/hadoop-dist/0.23.1-SNAPSHOT/hadoop-dist-0.23.1-SNAPSHOT.pom
[INFO] 
[INFO] Reactor Summary:
[INFO] 
[INFO] Apache Hadoop Main  SUCCESS [1.228s]
[INFO] Apache Hadoop Project POM . SUCCESS [0.383s]
[INFO] Apache Hadoop Annotations . SUCCESS [1.293s]
[INFO] Apache Hadoop Project Dist POM  SUCCESS [0.361s]
[INFO] Apache Hadoop Assemblies .. SUCCESS [0.163s]
[INFO] Apache Hadoop Auth  SUCCESS [2.040s]
[INFO] Apache Hadoop Auth Examples ... SUCCESS [1.007s]
[INFO] Apache Hadoop Common .. SUCCESS [24.999s]
[INFO] Apache Hadoop Common Project .. SUCCESS [0.028s]
[INFO] Apache Hadoop HDFS  SUCCESS [20.155s]
[INFO] Apache Hadoop HDFS Project  SUCCESS [0.030s]
[INFO] hadoop-yarn ... SUCCESS [0.118s]
[INFO] hadoop-yarn-api ... SUCCESS [6.767s]
[INFO] hadoop-yarn-common  SUCCESS [9.124s]
[INFO] hadoop-yarn-server  SUCCESS [0.065s]
[INFO] hadoop-yarn-server-common . SUCCESS [2.889s]
[INFO] hadoop-yarn-server-nodemanager  SUCCESS [5.512s]
[INFO] hadoop-yarn-server-web-proxy .. SUCCESS [2.508s]
[INFO] hadoop-yarn-server-resourcemanager  SUCCESS [6.705s]
[INFO] hadoop-yarn-server-tests .. SUCCESS [0.929s]
[INFO] hadoop-mapreduce-client ... SUCCESS [0.052s]
[INFO] hadoop-mapreduce-client-core .. SUCCESS [10.496s]
[INFO] hadoop-yarn-applications .. SUCCESS [0.059s]
[INFO] hadoop-yarn-applications-distributedshell . SUCCESS [1.883s]
[INFO] hadoop-yarn-site .. SUCCESS [0.102s]
[INFO] hadoop-mapreduce-client-common  SUCCESS [6.297s]
[INFO] hadoop-mapreduce-client-shuffle ... SUCCESS [1.564s]
[INFO] hadoop-mapreduce-client-app ... SUCCESS [5.824s]
[INFO] hadoop-mapreduce-client-hs  SUCCESS [2.345s]
[INFO] hadoop-mapreduce-client-jobclient . SUCCESS [2.595s]
[INFO] Apache Hadoop MapReduce Examples .. SUCCESS [2.838s]
[INFO] hadoop-mapreduce .. SUCCESS [0.081s]
[INFO] Apache Hadoop MapReduce Streaming . SUCCESS [3.162s]
[INFO] Apache Hadoop Tools ... SUCCESS [0.050s]
[INFO] Apache Hadoop Distribution  SUCCESS [0.092s]
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 2:04.838s
[INFO] Finished at: Thu Dec 01 09:05:11 UTC 2011
[INFO] Final Memory: 151M/914M
[INFO] 
+ cd hadoop-common-project
+ /home/jenkins/tools/maven/latest/bin/mvn clean verify checkstyle:checkstyle 
findbugs:findbugs -DskipTests -Pdist -Dtar -Psrc -Pnative -Pdocs
[INFO] Scanning for projects...
[INFO] 
[INFO] Reactor Build Order:
[INFO] 
[INFO] Apache Hadoop Annotations
[INFO] Apache Hadoop Auth
[INFO] Apache Hadoop Auth Examples
[INFO] Apache Hadoop Common
[INFO] Apache Hadoop Common Project
[INFO] 

RE: Hadoop - non disk based sorting?

2011-12-01 Thread Ravi teja ch n v
Hi Mingxi,

So why, when map outputs are huge, is the reducer not able to copy them?

The Reducer will copy the map output into its in-memory buffer. When the 
Reducer JVM doesn't have enough memory to accommodate the map output, it 
leads to an OutOfMemoryError.

Can you please explain what the function of mapred.child.java.opts is? 
How does it relate to the copy?

The map and reduce tasks are launched in separate child JVMs at the 
TaskTrackers.
When the TaskTracker launches a map or reduce JVM, it uses 
mapred.child.java.opts as the JVM arguments for the new child JVM.
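
For example, a job could raise that heap through its configuration; a minimal
sketch, assuming the classic API and an example 1 GB value (you can set the
same property in mapred-site.xml instead):

  import org.apache.hadoop.mapred.JobConf;

  public class ChildOptsExample {
    public static void main(String[] args) {
      JobConf conf = new JobConf();
      // Give every map/reduce child JVM a 1 GB heap (example value only).
      conf.set("mapred.child.java.opts", "-Xmx1024m");
      System.out.println(conf.get("mapred.child.java.opts"));
    }
  }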

Regards,
Ravi Teja

From: Mingxi Wu [mingxi...@turn.com]
Sent: 01 December 2011 12:37:54
To: common-dev@hadoop.apache.org
Subject: RE: Hadoop - non disk based sorting?

Thanks Ravi.

So why, when map outputs are huge, is the reducer not able to copy them?

Can you please explain what the function of mapred.child.java.opts is? 
How does it relate to the copy?

Thank you,

Mingxi

-Original Message-
From: Ravi teja ch n v [mailto:raviteja.c...@huawei.com]
Sent: Tuesday, November 29, 2011 9:46 PM
To: common-dev@hadoop.apache.org
Subject: RE: Hadoop - non disk based sorting?

Hi Mingxi,

From your stack trace, I understand that the OutOfMemoryError actually 
occurred while copying the map outputs, not while sorting them.

Since your map outputs are huge and your reducer does not have enough heap 
memory, you got the problem.
When you increased the number of reducers to 200, your map outputs were 
partitioned among 200 reducers, so you didn't hit this problem.

By setting the max heap of your reducer with mapred.child.java.opts, you can 
get over this problem.

Regards,
Ravi teja



From: Mingxi Wu [mingxi...@turn.com]
Sent: 30 November 2011 05:14:49
To: common-dev@hadoop.apache.org
Subject: Hadoop - non disk based sorting?

Hi,

I have a question regarding the shuffle phase of reducer.

It appears that when the map output is large (in my case, 5 billion records), I 
get an out-of-memory error like the one below.

Error: java.lang.OutOfMemoryError: Java heap space at 
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1592)
 at 
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1452)
 at 
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1301)
 at 
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1233)

However, I thought the shuffle phase uses a disk-based sort, which is not 
constrained by memory.
So why would a user run into this OutOfMemoryError? After I increased my number 
of reducers from 100 to 200, the problem went away.

Any input regarding this memory issue would be appreciated!

Thanks,

Mingxi


Re: how to check which scheduler is currently running on hadoop

2011-12-01 Thread shivam tiwari
Is there any other way to check that another scheduler is actually running,
and not the default one, other than checking the mapred-site.xml file?

On Thu, Dec 1, 2011 at 2:10 AM, Praveen Sripati praveensrip...@gmail.comwrote:

 Hi,

 Check the mapreduce.jobtracker.taskscheduler property in
 mapred-site.xml; if it's not set, then check what it defaults to.

 Praveen

 On Thu, Dec 1, 2011 at 5:07 AM, shivam tiwari
 shivam.tiwari2...@gmail.comwrote:

  Hi,
 
  please tell me how I can check which scheduler is currently running on
  hadoop
 
  --
  Regards
 
  Shivam Tiwari
  Graduate student
  CISE Department
  University of  Florida,
  Gainesville FL 32611
  Email - shi...@cise.ufl.edu
 shivam.tiwari2...@gmail.com
 




-- 
Regards

Shivam Tiwari
Graduate student
CISE Department
University of  Florida,
Gainesville FL 32611
Email - shi...@cise.ufl.edu
shivam.tiwari2...@gmail.com


Re: how to check which scheduler is currently running on hadoop

2011-12-01 Thread Harsh J
Shivam,

Visit JobTrackerHost/conf to see the taskScheduler config that's in effect at 
runtime.
Visit JobTrackerHost/scheduler to see a scheduler web UI, if the scheduler has 
put one up (the default does not provide any; others may).

May I ask why you're looking to confirm? Is something not working the way you 
expect it to?
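
If you also want to check from code, a minimal sketch along these lines should
work for the classic JobTracker (the property name and default vary by version,
so treat them as assumptions; this only reflects your local configuration,
while /conf shows what the running JobTracker actually loaded):

  import org.apache.hadoop.mapred.JobConf;

  public class WhichScheduler {
    public static void main(String[] args) {
      JobConf conf = new JobConf();
      // Falls back to the classic FIFO scheduler when nothing is configured.
      String scheduler = conf.get("mapred.jobtracker.taskScheduler",
          "org.apache.hadoop.mapred.JobQueueTaskScheduler");
      System.out.println("Configured task scheduler: " + scheduler);
    }
  }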

On 01-Dec-2011, at 5:33 PM, shivam tiwari wrote:

 Is there any other way to check that another scheduler is actually
 running, and not the default one, other than checking the mapred-site.xml
 file?
 
 On Thu, Dec 1, 2011 at 2:10 AM, Praveen Sripati 
 praveensrip...@gmail.comwrote:
 
 Hi,
 
 Check the mapreduce.jobtracker.taskscheduler property in
 mapred-site.xml; if it's not set, then check what it defaults to.
 
 Praveen
 
 On Thu, Dec 1, 2011 at 5:07 AM, shivam tiwari
 shivam.tiwari2...@gmail.comwrote:
 
 Hi,
 
 please tell me how I can check which scheduler is currently running on
 hadoop
 
 --
 Regards
 
 Shivam Tiwari
 Graduate student
 CISE Department
 University of  Florida,
 Gainesville FL 32611
 Email - shi...@cise.ufl.edu
   shivam.tiwari2...@gmail.com
 
 
 
 
 
 -- 
 Regards
 
 Shivam Tiwari
 Graduate student
 CISE Department
 University of  Florida,
 Gainesville FL 32611
 Email - shi...@cise.ufl.edu
shivam.tiwari2...@gmail.com



Re: how to check which scheduler is currently running on hadoop

2011-12-01 Thread Steve Loughran

On 01/12/11 12:03, shivam tiwari wrote:

Is there any other way to check that another scheduler is actually running,
and not the default one, other than checking the mapred-site.xml file?



If you are really worried you could kill -QUIT the JT process and look 
at the stack traces it dumps.


Re: Hadoop - non disk based sorting?

2011-12-01 Thread Todd Lipcon
I've seen this issue in jobs with very many map tasks and small
reducer heaps. There is some heap space needed for the actual map
completion events, etc., and that isn't accounted for when determining
when to spill the fetched outputs to disk. It would be a nice patch to add
code that calculates the in-memory size of these objects during the
fetch phase and subtracts it from the heap size before multiplying
out the spill percentages, etc.
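
A very rough sketch of the arithmetic being suggested (names and numbers below
are made up for illustration; this is not the actual shuffle code):

  public class ShuffleBudgetSketch {
    public static void main(String[] args) {
      long maxHeap = Runtime.getRuntime().maxMemory();  // reduce JVM heap (-Xmx)
      long eventOverhead = 64L * 1024 * 1024;           // assumed size of map-completion
                                                        // events and other bookkeeping
      double shuffleBufferPercent = 0.70;               // e.g. mapred.job.shuffle.input.buffer.percent

      // Subtract the bookkeeping overhead before sizing the in-memory shuffle buffer.
      long usableHeap = Math.max(0, maxHeap - eventOverhead);
      long inMemoryShuffleLimit = (long) (usableHeap * shuffleBufferPercent);
      System.out.println("In-memory shuffle budget: " + inMemoryShuffleLimit + " bytes");
    }
  }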

-Todd

On Thu, Dec 1, 2011 at 8:14 AM, Robert Evans ev...@yahoo-inc.com wrote:
 Mingxi,

 My understanding was that, just like with the maps, when a reducer's in-memory 
 buffer fills up it too will spill to disk as part of the sort.  In 
 fact I think it uses the exact same code for doing the sort as the map does.  
 There may be an issue where your sort buffer is somehow too large for the 
 amount of heap that you requested as part of mapred.child.java.opts.  I 
 have personally run a reduce that took in 300GB of data, which it 
 successfully sorted, to test this very thing.  And no, the box did not have 
 300 GB of RAM.

 --Bobby Evans

 On 12/1/11 4:12 AM, Ravi teja ch n v raviteja.c...@huawei.com wrote:

 Hi Mingxi ,

So why, when map outputs are huge, is the reducer not able to copy them?

 The Reducer will copy the map output into its in-memory buffer. When the 
 Reducer JVM doesn't have enough memory to accommodate the map output, it 
 leads to an OutOfMemoryError.

Can you please explain what the function of mapred.child.java.opts is? 
How does it relate to the copy?

 The map and reduce tasks are launched in separate child JVMs at the 
 TaskTrackers.
 When the TaskTracker launches a map or reduce JVM, it uses 
 mapred.child.java.opts as the JVM arguments for the new child JVM.

 Regards,
 Ravi Teja
 
 From: Mingxi Wu [mingxi...@turn.com]
 Sent: 01 December 2011 12:37:54
 To: common-dev@hadoop.apache.org
 Subject: RE: Hadoop - non disk based sorting?

 Thanks Ravi.

 So why, when map outputs are huge, is the reducer not able to copy them?

 Can you please explain what the function of mapred.child.java.opts is? 
 How does it relate to the copy?

 Thank you,

 Mingxi

 -Original Message-
 From: Ravi teja ch n v [mailto:raviteja.c...@huawei.com]
 Sent: Tuesday, November 29, 2011 9:46 PM
 To: common-dev@hadoop.apache.org
 Subject: RE: Hadoop - non disk based sorting?

 Hi Mingxi,

 From your stack trace, I understand that the OutOfMemoryError actually 
 occurred while copying the map outputs, not while sorting them.

 Since your map outputs are huge and your reducer does not have enough heap 
 memory, you got the problem.
 When you increased the number of reducers to 200, your map outputs were 
 partitioned among 200 reducers, so you didn't hit this problem.

 By setting the max heap of your reducer with mapred.child.java.opts, you 
 can get over this problem.

 Regards,
 Ravi teja


 
 From: Mingxi Wu [mingxi...@turn.com]
 Sent: 30 November 2011 05:14:49
 To: common-dev@hadoop.apache.org
 Subject: Hadoop - non disk based sorting?

 Hi,

 I have a question regarding the shuffle phase of reducer.

 It appears that when the map output is large (in my case, 5 billion records), I 
 get an out-of-memory error like the one below.

 Error: java.lang.OutOfMemoryError: Java heap space at 
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1592)
  at 
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1452)
  at 
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1301)
  at 
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1233)

 However, I thought the shuffle phase uses a disk-based sort, which is not 
 constrained by memory.
 So why would a user run into this OutOfMemoryError? After I increased my 
 number of reducers from 100 to 200, the problem went away.

 Any input regarding this memory issue would be appreciated!

 Thanks,

 Mingxi




-- 
Todd Lipcon
Software Engineer, Cloudera


[jira] [Resolved] (HADOOP-7877) Federation: update Balancer documentation

2011-12-01 Thread Tsz Wo (Nicholas), SZE (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-7877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE resolved HADOOP-7877.


    Resolution: Fixed
 Fix Version/s: 0.23.1, 0.24.0
  Hadoop Flags: Reviewed

I have committed this.

 Federation: update Balancer documentation
 -

 Key: HADOOP-7877
 URL: https://issues.apache.org/jira/browse/HADOOP-7877
 Project: Hadoop Common
  Issue Type: Task
  Components: documentation
Affects Versions: 0.23.0
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Fix For: 0.24.0, 0.23.1

 Attachments: h1685_20111201.patch, h1685_20111201b.patch, screenshot 
 for the updated cli doc.jpg


 Update Balancer documentation for the new balancing policy and CLI.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: how to check which scheduler is currently running on hadoop

2011-12-01 Thread Arun Murthy
Depending on the version of Hadoop you are using, you can go to
http://jthost:50030/scheduler to check.

This will work from hadoop-0.20.203 onwards.

Arun

Sent from my iPhone

On Nov 30, 2011, at 5:38 PM, shivam tiwari shivam.tiwari2...@gmail.com wrote:

 Hi,

 please tell me how I can check which scheduler is currently running on
 hadoop

 --
 Regards

 Shivam Tiwari
 Graduate student
 CISE Department
 University of  Florida,
 Gainesville FL 32611
 Email - shi...@cise.ufl.edu
shivam.tiwari2...@gmail.com


Re: Snow Leopard Compilation Help

2011-12-01 Thread Alejandro Abdelnur
Ron,

Hadoop native code currently does not compile on Mac OS X. There have been some
JIRAs to fix that, but nobody has taken them on.

Thanks.

Alejandro

On Thu, Dec 1, 2011 at 3:55 PM, Ronald Petty ronald.pe...@gmail.com wrote:

 Hello,

 I am new to Hadoop development and seem to be stuck on building with Snow
 Leopard.  Here is what is going on:

   1. svn checkout
 http://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-trunk
   2. wget http://protobuf.googlecode.com/files/protobuf-2.4.1.tar.gz
   3. tar ... proto...gz; cd proto...
   4. ./configure --prefix=/hadoop/contribute/protobuf/;make;make install
   5. export PATH=/hadoop/contribute/protobuf/bin/:$PATH
   6. cd hadoop-trunk
   7. mvn clean
   8. mvn install -Dmaven.test.skip.exec=true
   9. mvn assembly:assembly -Pnative
   10. Error

 [INFO] --- make-maven-plugin:1.0-beta-1:make-install (compile) @
 hadoop-common ---
 [INFO] /bin/sh ./libtool  --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H -I.
  -I/Library/Java/Home/include

 -I/hadoop/contribute/hadoop-trunk/hadoop-common-project/hadoop-common/target/native/src

 -I/hadoop/contribute/hadoop-trunk/hadoop-common-project/hadoop-common/target/native/javah
 -I/usr/local/include -g -Wall -fPIC -O2 -m64 -g -O2 -MT ZlibCompressor.lo
 -MD -MP -MF .deps/ZlibCompressor.Tpo -c -o ZlibCompressor.lo `test -f
 'src/org/apache/hadoop/io/compress/zlib/ZlibCompressor.c' || echo
 './'`src/org/apache/hadoop/io/compress/zlib/ZlibCompressor.c
 [INFO] libtool: compile:  gcc -DHAVE_CONFIG_H -I.
 -I/Library/Java/Home/include

 -I/hadoop/contribute/hadoop-trunk/hadoop-common-project/hadoop-common/target/native/src

 -I/hadoop/contribute/hadoop-trunk/hadoop-common-project/hadoop-common/target/native/javah
 -I/usr/local/include -g -Wall -fPIC -O2 -m64 -g -O2 -MT ZlibCompressor.lo
 -MD -MP -MF .deps/ZlibCompressor.Tpo -c
 src/org/apache/hadoop/io/compress/zlib/ZlibCompressor.c  -fno-common -DPIC
 -o .libs/ZlibCompressor.o
 [INFO] src/org/apache/hadoop/io/compress/zlib/ZlibCompressor.c: In function
 'Java_org_apache_hadoop_io_compress_zlib_ZlibCompressor_initIDs':
 [INFO] src/org/apache/hadoop/io/compress/zlib/ZlibCompressor.c:71: error:
 'libnotfound' undeclared (first use in this function)
 [INFO] src/org/apache/hadoop/io/compress/zlib/ZlibCompressor.c:71: error:
 (Each undeclared identifier is reported only once
 [INFO] src/org/apache/hadoop/io/compress/zlib/ZlibCompressor.c:71: error:
 for each function it appears in.)
 [INFO] make: *** [ZlibCompressor.lo] Error 1
 [INFO]
 
 [INFO] Reactor Summary:
 [INFO]
 [INFO] Apache Hadoop Main  FAILURE
 [46.914s]
 [INFO] Apache Hadoop Project POM . SKIPPED
 [INFO] Apache Hadoop Annotations . SKIPPED

 

 I looked around and found this:
 http://wiki.apache.org/hadoop/UsingLzoCompression.  I tried installing lzo
 via MacPorts; it seems to be there, but I am not certain where to go from
 here.

 Also, how do you search the mail archives (
 http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/)?

 Thanks for the help.

 Kindest regards.

 Ron



Re: Snow Leopard Compilation Help

2011-12-01 Thread Milind.Bhandarkar
Ronald,

Please take a look at https://issues.apache.org/jira/browse/HADOOP-7147,
and https://issues.apache.org/jira/browse/HADOOP-7824

- milind


On 12/1/11 5:31 PM, Ronald Petty ronald.pe...@gmail.com wrote:

Alejandro,

I suppose I will give it a go since that is the computer I have.  I tried
searching JIRA for Mac-related issues, but it's hard for me to tell which
ones might be related or not.  Should I just figure it out and email the
list with my fix (if I find one)?

Ron

On Thu, Dec 1, 2011 at 7:23 PM, Alejandro Abdelnur
t...@cloudera.comwrote:

 Ron,

  Hadoop native code currently does not compile on Mac OS X. There have been
  some JIRAs to fix that, but nobody has taken them on.

 Thanks.

 Alejandro

 On Thu, Dec 1, 2011 at 3:55 PM, Ronald Petty ronald.pe...@gmail.com wrote:

  Hello,

  I am new to Hadoop development and seem to be stuck on building with Snow
  Leopard.  Here is what is going on:

    1. svn checkout
  http://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-trunk
    2. wget http://protobuf.googlecode.com/files/protobuf-2.4.1.tar.gz
    3. tar ... proto...gz; cd proto...
    4. ./configure --prefix=/hadoop/contribute/protobuf/;make;make install
    5. export PATH=/hadoop/contribute/protobuf/bin/:$PATH
    6. cd hadoop-trunk
    7. mvn clean
    8. mvn install -Dmaven.test.skip.exec=true
    9. mvn assembly:assembly -Pnative
    10. Error

  [INFO] --- make-maven-plugin:1.0-beta-1:make-install (compile) @
  hadoop-common ---
  [INFO] /bin/sh ./libtool  --tag=CC   --mode=compile gcc -DHAVE_CONFIG_H -I.
  -I/Library/Java/Home/include
  -I/hadoop/contribute/hadoop-trunk/hadoop-common-project/hadoop-common/target/native/src
  -I/hadoop/contribute/hadoop-trunk/hadoop-common-project/hadoop-common/target/native/javah
  -I/usr/local/include -g -Wall -fPIC -O2 -m64 -g -O2 -MT ZlibCompressor.lo
  -MD -MP -MF .deps/ZlibCompressor.Tpo -c -o ZlibCompressor.lo `test -f
  'src/org/apache/hadoop/io/compress/zlib/ZlibCompressor.c' || echo
  './'`src/org/apache/hadoop/io/compress/zlib/ZlibCompressor.c
  [INFO] libtool: compile:  gcc -DHAVE_CONFIG_H -I.
  -I/Library/Java/Home/include
  -I/hadoop/contribute/hadoop-trunk/hadoop-common-project/hadoop-common/target/native/src
  -I/hadoop/contribute/hadoop-trunk/hadoop-common-project/hadoop-common/target/native/javah
  -I/usr/local/include -g -Wall -fPIC -O2 -m64 -g -O2 -MT ZlibCompressor.lo
  -MD -MP -MF .deps/ZlibCompressor.Tpo -c
  src/org/apache/hadoop/io/compress/zlib/ZlibCompressor.c  -fno-common -DPIC
  -o .libs/ZlibCompressor.o
  [INFO] src/org/apache/hadoop/io/compress/zlib/ZlibCompressor.c: In function
  'Java_org_apache_hadoop_io_compress_zlib_ZlibCompressor_initIDs':
  [INFO] src/org/apache/hadoop/io/compress/zlib/ZlibCompressor.c:71: error:
  'libnotfound' undeclared (first use in this function)
  [INFO] src/org/apache/hadoop/io/compress/zlib/ZlibCompressor.c:71: error:
  (Each undeclared identifier is reported only once
  [INFO] src/org/apache/hadoop/io/compress/zlib/ZlibCompressor.c:71: error:
  for each function it appears in.)
  [INFO] make: *** [ZlibCompressor.lo] Error 1
  [INFO]

  [INFO] Reactor Summary:
  [INFO]
  [INFO] Apache Hadoop Main  FAILURE [46.914s]
  [INFO] Apache Hadoop Project POM . SKIPPED
  [INFO] Apache Hadoop Annotations . SKIPPED

  I looked around and found this:
  http://wiki.apache.org/hadoop/UsingLzoCompression.  I tried installing lzo
  via MacPorts; it seems to be there, but I am not certain where to go from
  here.

  Also, how do you search the mail archives (
  http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/)?

  Thanks for the help.

  Kindest regards.

  Ron
 




RE: Hadoop - non disk based sorting?

2011-12-01 Thread Ravi teja ch n v
Hi Bobby,

You are right that the map outputs, when copied, will be spilled to disk 
in case the reducer cannot accommodate the copy in memory 
(shuffleInMemory and shuffleToDisk are chosen by the ram manager based on 
the in-memory size).

But according to the stack trace provided by Mingxi, 
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1592)
the problem occurred after the in-memory copy path was chosen.
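
For reference, the choice is roughly of this shape (a simplified sketch, not
the actual ReduceTask code; the limit would be derived from the reducer's
shuffle buffer settings):

  public class ShuffleDecisionSketch {
    // Returns true when a map output is small enough to be shuffled in memory.
    static boolean shuffleInMemory(long mapOutputSize, long maxSingleShuffleLimit) {
      return mapOutputSize < maxSingleShuffleLimit;
    }

    public static void main(String[] args) {
      long limit = 200L * 1024 * 1024;                        // assumed 200 MB limit
      System.out.println(shuffleInMemory(50L << 20, limit));  // small output -> in memory
      System.out.println(shuffleInMemory(500L << 20, limit)); // large output -> to disk
    }
  }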

Regards,
Ravi Teja

From: Robert Evans [ev...@yahoo-inc.com]
Sent: 01 December 2011 21:44:50
To: common-dev@hadoop.apache.org
Subject: Re: Hadoop - non disk based sorting?

Mingxi,

My understanding was that, just like with the maps, when a reducer's in-memory 
buffer fills up it too will spill to disk as part of the sort.  In fact 
I think it uses the exact same code for doing the sort as the map does.  There 
may be an issue where your sort buffer is somehow too large for the amount of 
heap that you requested as part of mapred.child.java.opts.  I have 
personally run a reduce that took in 300GB of data, which it successfully 
sorted, to test this very thing.  And no, the box did not have 300 GB of RAM.

--Bobby Evans

On 12/1/11 4:12 AM, Ravi teja ch n v raviteja.c...@huawei.com wrote:

Hi Mingxi,

So why, when map outputs are huge, is the reducer not able to copy them?

The Reducer will copy the map output into its in-memory buffer. When the 
Reducer JVM doesn't have enough memory to accommodate the map output, it 
leads to an OutOfMemoryError.

Can you please explain what the function of mapred.child.java.opts is? 
How does it relate to the copy?

The map and reduce tasks are launched in separate child JVMs at the 
TaskTrackers.
When the TaskTracker launches a map or reduce JVM, it uses 
mapred.child.java.opts as the JVM arguments for the new child JVM.

Regards,
Ravi Teja

From: Mingxi Wu [mingxi...@turn.com]
Sent: 01 December 2011 12:37:54
To: common-dev@hadoop.apache.org
Subject: RE: Hadoop - non disk based sorting?

Thanks Ravi.

So why, when map outputs are huge, is the reducer not able to copy them?

Can you please explain what the function of mapred.child.java.opts is? 
How does it relate to the copy?

Thank you,

Mingxi

-Original Message-
From: Ravi teja ch n v [mailto:raviteja.c...@huawei.com]
Sent: Tuesday, November 29, 2011 9:46 PM
To: common-dev@hadoop.apache.org
Subject: RE: Hadoop - non disk based sorting?

Hi Mingxi,

From your stack trace, I understand that the OutOfMemoryError actually 
occurred while copying the map outputs, not while sorting them.

Since your map outputs are huge and your reducer does not have enough heap 
memory, you got the problem.
When you increased the number of reducers to 200, your map outputs were 
partitioned among 200 reducers, so you didn't hit this problem.

By setting the max heap of your reducer with mapred.child.java.opts, you can 
get over this problem.

Regards,
Ravi teja



From: Mingxi Wu [mingxi...@turn.com]
Sent: 30 November 2011 05:14:49
To: common-dev@hadoop.apache.org
Subject: Hadoop - non disk based sorting?

Hi,

I have a question regarding the shuffle phase of reducer.

It appears that when the map output is large (in my case, 5 billion records), I 
get an out-of-memory error like the one below.

Error: java.lang.OutOfMemoryError: Java heap space at 
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1592)
 at 
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1452)
 at 
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1301)
 at 
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1233)

However, I thought the shuffle phase uses a disk-based sort, which is not 
constrained by memory.
So why would a user run into this OutOfMemoryError? After I increased my number 
of reducers from 100 to 200, the problem went away.

Any input regarding this memory issue would be appreciated!

Thanks,

Mingxi