[jira] [Updated] (MAPREDUCE-2407) Make Gridmix emulate usage of Distributed Cache files

2011-05-23 Thread Ravi Gummadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Gummadi updated MAPREDUCE-2407:


Status: Open  (was: Patch Available)

 Make Gridmix emulate usage of Distributed Cache files
 -

 Key: MAPREDUCE-2407
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2407
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: contrib/gridmix
Affects Versions: 0.23.0
Reporter: Ravi Gummadi
Assignee: Ravi Gummadi
 Fix For: 0.23.0

 Attachments: 2407.patch, 2407.v1.patch


 Currently Gridmix emulates disk IO load only. This JIRA is to make Gridmix 
 emulate Distributed Cache load as defined by the job-trace.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-2407) Make Gridmix emulate usage of Distributed Cache files

2011-05-23 Thread Ravi Gummadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Gummadi updated MAPREDUCE-2407:


Attachment: 2407.v1.1.patch

Attaching a new patch that addresses Amar's minor offline comments.

 Make Gridmix emulate usage of Distributed Cache files
 -

 Key: MAPREDUCE-2407
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2407
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: contrib/gridmix
Affects Versions: 0.23.0
Reporter: Ravi Gummadi
Assignee: Ravi Gummadi
 Fix For: 0.23.0

 Attachments: 2407.patch, 2407.v1.1.patch, 2407.v1.patch


 Currently Gridmix emulates disk IO load only. This JIRA is to make Gridmix 
 emulate Distributed Cache load as defined by the job-trace.



[jira] [Updated] (MAPREDUCE-2407) Make Gridmix emulate usage of Distributed Cache files

2011-05-23 Thread Ravi Gummadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Gummadi updated MAPREDUCE-2407:


Status: Patch Available  (was: Open)

 Make Gridmix emulate usage of Distributed Cache files
 -

 Key: MAPREDUCE-2407
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2407
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: contrib/gridmix
Affects Versions: 0.23.0
Reporter: Ravi Gummadi
Assignee: Ravi Gummadi
 Fix For: 0.23.0

 Attachments: 2407.patch, 2407.v1.1.patch, 2407.v1.patch


 Currently Gridmix emulates disk IO load only. This JIRA is to make Gridmix 
 emulate Distributed Cache load as defined by the job-trace.



[jira] [Commented] (MAPREDUCE-2407) Make Gridmix emulate usage of Distributed Cache files

2011-05-23 Thread Amar Kamat (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037792#comment-13037792
 ] 

Amar Kamat commented on MAPREDUCE-2407:
---

Patch looks good to me. +1

 Make Gridmix emulate usage of Distributed Cache files
 -

 Key: MAPREDUCE-2407
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2407
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: contrib/gridmix
Affects Versions: 0.23.0
Reporter: Ravi Gummadi
Assignee: Ravi Gummadi
 Fix For: 0.23.0

 Attachments: 2407.patch, 2407.v1.1.patch, 2407.v1.patch


 Currently Gridmix emulates disk IO load only. This JIRA is to make Gridmix 
 emulate Distributed Cache load as defined by the job-trace.



[jira] [Updated] (MAPREDUCE-2492) [MAPREDUCE] The new MapReduce API should make available task's progress to the task

2011-05-23 Thread Amar Kamat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat updated MAPREDUCE-2492:
--

Attachment: MAPREDUCE-2492-v1.6.patch

Updated the testcase such that reduce() is called multiple times. test-patch 
and the modified testcases pass on my local box.

 [MAPREDUCE] The new MapReduce API should make available task's progress to 
 the task
 ---

 Key: MAPREDUCE-2492
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2492
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: task
Affects Versions: 0.23.0
Reporter: Amar Kamat
Assignee: Amar Kamat
 Attachments: MAPREDUCE-2492-v1.3.patch, MAPREDUCE-2492-v1.4.patch, 
 MAPREDUCE-2492-v1.5.patch, MAPREDUCE-2492-v1.6.patch


 There is no way to get the task's current progress in the new MapReduce API. 
 It would be nice to make it available so that the task (map/reduce) can use 
 it. 



[jira] [Updated] (MAPREDUCE-2492) [MAPREDUCE] The new MapReduce API should make available task's progress to the task

2011-05-23 Thread Amar Kamat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat updated MAPREDUCE-2492:
--

Status: Patch Available  (was: Open)

Running the latest patch (with NLineInputFormat) through Hudson.

 [MAPREDUCE] The new MapReduce API should make available task's progress to 
 the task
 ---

 Key: MAPREDUCE-2492
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2492
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: task
Affects Versions: 0.23.0
Reporter: Amar Kamat
Assignee: Amar Kamat
 Attachments: MAPREDUCE-2492-v1.3.patch, MAPREDUCE-2492-v1.4.patch, 
 MAPREDUCE-2492-v1.5.patch, MAPREDUCE-2492-v1.6.patch


 There is no way to get the task's current progress in the new MapReduce API. 
 It would be nice to make it available so that the task (map/reduce) can use 
 it. 



[jira] [Commented] (MAPREDUCE-2407) Make Gridmix emulate usage of Distributed Cache files

2011-05-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037892#comment-13037892
 ] 

Hadoop QA commented on MAPREDUCE-2407:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12480093/2407.v1.1.patch
  against trunk revision 1125599.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 10 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/291//testReport/
Findbugs warnings: 
https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/291//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/291//console

This message is automatically generated.

 Make Gridmix emulate usage of Distributed Cache files
 -

 Key: MAPREDUCE-2407
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2407
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: contrib/gridmix
Affects Versions: 0.23.0
Reporter: Ravi Gummadi
Assignee: Ravi Gummadi
 Fix For: 0.23.0

 Attachments: 2407.patch, 2407.v1.1.patch, 2407.v1.patch


 Currently Gridmix emulates disk IO load only. This JIRA is to make Gridmix 
 emulate Distributed Cache load as defined by the job-trace.



[jira] [Updated] (MAPREDUCE-2492) [MAPREDUCE] The new MapReduce API should make available task's progress to the task

2011-05-23 Thread Amar Kamat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat updated MAPREDUCE-2492:
--

Status: Open  (was: Patch Available)

 [MAPREDUCE] The new MapReduce API should make available task's progress to 
 the task
 ---

 Key: MAPREDUCE-2492
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2492
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: task
Affects Versions: 0.23.0
Reporter: Amar Kamat
Assignee: Amar Kamat
 Attachments: MAPREDUCE-2492-v1.3.patch, MAPREDUCE-2492-v1.4.patch, 
 MAPREDUCE-2492-v1.5.patch, MAPREDUCE-2492-v1.6.patch


 There is no way to get the task's current progress in the new MapReduce API. 
 It would be nice to make it available so that the task (map/reduce) can use 
 it. 



[jira] [Updated] (MAPREDUCE-2492) [MAPREDUCE] The new MapReduce API should make available task's progress to the task

2011-05-23 Thread Amar Kamat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat updated MAPREDUCE-2492:
--

Attachment: MAPREDUCE-2492-v1.7.patch

Found an issue with stale leftover files in the shared directory. Modified 
the patch to use a unique folder for each test case. 

 [MAPREDUCE] The new MapReduce API should make available task's progress to 
 the task
 ---

 Key: MAPREDUCE-2492
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2492
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: task
Affects Versions: 0.23.0
Reporter: Amar Kamat
Assignee: Amar Kamat
 Attachments: MAPREDUCE-2492-v1.3.patch, MAPREDUCE-2492-v1.4.patch, 
 MAPREDUCE-2492-v1.5.patch, MAPREDUCE-2492-v1.6.patch, 
 MAPREDUCE-2492-v1.7.patch


 There is no way to get the task's current progress in the new MapReduce API. 
 It would be nice to make it available so that the task (map/reduce) can use 
 it. 



[jira] [Updated] (MAPREDUCE-2492) [MAPREDUCE] The new MapReduce API should make available task's progress to the task

2011-05-23 Thread Amar Kamat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat updated MAPREDUCE-2492:
--

Status: Patch Available  (was: Open)

 [MAPREDUCE] The new MapReduce API should make available task's progress to 
 the task
 ---

 Key: MAPREDUCE-2492
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2492
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: task
Affects Versions: 0.23.0
Reporter: Amar Kamat
Assignee: Amar Kamat
 Attachments: MAPREDUCE-2492-v1.3.patch, MAPREDUCE-2492-v1.4.patch, 
 MAPREDUCE-2492-v1.5.patch, MAPREDUCE-2492-v1.6.patch, 
 MAPREDUCE-2492-v1.7.patch


 There is no way to get the task's current progress in the new MapReduce API. 
 It would be nice to make it available so that the task (map/reduce) can use 
 it. 



[jira] [Commented] (MAPREDUCE-2470) Receiving NPE occasionally on RunningJob.getCounters() call

2011-05-23 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037940#comment-13037940
 ] 

Robert Joseph Evans commented on MAPREDUCE-2470:


I am very sorry about that.  I ran the tests after V1, but not V2 of the patch. 
 I will investigate the failures.

 Receiving NPE occasionally on RunningJob.getCounters() call
 ---

 Key: MAPREDUCE-2470
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2470
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 0.21.0
 Environment: FreeBSD, Java6, Hadoop r0.21.0
Reporter: Aaron Baff
Assignee: Robert Joseph Evans
 Fix For: 0.23.0

 Attachments: MAPREDUCE-2470-v1.patch, MAPREDUCE-2470-v2.patch, 
 counters_null_data.pcap


 This is running in a Java daemon that is used as an interface (Thrift) to get 
 information and data from MR Jobs. Using JobClient.getJob(JobID) I 
 successfully get a RunningJob object (I'm checking for NULL), and then rarely 
 I get an NPE when I do RunningJob.getCounters(). This seems to occur after 
 the daemon has been up and running for a while, and in the event of an 
 Exception, I close the JobClient, set it to NULL, and a new one should then 
 be created on the next request for data. Yet, I still seem to be unable to 
 fetch the Counters. Below is the stack trace.
 java.lang.NullPointerException
 at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
 at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
 at com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
 at com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
 at com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
 at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
 at org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:619)



[jira] [Updated] (MAPREDUCE-2407) Make Gridmix emulate usage of Distributed Cache files

2011-05-23 Thread Ravi Gummadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Gummadi updated MAPREDUCE-2407:


  Resolution: Fixed
Release Note: Makes Gridmix emulate HDFS based distributed cache files and 
local file system based distributed cache files.
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

I just committed this to trunk.

 Make Gridmix emulate usage of Distributed Cache files
 -

 Key: MAPREDUCE-2407
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2407
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: contrib/gridmix
Affects Versions: 0.23.0
Reporter: Ravi Gummadi
Assignee: Ravi Gummadi
 Fix For: 0.23.0

 Attachments: 2407.patch, 2407.v1.1.patch, 2407.v1.patch


 Currently Gridmix emulates disk IO load only. This JIRA is to make Gridmix 
 emulate Distributed Cache load as defined by the job-trace.



[jira] [Updated] (MAPREDUCE-2528) NullPointerException in the job tracker UI, when we perform kill or change the priority of jobs without selecting the any job.

2011-05-23 Thread Devaraj K (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated MAPREDUCE-2528:
-

Fix Version/s: 0.23.0
   Status: Patch Available  (was: Open)

The patch checks that at least one job is selected before performing Kill 
Selected Jobs or Change priority.

 NullPointerException in the job tracker UI, when we perform kill or change 
 the priority of jobs without selecting the any job.
 --

 Key: MAPREDUCE-2528
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2528
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 0.23.0
Reporter: Devaraj K
Assignee: Devaraj K
 Fix For: 0.23.0

 Attachments: MAPREDUCE-2528.patch


 If we click the Kill Selected Jobs or Change button without selecting any job, 
 the UI shows the exception below.
 {code}
 java.lang.NullPointerException
 at org.apache.hadoop.http.HttpServer$QuotingInputFilter$RequestQuoter.getParameterValues(HttpServer.java:798)
 at org.apache.hadoop.mapred.JSPUtil.processButtons(JSPUtil.java:209)
 at org.apache.hadoop.mapred.jobtracker_jsp._jspService(jobtracker_jsp.java:146)
 at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
 at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1124)
 at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:871)
 at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1115)
 at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:361)
 at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
 at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
 at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
 at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
 at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
 at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
 at org.mortbay.jetty.Server.handle(Server.java:324)
 at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
 at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:879)
 at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:741)
 at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:213)
 at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
 at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
 at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
 {code}
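
The trace above points at a parameter lookup that yields null when no job checkbox is ticked. A minimal sketch of the kind of guard the comment describes follows. It is not the attached patch: the request is modelled as a plain parameter map, and the class, method, and parameter names (including `jobCheckBox`) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the MAPREDUCE-2528 patch: the servlet request is
// modelled as a parameter map so the example runs without a servlet container.
final class SelectionGuardSketch {
    // Assumed name of the checkbox parameter; illustrative only.
    static final String JOB_PARAM = "jobCheckBox";

    /** Returns the selected job IDs, or an empty array when nothing was selected. */
    static String[] selectedJobs(Map<String, String[]> params) {
        String[] values = params.get(JOB_PARAM);
        return values == null ? new String[0] : values;
    }

    /** Counts the jobs an action would apply to; zero means the action is skipped. */
    static int countSelected(Map<String, String[]> params) {
        return selectedJobs(params).length;
    }

    public static void main(String[] args) {
        Map<String, String[]> params = new HashMap<>();
        System.out.println(countSelected(params));
        params.put(JOB_PARAM, new String[] {"job_1", "job_2"});
        System.out.println(countSelected(params));
    }
}
```

Returning an empty array rather than null lets the kill and change-priority paths loop over the selection without a separate null check.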



[jira] [Commented] (MAPREDUCE-2407) Make Gridmix emulate usage of Distributed Cache files

2011-05-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037978#comment-13037978
 ] 

Hudson commented on MAPREDUCE-2407:
---

Integrated in Hadoop-Mapreduce-trunk-Commit #695 (See 
[https://builds.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/695/])
MAPREDUCE-2407. Make GridMix emulate usage of distributed cache files in 
simulated jobs.

ravigummadi : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1126499
Files : 
* /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/PseudoLocalFs.java
* /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/DistributedCacheEmulator.java
* /hadoop/mapreduce/trunk/src/contrib/gridmix/src/test/org/apache/hadoop/mapred/gridmix/TestPseudoLocalFs.java
* /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/Gridmix.java
* /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/GenerateData.java
* /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/JobCreator.java
* /hadoop/mapreduce/trunk/src/docs/src/documentation/content/xdocs/gridmix.xml
* /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/GenerateDistCacheData.java
* /hadoop/mapreduce/trunk/src/contrib/gridmix/src/test/org/apache/hadoop/mapred/gridmix/DebugJobProducer.java
* /hadoop/mapreduce/trunk/src/contrib/gridmix/src/test/org/apache/hadoop/mapred/gridmix/TestDistCacheEmulation.java
* /hadoop/mapreduce/trunk/src/contrib/gridmix/src/test/org/apache/hadoop/mapred/gridmix/TestGridmixSubmission.java


 Make Gridmix emulate usage of Distributed Cache files
 -

 Key: MAPREDUCE-2407
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2407
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: contrib/gridmix
Affects Versions: 0.23.0
Reporter: Ravi Gummadi
Assignee: Ravi Gummadi
 Fix For: 0.23.0

 Attachments: 2407.patch, 2407.v1.1.patch, 2407.v1.patch


 Currently Gridmix emulates disk IO load only. This JIRA is to make Gridmix 
 emulate Distributed Cache load as defined by the job-trace.



[jira] [Commented] (MAPREDUCE-2492) [MAPREDUCE] The new MapReduce API should make available task's progress to the task

2011-05-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038004#comment-13038004
 ] 

Hadoop QA commented on MAPREDUCE-2492:
--

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12480106/MAPREDUCE-2492-v1.7.patch
  against trunk revision 1125599.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 15 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/293//testReport/
Findbugs warnings: 
https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/293//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/293//console

This message is automatically generated.

 [MAPREDUCE] The new MapReduce API should make available task's progress to 
 the task
 ---

 Key: MAPREDUCE-2492
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2492
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: task
Affects Versions: 0.23.0
Reporter: Amar Kamat
Assignee: Amar Kamat
 Attachments: MAPREDUCE-2492-v1.3.patch, MAPREDUCE-2492-v1.4.patch, 
 MAPREDUCE-2492-v1.5.patch, MAPREDUCE-2492-v1.6.patch, 
 MAPREDUCE-2492-v1.7.patch


 There is no way to get the task's current progress in the new MapReduce API. 
 It would be nice to make it available so that the task (map/reduce) can use 
 it. 



[jira] [Commented] (MAPREDUCE-2489) Jobsplits with random hostnames can make the queue unusable

2011-05-23 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038008#comment-13038008
 ] 

jirapos...@reviews.apache.org commented on MAPREDUCE-2489:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/776/
---

Review request for hadoop-mapreduce.


Summary
---

We saw an issue where a custom InputSplit was returning invalid hostnames 
(non-repeating) for the splits that were then causing the JobTracker to attempt 
to excessively resolve host names. This caused a major slowdown for the 
JobTracker. We should prevent invalid InputSplit hostnames from affecting 
everyone else.

I propose we implement some verification for the hostnames to try to ensure 
that we only do DNS lookups on valid hostnames (and fail otherwise). We could 
also fail the job after a certain number of failures in the resolve.

NOTE: This requires the changes in HADOOP-7314


This addresses bug MAPREDUCE-2489.
https://issues.apache.org/jira/browse/MAPREDUCE-2489


Diffs
-

  trunk/ivy.xml 1125074 
  trunk/ivy/libraries.properties 1125074 
  trunk/src/contrib/mumak/src/java/org/apache/hadoop/mapred/SimulatorJobTracker.java 1125074 
  trunk/src/java/org/apache/hadoop/mapred/JobInProgress.java 1125074 
  trunk/src/java/org/apache/hadoop/mapred/JobTracker.java 1125074 

Diff: https://reviews.apache.org/r/776/diff


Testing
---


Thanks,

Jeffrey



 Jobsplits with random hostnames can make the queue unusable
 ---

 Key: MAPREDUCE-2489
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2489
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Reporter: Jeffrey Naisbitt
Assignee: Jeffrey Naisbitt
 Attachments: MAPREDUCE-2489-mapred.patch


 We saw an issue where a custom InputSplit was returning invalid hostnames for 
 the splits that were then causing the JobTracker to attempt to excessively 
 resolve host names.  This caused a major slowdown for the JobTracker.  We 
 should prevent invalid InputSplit hostnames from affecting everyone else.
 I propose we implement some verification for the hostnames to try to ensure 
 that we only do DNS lookups on valid hostnames (and fail otherwise).  We 
 could also fail the job after a certain number of failures in the resolve.
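
The proposal to verify hostnames before any DNS lookup can be sketched as a purely syntactic check. This is an illustration only, not the attached patch (which depends on HADOOP-7314): the class and method names are hypothetical, and the rules used (dot-separated labels of 1 to 63 letters, digits, or hyphens, at most 253 characters overall) are an assumption about what "valid" would mean here.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: reject obviously malformed split hostnames before
// spending a DNS lookup on them. Not the MAPREDUCE-2489 patch.
final class HostnameCheckSketch {
    // One label: 1-63 letters, digits, or hyphens, not starting or ending with a hyphen.
    private static final Pattern LABEL =
        Pattern.compile("[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?");

    /** Returns true only if every dot-separated label is well formed. */
    static boolean looksLikeHostname(String host) {
        if (host == null || host.isEmpty() || host.length() > 253) {
            return false;
        }
        // Limit -1 keeps empty trailing labels, so "host." and "a..b" are rejected.
        for (String label : host.split("\\.", -1)) {
            if (!LABEL.matcher(label).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeHostname("node-17.example.com"));
        System.out.println(looksLikeHostname("bad_host!"));
    }
}
```

A check like this only filters out malformed names; a well-formed but nonexistent hostname would still reach the resolver, which is where the suggested cap on resolve failures per job would apply.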



[jira] [Updated] (MAPREDUCE-2470) Receiving NPE occasionally on RunningJob.getCounters() call

2011-05-23 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated MAPREDUCE-2470:
---

Attachment: MAPREDUCE-2470-v3.patch

Fixed issues with fault injection build.

 Receiving NPE occasionally on RunningJob.getCounters() call
 ---

 Key: MAPREDUCE-2470
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2470
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 0.21.0
 Environment: FreeBSD, Java6, Hadoop r0.21.0
Reporter: Aaron Baff
Assignee: Robert Joseph Evans
 Fix For: 0.23.0

 Attachments: MAPREDUCE-2470-v1.patch, MAPREDUCE-2470-v2.patch, 
 MAPREDUCE-2470-v3.patch, counters_null_data.pcap


 This is running in a Java daemon that is used as an interface (Thrift) to get 
 information and data from MR Jobs. Using JobClient.getJob(JobID) I 
 successfully get a RunningJob object (I'm checking for NULL), and then rarely 
 I get an NPE when I do RunningJob.getCounters(). This seems to occur after 
 the daemon has been up and running for a while, and in the event of an 
 Exception, I close the JobClient, set it to NULL, and a new one should then 
 be created on the next request for data. Yet, I still seem to be unable to 
 fetch the Counters. Below is the stack trace.
 java.lang.NullPointerException
 at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
 at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
 at com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
 at com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
 at com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
 at org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
 at org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:619)



[jira] [Updated] (MAPREDUCE-2492) [MAPREDUCE] The new MapReduce API should make available task's progress to the task

2011-05-23 Thread Amar Kamat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat updated MAPREDUCE-2492:
--

   Resolution: Fixed
Fix Version/s: 0.23.0
 Release Note: Map and Reduce tasks can access the attempt's overall 
progress via TaskAttemptContext.
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

I just committed this.

 [MAPREDUCE] The new MapReduce API should make available task's progress to 
 the task
 ---

 Key: MAPREDUCE-2492
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2492
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: task
Affects Versions: 0.23.0
Reporter: Amar Kamat
Assignee: Amar Kamat
 Fix For: 0.23.0

 Attachments: MAPREDUCE-2492-v1.3.patch, MAPREDUCE-2492-v1.4.patch, 
 MAPREDUCE-2492-v1.5.patch, MAPREDUCE-2492-v1.6.patch, 
 MAPREDUCE-2492-v1.7.patch


 There is no way to get the task's current progress in the new MapReduce API. 
 It would be nice to make it available so that the task (map/reduce) can use 
 it. 



[jira] [Resolved] (MAPREDUCE-2511) Progress reported by map tasks of a map-only job is incorrect

2011-05-23 Thread Amar Kamat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat resolved MAPREDUCE-2511.
---

Resolution: Duplicate

Fixed as part of MAPREDUCE-2492.

 Progress reported by map tasks of a map-only job is incorrect
 -

 Key: MAPREDUCE-2511
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2511
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Amar Kamat
Assignee: Amar Kamat

 For a map task of a map-reduce job, the progress bar is (logically) split 
 into 2 distinct phases
 1. Map Phase
 2. Sort Phase
 The map phase manages 66% of the overall task's progress while the sort phase 
 governs the rest, i.e. 33%. 
 For a map task of a map-only job, there is no sort phase. Hence the entire 
 map phase should govern 100% of the task's progress. Currently, the progress 
 bar is divided 66%-33% irrespective of whether the job has 
 reducers or not (i.e. whether there is a sort phase or not).
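
The weighting described above can be sketched as a small calculation. This is a hypothetical illustration, not Hadoop's actual progress code: the class and method names are invented, and the 0.66 map weight is taken from the figures quoted in the issue, with the remainder assigned to the sort phase.

```java
// Hypothetical sketch of the intended behaviour; names and weights are
// illustrative and do not come from the Hadoop source.
final class MapProgressSketch {
    // Share of a map task's progress owned by the map phase when a sort phase follows.
    static final float MAP_WEIGHT_WITH_SORT = 0.66f;

    /**
     * Combines per-phase progress (each in [0, 1]) into overall task progress.
     * A map-only job has no sort phase, so the map phase carries all of it.
     */
    static float overallProgress(float mapPhase, float sortPhase, boolean hasReducers) {
        if (!hasReducers) {
            return mapPhase;
        }
        return MAP_WEIGHT_WITH_SORT * mapPhase
            + (1.0f - MAP_WEIGHT_WITH_SORT) * sortPhase;
    }

    public static void main(String[] args) {
        // Map phase finished, sort not started: job with reducers, then map-only job.
        System.out.println(overallProgress(1.0f, 0.0f, true));
        System.out.println(overallProgress(1.0f, 0.0f, false));
    }
}
```

With the split applied unconditionally, a map-only task whose map phase is complete would never report more than the map phase's share, which is the symptom the issue describes.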



[jira] [Resolved] (MAPREDUCE-2523) TestTaskContext should cleanup its temporary files/folders on completion

2011-05-23 Thread Amar Kamat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat resolved MAPREDUCE-2523.
---

Resolution: Duplicate

Fixed as part of MAPREDUCE-2492.

 TestTaskContext should cleanup its temporary files/folders on completion
 

 Key: MAPREDUCE-2523
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2523
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Reporter: Amar Kamat
Assignee: Amar Kamat
  Labels: test
 Fix For: 0.23.0


 TestTaskContext creates in and out folders in the current working 
 directory. Ideally these files should go under test.build.data or /tmp. 
 Also the testcase should delete these files on completion. 
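
 A minimal sketch of the suggested behaviour, assuming the test resolves its 
 working directory from the test.build.data system property and falls back to 
 /tmp. The class and method names are hypothetical and are not taken from 
 TestTaskContext.

```java
import java.io.File;

// Hypothetical helper; TestTaskContext itself may be structured differently.
final class TestDirs {
    private TestDirs() {}

    /** Resolves a per-test directory under test.build.data, defaulting to /tmp. */
    static File testRoot(String testName) {
        String base = System.getProperty("test.build.data", "/tmp");
        return new File(base, testName);
    }

    /** Deletes a file or directory tree; returns true if the root was removed. */
    static boolean deleteRecursively(File f) {
        File[] children = f.listFiles(); // null for plain files or missing paths
        if (children != null) {
            for (File child : children) {
                deleteRecursively(child);
            }
        }
        return f.delete();
    }
}
```

 A test would create its in and out folders under testRoot(...) and call 
 deleteRecursively(...) on that root from its tear-down method.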



[jira] [Resolved] (MAPREDUCE-2519) Progress reported by a reduce task executed via LocalJobRunner is incorrect

2011-05-23 Thread Amar Kamat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat resolved MAPREDUCE-2519.
---

Resolution: Duplicate

Fixed as part of MAPREDUCE-2492.

 Progress reported by a reduce task executed via LocalJobRunner is incorrect
 ---

 Key: MAPREDUCE-2519
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2519
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Amar Kamat
Assignee: Amar Kamat
  Labels: localjobrunner, progress, reduce

 ReduceTask splits its progress reporting into 3 phases
 1. Copy
 2. Shuffle
 3. Reduce
 When the reduce task is run using a LocalJobRunner, the copy phase is ignored 
 (skipped) but the progress is not updated. This results in a mismatch in the 
 Reduce task's progress.



[jira] [Commented] (MAPREDUCE-2528) NullPointerException in the job tracker UI, when we perform kill or change the priority of jobs without selecting the any job.

2011-05-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038068#comment-13038068
 ] 

Hadoop QA commented on MAPREDUCE-2528:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12480113/MAPREDUCE-2528.patch
  against trunk revision 1126499.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/294//testReport/
Findbugs warnings: 
https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/294//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/294//console

This message is automatically generated.

 NullPointerException in the job tracker UI, when we perform kill or change 
 the priority of jobs without selecting the any job.
 --

 Key: MAPREDUCE-2528
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2528
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 0.23.0
Reporter: Devaraj K
Assignee: Devaraj K
 Fix For: 0.23.0

 Attachments: MAPREDUCE-2528.patch


 If we click the Kill Selected Jobs or Change button without selecting any job, 
 the UI shows the exception below.
 {code}
 java.lang.NullPointerException
 at 
 org.apache.hadoop.http.HttpServer$QuotingInputFilter$RequestQuoter.getParameterValues(HttpServer.java:798)
 at org.apache.hadoop.mapred.JSPUtil.processButtons(JSPUtil.java:209)
 at 
 org.apache.hadoop.mapred.jobtracker_jsp._jspService(jobtracker_jsp.java:146)
 at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
 at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
 at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1124)
 at 
 org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:871)
 at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1115)
 at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:361)
 at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
 at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
 at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
 at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
 at 
 org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
 at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
 at org.mortbay.jetty.Server.handle(Server.java:324)
 at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
 at 
 org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:879)
 at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:741)
 at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:213)
 at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
 at 
 org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
 at 
 org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
 {code}
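
 The servlet API's getParameterValues() returns null when a parameter is 
 absent, and the trace suggests that the quoting wrapper does not expect that. 
 A rough sketch of a null-tolerant wrapper follows. This is an assumption about 
 the shape of the fix, not the contents of MAPREDUCE-2528.patch, and the class 
 and method names are hypothetical.

```java
// Hypothetical stand-in for the quoting step; not the actual HttpServer code.
final class QuotingSketch {
    private QuotingSketch() {}

    /** Quotes every value, passing null through as the servlet contract does. */
    static String[] quoteAll(String[] rawValues) {
        if (rawValues == null) {
            return null; // parameter absent: nothing was selected
        }
        String[] quoted = new String[rawValues.length];
        for (int i = 0; i < rawValues.length; i++) {
            quoted[i] = quote(rawValues[i]);
        }
        return quoted;
    }

    /** Escapes the characters that are significant in HTML. */
    static String quote(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }
}
```

 The caller still has to treat a null result as "no jobs selected" before 
 iterating over it.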



[jira] [Commented] (MAPREDUCE-2492) [MAPREDUCE] The new MapReduce API should make available task's progress to the task

2011-05-23 Thread Tom White (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038084#comment-13038084
 ] 

Tom White commented on MAPREDUCE-2492:
--

This is an incompatible change, since it adds a method to the public @Stable 
mapred.Reporter interface. Is it possible to rework this to only change the new 
API, as the title suggests? This would then be a compatible change since the 
classes that have been changed in the new API are private or @Evolving.

Also, it looks like this wasn't reviewed before being committed.

 [MAPREDUCE] The new MapReduce API should make available task's progress to 
 the task
 ---

 Key: MAPREDUCE-2492
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2492
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: task
Affects Versions: 0.23.0
Reporter: Amar Kamat
Assignee: Amar Kamat
 Fix For: 0.23.0

 Attachments: MAPREDUCE-2492-v1.3.patch, MAPREDUCE-2492-v1.4.patch, 
 MAPREDUCE-2492-v1.5.patch, MAPREDUCE-2492-v1.6.patch, 
 MAPREDUCE-2492-v1.7.patch


 There is no way to get the task's current progress in the new MapReduce API. 
 It would be nice to make it available so that the task (map/reduce) can use 
 it. 



[jira] [Commented] (MAPREDUCE-2186) DistributedRaidFileSystem should implement getFileBlockLocations()

2011-05-23 Thread Ramkumar Vadali (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038086#comment-13038086
 ] 

Ramkumar Vadali commented on MAPREDUCE-2186:


The main motivation to open this jira was to allow CombineFileInputFormat to 
work when there are missing blocks. CombineFileInputFormat figures out the 
host/rack information for input blocks and uses that information to create 
input splits. It does not handle the case where a block does not have any 
host/rack information.

The proposed fix to return the location of parity blocks in the case where 
source blocks are missing is not good because it is fixing the problem in the 
wrong place. It also causes us to get false locality. 
Instead of changing RAID FS to handle this case, it's better to fix CFIF to 
handle the case when there are missing blocks (MAPREDUCE-2185).

 DistributedRaidFileSystem should implement getFileBlockLocations()
 --

 Key: MAPREDUCE-2186
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2186
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali

 If a RAIDed file has missing blocks, 
 DistributedRaidFileSystem.getFileBlockLocations() would return no block 
 locations. This could lead a client to believe that the file is not readable. 
 But if parity data is available, the file actually is readable.
 It would be better to implement getFileBlockLocations() and return the 
 location of the parity blocks that would be needed to reconstruct the missing 
 block.



[jira] [Resolved] (MAPREDUCE-2186) DistributedRaidFileSystem should implement getFileBlockLocations()

2011-05-23 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali resolved MAPREDUCE-2186.


Resolution: Won't Fix

Better to fix MAPREDUCE-2185

 DistributedRaidFileSystem should implement getFileBlockLocations()
 --

 Key: MAPREDUCE-2186
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2186
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Reporter: Ramkumar Vadali
Assignee: Ramkumar Vadali

 If a RAIDed file has missing blocks, 
 DistributedRaidFileSystem.getFileBlockLocations() would return no block 
 locations. This could lead a client to believe that the file is not readable. 
 But if parity data is available, the file actually is readable.
 It would be better to implement getFileBlockLocations() and return the 
 location of the parity blocks that would be needed to reconstruct the missing 
 block.



[jira] [Assigned] (MAPREDUCE-2498) TestRaidShellFsck failing on trunk

2011-05-23 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali reassigned MAPREDUCE-2498:
--

Assignee: Ramkumar Vadali  (was: Todd Lipcon)

 TestRaidShellFsck failing on trunk
 --

 Key: MAPREDUCE-2498
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2498
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/raid
Affects Versions: 0.23.0
Reporter: Todd Lipcon
Assignee: Ramkumar Vadali
Priority: Critical
 Fix For: 0.23.0

 Attachments: mapreduce-2498.txt


 TestRaidShellFsck.testFileBlockAndParityBlockMissingHar2 has been failing the 
 last several builds:
 Error Message: parity file not HARed after 40s
 java.io.IOException: parity file not HARed after 40s
at 
 org.apache.hadoop.raid.TestRaidShellFsck.raidTestFiles(TestRaidShellFsck.java:281)
at 
 org.apache.hadoop.raid.TestRaidShellFsck.setUp(TestRaidShellFsck.java:181)
at 
 org.apache.hadoop.raid.TestRaidShellFsck.testFileBlockAndParityBlockMissingHar2(TestRaidShellFsck.java:666)



[jira] [Commented] (MAPREDUCE-2492) [MAPREDUCE] The new MapReduce API should make available task's progress to the task

2011-05-23 Thread Tom White (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038134#comment-13038134
 ] 

Tom White commented on MAPREDUCE-2492:
--

 I think it makes sense to have Reporter provide the task's progress to the 
 map/reduce task attempts. I would prefer marking this change as incompatible.

It's certainly an improvement, but does it warrant an incompatible change to an 
interface marked as stable? There may be an argument that it does, but I would 
have expected to see some discussion about this before it was committed. Why 
not only change the new API and tell users that they need to use that one if 
they want to use this feature?

 [MAPREDUCE] The new MapReduce API should make available task's progress to 
 the task
 ---

 Key: MAPREDUCE-2492
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2492
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: task
Affects Versions: 0.23.0
Reporter: Amar Kamat
Assignee: Amar Kamat
 Fix For: 0.23.0

 Attachments: MAPREDUCE-2492-v1.3.patch, MAPREDUCE-2492-v1.4.patch, 
 MAPREDUCE-2492-v1.5.patch, MAPREDUCE-2492-v1.6.patch, 
 MAPREDUCE-2492-v1.7.patch


 There is no way to get the task's current progress in the new MapReduce API. 
 It would be nice to make it available so that the task (map/reduce) can use 
 it. 



[jira] [Commented] (MAPREDUCE-2470) Receiving NPE occasionally on RunningJob.getCounters() call

2011-05-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038143#comment-13038143
 ] 

Hadoop QA commented on MAPREDUCE-2470:
--

+1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12480124/MAPREDUCE-2470-v3.patch
  against trunk revision 1126499.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/295//testReport/
Findbugs warnings: 
https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/295//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/295//console

This message is automatically generated.

 Receiving NPE occasionally on RunningJob.getCounters() call
 ---

 Key: MAPREDUCE-2470
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2470
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 0.21.0
 Environment: FreeBSD, Java6, Hadoop r0.21.0
Reporter: Aaron Baff
Assignee: Robert Joseph Evans
 Fix For: 0.23.0

 Attachments: MAPREDUCE-2470-v1.patch, MAPREDUCE-2470-v2.patch, 
 MAPREDUCE-2470-v3.patch, counters_null_data.pcap


 This is running in a Java daemon that is used as an interface (Thrift) to get 
 information and data from MR Jobs. Using JobClient.getJob(JobID) I 
 successfully get a RunningJob object (I'm checking for NULL), and then rarely 
 I get an NPE when I do RunningJob.getCounters(). This seems to occur after 
 the daemon has been up and running for a while, and in the event of an 
 Exception, I close the JobClient, set it to NULL, and a new one should then 
 be created on the next request for data. Yet, I still seem to be unable to 
 fetch the Counters. Below is the stack trace.
 java.lang.NullPointerException
 at org.apache.hadoop.mapred.Counters.downgrade(Counters.java:77)
 at 
 org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:381)
 at 
 com.telescope.HadoopThrift.service.ServiceImpl.getReportResults(ServiceImpl.java:350)
 at 
 com.telescope.HadoopThrift.gen.HadoopThrift$Processor$getReportResults.process(HadoopThrift.java:545)
 at 
 com.telescope.HadoopThrift.gen.HadoopThrift$Processor.process(HadoopThrift.java:421)
 at 
 org.apache.thrift.server.TNonblockingServer$FrameBuffer.invoke(TNonblockingServer.java:697)
 at 
 org.apache.thrift.server.THsHaServer$Invocation.run(THsHaServer.java:317)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:619)



[jira] [Updated] (MAPREDUCE-2495) The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason

2011-05-23 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated MAPREDUCE-2495:
---

Status: Open  (was: Patch Available)

Chris indicated as a side comment in a different conversation that the sleeps 
in the tests are not very good, so I am reworking the tests to avoid using 
sleep.

 The distributed cache cleanup thread has no monitoring to check to see if it 
 has died for some reason
 -

 Key: MAPREDUCE-2495
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2495
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distributed-cache
Affects Versions: 0.21.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
Priority: Minor
 Attachments: MAPREDUCE-2495-20.20X-V1.patch, 
 MAPREDUCE-2495-20.20X-V2.patch, MAPREDUCE-2495-20.20X-V3.patch, 
 MAPREDUCE-2495-v1.patch, MAPREDUCE-2495-v2.patch, MAPREDUCE-2495-v3.patch


 The cleanup thread in the distributed cache handles IOExceptions and the like 
 correctly, but just to be a bit more defensive it would be good to monitor 
 the thread, and check that it is still alive regularly, so that the 
 distributed cache does not fill up the entire disk on the node. 
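
 The kind of liveness check described above could look roughly like this. It 
 is a sketch under assumptions: the names are hypothetical, and restarting the 
 thread (rather than only reporting that it died) is one possible reaction, 
 not necessarily what the attached patches do.

```java
// Hypothetical monitor; not the code from the MAPREDUCE-2495 patches.
final class CleanupThreadMonitor {
    private final Runnable cleanupTask;
    private Thread worker;

    CleanupThreadMonitor(Runnable cleanupTask) {
        this.cleanupTask = cleanupTask;
    }

    /** Starts a fresh daemon thread running the cleanup task. */
    synchronized void start() {
        worker = new Thread(cleanupTask, "cache-cleanup");
        worker.setDaemon(true);
        worker.start();
    }

    /** True while a cleanup thread exists and has not terminated. */
    synchronized boolean isHealthy() {
        return worker != null && worker.isAlive();
    }

    /** Restarts the cleanup thread if it has died; returns true on restart. */
    synchronized boolean restartIfDead() {
        if (isHealthy()) {
            return false;
        }
        start();
        return true;
    }
}
```

 A periodic caller, such as an existing heartbeat loop, would invoke 
 restartIfDead() and log whenever it returns true.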



[jira] [Updated] (MAPREDUCE-2495) The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason

2011-05-23 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated MAPREDUCE-2495:
---

Status: Patch Available  (was: Open)

 The distributed cache cleanup thread has no monitoring to check to see if it 
 has died for some reason
 -

 Key: MAPREDUCE-2495
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2495
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distributed-cache
Affects Versions: 0.21.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
Priority: Minor
 Attachments: MAPREDUCE-2495-20.20X-V1.patch, 
 MAPREDUCE-2495-20.20X-V2.patch, MAPREDUCE-2495-20.20X-V3.patch, 
 MAPREDUCE-2495-20.20X-V4.patch, MAPREDUCE-2495-v1.patch, 
 MAPREDUCE-2495-v2.patch, MAPREDUCE-2495-v3.patch, MAPREDUCE-2495-v4.patch


 The cleanup thread in the distributed cache handles IOExceptions and the like 
 correctly, but just to be a bit more defensive it would be good to monitor 
 the thread, and check that it is still alive regularly, so that the 
 distributed cache does not fill up the entire disk on the node. 



[jira] [Updated] (MAPREDUCE-2495) The distributed cache cleanup thread has no monitoring to check to see if it has died for some reason

2011-05-23 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated MAPREDUCE-2495:
---

Attachment: MAPREDUCE-2495-v4.patch
MAPREDUCE-2495-20.20X-V4.patch

Tests no longer sleep

 The distributed cache cleanup thread has no monitoring to check to see if it 
 has died for some reason
 -

 Key: MAPREDUCE-2495
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2495
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distributed-cache
Affects Versions: 0.21.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
Priority: Minor
 Attachments: MAPREDUCE-2495-20.20X-V1.patch, 
 MAPREDUCE-2495-20.20X-V2.patch, MAPREDUCE-2495-20.20X-V3.patch, 
 MAPREDUCE-2495-20.20X-V4.patch, MAPREDUCE-2495-v1.patch, 
 MAPREDUCE-2495-v2.patch, MAPREDUCE-2495-v3.patch, MAPREDUCE-2495-v4.patch


 The cleanup thread in the distributed cache handles IOExceptions and the like 
 correctly, but just to be a bit more defensive it would be good to monitor 
 the thread, and check that it is still alive regularly, so that the 
 distributed cache does not fill up the entire disk on the node. 



[jira] [Updated] (MAPREDUCE-2524) Backport trunk heuristics for failing maps when we get fetch failures retrieving map output during shuffle

2011-05-23 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated MAPREDUCE-2524:
-

Attachment: MAPREDUCE2524-patch-20security.txt

Patch for branch-0.20-security.

 Backport trunk heuristics for failing maps when we get fetch failures 
 retrieving map output during shuffle
 --

 Key: MAPREDUCE-2524
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2524
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tasktracker
Affects Versions: 0.20.204.0
Reporter: Thomas Graves
Assignee: Thomas Graves
Priority: Minor
 Fix For: 0.20.205.0

 Attachments: MAPREDUCE2524-patch-20security.txt


 The heuristics for failing maps when we get map output fetch failures during 
 the shuffle are pretty conservative in branch 0.20. Backport the heuristics 
 from trunk, which are more aggressive, simpler, and configurable.



[jira] [Commented] (MAPREDUCE-2524) Backport trunk heuristics for failing maps when we get fetch failures retrieving map output during shuffle

2011-05-23 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038174#comment-13038174
 ] 

Thomas Graves commented on MAPREDUCE-2524:
--

   
The javadoc and eclipse failures existed before/without these changes.

     [exec] -1 overall.
     [exec]
     [exec] +1 @author.  The patch does not contain any @author tags.
     [exec]
     [exec] +1 tests included.  The patch appears to include 3 new or modified tests.
     [exec]
     [exec] -1 javadoc.  The javadoc tool appears to have generated 1 warning messages.
     [exec]
     [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec] +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec]
     [exec] -1 Eclipse classpath. The patch causes the Eclipse classpath to differ from the contents of the lib directories.

 Backport trunk heuristics for failing maps when we get fetch failures 
 retrieving map output during shuffle
 --

 Key: MAPREDUCE-2524
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2524
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tasktracker
Affects Versions: 0.20.204.0
Reporter: Thomas Graves
Assignee: Thomas Graves
Priority: Minor
 Fix For: 0.20.205.0

 Attachments: MAPREDUCE2524-patch-20security.txt


 The heuristics for failing maps when we get map output fetch failures during 
 the shuffle are pretty conservative in branch 0.20. Backport the heuristics 
 from trunk, which are more aggressive, simpler, and configurable.



[jira] [Updated] (MAPREDUCE-2185) Infinite loop at creating splits using CombineFileInputFormat

2011-05-23 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2185:
---

Attachment: MAPREDUCE-2185.patch

For blocks that do not have hosts associated with them, use 
NetworkTopology.DEFAULT_RACK as the rack location. This avoids the infinite 
loop later on in getMoreSplits()

 Infinite loop at creating splits using CombineFileInputFormat
 -

 Key: MAPREDUCE-2185
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2185
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: job submission
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Attachments: MAPREDUCE-2185.patch


 This is caused by a missing block in HDFS. So the block's locations are 
 empty. The following code adds the block to blockToNodes map but not to 
 rackToBlocks map. Later on when generating splits, only blocks in 
 rackToBlocks are removed from blockToNodes map. So blockToNodes map can never 
 become empty therefore causing infinite loop
 {code}
   // add this block to the block --> node locations map
   blockToNodes.put(oneblock, oneblock.hosts);
   // add this block to the rack --> block map
   for (int j = 0; j < oneblock.racks.length; j++) {
     ..
   }
 {code}
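
 The approach described in the patch comment, falling back to a default rack 
 for blocks that have no locations, can be sketched as below. This is a 
 simplified stand-in, not the patch itself: blocks are plain strings, the 
 class and method names are hypothetical, and DEFAULT_RACK is a local constant 
 standing in for NetworkTopology.DEFAULT_RACK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Simplified stand-in for the bookkeeping in CombineFileInputFormat.
final class RackIndexSketch {
    // Local stand-in for NetworkTopology.DEFAULT_RACK (assumed value).
    static final String DEFAULT_RACK = "/default-rack";

    private RackIndexSketch() {}

    /** Records a block in both maps, using DEFAULT_RACK when it has no racks. */
    static void addBlock(String block, String[] hosts, String[] racks,
                         Map<String, String[]> blockToNodes,
                         Map<String, List<String>> rackToBlocks) {
        blockToNodes.put(block, hosts);
        String[] effectiveRacks =
            racks.length == 0 ? new String[] {DEFAULT_RACK} : racks;
        for (String rack : effectiveRacks) {
            List<String> blocks = rackToBlocks.get(rack);
            if (blocks == null) {
                blocks = new ArrayList<String>();
                rackToBlocks.put(rack, blocks);
            }
            blocks.add(block);
        }
    }
}
```

 Because every block now appears in rackToBlocks, the split-generation pass 
 can remove each one from blockToNodes, so that map can drain to empty.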



[jira] [Updated] (MAPREDUCE-2185) Infinite loop at creating splits using CombineFileInputFormat

2011-05-23 Thread Ramkumar Vadali (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ramkumar Vadali updated MAPREDUCE-2185:
---

Assignee: Ramkumar Vadali  (was: Hairong Kuang)
  Status: Patch Available  (was: Open)

 Infinite loop at creating splits using CombineFileInputFormat
 -

 Key: MAPREDUCE-2185
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2185
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: job submission
Reporter: Hairong Kuang
Assignee: Ramkumar Vadali
 Attachments: MAPREDUCE-2185.patch


 This is caused by a missing block in HDFS. So the block's locations are 
 empty. The following code adds the block to blockToNodes map but not to 
 rackToBlocks map. Later on when generating splits, only blocks in 
 rackToBlocks are removed from blockToNodes map. So blockToNodes map can never 
 become empty therefore causing infinite loop
 {code}
   // add this block to the block --> node locations map
   blockToNodes.put(oneblock, oneblock.hosts);
   // add this block to the rack --> block map
   for (int j = 0; j < oneblock.racks.length; j++) {
     ..
   }
 {code}



[jira] [Updated] (MAPREDUCE-2494) Make the distributed cache delete entires using LRU priority

2011-05-23 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated MAPREDUCE-2494:
---

Status: Open  (was: Patch Available)

In a different conversation with Chris he mentioned that sleeps in the tests 
are bad, and that if they have to be there then they should be tied together 
with some constant values.  I am reworking the tests to deal with constant 
values.

 Make the distributed cache delete entires using LRU priority
 

 Key: MAPREDUCE-2494
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2494
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distributed-cache
Affects Versions: 0.21.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
 Attachments: MAPREDUCE-2494-V1.patch


 Currently the distributed cache waits until a cache directory is above a 
 preconfigured threshold, at which point it deletes all entries that are 
 not currently being used.  It seems like we would get far fewer cache misses 
 if we kept some of them around, even when they are not being used.  We should 
 add a configurable percentage as a goal for how much of the cache should 
 remain clear when not in use, and select objects to delete based on how 
 recently they were used, and possibly also how large they are and how 
 difficult it is to download them again.
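
 One way to picture the proposed policy is an access-ordered map that evicts 
 the least recently used idle entries until the cache is back under a target 
 size. This is only a sketch: the class, its fields, and the byte-based target 
 are assumptions, and it ignores the size and download-cost weighting 
 mentioned above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the cache bookkeeping; not the MAPREDUCE-2494 patch.
final class LruCacheSketch {
    private static final class Entry {
        final long size;
        int refCount;
        Entry(long size) { this.size = size; }
    }

    // accessOrder=true keeps iteration order from least to most recently used.
    private final LinkedHashMap<String, Entry> entries =
        new LinkedHashMap<String, Entry>(16, 0.75f, true);
    private long totalSize;

    void put(String key, long size) {
        Entry old = entries.put(key, new Entry(size));
        if (old != null) {
            totalSize -= old.size;
        }
        totalSize += size;
    }

    /** Marks an entry as in use; get() also makes it the most recently used. */
    void acquire(String key) { entries.get(key).refCount++; }

    void release(String key) { entries.get(key).refCount--; }

    long totalSize() { return totalSize; }

    /** Evicts idle entries, oldest first, until totalSize <= targetSize. */
    List<String> evictDownTo(long targetSize) {
        List<String> removed = new ArrayList<String>();
        Iterator<Map.Entry<String, Entry>> it = entries.entrySet().iterator();
        while (totalSize > targetSize && it.hasNext()) {
            Map.Entry<String, Entry> e = it.next();
            if (e.getValue().refCount > 0) {
                continue; // still referenced by a running task
            }
            totalSize -= e.getValue().size;
            removed.add(e.getKey());
            it.remove();
        }
        return removed;
    }
}
```

 The target passed to evictDownTo() would be derived from the configurable 
 percentage, so recently used entries survive a cleanup pass whenever space 
 allows.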



[jira] [Updated] (MAPREDUCE-2494) Make the distributed cache delete entires using LRU priority

2011-05-23 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated MAPREDUCE-2494:
---

Status: Patch Available  (was: Open)

 Make the distributed cache delete entires using LRU priority
 

 Key: MAPREDUCE-2494
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2494
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distributed-cache
Affects Versions: 0.21.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
 Attachments: MAPREDUCE-2494-V1.patch, MAPREDUCE-2494-V2.patch


 Currently the distributed cache waits until a cache directory is above a 
 preconfigured threshold, at which point it deletes all entries that are 
 not currently being used.  It seems like we would get far fewer cache misses 
 if we kept some of them around, even when they are not being used.  We should 
 add a configurable percentage as a goal for how much of the cache should 
 remain clear when not in use, and select objects to delete based on how 
 recently they were used, and possibly also how large they are and how 
 difficult it is to download them again.



[jira] [Updated] (MAPREDUCE-2494) Make the distributed cache delete entires using LRU priority

2011-05-23 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated MAPREDUCE-2494:
---

Attachment: MAPREDUCE-2494-V2.patch

 Make the distributed cache delete entires using LRU priority
 

 Key: MAPREDUCE-2494
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2494
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distributed-cache
Affects Versions: 0.21.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
 Attachments: MAPREDUCE-2494-V1.patch, MAPREDUCE-2494-V2.patch


 Currently the distributed cache waits until a cache directory is above a 
 preconfigured threshold, at which point it deletes all entries that are 
 not currently being used.  It seems like we would get far fewer cache misses 
 if we kept some of them around, even when they are not being used.  We should 
 add a configurable percentage as a goal for how much of the cache should 
 remain clear when not in use, and select objects to delete based on how 
 recently they were used, and possibly also how large they are and how 
 difficult it is to download them again.



[jira] [Commented] (MAPREDUCE-2492) [MAPREDUCE] The new MapReduce API should make available task's progress to the task

2011-05-23 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038214#comment-13038214
 ] 

Chris Douglas commented on MAPREDUCE-2492:
--

This is partially my fault. I only had time for a quick pass over the patch, 
and had missed the mapred API change among the updates to {{mapreduce.lib}} 
classes.

I agree with Tom. It's a useful feature, but changing only the new API is 
probably the better course. Any breakage is unlikely (it adds, rather than 
removes, a method on a framework type almost never implemented by users), but 
I'd lean away from any modifications to the old APIs unless they affect 
correctness.

 [MAPREDUCE] The new MapReduce API should make available task's progress to 
 the task
 ---

 Key: MAPREDUCE-2492
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2492
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: task
Affects Versions: 0.23.0
Reporter: Amar Kamat
Assignee: Amar Kamat
 Fix For: 0.23.0

 Attachments: MAPREDUCE-2492-v1.3.patch, MAPREDUCE-2492-v1.4.patch, 
 MAPREDUCE-2492-v1.5.patch, MAPREDUCE-2492-v1.6.patch, 
 MAPREDUCE-2492-v1.7.patch


 There is no way to get the task's current progress in the new MapReduce API. 
 It would be nice to make it available so that the task (map/reduce) can use 
 it. 



[jira] [Updated] (MAPREDUCE-2521) Mapreduce RPM integration project

2011-05-23 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang updated MAPREDUCE-2521:
-

Attachment: MAPREDUCE-2521-4.patch

Changed the configuration directory from $PREFIX/conf to $PREFIX/etc/hadoop per 
Owen's recommendation.  For RPM/deb, it will use /etc/hadoop as the default and 
create a symlink at $PREFIX/etc/hadoop pointing to /etc/hadoop.

 Mapreduce RPM integration project
 -

 Key: MAPREDUCE-2521
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2521
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: build
 Environment: Java 6, RHEL 5.5
Reporter: Eric Yang
Assignee: Eric Yang
 Attachments: MAPREDUCE-2521-1.patch, MAPREDUCE-2521-2.patch, 
 MAPREDUCE-2521-3.patch, MAPREDUCE-2521-4.patch, MAPREDUCE-2521.patch


 This JIRA corresponds to HADOOP-6255 and the associated directory layout 
 change. The patch for creating MapReduce RPM packaging should be posted here 
 so that the patch test build can verify it against MapReduce svn trunk.



[jira] [Commented] (MAPREDUCE-2185) Infinite loop at creating splits using CombineFileInputFormat

2011-05-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038301#comment-13038301
 ] 

Hadoop QA commented on MAPREDUCE-2185:
--

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12480153/MAPREDUCE-2185.patch
  against trunk revision 1126591.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/297//testReport/
Findbugs warnings: 
https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/297//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/297//console

This message is automatically generated.

 Infinite loop at creating splits using CombineFileInputFormat
 -

 Key: MAPREDUCE-2185
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2185
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: job submission
Reporter: Hairong Kuang
Assignee: Ramkumar Vadali
 Attachments: MAPREDUCE-2185.patch


 This is caused by a missing block in HDFS, so the block's locations are 
 empty. The following code adds the block to the blockToNodes map but not to 
 the rackToBlocks map. Later, when generating splits, only blocks in 
 rackToBlocks are removed from the blockToNodes map, so blockToNodes can never 
 become empty, which causes an infinite loop:
 {code}
   // add this block to the block --> node locations map
   blockToNodes.put(oneblock, oneblock.hosts);
   // add this block to the rack --> block map
   for (int j = 0; j < oneblock.racks.length; j++) {
  ..
   }
 {code}
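
 The failure mode can be reproduced in isolation with a small sketch. This is
 not the CombineFileInputFormat code or the attached patch: SplitLoopSketch,
 its Block type, and the DEFAULT_RACK placeholder are hypothetical. It only
 shows that a block with no racks stays in blockToNodes after every rack has
 been drained, and that one possible guard (an assumption here) is to file
 such blocks under a placeholder rack.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical, self-contained model of the two maps in the description. */
class SplitLoopSketch {
    // Placeholder rack name used only by this sketch.
    static final String DEFAULT_RACK = "/default-rack";

    static final class Block {
        final String name;
        final String[] hosts;
        final String[] racks;
        Block(String name, String[] hosts, String[] racks) {
            this.name = name;
            this.hosts = hosts;
            this.racks = racks;
        }
    }

    /**
     * Builds both maps, drains every rack once, and returns the blocks that
     * are still in blockToNodes. A loop that waits for blockToNodes to become
     * empty can never finish while this result is non-empty.
     */
    static Set<String> leftoverAfterRackPass(List<Block> blocks,
                                             boolean useDefaultRack) {
        Map<String, String[]> blockToNodes = new HashMap<>();
        Map<String, List<String>> rackToBlocks = new HashMap<>();
        for (Block b : blocks) {
            // add this block to the block --> node locations map
            blockToNodes.put(b.name, b.hosts);
            String[] racks = b.racks;
            if (racks.length == 0 && useDefaultRack) {
                // assumed guard: give location-less blocks a placeholder rack
                racks = new String[] { DEFAULT_RACK };
            }
            // add this block to the rack --> block map
            for (int j = 0; j < racks.length; j++) {
                rackToBlocks.computeIfAbsent(racks[j], r -> new ArrayList<>())
                            .add(b.name);
            }
        }
        // Split generation only removes blocks that are reachable via a rack.
        for (List<String> rackBlocks : rackToBlocks.values()) {
            for (String name : rackBlocks) {
                blockToNodes.remove(name);
            }
        }
        return new TreeSet<>(blockToNodes.keySet());
    }
}
```

 With the guard disabled, a block whose racks array is empty is returned as a
 leftover; with it enabled, the leftover set is empty.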



[jira] [Commented] (MAPREDUCE-2494) Make the distributed cache delete entires using LRU priority

2011-05-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038309#comment-13038309
 ] 

Hadoop QA commented on MAPREDUCE-2494:
--

+1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12480159/MAPREDUCE-2494-V2.patch
  against trunk revision 1126591.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/298//testReport/
Findbugs warnings: 
https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/298//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/298//console

This message is automatically generated.

 Make the distributed cache delete entires using LRU priority
 

 Key: MAPREDUCE-2494
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2494
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distributed-cache
Affects Versions: 0.21.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans
 Attachments: MAPREDUCE-2494-V1.patch, MAPREDUCE-2494-V2.patch


 Currently the distributed cache will wait until a cache directory is above a 
 preconfigured threshold, at which point it will delete all entries that are 
 not currently being used.  It seems like we would get far fewer cache misses 
 if we kept some of them around, even when they are not being used.  We should 
 add a configurable percentage as a goal for how much of the cache should 
 remain clear when not in use, and select objects to delete based on how 
 recently they were used, and possibly also how large they are and how 
 difficult it is to download them again.



[jira] [Commented] (MAPREDUCE-2485) reinitialize CLASSPATH variable when executing Mapper/Reducer code

2011-05-23 Thread Tom Melendez (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038312#comment-13038312
 ] 

Tom Melendez commented on MAPREDUCE-2485:
-

OK, so false alarm, it does work if I specify mapred.child.env.  I did so like 
this in my config file and I'm good:

<property>
  <name>mapred.child.env</name>
  <value>CLASSPATH=/usr/lib/hadoop/:/usr/lib/hadoop/lib/:/usr/lib/hadoop/conf:/usr/lib/hadoop/hadoop-0.20.2-cdh3u0-core.jar:/usr/share/java/commons-logging.jar</value>
</property>

I previously had quotes around classpath var and that barfed.

 reinitialize CLASSPATH variable when executing Mapper/Reducer code
 --

 Key: MAPREDUCE-2485
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2485
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: pipes
Affects Versions: 0.20.2
 Environment: Ubuntu 10.04 LTS
Reporter: Tom Melendez

 We're using pipes, and using libhdfs inside our mapper and reducer code.  
 We've determined that we need to execute a putenv call in order for libhdfs 
 to actually have access to the CLASSPATH.  Ideally, it should just use the 
 CLASSPATH we set when the job was executed.
 For some more context, see these threads:
 https://groups.google.com/a/cloudera.org/group/cdh-user/browse_thread/thread/ae9808d80fb132fb?tvc=2
 http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.user/25830



[jira] [Resolved] (MAPREDUCE-2485) reinitialize CLASSPATH variable when executing Mapper/Reducer code

2011-05-23 Thread Tom Melendez (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom Melendez resolved MAPREDUCE-2485.
-

Resolution: Invalid

False alarm, works OK.

 reinitialize CLASSPATH variable when executing Mapper/Reducer code
 --

 Key: MAPREDUCE-2485
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2485
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: pipes
Affects Versions: 0.20.2
 Environment: Ubuntu 10.04 LTS
Reporter: Tom Melendez

 We're using pipes, and using libhdfs inside our mapper and reducer code.  
 We've determined that we need to execute a putenv call in order for libhdfs 
 to actually have access to the CLASSPATH.  Ideally, it should just use the 
 CLASSPATH we set when the job was executed.
 For some more context, see these threads:
 https://groups.google.com/a/cloudera.org/group/cdh-user/browse_thread/thread/ae9808d80fb132fb?tvc=2
 http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.user/25830



[jira] [Commented] (MAPREDUCE-2521) Mapreduce RPM integration project

2011-05-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038350#comment-13038350
 ] 

Hadoop QA commented on MAPREDUCE-2521:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12480179/MAPREDUCE-2521-4.patch
  against trunk revision 1126801.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 16 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

-1 release audit.  The applied patch generated 4 release audit warnings 
(more than the trunk's current 2 warnings).

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/299//testReport/
Release audit warnings: 
https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/299//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: 
https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/299//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/299//console

This message is automatically generated.

 Mapreduce RPM integration project
 -

 Key: MAPREDUCE-2521
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2521
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: build
 Environment: Java 6, RHEL 5.5
Reporter: Eric Yang
Assignee: Eric Yang
 Attachments: MAPREDUCE-2521-1.patch, MAPREDUCE-2521-2.patch, 
 MAPREDUCE-2521-3.patch, MAPREDUCE-2521-4.patch, MAPREDUCE-2521.patch


 This JIRA corresponds to HADOOP-6255 and the associated directory layout 
 change. The patch for creating MapReduce RPM packaging should be posted here 
 so that the patch test build can verify it against MapReduce svn trunk.
