[jira] [Updated] (MAPREDUCE-4049) plugin for generic shuffle service

2012-04-05 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-4049:
--

Target Version/s: 0.24.0, 1.1.0, 0.23.2, 0.23.3, 1.0.3  (was: 1.0.3, 
0.23.3, 1.0.2, 0.23.2, 1.1.0, 0.24.0)
   Fix Version/s: (was: 1.0.2)

This patch was not committed to Hadoop-1.0.2.  Corrected the Fix Versions and 
Target Versions fields.

 plugin for generic shuffle service
 --

 Key: MAPREDUCE-4049
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: performance, task, tasktracker
Affects Versions: 0.23.1, 1.0.1
Reporter: Avner BenHanoch
  Labels: merge, plugin, rdma, shuffle
 Attachments: HADOOP-1.0.2.patch, HADOOP-1.0.x.patch, Hadoop Shuffle 
 Consumer Plugin TLD.rtf, Hadoop Shuffle Provider Plugin TLD.rtf, 
 MAPREDUCE-4049-branch-1.0.2.patch, mapred-site.xml, mapred.diff, src.tgz, 
 test.diff


 Support generic shuffle service as a set of two plugins: ShuffleProvider & 
 ShuffleConsumer.
 This will satisfy the following needs:
 # Better shuffle and merge performance. For example: we are working on a 
 shuffle plugin that performs shuffle over RDMA in fast networks (10gE, 40gE, 
 or InfiniBand) instead of using the current HTTP shuffle. Based on the fast 
 RDMA shuffle, the plugin can also utilize a suitable merge approach during 
 the intermediate merges, yielding much better performance.
 # Satisfy MAPREDUCE-3060 - generic shuffle service to avoid a hidden 
 dependency of NodeManager on a specific version of mapreduce shuffle 
 (currently targeted to 0.24.0).
 References:
 # Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu 
 from Auburn University with others, 
 [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf]
 # I am attaching 2 documents with suggested Top Level Design for both plugins 
 (currently, based on 1.0 branch)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (MAPREDUCE-4088) Task stuck in JobLocalizer prevented other tasks on the same node from committing

2012-04-05 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-4088:
--

Target Version/s: 1.0.3  (was: 1.0.2)

 Task stuck in JobLocalizer prevented other tasks on the same node from 
 committing
 -

 Key: MAPREDUCE-4088
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4088
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Affects Versions: 0.20.205.0
Reporter: Ravi Prakash
Priority: Critical

 We saw that as a result of HADOOP-6963, one task was stuck in this
 Thread 23668: (state = IN_NATIVE)
  - java.io.UnixFileSystem.getBooleanAttributes0(java.io.File) @bci=0 
 (Compiled frame; information may be imprecise)
  - java.io.UnixFileSystem.getBooleanAttributes(java.io.File) @bci=2, line=228 
 (Compiled frame)
  - java.io.File.exists() @bci=20, line=733 (Compiled frame)
  - org.apache.hadoop.fs.FileUtil.getDU(java.io.File) @bci=3, line=446 
 (Compiled frame)
  - org.apache.hadoop.fs.FileUtil.getDU(java.io.File) @bci=52, line=455 
 (Compiled frame)
  - org.apache.hadoop.fs.FileUtil.getDU(java.io.File) @bci=52, line=455 
 (Compiled frame)
 
  TONS MORE OF THIS SAME LINE
  - org.apache.hadoop.fs.FileUtil.getDU(java.io.File) @bci=52, line=455 
 (Compiled frame)
 .
 .
  - org.apache.hadoop.fs.FileUtil.getDU(java.io.File) @bci=52, line=455 
 (Compiled frame)
  - org.apache.hadoop.fs.FileUtil.getDU(java.io.File) @bci=52, line=455 
 (Interpreted frame)
 ne=451 (Interpreted frame)
  - 
 org.apache.hadoop.mapred.JobLocalizer.downloadPrivateCacheObjects(org.apache.hadoop.conf.Configuration,
  java.net.URI[], org.apache.hadoop.fs.Path[], long[], boolean[], boolean) 
 @bci=150, line=324 (Interpreted frame)
  - 
 org.apache.hadoop.mapred.JobLocalizer.downloadPrivateCache(org.apache.hadoop.conf.Configuration)
  @bci=40, line=349 (Interpreted frame) 51, line=383 (Interpreted frame)
  - org.apache.hadoop.mapred.JobLocalizer.runSetup(java.lang.String, 
 java.lang.String, org.apache.hadoop.fs.Path, 
 org.apache.hadoop.mapred.TaskUmbilicalProtocol) @bci=46, line=477 
 (Interpreted frame)
  - org.apache.hadoop.mapred.JobLocalizer$3.run() @bci=20, line=534 
 (Interpreted frame)
  - org.apache.hadoop.mapred.JobLocalizer$3.run() @bci=1, line=531 
 (Interpreted frame)
  - 
 java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction,
  java.security.AccessControlContext) @bci=0 (Interpreted frame)
  - javax.security.auth.Subject.doAs(javax.security.auth.Subject, 
 java.security.PrivilegedExceptionAction) @bci=42, line=396 (Interpreted frame)
  - 
 org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction)
  @bci=14, line=1082 (Interpreted frame)
  - org.apache.hadoop.mapred.JobLocalizer.main(java.lang.String[]) @bci=266, 
 line=530 (Interpreted frame)
 While all other tasks on the same node were stuck in 
 Thread 32141: (state = BLOCKED)
  - java.lang.Thread.sleep(long) @bci=0 (Interpreted frame)
  - 
 org.apache.hadoop.mapred.Task.commit(org.apache.hadoop.mapred.TaskUmbilicalProtocol,
  org.apache.hadoop.mapred.Task$TaskReporter, 
 org.apache.hadoop.mapreduce.OutputCommitter) @bci=24, line=980 (Compiled 
 frame)
  - 
 org.apache.hadoop.mapred.Task.done(org.apache.hadoop.mapred.TaskUmbilicalProtocol,
  org.apache.hadoop.mapred.Task$TaskReporter) @bci=146, line=871 (Interpreted 
 frame)
  - org.apache.hadoop.mapred.ReduceTask.run(org.apache.hadoop.mapred.JobConf, 
 org.apache.hadoop.mapred.TaskUmbilicalProtocol) @bci=470, line=423 
 (Interpreted frame)
  - org.apache.hadoop.mapred.Child$4.run() @bci=29, line=255 (Interpreted 
 frame)
  - 
 java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction,
  java.security.AccessControlContext) @bci=0 (Interpreted frame)
  - javax.security.auth.Subject.doAs(javax.security.auth.Subject, 
 java.security.PrivilegedExceptionAction) @bci=42, line=396 (Interpreted frame)
  - 
 org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction)
  @bci=14, line=1082 (Interpreted frame)
  - org.apache.hadoop.mapred.Child.main(java.lang.String[]) @bci=738, line=249 
 (Interpreted frame)
 This should never happen. A stuck task should never prevent other tasks from 
 different jobs on the same node from committing.





[jira] [Updated] (MAPREDUCE-3824) Distributed caches are not removed properly

2012-03-18 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-3824:
--

Target Version/s: 1.0.2  (was: 1.0.1)
   Fix Version/s: (was: 1.0.1)
  1.0.2

Was committed to 1.0.2, not 1.0.1.

 Distributed caches are not removed properly
 ---

 Key: MAPREDUCE-3824
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3824
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: distributed-cache
Affects Versions: 1.0.0
Reporter: Allen Wittenauer
Assignee: Thomas Graves
Priority: Critical
 Fix For: 1.0.2

 Attachments: MAPREDUCE-3824-branch-1.0.patch, 
 MAPREDUCE-3824-branch-1.0.patch, MAPREDUCE-3824-branch-1.0.txt


 Distributed caches are not being properly removed by the TaskTracker when 
 they are expected to be expired. 





[jira] [Updated] (MAPREDUCE-3851) Allow more aggressive action on detection of the jetty issue

2012-03-18 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-3851:
--

Target Version/s: 1.0.2  (was: 1.0.1)
   Fix Version/s: (was: 1.1.0)

Corrected Target and Fixed versions.

 Allow more aggressive action on detection of the jetty issue
 

 Key: MAPREDUCE-3851
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3851
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tasktracker
Affects Versions: 1.0.0
Reporter: Kihwal Lee
Assignee: Thomas Graves
 Fix For: 1.0.2

 Attachments: MAPREDUCE-3851.patch, MAPREDUCE-3851.patch, 
 MAPREDUCE-3851.patch, MAPREDUCE-3851.patch


 MAPREDUCE-2529 added a useful failure detection mechanism. In this jira, I 
 propose we add a periodic check inside the TT and a configurable action to 
 self-destruct. Blacklisting helps but is not enough. A hung jetty still 
 accepts connections, and it takes a very long time for clients to fail out. 
 Short jobs are delayed for hours because of this. This feature will be a nice 
 companion to MAPREDUCE-3184.





[jira] [Updated] (MAPREDUCE-3583) ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException

2012-03-18 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-3583:
--

Fix Version/s: (was: 1.1.0)
   (was: 0.24.0)

 ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException
 -

 Key: MAPREDUCE-3583
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3583
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.205.0
 Environment: 64-bit Linux:
 asf011.sp2.ygridcore.net
 Linux asf011.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 
 17:42:25 UTC 2011 x86_64 GNU/Linux
Reporter: Zhihong Yu
Assignee: Zhihong Yu
Priority: Critical
 Fix For: 0.23.2, 1.0.2

 Attachments: mapreduce-3583-trunk-v2.txt, 
 mapreduce-3583-trunk-v2.txt, mapreduce-3583-trunk-v3.txt, 
 mapreduce-3583-trunk-v4.txt, mapreduce-3583-trunk-v5.txt, 
 mapreduce-3583-trunk-v6.txt, mapreduce-3583-trunk-v7.txt, 
 mapreduce-3583-trunk.txt, mapreduce-3583-v2.txt, mapreduce-3583-v3.txt, 
 mapreduce-3583-v4.txt, mapreduce-3583-v5.txt, mapreduce-3583-v6.txt, 
 mapreduce-3583-v7.txt, mapreduce-3583.txt


 HBase PreCommit builds frequently gave us NumberFormatException.
 From 
 https://builds.apache.org/job/PreCommit-HBASE-Build/553//testReport/org.apache.hadoop.hbase.mapreduce/TestHFileOutputFormat/testMRIncrementalLoad/:
 {code}
 2011-12-20 01:44:01,180 WARN  [main] mapred.JobClient(784): No job jar file 
 set.  User classes may not be found. See JobConf(Class) or 
 JobConf#setJar(String).
 java.lang.NumberFormatException: For input string: 18446743988060683582
   at 
 java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
   at java.lang.Long.parseLong(Long.java:422)
   at java.lang.Long.parseLong(Long.java:468)
   at 
 org.apache.hadoop.util.ProcfsBasedProcessTree.constructProcessInfo(ProcfsBasedProcessTree.java:413)
   at 
 org.apache.hadoop.util.ProcfsBasedProcessTree.getProcessTree(ProcfsBasedProcessTree.java:148)
   at 
 org.apache.hadoop.util.LinuxResourceCalculatorPlugin.getProcResourceValues(LinuxResourceCalculatorPlugin.java:401)
   at org.apache.hadoop.mapred.Task.initialize(Task.java:536)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:353)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 {code}
 From hadoop 0.20.205 source code, looks like ppid was 18446743988060683582, 
 causing NFE:
 {code}
 // Set (name) (ppid) (pgrpId) (session) (utime) (stime) (vsize) (rss)
  pinfo.updateProcessInfo(m.group(2), Integer.parseInt(m.group(3)),
 {code}
 You can find information on the OS at the beginning of 
 https://builds.apache.org/job/PreCommit-HBASE-Build/553/console:
 {code}
 asf011.sp2.ygridcore.net
 Linux asf011.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 
 17:42:25 UTC 2011 x86_64 GNU/Linux
 core file size  (blocks, -c) 0
 data seg size   (kbytes, -d) unlimited
 scheduling priority (-e) 20
 file size   (blocks, -f) unlimited
 pending signals (-i) 16382
 max locked memory   (kbytes, -l) 64
 max memory size (kbytes, -m) unlimited
 open files  (-n) 6
 pipe size(512 bytes, -p) 8
 POSIX message queues (bytes, -q) 819200
 real-time priority  (-r) 0
 stack size  (kbytes, -s) 8192
 cpu time   (seconds, -t) unlimited
 max user processes  (-u) 2048
 virtual memory  (kbytes, -v) unlimited
 file locks  (-x) unlimited
 6
 Running in Jenkins mode
 {code}
 From Nicolas Sze:
 {noformat}
 It looks like that the ppid is a 64-bit positive integer but Java long is 
 signed and so only works with 63-bit positive integers.  In your case,
   2^64 > 18446743988060683582 > 2^63.
 Therefore, there is a NFE. 
 {noformat}
 I propose changing allProcessInfo to Map<String, ProcessInfo> so that we 
 avoid this problem by not parsing large integers.
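
The overflow described above can be reproduced in isolation. The following is a minimal sketch, not the committed patch: the class and method names (PpidParsing, fitsInLong, parseUnsigned64) are hypothetical and chosen only for illustration. It shows that the value from the stack trace is rejected by Long.parseLong, while java.math.BigInteger accepts it.

```java
import java.math.BigInteger;

// Hypothetical helper, not part of Hadoop; illustrates the overflow only.
final class PpidParsing {
    /** The input string reported in the NumberFormatException above. */
    static final String SAMPLE = "18446743988060683582";

    /** Returns true if the decimal string parses as a signed 64-bit long. */
    static boolean fitsInLong(String s) {
        try {
            Long.parseLong(s);
            return true;
        } catch (NumberFormatException e) {
            // Values at or above 2^63 land here.
            return false;
        }
    }

    /** Parses a decimal string of any magnitude without overflow. */
    static BigInteger parseUnsigned64(String s) {
        return new BigInteger(s);
    }
}
```

Keeping the field as a String, as proposed above, sidesteps the parse entirely; BigInteger is shown only as one way to handle the value if a numeric form is ever needed.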





[jira] [Updated] (MAPREDUCE-3992) Reduce fetcher doesn't verify HTTP status code of response

2012-03-18 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-3992:
--

Target Version/s: 0.24.0, 0.23.3, 1.0.3  (was: 0.23.3, 1.0.2, 0.24.0)

 Reduce fetcher doesn't verify HTTP status code of response
 --

 Key: MAPREDUCE-3992
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3992
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Affects Versions: 0.23.1, 0.24.0, 1.0.1
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: mr-3992.txt


 Currently, the reduce fetch code doesn't check the HTTP status code of the 
 response. This can lead to the following situation:
 - the map output servlet gets an IOException after setting the headers but 
 before the first call to flush()
 - this causes it to send a response with a non-OK result code, including the 
 exception text as the response body (response.sendError() does this if the 
 response isn't committed)
 - it will still include the response headers indicating it's a valid response
 In the case of a merge-to-memory, the compression codec might then try to 
 interpret the HTML response as compressed data, resulting in either a huge 
 allocation (OOME) or some other nasty error. This bug seems to be present in 
 MR1, but I haven't checked trunk/MR2 yet.
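
A status check of the kind described above might look like the sketch below. It is not the attached patch: FetchStatusCheck, requireOk, and openMapOutput are hypothetical names, and only the standard java.net.HttpURLConnection calls (getResponseCode, getURL, getInputStream) are real API. The point is that the body is read only after the status line has been verified.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;

// Hypothetical helper, not part of Hadoop.
final class FetchStatusCheck {
    /** Throws if the HTTP status is anything other than 200 OK. */
    static void requireOk(int status, String url) throws IOException {
        if (status != HttpURLConnection.HTTP_OK) {
            throw new IOException(
                "Got invalid response code " + status + " from " + url);
        }
    }

    /** Verifies the status before the body reaches a decompressor. */
    static InputStream openMapOutput(HttpURLConnection connection)
            throws IOException {
        requireOk(connection.getResponseCode(),
                  connection.getURL().toString());
        return connection.getInputStream();
    }
}
```

With this ordering, an HTML error page produced by response.sendError() surfaces as an IOException instead of being handed to the compression codec.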





[jira] [Updated] (MAPREDUCE-3993) reduce fetch catch clause should catch RTEs as well

2012-03-18 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-3993:
--

Target Version/s: 0.23.3, 1.0.3  (was: 0.23.3, 1.0.2)

 reduce fetch catch clause should catch RTEs as well
 ---

 Key: MAPREDUCE-3993
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3993
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1, mrv2
Affects Versions: 0.23.1, 1.0.2
Reporter: Todd Lipcon

 When using a compression codec for intermediate compression, some cases of 
 corrupt data can cause the codec to throw exceptions other than IOException 
 (eg java.lang.InternalError). This will currently cause the whole reduce task 
 to fail, instead of simply treating it like another case of a failed fetch.
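
The idea can be sketched as follows, assuming a hypothetical FetchAttempt callback in place of the real copy loop (FetchGuard and tryFetch are also illustrative names, not Hadoop API). The catch clause covers IOException, RuntimeException, and InternalError alike, so any of them is reported as a failed fetch rather than propagating and failing the reduce task.

```java
import java.io.IOException;

// Hypothetical wrapper, not part of Hadoop.
final class FetchGuard {
    /** Stand-in for one map-output copy attempt. */
    interface FetchAttempt {
        void run() throws IOException;
    }

    /**
     * Returns true on success and false when the attempt failed in a way
     * that should be treated as a failed fetch.
     */
    static boolean tryFetch(FetchAttempt attempt) {
        try {
            attempt.run();
            return true;
        } catch (IOException | RuntimeException | InternalError e) {
            // Corrupt compressed data may surface as any of these types.
            return false;
        }
    }
}
```

Whether to catch these specific types or a broader Throwable is a design choice the issue leaves open; the narrower list above avoids swallowing errors such as OutOfMemoryError.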





[jira] [Updated] (MAPREDUCE-3824) Distributed caches are not removed properly

2012-02-19 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-3824:
--

   Resolution: Fixed
Fix Version/s: 1.0.1
   Status: Resolved  (was: Patch Available)

Committed to branch-1.0 and branch-1.
Thanks, Thomas, Allen, and reviewers!

 Distributed caches are not removed properly
 ---

 Key: MAPREDUCE-3824
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3824
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: distributed-cache
Affects Versions: 1.0.0
Reporter: Allen Wittenauer
Assignee: Thomas Graves
Priority: Critical
 Fix For: 1.0.1

 Attachments: MAPREDUCE-3824-branch-1.0.patch, 
 MAPREDUCE-3824-branch-1.0.patch, MAPREDUCE-3824-branch-1.0.txt


 Distributed caches are not being properly removed by the TaskTracker when 
 they are expected to be expired. 





[jira] [Updated] (MAPREDUCE-3184) Improve handling of fetch failures when a tasktracker is not responding on HTTP

2012-02-12 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-3184:
--

Target Version/s: 1.0.1  (was: 1.1.0)
   Fix Version/s: (was: 1.1.0)
  1.0.1

 Improve handling of fetch failures when a tasktracker is not responding on 
 HTTP
 ---

 Key: MAPREDUCE-3184
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3184
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobtracker
Affects Versions: 0.20.205.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: 1.0.1

 Attachments: mr-3184.txt


 On a 100 node cluster, we had an issue where one of the TaskTrackers was hit 
 by MAPREDUCE-2386 and stopped responding to fetches. The behavior observed 
 was the following:
 - every reducer would try to fetch the same map task, and fail after ~13 
 minutes.
 - At that point, all reducers would report this failed fetch to the JT for 
 the same task, and the task would be re-run.
 - Meanwhile, the reducers would move on to the next map task that ran on the 
 TT, and hang for another 13 minutes.
 The job essentially made no progress for hours, as each map task that ran on 
 the bad node was serially marked failed.
 To combat this issue, we should introduce a second type of failed fetch 
 notification, used when the TT does not respond at all (ie 
 SocketTimeoutException, etc). These fetch failure notifications should count 
 against the TT at large, rather than a single task. If more than half of the 
 reducers report such an issue for a given TT, then all of the tasks from that 
 TT should be re-run.
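
The proposed threshold can be sketched as below. This is an illustration under stated assumptions, not the JobTracker code from the patch: TrackerFetchFailures and its report method are hypothetical, and trackers and reducers are identified by plain strings. Each reducer is counted at most once per tracker, and the method returns true once more than half of the reducers have reported that tracker as unresponsive.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical bookkeeping class, not part of Hadoop.
final class TrackerFetchFailures {
    private final int totalReducers;
    // Distinct reducers that reported each tracker as unresponsive.
    private final Map<String, Set<String>> reportersByTracker = new HashMap<>();

    TrackerFetchFailures(int totalReducers) {
        this.totalReducers = totalReducers;
    }

    /**
     * Records a "tracker did not respond" report and returns true once
     * more than half of the reducers have reported the same tracker.
     */
    boolean report(String tracker, String reducerId) {
        Set<String> reporters =
            reportersByTracker.computeIfAbsent(tracker, k -> new HashSet<>());
        reporters.add(reducerId);
        return reporters.size() * 2 > totalReducers;
    }
}
```

A caller would re-run all map tasks from the tracker the first time report returns true, instead of waiting for each task to be marked failed one by one.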





[jira] [Updated] (MAPREDUCE-3343) TaskTracker Out of Memory because of distributed cache

2012-02-12 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-3343:
--

Target Version/s: 1.0.1  (was: 1.1.0)
   Fix Version/s: (was: 1.1.0)
  1.0.1

 TaskTracker Out of Memory because of distributed cache
 --

 Key: MAPREDUCE-3343
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3343
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Affects Versions: 0.20.205.0
Reporter: Ahmed Radwan
Assignee: zhaoyunjiong
  Labels: mapreduce, patch
 Fix For: 1.0.1

 Attachments: MAPREDUCE-3343_rev2.patch, 
 mapreduce-3343-release-0.20.205.0.patch


 This Out of Memory error happens when you run a large number of jobs (using 
 the distributed cache) on a TaskTracker. 
 The basic issue seems to be with the distributedCacheManager (an instance of 
 TrackerDistributedCacheManager in TaskTracker.java). It gets created during 
 TaskTracker.initialize(), and it keeps references to 
 TaskDistributedCacheManager for every submitted job via the jobArchives Map, 
 and also references to CacheStatus via the cachedArchives map. I am not seeing 
 these cleaned up between jobs, so this can cause out of memory problems after 
 a really large number of jobs are submitted. We have seen this issue in a 
 number of cases.
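
The general shape of such a fix is to drop the per-job entry when the job finishes. The sketch below is an assumption about that pattern, not the attached patch: JobCacheRegistry, register, and release are hypothetical names standing in for the jobArchives bookkeeping.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry, not part of Hadoop.
final class JobCacheRegistry<V> {
    private final Map<String, V> byJob = new ConcurrentHashMap<>();

    /** Remembers the per-job cache manager while the job is running. */
    void register(String jobId, V manager) {
        byJob.put(jobId, manager);
    }

    /** Drops the reference once the job is done so it can be collected. */
    V release(String jobId) {
        return byJob.remove(jobId);
    }

    /** Number of jobs still holding an entry. */
    int size() {
        return byJob.size();
    }
}
```

If release is called for every completed job, the map size tracks the number of running jobs rather than the number of jobs ever submitted.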





[jira] [Updated] (MAPREDUCE-3751) Simplify job submission in gridmix

2012-01-30 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-3751:
--

Target Version/s: 0.23.1, 1.1.0  (was: 1.0.1, 0.23.1)

Too late to be in 1.0.1, moving Target Version to 1.1.0.  Thanks.

 Simplify job submission in gridmix
 --

 Key: MAPREDUCE-3751
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3751
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/gridmix, mrv2
Affects Versions: 0.23.0, 1.0.0
Reporter: Arun C Murthy

 Currently gridmix tries to gauge cluster load etc. and throttles job 
 submission. This makes it unpredictable and also is hard to support across 
 MR1 and MR2. 
 I propose we simplify it to be:
 # Replay mode - Just submit jobs in the interval as in the original trace.
 # Stress mode - Compress the interval with a given factor for all jobs.
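
The two modes can be sketched as a pure function over the trace's submission times. This is one reading of the proposal, not gridmix code: SubmissionSchedule, replay, and stress are hypothetical names, times are assumed to be in milliseconds, and "compress the interval with a given factor" is interpreted here as dividing each job's offset from the first job by that factor.

```java
// Hypothetical scheduler helper, not part of gridmix.
final class SubmissionSchedule {
    /** Replay mode: keep the original offsets from the first job. */
    static long[] replay(long[] traceSubmitTimesMs) {
        return stress(traceSubmitTimesMs, 1.0);
    }

    /** Stress mode: divide every offset from the first job by factor. */
    static long[] stress(long[] traceSubmitTimesMs, double factor) {
        if (factor <= 0) {
            throw new IllegalArgumentException("factor must be positive");
        }
        long[] offsets = new long[traceSubmitTimesMs.length];
        for (int i = 0; i < traceSubmitTimesMs.length; i++) {
            long original = traceSubmitTimesMs[i] - traceSubmitTimesMs[0];
            offsets[i] = Math.round(original / factor);
        }
        return offsets;
    }
}
```

Neither mode looks at cluster load, which is what makes the schedule predictable and independent of MR1 versus MR2.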






[jira] [Updated] (MAPREDUCE-3607) Port missing new API mapreduce lib classes to 1.x

2012-01-21 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-3607:
--

Target Version/s: 1.0.1  (was: 1.1.0)

Please commit to both branch-1 and branch-1.0.  Thank you.

 Port missing new API mapreduce lib classes to 1.x
 -

 Key: MAPREDUCE-3607
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3607
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: client
Affects Versions: 1.0.0
Reporter: Tom White
Assignee: Tom White
 Attachments: MAPREDUCE-3607.patch, MAPREDUCE-3607.patch, 
 MAPREDUCE-3607.patch


 There are a number of classes under mapreduce.lib that are not present in the 
 1.x series. Including these would help users and downstream projects using 
 the new MapReduce API migrate to later versions of Hadoop in the future.
 A few examples of where this would help:
 * Sqoop uses mapreduce.lib.db.DBWritable and 
 mapreduce.lib.input.CombineFileInputFormat (SQOOP-384).
 * Mahout uses mapreduce.lib.output.MultipleOutputs (MAHOUT-822).
 * HBase has a backport of mapreduce.lib.partition.InputSampler and 
 TotalOrderPartitioner (in org.apache.hadoop.hbase.mapreduce.hadoopbackport) - 
 it would be better if it used the ones in Hadoop.





[jira] [Updated] (MAPREDUCE-3480) TestJvmReuse fails in 1.0

2011-12-28 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-3480:
--

Fix Version/s: 1.0.0

 TestJvmReuse fails in 1.0
 -

 Key: MAPREDUCE-3480
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3480
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Fix For: 1.0.0

 Attachments: MAPREDUCE-3480-branch-1.patch


 TestJvmReuse is failing in apache builds, although it passes on my local 
 machine.





[jira] [Updated] (MAPREDUCE-3539) possible Cases for NullPointerException

2011-12-28 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-3539:
--

Target Version/s: 1.1.0
   Fix Version/s: (was: 1.0.0)
  (was: 0.20.205.0)

No commits have been made, but 1.0.0 has been released.  Moving the target 
versions from Fixed Version to Target Version field, and changing to 1.1.0.

 possible Cases for NullPointerException
 ---

 Key: MAPREDUCE-3539
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3539
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.20.2, 0.20.205.0, 0.21.0, 1.0.0
Reporter: kavita sharma
Priority: Trivial
 Attachments: MAPREDUCE-3539-branch-1.patch, MAPREDUCE-3539.patch


 in DistCh.java
 {noformat}
 in setup method
 opWriter.close();
 If opWriter is null, trying to close it will throw an NPE.
 {noformat}
 {noformat}
 in checkDuplication method
 in.close();
 If in is null, trying to close it will throw an NPE.
 {noformat}
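
Both cases reduce to closing a reference that may never have been assigned. A null-safe close can be sketched as below; NullSafeClose and closeIfOpen are hypothetical names used for illustration and are not taken from the attached patches.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical helper, not part of Hadoop.
final class NullSafeClose {
    /**
     * Closes the resource if it is non-null. Returns false only when
     * close() itself threw an IOException.
     */
    static boolean closeIfOpen(Closeable resource) {
        if (resource == null) {
            // Nothing was opened, so there is nothing to close.
            return true;
        }
        try {
            resource.close();
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

Calling closeIfOpen(opWriter) or closeIfOpen(in) from a finally block would then be safe even when opening the stream failed earlier in the method.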





[jira] [Updated] (MAPREDUCE-3475) JT can't renew its own tokens

2011-12-15 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-3475:
--

Target Version/s: 1.0.0, 0.22.1  (was: 1.0.0)
   Fix Version/s: 1.0.0

Taking Owen's comment for a +1, I have committed this patch to 1.0.0 and 
branch-1.
Leaving the jira open with the expectation that this same fix is needed in 
v0.22.

 JT can't renew its own tokens
 -

 Key: MAPREDUCE-3475
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3475
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 0.20.205.0
Reporter: Daryn Sharp
Assignee: Daryn Sharp
 Fix For: 1.0.0

 Attachments: MAPREDUCE-3475-1.patch, MAPREDUCE-3475.patch


 When external systems submit jobs whose tasks need to submit additional jobs 
 (such as oozie/pig), they include their own MR token used to submit the job.  
 The token's renewer may not allow the JT to renew the token.  The JT log will 
 include very long SASL/GSSAPI exceptions when the job is submitted.  It is 
 also dubious for the JT to renew its token because it renders the expiry as 
 meaningless since the JT will renew its own token until the max lifetime is 
 exceeded.
 After speaking with Owen & Jitendra, the immediate solution is for the JT to 
 not attempt to renew its own tokens.





[jira] [Updated] (MAPREDUCE-3319) multifilewc from hadoop examples seems to be broken in 0.20.205.0

2011-12-15 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-3319:
--

  Resolution: Fixed
   Fix Version/s: (was: 1.1.0)
  1.0.0
Target Version/s: 1.0.0  (was: 1.1.0, 0.24.0)
  Status: Resolved  (was: Patch Available)

Okay, Subroto, thanks.
Release 1.0.0 RC had to be re-spun, so merged this fix as promised.

 multifilewc from hadoop examples seems to be broken in 0.20.205.0
 -

 Key: MAPREDUCE-3319
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3319
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: examples
Affects Versions: 0.20.205.0
Reporter: Roman Shaposhnik
Assignee: Subroto Sanyal
Priority: Blocker
  Labels: bigtop
 Fix For: 1.0.0

 Attachments: MAPREDUCE-3319.patch


 {noformat}
 /usr/lib/hadoop/bin/hadoop jar 
 /usr/lib/hadoop/hadoop-examples-0.20.205.0.22.jar multifilewc  examples/text 
 examples-output/multifilewc
 11/10/31 16:50:26 INFO mapred.FileInputFormat: Total input paths to process : 
 2
 11/10/31 16:50:26 INFO mapred.JobClient: Running job: job_201110311350_0220
 11/10/31 16:50:27 INFO mapred.JobClient:  map 0% reduce 0%
 11/10/31 16:50:42 INFO mapred.JobClient: Task Id : 
 attempt_201110311350_0220_m_00_0, Status : FAILED
 java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast 
 to org.apache.hadoop.io.LongWritable
   at 
 org.apache.hadoop.mapred.lib.LongSumReducer.reduce(LongSumReducer.java:44)
   at 
 org.apache.hadoop.mapred.Task$OldCombinerRunner.combine(Task.java:1431)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1436)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1298)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:437)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 {noformat}





[jira] [Updated] (MAPREDUCE-3319) multifilewc from hadoop examples seems to be broken in 0.20.205.0

2011-12-09 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-3319:
--

Fix Version/s: (was: 1.0.0)
   1.1.0

 multifilewc from hadoop examples seems to be broken in 0.20.205.0
 -

 Key: MAPREDUCE-3319
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3319
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: examples
Affects Versions: 0.20.205.0
Reporter: Roman Shaposhnik
Assignee: Subroto Sanyal
Priority: Blocker
  Labels: bigtop
 Fix For: 1.1.0

 Attachments: MAPREDUCE-3319.patch


 {noformat}
 /usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop/hadoop-examples-0.20.205.0.22.jar multifilewc  examples/text examples-output/multifilewc
 11/10/31 16:50:26 INFO mapred.FileInputFormat: Total input paths to process : 2
 11/10/31 16:50:26 INFO mapred.JobClient: Running job: job_201110311350_0220
 11/10/31 16:50:27 INFO mapred.JobClient:  map 0% reduce 0%
 11/10/31 16:50:42 INFO mapred.JobClient: Task Id : attempt_201110311350_0220_m_00_0, Status : FAILED
 java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.hadoop.io.LongWritable
   at org.apache.hadoop.mapred.lib.LongSumReducer.reduce(LongSumReducer.java:44)
   at org.apache.hadoop.mapred.Task$OldCombinerRunner.combine(Task.java:1431)
   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1436)
   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1298)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:437)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 {noformat}





[jira] [Updated] (MAPREDUCE-3319) multifilewc from hadoop examples seems to be broken in 0.20.205.0

2011-12-09 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-3319:
--

Target Version/s: 0.24.0, 1.1.0  (was: 1.1.0)

 multifilewc from hadoop examples seems to be broken in 0.20.205.0
 -

 Key: MAPREDUCE-3319
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3319
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: examples
Affects Versions: 0.20.205.0
Reporter: Roman Shaposhnik
Assignee: Subroto Sanyal
Priority: Blocker
  Labels: bigtop
 Fix For: 1.1.0

 Attachments: MAPREDUCE-3319.patch


 {noformat}
 /usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop/hadoop-examples-0.20.205.0.22.jar multifilewc  examples/text examples-output/multifilewc
 11/10/31 16:50:26 INFO mapred.FileInputFormat: Total input paths to process : 2
 11/10/31 16:50:26 INFO mapred.JobClient: Running job: job_201110311350_0220
 11/10/31 16:50:27 INFO mapred.JobClient:  map 0% reduce 0%
 11/10/31 16:50:42 INFO mapred.JobClient: Task Id : attempt_201110311350_0220_m_00_0, Status : FAILED
 java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.hadoop.io.LongWritable
   at org.apache.hadoop.mapred.lib.LongSumReducer.reduce(LongSumReducer.java:44)
   at org.apache.hadoop.mapred.Task$OldCombinerRunner.combine(Task.java:1431)
   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1436)
   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1298)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:437)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 {noformat}





[jira] [Updated] (MAPREDUCE-3180) TaskTracker.java.orig accidentally checked in to 0.20-security-205

2011-11-28 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-3180:
--

Target Version/s:   (was: 1.0.0)

 TaskTracker.java.orig accidentally checked in to 0.20-security-205
 --

 Key: MAPREDUCE-3180
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3180
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.205.0
Reporter: Matt Foley
Assignee: Matt Foley
Priority: Trivial

 The file src/mapred/org/apache/hadoop/mapred/TaskTracker.java.orig was 
 accidentally checked in as part of r1179465.  It is only in 
 0.20-security-205, not 0.20-security.  If there is a 0.20.205.1, remove it 
 then.





[jira] [Updated] (MAPREDUCE-3343) TaskTracker Out of Memory because of distributed cache

2011-11-28 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-3343:
--

Target Version/s: 1.1.0  (was: 1.0.0)

 TaskTracker Out of Memory because of distributed cache
 --

 Key: MAPREDUCE-3343
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3343
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Affects Versions: 0.20.205.0
Reporter: Ahmed Radwan
Assignee: zhaoyunjiong
  Labels: mapreduce, patch
 Fix For: 1.1.0

 Attachments: MAPREDUCE-3343_rev2.patch, 
 mapreduce-3343-release-0.20.205.0.patch


 This Out of Memory error happens when you run a large number of jobs (using 
 the distributed cache) on a TaskTracker. 
 The basic issue seems to be with the distributedCacheManager (an instance of 
 TrackerDistributedCacheManager in TaskTracker.java). It is created during 
 TaskTracker.initialize(), and it keeps a reference to a 
 TaskDistributedCacheManager for every submitted job via the jobArchives map, 
 as well as references to CacheStatus via the cachedArchives map. I am not 
 seeing these cleaned up between jobs, so this can cause out-of-memory 
 problems after a really large number of jobs are submitted. We have seen 
 this issue in a number of cases.
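
The leak pattern described above, a long-lived map that gains one entry per job and never loses any, can be sketched as follows. This is an illustration only: the class is hypothetical, plain Object values stand in for TaskDistributedCacheManager instances, and the cleanup hook shown is an assumption about one way to fix it, not the committed patch.

```java
import java.util.HashMap;
import java.util.Map;

final class JobCacheRegistrySketch {

    // Hypothetical stand-in for the per-job map held by the long-lived manager.
    private final Map<String, Object> jobArchives = new HashMap<>();

    void jobSubmitted(String jobId, Object perJobCacheManager) {
        jobArchives.put(jobId, perJobCacheManager);
    }

    // Without a call like this, entries accumulate for the lifetime of the daemon.
    void jobCompleted(String jobId) {
        jobArchives.remove(jobId);
    }

    int trackedJobs() {
        return jobArchives.size();
    }

    public static void main(String[] args) {
        JobCacheRegistrySketch registry = new JobCacheRegistrySketch();
        for (int i = 0; i < 1000; i++) {
            String jobId = "job_" + i;
            registry.jobSubmitted(jobId, new Object());
            registry.jobCompleted(jobId);
        }
        System.out.println("tracked jobs after cleanup: " + registry.trackedJobs());
    }
}
```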





[jira] [Updated] (MAPREDUCE-3319) multifilewc from hadoop examples seems to be broken in 0.20.205.0

2011-11-28 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-3319:
--

Target Version/s: 1.1.0
   Fix Version/s: (was: 1.0.0)
  (was: 1.1.0)

Moved requested target versions from Fix Version to Target Version field.
Absent a viable patch in time for 1.0.0, this will have to go in 1.1.0.

 multifilewc from hadoop examples seems to be broken in 0.20.205.0
 -

 Key: MAPREDUCE-3319
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3319
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: examples
Affects Versions: 0.20.205.0
Reporter: Roman Shaposhnik
Priority: Blocker
  Labels: bigtop

 {noformat}
 /usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop/hadoop-examples-0.20.205.0.22.jar multifilewc  examples/text examples-output/multifilewc
 11/10/31 16:50:26 INFO mapred.FileInputFormat: Total input paths to process : 2
 11/10/31 16:50:26 INFO mapred.JobClient: Running job: job_201110311350_0220
 11/10/31 16:50:27 INFO mapred.JobClient:  map 0% reduce 0%
 11/10/31 16:50:42 INFO mapred.JobClient: Task Id : attempt_201110311350_0220_m_00_0, Status : FAILED
 java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.hadoop.io.LongWritable
   at org.apache.hadoop.mapred.lib.LongSumReducer.reduce(LongSumReducer.java:44)
   at org.apache.hadoop.mapred.Task$OldCombinerRunner.combine(Task.java:1431)
   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1436)
   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1298)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:437)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 {noformat}





[jira] [Updated] (MAPREDUCE-3343) TaskTracker Out of Memory because of distributed cache

2011-11-28 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-3343:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Resolving since patch committed to stated version.

 TaskTracker Out of Memory because of distributed cache
 --

 Key: MAPREDUCE-3343
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3343
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1
Affects Versions: 0.20.205.0
Reporter: Ahmed Radwan
Assignee: zhaoyunjiong
  Labels: mapreduce, patch
 Fix For: 1.1.0

 Attachments: MAPREDUCE-3343_rev2.patch, 
 mapreduce-3343-release-0.20.205.0.patch


 This Out of Memory error happens when you run a large number of jobs (using 
 the distributed cache) on a TaskTracker. 
 The basic issue seems to be with the distributedCacheManager (an instance of 
 TrackerDistributedCacheManager in TaskTracker.java). It is created during 
 TaskTracker.initialize(), and it keeps a reference to a 
 TaskDistributedCacheManager for every submitted job via the jobArchives map, 
 as well as references to CacheStatus via the cachedArchives map. I am not 
 seeing these cleaned up between jobs, so this can cause out-of-memory 
 problems after a really large number of jobs are submitted. We have seen 
 this issue in a number of cases.





[jira] [Updated] (MAPREDUCE-3169) Create a new MiniMRCluster equivalent which only provides client APIs cross MR1 and MR2

2011-11-27 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-3169:
--

Target Version/s: 0.23.1, 1.0.0  (was: 0.23.0, 1.1.0)
   Fix Version/s: (was: 0.24.0)

 Create a new MiniMRCluster equivalent which only provides client APIs cross 
 MR1 and MR2
 ---

 Key: MAPREDUCE-3169
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3169
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv1, mrv2, test
Affects Versions: 0.23.0
Reporter: Todd Lipcon
Assignee: Ahmed Radwan
 Fix For: 0.23.1, 1.0.0

 Attachments: MAPREDUCE-3169-0.20-security.patch, 
 MAPREDUCE-3169-0.20-security_rev2.patch, MAPREDUCE-3169-truck.patch, 
 MAPREDUCE-3169-trunk_deprecation_amendment.patch, 
 MAPREDUCE-3169-trunk_rev2.patch, MAPREDUCE-3169-trunk_rev3.patch


 Many dependent projects like HBase, Hive, Pig, etc, depend on MiniMRCluster 
 for writing tests. Many users do as well. MiniMRCluster, however, exposes MR 
 implementation details like the existence of TaskTrackers, JobTrackers, etc, 
 since it was used by MR1 for testing the server implementations as well.
 This JIRA is to create a new interface which could be implemented either by 
 MR1 or MR2 that exposes only the client-side portions of the MR framework. 
 Ideally it would be recompile-compatible with MiniMRCluster for most 
 applications, and the MR1 implementation could be backported to 20x branch. 
 Thus, dependent projects like HBase could migrate to this implementation and 
 test against both MR1 and MR2. We can also use this to port over the current 
 functional tests that use only the client-side features of MiniMRCluster.
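
A rough sketch of what such a client-only interface could look like, assuming nothing about the names or methods eventually committed. Every type name and the "sketch.framework" key below are hypothetical, and the two implementations are empty stand-ins for MR1-backed and MR2-backed clusters.

```java
import java.util.Properties;

// Hypothetical client-facing contract: no TaskTracker or JobTracker types leak through it.
interface ClientOnlyMiniCluster {
    void start();
    void stop();
    Properties getClientConfig();
}

// Stand-in for an MR1-backed implementation.
final class Mr1MiniClusterSketch implements ClientOnlyMiniCluster {
    public void start() { /* would start the MR1 daemons */ }
    public void stop() { /* would stop them */ }
    public Properties getClientConfig() {
        Properties conf = new Properties();
        conf.setProperty("sketch.framework", "mr1"); // hypothetical key
        return conf;
    }
}

// Stand-in for an MR2-backed implementation.
final class Mr2MiniClusterSketch implements ClientOnlyMiniCluster {
    public void start() { /* would start the MR2 services */ }
    public void stop() { /* would stop them */ }
    public Properties getClientConfig() {
        Properties conf = new Properties();
        conf.setProperty("sketch.framework", "mr2"); // hypothetical key
        return conf;
    }
}

final class ClientSideTestSketch {

    // Code written against the interface runs unchanged on either implementation.
    static String frameworkSeenBy(ClientOnlyMiniCluster cluster) {
        cluster.start();
        try {
            return cluster.getClientConfig().getProperty("sketch.framework");
        } finally {
            cluster.stop();
        }
    }

    public static void main(String[] args) {
        System.out.println(frameworkSeenBy(new Mr1MiniClusterSketch()));
        System.out.println(frameworkSeenBy(new Mr2MiniClusterSketch()));
    }
}
```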





[jira] [Updated] (MAPREDUCE-3464) mapreduce jsp pages missing DOCTYPE [post-split branches]

2011-11-27 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-3464:
--

Target Version/s: 0.23.1
   Fix Version/s: (was: 1.0.0)
 Summary: mapreduce jsp pages missing DOCTYPE [post-split branches] 
 (was: mapreduce jsp pages missing DOCTYPE)

Corrected this bug to only relate to post-split branches/trunk.
The parent bug will be used only for pre-split 1.0.

 mapreduce jsp pages missing DOCTYPE [post-split branches]
 -

 Key: MAPREDUCE-3464
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3464
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Dave Vronay
Assignee: Dave Vronay
Priority: Trivial
 Fix For: 0.23.1


 Some jsp pages in the UI are missing a DOCTYPE declaration. This causes the 
 pages to render incorrectly on some browsers, such as IE9. Please see parent 
 bug HADOOP-7827 for details and patch.





[jira] [Updated] (MAPREDUCE-3118) Backport Gridmix and Rumen features from trunk to Hadoop 0.20 security branch

2011-11-21 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-3118:
--

Target Version/s: 0.20.206.0  (was: 0.20.205.1)

 Backport Gridmix and Rumen features from trunk to Hadoop 0.20 security branch
 -

 Key: MAPREDUCE-3118
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3118
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: contrib/gridmix, tools/rumen
Affects Versions: 0.20.206.0
Reporter: Ravi Gummadi
Assignee: Ravi Gummadi
 Fix For: 0.20.206.0

 Attachments: gridmix_rumen_backports.v2.4.patch, 
 gridmix_rumen_backports.v2.5.patch, gridmix_rumen_backports.v2.6.patch


 Backport all the features and bugfixes that went into gridmix and rumen on 
 trunk to the hadoop 0.20 security branch. This will make all these gridmix 
 features available there and allow gridmix/rumen to run on the history logs 
 of the 0.20 security branch.





[jira] [Updated] (MAPREDUCE-3374) src/c++/task-controller/configure is not set executable in the tarball and that prevents task-controller from rebuilding

2011-11-09 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-3374:
--

   Resolution: Fixed
Fix Version/s: 0.20.205.1
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed to branch-0.20-security and branch-0.20-security-205.

 src/c++/task-controller/configure is not set executable in the tarball and 
 that prevents task-controller from rebuilding
 

 Key: MAPREDUCE-3374
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3374
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task-controller
Affects Versions: 0.20.205.0
Reporter: Roman Shaposhnik
 Fix For: 0.20.205.1

 Attachments: MAPREDUCE-3374.patch.txt, log.gz


 ant task-controller fails because src/c++/task-controller/configure is not 
 set executable





[jira] [Updated] (MAPREDUCE-1933) Improve automated testcase for tasktracker dealing with corrupted disk.

2011-10-19 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-1933:
--

 Description: 
Test TestCorruptedDiskJob was added in 0.20.203.0, to implement this test:
* After the TaskTracker has already run some tasks successfully, corrupt a 
disk by making the corresponding mapred.local.dir unreadable/unwritable. Make 
sure that jobs continue to succeed even though some tasks scheduled there fail. 

Cos provided suggestions for improvement in his comments of 23/Jul/10 18:40 and 
30/Jul/10 01:51, but the patch was committed without these improvements or any 
response to them.  This bug remains open until these improvements are 
implemented or responded to.

  was:
After the TaskTracker has already run some tasks successfully, corrupt a disk 
by making the corresponding mapred.local.dir unreadable/unwritable. 
Make sure that jobs continue to succeed even though some tasks scheduled there 
fail. 


Target Version/s: 0.20.206.0
   Fix Version/s: (was: 0.20.205.0)
 Summary: Improve automated testcase for tasktracker dealing with 
corrupted disk.  (was: Create automated testcase for tasktracker dealing with 
corrupted disk.)

Changed title and description to reflect the remaining work to be done per 
Eli's comment.

 Improve automated testcase for tasktracker dealing with corrupted disk.
 ---

 Key: MAPREDUCE-1933
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1933
 Project: Hadoop Map/Reduce
  Issue Type: Test
  Components: test
Affects Versions: 0.20.203.0
Reporter: Iyappan Srinivasan
Assignee: Iyappan Srinivasan
 Attachments: 1933-ydist-security-patch.txt, 
 1933-ydist-security-patch.txt, MAPREDUCE-1933.patch, MAPREDUCE-1933.patch, 
 MAPREDUCE-1933.patch, TestCorruptedDiskJob.java


 Test TestCorruptedDiskJob was added in 0.20.203.0, to implement this test:
 * After the TaskTracker has already run some tasks successfully, corrupt a 
 disk by making the corresponding mapred.local.dir unreadable/unwritable. Make 
 sure that jobs continue to succeed even though some tasks scheduled there 
 fail. 
 Cos provided suggestions for improvement in his comments of 23/Jul/10 18:40 
 and 30/Jul/10 01:51, but the patch was committed without these improvements 
 or any response to them.  This bug remains open until these improvements are 
 implemented or responded to.





[jira] [Updated] (MAPREDUCE-2340) optimize JobInProgress.initTasks()

2011-10-19 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-2340:
--

Target Version/s: 0.20.206.0, 0.22.0
   Fix Version/s: (was: 0.20.205.0)

 optimize JobInProgress.initTasks()
 --

 Key: MAPREDUCE-2340
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2340
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobtracker
Affects Versions: 0.20.1, 0.21.0
Reporter: Kang Xiao
 Attachments: MAPREDUCE-2340.patch, MAPREDUCE-2340.patch, 
 MAPREDUCE-2340.r1.diff


 JobTracker's hostnameToNodeMap cache can speed up JobInProgress.initTasks() 
 and JobInProgress.createCache() significantly. A test for 1 job with 10 
 maps on a 2400 cluster shows speedups of nearly 10x for initTasks() and 50x 
 for createCache(). 
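
The idea of reusing a hostname-to-node cache instead of resolving each host repeatedly can be sketched as below. This is not the attached patch: the class name, the String node representation, and the pluggable resolver function are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class HostToNodeCacheSketch {

    // Hypothetical cache: hostname -> resolved topology node (a String here).
    private final Map<String, String> hostnameToNode = new HashMap<>();
    private final Function<String, String> resolver;
    private int resolverCalls;

    HostToNodeCacheSketch(Function<String, String> resolver) {
        this.resolver = resolver;
    }

    // Resolves a host at most once; later lookups are plain map hits.
    String nodeFor(String hostname) {
        return hostnameToNode.computeIfAbsent(hostname, h -> {
            resolverCalls++;
            return resolver.apply(h);
        });
    }

    int resolverCalls() {
        return resolverCalls;
    }

    public static void main(String[] args) {
        HostToNodeCacheSketch cache = new HostToNodeCacheSketch(h -> "/default-rack/" + h);
        for (int split = 0; split < 5; split++) {
            cache.nodeFor("host-a"); // many splits typically name the same few hosts
        }
        System.out.println("resolver calls: " + cache.resolverCalls());
    }
}
```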





[jira] [Updated] (MAPREDUCE-2264) Job status exceeds 100% in some cases

2011-10-19 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-2264:
--

Target Version/s: 0.20.206.0, 0.22.0
   Fix Version/s: (was: 0.20.205.0)
  (was: 0.22.0)

 Job status exceeds 100% in some cases 
 --

 Key: MAPREDUCE-2264
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2264
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 0.20.2, 0.20.205.0
Reporter: Adam Kramer
Assignee: Devaraj K
 Attachments: MAPREDUCE-2264-0.20.205-1.patch, 
 MAPREDUCE-2264-0.20.205.patch, MAPREDUCE-2264-0.20.3.patch, 
 MAPREDUCE-2264-trunk.patch, more than 100%.bmp


 I'm looking now at my jobtracker's list of running reduce tasks. One of them 
 is 120.05% complete, the other is 107.28% complete.
 I understand that these numbers are estimates, but there is no case in which 
 an estimate of 100% for a non-complete task is better than an estimate of 
 99.99%, nor is there any case in which an estimate greater than 100% is valid.
 I suggest that whatever logic is computing these set 99.99% as a hard maximum.
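
The suggested hard maximum could look roughly like the following. It is a sketch of the reporter's proposal only: the class, the method, and the choice to express progress as a fraction between 0 and 1 are assumptions, and it does not address why the underlying estimate overshoots.

```java
final class ProgressClampSketch {

    // 99.99% expressed as a fraction; the cap applies only to tasks that are not complete.
    static final float MAX_INCOMPLETE = 0.9999f;

    static float reportedProgress(float estimate, boolean complete) {
        if (complete) {
            return 1.0f;
        }
        if (Float.isNaN(estimate) || estimate < 0f) {
            return 0f;
        }
        return Math.min(estimate, MAX_INCOMPLETE);
    }

    public static void main(String[] args) {
        // An overshooting estimate such as 120.05% is capped for a running task.
        System.out.println(reportedProgress(1.2005f, false));
    }
}
```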





[jira] [Updated] (MAPREDUCE-2508) vaidya script uses the wrong path for vaidya jar due to jar renaming

2011-10-19 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-2508:
--

Target Version/s: 0.20.206.0
   Fix Version/s: (was: 0.20.205.0)

 vaidya script uses the wrong path for vaidya jar due to jar renaming
 

 Key: MAPREDUCE-2508
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2508
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/vaidya
Reporter: Allen Wittenauer
Priority: Trivial

 This clearly wasn't tested in 203.





[jira] [Updated] (MAPREDUCE-2780) Standardize the value of token service

2011-10-19 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-2780:
--

Target Version/s: 0.20.205.0, 0.23.0

 Standardize the value of token service
 --

 Key: MAPREDUCE-2780
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2780
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Daryn Sharp
Assignee: Daryn Sharp
 Fix For: 0.20.205.0

 Attachments: MAPREDUCE-2780-2.patch, MAPREDUCE-2780-3.patch, 
 MAPREDUCE-2780-4.patch, MAPREDUCE-2780.patch


 The token's service field must (currently) be set to ip:port.  All the 
 producers of a token are independently building the service string.  This 
 should be done via a common method to reduce the chance of error, and to 
 facilitate the field value being easily changed in the (near) future.
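
A single shared helper for building the ip:port service string might look like this. The class and method names are hypothetical and this is not the patch's API; it only shows the point of having one place that defines the format.

```java
import java.net.InetSocketAddress;

final class TokenServiceSketch {

    // One place that decides the format, so every token producer agrees on it.
    static String buildService(InetSocketAddress addr) {
        if (addr.isUnresolved()) {
            throw new IllegalArgumentException("unresolved address: " + addr);
        }
        return addr.getAddress().getHostAddress() + ":" + addr.getPort();
    }

    public static void main(String[] args) {
        // A literal IP address avoids any DNS lookup in this example.
        System.out.println(buildService(new InetSocketAddress("10.0.0.1", 8020)));
    }
}
```

If the format changes later, only buildService needs to change, which is the maintenance benefit the description asks for.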





[jira] [Updated] (MAPREDUCE-2835) Make per-job counter limits configurable

2011-10-19 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-2835:
--

Target Version/s: 0.20.206.0
   Fix Version/s: (was: 0.20.205.0)

 Make per-job counter limits configurable
 

 Key: MAPREDUCE-2835
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2835
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.20.204.0
Reporter: Tom White
Assignee: Tom White
 Attachments: MAPREDUCE-2835.patch, MAPREDUCE-2835.patch


 The per-job counter limits introduced in MAPREDUCE-1943 are fixed, except for 
 the total number allowed per job (mapreduce.job.counters.limit). It would be 
 useful to make them all configurable.
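
As an illustration of making each limit configurable with a fallback default, here is a small sketch using java.util.Properties in place of the Hadoop Configuration class. Only mapreduce.job.counters.limit comes from the description above; the other two keys and all default values are placeholders invented for the example.

```java
import java.util.Properties;

final class CounterLimitsSketch {

    final int maxCountersPerJob;
    final int maxGroups;
    final int maxNameLength;

    CounterLimitsSketch(Properties conf) {
        // Key taken from the issue description; the default value is a placeholder.
        maxCountersPerJob = intOrDefault(conf, "mapreduce.job.counters.limit", 120);
        // Hypothetical keys and placeholder defaults for the currently fixed limits.
        maxGroups = intOrDefault(conf, "sketch.counters.groups.max", 50);
        maxNameLength = intOrDefault(conf, "sketch.counters.name.length.max", 64);
    }

    private static int intOrDefault(Properties conf, String key, int defaultValue) {
        String value = conf.getProperty(key);
        return value == null ? defaultValue : Integer.parseInt(value.trim());
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("sketch.counters.groups.max", "75");
        CounterLimitsSketch limits = new CounterLimitsSketch(conf);
        System.out.println(limits.maxGroups + " groups, " + limits.maxCountersPerJob + " counters");
    }
}
```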





[jira] [Updated] (MAPREDUCE-2817) MiniRMCluster hardcodes 'mapred.local.dir' configuration to 'build/test/mapred/local'

2011-10-19 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-2817:
--

Target Version/s: 0.20.206.0
   Fix Version/s: (was: 0.20.205.0)

 MiniRMCluster hardcodes 'mapred.local.dir' configuration to 
 'build/test/mapred/local'
 -

 Key: MAPREDUCE-2817
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2817
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: 0.20.2
 Environment: all
Reporter: Alejandro Abdelnur
Assignee: Harsh J
Priority: Minor
 Attachments: MAPREDUCE-2817.r1.diff


 The {{mapred.local.dir}} configuration property for the {{MiniMRCluster}} is 
 forced to {{build/test/mapred/local}}.
 This is inconvenient in several situations. For example:
 * When running multiple tests using {{MiniMRCluster}}, it is not possible to 
 see the end state of the dir for a particular test
 * When using {{MiniMRCluster}} in another build system (e.g. Maven) that uses 
 a different output directory (target instead of build)
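
One possible shape for removing the hardcoding is to keep build/test/mapred/local as the default and let callers override it. In the sketch below the override key "sketch.minimr.local.dir" is hypothetical, java.util.Properties stands in for the real configuration object, and this is not the attached diff.

```java
import java.util.Properties;

final class MiniClusterLocalDirSketch {

    // The value that is currently forced, kept here as the fallback.
    static final String DEFAULT_LOCAL_DIR = "build/test/mapred/local";
    // Hypothetical override key, invented for this example.
    static final String OVERRIDE_PROPERTY = "sketch.minimr.local.dir";

    static String resolveLocalDir(Properties props) {
        String value = props.getProperty(OVERRIDE_PROPERTY);
        if (value == null || value.trim().isEmpty()) {
            return DEFAULT_LOCAL_DIR;
        }
        return value.trim();
    }

    public static void main(String[] args) {
        // e.g. java -Dsketch.minimr.local.dir=target/test/mapred/local ...
        System.out.println(resolveLocalDir(System.getProperties()));
    }
}
```

With an override like this, a Maven build could point the directory under target, and each test could use its own directory so its end state stays inspectable.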





[jira] [Updated] (MAPREDUCE-2903) Map Tasks graph is throwing XML Parse error when Job is executed with 0 maps

2011-10-19 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-2903:
--

Target Version/s: 0.20.206.0
   Fix Version/s: (was: 0.20.205.0)

 Map Tasks graph is throwing XML Parse error when Job is executed with 0 maps
 

 Key: MAPREDUCE-2903
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2903
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 0.20.205.0
Reporter: Devaraj K
Assignee: Devaraj K
 Attachments: MAPREDUCE-2903.patch


 {code:xml}
 XML Parsing Error: no element found
 Location: 
 http://10.18.52.170:50030/taskgraph?type=map&jobid=job_201108291536_0001
 Line Number 1, Column 1:
 ^
 {code}
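
A "no element found" error at line 1, column 1 is what an XML parser reports for an empty document, so one plausible reading is that the graph page returns an empty body when there are no map tasks. Under that assumption, the sketch below always emits a well-formed document, even for zero tasks. The class, the SVG layout, and the sizes are invented for illustration and do not reflect the actual servlet or the attached patch.

```java
final class TaskGraphSketch {

    // Renders one bar per task; with no tasks the result is still a valid, empty graph.
    static String render(float[] taskProgress) {
        StringBuilder out = new StringBuilder();
        out.append("<?xml version=\"1.0\" standalone=\"no\"?>\n");
        out.append("<svg xmlns=\"http://www.w3.org/2000/svg\" width=\"200\" height=\"100\">\n");
        for (int i = 0; i < taskProgress.length; i++) {
            int height = Math.round(100 * taskProgress[i]);
            out.append("  <rect x=\"").append(i * 2)
               .append("\" y=\"").append(100 - height)
               .append("\" width=\"2\" height=\"").append(height)
               .append("\"/>\n");
        }
        out.append("</svg>\n");
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(new float[0])); // a job with 0 maps
    }
}
```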





[jira] [Updated] (MAPREDUCE-2931) CLONE - LocalJobRunner should support parallel mapper execution

2011-10-19 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-2931:
--

Target Version/s: 0.20.206.0
   Fix Version/s: (was: 0.20.205.0)

 CLONE - LocalJobRunner should support parallel mapper execution
 ---

 Key: MAPREDUCE-2931
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2931
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Forest Tan
Assignee: Aaron Kimball

 The LocalJobRunner currently supports only a single execution thread. Given 
 the prevalence of multi-core CPUs, it makes sense to allow users to run 
 multiple tasks in parallel for improved performance on small (local-only) 
 jobs.
 MAPREDUCE-1367 needs to be backported to the Hadoop 0.20.x line. 
 The MAPREDUCE-434 change should be submitted together with it.
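
Running several map tasks in parallel inside one local process can be sketched with a fixed-size thread pool. The class and method names are hypothetical and Callable<Integer> stands in for a map task, so this shows the general technique rather than the MAPREDUCE-1367 implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelLocalRunnerSketch {

    // Runs all tasks on a pool of the given size and returns results in task order.
    static List<Integer> runMapTasks(List<Callable<Integer>> tasks, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = pool.invokeAll(tasks);
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running map tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("map task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Callable<Integer>> tasks = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            final int split = i;
            tasks.add(() -> split * split); // stand-in for processing one input split
        }
        System.out.println(runMapTasks(tasks, 2));
    }
}
```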





[jira] [Updated] (MAPREDUCE-2919) The JT web UI should show job start times

2011-10-19 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-2919:
--

Target Version/s: 0.20.206.0
   Fix Version/s: (was: 0.20.205.0)

 The JT web UI should show job start times 
 --

 Key: MAPREDUCE-2919
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2919
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobtracker
Affects Versions: 0.20.203.0
Reporter: Eli Collins
Assignee: Harsh J
Priority: Minor
  Labels: newbie
 Attachments: MAPREDUCE-2919.r1.diff, Screen shot 2011-09-04 at 
 1.14.00 AM.png


 It would be helpful if the list of jobs in the main JT web UI (running, 
 completed, failed, ...) had a column with the start time. Clicking into each 
 job's detail page to find it gets tedious.





[jira] [Updated] (MAPREDUCE-2957) The TT should not re-init if it has no good local dirs

2011-10-19 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-2957:
--

Target Version/s: 0.20.206.0
   Fix Version/s: (was: 0.20.205.0)

 The TT should not re-init if it has no good local dirs
 --

 Key: MAPREDUCE-2957
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2957
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: tasktracker
Affects Versions: 0.20.204.0
Reporter: Eli Collins
Assignee: Eli Collins
 Attachments: mapreduce-2957.patch


 The TT currently tries to re-init itself on disk failure even if it has no 
 good local dirs. It should shut down instead.





[jira] [Updated] (MAPREDUCE-2959) The TT daemon should shutdown on fatal exceptions

2011-10-19 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-2959:
--

Target Version/s: 0.20.206.0
   Fix Version/s: (was: 0.20.205.0)

 The TT daemon should shutdown on fatal exceptions
 -

 Key: MAPREDUCE-2959
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2959
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tasktracker
Affects Versions: 0.20.204.0
Reporter: Eli Collins
 Attachments: hung.tt.stacks


 Currently the TT daemon does not shut down (the process keeps running but 
 fails to heartbeat) if the TT gets a fatal exception (e.g., due to losing all 
 its local storage directories).





[jira] [Updated] (MAPREDUCE-2960) A single TT disk failure can cause the job to fail

2011-10-19 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-2960:
--

Target Version/s: 0.20.206.0
   Fix Version/s: (was: 0.20.205.0)

 A single TT disk failure can cause the job to fail
 --

 Key: MAPREDUCE-2960
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2960
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: tasktracker
Affects Versions: 0.20.204.0
Reporter: Eli Collins

 TaskInProgress#kill in the JT fails because TaskStatus#setFinishTimes fails, 
 which in turn fails because no start time was set. There is no start time 
 because TaskTracker#run (DefaultTaskController#initializeJob) failed before 
 it was set. The fix is to have TT#launchTask set the start time before it 
 starts the task runner, so that there is a valid start time even if TT#run 
 fails.
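
 The ordering described above can be sketched as follows. This is a minimal 
 model, not TaskTracker code: the StartTimeSketch and Status classes and the 
 launch method are hypothetical stand-ins for the real status and launch 
 logic.

```java
// Hypothetical model of "set the start time before starting the runner".
class StartTimeSketch {
    // Stand-in for a task status record; refuses a finish time without a start time.
    static class Status {
        long startTime = 0;
        long finishTime = 0;

        void setFinishTime(long t) {
            if (startTime <= 0) {
                throw new IllegalStateException("no start time set");
            }
            finishTime = t;
        }
    }

    // Records the start time first, so the status is valid even if the runner throws.
    static Status launch(Runnable taskRunner, long now) {
        Status status = new Status();
        status.startTime = now;
        try {
            taskRunner.run();
        } catch (RuntimeException e) {
            // The launch failed (e.g. a bad disk); the status still carries a start time.
        }
        return status;
    }
}
```

 With this ordering, a later kill can still record a finish time for a task 
 whose runner never got going.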





[jira] [Updated] (MAPREDUCE-3011) TT should remove bad local dirs from conf to prevent constant disk checking

2011-10-19 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-3011:
--

Target Version/s: 0.20.206.0
   Fix Version/s: (was: 0.20.205.0)

 TT should remove bad local dirs from conf to prevent constant disk checking
 ---

 Key: MAPREDUCE-3011
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3011
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: tasktracker
Affects Versions: 0.20.204.0
Reporter: Eli Collins

 Per HADOOP-7551, the TT does not remove bad mapred.local.dirs from the conf, 
 so after a single disk failure *every* call to get a local path for reading 
 or writing results in a disk check of *all* configured local dirs. After 
 detecting that a local dir is bad, we should remove it from the conf so that 
 we don't repeatedly perform this expensive operation.
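
 A minimal sketch of the pruning idea, assuming the health check can be 
 expressed as a predicate over a directory path. The class and method names 
 are hypothetical and this is not how the TT stores its configuration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical holder for the configured local dirs; bad dirs are dropped once
// detected, so later checks only touch the dirs that are still considered good.
class LocalDirPruningSketch {
    private final List<String> dirs;
    private final Predicate<String> isHealthy;

    LocalDirPruningSketch(List<String> configuredDirs, Predicate<String> isHealthy) {
        this.dirs = new ArrayList<>(configuredDirs); // private mutable copy
        this.isHealthy = isHealthy;
    }

    // Checks each remaining dir once, removes the failures, returns the survivors.
    List<String> checkAndPrune() {
        dirs.removeIf(isHealthy.negate());
        return new ArrayList<>(dirs);
    }
}
```

 After the first call removes a failed dir, subsequent calls no longer pay for 
 checking it.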





[jira] [Updated] (MAPREDUCE-3080) dfs calls from streaming fails with ExceptionInInitializerError

2011-10-19 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-3080:
--

Target Version/s: 0.20.206.0
   Fix Version/s: (was: 0.20.205.0)

 dfs calls from streaming fails with ExceptionInInitializerError
 ---

 Key: MAPREDUCE-3080
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3080
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/streaming
Affects Versions: 0.20.205.0
Reporter: Ramya Sunil

 Dfs calls from streaming seem to fail with the following error:
 {noformat}
 Exception in thread "main" java.lang.ExceptionInInitializerError
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:57)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
   at org.apache.hadoop.fs.FsShell.main(FsShell.java:1895)
 Caused by: org.apache.commons.logging.LogConfigurationException: User-specified log class 'org.apache.commons.logging.impl.Log4JLogger' cannot be found or is not useable.
   at org.apache.commons.logging.impl.LogFactoryImpl.discoverLogImplementation(LogFactoryImpl.java:874)
   at org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:604)
   at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:336)
   at org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:310)
   at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:685)
   at org.apache.hadoop.conf.Configuration.<clinit>(Configuration.java:142)
   ... 3 more
 {noformat}
 commons-logging-1.1.1.jar is in the classpath. An easy way to reproduce this 
 on a secure deploy is to run:
 hadoop --config $HADOOP_CONF_DIR jar hadoop-streaming.jar -input UserInput -output Out -mapper "hadoop --config $HADOOP_CONF_DIR dfs -help" -reducer NONE





[jira] [Updated] (MAPREDUCE-2127) mapreduce trunk builds are failing on hudson

2011-10-19 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-2127:
--

Summary: mapreduce trunk builds are failing on hudson  (was: mapreduce 
trunk builds are filing on hudson .. )

 mapreduce trunk builds are failing on hudson
 

 Key: MAPREDUCE-2127
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2127
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: build, pipes
Affects Versions: 0.20.203.1, 0.20.204.0, 0.23.0
Reporter: Giridharan Kesavan
Assignee: Bruno Mahé
 Fix For: 0.22.0, 0.23.0

 Attachments: MAPREDUCE-2127.patch, MAPREDUCE-2127.patch


 https://hudson.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/507/console
  [exec] checking for pthread.h... yes
  [exec] checking for pthread_create in -lpthread... yes
  [exec] checking for HMAC_Init in -lssl... no
  [exec] configure: error: Cannot find libssl.so
  [exec] /grid/0/hudson/hudson-slave/workspace/Hadoop-Mapreduce-trunk-Commit/trunk/src/c++/pipes/configure: line 4250: exit: please: numeric argument required
  [exec] /grid/0/hudson/hudson-slave/workspace/Hadoop-Mapreduce-trunk-Commit/trunk/src/c++/pipes/configure: line 4250: exit: please: numeric argument required
 BUILD FAILED
 /grid/0/hudson/hudson-slave/workspace/Hadoop-Mapreduce-trunk-Commit/trunk/build.xml:1647: exec returned: 255





[jira] [Updated] (MAPREDUCE-486) JobTracker web UI counts COMMIT_PENDING tasks as Running

2011-10-18 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-486:
-

Target Version/s: 0.20.206.0
   Fix Version/s: (was: 0.20.205.0)

 JobTracker web UI counts COMMIT_PENDING tasks as Running
 

 Key: MAPREDUCE-486
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-486
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.20.2
Reporter: Todd Lipcon
Assignee: Harsh J
Priority: Minor
 Attachments: 0.20-MAPREDUCE-486.r1.diff, After Commit_pending 
 column.png, Before Commit_pending column.png, runcount.jar, runcount.tar.gz


 In jobdetails.jsp, tasks in COMMIT_PENDING state are listed as Running. I 
 propose creating another column in this table for COMMIT_PENDING tasks, since 
 users find it confusing that a given job can have more tasks Running than 
 their total cluster capacity.
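
 To make the proposal concrete, here is a small sketch of tallying tasks per 
 state so that COMMIT_PENDING gets its own bucket. The State enum is a 
 hypothetical subset for illustration, and none of this is jobdetails.jsp 
 code.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

class TaskStateCountSketch {
    // Hypothetical subset of task states, for illustration only.
    enum State { RUNNING, COMMIT_PENDING, SUCCEEDED }

    // Tallies each state separately, so COMMIT_PENDING is not folded into RUNNING.
    static Map<State, Integer> countByState(List<State> tasks) {
        Map<State, Integer> counts = new EnumMap<>(State.class);
        for (State s : State.values()) {
            counts.put(s, 0); // every column is present, even when empty
        }
        for (State s : tasks) {
            counts.merge(s, 1, Integer::sum);
        }
        return counts;
    }
}
```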





[jira] [Updated] (MAPREDUCE-1720) 'Killed' jobs and 'Failed' jobs should be displayed separately in JobTracker UI

2011-10-18 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-1720:
--

Target Version/s: 0.20.206.0, 0.22.0
   Fix Version/s: (was: 0.20.205.0)
  (was: 0.22.0)

  'Killed' jobs and 'Failed' jobs should be displayed separately in JobTracker 
 UI
 

 Key: MAPREDUCE-1720
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1720
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 0.20.1
 Environment: all
Reporter: Subramaniam Krishnan
Assignee: Harsh J
 Attachments: mapred.failed.killed.difference.png, 
 mapreduce.unsuccessfuljobs.ui.r1.diff


 The JobTracker UI shows both failed and killed jobs as Failed. HADOOP-3924 
 separated the Killed job status from Failed, so the UI needs to be updated to 
 reflect that.





[jira] [Updated] (MAPREDUCE-2621) TestCapacityScheduler fails with Queue q1 does not exist

2011-10-01 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-2621:
--

Affects Version/s: (was: 0.20.205.0)
Fix Version/s: (was: 0.20.205.0)

 TestCapacityScheduler fails with Queue q1 does not exist
 

 Key: MAPREDUCE-2621
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2621
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.204.0
 Environment: 0.20.1xx-Secondary 
Reporter: Sherry Chen
Assignee: Sherry Chen
Priority: Minor
 Fix For: 0.20.204.0

 Attachments: MAPREDUCE-2621.patch, MAPREDUCE-2621_1.patch


 {quote}
 Error Message
 Queue q1 does not exist
 Stacktrace
 java.io.IOException: Queue q1 does not exist
   at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:354)
   at org.apache.hadoop.mapred.TestCapacityScheduler$FakeJobInProgress.<init>(TestCapacityScheduler.java:172)
   at org.apache.hadoop.mapred.TestCapacityScheduler.submitJob(TestCapacityScheduler.java:794)
   at org.apache.hadoop.mapred.TestCapacityScheduler.submitJob(TestCapacityScheduler.java:818)
   at org.apache.hadoop.mapred.TestCapacityScheduler.submitJobAndInit(TestCapacityScheduler.java:825)
   at org.apache.hadoop.mapred.TestCapacityScheduler.testMultiTaskAssignmentInMultipleQueues(TestCapacityScheduler.java:1109)
 {quote}
 An exception is now thrown when the queue name is invalid.





[jira] [Updated] (MAPREDUCE-2651) Race condition in Linux Task Controller for job log directory creation

2011-10-01 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-2651:
--

Fix Version/s: (was: 0.20.205.0)

 Race condition in Linux Task Controller for job log directory creation
 --

 Key: MAPREDUCE-2651
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2651
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task-controller
Affects Versions: 0.20.204.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.20.204.0

 Attachments: MAPREDUCE-2651-1.patch


 There is a rare race condition in the Linux task controller when concurrent 
 task processes try to create the job log directory at the same time.
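
 The issue text does not describe the fix. One common way to make this kind of 
 creation race-tolerant is to treat "directory already exists" as success. The 
 real task controller is native code, so the Java sketch below only 
 illustrates the pattern; the JobLogDirSketch class and ensureDir method are 
 hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class JobLogDirSketch {
    // Creates the directory (and any missing parents) if needed. An existing
    // directory counts as success, so two callers racing on the same path both
    // return normally instead of one of them failing.
    static Path ensureDir(Path dir) {
        try {
            return Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException("could not create " + dir, e);
        }
    }
}
```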





[jira] [Updated] (MAPREDUCE-2777) Backport MAPREDUCE-220 to Hadoop 20 security branch

2011-09-28 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-2777:
--

Fix Version/s: 0.20.206.0

 Backport MAPREDUCE-220 to Hadoop 20 security branch
 ---

 Key: MAPREDUCE-2777
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2777
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 0.20.205.0
Reporter: Jonathan Eagles
Assignee: Amar Kamat
 Fix For: 0.20.206.0

 Attachments: mapreduce-2777-v1.3.patch








[jira] [Updated] (MAPREDUCE-2777) Backport MAPREDUCE-220 to Hadoop 20 security branch

2011-09-28 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-2777:
--

Target Version/s: 0.20.206.0
   Fix Version/s: (was: 0.20.206.0)

 Backport MAPREDUCE-220 to Hadoop 20 security branch
 ---

 Key: MAPREDUCE-2777
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2777
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 0.20.205.0
Reporter: Jonathan Eagles
Assignee: Amar Kamat
 Attachments: mapreduce-2777-v1.3.patch



