[jira] Created: (MAPREDUCE-1482) Better handling of task diagnostic information stored in the TaskInProgress

2010-02-11 Thread Amar Kamat (JIRA)
Better handling of task diagnostic information stored in the TaskInProgress
---

 Key: MAPREDUCE-1482
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1482
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Reporter: Amar Kamat


Task diagnostic information can at times be very large, eating up the 
JobTracker's memory. There should be some way to avoid storing large error 
strings in the JobTracker.
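
For illustration only, one way to cap what gets stored (the limit and helper 
class below are hypothetical, not from any patch):

{code}
// Hypothetical mitigation sketch: cap the diagnostic text retained per task.
public class DiagnosticInfoTruncator {
  private static final int MAX_DIAGNOSTIC_LENGTH = 4 * 1024; // 4 KB, illustrative

  /** Truncate a diagnostic string before it is stored in the TaskInProgress. */
  public static String truncate(String diag) {
    if (diag == null || diag.length() <= MAX_DIAGNOSTIC_LENGTH) {
      return diag;
    }
    return diag.substring(0, MAX_DIAGNOSTIC_LENGTH) + "... (truncated)";
  }
}
{code}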

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1307) Introduce the concept of Job Permissions

2010-02-11 Thread Vinod K V (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod K V updated MAPREDUCE-1307:
-

Attachment: MAPREDUCE-1307-20100211.txt

Updated patch that fixes the client side to print a nice message in case of 
unauthorized access.

NOTE: CompletedJobStore needs to be fixed w.r.t. authorization. This might 
involve serializing the ACLs to the job-store on DFS and using the same for 
authorizing further requests. I'll do it as part of a follow-up issue.

 Introduce the concept of Job Permissions
 

 Key: MAPREDUCE-1307
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1307
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: security
Reporter: Devaraj Das
Assignee: Vinod K V
 Fix For: 0.22.0

 Attachments: 1307-early-1.patch, MAPREDUCE-1307-20100210.txt, 
 MAPREDUCE-1307-20100211.txt


 It would be good to define the notion of job permissions analogous to file 
 permissions. Then the JobTracker can restrict who can read (e.g. look at 
 the job page) or modify (e.g. kill) jobs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1483) CompletedJobStore should be authorized using job-acls

2010-02-11 Thread Vinod K V (JIRA)
CompletedJobStore should be authorized using job-acls
-

 Key: MAPREDUCE-1483
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1483
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker, security
Reporter: Vinod K V
 Fix For: 0.22.0


MAPREDUCE-1307 adds job-acls. CompletedJobStore serves job-status off DFS after 
jobs are long gone, so the job-acls also need to be serialized there in order 
to authorize job-related requests.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1354) Refactor JobTracker.submitJob to not lock the JobTracker during the HDFS accesses

2010-02-11 Thread Hemanth Yamijala (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832437#action_12832437
 ] 

Hemanth Yamijala commented on MAPREDUCE-1354:
-

One thing we noticed is that the getCounters call in JobInProgress is 
synchronized. The wrapper call to getCounters in the JobTracker acquires a lock 
on the JT and then calls JobInProgress.getCounters. The problem is that if the 
job is being initialized under initTasks, the JobTracker lock can get held up. 
We saw an instance of this on our clusters. To avoid this, one solution could 
be to check whether the job being queried is inited before taking the lock. 
This pattern is already used in getTaskCompletionEvents.
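
A rough sketch of that guard, assuming the shape of the JobTracker wrapper 
(names are illustrative, not the actual code):

{code}
// Illustrative JobTracker wrapper: avoid the JobInProgress lock until the
// job has finished initTasks(), so this call cannot block behind it.
public synchronized Counters getJobCounters(JobID jobid) {
  JobInProgress job = jobs.get(jobid);
  if (job == null || !job.inited()) {
    // Still initializing: return empty counters rather than block on the
    // JobInProgress lock that initTasks() is holding.
    return new Counters();
  }
  return job.getCounters(); // safe: initialization is complete
}
{code}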

 Refactor JobTracker.submitJob to not lock the JobTracker during the HDFS 
 accesses
 -

 Key: MAPREDUCE-1354
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1354
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Reporter: Devaraj Das
Assignee: Arun C Murthy
Priority: Critical
 Attachments: MAPREDUCE-1354_yhadoop20.patch


 It'd be nice to have the JobTracker object not be locked while accessing the 
 HDFS for reading the jobconf file and while writing the jobinfo file in the 
 submitJob method. We should see if we can avoid taking the lock altogether.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1455) Authorization for servlets

2010-02-11 Thread Ravi Gummadi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832450#action_12832450
 ] 

Ravi Gummadi commented on MAPREDUCE-1455:
-

One more thing: /logs, /static, /stack, /conf, /logLevel etc. do not go 
through authorization as part of this JIRA. That needs changes in Common and 
will be addressed in a separate JIRA.

 Authorization for servlets
 --

 Key: MAPREDUCE-1455
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1455
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: jobtracker, security, tasktracker
Reporter: Devaraj Das
Assignee: Ravi Gummadi
 Fix For: 0.22.0


 This jira is about building the authorization for servlets (on top of 
 MAPREDUCE-1307). That is, the JobTracker/TaskTracker runs authorization 
 checks on web requests based on the configured job permissions. For example, 
 if the job permission is 600, then no one except the authenticated user can 
 look at the job details via the browser. The authenticated user in the 
 servlet can be obtained using the HttpServletRequest method.
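
As a rough illustration, a check of this kind might look as follows; only 
HttpServletRequest.getRemoteUser() is standard servlet API here, and 
jobAclAllowsView() is a hypothetical placeholder:

{code}
// Sketch of a servlet-side authorization check; jobAclAllowsView() is
// hypothetical and stands in for the job-permission lookup.
protected void doGet(HttpServletRequest request, HttpServletResponse response)
    throws IOException {
  String user = request.getRemoteUser(); // the authenticated user, if any
  if (user == null || !jobAclAllowsView(user)) {
    response.sendError(HttpServletResponse.SC_UNAUTHORIZED,
        "User " + user + " is not authorized to view this job");
    return;
  }
  // ... render the job page as before ...
}
{code}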

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1398) TaskLauncher remains stuck on tasks waiting for free nodes even if task is killed.

2010-02-11 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-1398:
---

Assignee: Amareshwari Sriramadasu
  Status: Patch Available  (was: Open)

 TaskLauncher remains stuck on tasks waiting for free nodes even if task is 
 killed.
 --

 Key: MAPREDUCE-1398
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1398
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tasktracker
Reporter: Hemanth Yamijala
Assignee: Amareshwari Sriramadasu
 Attachments: patch-1398.txt


 Tasks could be assigned to trackers for slots that are running other tasks in 
 a commit-pending state. This is an optimization done to pipeline task 
 assignment and launch. When the task reaches the tracker, it waits until 
 sufficient slots become free for it. This wait is done in the TaskLauncher 
 thread. Now, while waiting, if the task is killed externally (maybe because 
 the job finishes, etc.), the TaskLauncher is not notified of this. So, it 
 continues to wait for the killed task to get sufficient slots. If slots do 
 not become free for a long time, this results in a considerable delay in 
 waking up the TaskLauncher thread. If the waiting task happens to be a 
 high-RAM task, the wait is also wasteful, because removing it would make way 
 for normal tasks that can run on the slots already available.
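
One conceivable shape of a fix, sketched below: the TaskLauncher re-checks the 
task's state whenever it wakes, and the kill path notifies the waiting 
launcher. The wasKilled() check is illustrative, not the attached patch:

{code}
// Illustrative TaskLauncher wait loop: bail out if the task was killed
// while waiting, instead of sleeping until slots free up.
synchronized (numFreeSlots) {
  while (numFreeSlots.get() < task.getNumSlotsRequired()
         && !tip.wasKilled()) {      // wasKilled() is hypothetical here
    numFreeSlots.wait();             // the kill path must call notifyAll()
  }
}
if (tip.wasKilled()) {
  return; // drop the task; do not launch it or hold its slots
}
{code}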

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1398) TaskLauncher remains stuck on tasks waiting for free nodes even if task is killed.

2010-02-11 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-1398:
---

Attachment: patch-1398.txt

Patch fixing the bug. Added a testcase which fails without the patch and passes 
with the patch.

 TaskLauncher remains stuck on tasks waiting for free nodes even if task is 
 killed.
 --

 Key: MAPREDUCE-1398
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1398
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tasktracker
Reporter: Hemanth Yamijala
 Attachments: patch-1398.txt


 Tasks could be assigned to trackers for slots that are running other tasks in 
 a commit-pending state. This is an optimization done to pipeline task 
 assignment and launch. When the task reaches the tracker, it waits until 
 sufficient slots become free for it. This wait is done in the TaskLauncher 
 thread. Now, while waiting, if the task is killed externally (maybe because 
 the job finishes, etc.), the TaskLauncher is not notified of this. So, it 
 continues to wait for the killed task to get sufficient slots. If slots do 
 not become free for a long time, this results in a considerable delay in 
 waking up the TaskLauncher thread. If the waiting task happens to be a 
 high-RAM task, the wait is also wasteful, because removing it would make way 
 for normal tasks that can run on the slots already available.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1354) Refactor JobTracker.submitJob to not lock the JobTracker during the HDFS accesses

2010-02-11 Thread Amar Kamat (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832468#action_12832468
 ] 

Amar Kamat commented on MAPREDUCE-1354:
---

Job initialization (job.split localization) can also take a considerable 
amount of time. Hence we should avoid any getter calls into JobInProgress 
while initialization is in progress. The following are the other methods that 
first lock the JobTracker and then the JobInProgress, potentially locking up 
the JobTracker during job initialization:
- getMapTaskReports()
- getReduceTaskReports()
- getCleanupTaskReports()
- getSetupTaskReports()
- getTaskCompletionEvents()
- getTaskDiagnostics()
- setJobPriority()

 Refactor JobTracker.submitJob to not lock the JobTracker during the HDFS 
 accesses
 -

 Key: MAPREDUCE-1354
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1354
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Reporter: Devaraj Das
Assignee: Arun C Murthy
Priority: Critical
 Attachments: MAPREDUCE-1354_yhadoop20.patch


 It'd be nice to have the JobTracker object not be locked while accessing the 
 HDFS for reading the jobconf file and while writing the jobinfo file in the 
 submitJob method. We should see if we can avoid taking the lock altogether.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1474) forrest docs for archives is out of date.

2010-02-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832528#action_12832528
 ] 

Hadoop QA commented on MAPREDUCE-1474:
--

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12435397/MAPREDUCE-1474.patch
  against trunk revision 908321.

+1 @author.  The patch does not contain any @author tags.

+0 tests included.  The patch appears to be a documentation patch that 
doesn't require tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/313/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/313/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/313/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/313/console

This message is automatically generated.

 forrest docs for archives is out of date.
 -

 Key: MAPREDUCE-1474
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1474
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: documentation
Reporter: Mahadev konar
Assignee: Mahadev konar
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1474.patch


 The docs for archives are out of date. The new docs that were checked into 
 hadoop common were lost because of the project split.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1305) Running distcp with -delete incurs avoidable penalties

2010-02-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832541#action_12832541
 ] 

Hadoop QA commented on MAPREDUCE-1305:
--

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12435423/M1305-2.patch
  against trunk revision 908321.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/441/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/441/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/441/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/441/console

This message is automatically generated.

 Running distcp with -delete incurs avoidable penalties
 --

 Key: MAPREDUCE-1305
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1305
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distcp
Affects Versions: 0.20.1
Reporter: Peter Romianowski
Assignee: Peter Romianowski
 Attachments: M1305-1.patch, M1305-2.patch, MAPREDUCE-1305.patch


 *First problem*
 In org.apache.hadoop.tools.DistCp#deleteNonexisting we serialize FileStatus 
 objects when the path is all we need.
 The performance problem comes from 
 org.apache.hadoop.fs.RawLocalFileSystem.RawLocalFileStatus#write, which tries 
 to retrieve file permissions by issuing an ls -ld <path>, which is painfully 
 slow.
 Changed that to just serialize Path and not FileStatus.
 *Second problem*
 To delete the files we invoke the hadoop command line tool with the option 
 -rmr <path>, once for each file.
 Changed that to dstfs.delete(path, true), as sketched below.
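
Roughly, the second change amounts to the following (a sketch; the helper 
class is mine, while dstfs and path come from the surrounding DistCp code):

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: instead of forking "hadoop -rmr <path>" once per file (a JVM per
// deletion), delete directly through the destination FileSystem handle.
class DeleteHelper {
  static void deleteDest(FileSystem dstfs, Path path) throws IOException {
    dstfs.delete(path, true); // true = recursive, same semantics as -rmr
  }
}
{code}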

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1305) Running distcp with -delete incurs avoidable penalties

2010-02-11 Thread Peter Romianowski (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832575#action_12832575
 ] 

Peter Romianowski commented on MAPREDUCE-1305:
--

Thanks Chris for removing the calls to FsShell. I've been very busy lately, so 
I did not manage to compile the patch.

 Running distcp with -delete incurs avoidable penalties
 --

 Key: MAPREDUCE-1305
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1305
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distcp
Affects Versions: 0.20.1
Reporter: Peter Romianowski
Assignee: Peter Romianowski
 Attachments: M1305-1.patch, M1305-2.patch, MAPREDUCE-1305.patch


 *First problem*
 In org.apache.hadoop.tools.DistCp#deleteNonexisting we serialize FileStatus 
 objects when the path is all we need.
 The performance problem comes from 
 org.apache.hadoop.fs.RawLocalFileSystem.RawLocalFileStatus#write, which tries 
 to retrieve file permissions by issuing an ls -ld <path>, which is painfully 
 slow.
 Changed that to just serialize Path and not FileStatus.
 *Second problem*
 To delete the files we invoke the hadoop command line tool with the option 
 -rmr <path>, once for each file.
 Changed that to dstfs.delete(path, true)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1251) c++ utils doesn't compile

2010-02-11 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832581#action_12832581
 ] 

Todd Lipcon commented on MAPREDUCE-1251:


This should be committed to branch-0.20 as well, since it causes a failure to 
build from the release source on many systems.

 c++ utils doesn't compile
 -

 Key: MAPREDUCE-1251
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1251
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.1, 0.20.2, 0.21.0, 0.22.0
 Environment: ubuntu karmic 64-bit
Reporter: Eli Collins
Assignee: Eli Collins
 Attachments: HDFS-790-1.patch, HDFS-790.patch, MR-1251.patch


 c++ utils doesn't compile on ubuntu karmic 64-bit. The latest patch for 
 HADOOP-5611 needs to be applied first.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Reopened: (MAPREDUCE-1251) c++ utils doesn't compile

2010-02-11 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon reopened MAPREDUCE-1251:



 c++ utils doesn't compile
 -

 Key: MAPREDUCE-1251
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1251
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.1, 0.20.2, 0.21.0, 0.22.0
 Environment: ubuntu karmic 64-bit
Reporter: Eli Collins
Assignee: Eli Collins
 Attachments: HDFS-790-1.patch, HDFS-790.patch, MR-1251.patch


 c++ utils doesn't compile on ubuntu karmic 64-bit. The latest patch for 
 HADOOP-5611 needs to be applied first.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1305) Running distcp with -delete incurs avoidable penalties

2010-02-11 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated MAPREDUCE-1305:
--

Hadoop Flags: [Reviewed]

+1 patch looks good.

 Running distcp with -delete incurs avoidable penalties
 --

 Key: MAPREDUCE-1305
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1305
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distcp
Affects Versions: 0.20.1
Reporter: Peter Romianowski
Assignee: Peter Romianowski
 Attachments: M1305-1.patch, M1305-2.patch, MAPREDUCE-1305.patch


 *First problem*
 In org.apache.hadoop.tools.DistCp#deleteNonexisting we serialize FileStatus 
 objects when the path is all we need.
 The performance problem comes from 
 org.apache.hadoop.fs.RawLocalFileSystem.RawLocalFileStatus#write, which tries 
 to retrieve file permissions by issuing an ls -ld <path>, which is painfully 
 slow.
 Changed that to just serialize Path and not FileStatus.
 *Second problem*
 To delete the files we invoke the hadoop command line tool with the option 
 -rmr <path>, once for each file.
 Changed that to dstfs.delete(path, true)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1484) Framework should not sort the input splits

2010-02-11 Thread Owen O'Malley (JIRA)
Framework should not sort the input splits
--

 Key: MAPREDUCE-1484
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1484
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Owen O'Malley


Currently the framework sorts the input splits by size before the job is 
submitted. This makes it very difficult to run map-only jobs that transform 
the input, because the assignment of input names to output names isn't 
obvious. We fixed this once in HADOOP-1440, but the fix was broken so it was 
rolled back.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1434) Dynamic add input for one job

2010-02-11 Thread Aaron Kimball (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832608#action_12832608
 ] 

Aaron Kimball commented on MAPREDUCE-1434:
--

Owen,

The {{getNewInputSplits}} method proposed above requires the InputFormat to 
maintain state containing the previously-enumerated InputSplits. The proposed 
command-line tools suggest independent user-side processes performing the 
addition of files to the job, making this challenging. Given that splits are 
calculated on the client, but the true list of input splits is held by the 
JobTracker (or is/could the splits file be written to HDFS?), calculating just 
the delta might be challenging.

I think it might be more reasonable if one of the following things were true:
* The client code just calls {{getInputSplits()}} again. The same algorithm is 
run as in initial job submission, but the output list may be longer than the 
previous list returned by this method. The InputFormat is responsible for 
ensuring that it doesn't return any fewer splits than it did before (i.e., 
don't drop inputs)
* For that matter, if the input queue for a job is dynamic, I don't see why 
this same mechanism couldn't be used to drop splits that are, for whatever 
reason, irrelevant.
* {{getNewInputSplits()}} should have the signature: {{InputSplit[] 
getNewInputSplits(JobContext job, List<InputSplit> existingSplits) throws 
IOException, InterruptedException}}.

The latter case would present to the user a list of the existing inputs read 
from the existing 'splits' file for the job. That way state-tracking is 
unnecessary; you can just use (e.g.) a PathFilter to disregard things already 
in {{existingSplits}}.

A final proposition is that users must manually specify new paths (or other 
arbitrary arguments like database table names, URLs, etc.) to include, in 
addition to the InputFormat. In that case, it might look more sane to have:
* {{getNewInputSplits()}} should have the signature: {{InputSplit [] 
getNewInputSplits(JobContext job, String... newSplitHints) throws IOException, 
InterruptedException}}.

The {{newSplitHints}} is effectively a user-specified argv; it can be decoded 
as a list of Paths, database tables, etc., and used appropriately by the 
InputFormat to generate new splits.
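
Put together, the proposed signatures might look like the interface below; 
this is purely a sketch of the suggestion, nothing here exists in Hadoop:

{code}
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;

// Hypothetical interface capturing the two alternative proposals above.
public interface DynamicInputFormat {
  // Variant 1: the framework passes back the splits already recorded in the
  // job's 'splits' file, so the InputFormat keeps no state of its own.
  InputSplit[] getNewInputSplits(JobContext job, List<InputSplit> existingSplits)
      throws IOException, InterruptedException;

  // Variant 2: the user supplies hints (paths, table names, URLs) via the
  // proposed "hadoop job -add-input" command line.
  InputSplit[] getNewInputSplits(JobContext job, String... newSplitHints)
      throws IOException, InterruptedException;
}
{code}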

Another question: what are the semantics of a doubly-specified split? 
(Especially curious about the inexact-match case, where the same file in HDFS 
is enumerated twice but the splits are at different offsets.) Can/should the 
same file be processed twice in a job?

Finally: why does a user-disconnect timeout kill the job? That's different 
from the usual case in MapReduce, where a user disconnect is not noticed by 
the server-side processes at all. I would think that a user-disconnect timeout 
should declare that all the input has been added and that the reduce phase can 
begin, not that it should kill things.

 Dynamic add input for one job
 -

 Key: MAPREDUCE-1434
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1434
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
 Environment: 0.19.0
Reporter: Xing Shi

 Today we must first upload the data to HDFS before we can analyze it with 
 Hadoop MapReduce.
 Sometimes the upload takes a long time, so if we could add input while a job 
 is running, that time could be saved.
 WHAT?
 Client:
 a) hadoop job -add-input jobId inputFormat ...
 Add the input to the given job.
 b) hadoop job -add-input done
 Tell the JobTracker that the input has been fully prepared.
 c) hadoop job -add-input status jobid
 Show how many inputs the job has.
 HOWTO?
 Mainly, I think we should do three things:
 1. JobClient: JobClient should support adding input to a job; it generates 
 the splits and submits them to the JobTracker.
 2. JobTracker: the JobTracker should support addInput and add the new tasks 
 to the original map tasks. Because the uploaded data will be processed 
 quickly, the scheduler should also support holding a map task pending until 
 the client declares the job's input done.
 3. Reducer: the reducer should also update the number of maps, so the 
 shuffle works correctly.
 This is the rough idea, and I will update it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1436) Deadlock in preemption code in fair scheduler

2010-02-11 Thread Matei Zaharia (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832623#action_12832623
 ] 

Matei Zaharia commented on MAPREDUCE-1436:
--

Are you suggesting that I add a JobTracker lock in update() or in the 
JobListener methods? I think it's best to add it in update() because it also 
gets called from a separate thread. This actually happens quite rarely now (it 
used to be every few seconds, but it's every 15 seconds after MAPREDUCE-706, 
and can be set higher pretty safely).
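
Concretely, something like this in the update thread (a sketch only; it 
mirrors the lock order of the JobTracker-to-scheduler call path, and the field 
names follow the FairScheduler source as I recall them):

{code}
// Sketch: take the JobTracker lock before the FairScheduler lock in the
// update thread, matching the order used when the JobTracker calls into
// the scheduler, so no lock-ordering cycle can form.
private class UpdateThread extends Thread {
  public void run() {
    while (running) {
      try {
        Thread.sleep(updateInterval);
        synchronized (taskTrackerManager) {   // the JobTracker
          synchronized (FairScheduler.this) {
            update();
          }
        }
      } catch (Exception e) {
        LOG.error("Exception in fair scheduler update thread", e);
      }
    }
  }
}
{code}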

BTW, I found another deadlock that seems to be much rarer (it happened when I 
was submitting about 50 jobs simultaneously) but is not related to preemption:

{code}

Found one Java-level deadlock:
=============================
IPC Server handler 24 on 9001:
  waiting to lock monitor 0x40c91750 (object 0x7fc0243e2c20, a 
org.apache.hadoop.mapred.JobTracker),
  which is held by IPC Server handler 0 on 9001
IPC Server handler 0 on 9001:
  waiting to lock monitor 0x40bc0770 (object 0x7fc0243e3080, a 
org.apache.hadoop.mapred.FairScheduler),
  which is held by FairScheduler update thread
FairScheduler update thread:
  waiting to lock monitor 0x4095dd98 (object 0x7fc0258bc0d0, a 
org.apache.hadoop.mapred.JobInProgress),
  which is held by IPC Server handler 0 on 9001

Java stack information for the threads listed above:
===================================================
IPC Server handler 24 on 9001:
at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:2487)
- waiting to lock 0x7fc0243e2c20 (a 
org.apache.hadoop.mapred.JobTracker)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
IPC Server handler 0 on 9001:
at org.apache.hadoop.mapred.JobTracker.finalizeJob(JobTracker.java:2115)
- waiting to lock 0x7fc0243e3080 (a 
org.apache.hadoop.mapred.FairScheduler)
- locked 0x7fc0243e3420 (a java.util.TreeMap)
- locked 0x7fc0243e2c20 (a org.apache.hadoop.mapred.JobTracker)
at 
org.apache.hadoop.mapred.JobInProgress.garbageCollect(JobInProgress.java:2510)
- locked 0x7fc0258bc0d0 (a org.apache.hadoop.mapred.JobInProgress)
at 
org.apache.hadoop.mapred.JobInProgress.jobComplete(JobInProgress.java:2146)
at 
org.apache.hadoop.mapred.JobInProgress.completedTask(JobInProgress.java:2084)
- locked 0x7fc0258bc0d0 (a org.apache.hadoop.mapred.JobInProgress)
at 
org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:883)
- locked 0x7fc0258bc0d0 (a org.apache.hadoop.mapred.JobInProgress)
at 
org.apache.hadoop.mapred.JobTracker.updateTaskStatuses(JobTracker.java:3564)
at 
org.apache.hadoop.mapred.JobTracker.processHeartbeat(JobTracker.java:2758)
- locked 0x7fc0243e2c20 (a org.apache.hadoop.mapred.JobTracker)
at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:2553)
- locked 0x7fc0243e2c20 (a org.apache.hadoop.mapred.JobTracker)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
FairScheduler update thread:
at 
org.apache.hadoop.mapred.JobInProgress.scheduleReduces(JobInProgress.java:1203)
- waiting to lock 0x7fc0258bc0d0 (a 
org.apache.hadoop.mapred.JobInProgress)
at 
org.apache.hadoop.mapred.JobSchedulable.updateDemand(JobSchedulable.java:53)
at 
org.apache.hadoop.mapred.PoolSchedulable.updateDemand(PoolSchedulable.java:81)
at org.apache.hadoop.mapred.FairScheduler.update(FairScheduler.java:577)
- locked 0x7fc0243e3080 (a org.apache.hadoop.mapred.FairScheduler)
at 
org.apache.hadoop.mapred.FairScheduler$UpdateThread.run(FairScheduler.java:277)
{code}

The problem in this 

[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-02-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832642#action_12832642
 ] 

Hadoop QA commented on MAPREDUCE-1309:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12435485/mapreduce-1309--2010-02-10.patch
  against trunk revision 908321.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 17 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated 1 warning messages.

-1 javac.  The applied patch generated 2219 javac compiler warnings (more 
than the trunk's current 2215 warnings).

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/442/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/442/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/442/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/442/console

This message is automatically generated.

 I want to change the rumen job trace generator to use a more modular internal 
 structure, to allow for more input log formats 
 -

 Key: MAPREDUCE-1309
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Dick King
Assignee: Dick King
 Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
 demuxer-plus-concatenated-files--2010-01-06.patch, 
 demuxer-plus-concatenated-files--2010-01-08-b.patch, 
 demuxer-plus-concatenated-files--2010-01-08-c.patch, 
 demuxer-plus-concatenated-files--2010-01-08-d.patch, 
 demuxer-plus-concatenated-files--2010-01-08.patch, 
 demuxer-plus-concatenated-files--2010-01-11.patch, 
 mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, 
 mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, 
 mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch


 There are two orthogonal questions to answer when processing a job tracker 
 log: how will the logs and the xml configuration files be packaged, and in 
 which release of hadoop map/reduce were the logs generated?  The existing 
 rumen only has a couple of answers to these questions.  The new engine will 
 handle three answers to the version question: 0.18, 0.20 and current, and two 
 answers to the packaging question: separate files with names derived from the 
 job ID, and concatenated files with a header between sections [used for 
 easier file interchange].

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1470) Move Delegation token into Common so that we can use it for MapReduce also

2010-02-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832645#action_12832645
 ] 

Hudson commented on MAPREDUCE-1470:
---

Integrated in Hadoop-Mapreduce-trunk #232 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/232/])


 Move Delegation token into Common so that we can use it for MapReduce also
 --

 Key: MAPREDUCE-1470
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1470
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Fix For: 0.22.0

 Attachments: mr-1470.patch


 We need to update one reference for map/reduce when we move the hdfs 
 delegation tokens.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1433) Create a Delegation token for MapReduce

2010-02-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832646#action_12832646
 ] 

Hudson commented on MAPREDUCE-1433:
---

Integrated in Hadoop-Mapreduce-trunk #232 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/232/])


 Create a Delegation token for MapReduce
 ---

 Key: MAPREDUCE-1433
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: security
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Fix For: 0.22.0

 Attachments: 1433.bp20.patch, 1433.bp20.patch, mr-1433.patch, 
 mr-1433.patch, mr-1433.patch, mr-1433.patch, mr-1433.patch


 Occasionally, MapReduce jobs need to launch other MapReduce jobs. With 
 security enabled, the task needs to authenticate to the JobTracker as the 
 user with a token.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1399) The archive command shows a null error message

2010-02-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832648#action_12832648
 ] 

Hudson commented on MAPREDUCE-1399:
---

Integrated in Hadoop-Mapreduce-trunk #232 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/232/])


 The archive command shows a null error message
 --

 Key: MAPREDUCE-1399
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1399
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: harchive
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Fix For: 0.22.0

 Attachments: m1399_20100204.patch, m1399_20100205.patch, 
 m1399_20100205trunk.patch, m1399_20100205trunk2.patch, 
 m1399_20100205trunk2_y0.20.patch, MAPREDUCE-1399.patch


 {noformat}
 bash-3.1$ hadoop archive -archiveName foo.har -p . foo .
 Exception in archives
 null
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1448) [Mumak] mumak.sh does not honor --config option.

2010-02-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832647#action_12832647
 ] 

Hudson commented on MAPREDUCE-1448:
---

Integrated in Hadoop-Mapreduce-trunk #232 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/232/])


 [Mumak] mumak.sh does not honor --config option.
 

 Key: MAPREDUCE-1448
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1448
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.21.0, 0.22.0
Reporter: Hong Tang
Assignee: Hong Tang
 Fix For: 0.21.0

 Attachments: mapred-1448-2.patch, mapred-1448.patch


 When --config is specified, mumak.sh should put the customized conf directory 
 in the classpath.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1425) archive throws OutOfMemoryError

2010-02-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832644#action_12832644
 ] 

Hudson commented on MAPREDUCE-1425:
---

Integrated in Hadoop-Mapreduce-trunk #232 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/232/])


 archive throws OutOfMemoryError
 ---

 Key: MAPREDUCE-1425
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1425
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: harchive
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Mahadev konar
 Fix For: 0.22.0

 Attachments: har.sh, m1425_20100129TextFileGenerator.patch, 
 MAPREDUCE-1425.patch, MAPREDUCE-1425.patch, MAPREDUCE-1425.patch, 
 MAPREDUCE-1425_y_0.20.patch


 {noformat}
 -bash-3.1$ hadoop  archive -archiveName t4.har -p . t4 .
 Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
 at java.util.regex.Pattern.compile(Pattern.java:1432)
 at java.util.regex.Pattern.<init>(Pattern.java:1133)
 at java.util.regex.Pattern.compile(Pattern.java:847)
 at java.lang.String.replace(String.java:2208)
 at org.apache.hadoop.fs.Path.normalizePath(Path.java:146)
 at org.apache.hadoop.fs.Path.initialize(Path.java:137)
 at org.apache.hadoop.fs.Path.<init>(Path.java:126)
 at org.apache.hadoop.fs.Path.makeQualified(Path.java:296)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.makeQualified(DistributedFileSystem.java:244)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:256)
 at 
 org.apache.hadoop.tools.HadoopArchives.archive(HadoopArchives.java:393)
 at org.apache.hadoop.tools.HadoopArchives.run(HadoopArchives.java:736)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
 at 
 org.apache.hadoop.tools.HadoopArchives.main(HadoopArchives.java:751)
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.




[jira] Commented: (MAPREDUCE-1320) StringBuffer -> StringBuilder occurrence

2010-02-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832654#action_12832654
 ] 

Hadoop QA commented on MAPREDUCE-1320:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12428677/MAPREDUCE-1320.patch
  against trunk revision 908321.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/443/console

This message is automatically generated.

 StringBuffer -> StringBuilder occurrence 
 

 Key: MAPREDUCE-1320
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1320
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.22.0
Reporter: Kay Kay
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1320.patch


 A good number of toString() implementations use StringBuffer when the 
 reference clearly does not go out of scope of the method and no concurrency 
 is needed. The patch replaces those occurrences of StringBuffer with 
 StringBuilder. 
 Created against the map/reduce project trunk. 
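
The change is mechanical; for a method-local buffer it is just the following 
(an illustrative toString() with made-up fields, not code from the patch):

{code}
public String toString() {
  // StringBuilder is the drop-in, unsynchronized replacement: the buffer
  // never escapes this method, so StringBuffer's locking is pure overhead.
  StringBuilder sb = new StringBuilder();
  sb.append("key=").append(key).append(", value=").append(value);
  return sb.toString();
}
{code}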

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1485) CapacityScheduler should prevent a single job from taking over large parts of a cluster

2010-02-11 Thread Arun C Murthy (JIRA)
CapacityScheduler should prevent a single job from taking over large parts of 
a cluster
---

 Key: MAPREDUCE-1485
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1485
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/capacity-sched
Reporter: Arun C Murthy
Assignee: Arun C Murthy
 Fix For: 0.22.0


The proposal is to have a per-queue limit on the number of concurrent tasks a 
job can run on a cluster. 

We've seen cases where a single, large job took over a majority of the 
cluster; worse, it meant that any bug in it caused issues for both the 
NameNode _and_ the JobTracker.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-323) Improve the way job history files are managed

2010-02-11 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832677#action_12832677
 ] 

Edward Capriolo commented on MAPREDUCE-323:
---

Being able to control the structure better is definitely a nice feature. 
Practically, dividing the job folders by mm/dd/yy would solve the immediate 
problem of having to clean up and restart your JobTracker when you hit the 
ext3 subdirectory limit; a sketch follows below. Introducing a variable into 
the jobtracker, mapred.jobhistory.maxjobhistory, plus a FIFO queue might be 
helpful as well. As things stand now, downtime and cleanup are needed to keep 
the JobTracker running well, which is less than optimal.
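
For illustration, a date-bucketed layout of the kind suggested might be built 
like this (the path scheme and helper are mine, not from any patch):

{code}
import java.text.SimpleDateFormat;
import java.util.Date;
import org.apache.hadoop.fs.Path;

// Illustrative only: bucket history files as <root>/MM/dd/yy/<jobid> so no
// single directory collects enough entries to hit the ext3 per-dir limit.
public class HistoryLayout {
  public static Path historyFilePath(Path root, String jobId, long submitTime) {
    String bucket = new SimpleDateFormat("MM/dd/yy").format(new Date(submitTime));
    return new Path(root, bucket + Path.SEPARATOR + jobId);
  }
}
{code}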

 Improve the way job history files are managed
 -

 Key: MAPREDUCE-323
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-323
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 0.21.0, 0.22.0
Reporter: Amar Kamat
Assignee: Amareshwari Sriramadasu
Priority: Critical

 Today all the jobhistory files are dumped in one _job-history_ folder. This 
 can cause problems when there is a need to search the history folder 
 (job-recovery etc.). It would be nice if we grouped all the jobs under a 
 _user_ folder. So all the jobs for user _amar_ would go in 
 _history-folder/amar/_. Jobs can be categorized using various features like 
 _jobid, date, jobname_ etc., but using _username_ will make the search much 
 more efficient and also will not result in a namespace explosion. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1431) archive does not work with distcp -update

2010-02-11 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832697#action_12832697
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-1431:
---

Took a closer look: HarFileSystem extends FilterFileSystem and uses the 
underlying file system to get the file checksum. That's why we got "Wrong FS": 
HarFileSystem passes a har:// path to the underlying fs.getFileChecksum(..). 
In our case, the underlying fs is HDFS.
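
The implied fix is for HarFileSystem to stop forwarding har:// paths to the 
underlying checksum call; a minimal sketch of such an override (not the 
committed change):

{code}
// Sketch: archived files have no checksum the underlying fs can compute, so
// return null (the FileSystem contract for "no checksum") instead of passing
// the har:// path down to fs.getFileChecksum() and hitting "Wrong FS".
@Override
public FileChecksum getFileChecksum(Path f) {
  return null;
}
{code}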


 archive does not work with distcp -update
 -

 Key: MAPREDUCE-1431
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1431
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: harchive
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Mahadev konar
 Fix For: 0.22.0


 The following distcp command  works.
 {noformat}
 hadoop distcp -Dmapred.job.queue.name=q 
 har://hdfs-nn_hostname:8020/user/tsz/t101.har/t101 t101_distcp
 {noformat}
 However, it does not work for -update.
 {noformat}
 -bash-3.1$ hadoop distcp -Dmapred.job.queue.name=q -update 
 har://hdfs-nn_hostname:8020/user/tsz/t101.har/t101 t101_distcp
 10/01/29 20:06:53 INFO tools.DistCp: 
 srcPaths=[har://hdfs-nn_hostname:8020/user/tsz/t101.har/t101]
 10/01/29 20:06:53 INFO tools.DistCp: destPath=t101
 java.lang.IllegalArgumentException: Wrong FS: 
 har://hdfs-nn_hostname:8020/user/tsz/t101.har/t101/text-, expected: 
 hdfs://nn_hostname
 at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:310)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.checkPath(DistributedFileSystem.java:99)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:155)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:463)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:46)
 at 
 org.apache.hadoop.fs.FilterFileSystem.getFileChecksum(FilterFileSystem.java:250)
 at org.apache.hadoop.tools.DistCp.sameFile(DistCp.java:1204)
 at org.apache.hadoop.tools.DistCp.setup(DistCp.java:1084)
 ...
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1341) Sqoop should have an option to create hive tables and skip the table import step

2010-02-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832705#action_12832705
 ] 

Hadoop QA commented on MAPREDUCE-1341:
--

+1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12435484/MAPREDUCE-1341.6.patch
  against trunk revision 908321.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 27 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/314/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/314/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/314/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/314/console

This message is automatically generated.

 Sqoop should have an option to create hive tables and skip the table import 
 step
 

 Key: MAPREDUCE-1341
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1341
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: contrib/sqoop
Affects Versions: 0.22.0
Reporter: Leonid Furman
Assignee: Leonid Furman
Priority: Minor
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1341.2.patch, MAPREDUCE-1341.3.patch, 
 MAPREDUCE-1341.4.patch, MAPREDUCE-1341.5.patch, MAPREDUCE-1341.6.patch, 
 MAPREDUCE-1341.patch


 In case the client only needs to create tables in Hive, it would be helpful 
 if Sqoop had an optional parameter:
 --hive-create-only
 which would omit the time-consuming table import step, generate the Hive 
 CREATE TABLE statements, and run them.
 If this feature seems useful, I can generate the patch. I have modified the 
 Sqoop code and built it on my development machine, and it seems to be working 
 well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1334) contrib/index - test - TestIndexUpdater fails due to an additional presence of file _SUCCESS in hdfs

2010-02-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832709#action_12832709
 ] 

Hadoop QA commented on MAPREDUCE-1334:
--

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12429081/MAPREDUCE-1334.patch
  against trunk revision 908321.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/444/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/444/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/444/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/444/console

This message is automatically generated.

 contrib/index - test - TestIndexUpdater fails due to an additional presence 
 of file _SUCCESS in hdfs 
 -

 Key: MAPREDUCE-1334
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1334
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/index
Reporter: Kay Kay
Priority: Critical
 Fix For: 0.21.0

 Attachments: MAPREDUCE-1334.patch


 $ cd src/contrib/index
 $ ant clean test 
 This fails the test TestIndexUpdater due to a mismatch in the - doneFileNames 
 - data structure, when it is being run with different parameters. 
 (ArrayIndexOutOfBoundsException raised when inserting elements in 
 doneFileNames, array ). 
 Debugging further - there seems to be an additional file called as - 
 hdfs://localhost:36021/myoutput/_SUCCESS , taken into consideration in 
 addition to those that begins with done* .  The presence of the extra file 
 causes the error. 
 Attaching a patch that would circumvent this by increasing the array length 
 of shards by 1 . 
 But longer term the test fixtures need to be probably revisited to see if the 
 presence of _SUCCESS as a file is a good thing to begin with before we even 
 get to this test case. 
 Any comments / suggestions on the same welcome. 
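
For the longer-term fixture cleanup, a filter like the sketch below would 
sidestep marker files entirely (the helper class is hypothetical):

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

// Sketch: list only the done* shard files, ignoring markers like _SUCCESS.
public class DoneFileLister {
  public static FileStatus[] listDoneFiles(FileSystem fs, Path outputDir)
      throws IOException {
    return fs.listStatus(outputDir, new PathFilter() {
      public boolean accept(Path p) {
        return p.getName().startsWith("done");
      }
    });
  }
}
{code}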

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (MAPREDUCE-1375) TestFileArgs fails intermittently

2010-02-11 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon reassigned MAPREDUCE-1375:
--

Assignee: Todd Lipcon

 TestFileArgs fails intermittently
 -

 Key: MAPREDUCE-1375
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1375
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Reporter: Amar Kamat
Assignee: Todd Lipcon
 Fix For: 0.22.0

 Attachments: TEST-org.apache.hadoop.streaming.TestFileArgs.txt


 TestFileArgs failed once for me with the following error
 {code}
 expected:<[job.jar
 sidefile
 tmp
 ]> but was:<[
 sidefile
 tmp
 ]>
 at 
 org.apache.hadoop.streaming.TestStreaming.checkOutput(TestStreaming.java:107)
 at 
 org.apache.hadoop.streaming.TestStreaming.testCommandLine(TestStreaming.java:123)
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1375) TestFileArgs fails intermittently

2010-02-11 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832751#action_12832751
 ] 

Todd Lipcon commented on MAPREDUCE-1375:


I think I got this figured out. The issue is that the test actually tries to 
write some "roses are red" text to ls's stdin. Very infrequently, the ls will 
actually complete before the data can be flushed, so the task gets a "Broken 
pipe" exception; see MAPREDUCE-1481. I'm actually unsure whether 
MAPREDUCE-1481 is a bug, but the easy fix for this test is to make the input 
empty so that no data gets written into ls's stdin.

I'm running the test in a loop with this fix now. If it keeps going for a 
couple of hours without failure I'll post a patch. (Before, this loop would 
usually fail after about 10 minutes.)

 TestFileArgs fails intermittently
 -

 Key: MAPREDUCE-1375
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1375
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Reporter: Amar Kamat
Assignee: Todd Lipcon
 Fix For: 0.22.0

 Attachments: TEST-org.apache.hadoop.streaming.TestFileArgs.txt


 TestFileArgs failed once for me with the following error
 {code}
 expected:<[job.jar
 sidefile
 tmp
 ]> but was:<[
 sidefile
 tmp
 ]>
 at 
 org.apache.hadoop.streaming.TestStreaming.checkOutput(TestStreaming.java:107)
 at 
 org.apache.hadoop.streaming.TestStreaming.testCommandLine(TestStreaming.java:123)
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1481) Streaming should swallow IOExceptions when closing clientOut

2010-02-11 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832753#action_12832753
 ] 

Todd Lipcon commented on MAPREDUCE-1481:


Actually, I think this is a bug but not quite how I described it. If the flush 
fails, it means we were trying to write data into a streaming executable that 
didn't consume all of its input.

I don't know what the expected behavior is here. Right now, the behavior is 
that we stop consuming its output, but the task still succeeds so long as the 
exit code is 0. I think this is incorrect. We should either entirely fail the 
task regardless of exit code, or we should consume the rest of its output.

 Streaming should swallow IOExceptions when closing clientOut
 

 Key: MAPREDUCE-1481
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1481
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/streaming
Affects Versions: 0.20.1, 0.21.0, 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon

 In PipeMapRed.mapRedFinished, streaming flushes and closes clientOut_, the 
 handle to the subprocess's stdin. If the subprocess has already exited or 
 closed its stdin, this will generate a "Broken pipe" IOException. This causes 
 us to skip waitOutputThreads, which is incorrect, since the subprocess may 
 still have data written to stdout that needs to be read.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1375) TestFileArgs fails intermittently

2010-02-11 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated MAPREDUCE-1375:
---

Attachment: mapreduce-1375.txt

I think this patch fixes the problem.

 TestFileArgs fails intermittently
 -

 Key: MAPREDUCE-1375
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1375
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Reporter: Amar Kamat
Assignee: Todd Lipcon
 Fix For: 0.22.0

 Attachments: mapreduce-1375.txt, 
 TEST-org.apache.hadoop.streaming.TestFileArgs.txt


 TestFileArgs failed once for me with the following error
 {code}
 expected:[job.jar
 sidefile
 tmp
 ] but was:[]
 at 
 org.apache.hadoop.streaming.TestStreaming.checkOutput(TestStreaming.java:107)
 at 
 org.apache.hadoop.streaming.TestStreaming.testCommandLine(TestStreaming.java:123)
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1375) TestFileArgs fails intermittently

2010-02-11 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated MAPREDUCE-1375:
---

Status: Patch Available  (was: Open)

 TestFileArgs fails intermittently
 -

 Key: MAPREDUCE-1375
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1375
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Reporter: Amar Kamat
Assignee: Todd Lipcon
 Fix For: 0.22.0

 Attachments: mapreduce-1375.txt, 
 TEST-org.apache.hadoop.streaming.TestFileArgs.txt


 TestFileArgs failed once for me with the following error
 {code}
 expected:[job.jar
 sidefile
 tmp
 ] but was:[]
 at 
 org.apache.hadoop.streaming.TestStreaming.checkOutput(TestStreaming.java:107)
 at 
 org.apache.hadoop.streaming.TestStreaming.testCommandLine(TestStreaming.java:123)
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1462) Enable context-specific and stateful serializers in MapReduce

2010-02-11 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated MAPREDUCE-1462:
-

Attachment: MAPREDUCE-1462-mr.patch
MAPREDUCE-1462-common.patch

In order to help understand the problem better I've created a demonstration 
patch that uses the SerializationContext-based user API, while retaining the 
Serialization code that exists in common. (In fact, I had to make some changes 
to the Serialization code so that it can retain its metadata in an instance 
variable.)

Here's what the configuration looks like for the user:

{code}
Schema keySchema = Schema.create(Schema.Type.STRING);
Schema valSchema = Schema.create(Schema.Type.LONG);
job.setSerialization(Job.SerializationContext.MAP_OUTPUT_KEY,
   new AvroGenericSerialization(keySchema));
job.setSerialization(Job.SerializationContext.MAP_OUTPUT_VALUE,
   new AvroGenericSerialization(valSchema));
{code}
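
For comparison, the framework side could consume these context-specific 
serializations roughly as follows; the getSerialization() lookup is my 
assumption of a getter symmetrical to setSerialization(), not necessarily 
what the patch does:

{code}
// Look up the serialization registered for map output keys and use the
// standard Serializer lifecycle (open/serialize/close) with it.
// mapOutputStream is any OutputStream; key is the map output key object.
Serialization<Object> serialization = (Serialization<Object>)
    job.getSerialization(Job.SerializationContext.MAP_OUTPUT_KEY);
Serializer<Object> serializer = serialization.getSerializer(Object.class);
serializer.open(mapOutputStream);
serializer.serialize(key);
serializer.close();
{code}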

 Enable context-specific and stateful serializers in MapReduce
 -

 Key: MAPREDUCE-1462
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1462
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: task
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: h-1462.patch, MAPREDUCE-1462-common.patch, 
 MAPREDUCE-1462-mr.patch


 Although the current serializer framework is powerful, within the context of 
 a job it is limited to picking a single serializer for a given class. 
 Additionally, Avro generic serialization can make use of additional 
 configuration/state such as the schema. (Most other serialization frameworks 
 including Writable, Jute/Record IO, Thrift, Avro Specific, and Protocol 
 Buffers only need the object's class name to deserialize the object.)
 With the goal of keeping the easy things easy and maintaining backwards 
 compatibility, we should be able to allow applications to use context 
 specific (eg. map output key) serializers in addition to the current type 
 based ones that handle the majority of the cases. Furthermore, we should be 
 able to support serializer-specific configuration/metadata in a type-safe 
 manner without cluttering up the base API with a lot of new methods that will 
 confuse new users.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-434) local map-reduce job limited to single reducer

2010-02-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832781#action_12832781
 ] 

Hadoop QA commented on MAPREDUCE-434:
-

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12435513/MAPREDUCE-434.5.patch
  against trunk revision 908321.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/315/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/315/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/315/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/315/console

This message is automatically generated.

 local map-reduce job limited to single reducer
 --

 Key: MAPREDUCE-434
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-434
 Project: Hadoop Map/Reduce
  Issue Type: Bug
 Environment: local job tracker
Reporter: Yoram Arnon
Assignee: Aaron Kimball
Priority: Minor
 Attachments: MAPREDUCE-434.2.patch, MAPREDUCE-434.3.patch, 
 MAPREDUCE-434.4.patch, MAPREDUCE-434.5.patch, MAPREDUCE-434.patch


 when mapred.job.tracker is set to 'local', my setNumReduceTasks call is 
 ignored, and the number of reduce tasks is set at 1.
 This prevents me from locally debugging my partition function, which tries to 
 partition based on the number of reduce tasks.
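
 For illustration, the symptom reduces to the following sketch (the 
 partitioner class is hypothetical):
 {code}
 // Under mapred.job.tracker=local the requested reducer count was
 // silently clamped to 1, so a partitioner that depends on the number
 // of reduces could not be debugged locally.
 JobConf conf = new JobConf();
 conf.set("mapred.job.tracker", "local");
 conf.setNumReduceTasks(4);                      // was ignored: forced to 1
 conf.setPartitionerClass(MyPartitioner.class);  // hypothetical class
 {code}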

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-326) The lowest level map-reduce APIs should be byte oriented

2010-02-11 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated MAPREDUCE-326:


Attachment: MAPREDUCE-326.pdf

Here's a proposal for a binary API for review.

 The lowest level map-reduce APIs should be byte oriented
 

 Key: MAPREDUCE-326
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-326
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: eric baldeschwieler
 Attachments: MAPREDUCE-326.pdf


 As discussed here:
 https://issues.apache.org/jira/browse/HADOOP-1986#action_12551237
 The templates, serializers and other complexities that allow map-reduce to 
 use arbitrary types complicate the design and lead to lots of object creation 
 and other overhead that a byte-oriented design would not suffer.  I believe 
 the lowest-level implementation of hadoop map-reduce should have byte-string 
 oriented APIs (for keys and values).  This API would be more performant, 
 simpler, and more easily made cross-language.
 The existing API could be maintained as a thin layer on top of the leaner API.
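
 To make the idea concrete, one purely illustrative shape for a byte-oriented 
 mapper interface (my sketch; the attached proposal may differ):
 {code}
 // Keys and values arrive as byte ranges: no serialization framework is
 // imposed at this level and no per-record objects need to be created.
 // RawKeyValueOutput is a hypothetical byte-oriented collector.
 public interface RawMapper {
   void map(byte[] key, int keyOffset, int keyLength,
            byte[] value, int valueOffset, int valueLength,
            RawKeyValueOutput output) throws IOException;
 }
 {code}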

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1480) CombineFileRecordReader does not properly initialize child RecordReader

2010-02-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832784#action_12832784
 ] 

Hadoop QA commented on MAPREDUCE-1480:
--

+1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12435529/MAPREDUCE-1480.2.patch
  against trunk revision 908321.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/445/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/445/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/445/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/445/console

This message is automatically generated.

 CombineFileRecordReader does not properly initialize child RecordReader
 ---

 Key: MAPREDUCE-1480
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1480
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Aaron Kimball
Assignee: Aaron Kimball
 Attachments: MAPREDUCE-1480.2.patch, MAPREDUCE-1480.patch


 CombineFileRecordReader instantiates child RecordReader instances but never 
 calls their initialize() method to give them the proper TaskAttemptContext.
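
 A sketch of the missing step (names follow CombineFileRecordReader's fields 
 but this is illustrative; see the attached patch for the actual change):
 {code}
 // After constructing the child reader for the current chunk of the
 // CombineFileSplit, hand it the task context before any records are read:
 curReader = rrConstructor.newInstance(split, context, Integer.valueOf(idx));
 curReader.initialize(split, context);  // the call that was being skipped
 {code}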

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1486) Configuration data should be preserved within the same MapTask

2010-02-11 Thread Aaron Kimball (JIRA)
Configuration data should be preserved within the same MapTask
--

 Key: MAPREDUCE-1486
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1486
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task
Reporter: Aaron Kimball
Assignee: Aaron Kimball


Map tasks involve a number of Contexts -- at least a TaskAttemptContext and a 
MapContext. These context objects contain a Configuration each; when one 
context is initialized, it initializes its own Configuration by deep-copying a 
previous Configuration.

If one Context instance is used entirely prior to a second, more specific 
Context, then the second Context should contain the configuration data 
initialized in the previous Context. This specifically affects the interaction 
between an InputFormat and its RecordReader instance(s).
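
For illustration, the kind of interaction that breaks (the InputFormat, 
reader, and key name below are hypothetical):

{code}
// A value set through the InputFormat's context was lost because the more
// specific context created later deep-copied an older Configuration.
public RecordReader<Text, Text> createRecordReader(InputSplit split,
    TaskAttemptContext context) {
  context.getConfiguration().set("example.flag", "set-by-inputformat");
  // Before the fix, the MapContext later handed to the reader's
  // initialize() no longer contained "example.flag".
  return new MyRecordReader();  // hypothetical reader
}
{code}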


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1486) Configuration data should be preserved within the same MapTask

2010-02-11 Thread Aaron Kimball (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Kimball updated MAPREDUCE-1486:
-

Attachment: MAPREDUCE-1486.patch

Attaching patch which fixes this problem; now the same configuration data will 
flow forward through the map task. This patch also contains a test case that 
highlights the problem.

 Configuration data should be preserved within the same MapTask
 --

 Key: MAPREDUCE-1486
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1486
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task
Reporter: Aaron Kimball
Assignee: Aaron Kimball
 Attachments: MAPREDUCE-1486.patch


 Map tasks involve a number of Contexts -- at least a TaskAttemptContext and a 
 MapContext. These context objects contain a Configuration each; when one 
 context is initialized, it initializes its own Configuration by deep-copying 
 a previous Configuration.
 If one Context instance is used entirely prior to a second, more specific 
 Context, then the second Context should contain the configuration data 
 initialized in the previous Context. This specifically affects the 
 interaction between an InputFormat and its RecordReader instance(s).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1486) Configuration data should be preserved within the same MapTask

2010-02-11 Thread Aaron Kimball (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Kimball updated MAPREDUCE-1486:
-

Status: Patch Available  (was: Open)

 Configuration data should be preserved within the same MapTask
 --

 Key: MAPREDUCE-1486
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1486
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task
Reporter: Aaron Kimball
Assignee: Aaron Kimball
 Attachments: MAPREDUCE-1486.patch


 Map tasks involve a number of Contexts -- at least a TaskAttemptContext and a 
 MapContext. These context objects contain a Configuration each; when one 
 context is initialized, it initializes its own Configuration by deep-copying 
 a previous Configuration.
 If one Context instance is used entirely prior to a second, more specific 
 Context, then the second Context should contain the configuration data 
 initialized in the previous Context. This specifically affects the 
 interaction between an InputFormat and its RecordReader instance(s).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1341) Sqoop should have an option to create hive tables and skip the table import step

2010-02-11 Thread Aaron Kimball (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832797#action_12832797
 ] 

Aaron Kimball commented on MAPREDUCE-1341:
--

+1; patch #6 looks good to me. If someone could commit this, that'd be superb.


 Sqoop should have an option to create hive tables and skip the table import 
 step
 

 Key: MAPREDUCE-1341
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1341
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: contrib/sqoop
Affects Versions: 0.22.0
Reporter: Leonid Furman
Assignee: Leonid Furman
Priority: Minor
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1341.2.patch, MAPREDUCE-1341.3.patch, 
 MAPREDUCE-1341.4.patch, MAPREDUCE-1341.5.patch, MAPREDUCE-1341.6.patch, 
 MAPREDUCE-1341.patch


 In case the client only needs to create tables in hive, it would be helpful 
 if Sqoop had an optional parameter:
 --hive-create-only
 which would omit the time-consuming table import step, generate Hive create 
 table statements and run them.
 If this feature seems useful, I can generate the patch. I have modified the 
 Sqoop code and built it on my development machine, and it seems to be working 
 well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1341) Sqoop should have an option to create hive tables and skip the table import step

2010-02-11 Thread Leonid Furman (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832803#action_12832803
 ] 

Leonid Furman commented on MAPREDUCE-1341:
--

Thanks, Aaron!

 Sqoop should have an option to create hive tables and skip the table import 
 step
 

 Key: MAPREDUCE-1341
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1341
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: contrib/sqoop
Affects Versions: 0.22.0
Reporter: Leonid Furman
Assignee: Leonid Furman
Priority: Minor
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1341.2.patch, MAPREDUCE-1341.3.patch, 
 MAPREDUCE-1341.4.patch, MAPREDUCE-1341.5.patch, MAPREDUCE-1341.6.patch, 
 MAPREDUCE-1341.patch


 In case the client only needs to create tables in hive, it would be helpful 
 if Sqoop had an optional parameter:
 --hive-create-only
 which would omit the time-consuming table import step, generate Hive create 
 table statements and run them.
 If this feature seems useful, I can generate the patch. I have modified the 
 Sqoop code and built it on my development machine, and it seems to be working 
 well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-326) The lowest level map-reduce APIs should be byte oriented

2010-02-11 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated MAPREDUCE-326:


Attachment: MAPREDUCE-326-api.patch

And an accompanying draft patch for the raw API classes.

 The lowest level map-reduce APIs should be byte oriented
 

 Key: MAPREDUCE-326
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-326
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: eric baldeschwieler
 Attachments: MAPREDUCE-326-api.patch, MAPREDUCE-326.pdf


 As discussed here:
 https://issues.apache.org/jira/browse/HADOOP-1986#action_12551237
 The templates, serializers and other complexities that allow map-reduce to 
 use arbitrary types complicate the design and lead to lots of object creation 
 and other overhead that a byte-oriented design would not suffer.  I believe 
 the lowest-level implementation of hadoop map-reduce should have byte-string 
 oriented APIs (for keys and values).  This API would be more performant, 
 simpler, and more easily made cross-language.
 The existing API could be maintained as a thin layer on top of the leaner API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1220) Implement an in-cluster LocalJobRunner

2010-02-11 Thread Tom White (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832838#action_12832838
 ] 

Tom White commented on MAPREDUCE-1220:
--

bq. Most of the effort involved teasing out the framework in the MapTask and 
ReduceTask to allow several components such as MapOutputBuffer, 
ReduceValuesIterator etc. to be used as 'pluggable' components.

Interesting. MAPREDUCE-326 has a proposal for making these components 
pluggable, which might make the work of this JIRA simpler.

 Implement an in-cluster LocalJobRunner
 --

 Key: MAPREDUCE-1220
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1220
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: client, jobtracker
Reporter: Arun C Murthy
Assignee: Arun C Murthy
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1220_yhadoop20.patch


 Currently very small map-reduce jobs suffer from latency issues due to 
 overheads in Hadoop Map-Reduce such as scheduling, jvm startup etc. We've 
 periodically tried to optimize all parts of the framework to achieve lower 
 latencies.
 I'd like to turn the problem around a little bit. I propose we allow very 
 small jobs to run as a single task job with multiple maps and reduces i.e. 
 similar to our current implementation of the LocalJobRunner. Thus, under 
 certain conditions (maybe user-set configuration, or if input data is small 
 i.e. less than a DFS blocksize) we could launch a special task which will run 
 all 
 maps in a serial manner, followed by the reduces. This would really help 
 small jobs achieve significantly smaller latencies, thanks to reduced 
 scheduling overhead and jvm startup cost, and no shuffle over the network. 
 This would be a huge benefit, especially on large clusters, to small Hive/Pig 
 queries.
 Thoughts?
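
 As a concrete illustration of such a trigger condition (the property name 
 and variables below are my assumptions, not part of the proposal):
 {code}
 // Run the whole job as a single task when the user opts in or the job's
 // total input is smaller than one DFS block.
 boolean runAsSingleTask =
     conf.getBoolean("mapred.job.single.task.mode", false)
     || totalInputBytes < fs.getDefaultBlockSize();
 {code}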

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1341) Sqoop should have an option to create hive tables and skip the table import step

2010-02-11 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated MAPREDUCE-1341:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

I've just committed this. Thanks Leonid!

 Sqoop should have an option to create hive tables and skip the table import 
 step
 

 Key: MAPREDUCE-1341
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1341
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: contrib/sqoop
Affects Versions: 0.22.0
Reporter: Leonid Furman
Assignee: Leonid Furman
Priority: Minor
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1341.2.patch, MAPREDUCE-1341.3.patch, 
 MAPREDUCE-1341.4.patch, MAPREDUCE-1341.5.patch, MAPREDUCE-1341.6.patch, 
 MAPREDUCE-1341.patch


 In case the client only needs to create tables in hive, it would be helpful 
 if Sqoop had an optional parameter:
 --hive-create-only
 which would omit the time-consuming table import step, generate Hive create 
 table statements and run them.
 If this feature seems useful, I can generate the patch. I have modified the 
 Sqoop code and built it on my development machine, and it seems to be working 
 well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1469) Sqoop should disable speculative execution in export

2010-02-11 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated MAPREDUCE-1469:
-

   Resolution: Fixed
Fix Version/s: 0.22.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

+1

I've just committed this. Thanks Aaron!

 Sqoop should disable speculative execution in export
 

 Key: MAPREDUCE-1469
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1469
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/sqoop
Reporter: Aaron Kimball
Assignee: Aaron Kimball
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1469.patch


 Concurrent writers of the same output shard may cause the database to try to 
 insert duplicate primary keys concurrently. Not a good situation. Speculative 
 execution should be forced off for this operation.
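
 The guard itself amounts to two calls on the export job's configuration 
 (a sketch of the idea; the attached patch has the actual change):
 {code}
 // Force speculative execution off so two attempts of the same output
 // shard can never race to insert the same primary keys.
 job.setMapSpeculativeExecution(false);
 job.setReduceSpeculativeExecution(false);
 {code}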

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1476) committer.needsTaskCommit should not be called for a task cleanup attempt

2010-02-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832845#action_12832845
 ] 

Hadoop QA commented on MAPREDUCE-1476:
--

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12435549/patch-1476.txt
  against trunk revision 908321.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/316/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/316/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/316/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/316/console

This message is automatically generated.

 committer.needsTaskCommit should not be called for a task cleanup attempt
 -

 Key: MAPREDUCE-1476
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1476
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.1
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Fix For: 0.22.0

 Attachments: patch-1476.txt


 Currently, Task.done() calls committer.needsTaskCommit() to know whether it 
 needs a commit or not. This need not be called for a task cleanup attempt, as 
 no commit is required for a cleanup attempt. 
 Due to MAPREDUCE-1409, we saw a case where a cleanup attempt went into 
 COMMIT_PENDING state.
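
 A sketch of the shape such a fix could take in Task.done(), reusing existing 
 Task names (illustrative; see the attached patch for the real change):
 {code}
 // A cleanup attempt never commits output, so skip the committer query
 // entirely and never enter COMMIT_PENDING for it.
 boolean commitRequired = !isTaskCleanupTask()
     && committer.needsTaskCommit(taskContext);
 {code}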

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.