[jira] Created: (MAPREDUCE-1482) Better handling of task diagnostic information stored in the TaskInProgress

2010-02-11 Thread Amar Kamat (JIRA)
Better handling of task diagnostic information stored in the TaskInProgress
---

 Key: MAPREDUCE-1482
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1482
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Reporter: Amar Kamat


Task diagnostic information can at times be very large, eating up the 
JobTracker's memory. There should be some way to avoid storing large error 
strings in the JobTracker.
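
For illustration only, one way to cap what gets stored (the limit and helper 
class below are hypothetical, not from any patch):

{code}
// Hypothetical mitigation sketch: cap the diagnostic text retained per task.
public class DiagnosticInfoTruncator {
  private static final int MAX_DIAGNOSTIC_LENGTH = 4 * 1024; // 4 KB, illustrative

  /** Truncate a diagnostic string before it is stored in the TaskInProgress. */
  public static String truncate(String diag) {
    if (diag == null || diag.length() <= MAX_DIAGNOSTIC_LENGTH) {
      return diag;
    }
    return diag.substring(0, MAX_DIAGNOSTIC_LENGTH) + "... (truncated)";
  }
}
{code}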

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1307) Introduce the concept of Job Permissions

2010-02-11 Thread Vinod K V (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod K V updated MAPREDUCE-1307:
-

Attachment: MAPREDUCE-1307-20100211.txt

Updated patch that fixes the client side to print a nice message in case of 
unauthorized access.

NOTE: CompletedJobStore needs to be fixed w.r.t. authorization. This might 
involve serializing the ACLs to the job-store on DFS and using the same for 
authorizing further requests. I'll do it as part of a follow-up issue.

 Introduce the concept of Job Permissions
 

 Key: MAPREDUCE-1307
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1307
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: security
Reporter: Devaraj Das
Assignee: Vinod K V
 Fix For: 0.22.0

 Attachments: 1307-early-1.patch, MAPREDUCE-1307-20100210.txt, 
 MAPREDUCE-1307-20100211.txt


 It would be good to define the notion of job permissions analogous to file 
 permissions. Then the JobTracker can restrict who can read (e.g. look at 
 the job page) or modify (e.g. kill) jobs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1483) CompletedJobStore should be authorized using job-acls

2010-02-11 Thread Vinod K V (JIRA)
CompletedJobStore should be authorized using job-acls
-

 Key: MAPREDUCE-1483
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1483
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker, security
Reporter: Vinod K V
 Fix For: 0.22.0


MAPREDUCE-1307 adds job-acls. CompletedJobStore serves job-status off DFS after 
jobs are long gone, so the job-acls also need to be serialized there in order 
to authorize job-related requests.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1354) Refactor JobTracker.submitJob to not lock the JobTracker during the HDFS accesses

2010-02-11 Thread Hemanth Yamijala (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832437#action_12832437
 ] 

Hemanth Yamijala commented on MAPREDUCE-1354:
-

One thing we noticed is that the getCounters call in JobInProgress is 
synchronized. The wrapper call to getCounters in the JobTracker acquires a lock 
on the JT and then calls JobInProgress.getCounters. The problem is that if the 
job is being initialized under initTasks, the JobTracker lock can get held up. 
We saw an instance of this on our clusters. To avoid this, one solution could 
be to check whether the job being queried is inited before taking the lock. 
This pattern is already used in getTaskCompletionEvents.
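
A rough sketch of that guard, assuming the shape of the JobTracker wrapper 
(names are illustrative, not the actual code):

{code}
// Illustrative JobTracker wrapper: avoid the JobInProgress lock until the
// job has finished initTasks(), so this call cannot block behind it.
public synchronized Counters getJobCounters(JobID jobid) {
  JobInProgress job = jobs.get(jobid);
  if (job == null || !job.inited()) {
    // Still initializing: return empty counters rather than block on the
    // JobInProgress lock that initTasks() is holding.
    return new Counters();
  }
  return job.getCounters(); // safe: initialization is complete
}
{code}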

 Refactor JobTracker.submitJob to not lock the JobTracker during the HDFS 
 accesses
 -

 Key: MAPREDUCE-1354
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1354
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Reporter: Devaraj Das
Assignee: Arun C Murthy
Priority: Critical
 Attachments: MAPREDUCE-1354_yhadoop20.patch


 It'd be nice to have the JobTracker object not be locked while accessing the 
 HDFS for reading the jobconf file and while writing the jobinfo file in the 
 submitJob method. We should see if we can avoid taking the lock altogether.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1455) Authorization for servlets

2010-02-11 Thread Ravi Gummadi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832450#action_12832450
 ] 

Ravi Gummadi commented on MAPREDUCE-1455:
-

One more thing: /logs, /static, /stack, /conf, /logLevel etc. do not go 
through authorization as part of this JIRA. That needs changes in Common and 
will be addressed in a separate JIRA.

 Authorization for servlets
 --

 Key: MAPREDUCE-1455
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1455
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: jobtracker, security, tasktracker
Reporter: Devaraj Das
Assignee: Ravi Gummadi
 Fix For: 0.22.0


 This jira is about building the authorization for servlets (on top of 
 MAPREDUCE-1307). That is, the JobTracker/TaskTracker runs authorization 
 checks on web requests based on the configured job permissions. For example, 
 if the job permission is 600, then no one except the authenticated user can 
 look at the job details via the browser. The authenticated user in the 
 servlet can be obtained using the HttpServletRequest method.
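
As a rough illustration, a check of this kind might look as follows; only 
HttpServletRequest.getRemoteUser() is standard servlet API here, and 
jobAclAllowsView() is a hypothetical placeholder:

{code}
// Sketch of a servlet-side authorization check; jobAclAllowsView() is
// hypothetical and stands in for the job-permission lookup.
protected void doGet(HttpServletRequest request, HttpServletResponse response)
    throws IOException {
  String user = request.getRemoteUser(); // the authenticated user, if any
  if (user == null || !jobAclAllowsView(user)) {
    response.sendError(HttpServletResponse.SC_UNAUTHORIZED,
        "User " + user + " is not authorized to view this job");
    return;
  }
  // ... render the job page as before ...
}
{code}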

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1398) TaskLauncher remains stuck on tasks waiting for free nodes even if task is killed.

2010-02-11 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-1398:
---

Assignee: Amareshwari Sriramadasu
  Status: Patch Available  (was: Open)

 TaskLauncher remains stuck on tasks waiting for free nodes even if task is 
 killed.
 --

 Key: MAPREDUCE-1398
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1398
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tasktracker
Reporter: Hemanth Yamijala
Assignee: Amareshwari Sriramadasu
 Attachments: patch-1398.txt


 Tasks could be assigned to trackers for slots that are running other tasks in 
 a commit-pending state. This is an optimization done to pipeline task 
 assignment and launch. When the task reaches the tracker, it waits until 
 sufficient slots become free for it. This wait is done in the TaskLauncher 
 thread. Now, while waiting, if the task is killed externally (maybe because 
 the job finishes, etc.), the TaskLauncher is not notified of this. So, it 
 continues to wait for the killed task to get sufficient slots. If slots do 
 not become free for a long time, this results in a considerable delay in 
 waking up the TaskLauncher thread. If the waiting task happens to be a 
 high-RAM task, the wait is also wasteful, because removing it would make way 
 for normal tasks that can run on the slots already available.
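
One conceivable shape of a fix, sketched below: the TaskLauncher re-checks the 
task's state whenever it wakes, and the kill path notifies the waiting 
launcher. The wasKilled() check is illustrative, not the attached patch:

{code}
// Illustrative TaskLauncher wait loop: bail out if the task was killed
// while waiting, instead of sleeping until slots free up.
synchronized (numFreeSlots) {
  while (numFreeSlots.get() < task.getNumSlotsRequired()
         && !tip.wasKilled()) {      // wasKilled() is hypothetical here
    numFreeSlots.wait();             // the kill path must call notifyAll()
  }
}
if (tip.wasKilled()) {
  return; // drop the task; do not launch it or hold its slots
}
{code}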

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1398) TaskLauncher remains stuck on tasks waiting for free nodes even if task is killed.

2010-02-11 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-1398:
---

Attachment: patch-1398.txt

Patch fixing the bug. Added a testcase which fails without the patch and passes 
with the patch.

 TaskLauncher remains stuck on tasks waiting for free nodes even if task is 
 killed.
 --

 Key: MAPREDUCE-1398
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1398
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tasktracker
Reporter: Hemanth Yamijala
 Attachments: patch-1398.txt


 Tasks could be assigned to trackers for slots that are running other tasks in 
 a commit-pending state. This is an optimization done to pipeline task 
 assignment and launch. When the task reaches the tracker, it waits until 
 sufficient slots become free for it. This wait is done in the TaskLauncher 
 thread. Now, while waiting, if the task is killed externally (maybe because 
 the job finishes, etc.), the TaskLauncher is not notified of this. So, it 
 continues to wait for the killed task to get sufficient slots. If slots do 
 not become free for a long time, this results in a considerable delay in 
 waking up the TaskLauncher thread. If the waiting task happens to be a 
 high-RAM task, the wait is also wasteful, because removing it would make way 
 for normal tasks that can run on the slots already available.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1354) Refactor JobTracker.submitJob to not lock the JobTracker during the HDFS accesses

2010-02-11 Thread Amar Kamat (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832468#action_12832468
 ] 

Amar Kamat commented on MAPREDUCE-1354:
---

Job initialization (job.split localization) can also take a considerable 
amount of time. Hence we should avoid any getter calls into JobInProgress 
while initialization is in progress. The following are the other methods that 
first lock the JobTracker and then the JobInProgress, potentially locking up 
the JobTracker during job initialization:
- getMapTaskReports()
- getReduceTaskReports()
- getCleanupTaskReports()
- getSetupTaskReports()
- getTaskCompletionEvents()
- getTaskDiagnostics()
- setJobPriority()

 Refactor JobTracker.submitJob to not lock the JobTracker during the HDFS 
 accesses
 -

 Key: MAPREDUCE-1354
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1354
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Reporter: Devaraj Das
Assignee: Arun C Murthy
Priority: Critical
 Attachments: MAPREDUCE-1354_yhadoop20.patch


 It'd be nice to have the JobTracker object not be locked while accessing the 
 HDFS for reading the jobconf file and while writing the jobinfo file in the 
 submitJob method. We should see if we can avoid taking the lock altogether.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1474) forrest docs for archives is out of date.

2010-02-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832528#action_12832528
 ] 

Hadoop QA commented on MAPREDUCE-1474:
--

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12435397/MAPREDUCE-1474.patch
  against trunk revision 908321.

+1 @author.  The patch does not contain any @author tags.

+0 tests included.  The patch appears to be a documentation patch that 
doesn't require tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/313/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/313/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/313/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/313/console

This message is automatically generated.

 forrest docs for archives is out of date.
 -

 Key: MAPREDUCE-1474
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1474
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: documentation
Reporter: Mahadev konar
Assignee: Mahadev konar
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1474.patch


 The docs for archives are out of date. The new docs that were checked into 
 hadoop common were lost because of the project split.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1305) Running distcp with -delete incurs avoidable penalties

2010-02-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832541#action_12832541
 ] 

Hadoop QA commented on MAPREDUCE-1305:
--

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12435423/M1305-2.patch
  against trunk revision 908321.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/441/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/441/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/441/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/441/console

This message is automatically generated.

 Running distcp with -delete incurs avoidable penalties
 --

 Key: MAPREDUCE-1305
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1305
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distcp
Affects Versions: 0.20.1
Reporter: Peter Romianowski
Assignee: Peter Romianowski
 Attachments: M1305-1.patch, M1305-2.patch, MAPREDUCE-1305.patch


 *First problem*
 In org.apache.hadoop.tools.DistCp#deleteNonexisting we serialize FileStatus 
 objects when the path is all we need.
 The performance problem comes from 
 org.apache.hadoop.fs.RawLocalFileSystem.RawLocalFileStatus#write, which tries 
 to retrieve file permissions by issuing an ls -ld <path>, which is painfully 
 slow.
 Changed that to just serialize Path and not FileStatus.
 *Second problem*
 To delete the files we invoke the hadoop command line tool with the option 
 -rmr <path>, once for each file.
 Changed that to dstfs.delete(path, true), as sketched below.
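
Roughly, the second change amounts to the following (a sketch; the helper 
class is mine, while dstfs and path come from the surrounding DistCp code):

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: instead of forking "hadoop -rmr <path>" once per file (a JVM per
// deletion), delete directly through the destination FileSystem handle.
class DeleteHelper {
  static void deleteDest(FileSystem dstfs, Path path) throws IOException {
    dstfs.delete(path, true); // true = recursive, same semantics as -rmr
  }
}
{code}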

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1305) Running distcp with -delete incurs avoidable penalties

2010-02-11 Thread Peter Romianowski (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832575#action_12832575
 ] 

Peter Romianowski commented on MAPREDUCE-1305:
--

Thanks Chris for removing the calls to FsShell. I've been very busy lately, so 
I did not manage to compile the patch.

 Running distcp with -delete incurs avoidable penalties
 --

 Key: MAPREDUCE-1305
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1305
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distcp
Affects Versions: 0.20.1
Reporter: Peter Romianowski
Assignee: Peter Romianowski
 Attachments: M1305-1.patch, M1305-2.patch, MAPREDUCE-1305.patch


 *First problem*
 In org.apache.hadoop.tools.DistCp#deleteNonexisting we serialize FileStatus 
 objects when the path is all we need.
 The performance problem comes from 
 org.apache.hadoop.fs.RawLocalFileSystem.RawLocalFileStatus#write, which tries 
 to retrieve file permissions by issuing an ls -ld <path>, which is painfully 
 slow.
 Changed that to just serialize Path and not FileStatus.
 *Second problem*
 To delete the files we invoke the hadoop command line tool with the option 
 -rmr <path>, once for each file.
 Changed that to dstfs.delete(path, true)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1251) c++ utils doesn't compile

2010-02-11 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832581#action_12832581
 ] 

Todd Lipcon commented on MAPREDUCE-1251:


This should be committed to branch-0.20 as well, since it causes a failure to 
build from the release source on many systems.

 c++ utils doesn't compile
 -

 Key: MAPREDUCE-1251
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1251
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.1, 0.20.2, 0.21.0, 0.22.0
 Environment: ubuntu karmic 64-bit
Reporter: Eli Collins
Assignee: Eli Collins
 Attachments: HDFS-790-1.patch, HDFS-790.patch, MR-1251.patch


 c++ utils doesn't compile on ubuntu karmic 64-bit. The latest patch for 
 HADOOP-5611 needs to be applied first.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Reopened: (MAPREDUCE-1251) c++ utils doesn't compile

2010-02-11 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon reopened MAPREDUCE-1251:



 c++ utils doesn't compile
 -

 Key: MAPREDUCE-1251
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1251
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.1, 0.20.2, 0.21.0, 0.22.0
 Environment: ubuntu karmic 64-bit
Reporter: Eli Collins
Assignee: Eli Collins
 Attachments: HDFS-790-1.patch, HDFS-790.patch, MR-1251.patch


 c++ utils doesn't compile on ubuntu karmic 64-bit. The latest patch for 
 HADOOP-5611 needs to be applied first.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1305) Running distcp with -delete incurs avoidable penalties

2010-02-11 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated MAPREDUCE-1305:
--

Hadoop Flags: [Reviewed]

+1 patch looks good.

 Running distcp with -delete incurs avoidable penalties
 --

 Key: MAPREDUCE-1305
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1305
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: distcp
Affects Versions: 0.20.1
Reporter: Peter Romianowski
Assignee: Peter Romianowski
 Attachments: M1305-1.patch, M1305-2.patch, MAPREDUCE-1305.patch


 *First problem*
 In org.apache.hadoop.tools.DistCp#deleteNonexisting we serialize FileStatus 
 objects when the path is all we need.
 The performance problem comes from 
 org.apache.hadoop.fs.RawLocalFileSystem.RawLocalFileStatus#write, which tries 
 to retrieve file permissions by issuing an ls -ld <path>, which is painfully 
 slow.
 Changed that to just serialize Path and not FileStatus.
 *Second problem*
 To delete the files we invoke the hadoop command line tool with the option 
 -rmr <path>, once for each file.
 Changed that to dstfs.delete(path, true)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1484) Framework should not sort the input splits

2010-02-11 Thread Owen O'Malley (JIRA)
Framework should not sort the input splits
--

 Key: MAPREDUCE-1484
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1484
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Owen O'Malley


Currently the framework sorts the input splits by size before the job is 
submitted. This makes it very difficult to run map-only jobs that transform 
the input, because the assignment of input names to output names isn't 
obvious. We fixed this once in HADOOP-1440, but the fix was broken so it was 
rolled back.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1434) Dynamic add input for one job

2010-02-11 Thread Aaron Kimball (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832608#action_12832608
 ] 

Aaron Kimball commented on MAPREDUCE-1434:
--

Owen,

The {{getNewInputSplits}} method proposed above requires the InputFormat to 
maintain state containing the previously-enumerated InputSplits. The proposed 
command-line tools suggest independent user-side processes performing the 
addition of files to the job, making this challenging. Given that splits are 
calculated on the client, but the true list of input splits is held by the 
JobTracker (or is/could the splits file be written to HDFS?), calculating just 
the delta might be challenging.

I think it might be more reasonable if one of the following things were true:
* The client code just calls {{getInputSplits()}} again. The same algorithm is 
run as in initial job submission, but the output list may be longer than the 
previous list returned by this method. The InputFormat is responsible for 
ensuring that it doesn't return any fewer splits than it did before (i.e., 
don't drop inputs)
* For that matter, if the input queue for a job is dynamic, I don't see why 
this same mechanism couldn't be used to drop splits that are, for whatever 
reason, irrelevant.
* {{getNewInputSplits()}} should have the signature: {{InputSplit[] 
getNewInputSplits(JobContext job, List<InputSplit> existingSplits) throws 
IOException, InterruptedException}}.

The latter case would present to the user a list of the existing inputs read 
from the existing 'splits' file for the job. That way state-tracking is 
unnecessary; you can just use (e.g.) a PathFilter to disregard things already 
in {{existingSplits}}.

A final proposition is that users must manually specify new paths (or other 
arbitrary arguments like database table names, URLs, etc.) to include, in 
addition to the InputFormat. In that case, it might look more sane to have:
* {{getNewInputSplits()}} should have the signature: {{InputSplit [] 
getNewInputSplits(JobContext job, String... newSplitHints) throws IOException, 
InterruptedException}}.

The {{newSplitHints}} is effectively a user-specified argv; it can be decoded 
as a list of Paths, database tables, etc., and used appropriately by the 
InputFormat to generate new splits.
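
Put together, the proposed signatures might look like the interface below; 
this is purely a sketch of the suggestion, nothing here exists in Hadoop:

{code}
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;

// Hypothetical interface capturing the two alternative proposals above.
public interface DynamicInputFormat {
  // Variant 1: the framework passes back the splits already recorded in the
  // job's 'splits' file, so the InputFormat keeps no state of its own.
  InputSplit[] getNewInputSplits(JobContext job, List<InputSplit> existingSplits)
      throws IOException, InterruptedException;

  // Variant 2: the user supplies hints (paths, table names, URLs) via the
  // proposed "hadoop job -add-input" command line.
  InputSplit[] getNewInputSplits(JobContext job, String... newSplitHints)
      throws IOException, InterruptedException;
}
{code}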

Another question: what are the semantics of a doubly-specified split? 
(Especially curious about the inexact-match case, where the same file in HDFS 
is enumerated twice but the splits are at different offsets.) Can/should the 
same file be processed twice in a job?

Finally: why does a user-disconnect timeout kill the job? That's different 
from the usual case in MapReduce, where a user disconnect is not noticed by 
the server-side processes at all. I would think that a user-disconnect timeout 
should declare that all the input has been added and that the reduce phase can 
begin, not that it should kill things.

 Dynamic add input for one job
 -

 Key: MAPREDUCE-1434
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1434
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
 Environment: 0.19.0
Reporter: Xing Shi

 Today we must first upload the data to HDFS before we can analyze it with 
 Hadoop MapReduce.
 Sometimes the upload takes a long time, so if we could add input while a job 
 is running, that time could be saved.
 WHAT?
 Client:
 a) hadoop job -add-input jobId inputFormat ...
 Add the input to the given job.
 b) hadoop job -add-input done
 Tell the JobTracker that the input has been fully prepared.
 c) hadoop job -add-input status jobid
 Show how many inputs the job has.
 HOWTO?
 Mainly, I think we should do three things:
 1. JobClient: JobClient should support adding input to a job; it generates 
 the splits and submits them to the JobTracker.
 2. JobTracker: the JobTracker should support addInput and add the new tasks 
 to the original map tasks. Because the uploaded data will be processed 
 quickly, the scheduler should also support holding a map task pending until 
 the client declares the job's input done.
 3. Reducer: the reducer should also update the number of maps, so the 
 shuffle works correctly.
 This is the rough idea, and I will update it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1436) Deadlock in preemption code in fair scheduler

2010-02-11 Thread Matei Zaharia (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832623#action_12832623
 ] 

Matei Zaharia commented on MAPREDUCE-1436:
--

Are you suggesting that I add a JobTracker lock in update() or in the 
JobListener methods? I think it's best to add it in update() because it also 
gets called from a separate thread. This actually happens quite rarely now (it 
used to be every few seconds, but it's every 15 seconds after MAPREDUCE-706, 
and can be set higher pretty safely).
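
Concretely, something like this in the update thread (a sketch only; it 
mirrors the lock order of the JobTracker-to-scheduler call path, and the field 
names follow the FairScheduler source as I recall them):

{code}
// Sketch: take the JobTracker lock before the FairScheduler lock in the
// update thread, matching the order used when the JobTracker calls into
// the scheduler, so no lock-ordering cycle can form.
private class UpdateThread extends Thread {
  public void run() {
    while (running) {
      try {
        Thread.sleep(updateInterval);
        synchronized (taskTrackerManager) {   // the JobTracker
          synchronized (FairScheduler.this) {
            update();
          }
        }
      } catch (Exception e) {
        LOG.error("Exception in fair scheduler update thread", e);
      }
    }
  }
}
{code}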

BTW, I found another deadlock that seems to be much rarer (it happened when I 
was submitting about 50 jobs simultaneously) but is not related to preemption:

{code}

Found one Java-level deadlock:
=============================
IPC Server handler 24 on 9001:
  waiting to lock monitor 0x40c91750 (object 0x7fc0243e2c20, a 
org.apache.hadoop.mapred.JobTracker),
  which is held by IPC Server handler 0 on 9001
IPC Server handler 0 on 9001:
  waiting to lock monitor 0x40bc0770 (object 0x7fc0243e3080, a 
org.apache.hadoop.mapred.FairScheduler),
  which is held by FairScheduler update thread
FairScheduler update thread:
  waiting to lock monitor 0x4095dd98 (object 0x7fc0258bc0d0, a 
org.apache.hadoop.mapred.JobInProgress),
  which is held by IPC Server handler 0 on 9001

Java stack information for the threads listed above:
===================================================
IPC Server handler 24 on 9001:
at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:2487)
- waiting to lock 0x7fc0243e2c20 (a 
org.apache.hadoop.mapred.JobTracker)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
IPC Server handler 0 on 9001:
at org.apache.hadoop.mapred.JobTracker.finalizeJob(JobTracker.java:2115)
- waiting to lock 0x7fc0243e3080 (a 
org.apache.hadoop.mapred.FairScheduler)
- locked 0x7fc0243e3420 (a java.util.TreeMap)
- locked 0x7fc0243e2c20 (a org.apache.hadoop.mapred.JobTracker)
at 
org.apache.hadoop.mapred.JobInProgress.garbageCollect(JobInProgress.java:2510)
- locked 0x7fc0258bc0d0 (a org.apache.hadoop.mapred.JobInProgress)
at 
org.apache.hadoop.mapred.JobInProgress.jobComplete(JobInProgress.java:2146)
at 
org.apache.hadoop.mapred.JobInProgress.completedTask(JobInProgress.java:2084)
- locked 0x7fc0258bc0d0 (a org.apache.hadoop.mapred.JobInProgress)
at 
org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:883)
- locked 0x7fc0258bc0d0 (a org.apache.hadoop.mapred.JobInProgress)
at 
org.apache.hadoop.mapred.JobTracker.updateTaskStatuses(JobTracker.java:3564)
at 
org.apache.hadoop.mapred.JobTracker.processHeartbeat(JobTracker.java:2758)
- locked 0x7fc0243e2c20 (a org.apache.hadoop.mapred.JobTracker)
at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:2553)
- locked 0x7fc0243e2c20 (a org.apache.hadoop.mapred.JobTracker)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
FairScheduler update thread:
at 
org.apache.hadoop.mapred.JobInProgress.scheduleReduces(JobInProgress.java:1203)
- waiting to lock 0x7fc0258bc0d0 (a 
org.apache.hadoop.mapred.JobInProgress)
at 
org.apache.hadoop.mapred.JobSchedulable.updateDemand(JobSchedulable.java:53)
at 
org.apache.hadoop.mapred.PoolSchedulable.updateDemand(PoolSchedulable.java:81)
at org.apache.hadoop.mapred.FairScheduler.update(FairScheduler.java:577)
- locked 0x7fc0243e3080 (a org.apache.hadoop.mapred.FairScheduler)
at 
org.apache.hadoop.mapred.FairScheduler$UpdateThread.run(FairScheduler.java:277)
{code}

The problem in this 

[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-02-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832642#action_12832642
 ] 

Hadoop QA commented on MAPREDUCE-1309:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12435485/mapreduce-1309--2010-02-10.patch
  against trunk revision 908321.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 17 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated 1 warning messages.

-1 javac.  The applied patch generated 2219 javac compiler warnings (more 
than the trunk's current 2215 warnings).

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/442/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/442/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/442/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/442/console

This message is automatically generated.

 I want to change the rumen job trace generator to use a more modular internal 
 structure, to allow for more input log formats 
 -

 Key: MAPREDUCE-1309
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Dick King
Assignee: Dick King
 Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
 demuxer-plus-concatenated-files--2010-01-06.patch, 
 demuxer-plus-concatenated-files--2010-01-08-b.patch, 
 demuxer-plus-concatenated-files--2010-01-08-c.patch, 
 demuxer-plus-concatenated-files--2010-01-08-d.patch, 
 demuxer-plus-concatenated-files--2010-01-08.patch, 
 demuxer-plus-concatenated-files--2010-01-11.patch, 
 mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, 
 mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, 
 mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch


 There are two orthogonal questions to answer when processing a job tracker 
 log: how will the logs and the xml configuration files be packaged, and in 
 which release of hadoop map/reduce were the logs generated?  The existing 
 rumen only has a couple of answers to these questions.  The new engine will 
 handle three answers to the version question: 0.18, 0.20 and current, and two 
 answers to the packaging question: separate files with names derived from the 
 job ID, and concatenated files with a header between sections [used for 
 easier file interchange].

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1470) Move Delegation token into Common so that we can use it for MapReduce also

2010-02-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832645#action_12832645
 ] 

Hudson commented on MAPREDUCE-1470:
---

Integrated in Hadoop-Mapreduce-trunk #232 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/232/])


 Move Delegation token into Common so that we can use it for MapReduce also
 --

 Key: MAPREDUCE-1470
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1470
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Fix For: 0.22.0

 Attachments: mr-1470.patch


 We need to update one reference for map/reduce when we move the hdfs 
 delegation tokens.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1433) Create a Delegation token for MapReduce

2010-02-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832646#action_12832646
 ] 

Hudson commented on MAPREDUCE-1433:
---

Integrated in Hadoop-Mapreduce-trunk #232 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/232/])


 Create a Delegation token for MapReduce
 ---

 Key: MAPREDUCE-1433
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1433
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: security
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Fix For: 0.22.0

 Attachments: 1433.bp20.patch, 1433.bp20.patch, mr-1433.patch, 
 mr-1433.patch, mr-1433.patch, mr-1433.patch, mr-1433.patch


 Occasionally, MapReduce jobs need to launch other MapReduce jobs. With 
 security enabled, the task needs to authenticate to the JobTracker as the 
 user with a token.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1399) The archive command shows a null error message

2010-02-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832648#action_12832648
 ] 

Hudson commented on MAPREDUCE-1399:
---

Integrated in Hadoop-Mapreduce-trunk #232 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/232/])


 The archive command shows a null error message
 --

 Key: MAPREDUCE-1399
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1399
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: harchive
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Fix For: 0.22.0

 Attachments: m1399_20100204.patch, m1399_20100205.patch, 
 m1399_20100205trunk.patch, m1399_20100205trunk2.patch, 
 m1399_20100205trunk2_y0.20.patch, MAPREDUCE-1399.patch


 {noformat}
 bash-3.1$ hadoop archive -archiveName foo.har -p . foo .
 Exception in archives
 null
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1448) [Mumak] mumak.sh does not honor --config option.

2010-02-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832647#action_12832647
 ] 

Hudson commented on MAPREDUCE-1448:
---

Integrated in Hadoop-Mapreduce-trunk #232 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/232/])


 [Mumak] mumak.sh does not honor --config option.
 

 Key: MAPREDUCE-1448
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1448
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.21.0, 0.22.0
Reporter: Hong Tang
Assignee: Hong Tang
 Fix For: 0.21.0

 Attachments: mapred-1448-2.patch, mapred-1448.patch


 When --config is specified, mumak.sh should put the customized conf directory 
 in the classpath.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1425) archive throws OutOfMemoryError

2010-02-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832644#action_12832644
 ] 

Hudson commented on MAPREDUCE-1425:
---

Integrated in Hadoop-Mapreduce-trunk #232 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/232/])


 archive throws OutOfMemoryError
 ---

 Key: MAPREDUCE-1425
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1425
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: harchive
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Mahadev konar
 Fix For: 0.22.0

 Attachments: har.sh, m1425_20100129TextFileGenerator.patch, 
 MAPREDUCE-1425.patch, MAPREDUCE-1425.patch, MAPREDUCE-1425.patch, 
 MAPREDUCE-1425_y_0.20.patch


 {noformat}
 -bash-3.1$ hadoop  archive -archiveName t4.har -p . t4 .
 Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
 at java.util.regex.Pattern.compile(Pattern.java:1432)
 at java.util.regex.Pattern.<init>(Pattern.java:1133)
 at java.util.regex.Pattern.compile(Pattern.java:847)
 at java.lang.String.replace(String.java:2208)
 at org.apache.hadoop.fs.Path.normalizePath(Path.java:146)
 at org.apache.hadoop.fs.Path.initialize(Path.java:137)
 at org.apache.hadoop.fs.Path.<init>(Path.java:126)
 at org.apache.hadoop.fs.Path.makeQualified(Path.java:296)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.makeQualified(DistributedFileSystem.java:244)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:256)
 at 
 org.apache.hadoop.tools.HadoopArchives.archive(HadoopArchives.java:393)
 at org.apache.hadoop.tools.HadoopArchives.run(HadoopArchives.java:736)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
 at 
 org.apache.hadoop.tools.HadoopArchives.main(HadoopArchives.java:751)
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.




[jira] Commented: (MAPREDUCE-1320) StringBuffer -> StringBuilder occurrence

2010-02-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832654#action_12832654
 ] 

Hadoop QA commented on MAPREDUCE-1320:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12428677/MAPREDUCE-1320.patch
  against trunk revision 908321.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/443/console

This message is automatically generated.

 StringBuffer -> StringBuilder occurrence 
 

 Key: MAPREDUCE-1320
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1320
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.22.0
Reporter: Kay Kay
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1320.patch


 A good number of toString() implementations use StringBuffer when the 
 reference clearly does not go out of scope of the method and no concurrency 
 is needed. The patch replaces those occurrences of StringBuffer with 
 StringBuilder. 
 Created against the map/reduce project trunk. 
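
The change is mechanical; for a method-local buffer it is just the following 
(an illustrative toString() with made-up fields, not code from the patch):

{code}
public String toString() {
  // StringBuilder is the drop-in, unsynchronized replacement: the buffer
  // never escapes this method, so StringBuffer's locking is pure overhead.
  StringBuilder sb = new StringBuilder();
  sb.append("key=").append(key).append(", value=").append(value);
  return sb.toString();
}
{code}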

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1485) CapacityScheduler should prevent a single job from taking over large parts of a cluster

2010-02-11 Thread Arun C Murthy (JIRA)
CapacityScheduler should prevent a single job from taking over large parts of 
a cluster
---

 Key: MAPREDUCE-1485
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1485
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/capacity-sched
Reporter: Arun C Murthy
Assignee: Arun C Murthy
 Fix For: 0.22.0


The proposal is to have a per-queue limit on the number of concurrent tasks a 
job can run on a cluster. 

We've seen cases where a single, large job took over a majority of the 
cluster; worse, it meant that any bug in it caused issues for both the 
NameNode _and_ the JobTracker.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-323) Improve the way job history files are managed

2010-02-11 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832677#action_12832677
 ] 

Edward Capriolo commented on MAPREDUCE-323:
---

Being able to control the structure better is definitely a nice feature. 
Practically, dividing the job folders by mm/dd/yy would solve the immediate 
problem of having to clean up and restart your JobTracker when you hit the 
ext3 subdirectory limit; a sketch follows below. Introducing a variable into 
the jobtracker, mapred.jobhistory.maxjobhistory, plus a FIFO queue might be 
helpful as well. As things stand now, downtime and cleanup are needed to keep 
the JobTracker running well, which is less than optimal.
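
For illustration, a date-bucketed layout of the kind suggested might be built 
like this (the path scheme and helper are mine, not from any patch):

{code}
import java.text.SimpleDateFormat;
import java.util.Date;
import org.apache.hadoop.fs.Path;

// Illustrative only: bucket history files as <root>/MM/dd/yy/<jobid> so no
// single directory collects enough entries to hit the ext3 per-dir limit.
public class HistoryLayout {
  public static Path historyFilePath(Path root, String jobId, long submitTime) {
    String bucket = new SimpleDateFormat("MM/dd/yy").format(new Date(submitTime));
    return new Path(root, bucket + Path.SEPARATOR + jobId);
  }
}
{code}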

 Improve the way job history files are managed
 -

 Key: MAPREDUCE-323
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-323
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 0.21.0, 0.22.0
Reporter: Amar Kamat
Assignee: Amareshwari Sriramadasu
Priority: Critical

 Today all the jobhistory files are dumped in one _job-history_ folder. This 
 can cause problems when there is a need to search the history folder 
 (job-recovery etc.). It would be nice if we grouped all the jobs under a 
 _user_ folder. So all the jobs for user _amar_ would go in 
 _history-folder/amar/_. Jobs can be categorized using various features like 
 _jobid, date, jobname_ etc., but using _username_ will make the search much 
 more efficient and also will not result in a namespace explosion. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1431) archive does not work with distcp -update

2010-02-11 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832697#action_12832697
 ] 

Tsz Wo (Nicholas), SZE commented on MAPREDUCE-1431:
---

Took a closer look: HarFileSystem extends FilterFileSystem and uses the 
underlying file system to get the file checksum. That's why we got "Wrong FS": 
HarFileSystem passes a har:// path to the underlying fs.getFileChecksum(..). 
In our case, the underlying fs is HDFS.
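
The implied fix is for HarFileSystem to stop forwarding har:// paths to the 
underlying checksum call; a minimal sketch of such an override (not the 
committed change):

{code}
// Sketch: archived files have no checksum the underlying fs can compute, so
// return null (the FileSystem contract for "no checksum") instead of passing
// the har:// path down to fs.getFileChecksum() and hitting "Wrong FS".
@Override
public FileChecksum getFileChecksum(Path f) {
  return null;
}
{code}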


 archive does not work with distcp -update
 -

 Key: MAPREDUCE-1431
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1431
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: harchive
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Mahadev konar
 Fix For: 0.22.0


 The following distcp command  works.
 {noformat}
 hadoop distcp -Dmapred.job.queue.name=q 
 har://hdfs-nn_hostname:8020/user/tsz/t101.har/t101 t101_distcp
 {noformat}
 However, it does not work for -update.
 {noformat}
 -bash-3.1$ hadoop distcp -Dmapred.job.queue.name=q -update 
 har://hdfs-nn_hostname:8020/user/tsz/t101.har/t101 t101_distcp
 10/01/29 20:06:53 INFO tools.DistCp: 
 srcPaths=[har://hdfs-nn_hostname:8020/user/tsz/t101.har/t101]
 10/01/29 20:06:53 INFO tools.DistCp: destPath=t101
 java.lang.IllegalArgumentException: Wrong FS: 
 har://hdfs-nn_hostname:8020/user/tsz/t101.har/t101/text-, expected: 
 hdfs://nn_hostname
 at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:310)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.checkPath(DistributedFileSystem.java:99)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:155)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:463)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.getFileChecksum(DistributedFileSystem.java:46)
 at 
 org.apache.hadoop.fs.FilterFileSystem.getFileChecksum(FilterFileSystem.java:250)
 at org.apache.hadoop.tools.DistCp.sameFile(DistCp.java:1204)
 at org.apache.hadoop.tools.DistCp.setup(DistCp.java:1084)
 ...
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1341) Sqoop should have an option to create hive tables and skip the table import step

2010-02-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832705#action_12832705
 ] 

Hadoop QA commented on MAPREDUCE-1341:
--

+1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12435484/MAPREDUCE-1341.6.patch
  against trunk revision 908321.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 27 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/314/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/314/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/314/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/314/console

This message is automatically generated.

 Sqoop should have an option to create hive tables and skip the table import 
 step
 

 Key: MAPREDUCE-1341
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1341
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: contrib/sqoop
Affects Versions: 0.22.0
Reporter: Leonid Furman
Assignee: Leonid Furman
Priority: Minor
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1341.2.patch, MAPREDUCE-1341.3.patch, 
 MAPREDUCE-1341.4.patch, MAPREDUCE-1341.5.patch, MAPREDUCE-1341.6.patch, 
 MAPREDUCE-1341.patch


 In case the client only needs to create tables in Hive, it would be helpful 
 if Sqoop had an optional parameter:
 --hive-create-only
 which would omit the time-consuming table import step, generate the Hive 
 CREATE TABLE statements, and run them.
 If this feature seems useful, I can generate the patch. I have modified the 
 Sqoop code and built it on my development machine, and it seems to be working 
 well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1334) contrib/index - test - TestIndexUpdater fails due to an additional presence of file _SUCCESS in hdfs

2010-02-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832709#action_12832709
 ] 

Hadoop QA commented on MAPREDUCE-1334:
--

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12429081/MAPREDUCE-1334.patch
  against trunk revision 908321.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/444/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/444/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/444/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/444/console

This message is automatically generated.

 contrib/index - test - TestIndexUpdater fails due to an additional presence 
 of file _SUCCESS in hdfs 
 -

 Key: MAPREDUCE-1334
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1334
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/index
Reporter: Kay Kay
Priority: Critical
 Fix For: 0.21.0

 Attachments: MAPREDUCE-1334.patch


 $ cd src/contrib/index
 $ ant clean test 
 This fails the test TestIndexUpdater due to a mismatch in the - doneFileNames 
 - data structure, when it is being run with different parameters. 
 (ArrayIndexOutOfBoundsException raised when inserting elements in 
 doneFileNames, array ). 
 Debugging further - there seems to be an additional file called as - 
 hdfs://localhost:36021/myoutput/_SUCCESS , taken into consideration in 
 addition to those that begins with done* .  The presence of the extra file 
 causes the error. 
 Attaching a patch that would circumvent this by increasing the array length 
 of shards by 1 . 
 But longer term the test fixtures need to be probably revisited to see if the 
 presence of _SUCCESS as a file is a good thing to begin with before we even 
 get to this test case. 
 Any comments / suggestions on the same welcome. 
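
For the longer-term fixture cleanup, a filter like the sketch below would 
sidestep marker files entirely (the helper class is hypothetical):

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

// Sketch: list only the done* shard files, ignoring markers like _SUCCESS.
public class DoneFileLister {
  public static FileStatus[] listDoneFiles(FileSystem fs, Path outputDir)
      throws IOException {
    return fs.listStatus(outputDir, new PathFilter() {
      public boolean accept(Path p) {
        return p.getName().startsWith("done");
      }
    });
  }
}
{code}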

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (MAPREDUCE-1375) TestFileArgs fails intermittently

2010-02-11 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon reassigned MAPREDUCE-1375:
--

Assignee: Todd Lipcon

 TestFileArgs fails intermittently
 -

 Key: MAPREDUCE-1375
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1375
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Reporter: Amar Kamat
Assignee: Todd Lipcon
 Fix For: 0.22.0

 Attachments: TEST-org.apache.hadoop.streaming.TestFileArgs.txt


 TestFileArgs failed once for me with the following error
 {code}
 expected:<[job.jar
 sidefile
 tmp
 ]> but was:<[
 sidefile
 tmp
 ]>
 at 
 org.apache.hadoop.streaming.TestStreaming.checkOutput(TestStreaming.java:107)
 at 
 org.apache.hadoop.streaming.TestStreaming.testCommandLine(TestStreaming.java:123)
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1375) TestFileArgs fails intermittently

2010-02-11 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832751#action_12832751
 ] 

Todd Lipcon commented on MAPREDUCE-1375:


I think I got this figured out. The issue is that the test actually tries to 
write some "roses are red" text to ls's stdin. Very infrequently, the ls will 
actually complete before the data can be flushed, so the task gets a "Broken 
pipe" exception; see MAPREDUCE-1481. I'm actually unsure whether 
MAPREDUCE-1481 is a bug, but the easy fix for this test is to make the input 
empty so that no data gets written into ls's stdin.

I'm running the test in a loop with this fix now. If it keeps going for a 
couple of hours without failure I'll post a patch. (Before, this loop would 
usually fail after about 10 minutes.)

 TestFileArgs fails intermittently
 -

 Key: MAPREDUCE-1375
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1375
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Reporter: Amar Kamat
Assignee: Todd Lipcon
 Fix For: 0.22.0

 Attachments: TEST-org.apache.hadoop.streaming.TestFileArgs.txt


 TestFileArgs failed once for me with the following error
 {code}
 expected:<[job.jar
 sidefile
 tmp
 ]> but was:<[
 sidefile
 tmp
 ]>
 at 
 org.apache.hadoop.streaming.TestStreaming.checkOutput(TestStreaming.java:107)
 at 
 org.apache.hadoop.streaming.TestStreaming.testCommandLine(TestStreaming.java:123)
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1481) Streaming should swallow IOExceptions when closing clientOut

2010-02-11 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832753#action_12832753
 ] 

Todd Lipcon commented on MAPREDUCE-1481:


Actually, I think this is a bug but not quite how I described it. If the flush 
fails, it means we were trying to write data into a streaming executable that 
didn't consume all of its input.

I don't know what the expected behavior is here. Right now, the behavior is 
that we stop consuming its output, but the task still succeeds so long as the 
exit code is 0. I think this is incorrect. We should either entirely fail the 
task regardless of exit code, or we should consume the rest of its output.

 Streaming should swallow IOExceptions when closing clientOut
 

 Key: MAPREDUCE-1481
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1481
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/streaming
Affects Versions: 0.20.1, 0.21.0, 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon

 In PipeMapRed.mapRedFinished, streaming flushes and closes clientOut_, the 
 handle to the subprocess's stdin. If the subprocess has already exited or 
 closed its stdin, this will generate a "Broken pipe" IOException. This causes 
 us to skip waitOutputThreads, which is incorrect, since the subprocess may 
 still have data written to stdout that needs to be read.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1375) TestFileArgs fails intermittently

2010-02-11 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated MAPREDUCE-1375:
---

Attachment: mapreduce-1375.txt

I think this patch fixes the problem.

 TestFileArgs fails intermittently
 -

 Key: MAPREDUCE-1375
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1375
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Reporter: Amar Kamat
Assignee: Todd Lipcon
 Fix For: 0.22.0

 Attachments: mapreduce-1375.txt, 
 TEST-org.apache.hadoop.streaming.TestFileArgs.txt


 TestFileArgs failed once for me with the following error
 {code}
 expected:[job.jar
 sidefile
 tmp
 ] but was:[]
 at 
 org.apache.hadoop.streaming.TestStreaming.checkOutput(TestStreaming.java:107)
 at 
 org.apache.hadoop.streaming.TestStreaming.testCommandLine(TestStreaming.java:123)
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1375) TestFileArgs fails intermittently

2010-02-11 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated MAPREDUCE-1375:
---

Status: Patch Available  (was: Open)

 TestFileArgs fails intermittently
 -

 Key: MAPREDUCE-1375
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1375
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Reporter: Amar Kamat
Assignee: Todd Lipcon
 Fix For: 0.22.0

 Attachments: mapreduce-1375.txt, 
 TEST-org.apache.hadoop.streaming.TestFileArgs.txt


 TestFileArgs failed once for me with the following error
 {code}
 expected:[job.jar
 sidefile
 tmp
 ] but was:[]
 at 
 org.apache.hadoop.streaming.TestStreaming.checkOutput(TestStreaming.java:107)
 at 
 org.apache.hadoop.streaming.TestStreaming.testCommandLine(TestStreaming.java:123)
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1462) Enable context-specific and stateful serializers in MapReduce

2010-02-11 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated MAPREDUCE-1462:
-

Attachment: MAPREDUCE-1462-mr.patch
MAPREDUCE-1462-common.patch

In order to help understand the problem better I've created a demonstration 
patch that uses the SerializationContext-based user API, while retaining the 
Serialization code that exists in common. (In fact, I had to make some changes 
to the Serialization code so that it can retain its metadata in an instance 
variable.)

Here's what the configuration looks like for the user:

{code}
Schema keySchema = Schema.create(Schema.Type.STRING);
Schema valSchema = Schema.create(Schema.Type.LONG);
job.setSerialization(Job.SerializationContext.MAP_OUTPUT_KEY,
   new AvroGenericSerialization(keySchema));
job.setSerialization(Job.SerializationContext.MAP_OUTPUT_VALUE,
   new AvroGenericSerialization(valSchema));
{code}
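
For comparison, the framework side could consume these context-specific 
serializations roughly as follows; the getSerialization() lookup is my 
assumption of a getter symmetrical to setSerialization(), not necessarily 
what the patch does:

{code}
// Look up the serialization registered for map output keys and use the
// standard Serializer lifecycle (open/serialize/close) with it.
// mapOutputStream is any OutputStream; key is the map output key object.
Serialization<Object> serialization = (Serialization<Object>)
    job.getSerialization(Job.SerializationContext.MAP_OUTPUT_KEY);
Serializer<Object> serializer = serialization.getSerializer(Object.class);
serializer.open(mapOutputStream);
serializer.serialize(key);
serializer.close();
{code}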

 Enable context-specific and stateful serializers in MapReduce
 -

 Key: MAPREDUCE-1462
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1462
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: task
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: h-1462.patch, MAPREDUCE-1462-common.patch, 
 MAPREDUCE-1462-mr.patch


 Although the current serializer framework is powerful, within the context of 
 a job it is limited to picking a single serializer for a given class. 
 Additionally, Avro generic serialization can make use of additional 
 configuration/state such as the schema. (Most other serialization frameworks 
 including Writable, Jute/Record IO, Thrift, Avro Specific, and Protocol 
 Buffers only need the object's class name to deserialize the object.)
 With the goal of keeping the easy things easy and maintaining backwards 
 compatibility, we should be able to allow applications to use context 
 specific (eg. map output key) serializers in addition to the current type 
 based ones that handle the majority of the cases. Furthermore, we should be 
 able to support serializer-specific configuration/metadata in a type-safe 
 manner without cluttering up the base API with a lot of new methods that will 
 confuse new users.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-434) local map-reduce job limited to single reducer

2010-02-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832781#action_12832781
 ] 

Hadoop QA commented on MAPREDUCE-434:
-

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12435513/MAPREDUCE-434.5.patch
  against trunk revision 908321.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/315/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/315/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/315/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/315/console

This message is automatically generated.

 local map-reduce job limited to single reducer
 --

 Key: MAPREDUCE-434
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-434
 Project: Hadoop Map/Reduce
  Issue Type: Bug
 Environment: local job tracker
Reporter: Yoram Arnon
Assignee: Aaron Kimball
Priority: Minor
 Attachments: MAPREDUCE-434.2.patch, MAPREDUCE-434.3.patch, 
 MAPREDUCE-434.4.patch, MAPREDUCE-434.5.patch, MAPREDUCE-434.patch


 when mapred.job.tracker is set to 'local', my setNumReduceTasks call is 
 ignored, and the number of reduce tasks is set at 1.
 This prevents me from locally debugging my partition function, which tries to 
 partition based on the number of reduce tasks.
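
 For illustration, the symptom reduces to the following sketch (the 
 partitioner class is hypothetical):
 {code}
 // Under mapred.job.tracker=local the requested reducer count was
 // silently clamped to 1, so a partitioner that depends on the number
 // of reduces could not be debugged locally.
 JobConf conf = new JobConf();
 conf.set("mapred.job.tracker", "local");
 conf.setNumReduceTasks(4);                      // was ignored: forced to 1
 conf.setPartitionerClass(MyPartitioner.class);  // hypothetical class
 {code}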

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-326) The lowest level map-reduce APIs should be byte oriented

2010-02-11 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated MAPREDUCE-326:


Attachment: MAPREDUCE-326.pdf

Here's a proposal for a binary API for review.

 The lowest level map-reduce APIs should be byte oriented
 

 Key: MAPREDUCE-326
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-326
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: eric baldeschwieler
 Attachments: MAPREDUCE-326.pdf


 As discussed here:
 https://issues.apache.org/jira/browse/HADOOP-1986#action_12551237
 The templates, serializers and other complexities that allow map-reduce to 
 use arbitrary types complicate the design and lead to lots of object creation 
 and other overhead that a byte-oriented design would not suffer.  I believe 
 the lowest-level implementation of hadoop map-reduce should have byte-string 
 oriented APIs (for keys and values).  This API would be more performant, 
 simpler, and more easily made cross-language.
 The existing API could be maintained as a thin layer on top of the leaner API.
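
 To make the idea concrete, one purely illustrative shape for a byte-oriented 
 mapper interface (my sketch; the attached proposal may differ):
 {code}
 // Keys and values arrive as byte ranges: no serialization framework is
 // imposed at this level and no per-record objects need to be created.
 // RawKeyValueOutput is a hypothetical byte-oriented collector.
 public interface RawMapper {
   void map(byte[] key, int keyOffset, int keyLength,
            byte[] value, int valueOffset, int valueLength,
            RawKeyValueOutput output) throws IOException;
 }
 {code}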

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1480) CombineFileRecordReader does not properly initialize child RecordReader

2010-02-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832784#action_12832784
 ] 

Hadoop QA commented on MAPREDUCE-1480:
--

+1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12435529/MAPREDUCE-1480.2.patch
  against trunk revision 908321.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/445/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/445/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/445/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/445/console

This message is automatically generated.

 CombineFileRecordReader does not properly initialize child RecordReader
 ---

 Key: MAPREDUCE-1480
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1480
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Aaron Kimball
Assignee: Aaron Kimball
 Attachments: MAPREDUCE-1480.2.patch, MAPREDUCE-1480.patch


 CombineFileRecordReader instantiates child RecordReader instances but never 
 calls their initialize() method to give them the proper TaskAttemptContext.
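
 A sketch of the missing step (names follow CombineFileRecordReader's fields 
 but this is illustrative; see the attached patch for the actual change):
 {code}
 // After constructing the child reader for the current chunk of the
 // CombineFileSplit, hand it the task context before any records are read:
 curReader = rrConstructor.newInstance(split, context, Integer.valueOf(idx));
 curReader.initialize(split, context);  // the call that was being skipped
 {code}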

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1486) Configuration data should be preserved within the same MapTask

2010-02-11 Thread Aaron Kimball (JIRA)
Configuration data should be preserved within the same MapTask
--

 Key: MAPREDUCE-1486
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1486
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task
Reporter: Aaron Kimball
Assignee: Aaron Kimball


Map tasks involve a number of Contexts -- at least a TaskAttemptContext and a 
MapContext. These context objects contain a Configuration each; when one 
context is initialized, it initializes its own Configuration by deep-copying a 
previous Configuration.

If one Context instance is used entirely prior to a second, more specific 
Context, then the second Context should contain the configuration data 
initialized in the previous Context. This specifically affects the interaction 
between an InputFormat and its RecordReader instance(s).
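
For illustration, the kind of interaction that breaks (the InputFormat, 
reader, and key name below are hypothetical):

{code}
// A value set through the InputFormat's context was lost because the more
// specific context created later deep-copied an older Configuration.
public RecordReader<Text, Text> createRecordReader(InputSplit split,
    TaskAttemptContext context) {
  context.getConfiguration().set("example.flag", "set-by-inputformat");
  // Before the fix, the MapContext later handed to the reader's
  // initialize() no longer contained "example.flag".
  return new MyRecordReader();  // hypothetical reader
}
{code}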


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1486) Configuration data should be preserved within the same MapTask

2010-02-11 Thread Aaron Kimball (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Kimball updated MAPREDUCE-1486:
-

Attachment: MAPREDUCE-1486.patch

Attaching patch which fixes this problem; now the same configuration data will 
flow forward through the map task. This patch also contains a test case that 
highlights the problem.

 Configuration data should be preserved within the same MapTask
 --

 Key: MAPREDUCE-1486
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1486
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task
Reporter: Aaron Kimball
Assignee: Aaron Kimball
 Attachments: MAPREDUCE-1486.patch


 Map tasks involve a number of Contexts -- at least a TaskAttemptContext and a 
 MapContext. These context objects contain a Configuration each; when one 
 context is initialized, it initializes its own Configuration by deep-copying 
 a previous Configuration.
 If one Context instance is used entirely prior to a second, more specific 
 Context, then the second Context should contain the configuration data 
 initialized in the previous Context. This specifically affects the 
 interaction between an InputFormat and its RecordReader instance(s).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1486) Configuration data should be preserved within the same MapTask

2010-02-11 Thread Aaron Kimball (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Kimball updated MAPREDUCE-1486:
-

Status: Patch Available  (was: Open)

 Configuration data should be preserved within the same MapTask
 --

 Key: MAPREDUCE-1486
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1486
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task
Reporter: Aaron Kimball
Assignee: Aaron Kimball
 Attachments: MAPREDUCE-1486.patch


 Map tasks involve a number of Contexts -- at least a TaskAttemptContext and a 
 MapContext. These context objects contain a Configuration each; when one 
 context is initialized, it initializes its own Configuration by deep-copying 
 a previous Configuration.
 If one Context instance is used entirely prior to a second, more specific 
 Context, then the second Context should contain the configuration data 
 initialized in the previous Context. This specifically affects the 
 interaction between an InputFormat and its RecordReader instance(s).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1341) Sqoop should have an option to create hive tables and skip the table import step

2010-02-11 Thread Aaron Kimball (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832797#action_12832797
 ] 

Aaron Kimball commented on MAPREDUCE-1341:
--

+1; patch #6 looks good to me. If someone could commit this, that'd be superb.


 Sqoop should have an option to create hive tables and skip the table import 
 step
 

 Key: MAPREDUCE-1341
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1341
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: contrib/sqoop
Affects Versions: 0.22.0
Reporter: Leonid Furman
Assignee: Leonid Furman
Priority: Minor
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1341.2.patch, MAPREDUCE-1341.3.patch, 
 MAPREDUCE-1341.4.patch, MAPREDUCE-1341.5.patch, MAPREDUCE-1341.6.patch, 
 MAPREDUCE-1341.patch


 In case the client only needs to create tables in hive, it would be helpful 
 if Sqoop had an optional parameter:
 --hive-create-only
 which would omit the time-consuming table import step, generate Hive create 
 table statements and run them.
 If this feature seems useful, I can generate the patch. I have modified the 
 Sqoop code and built it on my development machine, and it seems to be working 
 well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1341) Sqoop should have an option to create hive tables and skip the table import step

2010-02-11 Thread Leonid Furman (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832803#action_12832803
 ] 

Leonid Furman commented on MAPREDUCE-1341:
--

Thanks, Aaron!

 Sqoop should have an option to create hive tables and skip the table import 
 step
 

 Key: MAPREDUCE-1341
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1341
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: contrib/sqoop
Affects Versions: 0.22.0
Reporter: Leonid Furman
Assignee: Leonid Furman
Priority: Minor
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1341.2.patch, MAPREDUCE-1341.3.patch, 
 MAPREDUCE-1341.4.patch, MAPREDUCE-1341.5.patch, MAPREDUCE-1341.6.patch, 
 MAPREDUCE-1341.patch


 In case the client only needs to create tables in hive, it would be helpful 
 if Sqoop had an optional parameter:
 --hive-create-only
 which would omit the time-consuming table import step, generate Hive create 
 table statements and run them.
 If this feature seems useful, I can generate the patch. I have modified the 
 Sqoop code and built it on my development machine, and it seems to be working 
 well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-326) The lowest level map-reduce APIs should be byte oriented

2010-02-11 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated MAPREDUCE-326:


Attachment: MAPREDUCE-326-api.patch

And an accompanying draft patch for the raw API classes.

 The lowest level map-reduce APIs should be byte oriented
 

 Key: MAPREDUCE-326
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-326
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: eric baldeschwieler
 Attachments: MAPREDUCE-326-api.patch, MAPREDUCE-326.pdf


 As discussed here:
 https://issues.apache.org/jira/browse/HADOOP-1986#action_12551237
 The templates, serializers and other complexities that allow map-reduce to 
 use arbitrary types complicate the design and lead to lots of object creation 
 and other overhead that a byte-oriented design would not suffer.  I believe 
 the lowest-level implementation of hadoop map-reduce should have byte-string 
 oriented APIs (for keys and values).  This API would be more performant, 
 simpler, and more easily made cross-language.
 The existing API could be maintained as a thin layer on top of the leaner API.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1220) Implement an in-cluster LocalJobRunner

2010-02-11 Thread Tom White (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832838#action_12832838
 ] 

Tom White commented on MAPREDUCE-1220:
--

bq. Most of the effort involved teasing out the framework in the MapTask and 
ReduceTask to allow several components such as MapOutputBuffer, 
ReduceValuesIterator etc. to be used as 'pluggable' components.

Interesting. MAPREDUCE-326 has a proposal for making these components 
pluggable, which might make the work of this JIRA simpler.

 Implement an in-cluster LocalJobRunner
 --

 Key: MAPREDUCE-1220
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1220
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: client, jobtracker
Reporter: Arun C Murthy
Assignee: Arun C Murthy
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1220_yhadoop20.patch


 Currently very small map-reduce jobs suffer from latency issues due to 
 overheads in Hadoop Map-Reduce such as scheduling, jvm startup etc. We've 
 periodically tried to optimize all parts of the framework to achieve lower 
 latencies.
 I'd like to turn the problem around a little bit. I propose we allow very 
 small jobs to run as a single task job with multiple maps and reduces i.e. 
 similar to our current implementation of the LocalJobRunner. Thus, under 
 certain conditions (maybe user-set configuration, or if input data is small 
 i.e. less than a DFS blocksize) we could launch a special task which will run 
 all 
 maps in a serial manner, followed by the reduces. This would really help 
 small jobs achieve significantly smaller latencies, thanks to reduced 
 scheduling overhead and jvm startup cost, and no shuffle over the network. 
 This would be a huge benefit, especially on large clusters, to small Hive/Pig 
 queries.
 Thoughts?
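
 As a concrete illustration of such a trigger condition (the property name 
 and variables below are my assumptions, not part of the proposal):
 {code}
 // Run the whole job as a single task when the user opts in or the job's
 // total input is smaller than one DFS block.
 boolean runAsSingleTask =
     conf.getBoolean("mapred.job.single.task.mode", false)
     || totalInputBytes < fs.getDefaultBlockSize();
 {code}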

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1341) Sqoop should have an option to create hive tables and skip the table import step

2010-02-11 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated MAPREDUCE-1341:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

I've just committed this. Thanks Leonid!

 Sqoop should have an option to create hive tables and skip the table import 
 step
 

 Key: MAPREDUCE-1341
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1341
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: contrib/sqoop
Affects Versions: 0.22.0
Reporter: Leonid Furman
Assignee: Leonid Furman
Priority: Minor
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1341.2.patch, MAPREDUCE-1341.3.patch, 
 MAPREDUCE-1341.4.patch, MAPREDUCE-1341.5.patch, MAPREDUCE-1341.6.patch, 
 MAPREDUCE-1341.patch


 In case the client only needs to create tables in hive, it would be helpful 
 if Sqoop had an optional parameter:
 --hive-create-only
 which would omit the time-consuming table import step, generate Hive create 
 table statements and run them.
 If this feature seems useful, I can generate the patch. I have modified the 
 Sqoop code and built it on my development machine, and it seems to be working 
 well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1469) Sqoop should disable speculative execution in export

2010-02-11 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated MAPREDUCE-1469:
-

   Resolution: Fixed
Fix Version/s: 0.22.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

+1

I've just committed this. Thanks Aaron!

 Sqoop should disable speculative execution in export
 

 Key: MAPREDUCE-1469
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1469
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/sqoop
Reporter: Aaron Kimball
Assignee: Aaron Kimball
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1469.patch


 Concurrent writers of the same output shard may cause the database to try to 
 insert duplicate primary keys concurrently. Not a good situation. Speculative 
 execution should be forced off for this operation.
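
 The guard itself amounts to two calls on the export job's configuration 
 (a sketch of the idea; the attached patch has the actual change):
 {code}
 // Force speculative execution off so two attempts of the same output
 // shard can never race to insert the same primary keys.
 job.setMapSpeculativeExecution(false);
 job.setReduceSpeculativeExecution(false);
 {code}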

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1476) committer.needsTaskCommit should not be called for a task cleanup attempt

2010-02-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832845#action_12832845
 ] 

Hadoop QA commented on MAPREDUCE-1476:
--

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12435549/patch-1476.txt
  against trunk revision 908321.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/316/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/316/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/316/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/316/console

This message is automatically generated.

 committer.needsTaskCommit should not be called for a task cleanup attempt
 -

 Key: MAPREDUCE-1476
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1476
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.1
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Fix For: 0.22.0

 Attachments: patch-1476.txt


 Currently, Task.done() calls committer.needsTaskCommit() to know whether it 
 needs a commit or not. This need not be called for a task cleanup attempt, as 
 no commit is required for a cleanup attempt. 
 Due to MAPREDUCE-1409, we saw a case where a cleanup attempt went into 
 COMMIT_PENDING state.
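
 A sketch of the shape such a fix could take in Task.done(), reusing existing 
 Task names (illustrative; see the attached patch for the real change):
 {code}
 // A cleanup attempt never commits output, so skip the committer query
 // entirely and never enter COMMIT_PENDING for it.
 boolean commitRequired = !isTaskCleanupTask()
     && committer.needsTaskCommit(taskContext);
 {code}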

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.