[jira] Commented: (MAPREDUCE-809) Job summary logs show status of completed jobs as RUNNING

2009-07-27 Thread Suman Sehgal (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735956#action_12735956
 ] 

Suman Sehgal commented on MAPREDUCE-809:


I also saw the issue with successful jobs but couldn't reproduce it there. The 
issue is quite consistent for "failed" and "killed" jobs: the Jobtracker log 
shows the status "RUNNING" for these jobs.

Log Message:
==
2009-07-27 05:46:14,276 INFO org.apache.hadoop.mapred.JobInProgress$JobSummary: jobId=job_200907270543_0003,submitTime=1248673540705,launchTime=1248673544024,finishTime=0,numMaps=2,numSlotsPerMap=1,numReduces=1,numSlotsPerReduce=1,user=hadoopqa,queue=default,status=RUNNING,mapSlotSeconds=38,reduceSlotsSeconds=0,clusterMapCapacity=102,clusterReduceCapacity=34
2009-07-27 05:46:14,277 INFO org.apache.hadoop.mapred.JobHistory: Moving completed job from file:/mapred/history/_1248673437715_job_200907270543_0003_hadoopqa_streamjob5894288556860737357.jar to file:/mapred/history/done/_1248673437715_job_200907270543_0003_hadoopqa_streamjob5894288556860737357.jar
2009-07-27 05:46:14,278 INFO org.apache.hadoop.mapred.JobHistory: Moving configuration of completed job from file:/_1248673437715_job_200907270543_0003_conf.xml to file:/mapred/history/done/_1248673437715_job_200907270543_0003_conf.xml
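
To check other Jobtracker logs for the same symptom, a small parser for the summary payload may help. The following is only a sketch: the class and method names are hypothetical and not part of Hadoop, and it assumes the payload is a flat comma-separated list of key=value pairs whose values contain no commas, as in the log line above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of Hadoop: inspects a JobSummary log payload. */
class JobSummaryCheck {

    /** Splits "k1=v1,k2=v2,..." into an ordered map. Assumes values contain no commas. */
    static Map<String, String> parse(String summary) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String pair : summary.split(",")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                fields.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return fields;
    }

    /** A summary written at job completion should not still say RUNNING with finishTime=0. */
    static boolean looksStale(Map<String, String> fields) {
        return "RUNNING".equals(fields.get("status"))
                && "0".equals(fields.get("finishTime"));
    }

    public static void main(String[] args) {
        // Shortened copy of the payload from the log message above.
        String line = "jobId=job_200907270543_0003,submitTime=1248673540705,"
                + "launchTime=1248673544024,finishTime=0,status=RUNNING";
        Map<String, String> fields = parse(line);
        System.out.println(fields.get("jobId") + " stale=" + looksStale(fields));
    }
}
```

The check is a heuristic for spotting affected entries in a log; it does not say anything about the cause.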


> Job summary logs show status of completed jobs as RUNNING 
> --
>
> Key: MAPREDUCE-809
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-809
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.21.0
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
> Fix For: 0.21.0
>
>
> MAPREDUCE-740 added job summary logs. During testing our QA folks noticed 
> that completed jobs show up as RUNNING in the logs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-803) Provide a command line option to clean up jobtracker system directory

2009-07-27 Thread Amar Kamat (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735953#action_12735953
 ] 

Amar Kamat commented on MAPREDUCE-803:
--

Hemanth, we already have something like this today. When the jobtracker is 
restarted with mapred.jobtracker.restart.recover=false, the jobtracker will 
format the mapred.system.dir.

> Provide a command line option to clean up jobtracker system directory
> -
>
> Key: MAPREDUCE-803
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-803
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobtracker
>Reporter: Hemanth Yamijala
>
> When the JT is restarted, the mapreduce system directory's contents are used 
> for job recovery. For sites that use this feature, there might be instances 
> when we don't want the restart to read the mapred system directory. A sample 
> use case is a full cluster restart with a (typically minor) version upgrade 
> of the Map/Reduce code base. To easily support such cases, it would be nice 
> to provide a way to clean up the jobtracker system directory so that no 
> files will be available for recovery.




[jira] Commented: (MAPREDUCE-768) Configuration information should generate dump in a standard format.

2009-07-27 Thread rahul k singh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735949#action_12735949
 ] 

rahul k singh commented on MAPREDUCE-768:
-

Proposal:

- Generate a configuration dump in JSON format.
- The dump consists of the key, value, and final flag of each property.

Good to have:

- Information about the resource or filename that a given value came from,
  or a marker that the source is unknown.
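
As an illustration of what such a dump could look like, here is a minimal, self-contained sketch. It does not use Hadoop's Configuration class; the class name, the JSON field names ("key", "value", "isFinal", "resource"), the "properties" wrapper, and the sample property values are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a JSON configuration dump; not Hadoop's Configuration API. */
class ConfigDump {

    /** One configuration entry: key, value, final flag, and the resource it came from. */
    static final class Entry {
        final String key, value, resource;
        final boolean isFinal;

        Entry(String key, String value, boolean isFinal, String resource) {
            this.key = key;
            this.value = value;
            this.isFinal = isFinal;
            // Mark the origin as "Unknown" when the resource is not tracked.
            this.resource = (resource == null) ? "Unknown" : resource;
        }
    }

    /** Escapes quotes, backslashes, and control characters for a JSON string. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    /** Serializes the entries as {"properties":[{...},{...}]}. */
    static String toJson(List<Entry> entries) {
        StringBuilder sb = new StringBuilder("{\"properties\":[");
        for (int i = 0; i < entries.size(); i++) {
            Entry e = entries.get(i);
            if (i > 0) sb.append(',');
            sb.append("{\"key\":").append(quote(e.key))
              .append(",\"value\":").append(quote(e.value))
              .append(",\"isFinal\":").append(e.isFinal)
              .append(",\"resource\":").append(quote(e.resource))
              .append('}');
        }
        return sb.append("]}").toString();
    }

    public static void main(String[] args) {
        List<Entry> entries = new ArrayList<>();
        // Sample values are illustrative only.
        entries.add(new Entry("mapred.system.dir", "/mapred/system", true, "mapred-site.xml"));
        entries.add(new Entry("example.key", "line1\nline2", false, null));
        System.out.println(toJson(entries));
    }
}
```

Escaping is done by hand here only to keep the sketch dependency-free; a real implementation would more likely use a JSON library.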

> Configuration information should generate dump in a standard format.
> 
>
> Key: MAPREDUCE-768
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-768
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: rahul k singh
>
> We need to generate the configuration dump in a standard format.




[jira] Updated: (MAPREDUCE-805) Deadlock in Jobtracker

2009-07-27 Thread Amar Kamat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat updated MAPREDUCE-805:
-

Attachment: MAPREDUCE-805-v1.2.patch

Attaching a patch with updated javadoc and some minor fixes. Result of 
test-patch
 [exec] +1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 12 new or modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.

Running ant-test now

> Deadlock in Jobtracker
> --
>
> Key: MAPREDUCE-805
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-805
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Michael Tamm
> Attachments: MAPREDUCE-805-v1.1.patch, MAPREDUCE-805-v1.2.patch
>
>
> We are running a hadoop cluster (version 0.20.0) and have detected the 
> following deadlock on our jobtracker:
> {code}
> "IPC Server handler 51 on 9001":
>   at org.apache.hadoop.mapred.JobInProgress.getCounters(JobInProgress.java:943)
>   - waiting to lock <0x7f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
>   at org.apache.hadoop.mapred.JobTracker.getJobCounters(JobTracker.java:3102)
>   - locked <0x7f2b5f026000> (a org.apache.hadoop.mapred.JobTracker)
>   at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
> "pool-1-thread-2":
>   at org.apache.hadoop.mapred.JobTracker.finalizeJob(JobTracker.java:2017)
>   - waiting to lock <0x7f2b5f026000> (a org.apache.hadoop.mapred.JobTracker)
>   at org.apache.hadoop.mapred.JobInProgress.garbageCollect(JobInProgress.java:2483)
>   - locked <0x7f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
>   at org.apache.hadoop.mapred.JobInProgress.terminateJob(JobInProgress.java:2152)
>   - locked <0x7f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
>   at org.apache.hadoop.mapred.JobInProgress.terminate(JobInProgress.java:2169)
>   - locked <0x7f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
>   at org.apache.hadoop.mapred.JobInProgress.fail(JobInProgress.java:2245)
>   - locked <0x7f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
>   at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:86)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:619)
> {code}




[jira] Updated: (MAPREDUCE-491) RAgzip: multiple map tasks for a large gzipped file

2009-07-27 Thread Daehyun Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daehyun Kim updated MAPREDUCE-491:
--

Attachment: (was: MAPREDUCE-491.patch)

> RAgzip: multiple map tasks for a large gzipped file
> ---
>
> Key: MAPREDUCE-491
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-491
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Daehyun Kim
>Assignee: Daehyun Kim
>Priority: Minor
> Attachments: HADOOP-4652-v2.patch, HADOOP-4652-v3.patch, 
> HADOOP-4652.path, MAPREDUCE-491.patch
>
>
> Currently, Hadoop processes a gzipped file with only one map task.
> We have made a patch, which we call RAgzip, that enables multiple map tasks 
> for one large gzipped file.
> To process a gzipped file with multiple map tasks, just change the 
> InputFormat to RAGZIPInputFormat.
> The options used by RAGZIPInputFormat are described in its javadoc.
> RAgzip uses zlib's inflatePrime function, which supports random access on a 
> gzipped file. 
> Since inflatePrime is available from zlib 1.2.2.4 onward, RAgzip requires 
> zlib 1.2.2.4 or higher. (We tested on zlib 1.2.3.)
> RAgzip requires a preprocessing step that creates an access point (.ap) 
> file, which acts as an index of the gzipped file's chunks. 
> The access point (.ap) file is located in the same directory as the gzipped 
> file. For "/user/hadoop/test.gz", the .ap file is created as 
> "/user/hadoop/test.gz.ap".




[jira] Updated: (MAPREDUCE-491) RAgzip: multiple map tasks for a large gzipped file

2009-07-27 Thread Daehyun Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daehyun Kim updated MAPREDUCE-491:
--

Attachment: MAPREDUCE-491.patch

> RAgzip: multiple map tasks for a large gzipped file
> ---
>
> Key: MAPREDUCE-491
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-491
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Daehyun Kim
>Assignee: Daehyun Kim
>Priority: Minor
> Attachments: HADOOP-4652-v2.patch, HADOOP-4652-v3.patch, 
> HADOOP-4652.path, MAPREDUCE-491.patch
>
>
> Currently, Hadoop processes a gzipped file with only one map task.
> We have made a patch, which we call RAgzip, that enables multiple map tasks 
> for one large gzipped file.
> To process a gzipped file with multiple map tasks, just change the 
> InputFormat to RAGZIPInputFormat.
> The options used by RAGZIPInputFormat are described in its javadoc.
> RAgzip uses zlib's inflatePrime function, which supports random access on a 
> gzipped file. 
> Since inflatePrime is available from zlib 1.2.2.4 onward, RAgzip requires 
> zlib 1.2.2.4 or higher. (We tested on zlib 1.2.3.)
> RAgzip requires a preprocessing step that creates an access point (.ap) 
> file, which acts as an index of the gzipped file's chunks. 
> The access point (.ap) file is located in the same directory as the gzipped 
> file. For "/user/hadoop/test.gz", the .ap file is created as 
> "/user/hadoop/test.gz.ap".




[jira] Commented: (MAPREDUCE-548) Global scheduling in the Fair Scheduler

2009-07-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735925#action_12735925
 ] 

Hadoop QA commented on MAPREDUCE-548:
-

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12414243/mapreduce-548-v4.patch
  against trunk revision 798239.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/426/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/426/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/426/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/426/console

This message is automatically generated.

> Global scheduling in the Fair Scheduler
> ---
>
> Key: MAPREDUCE-548
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-548
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Matei Zaharia
>Assignee: Matei Zaharia
> Fix For: 0.21.0
>
> Attachments: fs-global-v0.patch, hadoop-4667-v1.patch, 
> hadoop-4667-v1b.patch, hadoop-4667-v2.patch, HADOOP-4667_api.patch, 
> mapreduce-548-v1.patch, mapreduce-548-v2.patch, mapreduce-548-v3.patch, 
> mapreduce-548-v4.patch, mapreduce-548.patch
>
>
> The current schedulers in Hadoop all examine a single job on every heartbeat 
> when choosing which tasks to assign, choosing the job based on FIFO or fair 
> sharing. There are inherent limitations to this approach. For example, if the 
> job at the front of the queue is small (e.g. 10 maps, in a cluster of 100 
> nodes), then on average it will launch only one local map on the first 10 
> heartbeats while it is at the head of the queue. This leads to very poor 
> locality for small jobs. Instead, we need a more "global" view of scheduling 
> that can look at multiple jobs. To resolve the locality problem, we will use 
> the following algorithm:
> - If the job at the head of the queue has no node-local task to launch, skip 
> it and look through other jobs.
> - If a job has waited at least T1 seconds while being skipped, also allow it 
> to launch rack-local tasks.
> - If a job has waited at least T2 > T1 seconds, also allow it to launch 
> off-rack tasks.
> This algorithm improves locality while bounding the delay that any job 
> experiences in launching a task.
> It turns out that whether waiting is useful depends on how many tasks are 
> left in the job - the probability of getting a heartbeat from a node with a 
> local task - and on whether the job is CPU or IO bound. Thus there may be 
> logic for removing the wait on the last few tasks in the job.
> As a related issue, once we allow global scheduling, we can launch multiple 
> tasks per heartbeat, as in HADOOP-3136. The initial implementation of 
> HADOOP-3136 adversely affected performance because it only launched multiple 
> tasks from the same job, but with the wait rule above, we will only do this 
> for jobs that are allowed to launch non-local tasks.
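
The wait rule in the description above can be sketched as a small function. This is only an illustration of the three-step rule (node-local, then rack-local after T1, then off-rack after T2), not the code in the attached patches; the class, enum, and parameter names are hypothetical.

```java
/** Hypothetical sketch of the wait rule; names and thresholds are illustrative. */
class DelayRule {

    enum Locality { NODE_LOCAL, RACK_LOCAL, OFF_RACK }

    /**
     * Returns the loosest locality level a job may launch tasks at after it has
     * been skipped for waitedSeconds, given thresholds t1 and t2 with t2 > t1.
     */
    static Locality allowed(double waitedSeconds, double t1, double t2) {
        if (t2 <= t1) {
            throw new IllegalArgumentException("requires T2 > T1");
        }
        if (waitedSeconds >= t2) return Locality.OFF_RACK;   // waited at least T2
        if (waitedSeconds >= t1) return Locality.RACK_LOCAL; // waited at least T1
        return Locality.NODE_LOCAL;                          // still within the first wait
    }

    public static void main(String[] args) {
        // Example thresholds chosen arbitrarily: T1 = 5 s, T2 = 10 s.
        for (double waited : new double[] {0, 5, 12}) {
            System.out.println(waited + "s -> " + allowed(waited, 5, 10));
        }
    }
}
```

Because a job is never held back longer than T2, the delay before it may launch a task anywhere is bounded, which is the property the description relies on.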




[jira] Commented: (MAPREDUCE-807) Stray user files in mapred.system.dir with permissions other than 777 can prevent the jobtracker from starting up.

2009-07-27 Thread Amar Kamat (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735920#action_12735920
 ] 

Amar Kamat commented on MAPREDUCE-807:
--

ant tests passed on my box.

> Stray user files in mapred.system.dir with permissions other than 777 can 
> prevent the jobtracker from starting up.
> --
>
> Key: MAPREDUCE-807
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-807
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Reporter: Amar Kamat
>Assignee: Amar Kamat
>Priority: Blocker
> Attachments: MAPRED-807-v1.1.patch
>
>
> With restart disabled, the jobtracker does a _rm -rf_ of the 
> mapred.system.dir. If the mapred.system.dir contains user files with 
> permissions other than 777 then the jobtracker gets stuck in a loop trying to 
> delete the mapred.system.dir (and each time failing with 
> AccessControlException).




[jira] Assigned: (MAPREDUCE-157) Job History log file format is not friendly for external tools.

2009-07-27 Thread Jothi Padmanabhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jothi Padmanabhan reassigned MAPREDUCE-157:
---

Assignee: Jothi Padmanabhan  (was: Amar Kamat)

> Job History log file format is not friendly for external tools.
> ---
>
> Key: MAPREDUCE-157
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-157
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Jothi Padmanabhan
>
> Currently, parsing the job history logs with external tools is very difficult 
> because of the format. The most critical problem is that newlines aren't 
> escaped in the strings. That makes using tools like grep, sed, and awk very 
> tricky.
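
To illustrate the newline problem, here is a minimal sketch of escaping values so that each record stays on one line. It is not the actual job history format or its fix; the class name and the choice of backslash escapes are assumptions made for the example.

```java
/** Hypothetical sketch: keep each history record on a single line by escaping newlines. */
class HistoryEscape {

    /** Escapes backslashes first, then line breaks, so the mapping is reversible. */
    static String escape(String s) {
        return s.replace("\\", "\\\\")
                .replace("\n", "\\n")
                .replace("\r", "\\r");
    }

    /** Reverses escape(): a backslash always introduces a two-character escape. */
    static String unescape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                char next = s.charAt(++i);
                sb.append(next == 'n' ? '\n' : next == 'r' ? '\r' : next);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String raw = "error on line 1\nerror on line 2";
        String escaped = escape(raw);
        System.out.println(escaped);                       // one physical line
        System.out.println(unescape(escaped).equals(raw)); // round trip preserved
    }
}
```

With values escaped this way, line-oriented tools such as grep, sed, and awk see exactly one record per line.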




[jira] Updated: (MAPREDUCE-181) mapred.system.dir should be accessible only to hadoop daemons

2009-07-27 Thread Amar Kamat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat updated MAPREDUCE-181:
-

Attachment: hadoop-3578-branch-20-example-2.patch

Attaching a new patch [hadoop-3578-branch-20-example-2.patch] with no changes 
to the testcase. This patch is manually tested. This patch assumes 
[hadoop-3578-branch-20-example.patch] and should not be committed to branch 
0.20.

> mapred.system.dir should be accessible only to hadoop daemons 
> --
>
> Key: MAPREDUCE-181
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-181
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Amar Kamat
>Assignee: Amar Kamat
> Attachments: hadoop-3578-branch-20-example-2.patch, 
> hadoop-3578-branch-20-example.patch, HADOOP-3578-v2.6.patch, 
> HADOOP-3578-v2.7.patch
>
>
> Currently the jobclient accesses the {{mapred.system.dir}} to add job 
> details. Hence the {{mapred.system.dir}} has the permissions of 
> {{rwx-wx-wx}}. This could be a security loophole where the job files might 
> get overwritten or tampered with after job submission. 




[jira] Updated: (MAPREDUCE-181) mapred.system.dir should be accessible only to hadoop daemons

2009-07-27 Thread Amar Kamat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat updated MAPREDUCE-181:
-

Attachment: (was: hadoop-3578-branch-20-example-2.patch)

> mapred.system.dir should be accessible only to hadoop daemons 
> --
>
> Key: MAPREDUCE-181
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-181
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Amar Kamat
>Assignee: Amar Kamat
> Attachments: hadoop-3578-branch-20-example.patch, 
> HADOOP-3578-v2.6.patch, HADOOP-3578-v2.7.patch
>
>
> Currently the jobclient accesses the {{mapred.system.dir}} to add job 
> details. Hence the {{mapred.system.dir}} has the permissions of 
> {{rwx-wx-wx}}. This could be a security loophole where the job files might 
> get overwritten or tampered with after job submission. 




[jira] Commented: (MAPREDUCE-805) Deadlock in Jobtracker

2009-07-27 Thread Vinod K V (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735913#action_12735913
 ] 

Vinod K V commented on MAPREDUCE-805:
-

Ditto w.r.t the javadoc for the kill methods.

> Deadlock in Jobtracker
> --
>
> Key: MAPREDUCE-805
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-805
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Michael Tamm
> Attachments: MAPREDUCE-805-v1.1.patch
>
>
> We are running a hadoop cluster (version 0.20.0) and have detected the 
> following deadlock on our jobtracker:
> {code}
> "IPC Server handler 51 on 9001":
>   at org.apache.hadoop.mapred.JobInProgress.getCounters(JobInProgress.java:943)
>   - waiting to lock <0x7f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
>   at org.apache.hadoop.mapred.JobTracker.getJobCounters(JobTracker.java:3102)
>   - locked <0x7f2b5f026000> (a org.apache.hadoop.mapred.JobTracker)
>   at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
> "pool-1-thread-2":
>   at org.apache.hadoop.mapred.JobTracker.finalizeJob(JobTracker.java:2017)
>   - waiting to lock <0x7f2b5f026000> (a org.apache.hadoop.mapred.JobTracker)
>   at org.apache.hadoop.mapred.JobInProgress.garbageCollect(JobInProgress.java:2483)
>   - locked <0x7f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
>   at org.apache.hadoop.mapred.JobInProgress.terminateJob(JobInProgress.java:2152)
>   - locked <0x7f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
>   at org.apache.hadoop.mapred.JobInProgress.terminate(JobInProgress.java:2169)
>   - locked <0x7f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
>   at org.apache.hadoop.mapred.JobInProgress.fail(JobInProgress.java:2245)
>   - locked <0x7f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
>   at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:86)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:619)
> {code}




[jira] Created: (MAPREDUCE-810) Make TaskInProgress independent of JobTracker reference

2009-07-27 Thread Amar Kamat (JIRA)
Make TaskInProgress independent of JobTracker reference
---

 Key: MAPREDUCE-810
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-810
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Amar Kamat


As of today the TaskInProgress holds a reference to the JobTracker and calls 
back into it. These circular calls, combined with synchronization, can lead to 
deadlocks.




[jira] Commented: (MAPREDUCE-805) Deadlock in Jobtracker

2009-07-27 Thread Vinod K V (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735907#action_12735907
 ] 

Vinod K V commented on MAPREDUCE-805:
-

Had a cursory look at the patch. It will be good to add javadoc for 
JobInProgress.initTasks() and JobInProgress.fail() mentioning that these 
methods ARE NOT supposed to be called directly by the schedulers, and 
suggesting that the JobTracker methods be preferred over the JobInProgress 
methods for general use.

Given this issue, it will also be helpful to document the locking order 
(JobTracker, then JobInProgress) so that, e.g., schedulers don't lock 
JobInProgress asynchronously before calling these methods.

Though not directly related to the patch, it will be good to document that 
JobTracker is locked while calling JobInProgressListener update methods.
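
The locking order mentioned above (JobTracker first, then JobInProgress) can be illustrated with a small standalone sketch. The two plain objects below merely stand in for the JobTracker and JobInProgress monitors, and all names are hypothetical; this is not the code from the patch.

```java
/** Hypothetical sketch of a fixed lock order; the locks stand in for the real monitors. */
class LockOrderDemo {

    static final Object trackerLock = new Object(); // stands in for the JobTracker monitor
    static final Object jobLock = new Object();     // stands in for a JobInProgress monitor
    static int counter = 0;                         // guarded by both locks

    /** RPC-style path: tracker lock first, then job lock. */
    static void getCounters() {
        synchronized (trackerLock) {
            synchronized (jobLock) {
                counter++;
            }
        }
    }

    /** Failure path: same order, so the two paths cannot wait on each other in a cycle. */
    static void failJob() {
        synchronized (trackerLock) {
            synchronized (jobLock) {
                counter++;
            }
        }
    }

    /** Runs both paths concurrently and returns the number of completed calls. */
    static int run(int iterations) {
        counter = 0;
        Thread a = new Thread(() -> { for (int i = 0; i < iterations; i++) getCounters(); });
        Thread b = new Thread(() -> { for (int i = 0; i < iterations; i++) failJob(); });
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println("completed calls: " + run(10000));
    }
}
```

In the reported deadlock the failure path held the JobInProgress lock and then requested the JobTracker lock, the reverse of the RPC path. If `failJob()` above took `jobLock` first, the two threads could block each other in the same way.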

> Deadlock in Jobtracker
> --
>
> Key: MAPREDUCE-805
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-805
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Michael Tamm
> Attachments: MAPREDUCE-805-v1.1.patch
>
>
> We are running a hadoop cluster (version 0.20.0) and have detected the 
> following deadlock on our jobtracker:
> {code}
> "IPC Server handler 51 on 9001":
>   at org.apache.hadoop.mapred.JobInProgress.getCounters(JobInProgress.java:943)
>   - waiting to lock <0x7f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
>   at org.apache.hadoop.mapred.JobTracker.getJobCounters(JobTracker.java:3102)
>   - locked <0x7f2b5f026000> (a org.apache.hadoop.mapred.JobTracker)
>   at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
> "pool-1-thread-2":
>   at org.apache.hadoop.mapred.JobTracker.finalizeJob(JobTracker.java:2017)
>   - waiting to lock <0x7f2b5f026000> (a org.apache.hadoop.mapred.JobTracker)
>   at org.apache.hadoop.mapred.JobInProgress.garbageCollect(JobInProgress.java:2483)
>   - locked <0x7f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
>   at org.apache.hadoop.mapred.JobInProgress.terminateJob(JobInProgress.java:2152)
>   - locked <0x7f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
>   at org.apache.hadoop.mapred.JobInProgress.terminate(JobInProgress.java:2169)
>   - locked <0x7f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
>   at org.apache.hadoop.mapred.JobInProgress.fail(JobInProgress.java:2245)
>   - locked <0x7f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
>   at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:86)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:619)
> {code}




[jira] Commented: (MAPREDUCE-408) TestKillSubProcesses fails with assertion failure sometimes

2009-07-27 Thread Ravi Gummadi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735902#action_12735902
 ] 

Ravi Gummadi commented on MAPREDUCE-408:


Test failures are not related to the patch. All unit tests passed on my local 
machine.

> TestKillSubProcesses fails with assertion failure sometimes
> ---
>
> Key: MAPREDUCE-408
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-408
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.21.0
>Reporter: Amareshwari Sriramadasu
>Assignee: Ravi Gummadi
> Fix For: 0.21.0
>
> Attachments: MR-408.patch, MR-408.v1.1.patch, MR-408.v1.patch
>
>
> org.apache.hadoop.mapred.TestKillSubProcesses.testJobKillFailAndSucceed 
> sometimes fails with the following error message:
> {noformat}
> Unexpected: The subprocess at level 3 in the subtree is not alive before Job completion
> {noformat}
> Stacktrace
> {noformat}
> junit.framework.AssertionFailedError: Unexpected: The subprocess at level 3 in the subtree is not alive before Job completion
>   at org.apache.hadoop.mapred.TestKillSubProcesses.runJobAndSetProcessHandle(TestKillSubProcesses.java:221)
>   at org.apache.hadoop.mapred.TestKillSubProcesses.runFailingJobAndValidate(TestKillSubProcesses.java:112)
>   at org.apache.hadoop.mapred.TestKillSubProcesses.runTests(TestKillSubProcesses.java:327)
>   at org.apache.hadoop.mapred.TestKillSubProcesses.testJobKillFailAndSucceed(TestKillSubProcesses.java:310)
> {noformat}
> One such failure is at 
> http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/495/testReport/org.apache.hadoop.mapred/TestKillSubProcesses/testJobKillFailAndSucceed/




[jira] Commented: (MAPREDUCE-789) Oracle support for Sqoop

2009-07-27 Thread Aaron Kimball (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735885#action_12735885
 ] 

Aaron Kimball commented on MAPREDUCE-789:
-

No test failures are related to this patch.

> Oracle support for Sqoop
> 
>
> Key: MAPREDUCE-789
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-789
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: contrib/sqoop
>Reporter: Aaron Kimball
>Assignee: Aaron Kimball
> Attachments: MAPREDUCE-789.patch
>
>
> A separate ConnManager is needed for Oracle to support its slightly different 
> syntax and configuration




[jira] Commented: (MAPREDUCE-789) Oracle support for Sqoop

2009-07-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735883#action_12735883
 ] 

Hadoop QA commented on MAPREDUCE-789:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12414235/MAPREDUCE-789.patch
  against trunk revision 798239.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 5 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/425/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/425/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/425/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/425/console

This message is automatically generated.

> Oracle support for Sqoop
> 
>
> Key: MAPREDUCE-789
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-789
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: contrib/sqoop
>Reporter: Aaron Kimball
>Assignee: Aaron Kimball
> Attachments: MAPREDUCE-789.patch
>
>
> A separate ConnManager is needed for Oracle to support its slightly different 
> syntax and configuration




[jira] Commented: (MAPREDUCE-809) Job summary logs show status of completed jobs as RUNNING

2009-07-27 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735830#action_12735830
 ] 

Arun C Murthy commented on MAPREDUCE-809:
-

Suman tells me that she saw SUCCEEDED jobs show up in the logs with status as 
RUNNING, which, given that JobInProgress.jobCompleted is the only entry point 
that marks jobs as SUCCEEDED, is probably indicative of a race condition.

> Job summary logs show status of completed jobs as RUNNING 
> --
>
> Key: MAPREDUCE-809
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-809
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.21.0
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
> Fix For: 0.21.0
>
>
> MAPREDUCE-740 added job summary logs. During testing our QA folks noticed 
> that completed jobs show up as RUNNING in the logs.




[jira] Created: (MAPREDUCE-809) Job summary logs show status of completed jobs as RUNNING

2009-07-27 Thread Arun C Murthy (JIRA)
Job summary logs show status of completed jobs as RUNNING 
--

 Key: MAPREDUCE-809
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-809
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Affects Versions: 0.21.0
Reporter: Arun C Murthy
Assignee: Arun C Murthy
 Fix For: 0.21.0


MAPREDUCE-740 added job summary logs. During testing our QA folks noticed that 
completed jobs show up as RUNNING in the logs.




[jira] Commented: (MAPREDUCE-408) TestKillSubProcesses fails with assertion failure sometimes

2009-07-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735771#action_12735771
 ] 

Hadoop QA commented on MAPREDUCE-408:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12413637/MR-408.v1.1.patch
  against trunk revision 798239.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/422/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/422/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/422/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/422/console

This message is automatically generated.

> TestKillSubProcesses fails with assertion failure sometimes
> ---
>
> Key: MAPREDUCE-408
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-408
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.21.0
>Reporter: Amareshwari Sriramadasu
>Assignee: Ravi Gummadi
> Fix For: 0.21.0
>
> Attachments: MR-408.patch, MR-408.v1.1.patch, MR-408.v1.patch
>
>
> org.apache.hadoop.mapred.TestKillSubProcesses.testJobKillFailAndSucceed fails 
> sometimes with following error Message:
> {noformat}
> Unexpected: The subprocess at level 3 in the subtree is not alive before Job 
> completion
> {noformat}
> Stacktrace
> {noformat}
> junit.framework.AssertionFailedError: Unexpected: The subprocess at level 3 
> in the subtree is not alive before Job completion
>   at 
> org.apache.hadoop.mapred.TestKillSubProcesses.runJobAndSetProcessHandle(TestKillSubProcesses.java:221)
>   at 
> org.apache.hadoop.mapred.TestKillSubProcesses.runFailingJobAndValidate(TestKillSubProcesses.java:112)
>   at 
> org.apache.hadoop.mapred.TestKillSubProcesses.runTests(TestKillSubProcesses.java:327)
>   at 
> org.apache.hadoop.mapred.TestKillSubProcesses.testJobKillFailAndSucceed(TestKillSubProcesses.java:310)
> {noformat}
> one such failure at 
> http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/495/testReport/org.apache.hadoop.mapred/TestKillSubProcesses/testJobKillFailAndSucceed/




[jira] Commented: (MAPREDUCE-801) MAPREDUCE framework should issue warning with too many locations for a split

2009-07-27 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735737#action_12735737
 ] 

Doug Cutting commented on MAPREDUCE-801:


I am not yet convinced this is a common enough error that the framework need 
guard against it.  It might be more reasonable to have a limit on the total 
number of locations per job rather than locations per split.

> MAPREDUCE framework should issue warning with too many locations for a split
> 
>
> Key: MAPREDUCE-801
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-801
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Hong Tang
>
> Customized input-format may be buggy and report misleading locations through 
> input-split, an example of which is PIG-878. When an input split returns too 
> many locations, it would not only artificially inflate the percentage of data 
> local or rack local maps, but also force scheduler to use more memory and 
> work harder to conduct task assignment.




[jira] Commented: (MAPREDUCE-801) MAPREDUCE framework should issue warning with too many locations for a split

2009-07-27 Thread Hong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735718#action_12735718
 ] 

Hong Tang commented on MAPREDUCE-801:
-

@doug, this is a fair concern. We will probably need to expose this as a conf 
parameter and allow advanced users to override. Also, we need to issue a 
warning in the log so that a user can still see what might go wrong.
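The idea of a configurable limit plus a log warning could be sketched roughly as follows. This is only an illustration under stated assumptions: the class `SplitLocationLimiter`, the method `limitLocations`, and the default of 20 (the example threshold mentioned elsewhere in this thread) are hypothetical, not part of Hadoop, and `java.util.logging` is used only to keep the sketch self-contained.

```java
import java.util.logging.Logger;

/** Hypothetical helper; not part of the Hadoop code base. */
final class SplitLocationLimiter {
    private static final Logger LOG =
        Logger.getLogger(SplitLocationLimiter.class.getName());

    /** Assumed default; the thread only mentions 20 as an example threshold. */
    static final int DEFAULT_MAX_LOCATIONS = 20;

    /**
     * Returns the locations unchanged when they are within the limit.
     * Otherwise logs a warning and discards the location hints entirely.
     * maxLocations would come from a configuration parameter that
     * advanced users can override.
     */
    static String[] limitLocations(String[] locations, int maxLocations) {
        if (locations.length <= maxLocations) {
            return locations;
        }
        LOG.warning("Split reports " + locations.length
            + " locations, above the limit of " + maxLocations
            + "; ignoring location hints for this split");
        return new String[0];
    }
}
```

A user who deliberately raises replication above the default would pass a larger `maxLocations`, which keeps the hints intact.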

> MAPREDUCE framework should issue warning with too many locations for a split
> 
>
> Key: MAPREDUCE-801
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-801
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Hong Tang
>
> Customized input-format may be buggy and report misleading locations through 
> input-split, an example of which is PIG-878. When an input split returns too 
> many locations, it would not only artificially inflate the percentage of data 
> local or rack local maps, but also force scheduler to use more memory and 
> work harder to conduct task assignment.




[jira] Commented: (MAPREDUCE-801) MAPREDUCE framework should issue warning with too many locations for a split

2009-07-27 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735690#action_12735690
 ] 

Doug Cutting commented on MAPREDUCE-801:


> discard location information completely when the number of locations reported 
> by an input split is greater than a threshold (e.g. 20).

This seems rather arbitrary to me, since one might reasonably increase the 
replication for an input file to 20 or more, to, e.g., ensure local 
availability on every rack or node.


> MAPREDUCE framework should issue warning with too many locations for a split
> 
>
> Key: MAPREDUCE-801
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-801
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Hong Tang
>
> Customized input-format may be buggy and report misleading locations through 
> input-split, an example of which is PIG-878. When an input split returns too 
> many locations, it would not only artificially inflate the percentage of data 
> local or rack local maps, but also force scheduler to use more memory and 
> work harder to conduct task assignment.




[jira] Updated: (MAPREDUCE-782) Use PureJavaCrc32 in mapreduce spills

2009-07-27 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated MAPREDUCE-782:
-

   Resolution: Fixed
Fix Version/s: 0.21.0
   Status: Resolved  (was: Patch Available)

I have committed this.  Thanks, Todd!

> Use PureJavaCrc32 in mapreduce spills
> -
>
> Key: MAPREDUCE-782
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-782
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Minor
> Fix For: 0.21.0
>
> Attachments: mapreduce-782.txt
>
>
> HADOOP-6148 implemented a Pure Java implementation of CRC32 which performs 
> better than the built-in one. This issue is to make use of it in the mapred 
> package
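As a rough illustration of why such a swap can be small: code written against the `java.util.zip.Checksum` interface does not care which implementation it receives. The sketch below uses the JDK's `java.util.zip.CRC32` so that it runs without Hadoop on the classpath; the class `SpillChecksumSketch` and the method `checksumOf` are hypothetical names, and this is not the committed patch.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

/** Hypothetical sketch; not taken from the mapred package. */
final class SpillChecksumSketch {

    /** Computes a checksum using whichever Checksum implementation is supplied. */
    static long checksumOf(Checksum sum, byte[] data) {
        sum.reset();
        sum.update(data, 0, data.length);
        return sum.getValue();
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        // Standard CRC-32 check value for "123456789": prints cbf43926
        System.out.printf("%08x%n", checksumOf(new CRC32(), data));
    }
}
```

Any other implementation of `Checksum` that computes the same CRC-32 polynomial could be passed in place of `new CRC32()` and would be expected to return the same value.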




[jira] Commented: (MAPREDUCE-383) pipes combiner does not reset properly after a spill

2009-07-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735629#action_12735629
 ] 

Hudson commented on MAPREDUCE-383:
--

Integrated in Hadoop-Mapreduce-trunk #31 (see 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/31/]).
Fix a bug in the Pipes combiner caused by the byte count not getting reset 
after the spill. Contributed by Christian Kunz.


> pipes combiner does not reset properly after a spill
> 
>
> Key: MAPREDUCE-383
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-383
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Christian Kunz
>Assignee: Christian Kunz
> Fix For: 0.20.1
>
> Attachments: patch.HADOOP-6070
>
>
> When using a pipes combiner, the variable numBytes is not reset to 0 in 
> spillAll, effectively reducing the effect of running a combiner to the first 
> spill.
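The effect described above can be modelled with a simplified Java sketch. It is an analogue for illustration only, not the Pipes code itself: the class `BufferingCombinerSketch`, its byte accounting, and the threshold are assumptions. Only the names `numBytes` and `spillAll` come from the issue text.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified model of a buffering combiner; not the Pipes implementation. */
final class BufferingCombinerSketch {
    private final long spillThreshold;
    private final Map<String, Integer> pending = new HashMap<>();
    private long numBytes = 0;
    private int spills = 0;

    BufferingCombinerSketch(long spillThreshold) {
        this.spillThreshold = spillThreshold;
    }

    /** Buffers a record, combining values per key, and spills once enough bytes accumulate. */
    void emit(String key, int value) {
        pending.merge(key, value, Integer::sum);
        numBytes += key.length() + Integer.BYTES; // assumed size accounting
        if (numBytes >= spillThreshold) {
            spillAll();
        }
    }

    private void spillAll() {
        pending.clear(); // a real combiner would forward the combined records here
        spills++;
        // Without this reset, every emit after the first spill would exceed the
        // threshold and spill immediately, so nothing would be combined any more.
        numBytes = 0;
    }

    int spillCount() {
        return spills;
    }
}
```

With a threshold of 20 bytes and ten 6-byte records, this model spills twice; dropping the reset would make it spill on every record from the fourth onward.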




[jira] Updated: (MAPREDUCE-808) Buffer objects incorrectly serialized to typed bytes

2009-07-27 Thread Klaas Bosteels (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Klaas Bosteels updated MAPREDUCE-808:
-

Assignee: Klaas Bosteels
  Status: Patch Available  (was: Open)

> Buffer objects incorrectly serialized to typed bytes
> 
>
> Key: MAPREDUCE-808
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-808
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/streaming
>Affects Versions: 0.21.0
>Reporter: Klaas Bosteels
>Assignee: Klaas Bosteels
> Attachments: MAPREDUCE-808.patch
>
>
> {{TypedBytesOutput.write()}} should do something like
> {code}
> Buffer buf = (Buffer) obj;
> writeBytes(buf.get(), 0, buf.getCount());
> {code}
> instead of
> {code}
> writeBytes(((Buffer) obj).get());
> {code}
> since the bytes returned by {{Buffer.get()}} are "only valid between 0 and 
> getCount() - 1".




[jira] Updated: (MAPREDUCE-808) Buffer objects incorrectly serialized to typed bytes

2009-07-27 Thread Klaas Bosteels (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Klaas Bosteels updated MAPREDUCE-808:
-

Attachment: MAPREDUCE-808.patch

> Buffer objects incorrectly serialized to typed bytes
> 
>
> Key: MAPREDUCE-808
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-808
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/streaming
>Affects Versions: 0.21.0
>Reporter: Klaas Bosteels
> Attachments: MAPREDUCE-808.patch
>
>
> {{TypedBytesOutput.write()}} should do something like
> {code}
> Buffer buf = (Buffer) obj;
> writeBytes(buf.get(), 0, buf.getCount());
> {code}
> instead of
> {code}
> writeBytes(((Buffer) obj).get());
> {code}
> since the bytes returned by {{Buffer.get()}} are "only valid between 0 and 
> getCount() - 1".




[jira] Updated: (MAPREDUCE-760) TestNodeRefresh might not work as expected

2009-07-27 Thread Amar Kamat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat updated MAPREDUCE-760:
-

Attachment: MAPREDUCE-760-v1.0.patch

Attaching a straightforward fix. Result of test-patch:
 [exec] +1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] +1 tests included.  The patch appears to include 3 new or modified tests.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.

This is a test-case change, so no ant tests are required.

> TestNodeRefresh might not work as expected
> --
>
> Key: MAPREDUCE-760
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-760
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Amar Kamat
> Attachments: MAPREDUCE-760-v1.0.patch
>
>
> MAPREDUCE-677 fixed one part of the problem. It is possible that the 
> tasktracker might not have joined the jobtracker and hence the asserts might 
> fail.




[jira] Created: (MAPREDUCE-808) Buffer objects incorrectly serialized to typed bytes

2009-07-27 Thread Klaas Bosteels (JIRA)
Buffer objects incorrectly serialized to typed bytes


 Key: MAPREDUCE-808
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-808
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/streaming
Affects Versions: 0.21.0
Reporter: Klaas Bosteels


{{TypedBytesOutput.write()}} should do something like

{code}
Buffer buf = (Buffer) obj;
writeBytes(buf.get(), 0, buf.getCount());
{code}

instead of

{code}
writeBytes(((Buffer) obj).get());
{code}

since the bytes returned by {{Buffer.get()}} are "only valid between 0 and 
getCount() - 1".
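To illustrate why the length matters, here is a minimal, self-contained sketch. `GrowableBuffer` is a hypothetical stand-in, not Hadoop's `Buffer` class; it only mimics the property that the backing array can be larger than the number of valid bytes. The two `serialize*` methods are likewise invented for the example.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a growable byte buffer; not Hadoop's Buffer. */
final class GrowableBuffer {
    private byte[] bytes = new byte[8]; // capacity, may exceed the valid count
    private int count = 0;              // number of valid bytes

    void append(byte[] src) {
        if (count + src.length > bytes.length) {
            bytes = Arrays.copyOf(bytes, Math.max(bytes.length * 2, count + src.length));
        }
        System.arraycopy(src, 0, bytes, count, src.length);
        count += src.length;
    }

    byte[] get() { return bytes; }   // the whole backing array
    int getCount() { return count; } // length of the valid prefix
}

final class BufferSerializationSketch {
    /** Mirrors the reported behaviour: copies the whole backing array. */
    static byte[] serializeAll(GrowableBuffer buf) {
        return buf.get().clone();
    }

    /** Mirrors the proposed behaviour: copies only bytes 0 .. getCount() - 1. */
    static byte[] serializeValid(GrowableBuffer buf) {
        return Arrays.copyOfRange(buf.get(), 0, buf.getCount());
    }

    public static void main(String[] args) {
        GrowableBuffer buf = new GrowableBuffer();
        buf.append(new byte[] {1, 2, 3});
        System.out.println(serializeAll(buf).length);   // prints 8 (capacity)
        System.out.println(serializeValid(buf).length); // prints 3 (valid bytes)
    }
}
```

In this model the first variant emits five trailing bytes that were never written by the caller, which is the kind of corruption the issue is about.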




[jira] Updated: (MAPREDUCE-805) Deadlock in Jobtracker

2009-07-27 Thread Amar Kamat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat updated MAPREDUCE-805:
-

Attachment: MAPREDUCE-805-v1.1.patch

Attaching a patch that should fix this. Result of test-patch 
 [exec] +1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] +1 tests included.  The patch appears to include 12 new or modified tests.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.

Running ant-tests.

> Deadlock in Jobtracker
> --
>
> Key: MAPREDUCE-805
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-805
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Michael Tamm
> Attachments: MAPREDUCE-805-v1.1.patch
>
>
> We are running a hadoop cluster (version 0.20.0) and have detected the 
> following deadlock on our jobtracker:
> {code}
> "IPC Server handler 51 on 9001":
>   at 
> org.apache.hadoop.mapred.JobInProgress.getCounters(JobInProgress.java:943)
>   - waiting to lock <0x7f2b6fb46130> (a 
> org.apache.hadoop.mapred.JobInProgress)
>   at 
> org.apache.hadoop.mapred.JobTracker.getJobCounters(JobTracker.java:3102)
>   - locked <0x7f2b5f026000> (a org.apache.hadoop.mapred.JobTracker)
>   at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
>  "pool-1-thread-2":
>   at org.apache.hadoop.mapred.JobTracker.finalizeJob(JobTracker.java:2017)
>   - waiting to lock <0x7f2b5f026000> (a 
> org.apache.hadoop.mapred.JobTracker)
>   at 
> org.apache.hadoop.mapred.JobInProgress.garbageCollect(JobInProgress.java:2483)
>   - locked <0x7f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
>   at 
> org.apache.hadoop.mapred.JobInProgress.terminateJob(JobInProgress.java:2152)
>   - locked <0x7f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
>   at 
> org.apache.hadoop.mapred.JobInProgress.terminate(JobInProgress.java:2169)
>   - locked <0x7f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
>   at org.apache.hadoop.mapred.JobInProgress.fail(JobInProgress.java:2245)
>   - locked <0x7f2b6fb46130> (a org.apache.hadoop.mapred.JobInProgress)
>   at 
> org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:86)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:619)
> {code}
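In the trace above, the IPC handler holds the JobTracker monitor and waits for the JobInProgress monitor, while the pool thread holds the JobInProgress monitor and waits for the JobTracker monitor. One common way to avoid such a cycle is to acquire the two locks in the same order on every path. The sketch below shows that general technique only; it is not the attached patch, and `LockOrderSketch` with its plain `Object` locks is a hypothetical model of the two monitors.

```java
/** Hypothetical model of the two monitors; not Hadoop code and not the patch. */
final class LockOrderSketch {
    private final Object trackerLock = new Object(); // stands in for the JobTracker monitor
    private final Object jobLock = new Object();     // stands in for the JobInProgress monitor
    private int counterReads = 0;
    private int finalized = 0;

    /** Models the getJobCounters path: tracker lock first, then job lock. */
    void readCounters() {
        synchronized (trackerLock) {
            synchronized (jobLock) {
                counterReads++;
            }
        }
    }

    /**
     * Models the fail/finalize path using the same order. Taking jobLock first
     * and trackerLock second here would recreate the cycle seen in the trace.
     */
    void failJob() {
        synchronized (trackerLock) {
            synchronized (jobLock) {
                finalized++;
            }
        }
    }

    /** Runs both paths concurrently and returns {counterReads, finalized}. */
    static int[] runBoth(int iterations) {
        LockOrderSketch s = new LockOrderSketch();
        Thread reader = new Thread(() -> {
            for (int i = 0; i < iterations; i++) s.readCounters();
        });
        Thread failer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) s.failJob();
        });
        reader.start();
        failer.start();
        try {
            reader.join();
            failer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] {s.counterReads, s.finalized};
    }
}
```

Because both paths take `trackerLock` before `jobLock`, neither thread can hold one lock while waiting on a thread that holds the other, so both loops run to completion.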




[jira] Updated: (MAPREDUCE-807) Stray user files in mapred.system.dir with permissions other than 777 can prevent the jobtracker from starting up.

2009-07-27 Thread Amar Kamat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat updated MAPREDUCE-807:
-

Attachment: MAPRED-807-v1.1.patch

Attaching a patch that solves the issue. Result of test-patch
 [exec] -1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] -1 tests included.  The patch doesn't appear to include any new or modified tests.
 [exec]                     Please justify why no new tests are needed for this patch.
 [exec]                     Also please list what manual steps were performed to verify this patch.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.


> Stray user files in mapred.system.dir with permissions other than 777 can 
> prevent the jobtracker from starting up.
> --
>
> Key: MAPREDUCE-807
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-807
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Reporter: Amar Kamat
>Assignee: Amar Kamat
>Priority: Blocker
> Attachments: MAPRED-807-v1.1.patch
>
>
> With restart disabled, the jobtracker does a _rm -rf_ of the 
> mapred.system.dir. If the mapred.system.dir contains user files with 
> permissions other than 777 then the jobtracker gets stuck in a loop trying to 
> delete the mapred.system.dir (and each time failing with 
> AccessControlException).




[jira] Commented: (MAPREDUCE-181) mapred.system.dir should be accessible only to hadoop daemons

2009-07-27 Thread Amar Kamat (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735547#action_12735547
 ] 

Amar Kamat commented on MAPREDUCE-181:
--

[patch2|https://issues.apache.org/jira/secure/attachment/12414412/hadoop-3578-branch-20-example-2.patch]
 assumes that 
[patch1|https://issues.apache.org/jira/secure/attachment/12410472/hadoop-3578-branch-20-example.patch]
 is applied.

> mapred.system.dir should be accessible only to hadoop daemons 
> --
>
> Key: MAPREDUCE-181
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-181
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Amar Kamat
>Assignee: Amar Kamat
> Attachments: hadoop-3578-branch-20-example-2.patch, 
> hadoop-3578-branch-20-example.patch, HADOOP-3578-v2.6.patch, 
> HADOOP-3578-v2.7.patch
>
>
> Currently the jobclient accesses the {{mapred.system.dir}} to add job 
> details. Hence the {{mapred.system.dir}} has the permissions of 
> {{rwx-wx-wx}}. This could be a security loophole where the job files might 
> get overwritten/tampered after the job submission. 




[jira] Updated: (MAPREDUCE-491) RAgzip: multiple map tasks for a large gzipped file

2009-07-27 Thread Daehyun Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daehyun Kim updated MAPREDUCE-491:
--

Attachment: MAPREDUCE-491.patch

To remove the RAGZIPInputFormat and RAGZIPLineRecordReader classes, I modified 
the FileInputFormat/TextInputFormat classes and the LineRecordReader class.

Q: This patch depends on HADOOP-6153, but I think Hudson will test this patch 
without merging HADOOP-6153. Can I still submit this patch?

> RAgzip: multiple map tasks for a large gzipped file
> ---
>
> Key: MAPREDUCE-491
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-491
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Daehyun Kim
>Assignee: Daehyun Kim
>Priority: Minor
> Attachments: HADOOP-4652-v2.patch, HADOOP-4652-v3.patch, 
> HADOOP-4652.path, MAPREDUCE-491.patch
>
>
> Currently, Hadoop processes a gzipped file with only one map task.
> We have made a patch that enables multiple map tasks for one large gzipped 
> file. We call the patch RAgzip.
> To process a gzipped file with multiple map tasks, use RAgzip by changing 
> the InputFormat to RAGZIPInputFormat.
> The options used by RAGZIPInputFormat are described in its javadoc.
> RAgzip uses zlib's inflatePrime function, which supports random access on a 
> gzipped file. 
> Since inflatePrime is available from version 1.2.2.4 onward, RAgzip requires 
> zlib 1.2.2.4 or higher. (We tested with zlib 1.2.3.)
> RAgzip requires a preprocessing step that creates an access point (.ap) 
> file, which acts as an index of the gzipped file's chunks. 
> The access point (.ap) file is located in the same path as the gzipped file.
> For example, for "/user/hadoop/test.gz", the .ap file is created as 
> "/user/hadoop/test.gz.ap".




[jira] Updated: (MAPREDUCE-383) pipes combiner does not reset properly after a spill

2009-07-27 Thread Sharad Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sharad Agarwal updated MAPREDUCE-383:
-

   Resolution: Fixed
Fix Version/s: 0.20.1
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

I committed this to trunk and branch 0.20. Thanks Christian!

> pipes combiner does not reset properly after a spill
> 
>
> Key: MAPREDUCE-383
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-383
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Christian Kunz
>Assignee: Christian Kunz
> Fix For: 0.20.1
>
> Attachments: patch.HADOOP-6070
>
>
> When using a pipes combiner, the variable numBytes is not reset to 0 in 
> spillAll, effectively reducing the effect of running a combiner to the first 
> spill.
