[jira] Updated: (MAPREDUCE-1435) symlinks in cwd of the task are not handled properly after MAPREDUCE-896

2010-03-04 Thread Hemanth Yamijala (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hemanth Yamijala updated MAPREDUCE-1435:


Status: Open  (was: Patch Available)

Canceling patch to incorporate review comments.

 symlinks in cwd of the task are not handled properly after MAPREDUCE-896
 

 Key: MAPREDUCE-1435
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1435
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tasktracker
Affects Versions: 0.22.0
Reporter: Amareshwari Sriramadasu
Assignee: Ravi Gummadi
 Fix For: 0.22.0

 Attachments: 1435.patch, 1435.v1.patch, 1435.v2.patch, 1435.v3.patch, 
 MR-1435-y20s.patch


 With JVM reuse, TaskRunner.setupWorkDir() lists the contents of workDir and 
 does an fs.delete on each path listed. If a listed file is a symlink to a 
 directory, the delete removes the contents of the linked directory. This 
 would delete files from the distributed cache and the jars directory if 
 mapred.create.symlink is true.
 Changing ownership/permissions of symlinks through ENABLE_TASK_FOR_CLEANUP 
 would change ownership/permissions of the underlying files.
 This was observed by Karam while running streaming jobs with DistributedCache 
 and JVM reuse.
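The deletion bug above comes from following symlinks during recursive cleanup. A minimal sketch of a symlink-safe cleanup, using plain java.nio in place of Hadoop's FileSystem API (the class and method names here are illustrative, not the actual TaskRunner code):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;

public class SymlinkSafeCleanup {

    // Delete 'path' recursively, but never follow symlinks: a symlink to a
    // directory is removed as a link, leaving the target directory intact.
    static void deleteWithoutFollowing(Path path) throws IOException {
        if (Files.isDirectory(path, LinkOption.NOFOLLOW_LINKS)) {
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(path)) {
                for (Path entry : entries) {
                    deleteWithoutFollowing(entry);
                }
            }
        }
        Files.delete(path); // removes the link itself, not what it points to
    }

    public static void main(String[] args) throws IOException {
        Path workDir = Files.createTempDirectory("workDir");
        Path cacheDir = Files.createTempDirectory("distCache");
        Path cachedFile = Files.createFile(cacheDir.resolve("cached.jar"));
        Files.createSymbolicLink(workDir.resolve("cache"), cacheDir);

        deleteWithoutFollowing(workDir);
        // The symlinked cache directory and its contents survive the cleanup.
        System.out.println(Files.exists(cachedFile)); // prints "true"
    }
}
```

The key is testing with LinkOption.NOFOLLOW_LINKS: a symlink to a directory then fails the directory test and is deleted as a plain entry.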

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1435) symlinks in cwd of the task are not handled properly after MAPREDUCE-896

2010-03-04 Thread Hemanth Yamijala (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hemanth Yamijala updated MAPREDUCE-1435:


Status: Patch Available  (was: Open)

Running through Hudson.

 symlinks in cwd of the task are not handled properly after MAPREDUCE-896
 

 Key: MAPREDUCE-1435
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1435
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tasktracker
Affects Versions: 0.22.0
Reporter: Amareshwari Sriramadasu
Assignee: Ravi Gummadi
 Fix For: 0.22.0

 Attachments: 1435.patch, 1435.v1.patch, 1435.v2.patch, 1435.v3.patch, 
 1435.v4.patch, MR-1435-y20s.patch


 With JVM reuse, TaskRunner.setupWorkDir() lists the contents of workDir and 
 does an fs.delete on each path listed. If a listed file is a symlink to a 
 directory, the delete removes the contents of the linked directory. This 
 would delete files from the distributed cache and the jars directory if 
 mapred.create.symlink is true.
 Changing ownership/permissions of symlinks through ENABLE_TASK_FOR_CLEANUP 
 would change ownership/permissions of the underlying files.
 This was observed by Karam while running streaming jobs with DistributedCache 
 and JVM reuse.




[jira] Commented: (MAPREDUCE-1435) symlinks in cwd of the task are not handled properly after MAPREDUCE-896

2010-03-04 Thread Hemanth Yamijala (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841167#action_12841167
 ] 

Hemanth Yamijala commented on MAPREDUCE-1435:
-

Output of test-patch:

{noformat}
 [exec] +1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 18 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
{noformat}

 symlinks in cwd of the task are not handled properly after MAPREDUCE-896
 

 Key: MAPREDUCE-1435
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1435
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tasktracker
Affects Versions: 0.22.0
Reporter: Amareshwari Sriramadasu
Assignee: Ravi Gummadi
 Fix For: 0.22.0

 Attachments: 1435.patch, 1435.v1.patch, 1435.v2.patch, 1435.v3.patch, 
 1435.v4.patch, MR-1435-y20s.patch


 With JVM reuse, TaskRunner.setupWorkDir() lists the contents of workDir and 
 does an fs.delete on each path listed. If a listed file is a symlink to a 
 directory, the delete removes the contents of the linked directory. This 
 would delete files from the distributed cache and the jars directory if 
 mapred.create.symlink is true.
 Changing ownership/permissions of symlinks through ENABLE_TASK_FOR_CLEANUP 
 would change ownership/permissions of the underlying files.
 This was observed by Karam while running streaming jobs with DistributedCache 
 and JVM reuse.




[jira] Updated: (MAPREDUCE-1421) LinuxTaskController tests failing on trunk after the commit of MAPREDUCE-1385

2010-03-04 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-1421:
---

Attachment: patch-1421-ydist.txt

Patch for Yahoo! distribution

 LinuxTaskController tests failing on trunk after the commit of MAPREDUCE-1385
 -

 Key: MAPREDUCE-1421
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1421
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task-controller, tasktracker, test
Affects Versions: 0.22.0
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Fix For: 0.22.0

 Attachments: patch-1421-1.txt, patch-1421-2.txt, 
 patch-1421-ydist.txt, patch-1421.txt, TestJobExecutionAsDifferentUser.patch


 The following tests fail, in particular:
  - TestDebugScriptWithLinuxTaskController
  - TestJobExecutionAsDifferentUser
  - TestPipesAsDifferentUser
  - TestKillSubProcessesWithLinuxTaskController




[jira] Commented: (MAPREDUCE-1542) Deprecate mapred.permissions.supergroup in favor of hadoop.cluster.administrators

2010-03-04 Thread Ravi Gummadi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841177#action_12841177
 ] 

Ravi Gummadi commented on MAPREDUCE-1542:
-

OK. Planning to keep the config property mapred.permissions.supergroup working. 
This will be done with some code changes specifically for this config 
property's backward compatibility. This should be fine, as we need this config 
property only when the daemons are starting.

 Deprecate mapred.permissions.supergroup in favor of 
 hadoop.cluster.administrators
 -

 Key: MAPREDUCE-1542
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1542
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: security
Reporter: Vinod K V
Assignee: Ravi Gummadi
 Fix For: 0.22.0


 HADOOP-6568 added the configuration {{hadoop.cluster.administrators}} through 
 which admins can configure who the superusers/supergroups for the cluster 
 are. MAPREDUCE itself already has {{mapred.permissions.supergroup}} (which is 
 just a single group). As agreed upon at HADOOP-6568, this should be 
 deprecated in favor of {{hadoop.cluster.administrators}}.
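A minimal sketch of the key aliasing such a deprecation needs: reads and writes of the old key are transparently redirected to the new one. This is plain Java; the class and its methods are illustrative stand-ins, not Hadoop's actual Configuration API.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative configuration with key deprecation, mirroring the
// mapred.permissions.supergroup -> hadoop.cluster.administrators migration.
public class DeprecatedConf {
    private final Map<String, String> props = new HashMap<>();
    private final Map<String, String> deprecations = new HashMap<>();

    // Register oldKey as a deprecated alias of newKey.
    void addDeprecation(String oldKey, String newKey) {
        deprecations.put(oldKey, newKey);
    }

    // Both get and set resolve deprecated keys to their replacement,
    // so old configs keep working while new code sees only the new key.
    void set(String key, String value) {
        props.put(deprecations.getOrDefault(key, key), value);
    }

    String get(String key) {
        return props.get(deprecations.getOrDefault(key, key));
    }

    public static void main(String[] args) {
        DeprecatedConf conf = new DeprecatedConf();
        conf.addDeprecation("mapred.permissions.supergroup",
                            "hadoop.cluster.administrators");
        conf.set("mapred.permissions.supergroup", "hadoopadmins");
        // The old and new keys now resolve to the same value.
        System.out.println(conf.get("hadoop.cluster.administrators")); // prints "hadoopadmins"
    }
}
```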




[jira] Commented: (MAPREDUCE-927) Cleanup of task-logs should happen in TaskTracker instead of the Child

2010-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841210#action_12841210
 ] 

Hadoop QA commented on MAPREDUCE-927:
-

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12437694/patch-927-2.txt
  against trunk revision 918864.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 17 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/17/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/17/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/17/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/17/console

This message is automatically generated.

 Cleanup of task-logs should happen in TaskTracker instead of the Child
 --

 Key: MAPREDUCE-927
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-927
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: security, tasktracker
Affects Versions: 0.21.0
Reporter: Vinod K V
Assignee: Amareshwari Sriramadasu
Priority: Blocker
 Fix For: 0.22.0

 Attachments: patch-927-1.txt, patch-927-2.txt, patch-927.txt


 Task logs' cleanup is being done in the Child now. This is undesirable for at 
 least two reasons: 1) failures while cleaning up will affect the user's tasks, 
 and 2) the task's wall time will be inflated by operations that the TT 
 should actually own.




[jira] Commented: (MAPREDUCE-1523) Sometimes rumen trace generator fails to extract the job finish time.

2010-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841214#action_12841214
 ] 

Hadoop QA commented on MAPREDUCE-1523:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12437218/mapreduce-1523--2010-02-25.patch
  against trunk revision 918864.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 13 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/498/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/498/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/498/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/498/console

This message is automatically generated.

 Sometimes rumen trace generator fails to extract the job finish time.
 -

 Key: MAPREDUCE-1523
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1523
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Hong Tang
Assignee: Dick King
 Attachments: mapreduce-1523--2010-02-24.patch, 
 mapreduce-1523--2010-02-25.patch


 We saw sometimes (not very often) that rumen may fail to extract the job 
 finish time from Hadoop 0.20 history log.




[jira] Commented: (MAPREDUCE-1553) mapred.userlog.retain.hours is improperly renamed in MAPREDUCE-849

2010-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841222#action_12841222
 ] 

Hadoop QA commented on MAPREDUCE-1553:
--

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12437697/patch-1553.txt
  against trunk revision 918864.

+1 @author.  The patch does not contain any @author tags.

+0 tests included.  The patch appears to be a documentation patch that 
doesn't require tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h9.grid.sp2.yahoo.net/3/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h9.grid.sp2.yahoo.net/3/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h9.grid.sp2.yahoo.net/3/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h9.grid.sp2.yahoo.net/3/console

This message is automatically generated.

 mapred.userlog.retain.hours is improperly renamed in MAPREDUCE-849
 --

 Key: MAPREDUCE-1553
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1553
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.21.0
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
Priority: Blocker
 Fix For: 0.21.0

 Attachments: patch-1553.txt


 mapred.userlog.retain.hours is renamed as mapred.task.userlog.retain.hours in 
 JobContext. But, in mapred-default, it is mapreduce.task.userlog.retain.hours.




[jira] Commented: (MAPREDUCE-890) After HADOOP-4491, the user who started mapred system is not able to run job.

2010-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841226#action_12841226
 ] 

Hadoop QA commented on MAPREDUCE-890:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12437589/MR890.v1.1.patch
  against trunk revision 918864.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 24 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/342/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/342/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/342/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/342/console

This message is automatically generated.

 After HADOOP-4491, the user who started mapred system is not able to run job.
 -

 Key: MAPREDUCE-890
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-890
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tasktracker
Reporter: Karam Singh
Assignee: Ravi Gummadi
Priority: Blocker
 Fix For: 0.21.0

 Attachments: MAPREDUCE-890-20090904.txt, MAPREDUCE-890-20090909.txt, 
 MR890.patch, MR890.v1.1.patch, MR890.v1.patch


 Even the setup and cleanup tasks of the job fail due to an exception: the job 
 and related directories cannot be created under 
 mapred.local.dir/taskTracker/jobcache.
 The directories are created as:
 [dr-xrws--- mapred   hadoop  ]  job_200908190916_0002
 mapred cannot write under this directory; even touching a file manually fails. 
 mapred is the user who started the MR cluster.




[jira] Commented: (MAPREDUCE-1435) symlinks in cwd of the task are not handled properly after MAPREDUCE-896

2010-03-04 Thread Ravi Gummadi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841341#action_12841341
 ] 

Ravi Gummadi commented on MAPREDUCE-1435:
-

Patch looks good.
+1

 symlinks in cwd of the task are not handled properly after MAPREDUCE-896
 

 Key: MAPREDUCE-1435
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1435
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tasktracker
Affects Versions: 0.22.0
Reporter: Amareshwari Sriramadasu
Assignee: Ravi Gummadi
 Fix For: 0.22.0

 Attachments: 1435.patch, 1435.v1.patch, 1435.v2.patch, 1435.v3.patch, 
 1435.v4.patch, MR-1435-y20s.patch


 With JVM reuse, TaskRunner.setupWorkDir() lists the contents of workDir and 
 does an fs.delete on each path listed. If a listed file is a symlink to a 
 directory, the delete removes the contents of the linked directory. This 
 would delete files from the distributed cache and the jars directory if 
 mapred.create.symlink is true.
 Changing ownership/permissions of symlinks through ENABLE_TASK_FOR_CLEANUP 
 would change ownership/permissions of the underlying files.
 This was observed by Karam while running streaming jobs with DistributedCache 
 and JVM reuse.




[jira] Commented: (MAPREDUCE-1408) Allow customization of job submission policies

2010-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841344#action_12841344
 ] 

Hadoop QA commented on MAPREDUCE-1408:
--

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12437859/1408-4.patch
  against trunk revision 918864.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 9 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/18/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/18/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/18/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/18/console

This message is automatically generated.

 Allow customization of job submission policies
 --

 Key: MAPREDUCE-1408
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1408
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/gridmix
Reporter: rahul k singh
 Attachments: 1408-1.patch, 1408-2.patch, 1408-2.patch, 
 1408-20-2.patch, 1408-20-3.patch, 1408-20.patch, 1408-3.patch, 1408-4.patch


 Currently, gridmix3 replays job submissions faithfully. For evaluation 
 purposes, it would be great if we could support other job submission policies, 
 such as sequential job submission or stress job submission.
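One hypothetical shape for such pluggable policies, sketched in plain Java: the policy names and the delay contract below are illustrative assumptions, not Gridmix's actual classes.

```java
// Illustrative submission policies: REPLAY preserves the trace's original
// inter-job gaps, SERIAL waits for each job to finish before submitting the
// next, STRESS submits as fast as the cluster will accept work.
public class SubmissionPolicies {
    enum Policy { REPLAY, SERIAL, STRESS }

    // Returns the delay before the next submission: the original gap for
    // REPLAY, 0 for STRESS, and -1 for SERIAL (meaning "wait for completion
    // instead of a timed delay").
    static long delayMillis(Policy p, long originalGapMillis) {
        switch (p) {
            case REPLAY: return originalGapMillis;
            case SERIAL: return -1;
            case STRESS: return 0;
            default: throw new AssertionError();
        }
    }

    public static void main(String[] args) {
        System.out.println(delayMillis(Policy.REPLAY, 5000)); // prints "5000"
        System.out.println(delayMillis(Policy.STRESS, 5000)); // prints "0"
    }
}
```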




[jira] Commented: (MAPREDUCE-1512) RAID could use HarFileSystem directly instead of FileSystem.get

2010-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841358#action_12841358
 ] 

Hadoop QA commented on MAPREDUCE-1512:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12437733/MAPREDUCE-1512.1.patch
  against trunk revision 918864.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/499/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/499/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/499/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/499/console

This message is automatically generated.

 RAID could use HarFileSystem directly instead of FileSystem.get
 ---

 Key: MAPREDUCE-1512
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1512
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Reporter: Rodrigo Schmidt
Assignee: Rodrigo Schmidt
Priority: Minor
 Attachments: MAPREDUCE-1512.1.patch, MAPREDUCE-1512.patch


 Makes the code run slightly faster and avoids possible problems in matching 
 the right filesystem like the stale cache reported in HADOOP-6097.
 This is a minor improvement for trunk, but it is really helpful for people 
 running RAID on earlier releases susceptible to HADOOP-6097, since RAID would 
 crash on them.




[jira] Commented: (MAPREDUCE-1548) Hadoop archives should be able to preserve times and other properties from original files

2010-03-04 Thread Mahadev konar (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841372#action_12841372
 ] 

Mahadev konar commented on MAPREDUCE-1548:
--

HADOOP-6591 has been created for this. We can fix that with Avro, or with URL 
encoding in the filenames (both require upping the version).

 Hadoop archives should be able to preserve times and other properties from 
 original files
 -

 Key: MAPREDUCE-1548
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1548
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: harchive
Reporter: Rodrigo Schmidt
Assignee: Rodrigo Schmidt

 Files inside hadoop archives don't keep their original:
 - modification time
 - access time
 - permission
 - owner
 - group
 All such properties are currently taken from the file storing the archive 
 index, not from the stored files themselves. This doesn't look correct.
 It should be possible to preserve the original properties of the stored 
 files.




[jira] Commented: (MAPREDUCE-1512) RAID could use HarFileSystem directly instead of FileSystem.get

2010-03-04 Thread Rodrigo Schmidt (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841377#action_12841377
 ] 

Rodrigo Schmidt commented on MAPREDUCE-1512:


This patch is mostly refactoring the code. It simplifies some things, and 
optimizes others. I don't think we need new tests since there is no bug or new 
feature associated with it.

 RAID could use HarFileSystem directly instead of FileSystem.get
 ---

 Key: MAPREDUCE-1512
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1512
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Reporter: Rodrigo Schmidt
Assignee: Rodrigo Schmidt
Priority: Minor
 Attachments: MAPREDUCE-1512.1.patch, MAPREDUCE-1512.patch


 Makes the code run slightly faster and avoids possible problems in matching 
 the right filesystem like the stale cache reported in HADOOP-6097.
 This is a minor improvement for trunk, but it is really helpful for people 
 running RAID on earlier releases susceptible to HADOOP-6097, since RAID would 
 crash on them.




[jira] Commented: (MAPREDUCE-1501) FileInputFormat to support multi-level/recursive directory listing

2010-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841428#action_12841428
 ] 

Hudson commented on MAPREDUCE-1501:
---

Integrated in Hadoop-Mapreduce-trunk #248 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/248/])
. FileInputFormat supports multi-level, recursive 
directory listing.  (Zheng Shao via dhruba)


 FileInputFormat to support multi-level/recursive directory listing
 --

 Key: MAPREDUCE-1501
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1501
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: MAPREDUCE-1501.1.branch-0.20.patch, 
 MAPREDUCE-1501.1.trunk.patch


 As we have seen multiple times in the mailing list, users want to have the 
 capability of getting all files out of a multi-level directory structure.
 4/1/2008: 
 http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200804.mbox/%3ce75c02ef0804011433x144813e6x2450da7883de3...@mail.gmail.com%3e
 2/3/2009: 
 http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200902.mbox/%3c7f80089c-3e7f-4330-90ba-6f1c5b0b0...@nist.gov%3e
 6/2/2009: 
 http://mail-archives.apache.org/mod_mbox/hadoop-common-user/200906.mbox/%3c4a258a16.8050...@darose.net%3e
 One solution that our users had was to write a new FileInputFormat, but that 
 means all existing FileInputFormat subclasses would need to be changed in 
 order to support this feature.
 We can easily provide a JobConf option (which defaults to false) to 
 {{FileInputFormat.listStatus(...)}} to recurse into the directory 
 structure.
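A sketch of the proposed flag's semantics, with plain java.nio standing in for the Hadoop FileSystem API; the method name is chosen for illustration only.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class RecursiveListing {

    // When 'recursive' is false, only top-level files are returned (the old
    // listStatus behavior); when true, the walk descends into subdirectories.
    static List<Path> listFiles(Path dir, boolean recursive) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                if (Files.isDirectory(entry)) {
                    if (recursive) {
                        result.addAll(listFiles(entry, true));
                    }
                } else {
                    result.add(entry);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("input");
        Files.createFile(root.resolve("part-00000"));
        Path sub = Files.createDirectory(root.resolve("daily"));
        Files.createFile(sub.resolve("part-00001"));
        System.out.println(listFiles(root, false).size()); // prints "1"
        System.out.println(listFiles(root, true).size());  // prints "2"
    }
}
```

Defaulting the flag to false keeps every existing FileInputFormat subclass behaving exactly as before.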




[jira] Commented: (MAPREDUCE-1454) The servlets should quote server generated strings sent in the response

2010-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841426#action_12841426
 ] 

Hudson commented on MAPREDUCE-1454:
---

Integrated in Hadoop-Mapreduce-trunk #248 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/248/])
. Quote user supplied strings in Tracker servlets.


 The servlets should quote server generated strings sent in the response
 ---

 Key: MAPREDUCE-1454
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1454
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Affects Versions: 0.22.0
Reporter: Devaraj Das
Assignee: Chris Douglas
 Fix For: 0.22.0

 Attachments: M1454-0y20.patch, M1454-1.patch, M1454-1y20.patch, 
 M1454-2.patch, mr-1454-trunk-v1.patch


 This is related to HADOOP-6151 but for output. We need to go through all the 
 servlets/jsps and pass all the response strings that could be based on the 
 incoming request or user's data through a filter (implemented in HADOOP-6151).
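A minimal example of the kind of output quoting described: request-derived strings pass through an HTML-escaping helper before being written into the response. The class and method names are illustrative, not the filter from HADOOP-6151.

```java
public class ResponseQuoter {

    // Escape the five HTML-significant characters so user-controlled text
    // cannot inject markup or script into a servlet response.
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A job name taken from the request must be quoted before echoing.
        System.out.println(escape("<script>alert('x')</script>"));
        // prints "&lt;script&gt;alert(&#39;x&#39;)&lt;/script&gt;"
    }
}
```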




[jira] Commented: (MAPREDUCE-1455) Authorization for servlets

2010-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841427#action_12841427
 ] 

Hudson commented on MAPREDUCE-1455:
---

Integrated in Hadoop-Mapreduce-trunk #248 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/248/])


 Authorization for servlets
 --

 Key: MAPREDUCE-1455
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1455
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: jobtracker, security, tasktracker
Reporter: Devaraj Das
Assignee: Ravi Gummadi
 Fix For: 0.22.0

 Attachments: 1455.20S.2.fix.patch, 1455.20S.2.patch, 1455.patch, 
 1455.v1.patch, 1455.v2.patch, 1455.v3.patch, 1455.v4.1.patch, 
 1455.v4.2.patch, 1455.v4.patch


 This jira is about building authorization for servlets (on top of 
 MAPREDUCE-1307). That is, the JobTracker/TaskTracker runs authorization 
 checks on web requests based on the configured job permissions. For example, 
 if the job permission is 600, then no one except the authenticated user can 
 look at the job details via the browser. The authenticated user in the 
 servlet can be obtained using the HttpServletRequest method.
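A simplified sketch of the described check, with a plain method standing in for the actual JobTracker servlet code; the owner-only permission test below (group/other read bits clear) is an assumption for illustration.

```java
public class JobDetailsAuth {

    // Decide whether remoteUser (as returned by an HttpServletRequest) may
    // view the job owned by jobOwner, given Unix-style job permissions.
    static boolean canView(String remoteUser, String jobOwner, int jobPermissions) {
        // With 600-style permissions no group/other read bit is set,
        // so only the owner may read the job details.
        boolean ownerOnly = (jobPermissions & 0044) == 0;
        return !ownerOnly || jobOwner.equals(remoteUser);
    }

    public static void main(String[] args) {
        System.out.println(canView("alice", "alice", 0600)); // prints "true"
        System.out.println(canView("bob", "alice", 0600));   // prints "false"
    }
}
```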




[jira] Commented: (MAPREDUCE-1510) RAID should regenerate parity files if they get deleted

2010-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841429#action_12841429
 ] 

Hudson commented on MAPREDUCE-1510:
---

Integrated in Hadoop-Mapreduce-trunk #248 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/248/])


 RAID should regenerate parity files if they get deleted
 ---

 Key: MAPREDUCE-1510
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1510
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Reporter: Rodrigo Schmidt
Assignee: Rodrigo Schmidt
 Attachments: MAPREDUCE-1510.1.patch, MAPREDUCE-1510.2.patch, 
 MAPREDUCE-1510.patch


 Currently, if a source file has a replication factor lower than or equal to 
 that expected by RAID, the file is skipped and no parity file is generated. I 
 don't think this is good behavior, since parity files can get wrongly 
 deleted, leaving the source file with a low replication factor. In that case, 
 RAID should be able to recreate the parity file.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1385) Make changes to MapReduce for the new UserGroupInformation APIs (HADOOP-6299)

2010-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12841431#action_12841431
 ] 

Hudson commented on MAPREDUCE-1385:
---

Integrated in Hadoop-Mapreduce-trunk #248 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/248/])


 Make changes to MapReduce for the new UserGroupInformation APIs (HADOOP-6299)
 -

 Key: MAPREDUCE-1385
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1385
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Devaraj Das
Assignee: Devaraj Das
 Fix For: 0.22.0

 Attachments: mr-6299.3.patch, mr-6299.7.patch, mr-6299.8.patch, 
 mr-6299.patch


 This is about moving the MapReduce code to use the new UserGroupInformation 
 API as described in HADOOP-6299.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1309) I want to change the rumen job trace generator to use a more modular internal structure, to allow for more input log formats

2010-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12841435#action_12841435
 ] 

Hudson commented on MAPREDUCE-1309:
---

Integrated in Hadoop-Mapreduce-trunk #248 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/248/])


 I want to change the rumen job trace generator to use a more modular internal 
 structure, to allow for more input log formats 
 -

 Key: MAPREDUCE-1309
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1309
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Dick King
Assignee: Dick King
 Fix For: 0.22.0

 Attachments: demuxer-plus-concatenated-files--2009-12-21.patch, 
 demuxer-plus-concatenated-files--2010-01-06.patch, 
 demuxer-plus-concatenated-files--2010-01-08-b.patch, 
 demuxer-plus-concatenated-files--2010-01-08-c.patch, 
 demuxer-plus-concatenated-files--2010-01-08-d.patch, 
 demuxer-plus-concatenated-files--2010-01-08.patch, 
 demuxer-plus-concatenated-files--2010-01-11.patch, 
 mapreduce-1309--2009-01-14-a.patch, mapreduce-1309--2009-01-14.patch, 
 mapreduce-1309--2010-01-20.patch, mapreduce-1309--2010-02-03.patch, 
 mapreduce-1309--2010-02-04.patch, mapreduce-1309--2010-02-10.patch, 
 mapreduce-1309--2010-02-12.patch, mapreduce-1309--2010-02-16-a.patch, 
 mapreduce-1309--2010-02-16.patch, mapreduce-1309--2010-02-17.patch


 There are two orthogonal questions to answer when processing a job tracker 
 log: how will the logs and the XML configuration files be packaged, and in 
 which release of Hadoop map/reduce were the logs generated?  The existing 
 rumen only has a couple of answers to these questions.  The new engine will 
 handle three answers to the version question (0.18, 0.20, and current) and two 
 answers to the packaging question: separate files with names derived from the 
 job ID, and concatenated files with a header between sections [used for 
 easier file interchange].
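The two orthogonal axes could be modeled roughly like this; the enum and method names are hypothetical, not rumen's actual classes:

```java
public class TraceInputKind {
  // The packaging axis: how the logs and config files are laid out.
  enum Packaging { SEPARATE_FILES, CONCATENATED_WITH_HEADERS }
  // The version axis: which Hadoop release produced the logs.
  enum Version { V0_18, V0_20, CURRENT }

  // A real engine would return a parser object; a string stands in here.
  static String parserFor(Packaging p, Version v) {
    return p + "/" + v;
  }

  public static void main(String[] args) {
    System.out.println(parserFor(Packaging.SEPARATE_FILES, Version.V0_20));
  }
}
```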

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1523) Sometimes rumen trace generator fails to extract the job finish time.

2010-03-04 Thread Dick King (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12841438#action_12841438
 ] 

Dick King commented on MAPREDUCE-1523:
--

I seem to have gotten zero test failures [ 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/498/testReport/
 ] but got busted anyway.

Huh?


 Sometimes rumen trace generator fails to extract the job finish time.
 -

 Key: MAPREDUCE-1523
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1523
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Hong Tang
Assignee: Dick King
 Attachments: mapreduce-1523--2010-02-24.patch, 
 mapreduce-1523--2010-02-25.patch


 We saw sometimes (not very often) that rumen may fail to extract the job 
 finish time from Hadoop 0.20 history log.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1270) Hadoop C++ Extention

2010-03-04 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12841468#action_12841468
 ] 

Owen O'Malley commented on MAPREDUCE-1270:
--

By the way, here is an archive of the message that I sent back in Nov 07 
comparing the performance of Java, pipes, and streaming.

http://www.mail-archive.com/hadoop-u...@lucene.apache.org/msg02961.html

Especially by reimplementing the sort and shuffle, you should be able to get 
much faster than Java. *smile*

 Hadoop C++ Extention
 

 Key: MAPREDUCE-1270
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1270
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: task
Affects Versions: 0.20.1
 Environment:  hadoop linux
Reporter: Wang Shouyan

   Hadoop C++ extension is an internal project at Baidu. We started it for 
 these reasons:
 1. To provide a C++ API. We mostly used Streaming before, and we also tried 
 PIPES, but we did not find PIPES to be more efficient than Streaming. So we 
 think a new C++ extension is needed.
 2. Even using PIPES or Streaming, it is hard to control the memory of the 
 Hadoop map/reduce child JVM.
 3. It costs too much to read/write/sort TB/PB-scale data in Java, and when 
 using PIPES or Streaming, a pipe or socket is not efficient for carrying 
 such huge data.
 What we want to do:
 1. We do not use the map/reduce child JVM for any data processing. It just 
 prepares the environment, starts the C++ mapper, tells the mapper which 
 split it should deal with, and reads reports from the mapper until it 
 finishes. The mapper will read records, invoke the user-defined map, do the 
 partitioning, write spills, and combine and merge into file.out. We think 
 these operations can be done in C++ code.
 2. The reducer is similar to the mapper: it is started after the sort 
 finishes, reads from sorted files, invokes the user-defined reduce, and 
 writes to the user-defined record writer.
 3. We also intend to rewrite shuffle and sort in C++, for efficiency and 
 memory control.
 At first, 1 and 2; then 3.
 What is the difference from PIPES:
 1. Yes, we will reuse most of the PIPES code.
 2. We should do it more completely: nothing changes in scheduling and 
 management, but everything changes in execution.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1538) TrackerDistributedCacheManager can fail because the number of subdirectories reaches system limit

2010-03-04 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12841506#action_12841506
 ] 

dhruba borthakur commented on MAPREDUCE-1538:
-

Code look good. I will commit this patch

 TrackerDistributedCacheManager can fail because the number of subdirectories 
 reaches system limit
 -

 Key: MAPREDUCE-1538
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1538
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tasktracker
Affects Versions: 0.22.0
Reporter: Scott Chen
Assignee: Scott Chen
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1538.patch


 TrackerDistributedCacheManager deletes cached files when their total size 
 exceeds a configured limit, but there is no such limit on the number of 
 subdirectories. The number of subdirectories may therefore grow large and 
 exceed the system limit. When that happens, the TT cannot create a directory 
 in getLocalCache and the tasks fail.
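The missing guard the report implies might look roughly like this; the class name and cleanup trigger are assumptions, not the actual TrackerDistributedCacheManager code:

```java
public class CacheDirGuard {
  private final int maxSubdirs;
  private int subdirCount = 0;

  CacheDirGuard(int maxSubdirs) {
    this.maxSubdirs = maxSubdirs;
  }

  /** True when cleanup should run before creating another subdirectory. */
  boolean needsCleanup() {
    return subdirCount >= maxSubdirs;
  }

  /** Record that a cache subdirectory was created. */
  void created() {
    subdirCount++;
  }

  public static void main(String[] args) {
    CacheDirGuard guard = new CacheDirGuard(2);
    guard.created();
    guard.created();
    System.out.println(guard.needsCleanup()); // prints true
  }
}
```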

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1512) RAID could use HarFileSystem directly instead of FileSystem.get

2010-03-04 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated MAPREDUCE-1512:


   Resolution: Fixed
Fix Version/s: 0.22.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

I just committed this. Thanks Rodrigo!

 RAID could use HarFileSystem directly instead of FileSystem.get
 ---

 Key: MAPREDUCE-1512
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1512
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Reporter: Rodrigo Schmidt
Assignee: Rodrigo Schmidt
Priority: Minor
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1512.1.patch, MAPREDUCE-1512.patch


 Makes the code run slightly faster and avoids possible problems in matching 
 the right filesystem like the stale cache reported in HADOOP-6097.
 This is a minor improvement for trunk, but it is really helpful for people 
 running RAID on earlier releases susceptible to HADOOP-6097, since RAID would 
 crash on them.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1512) RAID could use HarFileSystem directly instead of FileSystem.get

2010-03-04 Thread Rodrigo Schmidt (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12841515#action_12841515
 ] 

Rodrigo Schmidt commented on MAPREDUCE-1512:



Thanks, Dhruba!

Now I'll submit the patch for MAPREDUCE-1518.

Cheers,
Rodrigo







 RAID could use HarFileSystem directly instead of FileSystem.get
 ---

 Key: MAPREDUCE-1512
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1512
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Reporter: Rodrigo Schmidt
Assignee: Rodrigo Schmidt
Priority: Minor
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1512.1.patch, MAPREDUCE-1512.patch


 Makes the code run slightly faster and avoids possible problems in matching 
 the right filesystem like the stale cache reported in HADOOP-6097.
 This is a minor improvement for trunk, but it is really helpful for people 
 running RAID on earlier releases susceptible to HADOOP-6097, since RAID would 
 crash on them.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1512) RAID could use HarFileSystem directly instead of FileSystem.get

2010-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12841534#action_12841534
 ] 

Hudson commented on MAPREDUCE-1512:
---

Integrated in Hadoop-Mapreduce-trunk-Commit #260 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/260/])
MAPREDUCE-1512. RAID uses HarFileSystem directly instead of
FileSystem.get (Rodrigo Schmidt via dhruba)


 RAID could use HarFileSystem directly instead of FileSystem.get
 ---

 Key: MAPREDUCE-1512
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1512
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
Reporter: Rodrigo Schmidt
Assignee: Rodrigo Schmidt
Priority: Minor
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1512.1.patch, MAPREDUCE-1512.patch


 Makes the code run slightly faster and avoids possible problems in matching 
 the right filesystem like the stale cache reported in HADOOP-6097.
 This is a minor improvement for trunk, but it is really helpful for people 
 running RAID on earlier releases susceptible to HADOOP-6097, since RAID would 
 crash on them.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1120) JobClient poll intervals should be job configurations, not cluster configurations

2010-03-04 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12841614#action_12841614
 ] 

Todd Lipcon commented on MAPREDUCE-1120:


bq. If polling intervals are job-level configuration parameters, Job. 
getCompletionPollInterval(conf) and Job.getProgressPollInterval(conf) should be 
not static methods and should not take configuration as the parameter. The 
methods should read the values from Job's conf directly. 

OK. Do we need to maintain compatibility on these static functions, since 
they're a public API? (eg mark the static ones deprecated, then make non-static 
ones that forward to the static ones for now)
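The compatibility approach sketched in the question could look like this, with a stand-in Job class; the configuration key and default value here are illustrative, not necessarily Hadoop's actual ones:

```java
import java.util.Properties;

public class Job {
  private final Properties conf = new Properties();

  /** @deprecated use the instance method {@link #getCompletionPollInterval()}. */
  @Deprecated
  public static int getCompletionPollInterval(Properties conf) {
    return Integer.parseInt(
        conf.getProperty("jobclient.completion.poll.interval", "5000"));
  }

  /** New instance method: reads the value from this job's own conf. */
  public int getCompletionPollInterval() {
    return getCompletionPollInterval(conf);  // forward to the static for now
  }

  public static void main(String[] args) {
    Job job = new Job();
    System.out.println(job.getCompletionPollInterval()); // prints 5000
  }
}
```

This keeps the public static API compiling (with a deprecation warning) while new callers migrate to the instance method.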

 JobClient poll intervals should be job configurations, not cluster 
 configurations
 -

 Key: MAPREDUCE-1120
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1120
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Minor
 Attachments: mapreduce-1120.txt


 Job.waitForCompletion gets the poll interval from the Cluster object's 
 configuration rather than its own Job configuration. This is 
 counter-intuitive - Chris and I both made this same mistake working on 
 MAPREDUCE-64, and Aaron agrees as well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1518) On contrib/raid, the RaidNode currently runs the deletion check for parity files on directories too. It would be better if it didn't.

2010-03-04 Thread Rodrigo Schmidt (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rodrigo Schmidt updated MAPREDUCE-1518:
---

Attachment: MAPREDUCE-1518.0.patch

 On contrib/raid, the RaidNode currently runs the deletion check for parity 
 files on directories too. It would be better if it didn't.
 -

 Key: MAPREDUCE-1518
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1518
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
 Environment: On contrib/raid, the RaidNode currently runs the 
 deletion check for parity files on directories too. It runs okay because the 
 directory is not empty and trying to delete it non-recursively fails, but 
 such failure messages only pollute the log file.
 My proposal is the following:
 If recursePurge is checking a directory, it should call itself recursively.
 If it's checking a file, it should do the deletion check.
Reporter: Rodrigo Schmidt
Assignee: Rodrigo Schmidt
 Attachments: MAPREDUCE-1518.0.patch
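The proposed recursePurge behavior can be sketched as below, using java.io.File in place of Hadoop's FileSystem API; deletionCheck is a placeholder for the real parity-file check:

```java
import java.io.File;

public class RecursePurgeSketch {
  static int filesChecked = 0;  // visible for the example below

  // Directories are recursed into; only plain files get the deletion check.
  static void recursePurge(File f) {
    if (f.isDirectory()) {
      File[] children = f.listFiles();
      if (children == null) {
        return;
      }
      for (File c : children) {
        recursePurge(c);
      }
    } else {
      deletionCheck(f);
    }
  }

  // Placeholder for the real check: decide whether this parity file's
  // source file still exists and delete the parity file if it does not.
  static void deletionCheck(File f) {
    filesChecked++;
  }

  public static void main(String[] args) {
    recursePurge(new File("."));
    System.out.println(filesChecked + " files checked");
  }
}
```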




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1518) On contrib/raid, the RaidNode currently runs the deletion check for parity files on directories too. It would be better if it didn't.

2010-03-04 Thread Rodrigo Schmidt (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rodrigo Schmidt updated MAPREDUCE-1518:
---

Status: Patch Available  (was: Open)

 On contrib/raid, the RaidNode currently runs the deletion check for parity 
 files on directories too. It would be better if it didn't.
 -

 Key: MAPREDUCE-1518
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1518
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
 Environment: On contrib/raid, the RaidNode currently runs the 
 deletion check for parity files on directories too. It runs okay because the 
 directory is not empty and trying to delete it non-recursively fails, but 
 such failure messages only pollute the log file.
 My proposal is the following:
 If recursePurge is checking a directory, it should call itself recursively.
 If it's checking a file, it should do the deletion check.
Reporter: Rodrigo Schmidt
Assignee: Rodrigo Schmidt
 Attachments: MAPREDUCE-1518.0.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1065) Modify the mapred tutorial documentation to use new mapreduce api.

2010-03-04 Thread Aaron Kimball (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Kimball updated MAPREDUCE-1065:
-

Attachment: MAPREDUCE-1065.2.patch

Attaching patch that addresses the issues from the above review.

 Modify the mapred tutorial documentation to use new mapreduce api.
 --

 Key: MAPREDUCE-1065
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1065
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.21.0
Reporter: Amareshwari Sriramadasu
Assignee: Aaron Kimball
Priority: Blocker
 Fix For: 0.21.0

 Attachments: MAPREDUCE-1065.2.patch, MAPREDUCE-1065.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1065) Modify the mapred tutorial documentation to use new mapreduce api.

2010-03-04 Thread Aaron Kimball (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Kimball updated MAPREDUCE-1065:
-

Status: Patch Available  (was: Open)

 Modify the mapred tutorial documentation to use new mapreduce api.
 --

 Key: MAPREDUCE-1065
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1065
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.21.0
Reporter: Amareshwari Sriramadasu
Assignee: Aaron Kimball
Priority: Blocker
 Fix For: 0.21.0

 Attachments: MAPREDUCE-1065.2.patch, MAPREDUCE-1065.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1065) Modify the mapred tutorial documentation to use new mapreduce api.

2010-03-04 Thread Aaron Kimball (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Kimball updated MAPREDUCE-1065:
-

Status: Open  (was: Patch Available)

 Modify the mapred tutorial documentation to use new mapreduce api.
 --

 Key: MAPREDUCE-1065
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1065
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.21.0
Reporter: Amareshwari Sriramadasu
Assignee: Aaron Kimball
Priority: Blocker
 Fix For: 0.21.0

 Attachments: MAPREDUCE-1065.2.patch, MAPREDUCE-1065.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1480) CombineFileRecordReader does not properly initialize child RecordReader

2010-03-04 Thread Aaron Kimball (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12841641#action_12841641
 ] 

Aaron Kimball commented on MAPREDUCE-1480:
--

Amareshwari,

Thanks for looking over this patch.

The previous progress calculator was strictly based on the number of sub-splits 
processed. The underlying RecordReader's getProgress() function was never 
called, which means that progress granularity was based only on the number of 
sub-splits and did not take intra-split progress into account. A review from 
Dhruba is definitely welcome.

I'll add another testcase as you suggest and post this in the next couple of 
days.
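The finer-grained progress computation described above can be sketched as follows; the method and parameter names are illustrative, not the actual CombineFileRecordReader code:

```java
public class CombineProgressSketch {
  // Blend whole completed sub-splits with the current reader's own progress.
  static float progress(int completedSplits, int totalSplits,
                        float currentReaderProgress) {
    if (totalSplits == 0) {
      return 1.0f;  // nothing to read counts as done
    }
    return (completedSplits + currentReaderProgress) / totalSplits;
  }

  public static void main(String[] args) {
    // Halfway through the second of four sub-splits:
    System.out.println(progress(1, 4, 0.5f)); // prints 0.375
  }
}
```

The old behavior corresponds to always passing 0 for currentReaderProgress, so progress advanced only at sub-split boundaries.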


 CombineFileRecordReader does not properly initialize child RecordReader
 ---

 Key: MAPREDUCE-1480
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1480
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Aaron Kimball
Assignee: Aaron Kimball
 Attachments: MAPREDUCE-1480.2.patch, MAPREDUCE-1480.patch


 CombineFileRecordReader instantiates child RecordReader instances but never 
 calls their initialize() method to give them the proper TaskAttemptContext.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

2010-03-04 Thread Scott Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12841653#action_12841653
 ] 

Scott Chen commented on MAPREDUCE-1221:
---

@Arun: Sorry for the very late reply. Dhruba and I have been trying to call you 
but it seems you are busy as well.

I think I got your point. The problem is that the bad job will never fail; its 
task gets killed and rescheduled again and again, which keeps hurting the 
cluster. So we should add a per-task RSS limit in this patch so that we can 
fail the bad job. This is just like what we currently do in the trunk for 
virtual memory, but here we offer the RSS memory limiting as an option (a 
trade-off between memory utilization and stability).

I will make the change and resubmit the patch soon. Thanks again for the help.


 Kill tasks on a node if the free physical memory on that machine falls below 
 a configured threshold
 ---

 Key: MAPREDUCE-1221
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tasktracker
Affects Versions: 0.22.0
Reporter: dhruba borthakur
Assignee: Scott Chen
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1221-v1.patch, MAPREDUCE-1221-v2.patch, 
 MAPREDUCE-1221-v3.patch


 The TaskTracker currently supports killing tasks if the virtual memory of a 
 task exceeds a set of configured thresholds. I would like to extend this 
 feature to enable killing tasks if the physical memory used by that task 
 exceeds a certain threshold.
 On a certain operating system (guess?), if user space processes start using 
 lots of memory, the machine hangs and dies quickly. This means that we would 
 like to prevent map-reduce jobs from triggering this condition. From my 
 understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) was 
 designed to address this problem. This works well when most map-reduce jobs 
 are Java jobs and have well-defined -Xmx parameters that specify the max 
 virtual memory for each task. On the other hand, if each task forks off 
 mappers/reducers written in other languages (python/php, etc), the total 
 virtual memory usage of the process-subtree varies greatly. In these cases, 
 it is better to use kill-tasks-using-physical-memory-limits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1435) symlinks in cwd of the task are not handled properly after MAPREDUCE-896

2010-03-04 Thread Hemanth Yamijala (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hemanth Yamijala updated MAPREDUCE-1435:


  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

I committed this patch to trunk. Thanks, Ravi !

 symlinks in cwd of the task are not handled properly after MAPREDUCE-896
 

 Key: MAPREDUCE-1435
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1435
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tasktracker
Affects Versions: 0.22.0
Reporter: Amareshwari Sriramadasu
Assignee: Ravi Gummadi
 Fix For: 0.22.0

 Attachments: 1435.patch, 1435.v1.patch, 1435.v2.patch, 1435.v3.patch, 
 1435.v4.patch, MR-1435-y20s.patch


 With JVM reuse, TaskRunner.setupWorkDir() lists the contents of workDir and 
 does a fs.delete on each path listed. If a listed file is a symlink to a 
 directory, it will delete the contents of the linked directory. This would 
 delete files from the distributed cache and jars directory, if 
 mapred.create.symlink is true.
 Changing ownership/permissions of symlinks through ENABLE_TASK_FOR_CLEANUP 
 would change the ownership/permissions of the underlying files.
 This was observed by Karam while running streaming jobs with DistributedCache 
 and JVM reuse.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1501) FileInputFormat to support multi-level/recursive directory listing

2010-03-04 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12841658#action_12841658
 ] 

Chris Douglas commented on MAPREDUCE-1501:
--

{noformat}
+import com.sun.org.apache.commons.logging.Log;
+import com.sun.org.apache.commons.logging.LogFactory;
{noformat}
Should these imports be {{org.apache.hadoop.commons.logging}}, not 
{{com.sun...}} ?

Is there a reason this feature was only added to a deprecated class, instead of 
the {{FileInputFormat}} in the {{mapreduce}} package?

 FileInputFormat to support multi-level/recursive directory listing
 --

 Key: MAPREDUCE-1501
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1501
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: MAPREDUCE-1501.1.branch-0.20.patch, 
 MAPREDUCE-1501.1.trunk.patch


 As we have seen multiple times in the mailing list, users want to have the 
 capability of getting all files out of a multi-level directory structure.
 4/1/2008: 
 http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200804.mbox/%3ce75c02ef0804011433x144813e6x2450da7883de3...@mail.gmail.com%3e
 2/3/2009: 
 http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200902.mbox/%3c7f80089c-3e7f-4330-90ba-6f1c5b0b0...@nist.gov%3e
 6/2/2009: 
 http://mail-archives.apache.org/mod_mbox/hadoop-common-user/200906.mbox/%3c4a258a16.8050...@darose.net%3e
 One solution that our users had is to write a new FileInputFormat, but that 
 means all existing FileInputFormat subclasses need to be changed in order to 
 support this feature.
 We can easily provide a JobConf option (which defaults to false) for 
 {{FileInputFormat.listStatus(...)}} to recurse into the directory 
 structure.
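A minimal sketch of recursive listing behind a boolean flag (standing in for the proposed JobConf option), using java.io.File instead of Hadoop's FileSystem so it stays self-contained; the names are hypothetical:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class RecursiveList {
  // Collect plain files under dir; descend into subdirectories only when
  // the recursive flag is set, matching the default-off behavior proposed.
  static void listFiles(File dir, boolean recursive, List<File> out) {
    File[] entries = dir.listFiles();
    if (entries == null) {
      return;  // not a directory, or unreadable
    }
    for (File e : entries) {
      if (e.isDirectory()) {
        if (recursive) {
          listFiles(e, true, out);
        }
      } else {
        out.add(e);
      }
    }
  }

  public static void main(String[] args) {
    List<File> files = new ArrayList<File>();
    listFiles(new File("."), false, files);
    System.out.println(files.size() + " top-level files");
  }
}
```

With the flag false, existing single-level behavior is preserved, so FileInputFormat subclasses would not need changes.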

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1538) TrackerDistributedCacheManager can fail because the number of subdirectories reaches system limit

2010-03-04 Thread Scott Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12841660#action_12841660
 ] 

Scott Chen commented on MAPREDUCE-1538:
---

Thanks for the help, Dhruba :)

 TrackerDistributedCacheManager can fail because the number of subdirectories 
 reaches system limit
 -

 Key: MAPREDUCE-1538
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1538
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tasktracker
Affects Versions: 0.22.0
Reporter: Scott Chen
Assignee: Scott Chen
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1538.patch


 TrackerDistributedCacheManager deletes cached files when their total size 
 exceeds a configured limit, but there is no such limit on the number of 
 subdirectories. The number of subdirectories may therefore grow large and 
 exceed the system limit. When that happens, the TT cannot create a directory 
 in getLocalCache and the tasks fail.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1408) Allow customization of job submission policies

2010-03-04 Thread rahul k singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rahul k singh updated MAPREDUCE-1408:
-

Status: Open  (was: Patch Available)

 Allow customization of job submission policies
 --

 Key: MAPREDUCE-1408
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1408
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/gridmix
Reporter: rahul k singh
 Attachments: 1408-1.patch, 1408-2.patch, 1408-2.patch, 
 1408-20-2.patch, 1408-20-3.patch, 1408-20.patch, 1408-3.patch, 1408-4.patch


 Currently, gridmix3 replays job submissions faithfully. For evaluation 
 purposes, it would be great if we could support other job submission policies 
 such as sequential job submission or stress job submission.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1408) Allow customization of job submission policies

2010-03-04 Thread rahul k singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rahul k singh updated MAPREDUCE-1408:
-

Attachment: 1408-5.patch

 Allow customization of job submission policies
 --

 Key: MAPREDUCE-1408
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1408
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/gridmix
Reporter: rahul k singh
 Attachments: 1408-1.patch, 1408-2.patch, 1408-2.patch, 
 1408-20-2.patch, 1408-20-3.patch, 1408-20-4.patch, 1408-20.patch, 
 1408-3.patch, 1408-4.patch, 1408-5.patch


 Currently, gridmix3 replays job submissions faithfully. For evaluation 
 purposes, it would be great if we could support other job submission policies 
 such as sequential job submission or stress job submission.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1408) Allow customization of job submission policies

2010-03-04 Thread rahul k singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rahul k singh updated MAPREDUCE-1408:
-

Attachment: 1408-20-4.patch

A very minute change in DebugJobProducer

 Allow customization of job submission policies
 --

 Key: MAPREDUCE-1408
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1408
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/gridmix
Reporter: rahul k singh
 Attachments: 1408-1.patch, 1408-2.patch, 1408-2.patch, 
 1408-20-2.patch, 1408-20-3.patch, 1408-20-4.patch, 1408-20.patch, 
 1408-3.patch, 1408-4.patch, 1408-5.patch


 Currently, gridmix3 replays job submissions faithfully. For evaluation 
 purposes, it would be great if we could support other job submission policies 
 such as sequential job submission or stress job submission.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1408) Allow customization of job submission policies

2010-03-04 Thread rahul k singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rahul k singh updated MAPREDUCE-1408:
-

Status: Patch Available  (was: Open)

 Allow customization of job submission policies
 --

 Key: MAPREDUCE-1408
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1408
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/gridmix
Reporter: rahul k singh
 Attachments: 1408-1.patch, 1408-2.patch, 1408-2.patch, 
 1408-20-2.patch, 1408-20-3.patch, 1408-20-4.patch, 1408-20.patch, 
 1408-3.patch, 1408-4.patch, 1408-5.patch


 Currently, gridmix3 replays job submissions faithfully. For evaluation 
 purposes, it would be great if we could support other job submission policies, 
 such as sequential job submission or stress job submission.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1408) Allow customization of job submission policies

2010-03-04 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-1408:
-

   Resolution: Fixed
Fix Version/s: 0.22.0
 Assignee: rahul k singh
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

This doesn't need to go through Hudson again; the only changes were to 
constants and the relevant test case passes.

+1

I committed this. Thanks, Rahul!

 Allow customization of job submission policies
 --

 Key: MAPREDUCE-1408
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1408
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/gridmix
Reporter: rahul k singh
Assignee: rahul k singh
 Fix For: 0.22.0

 Attachments: 1408-1.patch, 1408-2.patch, 1408-2.patch, 
 1408-20-2.patch, 1408-20-3.patch, 1408-20-4.patch, 1408-20.patch, 
 1408-3.patch, 1408-4.patch, 1408-5.patch


 Currently, gridmix3 replays job submissions faithfully. For evaluation 
 purposes, it would be great if we could support other job submission policies, 
 such as sequential job submission or stress job submission.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1518) On contrib/raid, the RaidNode currently runs the deletion check for parity files on directories too. It would be better if it didn't.

2010-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841677#action_12841677
 ] 

Hadoop QA commented on MAPREDUCE-1518:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12437948/MAPREDUCE-1518.0.patch
  against trunk revision 919173.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/20/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/20/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/20/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/20/console

This message is automatically generated.

 On contrib/raid, the RaidNode currently runs the deletion check for parity 
 files on directories too. It would be better if it didn't.
 -

 Key: MAPREDUCE-1518
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1518
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
 Environment: On contrib/raid, the RaidNode currently runs the 
 deletion check for parity files on directories too. It runs okay because the 
 directory is not empty and trying to delete it non-recursively fails, but 
 such failure messages only pollute the log file.
 My proposal is the following:
 If recursePurge is checking a directory, it should call itself recursively.
 If it's checking a file, it should do the deletion check.
Reporter: Rodrigo Schmidt
Assignee: Rodrigo Schmidt
 Attachments: MAPREDUCE-1518.0.patch
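The proposal in the environment text (recurse into directories, run the deletion check only on files) can be sketched as pure logic. The tiny Node tree below stands in for HDFS paths, and the names are illustrative, not RaidNode's actual code:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the proposed recursePurge behavior: directories are descended
// into, never deleted directly; only plain files reach the deletion check.
class PurgeSketch {
    static class Node {
        final String name;
        final List<Node> children; // null means "plain file"
        Node(String name, List<Node> children) {
            this.name = name;
            this.children = children;
        }
    }

    final List<String> checkedFiles = new ArrayList<>();

    void recursePurge(Node n) {
        if (n.children != null) {
            // Directory: recurse instead of attempting a non-recursive delete
            // that would fail and pollute the log.
            for (Node child : n.children) {
                recursePurge(child);
            }
        } else {
            // File: the parity-file deletion check would run here.
            checkedFiles.add(n.name);
        }
    }
}
```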




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2010-03-04 Thread BitsOfInfo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BitsOfInfo updated MAPREDUCE-1176:
--

Status: Patch Available  (was: Open)

Retriggering Hudson on the latest patch; could not see output from the last run.


 Contribution: FixedLengthInputFormat and FixedLengthRecordReader
 

 Key: MAPREDUCE-1176
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 0.20.2, 0.20.1
 Environment: Any
Reporter: BitsOfInfo
Priority: Minor
 Attachments: MAPREDUCE-1176-v1.patch, MAPREDUCE-1176-v2.patch, 
 MAPREDUCE-1176-v3.patch, MAPREDUCE-1176-v4.patch


 Hello,
 I would like to contribute the following two classes for incorporation into 
 the mapreduce.lib.input package. These two classes can be used when you need 
 to read data from files containing fixed-length (fixed-width) records. Such 
 files have no CR/LF (or any combination thereof) and no delimiters; each 
 record is a fixed length, and extra data is padded with spaces. The data is 
 one gigantic line within a file.
 Two classes are provided: the first is FixedLengthInputFormat, along with its 
 corresponding FixedLengthRecordReader. When creating a job that specifies 
 this input format, the job must have the 
 mapreduce.input.fixedlengthinputformat.record.length property set, as follows:
 myJobConf.setInt("mapreduce.input.fixedlengthinputformat.record.length", [myFixedRecordLength]);
 OR
 myJobConf.setInt(FixedLengthInputFormat.FIXED_RECORD_LENGTH, 
 [myFixedRecordLength]);
 This input format overrides computeSplitSize() to ensure that InputSplits do 
 not contain any partial records, since with fixed-length records there is no 
 way to determine where a record begins if that were to occur. Each 
 InputSplit passed to the FixedLengthRecordReader will start at the beginning 
 of a record, and the last byte in the InputSplit will be the last byte of a 
 record. The override of computeSplitSize() delegates to FileInputFormat's 
 compute method and then adjusts the returned split size by doing the 
 following: Math.floor(fileInputFormatsComputedSplitSize / fixedRecordLength) 
 * fixedRecordLength
 This suite of fixed-length input format classes does not support compressed 
 files.
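The split-size adjustment described above is simple integer math: truncate the split computed by FileInputFormat down to a whole multiple of the record length, so no record straddles a split boundary. The class and method names below are illustrative; only the arithmetic mirrors the description:

```java
// Sketch of the computeSplitSize() adjustment from the description above.
public class FixedLengthSplitMath {
    /** Largest multiple of recordLength that is <= computedSplitSize. */
    static long adjustSplitSize(long computedSplitSize, long recordLength) {
        if (recordLength <= 0) {
            throw new IllegalArgumentException("record length must be positive");
        }
        // Integer division floors, matching Math.floor(a / b) * b.
        return (computedSplitSize / recordLength) * recordLength;
    }

    public static void main(String[] args) {
        // A 64 MB split (67108864 bytes) with 100-byte records trims to
        // 67108800 bytes, a whole-record boundary.
        System.out.println(adjustSplitSize(64L * 1024 * 1024, 100L));
    }
}
```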

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1560) Better diagnostic message for tasks killed for going over vmem limit

2010-03-04 Thread Arun C Murthy (JIRA)
Better diagnostic message for tasks killed for going over vmem limit


 Key: MAPREDUCE-1560
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1560
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tasktracker
Reporter: Arun C Murthy
Assignee: Arun C Murthy
 Fix For: 0.22.0


Currently the user has no indication that his tasks were killed for exceeding the 
vmem limit; the only way to know is by looking at TT logs. We should get the TT to 
insert a diagnostic string for the task to indicate this.
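One possible shape for that diagnostic string is sketched below. The helper class, method, and message wording are assumptions for illustration, not the TaskTracker's actual code:

```java
// Hedged sketch of a diagnostic string the TT could attach to a task it
// kills for exceeding the vmem limit.
class VmemDiagnostic {
    static String overLimitMessage(String taskId, long usedBytes, long limitBytes) {
        return String.format(
            "TaskTree [%s] is running beyond memory limits. "
            + "Current usage: %d bytes. Limit: %d bytes. Killing task.",
            taskId, usedBytes, limitBytes);
    }
}
```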

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2010-03-04 Thread BitsOfInfo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BitsOfInfo updated MAPREDUCE-1176:
--

Status: Open  (was: Patch Available)

Retriggering Hudson on the latest patch; could not see output from the last run.


 Contribution: FixedLengthInputFormat and FixedLengthRecordReader
 

 Key: MAPREDUCE-1176
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 0.20.2, 0.20.1
 Environment: Any
Reporter: BitsOfInfo
Priority: Minor
 Attachments: MAPREDUCE-1176-v1.patch, MAPREDUCE-1176-v2.patch, 
 MAPREDUCE-1176-v3.patch, MAPREDUCE-1176-v4.patch


 Hello,
 I would like to contribute the following two classes for incorporation into 
 the mapreduce.lib.input package. These two classes can be used when you need 
 to read data from files containing fixed-length (fixed-width) records. Such 
 files have no CR/LF (or any combination thereof) and no delimiters; each 
 record is a fixed length, and extra data is padded with spaces. The data is 
 one gigantic line within a file.
 Two classes are provided: the first is FixedLengthInputFormat, along with its 
 corresponding FixedLengthRecordReader. When creating a job that specifies 
 this input format, the job must have the 
 mapreduce.input.fixedlengthinputformat.record.length property set, as follows:
 myJobConf.setInt("mapreduce.input.fixedlengthinputformat.record.length", [myFixedRecordLength]);
 OR
 myJobConf.setInt(FixedLengthInputFormat.FIXED_RECORD_LENGTH, 
 [myFixedRecordLength]);
 This input format overrides computeSplitSize() to ensure that InputSplits do 
 not contain any partial records, since with fixed-length records there is no 
 way to determine where a record begins if that were to occur. Each 
 InputSplit passed to the FixedLengthRecordReader will start at the beginning 
 of a record, and the last byte in the InputSplit will be the last byte of a 
 record. The override of computeSplitSize() delegates to FileInputFormat's 
 compute method and then adjusts the returned split size by doing the 
 following: Math.floor(fileInputFormatsComputedSplitSize / fixedRecordLength) 
 * fixedRecordLength
 This suite of fixed-length input format classes does not support compressed 
 files.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1408) Allow customization of job submission policies

2010-03-04 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841688#action_12841688
 ] 

Hudson commented on MAPREDUCE-1408:
---

Integrated in Hadoop-Mapreduce-trunk-Commit #262 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/262/])
. Add customizable job submission policies to Gridmix. Contributed by Rahul 
Singh


 Allow customization of job submission policies
 --

 Key: MAPREDUCE-1408
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1408
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/gridmix
Reporter: rahul k singh
Assignee: rahul k singh
 Fix For: 0.22.0

 Attachments: 1408-1.patch, 1408-2.patch, 1408-2.patch, 
 1408-20-2.patch, 1408-20-3.patch, 1408-20-4.patch, 1408-20.patch, 
 1408-3.patch, 1408-4.patch, 1408-5.patch


 Currently, gridmix3 replays job submissions faithfully. For evaluation 
 purposes, it would be great if we could support other job submission policies, 
 such as sequential job submission or stress job submission.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1493) Authorization for job-history pages

2010-03-04 Thread Vinod K V (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod K V updated MAPREDUCE-1493:
-

Status: Patch Available  (was: Open)

 Authorization for job-history pages
 ---

 Key: MAPREDUCE-1493
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1493
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: jobtracker, security
Reporter: Vinod K V
Assignee: Vinod K V
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1493-20100222.1.txt, 
 MAPREDUCE-1493-20100225.2.txt, MAPREDUCE-1493-20100226.1.txt, 
 MAPREDUCE-1493-20100227.2-ydist.txt, MAPREDUCE-1493-20100227.3-ydist.txt, 
 MAPREDUCE-1493-20100301.1.txt, MAPREDUCE-1493-20100304.txt


 MAPREDUCE-1455 introduces authorization for most of the Map/Reduce jsp pages 
 and servlets, but left history pages. This JIRA will make sure that 
 authorization checks are made while accessing job-history pages also.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1065) Modify the mapred tutorial documentation to use new mapreduce api.

2010-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841716#action_12841716
 ] 

Hadoop QA commented on MAPREDUCE-1065:
--

+1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12437951/MAPREDUCE-1065.2.patch
  against trunk revision 919268.

+1 @author.  The patch does not contain any @author tags.

+0 tests included.  The patch appears to be a documentation patch that 
doesn't require tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/501/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/501/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/501/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/501/console

This message is automatically generated.

 Modify the mapred tutorial documentation to use new mapreduce api.
 --

 Key: MAPREDUCE-1065
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1065
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.21.0
Reporter: Amareshwari Sriramadasu
Assignee: Aaron Kimball
Priority: Blocker
 Fix For: 0.21.0

 Attachments: MAPREDUCE-1065.2.patch, MAPREDUCE-1065.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1561) mapreduce patch tests hung with java.lang.OutOfMemoryError: Java heap space

2010-03-04 Thread Giridharan Kesavan (JIRA)
mapreduce patch tests hung with java.lang.OutOfMemoryError: Java heap space
-

 Key: MAPREDUCE-1561
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1561
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Giridharan Kesavan


http://hudson.zones.apache.org/hudson/view/Mapreduce/job/Mapreduce-Patch-h9.grid.sp2.yahoo.net/4/console

Error form the console:

 [exec] [junit] 10/03/05 04:08:29 INFO datanode.DataNode: PacketResponder 2 
for block blk_-3280111748864197295_19758 terminating
 [exec] [junit] 10/03/05 04:08:29 INFO hdfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:46067 is added to 
blk_-3280111748864197295_19758{blockUCState=UNDER_CONSTRUCTION, 
primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[127.0.0.1:46067|RBW], 
ReplicaUnderConstruction[127.0.0.1:37626|RBW], 
ReplicaUnderConstruction[127.0.0.1:48886|RBW]]} size 0
 [exec] [junit] 10/03/05 04:08:29 INFO hdfs.StateChange: DIR* 
NameSystem.completeFile: file 
/tmp/hadoop-hudson/mapred/system/job_20100304162726530_3751/job-info is closed 
by DFSClient_79157028
 [exec] [junit] 10/03/05 04:08:29 INFO mapred.JobTracker: Job 
job_20100304162726530_3751 added successfully for user 'hudson' to queue 
'default'
 [exec] [junit] 10/03/05 04:08:29 INFO mapred.JobTracker: Initializing 
job_20100304162726530_3751
 [exec] [junit] 10/03/05 04:08:29 INFO mapred.JobInProgress: 
Initializing job_20100304162726530_3751
 [exec] [junit] 10/03/05 04:08:29 INFO mapreduce.Job: Running job: 
job_20100304162726530_3751
 [exec] [junit] 10/03/05 04:08:29 INFO jobhistory.JobHistory: 
SetupWriter, creating file 
file:/grid/0/hudson/hudson-slave/workspace/Mapreduce-Patch-h9.grid.sp2.yahoo.net/trunk/build/contrib/raid/test/logs/history/job_20100304162726530_3751_hudson
 [exec] [junit] 10/03/05 04:08:29 ERROR mapred.JobTracker: Job 
initialization failed:
 [exec] [junit] org.apache.avro.AvroRuntimeException: 
java.lang.NoSuchFieldException: _SCHEMA
 [exec] [junit] at 
org.apache.avro.specific.SpecificData.createSchema(SpecificData.java:50)
 [exec] [junit] at 
org.apache.avro.reflect.ReflectData.getSchema(ReflectData.java:210)
 [exec] [junit] at 
org.apache.avro.specific.SpecificDatumWriter.&lt;init&gt;(SpecificDatumWriter.java:28)
 [exec] [junit] at 
org.apache.hadoop.mapreduce.jobhistory.EventWriter.&lt;init&gt;(EventWriter.java:47)
 [exec] [junit] at 
org.apache.hadoop.mapreduce.jobhistory.JobHistory.setupEventWriter(JobHistory.java:252)
 [exec] [junit] at 
org.apache.hadoop.mapred.JobInProgress.logSubmissionToJobHistory(JobInProgress.java:710)
 [exec] [junit] at 
org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:619)
 [exec] [junit] at 
org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:3256)
 [exec] [junit] at 
org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:79)
 [exec] [junit] at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 [exec] [junit] at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 [exec] [junit] at java.lang.Thread.run(Thread.java:619)
 [exec] [junit] Caused by: java.lang.NoSuchFieldException: _SCHEMA
 [exec] [junit] at 
java.lang.Class.getDeclaredField(Class.java:1882)
 [exec] [junit] at 
org.apache.avro.specific.SpecificData.createSchema(SpecificData.java:48)
 [exec] [junit] ... 11 more
 [exec] [junit] 
 [exec] [junit] Exception in thread "pool-1-thread-3" 
java.lang.OutOfMemoryError: Java heap space
 [exec] [junit] at java.util.Arrays.copyOf(Arrays.java:2786)
 [exec] [junit] at 
java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
 [exec] [junit] at 
java.io.PrintStream.write(PrintStream.java:430)
 [exec] [junit] at 
org.apache.tools.ant.util.TeeOutputStream.write(TeeOutputStream.java:81)
 [exec] [junit] at 
java.io.PrintStream.write(PrintStream.java:430)
 [exec] [junit] at 
sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:202)
 [exec] [junit] at 
sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:272)
 [exec] [junit] at 
sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:276)
 [exec] [junit] at 
sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:122)
 [exec] [junit] at 
java.io.OutputStreamWriter.flush(OutputStreamWriter.java:212)
 [exec] [junit] at 

[jira] Commented: (MAPREDUCE-1556) upgrade to Avro 1.3.0

2010-03-04 Thread Giridharan Kesavan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841723#action_12841723
 ] 

Giridharan Kesavan commented on MAPREDUCE-1556:
---

The patch test on Hudson has been stuck for 17 hrs; I have to kill this patch test job.
https://issues.apache.org/jira/browse/MAPREDUCE-1561

 upgrade to Avro 1.3.0
 -

 Key: MAPREDUCE-1556
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1556
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobtracker
Reporter: Doug Cutting
Assignee: Doug Cutting
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1556.patch


 Avro 1.3.0 has now been released.  HADOOP-6486 and HDFS-892 require it, and 
 the version of Avro used by MapReduce should be synchronized with these 
 projects.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1518) On contrib/raid, the RaidNode currently runs the deletion check for parity files on directories too. It would be better if it didn't.

2010-03-04 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12841729#action_12841729
 ] 

dhruba borthakur commented on MAPREDUCE-1518:
-

Code looks good. +1

 On contrib/raid, the RaidNode currently runs the deletion check for parity 
 files on directories too. It would be better if it didn't.
 -

 Key: MAPREDUCE-1518
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1518
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/raid
 Environment: On contrib/raid, the RaidNode currently runs the 
 deletion check for parity files on directories too. It runs okay because the 
 directory is not empty and trying to delete it non-recursively fails, but 
 such failure messages only pollute the log file.
 My proposal is the following:
 If recursePurge is checking a directory, it should call itself recursively.
 If it's checking a file, it should do the deletion check.
Reporter: Rodrigo Schmidt
Assignee: Rodrigo Schmidt
 Attachments: MAPREDUCE-1518.0.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2010-03-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841730#action_12841730
 ] 

Hadoop QA commented on MAPREDUCE-1176:
--

+1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12434480/MAPREDUCE-1176-v4.patch
  against trunk revision 919277.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/22/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/22/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/22/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/22/console

This message is automatically generated.

 Contribution: FixedLengthInputFormat and FixedLengthRecordReader
 

 Key: MAPREDUCE-1176
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 0.20.1, 0.20.2
 Environment: Any
Reporter: BitsOfInfo
Priority: Minor
 Attachments: MAPREDUCE-1176-v1.patch, MAPREDUCE-1176-v2.patch, 
 MAPREDUCE-1176-v3.patch, MAPREDUCE-1176-v4.patch


 Hello,
 I would like to contribute the following two classes for incorporation into 
 the mapreduce.lib.input package. These two classes can be used when you need 
 to read data from files containing fixed-length (fixed-width) records. Such 
 files have no CR/LF (or any combination thereof) and no delimiters; each 
 record is a fixed length, and extra data is padded with spaces. The data is 
 one gigantic line within a file.
 Two classes are provided: the first is FixedLengthInputFormat, along with its 
 corresponding FixedLengthRecordReader. When creating a job that specifies 
 this input format, the job must have the 
 mapreduce.input.fixedlengthinputformat.record.length property set, as follows:
 myJobConf.setInt("mapreduce.input.fixedlengthinputformat.record.length", [myFixedRecordLength]);
 OR
 myJobConf.setInt(FixedLengthInputFormat.FIXED_RECORD_LENGTH, 
 [myFixedRecordLength]);
 This input format overrides computeSplitSize() to ensure that InputSplits do 
 not contain any partial records, since with fixed-length records there is no 
 way to determine where a record begins if that were to occur. Each 
 InputSplit passed to the FixedLengthRecordReader will start at the beginning 
 of a record, and the last byte in the InputSplit will be the last byte of a 
 record. The override of computeSplitSize() delegates to FileInputFormat's 
 compute method and then adjusts the returned split size by doing the 
 following: Math.floor(fileInputFormatsComputedSplitSize / fixedRecordLength) 
 * fixedRecordLength
 This suite of fixed-length input format classes does not support compressed 
 files.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1562) TestBadRecords fails sometimes

2010-03-04 Thread Vinod K V (JIRA)
TestBadRecords fails sometimes
--

 Key: MAPREDUCE-1562
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1562
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Reporter: Vinod K V
 Fix For: 0.22.0


TestBadRecords.testMapRed fails sometimes. One instance of this was seen by 
Hudson while testing MAPREDUCE-890: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/342/testReport/.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.