[jira] Commented: (MAPREDUCE-1084) Implementing aspects development and fault injection framework for MapReduce

2009-12-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12788590#action_12788590
 ] 

Hadoop QA commented on MAPREDUCE-1084:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12427580/mapreduce-1084-6.patch
  against trunk revision 889085.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/183/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/183/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/183/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/183/console

This message is automatically generated.

 Implementing aspects development and fault injection framework for MapReduce
 

 Key: MAPREDUCE-1084
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1084
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: build, test
Reporter: Konstantin Boudnik
Assignee: Sreekanth Ramakrishnan
 Attachments: mapreduce-1084-1-withoutsvnexternals.patch, 
 mapreduce-1084-1.patch, mapreduce-1084-2.patch, mapreduce-1084-3.patch, 
 mapreduce-1084-5.patch, mapreduce-1084-6.patch, mapreduce-1084-final.patch


 Similar to HDFS-435 and HADOOP-6204, this JIRA will track the introduction of 
 an injection framework for MapReduce.
 After HADOOP-6204 is in place this particular modification should be fairly 
 trivial and would involve importing (via svn:external) src/test/build and 
 some tweaking of the build.xml file.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1284) TestLocalizationWithLinuxTaskController fails

2009-12-10 Thread Ravi Gummadi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12788594#action_12788594
 ] 

Ravi Gummadi commented on MAPREDUCE-1284:
-

The issue of the array passed to fts_open() not being null-terminated is caught 
now because of the different code generated by gcc with the -O2 option added to 
task-controller's Makefile.in in MAPREDUCE-1119.

 TestLocalizationWithLinuxTaskController fails
 -

 Key: MAPREDUCE-1284
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1284
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Ravi Gummadi
Assignee: Ravi Gummadi
 Fix For: 0.22.0

 Attachments: MR-1284.patch


 With current trunk, the testcase TestLocalizationWithLinuxTaskController 
 fails with an exit code of 139 from task-controller when doing INITIALIZE_USER

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1171) Lots of fetch failures

2009-12-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12788596#action_12788596
 ] 

Hadoop QA commented on MAPREDUCE-1171:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12427572/patch-1171-2.txt
  against trunk revision 889085.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/314/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/314/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/314/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/314/console

This message is automatically generated.

 Lots of fetch failures
 --

 Key: MAPREDUCE-1171
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1171
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task
Affects Versions: 0.21.0
Reporter: Christian Kunz
Assignee: Amareshwari Sriramadasu
Priority: Blocker
 Fix For: 0.21.0

 Attachments: patch-1171-1-ydist.txt, patch-1171-1.txt, 
 patch-1171-2.txt, patch-1171-ydist.txt, patch-1171.txt


 Since we upgraded from hadoop-0.18.3 to hadoop-0.20.1, we see a lot more map 
 task failures because of 'Too many fetch-failures'.
 One of our jobs makes hardly any progress, because 3000 reduces are not able 
 to get the map output of 2 trailing maps (with about 80GB output each), which 
 are repeatedly marked as failed because reduces are not able to get their 
 map output.
 One difference from hadoop-0.18.3 seems to be that reduce tasks report a failed 
 map-output fetch even after a single try when it was a read error 
 (cr.getError().equals(CopyOutputErrorType.READ_ERROR)). I do not think this is 
 a good idea, as trailing map tasks will be attacked by all reduces 
 simultaneously.
 Here is a log output of a reduce task:
 {noformat}
 2009-10-29 21:38:36,148 WARN org.apache.hadoop.mapred.ReduceTask: 
 attempt_200910281903_0028_r_00_0 copy failed: 
 attempt_200910281903_0028_m_002781_1 from some host
 2009-10-29 21:38:36,148 WARN org.apache.hadoop.mapred.ReduceTask: 
 java.net.SocketTimeoutException: Read timed out
 at java.net.SocketInputStream.socketRead0(Native Method)
 at java.net.SocketInputStream.read(SocketInputStream.java:129)
 at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
 at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
 at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
 at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
 at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
 at 
 sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1064)
 at 
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1496)
 at 
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1377)
 at 
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1289)
 at 
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1220)
 2009-10-29 21:38:36,149 INFO org.apache.hadoop.mapred.ReduceTask: Task 
 attempt_200910281903_0028_r_00_0: Failed fetch #1 from 
 attempt_200910281903_0028_m_002781_1
 2009-10-29 21:38:36,149 INFO org.apache.hadoop.mapred.ReduceTask: Failed to 
 fetch map-output from attempt_200910281903_0028_m_002781_1 even after 
 MAX_FETCH_RETRIES_PER_MAP retries...  or it is a read error,  reporting to 
 the JobTracker.
 {noformat}
 Also I saw a few log messages which look suspicious as if successfully 
 fetched map output is 

[jira] Commented: (MAPREDUCE-1171) Lots of fetch failures

2009-12-10 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12788603#action_12788603
 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-1171:


The test failure, TestTrackerBlacklistAcrossJobs, is unrelated to the patch. 
The log has the following ZipException:
{noformat}
2009-12-10 07:51:17,307 WARN  mapred.TaskTracker 
(TaskTracker.java:startNewTask(1887)) - Error initializing 
attempt_20091210075025802_0001_m_01_0:
java.lang.RuntimeException: java.util.zip.ZipException: ZIP_Read: error reading 
zip file
at 
org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1600)
at 
org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1408)
at 
org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1352)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:574)
at 
org.apache.hadoop.mapred.JobConf.checkAndWarnDeprecation(JobConf.java:1874)
at org.apache.hadoop.mapred.JobConf.init(JobConf.java:392)
at 
org.apache.hadoop.mapred.TaskTracker.localizeJobFiles(TaskTracker.java:925)
at 
org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:869)
at 
org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1883)
at 
org.apache.hadoop.mapred.TaskTracker.access$1200(TaskTracker.java:109)
at 
org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1848)
{noformat}

The same test passes on my machine.

 Lots of fetch failures
 --

 Key: MAPREDUCE-1171
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1171
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task
Affects Versions: 0.21.0
Reporter: Christian Kunz
Assignee: Amareshwari Sriramadasu
Priority: Blocker
 Fix For: 0.21.0

 Attachments: patch-1171-1-ydist.txt, patch-1171-1.txt, 
 patch-1171-2.txt, patch-1171-ydist.txt, patch-1171.txt


 Since we upgraded from hadoop-0.18.3 to hadoop-0.20.1, we see a lot more map 
 task failures because of 'Too many fetch-failures'.
 One of our jobs makes hardly any progress, because 3000 reduces are not able 
 to get the map output of 2 trailing maps (with about 80GB output each), which 
 are repeatedly marked as failed because reduces are not able to get their 
 map output.
 One difference from hadoop-0.18.3 seems to be that reduce tasks report a failed 
 map-output fetch even after a single try when it was a read error 
 (cr.getError().equals(CopyOutputErrorType.READ_ERROR)). I do not think this is 
 a good idea, as trailing map tasks will be attacked by all reduces 
 simultaneously.
 Here is a log output of a reduce task:
 {noformat}
 2009-10-29 21:38:36,148 WARN org.apache.hadoop.mapred.ReduceTask: 
 attempt_200910281903_0028_r_00_0 copy failed: 
 attempt_200910281903_0028_m_002781_1 from some host
 2009-10-29 21:38:36,148 WARN org.apache.hadoop.mapred.ReduceTask: 
 java.net.SocketTimeoutException: Read timed out
 at java.net.SocketInputStream.socketRead0(Native Method)
 at java.net.SocketInputStream.read(SocketInputStream.java:129)
 at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
 at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
 at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
 at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
 at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
 at 
 sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1064)
 at 
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1496)
 at 
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1377)
 at 
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1289)
 at 
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1220)
 2009-10-29 21:38:36,149 INFO org.apache.hadoop.mapred.ReduceTask: Task 
 attempt_200910281903_0028_r_00_0: Failed fetch #1 from 
 attempt_200910281903_0028_m_002781_1
 2009-10-29 21:38:36,149 INFO org.apache.hadoop.mapred.ReduceTask: Failed to 
 fetch map-output from attempt_200910281903_0028_m_002781_1 even after 
 MAX_FETCH_RETRIES_PER_MAP retries...  or it is a read error,  reporting to 
 the JobTracker.
 {noformat}
 Also I saw a few log messages which look suspicious as if successfully 
 fetched map output is discarded because of the map being marked as failed 
 (because of too many fetch failures). This would make the situation even 
 worse.
 {noformat}
 2009-10-29 22:07:28,729 INFO 

[jira] Commented: (MAPREDUCE-1218) Collecting cpu and memory usage for TaskTrackers

2009-12-10 Thread Vinod K V (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12788606#action_12788606
 ] 

Vinod K V commented on MAPREDUCE-1218:
--

Quickly looked at the patch. Some comments:
 - Your renaming script will not work at commit time either, because other 
classes still refer to the old names. You should generate a patch after 
renaming the classes yourself. Classes that need to be renamed to reflect the 
word 'Resource' are: MemoryCalculator and all its sub-classes, and 
TTMemoryReporting.
 - Further, MemoryCalculatorPlugin.java and its subclasses are all public. So 
you should still retain the old classes, deprecate them, and redirect all the 
old functionality to the new Resource* classes.
 - TaskTracker.java: Creation of the plugin should be taken out of 
initializeMemoryManagement() (+3471 through +3491) into the initialize() method.
 - TTConfig.java: TT_MEMORY_CALCULATOR_PLUGIN needs to be renamed too.
 - TaskTrackerStatus.java: Why isn't cpu-usage serialized and de-serialized 
directly? Calculating it at the time of de-serialization from cumulativeCpuTime 
will yield wrong results, I think.
 - LinuxMemoryCalculatorPlugin:
-- JIFFY_LENGTH calculation code is duplicated from MAPREDUCE-1201. Should 
that be a blocker for this one?
-- In the main method, I didn't understand why we should sleep for 500ms. 
What is the reason for doing this?
 -- This class has a lot of parsing and calculation code. Though I could 
verify it, it would be more helpful if we can write tests validating it. For 
this, we can write dummy /proc files ourselves and call the calculator class's 
methods, similar to some of the {{TestProcfsBasedProcessTree}} tests, e.g., 
testMemoryForProcessTree(). Can you add some similar tests here too, verifying 
each of the APIs?

 Collecting cpu and memory usage for TaskTrackers
 

 Key: MAPREDUCE-1218
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1218
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Affects Versions: 0.22.0
 Environment: linux
Reporter: Scott Chen
Assignee: Scott Chen
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1218-rename.sh, MAPREDUCE-1218-v2.patch, 
 MAPREDUCE-1218.patch


 The information can be used for resource aware scheduling.
 Note that this is related to MAPREDUCE-220. There the per task resource 
 information is collected.
 This one collects the per machine information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1009) Forrest documentation needs to be updated to describes features provided for supporting hierarchical queues

2009-12-10 Thread Hemanth Yamijala (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12788637#action_12788637
 ] 

Hemanth Yamijala commented on MAPREDUCE-1009:
-

A few comments on this patch:

- Why is the docs target copying xml template files to build? If it is 
needed, shouldn't the target be overridden in capacity-scheduler's build file, 
rather than putting contrib-specific operations in build.xml?
- The documentation of maximum-capacity in capacity-scheduler's 
mapred-queues.xml.template seems to have problems. E.g., it says 
maximum-capacity-stretch instead of maximum-capacity, and mentions the default 
value as 100. It also talks about sub-queues, but that becomes ambiguous if 
this is a leaf queue. The same applies to the Forrest documentation.
- Maybe we should specify in the conf template file and the forrest 
documentation which properties apply to container queues and which don't.
- In the example (using queues q1 and q2), can we scrub the property values to 
be clearer - e.g., we have capacity set to 0, which is wrong.
- The link to conf/mapred-queues.xml.template in cluster_setup.xml seems wrong. 
It is pointing to mapred-queues.xml and not the template.
- Typo: But the usage of multiple as well as hierarchical queues in actually 
dependent... - should be ...hierarchical queues is actually dependent...
- I think it makes sense to explicitly define hierarchical queues somewhere 
early on when talking about queues: what they are, how they can be used, etc. 
We refer to them in multiple places, but I don't think the intent comes out 
explicitly.
- Queue refresh is also a scheduler-specific supported feature. Should we call 
that out in the cluster-setup documentation?
- I think we need not mention that the queue web UI uses YUI. It seems like an 
implementation detail. Any specific reason for mentioning this?
- Shouldn't the Map/Reduce commands section move to the commands manual? We 
can possibly link to them from cluster-setup. There seems to be a specific 
format we are using for describing the commands, and it would be consistent to 
describe them the same way, which we can easily do by moving them to the 
commands manual.

 Forrest documentation needs to be updated to describes features provided for 
 supporting hierarchical queues
 ---

 Key: MAPREDUCE-1009
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1009
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.21.0
Reporter: Hemanth Yamijala
Assignee: Vinod K V
Priority: Blocker
 Fix For: 0.21.0

 Attachments: MAPREDUCE-1009-20091008.txt, 
 MAPREDUCE-1009-20091116.txt, MAPREDUCE-1009-20091124.txt


 Forrest documentation must be updated for describing how to set up and use 
 hierarchical queues in the framework and the capacity scheduler.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1171) Lots of fetch failures

2009-12-10 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12788651#action_12788651
 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-1171:


All unit tests except TestHdfsProxy passed on my machine with the ydist patch.

 Lots of fetch failures
 --

 Key: MAPREDUCE-1171
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1171
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task
Affects Versions: 0.21.0
Reporter: Christian Kunz
Assignee: Amareshwari Sriramadasu
Priority: Blocker
 Fix For: 0.21.0

 Attachments: patch-1171-1-ydist.txt, patch-1171-1.txt, 
 patch-1171-2.txt, patch-1171-ydist.txt, patch-1171.txt


 Since we upgraded from hadoop-0.18.3 to hadoop-0.20.1, we see a lot more map 
 task failures because of 'Too many fetch-failures'.
 One of our jobs makes hardly any progress, because 3000 reduces are not able 
 to get the map output of 2 trailing maps (with about 80GB output each), which 
 are repeatedly marked as failed because reduces are not able to get their 
 map output.
 One difference from hadoop-0.18.3 seems to be that reduce tasks report a failed 
 map-output fetch even after a single try when it was a read error 
 (cr.getError().equals(CopyOutputErrorType.READ_ERROR)). I do not think this is 
 a good idea, as trailing map tasks will be attacked by all reduces 
 simultaneously.
 Here is a log output of a reduce task:
 {noformat}
 2009-10-29 21:38:36,148 WARN org.apache.hadoop.mapred.ReduceTask: 
 attempt_200910281903_0028_r_00_0 copy failed: 
 attempt_200910281903_0028_m_002781_1 from some host
 2009-10-29 21:38:36,148 WARN org.apache.hadoop.mapred.ReduceTask: 
 java.net.SocketTimeoutException: Read timed out
 at java.net.SocketInputStream.socketRead0(Native Method)
 at java.net.SocketInputStream.read(SocketInputStream.java:129)
 at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
 at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
 at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
 at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
 at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
 at 
 sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1064)
 at 
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1496)
 at 
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1377)
 at 
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1289)
 at 
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1220)
 2009-10-29 21:38:36,149 INFO org.apache.hadoop.mapred.ReduceTask: Task 
 attempt_200910281903_0028_r_00_0: Failed fetch #1 from 
 attempt_200910281903_0028_m_002781_1
 2009-10-29 21:38:36,149 INFO org.apache.hadoop.mapred.ReduceTask: Failed to 
 fetch map-output from attempt_200910281903_0028_m_002781_1 even after 
 MAX_FETCH_RETRIES_PER_MAP retries...  or it is a read error,  reporting to 
 the JobTracker.
 {noformat}
 Also I saw a few log messages which look suspicious as if successfully 
 fetched map output is discarded because of the map being marked as failed 
 (because of too many fetch failures). This would make the situation even 
 worse.
 {noformat}
 2009-10-29 22:07:28,729 INFO org.apache.hadoop.mapred.ReduceTask: header: 
 attempt_200910281903_0028_m_001076_0, compressed len: 21882555, decompressed 
 len: 23967845
 2009-10-29 22:07:28,729 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 
 23967845 bytes (21882555 raw bytes) into RAM from 
 attempt_200910281903_0028_m_001076_0
 2009-10-29 22:07:43,602 INFO org.apache.hadoop.mapred.ReduceTask: Read 
 23967845 bytes from map-output for attempt_200910281903_0028_m_001076_0
 2009-10-29 22:07:43,602 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from 
 attempt_200910281903_0028_m_001076_0 - (20, 39772) from some host
 ...
 2009-10-29 22:10:07,220 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring 
 obsolete output of FAILED map-task: 'attempt_200910281903_0028_m_001076_0'
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1254) job.xml should add crc check in tasktracker and sub jvm.

2009-12-10 Thread ZhuGuanyin (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12788657#action_12788657
 ] 

ZhuGuanyin commented on MAPREDUCE-1254:
---

I was just showing an example that inexpensive disks are not reliable: the 
kernel doesn't notice the hardware failure while the file is being truncated.

1) job.xml is loaded into the Configuration asynchronously, and it could be 
corrupted or missing before we parse it; if that happens, the corrupted data 
or default data would be loaded without notice (that means some tasks would 
run with the right configuration, but some would run with wrong 
configurations);

2) job.xml has so many important parameters that it needs to be checked before 
use;

3) if we don't crc check, why do we generate the crc checksum file?  :)
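
To make the discussion concrete, here is a minimal sketch (not the actual 
TaskTracker code) of reading job.xml through the checksummed local filesystem 
so that the existing .crc sidecar is actually verified, failing loudly on 
corruption instead of silently falling back to defaults:

{code:java}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ChecksumException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocalFileSystem;
import org.apache.hadoop.fs.Path;

public class VerifiedJobConfLoader {
  /** Loads job.xml, verifying its existing .crc sidecar while reading. */
  public static Configuration loadVerified(String jobXmlPath) throws IOException {
    Configuration conf = new Configuration(false);
    // FileSystem.getLocal() returns the checksummed LocalFileSystem;
    // going through RawLocalFileSystem would skip the .crc verification.
    LocalFileSystem fs = FileSystem.getLocal(conf);
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (FSDataInputStream in = fs.open(new Path(jobXmlPath))) {
      byte[] buf = new byte[4096];
      int n;
      while ((n = in.read(buf)) > 0) { // each read checks against the .crc file
        bytes.write(buf, 0, n);
      }
    } catch (ChecksumException e) {
      // Corrupt job.xml: fail loudly instead of silently using defaults.
      throw new IOException("job.xml failed crc verification: " + jobXmlPath, e);
    }
    conf.addResource(new ByteArrayInputStream(bytes.toByteArray()));
    return conf;
  }
}
{code}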

 job.xml should add crc check in tasktracker and sub jvm.
 

 Key: MAPREDUCE-1254
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1254
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: task, tasktracker
Affects Versions: 0.22.0
Reporter: ZhuGuanyin

 Currently job.xml in the tasktracker and sub jvm is written to local disk 
 through ChecksumFileSystem and already has crc checksum information, but the 
 job.xml file is loaded without a crc check. This could cause the mapred job 
 to finish successfully but with wrong data because of a disk error.  Example: 
 the tasktracker and sub task jvm would load the default configuration if they 
 don't successfully load the job.xml, which may replace the mapper with 
 IdentityMapper. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-744) Support in DistributedCache to share cache files with other users after HADOOP-4493

2009-12-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12788686#action_12788686
 ] 

Hadoop QA commented on MAPREDUCE-744:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12427588/744-1.patch
  against trunk revision 889085.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 12 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/184/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/184/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/184/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/184/console

This message is automatically generated.

 Support in DistributedCache to share cache files with other users after 
 HADOOP-4493
 ---

 Key: MAPREDUCE-744
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-744
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: tasktracker
Reporter: Vinod K V
Assignee: Devaraj Das
 Fix For: 0.22.0

 Attachments: 744-1.patch, 744-early.patch


 HADOOP-4493 aims to completely privatize the files distributed to the TT via 
 DistributedCache. This jira focuses on sharing some or all of these files 
 with all other users.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-896) Users can set non-writable permissions on temporary files for TT and can abuse disk usage.

2009-12-10 Thread Ravi Gummadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Gummadi updated MAPREDUCE-896:
---

Attachment: MR-896.v2.patch

Attaching patch for trunk with review comments incorporated.

 Users can set non-writable permissions on temporary files for TT and can 
 abuse disk usage.
 --

 Key: MAPREDUCE-896
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-896
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tasktracker
Affects Versions: 0.21.0
Reporter: Vinod K V
Assignee: Ravi Gummadi
 Fix For: 0.21.0

 Attachments: MR-896.patch, MR-896.v1.patch, MR-896.v2.patch, 
 y896.v1.patch, y896.v2.1.patch, y896.v2.patch


 As of now, irrespective of the TaskController in use, the TT itself does a 
 full delete on local files created by itself or job tasks. This step, 
 depending upon the TT's umask and the permissions set on files by the user, 
 e.g. in job-work/task-work or child.tmp directories, may or may not complete 
 successfully. This leaves an opportunity for disk space usage to be abused, 
 either accidentally or intentionally, by the TT/users.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-879) TestTaskTrackerLocalization fails on MAC OS

2009-12-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12788725#action_12788725
 ] 

Hadoop QA commented on MAPREDUCE-879:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12427324/mapreduce-879-1.patch
  against trunk revision 889085.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/315/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/315/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/315/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/315/console

This message is automatically generated.

 TestTaskTrackerLocalization fails on MAC OS
 ---

 Key: MAPREDUCE-879
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-879
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: 0.21.0
 Environment: Mac OS X 10.5.7
Reporter: Devaraj Das
Assignee: Sreekanth Ramakrishnan
Priority: Blocker
 Fix For: 0.21.0

 Attachments: mapreduce-879-1.patch, 
 TEST-org.apache.hadoop.mapred.TestTaskTrackerLocalization.txt


 TestTaskTrackerLocalization failed on an 'ant test' run.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1287) HashPartitioner calls hashCode() when there is only 1 reducer

2009-12-10 Thread Tom White (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12788792#action_12788792
 ] 

Tom White commented on MAPREDUCE-1287:
--

What size of performance gain does this change give?

This might be better done in the framework, by using a special partitioner in 
the single reduce case: a class called, say, SinglePartitionPartitioner, whose 
getPartition() method always returns 0.
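
A minimal sketch of what that could look like; SinglePartitionPartitioner is 
the proposed name, not an existing class:

{code:java}
import org.apache.hadoop.mapreduce.Partitioner;

/** Proposed: substituted by the framework when there is a single reducer. */
public class SinglePartitionPartitioner<K, V> extends Partitioner<K, V> {
  @Override
  public int getPartition(K key, V value, int numPartitions) {
    return 0; // single reducer: never call key.hashCode()
  }
}
{code}

The framework could select this class whenever numReduceTasks == 1, leaving 
HashPartitioner itself untouched.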


 HashPartitioner calls hashCode() when there is only 1 reducer
 -

 Key: MAPREDUCE-1287
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1287
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.22.0
Reporter: Ed Mazur
Priority: Minor
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1287.2.patch, MAPREDUCE-1287.3.patch, 
 MAPREDUCE-1287.patch


 HashPartitioner could be optimized to not call the key's hashCode() if there 
 is only 1 reducer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (MAPREDUCE-1287) HashPartitioner calls hashCode() when there is only 1 reducer

2009-12-10 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White reassigned MAPREDUCE-1287:


Assignee: Ed Mazur

 HashPartitioner calls hashCode() when there is only 1 reducer
 -

 Key: MAPREDUCE-1287
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1287
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.22.0
Reporter: Ed Mazur
Assignee: Ed Mazur
Priority: Minor
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1287.2.patch, MAPREDUCE-1287.3.patch, 
 MAPREDUCE-1287.patch


 HashPartitioner could be optimized to not call the key's hashCode() if there 
 is only 1 reducer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1218) Collecting cpu and memory usage for TaskTrackers

2009-12-10 Thread Scott Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12788853#action_12788853
 ] 

Scott Chen commented on MAPREDUCE-1218:
---

Thanks for the review.

bq. Your renaming script will not work at commit time either, because other 
classes still refer to the old names. You should generate a patch after 
renaming the classes yourself. Classes that need to be renamed to reflect the 
word 'Resource' are: MemoryCalculator and all its sub-classes, and 
TTMemoryReporting.
I see. I will upload a patch that also renames the classes and methods. I will 
globally check 'Memory' and replace it with 'Resource' where necessary.
bq. Further, MemoryCalculatorPlugin.java and its subclasses are all public. So 
you should still retain the old classes, deprecate them, and redirect all the 
old functionality to the new Resource* classes.
I see. This way I don't have to rename so many things.
{quote}
TaskTracker.java: Creation of the plugin should be taken out of 
initializeMemoryManagement() (+3471 through +3491) into the initialize() method.
TTConfig.java: TT_MEMORY_CALCULATOR_PLUGIN needs to be renamed too.
{quote}
Will do the necessary renaming.
bq. TaskTrackerStatus.java: Why isn't cpu-usage serialized and de-serialized 
directly? Calculating it at the time of de-serialization from cumulativeCpuTime 
will yield wrong results, I think.
From /proc/stat, we can only get the cumulative time, so the CPU usage has to 
be calculated by differencing, and we will always need to sample twice to do 
the differencing. I do this on the receiving side because it saves a little 
bandwidth (one double). But you are right: this way the CPU usage may be 
affected by the nonuniform delay caused by network transmission. I will move 
the calculation to the transmitter side and serialize it. 
{quote}
# LinuxMemoryCalculatorPlugin:

* JIFFY_LENGTH calculation code is duplicated from MAPREDUCE-1201. Should 
that be a blocker for this one?
{quote}
We should use ProcfsBasedProcessTree.JIFFY_LENGTH_IN_MILLIS here. For now I 
will keep the calculation code here just to keep the code readable. After 
MAPREDUCE-1201 is committed, I will fix this part. 
{quote}
* In the main method, I didn't understand why we should sleep for 500ms. 
What is the reason for doing this?
{quote}
This is because we can only get the cumulative CPU time from /proc/stat. To 
compute the CPU usage, we need to compute the difference, so we delay a while 
and take another sample.
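
To make the differencing concrete, a small sketch; readCumulativeCpuTimeMillis() 
is a hypothetical stand-in for the plugin's /proc/stat parsing:

{code:java}
public class CpuUsageSampler {
  private long lastCpuMs = -1;
  private long lastWallMs = -1;

  /** Percentage CPU used since the previous call, or -1 on the first call. */
  public float sample(long cumulativeCpuMs) {
    long nowMs = System.currentTimeMillis();
    float usage = -1f;
    if (lastWallMs >= 0 && nowMs > lastWallMs) {
      // Differencing: cumulative CPU time delta over the wall-clock delta.
      usage = 100f * (cumulativeCpuMs - lastCpuMs) / (nowMs - lastWallMs);
    }
    lastCpuMs = cumulativeCpuMs;
    lastWallMs = nowMs;
    return usage;
  }

  public static void main(String[] args) throws InterruptedException {
    CpuUsageSampler s = new CpuUsageSampler();
    s.sample(readCumulativeCpuTimeMillis()); // first sample: no usage yet
    Thread.sleep(500);                       // the 500ms sleep discussed above
    System.out.println("cpu usage % = " + s.sample(readCumulativeCpuTimeMillis()));
  }

  /** Hypothetical stand-in; real code parses the first line of /proc/stat. */
  static long readCumulativeCpuTimeMillis() {
    return 0L;
  }
}
{code}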
{quote}
* This class has a lot of parsing and calculation code. Though I could 
verify it, it would be more helpful if we can write tests validating it. For 
this, we can write dummy /proc files ourselves and call the calculator class's 
methods, similar to some of the TestProcfsBasedProcessTree tests, e.g., 
testMemoryForProcessTree(). Can you add some similar tests here too, verifying 
each of the APIs?
{quote}
I see. These kinds of tests will definitely be helpful. I will work on it. Thanks.

 Collecting cpu and memory usage for TaskTrackers
 

 Key: MAPREDUCE-1218
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1218
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Affects Versions: 0.22.0
 Environment: linux
Reporter: Scott Chen
Assignee: Scott Chen
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1218-rename.sh, MAPREDUCE-1218-v2.patch, 
 MAPREDUCE-1218.patch


 The information can be used for resource aware scheduling.
 Note that this is related to MAPREDUCE-220. There the per task resource 
 information is collected.
 This one collects the per machine information.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (MAPREDUCE-1221) Kill tasks on a node if the free physical memory on that machine falls below a configured threshold

2009-12-10 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur reassigned MAPREDUCE-1221:
---

Assignee: Scott Chen  (was: dhruba borthakur)

 Kill tasks on a node if the free physical memory on that machine falls below 
 a configured threshold
 ---

 Key: MAPREDUCE-1221
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1221
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tasktracker
Reporter: dhruba borthakur
Assignee: Scott Chen

 The TaskTracker currently supports killing tasks if the virtual memory of a 
 task exceeds a set of configured thresholds. I would like to extend this 
 feature to enable killing tasks if the physical memory used by that task 
 exceeds a certain threshold.
 On a certain operating system (guess?), if user space processes start using 
 lots of memory, the machine hangs and dies quickly. This means that we would 
 like to prevent map-reduce jobs from triggering this condition. From my 
 understanding, the killing-based-on-virtual-memory-limits (HADOOP-5883) was 
 designed to address this problem. This works well when most map-reduce jobs 
 are Java jobs and have well-defined -Xmx parameters that specify the max 
 virtual memory for each task. On the other hand, if each task forks off 
 mappers/reducers written in other languages (python/php, etc), the total 
 virtual memory usage of the process-subtree varies greatly. In these cases, 
 it is better to use kill-tasks-using-physical-memory-limits.
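
For illustration, a rough sketch of the proposed check, assuming the threshold 
comes from configuration; nothing here is existing TaskTracker code:

{code:java}
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class PhysicalMemoryCheck {
  /** Parses MemFree out of /proc/meminfo, in kB; -1 if not found. */
  static long freePhysicalMemoryKb() throws IOException {
    try (BufferedReader r = new BufferedReader(new FileReader("/proc/meminfo"))) {
      String line;
      while ((line = r.readLine()) != null) {
        if (line.startsWith("MemFree:")) {          // e.g. "MemFree:  123456 kB"
          return Long.parseLong(line.trim().split("\\s+")[1]);
        }
      }
    }
    return -1;
  }

  /** True when free physical memory has dropped below the configured floor. */
  static boolean belowThreshold(long thresholdKb) throws IOException {
    long freeKb = freePhysicalMemoryKb();
    return freeKb >= 0 && freeKb < thresholdKb;
  }
}
{code}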

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (MAPREDUCE-1277) Streaming job should support other characterset in user's stderr log, not only utf8

2009-12-10 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao reassigned MAPREDUCE-1277:
-

Assignee: ZhuGuanyin

 Streaming job should support other characterset in user's stderr log, not 
 only utf8
 ---

 Key: MAPREDUCE-1277
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1277
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/streaming
Affects Versions: 0.21.0
Reporter: ZhuGuanyin
Assignee: ZhuGuanyin
 Fix For: 0.21.0

 Attachments: streaming-1277.patch


 The current implementation in streaming only supports utf8-encoded user 
 stderr logs; it should be encoding-agnostic to support other character sets.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1277) Streaming job should support other characterset in user's stderr log, not only utf8

2009-12-10 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1270#action_1270
 ] 

Zheng Shao commented on MAPREDUCE-1277:
---

I think a better way to do this is to add an encoding property to JobConf so 
that we can encode and decode the data correctly.
That also allows us to do codec changes if needed.

Does that make sense?
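
A minimal sketch of that suggestion; the property name stream.stderr.charset 
is hypothetical, not an existing key:

{code:java}
import java.nio.charset.Charset;

import org.apache.hadoop.mapred.JobConf;

public class StderrDecoder {
  /** Decodes a raw stderr line with the job-configured charset (utf8 default). */
  static String decode(JobConf conf, byte[] raw) {
    Charset cs = Charset.forName(conf.get("stream.stderr.charset", "UTF-8"));
    return new String(raw, cs);
  }
}
{code}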

 Streaming job should support other characterset in user's stderr log, not 
 only utf8
 ---

 Key: MAPREDUCE-1277
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1277
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/streaming
Affects Versions: 0.21.0
Reporter: ZhuGuanyin
Assignee: ZhuGuanyin
 Fix For: 0.21.0

 Attachments: streaming-1277.patch


 The current implementation in streaming only supports utf8-encoded user 
 stderr logs; it should be encoding-agnostic to support other character sets.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-967) TaskTracker does not need to fully unjar job jars

2009-12-10 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated MAPREDUCE-967:
--

Status: Open  (was: Patch Available)

 TaskTracker does not need to fully unjar job jars
 -

 Key: MAPREDUCE-967
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-967
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tasktracker
Affects Versions: 0.21.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: mapreduce-967-branch-0.20.txt, mapreduce-967.txt, 
 mapreduce-967.txt, mapreduce-967.txt, mapreduce-967.txt


 In practice we have seen some users submitting job jars that consist of 
 10,000+ classes. Unpacking these jars into mapred.local.dir and then cleaning 
 up after them has a significant cost (both in wall clock and in unnecessary 
 heavy disk utilization). This cost can be easily avoided.
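
Not the attached patch, just a sketch of the underlying idea: task classes can 
be served straight out of the jar through a classloader, with no unpacking 
into mapred.local.dir:

{code:java}
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

public class JobJarClassLoaderDemo {
  public static void main(String[] args) throws Exception {
    File jobJar = new File(args[0]);   // path to a localized job.jar
    URLClassLoader loader = new URLClassLoader(
        new URL[] { jobJar.toURI().toURL() },
        JobJarClassLoaderDemo.class.getClassLoader());
    // e.g. args[1] = the user's mapper class name
    Class<?> userClass = Class.forName(args[1], true, loader);
    System.out.println("Loaded " + userClass + " without unjarring");
  }
}
{code}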

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-967) TaskTracker does not need to fully unjar job jars

2009-12-10 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated MAPREDUCE-967:
--

Attachment: mapreduce-967.txt

Fixed patch - the previous one worked with a slightly old version of the common 
patch it depended on. Apologies for not re-running the tests locally before 
submitting to Hudson yesterday.

Ran the tests on this fixed patch overnight and they looked good here - will 
resubmit to Hudson to confirm.

 TaskTracker does not need to fully unjar job jars
 -

 Key: MAPREDUCE-967
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-967
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tasktracker
Affects Versions: 0.21.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: mapreduce-967-branch-0.20.txt, mapreduce-967.txt, 
 mapreduce-967.txt, mapreduce-967.txt, mapreduce-967.txt, mapreduce-967.txt, 
 mapreduce-967.txt


 In practice we have seen some users submitting job jars that consist of 
 10,000+ classes. Unpacking these jars into mapred.local.dir and then cleaning 
 up after them has a significant cost (both in wall clock and in unnecessary 
 heavy disk utilization). This cost can be easily avoided.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-967) TaskTracker does not need to fully unjar job jars

2009-12-10 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated MAPREDUCE-967:
--

Status: Patch Available  (was: Open)

 TaskTracker does not need to fully unjar job jars
 -

 Key: MAPREDUCE-967
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-967
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tasktracker
Affects Versions: 0.21.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: mapreduce-967-branch-0.20.txt, mapreduce-967.txt, 
 mapreduce-967.txt, mapreduce-967.txt, mapreduce-967.txt, mapreduce-967.txt, 
 mapreduce-967.txt


 In practice we have seen some users submitting job jars that consist of 
 10,000+ classes. Unpacking these jars into mapred.local.dir and then cleaning 
 up after them has a significant cost (both in wall clock and in unnecessary 
 heavy disk utilization). This cost can be easily avoided.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-967) TaskTracker does not need to fully unjar job jars

2009-12-10 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated MAPREDUCE-967:
--

Attachment: mapreduce-967.txt

Ahh, forgot --no-prefix arg on the diff. Sorry for the spam.

 TaskTracker does not need to fully unjar job jars
 -

 Key: MAPREDUCE-967
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-967
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tasktracker
Affects Versions: 0.21.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: mapreduce-967-branch-0.20.txt, mapreduce-967.txt, 
 mapreduce-967.txt, mapreduce-967.txt, mapreduce-967.txt, mapreduce-967.txt, 
 mapreduce-967.txt


 In practice we have seen some users submitting job jars that consist of 
 10,000+ classes. Unpacking these jars into mapred.local.dir and then cleaning 
 up after them has a significant cost (both in wall clock and in unnecessary 
 heavy disk utilization). This cost can be easily avoided.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1254) job.xml should add crc check in tasktracker and sub jvm.

2009-12-10 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12788912#action_12788912
 ] 

Todd Lipcon commented on MAPREDUCE-1254:


bq. if it does happen, the corrupted data or default data would load without 
notice

This seems like a bug on its own (or a bug waiting to happen).

I'm not against the CRC (I think it's a good idea) but we should also fail a 
job if job.xml fails to parse as valid XML, I think.

 job.xml should add crc check in tasktracker and sub jvm.
 

 Key: MAPREDUCE-1254
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1254
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: task, tasktracker
Affects Versions: 0.22.0
Reporter: ZhuGuanyin

 Currently job.xml in the tasktracker and sub jvm is written to local disk 
 through ChecksumFileSystem and already has crc checksum information, but the 
 job.xml file is loaded without a crc check. This could cause the mapred job 
 to finish successfully but with wrong data because of a disk error.  Example: 
 the tasktracker and sub task jvm would load the default configuration if they 
 don't successfully load the job.xml, which may replace the mapper with 
 IdentityMapper. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1288) DistributedCache localizes only once per cache URI

2009-12-10 Thread Devaraj Das (JIRA)
DistributedCache localizes only once per cache URI
--

 Key: MAPREDUCE-1288
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: security, tasktracker
Affects Versions: 0.21.0
Reporter: Devaraj Das
Priority: Blocker
 Fix For: 0.21.0


As part of the file localization, the distributed cache localizer creates a 
copy of the file in the corresponding user's private directory. The 
localization in DistributedCache uses the URI of the cachefile as the key, and 
if it already exists in the map, the localization is not done again. This 
means that another user cannot access the same distributed cache file. We 
should change the key to include the username so that localization is done 
for every user.
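
A sketch of the direction the last sentence suggests (illustrative only, not 
the eventual patch):

{code:java}
import java.net.URI;

public class CacheKeys {
  /**
   * Old behaviour keyed purely on the cache file URI, so a second user found
   * a "hit" pointing at the first user's private localized copy.
   */
  static String oldKey(URI cacheFile) {
    return cacheFile.toString();
  }

  /** Proposed: qualify the key with the user so each user localizes a copy. */
  static String perUserKey(URI cacheFile, String user) {
    return user + "#" + cacheFile;
  }
}
{code}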

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1213) TaskTrackers restart is very slow because it deletes distributed cache directory synchronously

2009-12-10 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated MAPREDUCE-1213:
--

Attachment: MAPREDUCE-1213.1.patch

This patch fixes the problem by moving the file first and removing it later 
asynchronously using a thread pool per volume.
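
A minimal sketch of that move-then-delete scheme with a single-threaded 
executor per volume (names are illustrative, not the attached patch):

{code:java}
import java.io.File;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.apache.hadoop.fs.FileUtil;

public class AsyncVolumeDeleter {
  private final Map<String, ExecutorService> perVolume =
      new ConcurrentHashMap<String, ExecutorService>();

  /** Cheap rename now; the slow recursive delete happens off the startup path. */
  public void deleteAsync(String volume, File dir) {
    final File trash = new File(volume, "toBeDeleted/" + System.nanoTime());
    trash.getParentFile().mkdirs();
    if (!dir.renameTo(trash)) {
      throw new RuntimeException("could not move " + dir + " aside for deletion");
    }
    ExecutorService exec = perVolume.get(volume);
    if (exec == null) {
      exec = Executors.newSingleThreadExecutor();
      ExecutorService prev = perVolume.putIfAbsent(volume, exec);
      if (prev != null) { exec.shutdown(); exec = prev; }
    }
    exec.execute(new Runnable() {
      public void run() {
        FileUtil.fullyDelete(trash); // the very slow part, now asynchronous
      }
    });
  }
}
{code}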


 TaskTrackers restart is very slow because it deletes distributed cache 
 directory synchronously
 --

 Key: MAPREDUCE-1213
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1213
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.1
Reporter: dhruba borthakur
Assignee: Zheng Shao
 Attachments: MAPREDUCE-1213.1.patch


 We are seeing that when we restart a tasktracker, it tries to recursively 
 delete all the files in the distributed cache. It invokes 
 FileUtil.fullyDelete(), which is very, very slow. This means that the 
 TaskTracker cannot join the cluster for an extended period of time (up to 2 
 hours for us). The problem is acute if the number of files in the distributed 
 cache is a few thousand.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1213) TaskTrackers restart is very slow because it deletes distributed cache directory synchronously

2009-12-10 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated MAPREDUCE-1213:
--

Status: Patch Available  (was: Open)

 TaskTrackers restart is very slow because it deletes distributed cache 
 directory synchronously
 --

 Key: MAPREDUCE-1213
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1213
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.1
Reporter: dhruba borthakur
Assignee: Zheng Shao
 Attachments: MAPREDUCE-1213.1.patch


 We are seeing that when we restart a tasktracker, it tries to recursively 
 delete all the files in the distributed cache. It invokes 
 FileUtil.fullyDelete(), which is very, very slow. This means that the 
 TaskTracker cannot join the cluster for an extended period of time (up to 2 
 hours for us). The problem is acute if the number of files in the distributed 
 cache is a few thousand.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1213) TaskTrackers restart is very slow because it deletes distributed cache directory synchronously

2009-12-10 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12788995#action_12788995
 ] 

Todd Lipcon commented on MAPREDUCE-1213:


Given that this almost duplicates the async disk service from the DN, could we 
move it into o.a.h.io in Common? The duplication seems a bit ugly since this is 
a nontrivial class.

 TaskTrackers restart is very slow because it deletes distributed cache 
 directory synchronously
 --

 Key: MAPREDUCE-1213
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1213
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.1
Reporter: dhruba borthakur
Assignee: Zheng Shao
 Attachments: MAPREDUCE-1213.1.patch


 We are seeing that when we restart a tasktracker, it tries to recursively 
 delete all the files in the distributed cache. It invokes 
 FileUtil.fullyDelete(), which is very, very slow. This means that the 
 TaskTracker cannot join the cluster for an extended period of time (up to 2 
 hours for us). The problem is acute if the number of files in the distributed 
 cache is a few thousand.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-744) Support in DistributedCache to share cache files with other users after HADOOP-4493

2009-12-10 Thread Devaraj Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated MAPREDUCE-744:
--

Status: Open  (was: Patch Available)

 Support in DistributedCache to share cache files with other users after 
 HADOOP-4493
 ---

 Key: MAPREDUCE-744
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-744
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: tasktracker
Reporter: Vinod K V
Assignee: Devaraj Das
 Fix For: 0.22.0

 Attachments: 744-1.patch, 744-early.patch


 HADOOP-4493 aims to completely privatize the files distributed to the TT via 
 DistributedCache. This jira focuses on sharing some or all of these files 
 with all other users.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-744) Support in DistributedCache to share cache files with other users after HADOOP-4493

2009-12-10 Thread Devaraj Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated MAPREDUCE-744:
--

Status: Patch Available  (was: Open)

Retrying hudson

 Support in DistributedCache to share cache files with other users after 
 HADOOP-4493
 ---

 Key: MAPREDUCE-744
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-744
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: tasktracker
Reporter: Vinod K V
Assignee: Devaraj Das
 Fix For: 0.22.0

 Attachments: 744-1.patch, 744-2.patch, 744-early.patch


 HADOOP-4493 aims to completely privatize the files distributed to the TT via 
 DistributedCache. This jira focuses on sharing some or all of these files 
 with all other users.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-967) TaskTracker does not need to fully unjar job jars

2009-12-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789031#action_12789031
 ] 

Hadoop QA commented on MAPREDUCE-967:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12427634/mapreduce-967.txt
  against trunk revision 889085.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 5 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/316/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/316/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/316/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/316/console

This message is automatically generated.

 TaskTracker does not need to fully unjar job jars
 -

 Key: MAPREDUCE-967
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-967
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tasktracker
Affects Versions: 0.21.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: mapreduce-967-branch-0.20.txt, mapreduce-967.txt, 
 mapreduce-967.txt, mapreduce-967.txt, mapreduce-967.txt, mapreduce-967.txt, 
 mapreduce-967.txt


 In practice we have seen some users submitting job jars that consist of 
 10,000+ classes. Unpacking these jars into mapred.local.dir and then cleaning 
 up after them has a significant cost (both in wall clock and in unnecessary 
 heavy disk utilization). This cost can be easily avoided
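A minimal sketch of the idea (illustrative only, not the attached patch; the 
class name JarOnClasspath is made up): serve the job jar to the task 
classloader directly instead of unpacking it into mapred.local.dir.

{code}
// Sketch: point a classloader at the job jar itself so nothing needs to be
// unpacked into (or cleaned out of) mapred.local.dir.
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

public class JarOnClasspath {
  public static ClassLoader forJobJar(File jobJar) throws Exception {
    URL[] urls = { jobJar.toURI().toURL() };
    return new URLClassLoader(urls, JarOnClasspath.class.getClassLoader());
  }
}
{code}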

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1222) [Mumak] We should not include nodes with numeric ips in cluster topology.

2009-12-10 Thread Hong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789032#action_12789032
 ] 

Hong Tang commented on MAPREDUCE-1222:
--

Ok, I did a bit of research and (with some help from Hairong) found that the 
numeric ip string is obtained by the NN when a DN registers itself with the NN 
through o.a.h.ipc.Server.getRemoteIAddress(), which in turn calls 
InetAddress.getHostAddress() to get the string representation of the ip 
address. For an Inet6Address, the format is always 8 hexadecimal numbers (each 
in the range from 0 to ffff) separated by : (each number may be represented 
by 1 to 4 hexadecimal characters).

So for this jira, I'd like to just have a simple regex that recognizes this 
exact format instead of arbitrary ipv6 representations.
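For illustration, a sketch of such a predicate (not the attached patch; the 
class name NumericIpGuess is made up) that accepts only the exact formats 
getHostAddress() produces:

{code}
// Sketch: recognize the exact strings InetAddress.getHostAddress() emits,
// rather than arbitrary textual ipv6 representations.
import java.util.regex.Pattern;

public class NumericIpGuess {
  // ipv4: four decimal octets separated by periods.
  private static final Pattern IPV4 =
      Pattern.compile("\\d{1,3}(\\.\\d{1,3}){3}");
  // ipv6 as emitted for an Inet6Address: eight groups of 1 to 4 hex digits
  // separated by colons, with no :: compression.
  private static final Pattern IPV6 =
      Pattern.compile("[0-9a-fA-F]{1,4}(:[0-9a-fA-F]{1,4}){7}");

  public static boolean isNumericIp(String host) {
    return IPV4.matcher(host).matches() || IPV6.matcher(host).matches();
  }
}
{code}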





 [Mumak] We should not include nodes with numeric ips in cluster topology.
 -

 Key: MAPREDUCE-1222
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1222
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/mumak
Affects Versions: 0.21.0, 0.22.0
Reporter: Hong Tang
Assignee: Hong Tang
 Fix For: 0.21.0, 0.22.0

 Attachments: IPv6-predicate.patch, mapreduce-1222-20091119.patch, 
 mapreduce-1222-20091121.patch


 Rumen infers cluster topology by parsing input split locations from job 
 history logs. Due to HDFS-778, a cluster node may appear both as a numeric ip 
 and as a host name in job history logs. We should exclude nodes that appear as 
 numeric ips from the cluster topology when we run mumak, until a solution is 
 found so that numeric ips never appear in input split locations.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-967) TaskTracker does not need to fully unjar job jars

2009-12-10 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated MAPREDUCE-967:
--

Status: Open  (was: Patch Available)

The failing streaming tests don't fail on my local box. I think it might be 
MAPREDUCE-1275 at fault here... since this patch does affect classpaths and 
streaming, I'm toggling patch status to give hudson a second go.

 TaskTracker does not need to fully unjar job jars
 -

 Key: MAPREDUCE-967
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-967
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tasktracker
Affects Versions: 0.21.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: mapreduce-967-branch-0.20.txt, mapreduce-967.txt, 
 mapreduce-967.txt, mapreduce-967.txt, mapreduce-967.txt, mapreduce-967.txt, 
 mapreduce-967.txt


 In practice we have seen some users submitting job jars that consist of 
 10,000+ classes. Unpacking these jars into mapred.local.dir and then cleaning 
 up after them has a significant cost (both in wall clock and in unnecessary 
 heavy disk utilization). This cost can be easily avoided

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-967) TaskTracker does not need to fully unjar job jars

2009-12-10 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated MAPREDUCE-967:
--

Status: Patch Available  (was: Open)

 TaskTracker does not need to fully unjar job jars
 -

 Key: MAPREDUCE-967
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-967
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tasktracker
Affects Versions: 0.21.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: mapreduce-967-branch-0.20.txt, mapreduce-967.txt, 
 mapreduce-967.txt, mapreduce-967.txt, mapreduce-967.txt, mapreduce-967.txt, 
 mapreduce-967.txt


 In practice we have seen some users submitting job jars that consist of 
 10,000+ classes. Unpacking these jars into mapred.local.dir and then cleaning 
 up after them has a significant cost (both in wall clock and in unnecessary 
 heavy disk utilization). This cost can be easily avoided

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1213) TaskTrackers restart is very slow because it deletes distributed cache directory synchronously

2009-12-10 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789042#action_12789042
 ] 

Zheng Shao commented on MAPREDUCE-1213:
---

That's the plan, but I suppose I cannot add that to common and at the same 
time use it in mapreduce?
I will open a jira in common and get that committed while this one is also 
committed.

Then we can clean things up by letting both hdfs and mapreduce use the class in 
common.

Is that good?


 TaskTrackers restart is very slow because it deletes distributed cache 
 directory synchronously
 --

 Key: MAPREDUCE-1213
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1213
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.1
Reporter: dhruba borthakur
Assignee: Zheng Shao
 Attachments: MAPREDUCE-1213.1.patch


 We are seeing that when we restart a tasktracker, it tries to recursively 
 delete all the files in the distributed cache. It invokes 
 FileUtil.fullyDelete(), which is very slow. This means that the 
 TaskTracker cannot join the cluster for an extended period of time (up to 2 
 hours for us). The problem is acute if the number of files in the distributed 
 cache is a few thousand.
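For illustration, one common shape for such a fix, sketched under the 
assumption that a rename-then-delete scheme is acceptable (this is not the 
attached patch; all names are illustrative):

{code}
// Sketch: rename the cache root aside (cheap) and let a background thread
// do the slow recursive delete, so TaskTracker startup is not blocked.
import java.io.File;
import java.io.IOException;
import org.apache.hadoop.fs.FileUtil;

public class BackgroundCleanup {
  public static void cleanupAsync(File cacheRoot) throws IOException {
    final File toDelete =
        new File(cacheRoot.getParentFile(), cacheRoot.getName() + ".toBeDeleted");
    if (!cacheRoot.renameTo(toDelete)) {
      throw new IOException("could not rename " + cacheRoot + " aside");
    }
    Thread cleaner = new Thread(new Runnable() {
      public void run() {
        try {
          FileUtil.fullyDelete(toDelete);  // the slow part, off the startup path
        } catch (Exception e) {
          // best-effort cleanup; nothing references the renamed directory anymore
        }
      }
    }, "distributed-cache-cleanup");
    cleaner.setDaemon(true);
    cleaner.start();
  }
}
{code}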

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

2009-12-10 Thread Aaron Kimball (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789044#action_12789044
 ] 

Aaron Kimball commented on MAPREDUCE-1026:
--

I am finding a NullPointerException in Shuffle when I run things with the 
LocalJobRunner:

{code}
09/12/10 16:08:58 WARN mapred.LocalJobRunner: job_local_0001
java.lang.NullPointerException
at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:108)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:358)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:299)
{code}

{{reduceTask.getJobTokens()}} is returning null; I can't see anywhere in 
LocalJobRunner where the JobTokens object is being initialized. I think this 
patch is to blame?

 Shuffle should be secure
 

 Key: MAPREDUCE-1026
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: security
Reporter: Owen O'Malley
Assignee: Boris Shkolnik
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1026-1.patch, MAPREDUCE-1026-12.patch, 
 MAPREDUCE-1026-13.patch, MAPREDUCE-1026-14.patch, MAPREDUCE-1026-15.patch, 
 MAPREDUCE-1026-2.patch, MAPREDUCE-1026-3.patch, MAPREDUCE-1026-7.patch, 
 MAPREDUCE-1026-9.patch, MAPREDUCE-1026.patch, MAPREDUCE-1026.patch


 Since the user's data is available via http from the TaskTrackers, we should 
 require a job-specific secret to access it.
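For illustration, a minimal sketch of what such a job-specific secret can look 
like (not the attached patch; class and method names are made up): the reducer 
signs the fetch URL with a per-job key and the serving TaskTracker verifies 
the signature.

{code}
// Sketch: HMAC-sign a shuffle URL with a shared per-job secret.
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import org.apache.commons.codec.binary.Base64;

public class ShuffleHmacSketch {
  public static String sign(byte[] jobSecret, String url) throws Exception {
    Mac mac = Mac.getInstance("HmacSHA1");
    mac.init(new SecretKeySpec(jobSecret, "HmacSHA1"));
    return new String(Base64.encodeBase64(mac.doFinal(url.getBytes("UTF-8"))));
  }

  // Server side: recompute the signature and compare with what was sent.
  public static boolean verify(byte[] jobSecret, String url, String hash)
      throws Exception {
    return sign(jobSecret, url).equals(hash);
  }
}
{code}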

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1171) Lots of fetch failures

2009-12-10 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789049#action_12789049
 ] 

dhruba borthakur commented on MAPREDUCE-1171:
-

hi amareshwari & jothi: can you please advise if you plan to check this patch 
into the 0.20 release? or into the yahoo dist 0.20 release?

 Lots of fetch failures
 --

 Key: MAPREDUCE-1171
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1171
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task
Affects Versions: 0.21.0
Reporter: Christian Kunz
Assignee: Amareshwari Sriramadasu
Priority: Blocker
 Fix For: 0.21.0

 Attachments: patch-1171-1-ydist.txt, patch-1171-1.txt, 
 patch-1171-2.txt, patch-1171-ydist.txt, patch-1171.txt


 Since we upgraded to hadoop-0.20.1 from hadoop-0.18.3, we see a lot more map 
 task failures because of 'Too many fetch-failures'.
 One of our jobs makes hardly any progress because 3000 reduces are not able 
 to get the map output of 2 trailing maps (with about 80GB output each), which 
 repeatedly are marked as failures because of reduces not being able to get 
 their map output.
 One difference to hadoop-0.18.3 seems to be that reduce tasks report a failed 
 map output fetch even after a single try when it was a read error 
 (cr.getError().equals(CopyOutputErrorType.READ_ERROR)). I do not think this is 
 a good idea, as trailing map tasks will be attacked by all reduces 
 simultaneously.
 Here is a log output of a reduce task:
 {noformat}
 2009-10-29 21:38:36,148 WARN org.apache.hadoop.mapred.ReduceTask: 
 attempt_200910281903_0028_r_00_0 copy failed: 
 attempt_200910281903_0028_m_002781_1 from some host
 2009-10-29 21:38:36,148 WARN org.apache.hadoop.mapred.ReduceTask: 
 java.net.SocketTimeoutException: Read timed out
 at java.net.SocketInputStream.socketRead0(Native Method)
 at java.net.SocketInputStream.read(SocketInputStream.java:129)
 at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
 at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
 at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
 at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
 at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
 at 
 sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1064)
 at 
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1496)
 at 
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1377)
 at 
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1289)
 at 
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1220)
 2009-10-29 21:38:36,149 INFO org.apache.hadoop.mapred.ReduceTask: Task 
 attempt_200910281903_0028_r_00_0: Failed fetch #1 from 
 attempt_200910281903_0028_m_002781_1
 2009-10-29 21:38:36,149 INFO org.apache.hadoop.mapred.ReduceTask: Failed to 
 fetch map-output from attempt_200910281903_0028_m_002781_1 even after 
 MAX_FETCH_RETRIES_PER_MAP retries...  or it is a read error,  reporting to 
 the JobTracker.
 {noformat}
 Also I saw a few log messages which look suspicious as if successfully 
 fetched map output is discarded because of the map being marked as failed 
 (because of too many fetch failures). This would make the situation even 
 worse.
 {noformat}
 2009-10-29 22:07:28,729 INFO org.apache.hadoop.mapred.ReduceTask: header: 
 attempt_200910281903_0028_m_001076_0, compressed len: 21882555, decompressed 
 len: 23967845
 2009-10-29 22:07:28,729 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 
 23967845 bytes (21882555 raw bytes) into RAM from 
 attempt_200910281903_0028_m_001076_0
 2009-10-29 22:07:43,602 INFO org.apache.hadoop.mapred.ReduceTask: Read 
 23967845 bytes from map-output for attempt_200910281903_0028_m_001076_0
 2009-10-29 22:07:43,602 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from 
 attempt_200910281903_0028_m_001076_0 - (20, 39772) from some host
 ...
 2009-10-29 22:10:07,220 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring 
 obsolete output of FAILED map-task: 'attempt_200910281903_0028_m_001076_0'
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1213) TaskTrackers restart is very slow because it deletes distributed cache directory synchronously

2009-12-10 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated MAPREDUCE-1213:
--

Attachment: MAPREDUCE-1213.2.patch

Fixed the comments and reorganized the class.

 TaskTrackers restart is very slow because it deletes distributed cache 
 directory synchronously
 --

 Key: MAPREDUCE-1213
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1213
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.1
Reporter: dhruba borthakur
Assignee: Zheng Shao
 Attachments: MAPREDUCE-1213.1.patch, MAPREDUCE-1213.2.patch


 We are seeing that when we restart a tasktracker, it tries to recursively 
 delete all the files in the distributed cache. It invokes 
 FileUtil.fullyDelete(), which is very slow. This means that the 
 TaskTracker cannot join the cluster for an extended period of time (up to 2 
 hours for us). The problem is acute if the number of files in the distributed 
 cache is a few thousand.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1285) DistCp cannot handle -delete if destination is local filesystem

2009-12-10 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated MAPREDUCE-1285:
--

   Resolution: Fixed
Fix Version/s: 0.22.0
   Status: Resolved  (was: Patch Available)

I have committed this.  Thanks, Peter!

 DistCp cannot handle -delete if destination is local filesystem
 ---

 Key: MAPREDUCE-1285
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1285
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: distcp
Affects Versions: 0.20.1
Reporter: Peter Romianowski
Assignee: Peter Romianowski
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1285-trunk.patch, MAPREDUCE-1285.patch


 The following exception is thrown:
 {code}
 Copy failed: java.io.IOException: wrong value class: 
 org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus is not class 
 org.apache.hadoop.fs.FileStatus
   at 
 org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:988)
   at 
 org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:977)
   at org.apache.hadoop.tools.DistCp.deleteNonexisting(DistCp.java:1226)
   at org.apache.hadoop.tools.DistCp.setup(DistCp.java:1134)
   at org.apache.hadoop.tools.DistCp.copy(DistCp.java:650)
   at org.apache.hadoop.tools.DistCp.run(DistCp.java:857)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
 {code}
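The error arises because SequenceFile.Writer.append() requires the exact 
declared value class, not a subclass. One plausible shape of a fix, sketched 
for illustration only (not the committed patch), is to copy the subclass into 
a plain FileStatus before appending:

{code}
// Sketch: round-trip the status through its Writable form so the object
// handed to SequenceFile.Writer.append() is exactly a FileStatus.
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.io.DataInputBuffer;
import org.apache.hadoop.io.DataOutputBuffer;

public class StatusCopy {
  public static FileStatus toPlainFileStatus(FileStatus st) throws IOException {
    if (st.getClass() == FileStatus.class) {
      return st;                        // already the exact declared class
    }
    DataOutputBuffer out = new DataOutputBuffer();
    st.write(out);                      // serialize whatever subclass we got
    DataInputBuffer in = new DataInputBuffer();
    in.reset(out.getData(), out.getLength());
    FileStatus copy = new FileStatus();
    copy.readFields(in);                // deserialize as the base class
    return copy;
  }
}
{code}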

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1213) TaskTrackers restart is very slow because it deletes distributed cache directory synchronously

2009-12-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789072#action_12789072
 ] 

Hadoop QA commented on MAPREDUCE-1213:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12427654/MAPREDUCE-1213.1.patch
  against trunk revision 889085.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 160 release audit warnings 
(more than the trunk's current 159 warnings).

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/185/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/185/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/185/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/185/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/185/console

This message is automatically generated.

 TaskTrackers restart is very slow because it deletes distributed cache 
 directory synchronously
 --

 Key: MAPREDUCE-1213
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1213
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.1
Reporter: dhruba borthakur
Assignee: Zheng Shao
 Attachments: MAPREDUCE-1213.1.patch, MAPREDUCE-1213.2.patch


 We are seeing that when we restart a tasktracker, it tries to recursively 
 delete all the files in the distributed cache. It invokes 
 FileUtil.fullyDelete(), which is very slow. This means that the 
 TaskTracker cannot join the cluster for an extended period of time (up to 2 
 hours for us). The problem is acute if the number of files in the distributed 
 cache is a few thousand.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1213) TaskTrackers restart is very slow because it deletes distributed cache directory synchronously

2009-12-10 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789073#action_12789073
 ] 

Todd Lipcon commented on MAPREDUCE-1213:


bq. That's the plan, but I suppose I cannot add that to common and at the same 
time use it in mapreduce?

Right. I think you should open a new JIRA in common and link this jira to be 
Blocked by it. Once that jira is committed, you can use this jira to adopt it 
in mapreduce, and file another JIRA to switch HDFS over to use it. Sound good?

 TaskTrackers restart is very slow because it deletes distributed cache 
 directory synchronously
 --

 Key: MAPREDUCE-1213
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1213
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.1
Reporter: dhruba borthakur
Assignee: Zheng Shao
 Attachments: MAPREDUCE-1213.1.patch, MAPREDUCE-1213.2.patch


 We are seeing that when we restart a tasktracker, it tries to recursively 
 delete all the files in the distributed cache. It invokes 
 FileUtil.fullyDelete(), which is very slow. This means that the 
 TaskTracker cannot join the cluster for an extended period of time (up to 2 
 hours for us). The problem is acute if the number of files in the distributed 
 cache is a few thousand.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1277) Streaming job should support other characterset in user's stderr log, not only utf8

2009-12-10 Thread ZhuGuanyin (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789081#action_12789081
 ] 

ZhuGuanyin commented on MAPREDUCE-1277:
---

I think the framework should not care what the character set of the input and 
user log is; the input or output may even contain more than one character set.

What hadoop needs to do is read raw data for the user's mapper or reducer, 
collect the raw stdout and stderr data, and save them on hdfs or the 
tasktracker's local disk.

Raw in, raw out, no matter what character set it is.
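A minimal sketch of the raw-in/raw-out idea (illustrative only; class and 
method names are made up): copy the child's stderr as bytes, with no 
Reader/Writer round trip and therefore no character set assumed.

{code}
// Sketch: pump bytes from the child's stderr straight to the log sink.
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class RawStderrPump {
  public static void pump(InputStream childStderr, OutputStream log)
      throws IOException {
    byte[] buf = new byte[4096];
    int n;
    while ((n = childStderr.read(buf)) != -1) {
      log.write(buf, 0, n);   // no decode/encode, so any character set survives
    }
    log.flush();
  }
}
{code}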

 Streaming job should support other characterset in user's stderr log, not 
 only utf8
 ---

 Key: MAPREDUCE-1277
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1277
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/streaming
Affects Versions: 0.21.0
Reporter: ZhuGuanyin
Assignee: ZhuGuanyin
 Fix For: 0.21.0

 Attachments: streaming-1277.patch


 The current implementation in streaming only supports utf8-encoded user 
 stderr logs; it should be encoding-agnostic to support other character sets.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1026) Shuffle should be secure

2009-12-10 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789083#action_12789083
 ] 

Devaraj Das commented on MAPREDUCE-1026:


I don't think so. In the local mode, shuffle shouldn't be invoked at all...

 Shuffle should be secure
 

 Key: MAPREDUCE-1026
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1026
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: security
Reporter: Owen O'Malley
Assignee: Boris Shkolnik
 Fix For: 0.22.0

 Attachments: MAPREDUCE-1026-1.patch, MAPREDUCE-1026-12.patch, 
 MAPREDUCE-1026-13.patch, MAPREDUCE-1026-14.patch, MAPREDUCE-1026-15.patch, 
 MAPREDUCE-1026-2.patch, MAPREDUCE-1026-3.patch, MAPREDUCE-1026-7.patch, 
 MAPREDUCE-1026-9.patch, MAPREDUCE-1026.patch, MAPREDUCE-1026.patch


 Since the user's data is available via http from the TaskTrackers, we should 
 require a job-specific secret to access it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-744) Support in DistributedCache to share cache files with other users after HADOOP-4493

2009-12-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789085#action_12789085
 ] 

Hadoop QA commented on MAPREDUCE-744:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12427656/744-2.patch
  against trunk revision 889085.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 12 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/317/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/317/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/317/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/317/console

This message is automatically generated.

 Support in DistributedCache to share cache files with other users after 
 HADOOP-4493
 ---

 Key: MAPREDUCE-744
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-744
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: tasktracker
Reporter: Vinod K V
Assignee: Devaraj Das
 Fix For: 0.22.0

 Attachments: 744-1.patch, 744-2.patch, 744-early.patch


 HADOOP-4493 aims to completely privatize the files distributed to TT via 
 DistributedCache. This jira issue focuses on sharing some/all of these files 
 with all other users.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1288) DistributedCache localizes only once per cache URI

2009-12-10 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789095#action_12789095
 ] 

Arun C Murthy commented on MAPREDUCE-1288:
--

I'm assuming this is done only for 'private' cache files? i.e. public cache 
files should probably use the 'username' of the TT itself?

 DistributedCache localizes only once per cache URI
 --

 Key: MAPREDUCE-1288
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: security, tasktracker
Affects Versions: 0.21.0
Reporter: Devaraj Das
Priority: Blocker
 Fix For: 0.21.0


 As part of the file localization the distributed cache localizer creates a 
 copy of the file in the corresponding user's private directory. The 
 localization in DistributedCache uses the URI of the cachefile as the key, and 
 if the key already exists in the map, the localization is not done again. This 
 means that another user cannot access the same distributed cache file. We 
 should change the key to include the username so that localization is done 
 for every user.
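For illustration, the proposed change boils down to something like the 
following sketch (not a patch; the class name CacheKey is hypothetical): key 
the localization map by a (user, URI) pair instead of the URI alone.

{code}
// Sketch: a composite key so each user gets their own localized copy.
import java.net.URI;

public class CacheKey {
  private final String user;
  private final URI cacheFile;

  public CacheKey(String user, URI cacheFile) {
    this.user = user;
    this.cacheFile = cacheFile;
  }

  @Override public boolean equals(Object o) {
    if (!(o instanceof CacheKey)) return false;
    CacheKey k = (CacheKey) o;
    return user.equals(k.user) && cacheFile.equals(k.cacheFile);
  }

  @Override public int hashCode() {
    return 31 * user.hashCode() + cacheFile.hashCode();
  }
}
{code}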

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1171) Lots of fetch failures

2009-12-10 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789099#action_12789099
 ] 

Arun C Murthy commented on MAPREDUCE-1171:
--

Dhruba - currently the plan is to put this into 21 and y20. 

 Lots of fetch failures
 --

 Key: MAPREDUCE-1171
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1171
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task
Affects Versions: 0.21.0
Reporter: Christian Kunz
Assignee: Amareshwari Sriramadasu
Priority: Blocker
 Fix For: 0.21.0

 Attachments: patch-1171-1-ydist.txt, patch-1171-1.txt, 
 patch-1171-2.txt, patch-1171-ydist.txt, patch-1171.txt


 Since we upgraded to hadoop-0.20.1 from hadoop-0.18.3, we see a lot more map 
 task failures because of 'Too many fetch-failures'.
 One of our jobs makes hardly any progress because 3000 reduces are not able 
 to get the map output of 2 trailing maps (with about 80GB output each), which 
 repeatedly are marked as failures because of reduces not being able to get 
 their map output.
 One difference to hadoop-0.18.3 seems to be that reduce tasks report a failed 
 map output fetch even after a single try when it was a read error 
 (cr.getError().equals(CopyOutputErrorType.READ_ERROR)). I do not think this is 
 a good idea, as trailing map tasks will be attacked by all reduces 
 simultaneously.
 Here is a log output of a reduce task:
 {noformat}
 2009-10-29 21:38:36,148 WARN org.apache.hadoop.mapred.ReduceTask: 
 attempt_200910281903_0028_r_00_0 copy failed: 
 attempt_200910281903_0028_m_002781_1 from some host
 2009-10-29 21:38:36,148 WARN org.apache.hadoop.mapred.ReduceTask: 
 java.net.SocketTimeoutException: Read timed out
 at java.net.SocketInputStream.socketRead0(Native Method)
 at java.net.SocketInputStream.read(SocketInputStream.java:129)
 at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
 at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
 at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
 at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
 at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
 at 
 sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1064)
 at 
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1496)
 at 
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1377)
 at 
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1289)
 at 
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1220)
 2009-10-29 21:38:36,149 INFO org.apache.hadoop.mapred.ReduceTask: Task 
 attempt_200910281903_0028_r_00_0: Failed fetch #1 from 
 attempt_200910281903_0028_m_002781_1
 2009-10-29 21:38:36,149 INFO org.apache.hadoop.mapred.ReduceTask: Failed to 
 fetch map-output from attempt_200910281903_0028_m_002781_1 even after 
 MAX_FETCH_RETRIES_PER_MAP retries...  or it is a read error,  reporting to 
 the JobTracker.
 {noformat}
 Also I saw a few log messages which look suspicious as if successfully 
 fetched map output is discarded because of the map being marked as failed 
 (because of too many fetch failures). This would make the situation even 
 worse.
 {noformat}
 2009-10-29 22:07:28,729 INFO org.apache.hadoop.mapred.ReduceTask: header: 
 attempt_200910281903_0028_m_001076_0, compressed len: 21882555, decompressed 
 len: 23967845
 2009-10-29 22:07:28,729 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 
 23967845 bytes (21882555 raw bytes) into RAM from 
 attempt_200910281903_0028_m_001076_0
 2009-10-29 22:07:43,602 INFO org.apache.hadoop.mapred.ReduceTask: Read 
 23967845 bytes from map-output for attempt_200910281903_0028_m_001076_0
 2009-10-29 22:07:43,602 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from 
 attempt_200910281903_0028_m_001076_0 - (20, 39772) from some host
 ...
 2009-10-29 22:10:07,220 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring 
 obsolete output of FAILED map-task: 'attempt_200910281903_0028_m_001076_0'
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1084) Implementing aspects development and fault injeciton framework for MapReduce

2009-12-10 Thread Sreekanth Ramakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789102#action_12789102
 ] 

Sreekanth Ramakrishnan commented on MAPREDUCE-1084:
---

The test case failures for this patch seem to be caused by MAPREDUCE-1275.

Locally there were no test case failures of any sort.

 Implementing aspects development and fault injeciton framework for MapReduce
 

 Key: MAPREDUCE-1084
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1084
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: build, test
Reporter: Konstantin Boudnik
Assignee: Sreekanth Ramakrishnan
 Attachments: mapreduce-1084-1-withoutsvnexternals.patch, 
 mapreduce-1084-1.patch, mapreduce-1084-2.patch, mapreduce-1084-3.patch, 
 mapreduce-1084-5.patch, mapreduce-1084-6.patch, mapreduce-1084-final.patch


 Similar to HDFS-435 and HADOOP-6204, this JIRA will track the introduction of 
 an injection framework for MapReduce.
 After HADOOP-6204 is in place, this particular modification should be very 
 trivial and would involve importing src/test/build (via svn:externals) and 
 some tweaking of the build.xml file.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1251) c++ utils doesn't compile

2009-12-10 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated MAPREDUCE-1251:
---

Status: Open  (was: Patch Available)

 c++ utils doesn't compile
 -

 Key: MAPREDUCE-1251
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1251
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.1, 0.20.2, 0.21.0, 0.22.0
 Environment: ubuntu karmic 64-bit
Reporter: Eli Collins
Assignee: Eli Collins
 Attachments: HDFS-790-1.patch, HDFS-790.patch, MR-1251.patch


 c++ utils doesn't compile on ubuntu karmic 64-bit. The latest patch for 
 HADOOP-5611 needs to be applied first.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1251) c++ utils doesn't compile

2009-12-10 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated MAPREDUCE-1251:
---

Status: Patch Available  (was: Open)

 c++ utils doesn't compile
 -

 Key: MAPREDUCE-1251
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1251
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.1, 0.20.2, 0.21.0, 0.22.0
 Environment: ubuntu karmic 64-bit
Reporter: Eli Collins
Assignee: Eli Collins
 Attachments: HDFS-790-1.patch, HDFS-790.patch, MR-1251.patch


 c++ utils doesn't compile on ubuntu karmic 64-bit. The latest patch for 
 HADOOP-5611 needs to be applied first.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1171) Lots of fetch failures

2009-12-10 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated MAPREDUCE-1171:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

I've just committed this. Thanks, Amareshwari!

 Lots of fetch failures
 --

 Key: MAPREDUCE-1171
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1171
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task
Affects Versions: 0.21.0
Reporter: Christian Kunz
Assignee: Amareshwari Sriramadasu
Priority: Blocker
 Fix For: 0.21.0

 Attachments: patch-1171-1-ydist.txt, patch-1171-1.txt, 
 patch-1171-2.txt, patch-1171-ydist.txt, patch-1171.txt


 Since we upgraded to hadoop-0.20.1 from hadoop-0.18.3, we see a lot more map 
 task failures because of 'Too many fetch-failures'.
 One of our jobs makes hardly any progress because 3000 reduces are not able 
 to get the map output of 2 trailing maps (with about 80GB output each), which 
 repeatedly are marked as failures because of reduces not being able to get 
 their map output.
 One difference to hadoop-0.18.3 seems to be that reduce tasks report a failed 
 map output fetch even after a single try when it was a read error 
 (cr.getError().equals(CopyOutputErrorType.READ_ERROR)). I do not think this is 
 a good idea, as trailing map tasks will be attacked by all reduces 
 simultaneously.
 Here is a log output of a reduce task:
 {noformat}
 2009-10-29 21:38:36,148 WARN org.apache.hadoop.mapred.ReduceTask: 
 attempt_200910281903_0028_r_00_0 copy failed: 
 attempt_200910281903_0028_m_002781_1 from some host
 2009-10-29 21:38:36,148 WARN org.apache.hadoop.mapred.ReduceTask: 
 java.net.SocketTimeoutException: Read timed out
 at java.net.SocketInputStream.socketRead0(Native Method)
 at java.net.SocketInputStream.read(SocketInputStream.java:129)
 at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
 at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
 at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
 at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
 at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
 at 
 sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1064)
 at 
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1496)
 at 
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1377)
 at 
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1289)
 at 
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1220)
 2009-10-29 21:38:36,149 INFO org.apache.hadoop.mapred.ReduceTask: Task 
 attempt_200910281903_0028_r_00_0: Failed fetch #1 from 
 attempt_200910281903_0028_m_002781_1
 2009-10-29 21:38:36,149 INFO org.apache.hadoop.mapred.ReduceTask: Failed to 
 fetch map-output from attempt_200910281903_0028_m_002781_1 even after 
 MAX_FETCH_RETRIES_PER_MAP retries...  or it is a read error,  reporting to 
 the JobTracker.
 {noformat}
 Also I saw a few log messages which look suspicious as if successfully 
 fetched map output is discarded because of the map being marked as failed 
 (because of too many fetch failures). This would make the situation even 
 worse.
 {noformat}
 2009-10-29 22:07:28,729 INFO org.apache.hadoop.mapred.ReduceTask: header: 
 attempt_200910281903_0028_m_001076_0, compressed len: 21882555, decompressed 
 len: 23967845
 2009-10-29 22:07:28,729 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 
 23967845 bytes (21882555 raw bytes) into RAM from 
 attempt_200910281903_0028_m_001076_0
 2009-10-29 22:07:43,602 INFO org.apache.hadoop.mapred.ReduceTask: Read 
 23967845 bytes from map-output for attempt_200910281903_0028_m_001076_0
 2009-10-29 22:07:43,602 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from 
 attempt_200910281903_0028_m_001076_0 - (20, 39772) from some host
 ...
 2009-10-29 22:10:07,220 INFO org.apache.hadoop.mapred.ReduceTask: Ignoring 
 obsolete output of FAILED map-task: 'attempt_200910281903_0028_m_001076_0'
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2009-12-10 Thread BitsOfInfo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BitsOfInfo updated MAPREDUCE-1176:
--

Status: Patch Available  (was: Open)

trying to trigger hudson to check v3

 Contribution: FixedLengthInputFormat and FixedLengthRecordReader
 

 Key: MAPREDUCE-1176
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 0.20.1, 0.20.2
 Environment: Any
Reporter: BitsOfInfo
Priority: Minor
 Attachments: MAPREDUCE-1176-v1.patch, MAPREDUCE-1176-v2.patch, 
 MAPREDUCE-1176-v3.patch


 Hello,
 I would like to contribute the following two classes for incorporation into 
 the mapreduce.lib.input package. These two classes can be used when you need 
 to read data from files containing fixed length (fixed width) records. Such 
 files have no CR/LF (or any combination thereof), no delimiters, etc., but each 
 record is a fixed length, and extra data is padded with spaces. The data is 
 one gigantic line within a file.
 Provided are two classes: the first is FixedLengthInputFormat and the second 
 its corresponding FixedLengthRecordReader. When creating a job that specifies 
 this input format, the job must have the 
 mapreduce.input.fixedlengthinputformat.record.length property set as follows:
 myJobConf.setInt("mapreduce.input.fixedlengthinputformat.record.length", [myFixedRecordLength]);
 OR
 myJobConf.setInt(FixedLengthInputFormat.FIXED_RECORD_LENGTH, 
 [myFixedRecordLength]);
 This input format overrides computeSplitSize() in order to ensure that 
 InputSplits do not contain any partial records since with fixed records there 
 is no way to determine where a record begins if that were to occur. Each 
 InputSplit passed to the FixedLengthRecordReader will start at the beginning 
 of a record, and the last byte in the InputSplit will be the last byte of a 
 record. The override of computeSplitSize() delegates to FileInputFormat's 
 compute method, and then adjusts the returned split size by doing the 
 following: (Math.floor(fileInputFormatsComputedSplitSize / fixedRecordLength) 
 * fixedRecordLength)
 This suite of fixed length input format classes does not support compressed 
 files. 
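For illustration, the split-size adjustment described above reduces to the 
following arithmetic (the real override lives in the attached patches; this 
standalone helper is just a sketch):

{code}
// Sketch: round the split size FileInputFormat computed down to a whole
// number of fixed-length records, so no InputSplit ends mid-record.
public class FixedLengthSplitMath {
  public static long adjustSplitSize(long computedSplitSize, long recordLength) {
    if (computedSplitSize < recordLength) {
      return recordLength;              // never go below one whole record
    }
    return (computedSplitSize / recordLength) * recordLength;
  }
}
{code}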

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2009-12-10 Thread BitsOfInfo (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BitsOfInfo updated MAPREDUCE-1176:
--

Status: Open  (was: Patch Available)

trying to trigger hudson to check v3

 Contribution: FixedLengthInputFormat and FixedLengthRecordReader
 

 Key: MAPREDUCE-1176
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 0.20.1, 0.20.2
 Environment: Any
Reporter: BitsOfInfo
Priority: Minor
 Attachments: MAPREDUCE-1176-v1.patch, MAPREDUCE-1176-v2.patch, 
 MAPREDUCE-1176-v3.patch


 Hello,
 I would like to contribute the following two classes for incorporation into 
 the mapreduce.lib.input package. These two classes can be used when you need 
 to read data from files containing fixed length (fixed width) records. Such 
 files have no CR/LF (or any combination thereof), no delimiters, etc., but each 
 record is a fixed length, and extra data is padded with spaces. The data is 
 one gigantic line within a file.
 Provided are two classes: the first is FixedLengthInputFormat and the second 
 its corresponding FixedLengthRecordReader. When creating a job that specifies 
 this input format, the job must have the 
 mapreduce.input.fixedlengthinputformat.record.length property set as follows:
 myJobConf.setInt("mapreduce.input.fixedlengthinputformat.record.length", [myFixedRecordLength]);
 OR
 myJobConf.setInt(FixedLengthInputFormat.FIXED_RECORD_LENGTH, 
 [myFixedRecordLength]);
 This input format overrides computeSplitSize() in order to ensure that 
 InputSplits do not contain any partial records since with fixed records there 
 is no way to determine where a record begins if that were to occur. Each 
 InputSplit passed to the FixedLengthRecordReader will start at the beginning 
 of a record, and the last byte in the InputSplit will be the last byte of a 
 record. The override of computeSplitSize() delegates to FileInputFormat's 
 compute method, and then adjusts the returned split size by doing the 
 following: (Math.floor(fileInputFormatsComputedSplitSize / fixedRecordLength) 
 * fixedRecordLength)
 This suite of fixed length input format classes does not support compressed 
 files. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-967) TaskTracker does not need to fully unjar job jars

2009-12-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789107#action_12789107
 ] 

Hadoop QA commented on MAPREDUCE-967:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12427634/mapreduce-967.txt
  against trunk revision 889486.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 5 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/186/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/186/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/186/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/186/console

This message is automatically generated.

 TaskTracker does not need to fully unjar job jars
 -

 Key: MAPREDUCE-967
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-967
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tasktracker
Affects Versions: 0.21.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: mapreduce-967-branch-0.20.txt, mapreduce-967.txt, 
 mapreduce-967.txt, mapreduce-967.txt, mapreduce-967.txt, mapreduce-967.txt, 
 mapreduce-967.txt


 In practice we have seen some users submitting job jars that consist of 
 10,000+ classes. Unpacking these jars into mapred.local.dir and then cleaning 
 up after them has a significant cost (both in wall clock and in unnecessary 
 heavy disk utilization). This cost can be easily avoided

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1288) DistributedCache localizes only once per cache URI

2009-12-10 Thread Vinod K V (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789122#action_12789122
 ] 

Vinod K V commented on MAPREDUCE-1288:
--

+1 for putting the username also as part of the key.

 DistributedCache localizes only once per cache URI
 --

 Key: MAPREDUCE-1288
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: security, tasktracker
Affects Versions: 0.21.0
Reporter: Devaraj Das
Priority: Blocker
 Fix For: 0.21.0


 As part of the file localization the distributed cache localizer creates a 
 copy of the file in the corresponding user's private directory. The 
 localization in DistributedCache uses the URI of the cachefile as the key, and 
 if the key already exists in the map, the localization is not done again. This 
 means that another user cannot access the same distributed cache file. We 
 should change the key to include the username so that localization is done 
 for every user.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1288) DistributedCache localizes only once per cache URI

2009-12-10 Thread Hemanth Yamijala (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789128#action_12789128
 ] 

Hemanth Yamijala commented on MAPREDUCE-1288:
-

I suppose one could argue that if two different users can access the same set 
of files on the DFS for localization, they are 'public'. But then, you could 
theoretically construct a use case where there's a 'group' access for some 
files on DFS and these are localized per user on the task tracker. Is that what 
we're trying to address?

 DistributedCache localizes only once per cache URI
 --

 Key: MAPREDUCE-1288
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: security, tasktracker
Affects Versions: 0.21.0
Reporter: Devaraj Das
Priority: Blocker
 Fix For: 0.21.0


 As part of the file localization the distributed cache localizer creates a 
 copy of the file in the corresponding user's private directory. The 
 localization in DistributedCache uses the URI of the cachefile as the key, and 
 if the key already exists in the map, the localization is not done again. This 
 means that another user cannot access the same distributed cache file. We 
 should change the key to include the username so that localization is done 
 for every user.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1084) Implementing aspects development and fault injeciton framework for MapReduce

2009-12-10 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789130#action_12789130
 ] 

Konstantin Boudnik commented on MAPREDUCE-1084:
---

Great. Looks like it is good to go. I'll commit it tomorrow morning if there is 
no one else around to do it earlier :-)

 Implementing aspects development and fault injeciton framework for MapReduce
 

 Key: MAPREDUCE-1084
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1084
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: build, test
Reporter: Konstantin Boudnik
Assignee: Sreekanth Ramakrishnan
 Attachments: mapreduce-1084-1-withoutsvnexternals.patch, 
 mapreduce-1084-1.patch, mapreduce-1084-2.patch, mapreduce-1084-3.patch, 
 mapreduce-1084-5.patch, mapreduce-1084-6.patch, mapreduce-1084-final.patch


 Similar to HDFS-435 and HADOOP-6204, this JIRA will track the introduction of 
 an injection framework for MapReduce.
 After HADOOP-6204 is in place, this particular modification should be very 
 trivial and would involve importing src/test/build (via svn:externals) and 
 some tweaking of the build.xml file.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1288) DistributedCache localizes only once per cache URI

2009-12-10 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789134#action_12789134
 ] 

Devaraj Das commented on MAPREDUCE-1288:


Look at this scenario - the URI is hdfs://host:port/foo/bar/file.txt. Even 
if the entire path were accessible to everyone, the TaskTracker would localize 
it exactly once, and in a user's private directory. A second job wishing to 
access the same file wouldn't be able to do so, since the TT wouldn't localize 
it again. Does that make sense?

 DistributedCache localizes only once per cache URI
 --

 Key: MAPREDUCE-1288
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: security, tasktracker
Affects Versions: 0.21.0
Reporter: Devaraj Das
Priority: Blocker
 Fix For: 0.21.0


 As part of the file localization the distributed cache localizer creates a 
 copy of the file in the corresponding user's private directory. The 
 localization in DistributedCache uses the URI of the cachefile as the key, and 
 if the key already exists in the map, the localization is not done again. This 
 means that another user cannot access the same distributed cache file. We 
 should change the key to include the username so that localization is done 
 for every user.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1222) [Mumak] We should not include nodes with numeric ips in cluster topology.

2009-12-10 Thread Hong Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Tang updated MAPREDUCE-1222:
-

Attachment: mapreduce-1222-20091210.patch

New patch that excludes *well-formatted* ipv4 and ipv6 addresses. For ipv4 
addresses, optionally allow each period to be preceded with double backslashes.
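For illustration, a sketch of such an ipv4 pattern (not the attached patch) in 
which each period may optionally be preceded by two backslash characters:

{code}
// Sketch: dotted-quad ipv4, where each "." may appear escaped as "\\.".
// Note one literal backslash in the regex costs four characters in a Java
// string literal, hence the pile-up below.
import java.util.regex.Pattern;

public class Ipv4Guess {
  static final Pattern IPV4_MAYBE_ESCAPED =
      Pattern.compile("\\d{1,3}((\\\\\\\\)?\\.\\d{1,3}){3}");

  public static void main(String[] args) {
    System.out.println(IPV4_MAYBE_ESCAPED.matcher("10.20.30.40").matches());
    System.out.println(IPV4_MAYBE_ESCAPED.matcher("10\\\\.20\\\\.30\\\\.40").matches());
  }
}
{code}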

 [Mumak] We should not include nodes with numeric ips in cluster topology.
 -

 Key: MAPREDUCE-1222
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1222
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/mumak
Affects Versions: 0.21.0, 0.22.0
Reporter: Hong Tang
Assignee: Hong Tang
 Fix For: 0.21.0, 0.22.0

 Attachments: IPv6-predicate.patch, mapreduce-1222-20091119.patch, 
 mapreduce-1222-20091121.patch, mapreduce-1222-20091210.patch


 Rumen infers cluster topology by parsing input split locations from job 
 history logs. Due to HDFS-778, a cluster node may appear both as a numeric ip 
 and as a host name in job history logs. We should exclude nodes that appear as 
 numeric ips from the cluster topology when we run mumak, until a solution is 
 found so that numeric ips never appear in input split locations.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1222) [Mumak] We should not include nodes with numeric ips in cluster topology.

2009-12-10 Thread Hong Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Tang updated MAPREDUCE-1222:
-

Status: Open  (was: Patch Available)

 [Mumak] We should not include nodes with numeric ips in cluster topology.
 -

 Key: MAPREDUCE-1222
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1222
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/mumak
Affects Versions: 0.21.0, 0.22.0
Reporter: Hong Tang
Assignee: Hong Tang
 Fix For: 0.21.0, 0.22.0

 Attachments: IPv6-predicate.patch, mapreduce-1222-20091119.patch, 
 mapreduce-1222-20091121.patch, mapreduce-1222-20091210.patch


 Rumen infers cluster topology by parsing input split locations from job 
 history logs. Due to HDFS-778, a cluster node may appear both as a numeric ip 
 and as a host name in job history logs. We should exclude nodes that appear as 
 numeric ips from the cluster topology when we run mumak, until a solution is 
 found so that numeric ips never appear in input split locations.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1222) [Mumak] We should not include nodes with numeric ips in cluster topology.

2009-12-10 Thread Hong Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Tang updated MAPREDUCE-1222:
-

Status: Patch Available  (was: Open)

 [Mumak] We should not include nodes with numeric ips in cluster topology.
 -

 Key: MAPREDUCE-1222
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1222
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/mumak
Affects Versions: 0.21.0, 0.22.0
Reporter: Hong Tang
Assignee: Hong Tang
 Fix For: 0.21.0, 0.22.0

 Attachments: IPv6-predicate.patch, mapreduce-1222-20091119.patch, 
 mapreduce-1222-20091121.patch, mapreduce-1222-20091210.patch


 Rumen infers cluster topology by parsing input split locations from job 
 history logs. Due to HDFS-778, a cluster node may appear both as a numeric ip 
 and as a host name in job history logs. We should exclude nodes that appear as 
 numeric ips from the cluster topology when we run mumak, until a solution is 
 found so that numeric ips never appear in input split locations.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1288) DistributedCache localizes only once per cache URI

2009-12-10 Thread Hemanth Yamijala (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789145#action_12789145
 ] 

Hemanth Yamijala commented on MAPREDUCE-1288:
-

bq. Even if the entire path were accessible to everyone,

If the entire path were accessible to everyone on DFS, there's really no great 
security for that file. I was just trying to point out that such a case may not 
even be valid in the context of how MAPREDUCE-856 was approached (i.e. we 
wanted to secure localized files for users). But I concur that one could 
theoretically construct a case where the URI was accessible to a group of users 
on DFS, and since there's no way to securely localize that per group on the TT, 
this bug is still valid.

 DistributedCache localizes only once per cache URI
 --

 Key: MAPREDUCE-1288
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: security, tasktracker
Affects Versions: 0.21.0
Reporter: Devaraj Das
Priority: Blocker
 Fix For: 0.21.0


 As part of file localization, the distributed cache localizer creates a 
 copy of the file in the corresponding user's private directory. The 
 localization in DistributedCache uses the URI of the cache file as the key, 
 and if that key already exists in the map, the localization is not done 
 again. This means that another user cannot access the same distributed cache 
 file. We should change the key to include the username so that localization 
 is done for every user.
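
A minimal sketch of the proposed keying change, assuming hypothetical names
(the map and helper below are illustrations only, not the actual
TrackerDistributedCacheManager code):

{code}
// Illustration only: key localized cache entries by (user, URI) rather than
// by URI alone, so each user gets a separately localized copy.
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

public class PerUserCacheKeyExample {
  // Hypothetical composite key combining username and cache file URI.
  static String makeKey(String user, URI cacheFile) {
    return user + "#" + cacheFile.toString();
  }

  public static void main(String[] args) throws Exception {
    Map<String, String> localizedPaths = new HashMap<String, String>();
    URI cacheFile = new URI("hdfs://namenode:8020/shared/dict.txt");
    // Two users localizing the same URI now get distinct entries instead of
    // the second user reusing (and failing to read) the first user's copy:
    localizedPaths.put(makeKey("alice", cacheFile), "/local/alice/dict.txt");
    localizedPaths.put(makeKey("bob", cacheFile), "/local/bob/dict.txt");
    System.out.println(localizedPaths.size()); // prints 2
  }
}
{code}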

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1251) c++ utils doesn't compile

2009-12-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789151#action_12789151
 ] 

Hadoop QA commented on MAPREDUCE-1251:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12426461/MR-1251.patch
  against trunk revision 889496.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/318/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/318/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/318/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/318/console

This message is automatically generated.

 c++ utils doesn't compile
 -

 Key: MAPREDUCE-1251
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1251
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.1, 0.20.2, 0.21.0, 0.22.0
 Environment: ubuntu karmic 64-bit
Reporter: Eli Collins
Assignee: Eli Collins
 Attachments: HDFS-790-1.patch, HDFS-790.patch, MR-1251.patch


 c++ utils doesn't compile on ubuntu karmic 64-bit. The latest patch for 
 HADOOP-5611 needs to be applied first.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1288) DistributedCache localizes only once per cache URI

2009-12-10 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789153#action_12789153
 ] 

Devaraj Das commented on MAPREDUCE-1288:


All I am saying is that, irrespective of whether the file is public or not, in 
the current codebase we localize the file exactly once per TaskTracker. On a 
given tasktracker, users cannot share the same hdfs file as a distributed cache 
file. 
What I thought earlier was that the same file would be localized twice in such 
a case (in their respective private directories).

 DistributedCache localizes only once per cache URI
 --

 Key: MAPREDUCE-1288
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: security, tasktracker
Affects Versions: 0.21.0
Reporter: Devaraj Das
Priority: Blocker
 Fix For: 0.21.0


 As part of file localization, the distributed cache localizer creates a 
 copy of the file in the corresponding user's private directory. The 
 localization in DistributedCache uses the URI of the cache file as the key, 
 and if that key already exists in the map, the localization is not done 
 again. This means that another user cannot access the same distributed cache 
 file. We should change the key to include the username so that localization 
 is done for every user.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1289) TrackerDistributedCacheManagerWithLinuxTaskController fails

2009-12-10 Thread Ravi Gummadi (JIRA)
TrackerDistributedCacheManagerWithLinuxTaskController fails
---

 Key: MAPREDUCE-1289
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1289
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Ravi Gummadi


TrackerDistributedCacheManagerWithLinuxTaskController fails with 
INITIALIZE_DISTRIBUTED_CACHE failing in trunk.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1289) TrackerDistributedCacheManagerWithLinuxTaskController fails

2009-12-10 Thread Ravi Gummadi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789155#action_12789155
 ] 

Ravi Gummadi commented on MAPREDUCE-1289:
-

The log messages have:

WARN  mapred.LinuxTaskController (LinuxTaskController.java:runCommand(192)) - Exit code from INITIALIZE_DISTRIBUTEDCACHE is : 139
2009-12-11 12:21:52,346 WARN  mapred.LinuxTaskController (LinuxTaskController.java:runCommand(194)) - Exception thrown by INITIALIZE_DISTRIBUTEDCACHE : org.apache.hadoop.util.Shell$ExitCodeException:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:243)
        at org.apache.hadoop.util.Shell.run(Shell.java:170)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:363)
        at org.apache.hadoop.mapred.LinuxTaskController.runCommand(LinuxTaskController.java:190)
        at org.apache.hadoop.mapred.LinuxTaskController.initializeDistributedCache(LinuxTaskController.java:416)
        at org.apache.hadoop.mapred.ClusterWithLinuxTaskController$MyLinuxTaskController.initializeDistributedCache(ClusterWithLinuxTaskController.java:68)
        at org.apache.hadoop.mapreduce.filecache.TestTrackerDistributedCacheManager.testManagerFlow(TestTrackerDistributedCacheManager.java:156)

 TrackerDistributedCacheManagerWithLinuxTaskController fails
 ---

 Key: MAPREDUCE-1289
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1289
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Ravi Gummadi

 TrackerDistributedCacheManagerWithLinuxTaskController fails with 
 INITIALIZE_DISTRIBUTED_CACHE failing in trunk.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1284) TestLocalizationWithLinuxTaskController fails

2009-12-10 Thread Ravi Gummadi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789157#action_12789157
 ] 

Ravi Gummadi commented on MAPREDUCE-1284:
-

Unit tests passed on my local machine. The only failures are 
TestGridmixSubmission (MAPREDUCE-1124) and 
TestTrackerDistributedCacheManagerWithLinuxTaskController (MAPREDUCE-1289).

 TestLocalizationWithLinuxTaskController fails
 -

 Key: MAPREDUCE-1284
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1284
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Ravi Gummadi
Assignee: Ravi Gummadi
 Fix For: 0.22.0

 Attachments: MR-1284.patch


 With current trunk, the testcase TestLocalizationWithLinuxTaskController 
 fails with an exit code of 139 from task-controller when doing INITIALIZE_USER

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1176) Contribution: FixedLengthInputFormat and FixedLengthRecordReader

2009-12-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789159#action_12789159
 ] 

Hadoop QA commented on MAPREDUCE-1176:
--

+1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12426619/MAPREDUCE-1176-v3.patch
  against trunk revision 889496.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/187/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/187/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/187/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/187/console

This message is automatically generated.

 Contribution: FixedLengthInputFormat and FixedLengthRecordReader
 

 Key: MAPREDUCE-1176
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1176
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 0.20.1, 0.20.2
 Environment: Any
Reporter: BitsOfInfo
Priority: Minor
 Attachments: MAPREDUCE-1176-v1.patch, MAPREDUCE-1176-v2.patch, 
 MAPREDUCE-1176-v3.patch


 Hello,
 I would like to contribute the following two classes for incorporation into 
 the mapreduce.lib.input package. These two classes can be used when you need 
 to read data from files containing fixed-length (fixed-width) records. Such 
 files have no CR/LF (or any combination thereof) and no delimiters; each 
 record is a fixed length, and extra data is padded with spaces, so the data 
 is one gigantic line within a file.
 Provided are two classes: FixedLengthInputFormat and its corresponding 
 FixedLengthRecordReader. When creating a job that specifies this input 
 format, the job must have the 
 mapreduce.input.fixedlengthinputformat.record.length property set, as follows:
 myJobConf.setInt("mapreduce.input.fixedlengthinputformat.record.length", [myFixedRecordLength]);
 OR
 myJobConf.setInt(FixedLengthInputFormat.FIXED_RECORD_LENGTH, 
 [myFixedRecordLength]);
 This input format overrides computeSplitSize() in order to ensure that 
 InputSplits do not contain any partial records, since with fixed-length 
 records there is no way to determine where a record begins if that were to 
 occur. Each InputSplit passed to the FixedLengthRecordReader will start at 
 the beginning of a record, and the last byte in the InputSplit will be the 
 last byte of a record. The override of computeSplitSize() delegates to 
 FileInputFormat's compute method and then rounds the returned split size 
 down to a multiple of the record length: 
 Math.floor(fileInputFormatsComputedSplitSize / fixedRecordLength) * fixedRecordLength
 This suite of fixed-length input format classes does not support compressed 
 files. 
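
A minimal sketch of the configuration call and the split-size rounding
described above; the property name comes from this description, while the
record length and split-size values are made-up examples:

{code}
// Illustration only: set the proposed record-length property and reproduce
// the computeSplitSize() rounding so no split would end mid-record.
import org.apache.hadoop.mapred.JobConf;

public class FixedLengthConfigExample {
  public static void main(String[] args) {
    JobConf myJobConf = new JobConf();
    int myFixedRecordLength = 100; // hypothetical record width in bytes

    myJobConf.setInt("mapreduce.input.fixedlengthinputformat.record.length",
        myFixedRecordLength);

    // Round FileInputFormat's computed split size down to a multiple of the
    // record length (integer division performs the Math.floor step):
    long computedSplitSize = 64L * 1024 * 1024; // e.g. a 64 MB block
    long adjusted =
        (computedSplitSize / myFixedRecordLength) * myFixedRecordLength;
    System.out.println("adjusted split size = " + adjusted);
  }
}
{code}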

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1289) TestTrackerDistributedCacheManagerWithLinuxTaskController fails

2009-12-10 Thread Ravi Gummadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Gummadi updated MAPREDUCE-1289:


Description: TestTrackerDistributedCacheManagerWithLinuxTaskController 
fails with INITIALIZE_DISTRIBUTED_CACHE failing in trunk.  (was: 
TrackerDistributedCacheManagerWithLinuxTaskController fails with 
INITIALIZE_DISTRIBUTED_CACHE failing in trunk.)
Summary: TestTrackerDistributedCacheManagerWithLinuxTaskController 
fails  (was: TrackerDistributedCacheManagerWithLinuxTaskController fails)

 TestTrackerDistributedCacheManagerWithLinuxTaskController fails
 ---

 Key: MAPREDUCE-1289
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1289
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Ravi Gummadi

 TestTrackerDistributedCacheManagerWithLinuxTaskController fails with 
 INITIALIZE_DISTRIBUTED_CACHE failing in trunk.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1288) DistributedCache localizes only once per cache URI

2009-12-10 Thread Hemanth Yamijala (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789160#action_12789160
 ] 

Hemanth Yamijala commented on MAPREDUCE-1288:
-

Just to be clear, I am *not* disagreeing at all that there's a bug, or with the 
assessment that this is a blocker. +1 on both. *smile*

 DistributedCache localizes only once per cache URI
 --

 Key: MAPREDUCE-1288
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: security, tasktracker
Affects Versions: 0.21.0
Reporter: Devaraj Das
Priority: Blocker
 Fix For: 0.21.0


 As part of file localization, the distributed cache localizer creates a 
 copy of the file in the corresponding user's private directory. The 
 localization in DistributedCache uses the URI of the cache file as the key, 
 and if that key already exists in the map, the localization is not done 
 again. This means that another user cannot access the same distributed cache 
 file. We should change the key to include the username so that localization 
 is done for every user.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.