date:20090907


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated MAPREDUCE-943:
--

Issue Type: Sub-task  (was: Bug)
Parent: MAPREDUCE-873

 TestNodeRefresh timesout occasionally
 -

 Key: MAPREDUCE-943
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-943
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: jobtracker
Reporter: Amareshwari Sriramadasu
Assignee: Amar Kamat
 Fix For: 0.21.0

 Attachments: MAPRED-943-v1.0.patch


 TestNodeRefresh times out occasionally.
 One Hudson patch build that hit the timeout:
 http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/26/testReport/org.apache.hadoop.mapred/TestNodeRefresh/testMRExcludeHostsAcrossRestarts/

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Resolved: (MAPREDUCE-943) TestNodeRefresh timesout occasionally


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das resolved MAPREDUCE-943.
---

Resolution: Fixed

I just committed this. Thanks, Amar!

 TestNodeRefresh timesout occasionally
 -

 Key: MAPREDUCE-943
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-943
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: jobtracker
Reporter: Amareshwari Sriramadasu
Assignee: Amar Kamat
 Fix For: 0.21.0

 Attachments: MAPRED-943-v1.0.patch


 TestNodeRefresh times out occasionally.
 One Hudson patch build that hit the timeout:
 http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/26/testReport/org.apache.hadoop.mapred/TestNodeRefresh/testMRExcludeHostsAcrossRestarts/


[jira] Created: (MAPREDUCE-957) Set mapred.job.name for a pipes job

2009-09-07 Thread Ramya R (JIRA)

Set mapred.job.name for a pipes job
---

 Key: MAPREDUCE-957
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-957
 Project: Hadoop Map/Reduce
  Issue Type: Wish
  Components: pipes
Affects Versions: 0.20.1
Reporter: Ramya R
Priority: Minor


Currently mapred.job.name is not set for a pipes job. It would be useful if this 
value were set when a pipes job is submitted.


[jira] Commented: (MAPREDUCE-943) TestNodeRefresh timesout occasionally


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752019#action_12752019
 ] 

Devaraj Das commented on MAPREDUCE-943:
---

Should have added that I also agree that the testcase which times out is no 
longer needed.

 TestNodeRefresh timesout occasionally
 -

 Key: MAPREDUCE-943
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-943
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: jobtracker
Reporter: Amareshwari Sriramadasu
Assignee: Amar Kamat
 Fix For: 0.21.0

 Attachments: MAPRED-943-v1.0.patch


 TestNodeRefresh times out occasionally.
 One Hudson patch build that hit the timeout:
 http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/26/testReport/org.apache.hadoop.mapred/TestNodeRefresh/testMRExcludeHostsAcrossRestarts/


[jira] Updated: (MAPREDUCE-861) Modify queue configuration format and parsing to support a hierarchy of queues.

[
https://issues.apache.org/jira/browse/MAPREDUCE-861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

rahul k singh updated MAPREDUCE-861:

Attachment: MAPREDUCE-861-4.patch

Incorporated all the comments except the following:

1. In DeprecatedHierarchyBuilder we are still not checking whether ACLs are
disabled before parsing them. Note, though, that this check is done in
QueueHierarchyBuilder.
Many testcases, especially in TestQueueManager, are written with the assumption
that a Map<String, AccessControlList> is always created for the Queue object,
especially when mapred.acls.enabled is set to true using conf.set. There are
lots of NullPointerExceptions if we don't generate this empty object. Hence I
am not accommodating this comment: it would be a significant change to the
testcases, it concerns deprecated code, and having this empty
Map<String, AccessControlList> doesn't affect the overall behaviour at all.

Modify queue configuration format and parsing to support a hierarchy of
queues.
---

Key: MAPREDUCE-861
URL: https://issues.apache.org/jira/browse/MAPREDUCE-861
Project: Hadoop Map/Reduce
Issue Type: Sub-task
Reporter: Hemanth Yamijala
Assignee: rahul k singh
Attachments: MAPREDUCE-861-1.patch, MAPREDUCE-861-2.patch,
MAPREDUCE-861-3.patch, MAPREDUCE-861-4.patch

MAPREDUCE-853 proposes to introduce a hierarchy of queues into the Map/Reduce
framework. This JIRA is for defining changes to the configuration related to
queues.
The current format for defining a queue and its properties is as follows:
mapred.queue.<queue-name>.<property-name>. For example,
mapred.queue.<queue-name>.acl-submit-job. The reason for using this verbose
format was to be able to reuse the Configuration parser in Hadoop. However,
administrators currently using the queue configuration have already indicated
a very strong desire for a more manageable format. Since this becomes more
unwieldy with hierarchical queues, now may be a good time to introduce a new
format for representing queue configuration.
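
To make the flat naming scheme concrete, here is a minimal Python sketch (not Hadoop code) that groups keys of the form mapred.queue.queue-name.property-name by queue. The queue names "default" and "research", the ACL values, and the helper name group_queue_properties are assumptions made up for illustration.

```python
from collections import defaultdict

PREFIX = "mapred.queue."


def group_queue_properties(conf):
    """Group flat 'mapred.queue.<queue>.<property>' keys by queue name."""
    queues = defaultdict(dict)
    for key, value in conf.items():
        if not key.startswith(PREFIX):
            continue  # not a queue property
        rest = key[len(PREFIX):]
        # The property name is the last dot-separated component.
        queue_name, sep, prop = rest.rpartition(".")
        if not sep or not queue_name:
            continue  # malformed key without a property component
        queues[queue_name][prop] = value
    return dict(queues)


# Hypothetical configuration, for illustration only.
conf = {
    "mapred.queue.default.acl-submit-job": "alice,bob",
    "mapred.queue.default.acl-administer-jobs": "admin",
    "mapred.queue.research.acl-submit-job": "carol",
    "mapred.job.name": "ignored",
}
grouped = group_queue_properties(conf)
```

Every property repeats the full prefix and queue name, and nothing in a flat key can express that one queue is the child of another, which is the verbosity the description refers to.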


[jira] Commented: (MAPREDUCE-943) TestNodeRefresh timesout occasionally


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752020#action_12752020
 ] 

Hudson commented on MAPREDUCE-943:
--

Integrated in Hadoop-Mapreduce-trunk-Commit #18 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/18/])
. Removes a testcase in TestNodeRefresh that doesn't make sense in the new 
Job recovery model. Contributed by Amar Kamat.


 TestNodeRefresh timesout occasionally
 -

 Key: MAPREDUCE-943
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-943
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: jobtracker
Reporter: Amareshwari Sriramadasu
Assignee: Amar Kamat
 Fix For: 0.21.0

 Attachments: MAPRED-943-v1.0.patch


 TestNodeRefresh times out occasionally.
 One Hudson patch build that hit the timeout:
 http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/26/testReport/org.apache.hadoop.mapred/TestNodeRefresh/testMRExcludeHostsAcrossRestarts/


[jira] Commented: (MAPREDUCE-861) Modify queue configuration format and parsing to support a hierarchy of queues.


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752021#action_12752021
 ] 

rahul k singh commented on MAPREDUCE-861:
-

There was an error in the above patch; attaching a new one.

 Modify queue configuration format and parsing to support a hierarchy of 
 queues.
 ---

 Key: MAPREDUCE-861
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-861
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Hemanth Yamijala
Assignee: rahul k singh
 Attachments: MAPREDUCE-861-1.patch, MAPREDUCE-861-2.patch, 
 MAPREDUCE-861-3.patch, MAPREDUCE-861-4.patch


 MAPREDUCE-853 proposes to introduce a hierarchy of queues into the Map/Reduce 
 framework. This JIRA is for defining changes to the configuration related to 
 queues. 
 The current format for defining a queue and its properties is as follows: 
 mapred.queue.<queue-name>.<property-name>. For example, 
 mapred.queue.<queue-name>.acl-submit-job. The reason for using this verbose 
 format was to be able to reuse the Configuration parser in Hadoop. However, 
 administrators currently using the queue configuration have already indicated 
 a very strong desire for a more manageable format. Since this becomes more 
 unwieldy with hierarchical queues, now may be a good time to introduce a new 
 format for representing queue configuration.


[jira] Updated: (MAPREDUCE-861) Modify queue configuration format and parsing to support a hierarchy of queues.


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rahul k singh updated MAPREDUCE-861:


Attachment: MAPREDUCE-861-5.patch

 Modify queue configuration format and parsing to support a hierarchy of 
 queues.
 ---

 Key: MAPREDUCE-861
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-861
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Hemanth Yamijala
Assignee: rahul k singh
 Attachments: MAPREDUCE-861-1.patch, MAPREDUCE-861-2.patch, 
 MAPREDUCE-861-3.patch, MAPREDUCE-861-4.patch, MAPREDUCE-861-5.patch


 MAPREDUCE-853 proposes to introduce a hierarchy of queues into the Map/Reduce 
 framework. This JIRA is for defining changes to the configuration related to 
 queues. 
 The current format for defining a queue and its properties is as follows: 
 mapred.queue.<queue-name>.<property-name>. For example, 
 mapred.queue.<queue-name>.acl-submit-job. The reason for using this verbose 
 format was to be able to reuse the Configuration parser in Hadoop. However, 
 administrators currently using the queue configuration have already indicated 
 a very strong desire for a more manageable format. Since this becomes more 
 unwieldy with hierarchical queues, now may be a good time to introduce a new 
 format for representing queue configuration.


[jira] Updated: (MAPREDUCE-861) Modify queue configuration format and parsing to support a hierarchy of queues.


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rahul k singh updated MAPREDUCE-861:


Status: Patch Available  (was: Open)

 Modify queue configuration format and parsing to support a hierarchy of 
 queues.
 ---

 Key: MAPREDUCE-861
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-861
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Hemanth Yamijala
Assignee: rahul k singh
 Attachments: MAPREDUCE-861-1.patch, MAPREDUCE-861-2.patch, 
 MAPREDUCE-861-3.patch, MAPREDUCE-861-4.patch, MAPREDUCE-861-5.patch


 MAPREDUCE-853 proposes to introduce a hierarchy of queues into the Map/Reduce 
 framework. This JIRA is for defining changes to the configuration related to 
 queues. 
 The current format for defining a queue and its properties is as follows: 
 mapred.queue.<queue-name>.<property-name>. For example, 
 mapred.queue.<queue-name>.acl-submit-job. The reason for using this verbose 
 format was to be able to reuse the Configuration parser in Hadoop. However, 
 administrators currently using the queue configuration have already indicated 
 a very strong desire for a more manageable format. Since this becomes more 
 unwieldy with hierarchical queues, now may be a good time to introduce a new 
 format for representing queue configuration.


[jira] Commented: (MAPREDUCE-856) Localized files from DistributedCache should have right access-control

2009-09-07 Thread Hemanth Yamijala (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752023#action_12752023
 ] 

Hemanth Yamijala commented on MAPREDUCE-856:


I verified the changes. My only comment is that, in the changes related to 
synchronization of user localization, we repeat the per-user work every time 
job localization happens. A suggestion: keep the synchronization on the user 
name, but make the value a state variable that indicates the status of 
localization, and check it before beginning to localize.
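
The suggestion can be sketched roughly as follows. This is a Python illustration of the idea only, not the TaskTracker code; the class UserLocalizer, its methods, and the runs counter are hypothetical names introduced for the example.

```python
import threading


class UserLocalizer:
    """Runs the per-user setup once, however many jobs that user localizes."""

    def __init__(self):
        self._guard = threading.Lock()  # protects the lock table
        self._locks = {}                # user name -> per-user lock
        self._done = {}                 # user name -> True once localized
        self.runs = {}                  # user name -> times setup really ran

    def _lock_for(self, user):
        with self._guard:
            return self._locks.setdefault(user, threading.Lock())

    def localize_job(self, user):
        # Synchronize on the user name, then consult the state variable.
        with self._lock_for(user):
            if not self._done.get(user):
                self._initialize_user(user)
                self._done[user] = True
        # Job-specific localization would follow here.

    def _initialize_user(self, user):
        # Stand-in for the expensive per-user work.
        self.runs[user] = self.runs.get(user, 0) + 1
```

With the state check in place, eight concurrent job localizations for the same user perform the per-user setup once; the other seven only take the lock and read the flag.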

 Localized files from DistributedCache should have right access-control
 --

 Key: MAPREDUCE-856
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-856
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: tasktracker
Reporter: Arun C Murthy
Assignee: Vinod K V
 Fix For: 0.21.0

 Attachments: MAPREDUCE-856-20090820.txt, MAPREDUCE-856-20090821.txt, 
 MAPREDUCE-856-20090825.3.txt, MAPREDUCE-856-20090827.txt, 
 MAPREDUCE-856-20090903.txt, MAPREDUCE-856-20090904.1.txt, 
 MAPREDUCE-856-20090904.txt





[jira] Resolved: (MAPREDUCE-860) Modify Queue APIs to support a hierarchy of queues


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rahul k singh resolved MAPREDUCE-860.
-

Resolution: Duplicate

 Modify Queue APIs to support a hierarchy of queues
 --

 Key: MAPREDUCE-860
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-860
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: jobtracker
Reporter: Hemanth Yamijala
Assignee: rahul k singh

 MAPREDUCE-853 proposes to introduce a hierarchy of queues into the Map/Reduce 
 framework. This JIRA is for defining changes to the APIs related to queues.


[jira] Commented: (MAPREDUCE-860) Modify Queue APIs to support a hierarchy of queues


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752025#action_12752025
 ] 

rahul k singh commented on MAPREDUCE-860:
-

This issue is being resolved as part of MAPREDUCE-861. Hence closing this as 
duplicate.

 Modify Queue APIs to support a hierarchy of queues
 --

 Key: MAPREDUCE-860
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-860
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: jobtracker
Reporter: Hemanth Yamijala
Assignee: rahul k singh

 MAPREDUCE-853 proposes to introduce a hierarchy of queues into the Map/Reduce 
 framework. This JIRA is for defining changes to the APIs related to queues.


[jira] Updated: (MAPREDUCE-157) Job History log file format is not friendly for external tools.

2009-09-07 Thread Jothi Padmanabhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jothi Padmanabhan updated MAPREDUCE-157:


Status: Patch Available  (was: Open)

 Job History log file format is not friendly for external tools.
 ---

 Key: MAPREDUCE-157
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-157
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Affects Versions: 0.20.1
Reporter: Owen O'Malley
Assignee: Jothi Padmanabhan
 Fix For: 0.21.0

 Attachments: mapred-157-4Sep.patch, mapred-157-7Sep.patch, 
 mapred-157-prelim.patch, MAPREDUCE-157-avro.patch


 Currently, parsing the job history logs with external tools is very difficult 
 because of the format. The most critical problem is that newlines aren't 
 escaped in the strings. That makes using tools like grep, sed, and awk very 
 tricky.
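
To illustrate why unescaped newlines hurt line-oriented tools, here is a small Python sketch of a reversible escaping scheme. The record layout and the field name JOBNAME are simplified assumptions for illustration, not the actual job history format or the format proposed in the attached patches.

```python
def escape(value):
    # Escape the backslash first so the encoding stays reversible.
    value = value.replace("\\", "\\\\")
    value = value.replace("\n", "\\n")
    return value.replace('"', '\\"')


def unescape(text):
    out, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            nxt = text[i + 1]
            out.append("\n" if nxt == "n" else nxt)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def format_record(event, fields):
    """Render one event as a single line of KEY="value" pairs."""
    parts = [event] + ['%s="%s"' % (k, escape(v)) for k, v in fields.items()]
    return " ".join(parts)


record = format_record("Job", {"JOBNAME": "line one\nline two"})
```

Because each record now occupies exactly one physical line, grep, sed, and awk can treat a line as a complete event.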


[jira] Updated: (MAPREDUCE-924) TestPipes must not directly invoke 'main' of pipes as an exit from main could cause the testcase to crash.

2009-09-07 Thread Hemanth Yamijala (JIRA)

[
https://issues.apache.org/jira/browse/MAPREDUCE-924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Hemanth Yamijala updated MAPREDUCE-924:
---

Description:
TestPipes invokes the main method of the program running pipes. In
MAPREDUCE-421, a change was made to the Pipes command runner to invoke
System.exit after completion. This itself is a valid change because the pipes
command runner is in itself a user facing program. But when combined with a
testcase, it causes the testcase to crash rather than providing feedback on
whether the test ran correctly or not.

The testcase should be modified to use Tool instead of running main directly.

was:
TestPipes crashes on trunk due to MAPREDUCE-421.
Testcase should be modified to use Tool insteadof running main directly.

Summary: TestPipes must not directly invoke 'main' of pipes as an exit
from main could cause the testcase to crash. (was: TestPipes crashes on trunk)

TestPipes must not directly invoke 'main' of pipes as an exit from main could
cause the testcase to crash.
--

Key: MAPREDUCE-924
URL: https://issues.apache.org/jira/browse/MAPREDUCE-924
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: pipes
Affects Versions: 0.20.1
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
Fix For: 0.20.1

Attachments: patch-924-0.20.txt, patch-924.txt

TestPipes invokes the main method of the program running pipes. In
MAPREDUCE-421, a change was made to the Pipes command runner to invoke
System.exit after completion. This itself is a valid change because the pipes
command runner is in itself a user facing program. But when combined with a
testcase, it causes the testcase to crash rather than providing feedback on
whether the test ran correctly or not.
The testcase should be modified to use Tool instead of running main directly.
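
The underlying pattern can be shown in a few lines of Python; this is an analogy, not the Pipes or Tool code. The function names run and main are illustrative. Note one difference: Python's sys.exit raises a catchable SystemExit, whereas Java's System.exit terminates the JVM outright, which is why the Java testcase crashes instead of reporting a result.

```python
import sys


def run(args):
    """Do the work and RETURN an exit code instead of exiting."""
    if not args:
        return 1  # usage error
    return 0


def main(argv=None):
    """User-facing entry point: the only place that ends the process."""
    sys.exit(run(sys.argv[1:] if argv is None else argv))
```

A test that calls run() directly can assert on the returned code; only the user-facing main() ever terminates the process.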


[jira] Commented: (MAPREDUCE-956) Shuffle should be broken down to only two phases (copy/reduce) instead of three (copy/sort/reduce)

2009-09-07 Thread Ravi Gummadi (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752048#action_12752048
 ] 

Ravi Gummadi commented on MAPREDUCE-956:


We could call the phases the Shuffle phase and the Reduce phase. But we need 
to investigate how to update progress in the shuffle phase: updating it based 
only on the copying of map outputs would not be correct, because some merges 
could still take time after all map outputs have been copied to the reduce 
node (even though some merges happen while map outputs are being copied).

 Shuffle should be broken down to only two phases (copy/reduce) instead of 
 three (copy/sort/reduce)
 --

 Key: MAPREDUCE-956
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-956
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task
Affects Versions: 0.21.0
Reporter: Jothi Padmanabhan

 For progress calculation and display on the UI, shuffle, in its current 
 form, is decomposed into three phases (copy/sort/reduce). Actually, the sort 
 phase is no longer applicable. I think we should just reduce the number of 
 phases to two and assign 50% weight to each of the copy and reduce phases. 
 Thoughts?
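
As a worked example of the proposed weighting, the following Python sketch combines the two phase fractions with equal weights. The function name and signature are assumptions for illustration; this is not the task progress code.

```python
def reduce_task_progress(copy_fraction, reduce_fraction):
    """Overall progress with two equally weighted phases (copy, reduce)."""
    for fraction in (copy_fraction, reduce_fraction):
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("phase progress must be in [0, 1]")
    return 0.5 * copy_fraction + 0.5 * reduce_fraction
```

Under this scheme a task that has finished copying but not started reducing reports 50%, where three equally weighted phases would report about 33% at the same point.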


[jira] Updated: (MAPREDUCE-856) Localized files from DistributedCache should have right access-control

2009-09-07 Thread Vinod K V (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod K V updated MAPREDUCE-856:


Status: Patch Available  (was: Open)

 Localized files from DistributedCache should have right access-control
 --

 Key: MAPREDUCE-856
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-856
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: tasktracker
Reporter: Arun C Murthy
Assignee: Vinod K V
 Fix For: 0.21.0

 Attachments: MAPREDUCE-856-20090820.txt, MAPREDUCE-856-20090821.txt, 
 MAPREDUCE-856-20090825.3.txt, MAPREDUCE-856-20090827.txt, 
 MAPREDUCE-856-20090903.txt, MAPREDUCE-856-20090904.1.txt, 
 MAPREDUCE-856-20090904.txt, MAPREDUCE-856-20090907.1.txt, 
 MAPREDUCE-856-20090907.txt





[jira] Updated: (MAPREDUCE-856) Localized files from DistributedCache should have right access-control

2009-09-07 Thread Vinod K V (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod K V updated MAPREDUCE-856:


Attachment: MAPREDUCE-856-20090907.1.txt

Updated patch fixing the test failures reported by Hudson.

 Localized files from DistributedCache should have right access-control
 --

 Key: MAPREDUCE-856
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-856
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: tasktracker
Reporter: Arun C Murthy
Assignee: Vinod K V
 Fix For: 0.21.0

 Attachments: MAPREDUCE-856-20090820.txt, MAPREDUCE-856-20090821.txt, 
 MAPREDUCE-856-20090825.3.txt, MAPREDUCE-856-20090827.txt, 
 MAPREDUCE-856-20090903.txt, MAPREDUCE-856-20090904.1.txt, 
 MAPREDUCE-856-20090904.txt, MAPREDUCE-856-20090907.1.txt, 
 MAPREDUCE-856-20090907.txt





[jira] Commented: (MAPREDUCE-841) Protect Job Tracker against memory exhaustion due to very large InputSplit or JobConf objects


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752074#action_12752074
 ] 

Devaraj Das commented on MAPREDUCE-841:
---

BTW for the splits part, MAPREDUCE-181 (http://tinyurl.com/legzp9) is 
introducing some changes.

 Protect Job Tracker against memory exhaustion due to very large InputSplit or 
 JobConf objects
 -

 Key: MAPREDUCE-841
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-841
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobtracker
Affects Versions: 0.20.1
Reporter: Hong Tang
 Fix For: 0.21.0


 JobTracker only needs to examine a subset of the information contained in 
 InputSplit or JobConf objects. But currently JobTracker loads the complete 
 user-defined InputSplit and JobConf objects into memory. This design leaves 
 JobTracker susceptible to memory exhaustion, particularly when bugs in user 
 code result in very large input splits or job conf objects (e.g. PIG-901).


[jira] Updated: (MAPREDUCE-876) Sqoop import of large tables can time out

2009-09-07 Thread Tom White (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated MAPREDUCE-876:


   Resolution: Fixed
Fix Version/s: 0.21.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

+1

I've just committed this. Thanks Aaron!

 Sqoop import of large tables can time out
 -

 Key: MAPREDUCE-876
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-876
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/sqoop
Reporter: Aaron Kimball
Assignee: Aaron Kimball
 Fix For: 0.21.0

 Attachments: MAPREDUCE-876.2.patch, MAPREDUCE-876.patch


 Related to MAPREDUCE-875, Sqoop should use a background thread to ensure that 
 progress is being reported while a database does external work for the 
 MapReduce task.
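
The background-thread idea looks roughly like this in Python. It is a sketch of the pattern, not the Sqoop patch; ProgressReporter, slow_external_work, and the intervals are hypothetical, and the report callback stands in for whatever progress call the framework expects.

```python
import threading
import time


class ProgressReporter:
    """Calls report() every `interval` seconds until the block exits."""

    def __init__(self, report, interval):
        self._report = report
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def _loop(self):
        # Event.wait returns False on timeout and True once stop is set.
        while not self._stop.wait(self._interval):
            self._report()

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc_info):
        self._stop.set()
        self._thread.join()
        return False


def slow_external_work(seconds):
    time.sleep(seconds)  # stands in for a long-running database call
    return "done"


ticks = []
with ProgressReporter(lambda: ticks.append(time.monotonic()), interval=0.01):
    result = slow_external_work(0.2)
```

The main thread stays blocked on the external call while the reporter keeps signalling liveness, so the task is not mistaken for a hung one.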


[jira] Updated: (MAPREDUCE-918) Test hsqldb server should be memory-only.

2009-09-07 Thread Tom White (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated MAPREDUCE-918:


   Resolution: Fixed
Fix Version/s: 0.21.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

+1

I've just committed this. Thanks Aaron!

 Test hsqldb server should be memory-only.
 -

 Key: MAPREDUCE-918
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-918
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/sqoop
Reporter: Aaron Kimball
Assignee: Aaron Kimball
 Fix For: 0.21.0

 Attachments: MAPREDUCE-918.patch


 Sqoop launches a standalone hsqldb server for unit tests, but it currently 
 writes its database to disk and uses a connect string of {{//localhost}}. If 
 multiple test instances are running concurrently, one test server may end up 
 serving the other instance of the unit tests, causing race conditions.
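
The isolation benefit of a memory-only database can be demonstrated with an analogy. The sketch below uses Python's built-in sqlite3 module rather than hsqldb, so it shows the general idea only; the table name employees is made up. In sqlite3, each connection to ":memory:" gets its own private database.

```python
import sqlite3


def new_test_database():
    """Create a private in-memory database for one test instance."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")
    return conn


a = new_test_database()
b = new_test_database()
a.execute("INSERT INTO employees (name) VALUES ('alice')")
count_a = a.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
count_b = b.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
```

Rows written through one connection are invisible to the other, so concurrent test runs cannot interfere through shared files or a shared server address.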


[jira] Commented: (MAPREDUCE-944) Extend FairShare scheduler to fair-share memory usage in the cluster

2009-09-07 Thread Vinod K V (JIRA)

[
https://issues.apache.org/jira/browse/MAPREDUCE-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752105#action_12752105
]

Vinod K V commented on MAPREDUCE-944:
-

I see in the attached patch that only one concrete implementation of
LoadManager, CapBasedLoadManager, is provided, and it doesn't take any
resource usage into account. I guess you are planning a proper implementation
of fair-sharing memory usage in another JIRA.

Some points are still not dealt with in this JIRA. I raise them to find out
whether you have already thought about them.
- Job configuration: how users specify resource usage. Some memory-related
configuration properties were added to the framework while working on memory
monitoring on TTs as well as memory-usage-based scheduling in
CapacityTaskScheduler. You may want to reuse some or all of them.
- Capturing the scheduling decisions involved when we are not able to find a
task from a Schedulable because of a lack of resources on a given TaskTracker.

Regarding the latter, the current patch just returns null, which is similar to
the decision CapacityTaskScheduler used to take in previous versions, i.e.
block the TT until it can be given a task from the job at the head of the
queue/pool. Some time back, we investigated how this approach works with
FairScheduler and realized some important implications. For example, because
the order of jobs might change significantly in consecutive iterations of
FairScheduler, just returning null may not work at all. Eventually we may end
up waiting for a long time if a significant number of jobs ask for a large
amount of resources.

Thoughts?

Extend FairShare scheduler to fair-share memory usage in the cluster

Key: MAPREDUCE-944
URL: https://issues.apache.org/jira/browse/MAPREDUCE-944
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: contrib/fair-share
Reporter: dhruba borthakur
Attachments: LoadManager.txt

The FairShare Scheduler has an extensible LoadManager API to regulate
allocating new tasks on a particular TaskTracker. Along similar lines, it would
be nice if the FairShare Scheduler could have a pluggable policy to regulate
new tasks from a particular job. This will allow one to skip scheduling tasks
of a job that is eating a large percentage of memory in the cluster, i.e.
fair-share of memory resources among jobs.
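
A policy of this kind might look like the following Python sketch. It is purely illustrative and is not the LoadManager API or the attached patch; pick_job, the (name, used_mb) tuples, and the 50% default cap are all assumptions.

```python
def pick_job(jobs, cluster_memory_mb, max_share=0.5):
    """Return the first job, in scheduler order, under the memory cap.

    `jobs` is a list of (name, used_mb) pairs. Returns None when every
    job is over the cap.
    """
    for name, used_mb in jobs:
        if used_mb / cluster_memory_mb <= max_share:
            return name
    return None
```

The None result is the case that needs care: if the scheduler simply hands nothing to the TaskTracker when no job qualifies, slots can sit idle for a long time.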


[jira] Commented: (MAPREDUCE-181) Secure job submission


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752117#action_12752117
 ] 

Devaraj Das commented on MAPREDUCE-181:
---

For now, let's keep it simple - don't implement the points to do with 
maintaining/cleaning-up jobID-userName mappings. This should be looked at, in 
a bigger picture, once we have the authentication implemented. Also, rather 
than time-based expiry I think it would be better to have limits on number of 
queued jobs per user and the max queued jobs overall.
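
Count-based limits of the kind suggested here could be sketched as below. This is a Python illustration under assumed names (SubmissionLimiter, try_submit, job_finished) and assumed limit values, not JobTracker code.

```python
from collections import Counter


class SubmissionLimiter:
    """Admits a job only if per-user and overall queue limits allow it."""

    def __init__(self, max_per_user, max_total):
        self.max_per_user = max_per_user
        self.max_total = max_total
        self.queued = Counter()  # user name -> number of queued jobs

    def try_submit(self, user):
        if sum(self.queued.values()) >= self.max_total:
            return False  # overall limit reached
        if self.queued[user] >= self.max_per_user:
            return False  # this user's limit reached
        self.queued[user] += 1
        return True

    def job_finished(self, user):
        if self.queued[user] > 0:
            self.queued[user] -= 1
```

Unlike time-based expiry, the bookkeeping is bounded by construction: the table never holds more than max_total entries' worth of jobs.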

 Secure job submission 
 --

 Key: MAPREDUCE-181
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-181
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Amar Kamat
Assignee: Amar Kamat
 Attachments: hadoop-3578-branch-20-example-2.patch, 
 hadoop-3578-branch-20-example.patch, HADOOP-3578-v2.6.patch, 
 HADOOP-3578-v2.7.patch, MAPRED-181-v3.8.patch


 Currently the jobclient accesses the {{mapred.system.dir}} to add job 
 details. Hence the {{mapred.system.dir}} has the permissions of 
 {{rwx-wx-wx}}. This could be a security loophole where the job files might 
 get overwritten or tampered with after the job submission. 
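
For readers less used to permission strings, this short Python snippet decodes the mode using the standard stat module. Only the rwx-wx-wx value comes from the description; the variable names are illustrative.

```python
import stat

mode = 0o733  # owner rwx, group -wx, others -wx
text = stat.filemode(stat.S_IFDIR | mode)
others_can_write = bool(mode & stat.S_IWOTH)
others_can_read = bool(mode & stat.S_IROTH)
```

Group and others cannot list the directory, but they can create and replace entries in it, which is what makes overwriting job files possible.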


[jira] Commented: (MAPREDUCE-898) Change DistributedCache to use new api.


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752122#action_12752122
 ] 

Hudson commented on MAPREDUCE-898:
--

Integrated in Hadoop-Mapreduce-trunk-Commit #19 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/19/])
. Changes DistributedCache to use the new API. Contributed by Amareshwari 
Sriramadasu.


 Change DistributedCache to use new api.
 ---

 Key: MAPREDUCE-898
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-898
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Fix For: 0.21.0

 Attachments: patch-898-1.txt, patch-898-2.txt, patch-898-3.txt, 
 patch-898-4.txt, patch-898.txt





[jira] Commented: (MAPREDUCE-918) Test hsqldb server should be memory-only.


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752123#action_12752123
 ] 

Hudson commented on MAPREDUCE-918:
--

Integrated in Hadoop-Mapreduce-trunk-Commit #19 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/19/])
. Test hsqldb server should be memory-only. Contributed by Aaron Kimball.


 Test hsqldb server should be memory-only.
 -

 Key: MAPREDUCE-918
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-918
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/sqoop
Reporter: Aaron Kimball
Assignee: Aaron Kimball
 Fix For: 0.21.0

 Attachments: MAPREDUCE-918.patch


 Sqoop launches a standalone hsqldb server for unit tests, but it currently 
 writes its database to disk and uses a connect string of {{//localhost}}. If 
 multiple test instances are running concurrently, one test server may end up 
 serving the other instance of the unit tests, causing race conditions.


[jira] Commented: (MAPREDUCE-764) TypedBytesInput's readRaw() does not preserve custom type codes


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752182#action_12752182
 ] 

Hudson commented on MAPREDUCE-764:
--

Integrated in Hadoop-Mapreduce-trunk-Commit #20 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/20/])
. TypedBytesInput's readRaw() does not preserve custom type codes. 
Contributed by Klaas Bosteels.


 TypedBytesInput's readRaw() does not preserve custom type codes
 ---

 Key: MAPREDUCE-764
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-764
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/streaming
Affects Versions: 0.21.0
Reporter: Klaas Bosteels
Assignee: Klaas Bosteels
Priority: Blocker
 Fix For: 0.21.0

 Attachments: MAPREDUCE-764.patch, MAPREDUCE-764.patch


 The typed bytes format supports byte sequences of the form {{custom type 
 code length bytes}}. When reading such a sequence via 
 {{TypedBytesInput}}'s {{readRaw()}} method, however, the returned sequence 
 currently is {{0 length bytes}} (0 is the type code for a bytes array), 
 which leads to bugs such as the one described 
 [here|http://dumbo.assembla.com/spaces/dumbo/tickets/54].
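
 The expected behaviour can be sketched as follows. This is an illustration 
 only, not the code of {{TypedBytesInput}}: the class and method names are 
 hypothetical, and the record layout (a 1-byte type code followed by a 4-byte 
 length and the payload) is an assumption made for the example. The point is 
 that the raw bytes returned start with the original type code, not 0.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch; not part of Hadoop.
final class RawTypedBytesSketch {

  /**
   * Reads one record laid out as: 1-byte type code, 4-byte big-endian length,
   * payload (assumed layout). Returns the raw record with its type code kept.
   */
  static byte[] readRaw(byte[] input) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(input))) {
      int code = in.readUnsignedByte();
      int length = in.readInt();
      byte[] payload = new byte[length];
      in.readFully(payload);

      ByteArrayOutputStream buf = new ByteArrayOutputStream(5 + length);
      DataOutputStream out = new DataOutputStream(buf);
      out.writeByte(code); // keep the custom code rather than writing 0
      out.writeInt(length);
      out.write(payload);
      out.flush();
      return buf.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] record = {100, 0, 0, 0, 2, 7, 9}; // code 100, length 2, payload {7, 9}
    byte[] raw = readRaw(record);
    System.out.println("type code = " + (raw[0] & 0xFF) + ", total bytes = " + raw.length);
  }
}
```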


[jira] Updated: (MAPREDUCE-157) Job History log file format is not friendly for external tools.

2009-09-07 Thread Jothi Padmanabhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jothi Padmanabhan updated MAPREDUCE-157:


Status: Open  (was: Patch Available)

 Job History log file format is not friendly for external tools.
 ---

 Key: MAPREDUCE-157
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-157
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Affects Versions: 0.20.1
Reporter: Owen O'Malley
Assignee: Jothi Padmanabhan
 Fix For: 0.21.0

 Attachments: mapred-157-4Sep.patch, mapred-157-7Sep.patch, 
 mapred-157-prelim.patch, MAPREDUCE-157-avro.patch


 Currently, parsing the job history logs with external tools is very difficult 
 because of the format. The most critical problem is that newlines aren't 
 escaped in the strings. That makes using tools like grep, sed, and awk very 
 tricky.
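
 To illustrate the problem, here is a minimal sketch of newline escaping that 
 would keep each record on a single line. It is not the format used by the 
 attached patches; the class and method names are hypothetical.

```java
// Hypothetical sketch; not the job history format itself.
final class HistoryEscapeSketch {

  /** Escapes backslash, newline and carriage return so the value stays on one line. */
  static String escape(String value) {
    StringBuilder sb = new StringBuilder(value.length());
    for (int i = 0; i < value.length(); i++) {
      char c = value.charAt(i);
      switch (c) {
        case '\\': sb.append("\\\\"); break;
        case '\n': sb.append("\\n"); break;
        case '\r': sb.append("\\r"); break;
        default: sb.append(c);
      }
    }
    return sb.toString();
  }

  /** Reverses escape(). */
  static String unescape(String escaped) {
    StringBuilder sb = new StringBuilder(escaped.length());
    for (int i = 0; i < escaped.length(); i++) {
      char c = escaped.charAt(i);
      if (c == '\\' && i + 1 < escaped.length()) {
        char next = escaped.charAt(++i);
        switch (next) {
          case 'n': sb.append('\n'); break;
          case 'r': sb.append('\r'); break;
          default: sb.append(next); // covers the escaped backslash
        }
      } else {
        sb.append(c);
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    String raw = "JOBNAME=\"word\ncount\"";
    System.out.println(escape(raw)); // a single line, safe for grep/sed/awk
  }
}
```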


[jira] Updated: (MAPREDUCE-157) Job History log file format is not friendly for external tools.

2009-09-07 Thread Jothi Padmanabhan (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jothi Padmanabhan updated MAPREDUCE-157:


Attachment: mapred-157-7Sep-v1.patch

Now, sqoop's ivy.xml needs to be updated too!

 Job History log file format is not friendly for external tools.
 ---

 Key: MAPREDUCE-157
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-157
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Affects Versions: 0.20.1
Reporter: Owen O'Malley
Assignee: Jothi Padmanabhan
 Fix For: 0.21.0

 Attachments: mapred-157-4Sep.patch, mapred-157-7Sep-v1.patch, 
 mapred-157-7Sep.patch, mapred-157-prelim.patch, MAPREDUCE-157-avro.patch


 Currently, parsing the job history logs with external tools is very difficult 
 because of the format. The most critical problem is that newlines aren't 
 escaped in the strings. That makes using tools like grep, sed, and awk very 
 tricky.


[jira] Commented: (MAPREDUCE-936) Allow a load difference in fairshare scheduler

[
https://issues.apache.org/jira/browse/MAPREDUCE-936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752230#action_12752230
]

Hudson commented on MAPREDUCE-936:
--

Integrated in Hadoop-Mapreduce-trunk #75 (See
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/75/])

Allow a load difference in fairshare scheduler
--

Key: MAPREDUCE-936
URL: https://issues.apache.org/jira/browse/MAPREDUCE-936
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: contrib/fair-share
Affects Versions: 0.20.1, 0.21.0, 0.22.0
Reporter: Zheng Shao
Assignee: Zheng Shao
Fix For: 0.21.0

Attachments: MAPREDUCE-936.1.patch, MAPREDUCE-936.2.patch

The problem we are facing: It takes a long time for all tasks of a job to get
scheduled on the cluster, even if the cluster is almost empty.
There are two reasons that together lead to this situation:
1. The load factor makes sure each TT runs the same number of tasks. (This is
the part that this patch tries to change).
2. The scheduler tries to schedule map tasks locally (first node-local, then
rack-local). There is a wait time (mapred.fairscheduler.localitywait.node and
mapred.fairscheduler.localitywait.rack, both are around 10 sec in our conf),
and accumulated wait time (JobInfo.localityWait). The accumulated wait time
is reset to 0 whenever a non-local map task is scheduled. That means it takes
N * wait_time to schedule N non-local map tasks.
Because of 1, many TTs will not be able to take more tasks, even if they
have free slots. As a result, many of the map tasks cannot be scheduled
locally.
Because of 2, it's really hard to schedule a non-local task.
As a result, sometimes we are seeing that it takes more than 2 minutes to
schedule all the mappers of a job.
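
The N * wait_time cost in point 2 can be reproduced with a small simulation.
This is a simplified model, not the fair scheduler's code: it assumes no local
slot ever becomes available, that the accumulated wait grows by a fixed tick,
and that one non-local task is scheduled each time the wait reaches the
threshold. All names are hypothetical.

```java
// Hypothetical simulation; not the fair scheduler implementation.
final class LocalityWaitSketch {

  /**
   * Returns the elapsed time (ms) needed to schedule the given number of
   * non-local tasks when the accumulated wait is reset to 0 after each one.
   */
  static long timeToSchedule(int nonLocalTasks, long localityWaitMs, long tickMs) {
    if (tickMs <= 0) {
      throw new IllegalArgumentException("tickMs must be positive");
    }
    long elapsed = 0;
    long accumulatedWait = 0;
    int scheduled = 0;
    while (scheduled < nonLocalTasks) {
      elapsed += tickMs;
      accumulatedWait += tickMs;
      if (accumulatedWait >= localityWaitMs) {
        scheduled++;          // a non-local task is finally allowed
        accumulatedWait = 0;  // the reset that makes the cost N * wait_time
      }
    }
    return elapsed;
  }

  public static void main(String[] args) {
    // 12 non-local tasks with a 10 sec locality wait, checked every second.
    System.out.println(timeToSchedule(12, 10_000, 1_000) + " ms");
  }
}
```

With a 10 sec wait, as few as 12 non-local tasks already account for the
2 minutes mentioned above.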


[jira] Commented: (MAPREDUCE-370) Change org.apache.hadoop.mapred.lib.MultipleOutputs to use new api.


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752231#action_12752231
 ] 

Hudson commented on MAPREDUCE-370:
--

Integrated in Hadoop-Mapreduce-trunk #75 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/75/])


 Change org.apache.hadoop.mapred.lib.MultipleOutputs to use new api.
 ---

 Key: MAPREDUCE-370
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-370
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Fix For: 0.21.0

 Attachments: patch-370-1.txt, patch-370-2.txt, patch-370-3.txt, 
 patch-370-4.txt, patch-370-5.txt, patch-370.txt





[jira] Commented: (MAPREDUCE-372) Change org.apache.hadoop.mapred.lib.ChainMapper/Reducer to use new api.


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752232#action_12752232
 ] 

Hudson commented on MAPREDUCE-372:
--

Integrated in Hadoop-Mapreduce-trunk #75 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/75/])


 Change org.apache.hadoop.mapred.lib.ChainMapper/Reducer to use new api.
 ---

 Key: MAPREDUCE-372
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-372
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Fix For: 0.21.0

 Attachments: patch-372-1.txt, patch-372.txt





[jira] Commented: (MAPREDUCE-903) Adding AVRO jar to eclipse classpath


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752233#action_12752233
 ] 

Hudson commented on MAPREDUCE-903:
--

Integrated in Hadoop-Mapreduce-trunk #75 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/75/])


 Adding AVRO jar to eclipse classpath
 

 Key: MAPREDUCE-903
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-903
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Philip Zeyliger
Assignee: Philip Zeyliger
 Fix For: 0.21.0

 Attachments: MAPREDUCE-903.patch


 Avro is missing from the eclipse classpath, which caused Eclipse to whine.  
 Easy fix.


[jira] Commented: (MAPREDUCE-318) Refactor reduce shuffle code


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752234#action_12752234
 ] 

Hudson commented on MAPREDUCE-318:
--

Integrated in Hadoop-Mapreduce-trunk #75 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/75/])


 Refactor reduce shuffle code
 

 Key: MAPREDUCE-318
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-318
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Fix For: 0.21.0

 Attachments: HADOOP-5233_api.patch, HADOOP-5233_part0.patch, 
 mapred-318-14Aug.patch, mapred-318-20Aug.patch, mapred-318-24Aug.patch, 
 mapred-318-3Sep-v1.patch, mapred-318-3Sep.patch, mapred-318-common.patch


 The reduce shuffle code has become very complex and entangled. I think we 
 should move it out of ReduceTask and into a separate package 
 (org.apache.hadoop.mapred.task.reduce). Details to follow.


[jira] Commented: (MAPREDUCE-945) Test programs support only default queue.

[
https://issues.apache.org/jira/browse/MAPREDUCE-945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752252#action_12752252
]

Hadoop QA commented on MAPREDUCE-945:
-

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12418797/mapreduce-945-2.patch
against trunk revision 812209.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 6 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results:
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/12/testReport/
Findbugs warnings:
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/12/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results:
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/12/artifact/trunk/build/test/checkstyle-errors.html
Console output:
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/12/console

This message is automatically generated.

Test programs support only default queue.
-

Key: MAPREDUCE-945
URL: https://issues.apache.org/jira/browse/MAPREDUCE-945
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: test
Reporter: Suman Sehgal
Attachments: mapreduce-945-1.patch, mapreduce-945-2.patch

None of the test programs seems to support the concept of queues. These
programs look for the default queue only, even if some other queue is
specified when running them.


[jira] Commented: (MAPREDUCE-861) Modify queue configuration format and parsing to support a hierarchy of queues.

[
https://issues.apache.org/jira/browse/MAPREDUCE-861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752275#action_12752275
]

Hadoop QA commented on MAPREDUCE-861:
-

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12418777/MAPREDUCE-861-5.patch
against trunk revision 812002.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 40 new or modified tests.

-1 javadoc. The javadoc tool appears to have generated 1 warning messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

-1 findbugs. The patch appears to introduce 4 new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed core unit tests.

-1 contrib tests. The patch failed contrib unit tests.

Test results:
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/42/testReport/
Findbugs warnings:
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/42/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results:
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/42/artifact/trunk/build/test/checkstyle-errors.html
Console output:
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/42/console

This message is automatically generated.

Modify queue configuration format and parsing to support a hierarchy of
queues.
---


[jira] Commented: (MAPREDUCE-157) Job History log file format is not friendly for external tools.

[
https://issues.apache.org/jira/browse/MAPREDUCE-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752278#action_12752278
]

Hadoop QA commented on MAPREDUCE-157:
-

+1 overall. Here are the results of testing the latest attachment

http://issues.apache.org/jira/secure/attachment/12418824/mapred-157-7Sep-v1.patch
against trunk revision 812209.

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 30 new or modified tests.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 findbugs. The patch does not introduce any new Findbugs warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

+1 core tests. The patch passed core unit tests.

+1 contrib tests. The patch passed contrib unit tests.

Test results:
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/13/testReport/
Findbugs warnings:
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/13/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results:
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/13/artifact/trunk/build/test/checkstyle-errors.html
Console output:
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/13/console

This message is automatically generated.

Job History log file format is not friendly for external tools.
---

Key: MAPREDUCE-157
URL: https://issues.apache.org/jira/browse/MAPREDUCE-157
Project: Hadoop Map/Reduce
Issue Type: Sub-task
Affects Versions: 0.20.1
Reporter: Owen O'Malley
Assignee: Jothi Padmanabhan
Fix For: 0.21.0

Attachments: mapred-157-4Sep.patch, mapred-157-7Sep-v1.patch,
mapred-157-7Sep.patch, mapred-157-prelim.patch, MAPREDUCE-157-avro.patch

Currently, parsing the job history logs with external tools is very difficult
because of the format. The most critical problem is that newlines aren't
escaped in the strings. That makes using tools like grep, sed, and awk very
tricky.


[jira] Created: (MAPREDUCE-959) JobConf::setWorkingDirectory requires that the default FileSystem is reachable

JobConf::setWorkingDirectory requires that the default FileSystem is reachable
--

 Key: MAPREDUCE-959
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-959
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client, test
Reporter: Chris Douglas
Priority: Minor


If mapred.working.dir is not set, JobConf::setWorkingDirectory will attempt to 
obtain the default working directory for the default FileSystem. In trunk at 
least, if the default fs is HDFS and not reachable, set will fail:
{noformat}
java.net.UnknownHostException: unknown host: notahost
java.lang.RuntimeException: java.net.UnknownHostException: unknown host: notahost
  at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:541)
  at org.apache.hadoop.mapred.JobConf.setWorkingDirectory(JobConf.java:522)
  at org.apache.hadoop.conf.TestJobConf.testSetWorkingDir(TestJobConf.java:67)
Caused by: java.net.UnknownHostException: unknown host: notahost
  at org.apache.hadoop.ipc.Client$Connection.init(Client.java:216)
  at org.apache.hadoop.ipc.Client.getConnection(Client.java:876)
  at org.apache.hadoop.ipc.Client.call(Client.java:746)
  at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:223)
  at $Proxy4.getProtocolVersion(Unknown Source)
  at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:366)
  at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:169)
  at org.apache.hadoop.hdfs.DFSClient.init(DFSClient.java:276)
  at org.apache.hadoop.hdfs.DFSClient.init(DFSClient.java:235)
  at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:83)
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1430)
  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:69)
  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1458)
  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1446)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:190)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:98)
  at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:537)
{noformat}
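
The lookup order described above can be modelled in a few lines. This is a
simplified stand-in, not JobConf itself: a plain map plays the role of the
configuration and a supplier plays the role of the default FileSystem, and
both names are hypothetical. It only shows that the file system is contacted
when mapred.working.dir is unset, so an unreachable file system fails the call.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical model of the lookup order; not the JobConf implementation.
final class WorkingDirSketch {

  /** Returns mapred.working.dir if set; otherwise asks the default file system. */
  static String getWorkingDirectory(Map<String, String> conf,
                                    Supplier<String> defaultFsWorkingDir) {
    String name = conf.get("mapred.working.dir");
    if (name != null) {
      return name; // no file system access needed
    }
    String dir = defaultFsWorkingDir.get(); // throws if the file system is unreachable
    conf.put("mapred.working.dir", dir);
    return dir;
  }

  public static void main(String[] args) {
    Supplier<String> unreachableFs = () -> {
      throw new RuntimeException("unknown host: notahost");
    };

    Map<String, String> conf = new HashMap<>();
    conf.put("mapred.working.dir", "/user/test");
    System.out.println(getWorkingDirectory(conf, unreachableFs)); // property wins

    try {
      getWorkingDirectory(new HashMap<>(), unreachableFs);
    } catch (RuntimeException e) {
      System.out.println("failed: " + e.getMessage());
    }
  }
}
```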


[jira] Created: (MAPREDUCE-960) Unnecessary copy in mapreduce.lib.input.KeyValueLineRecordReader

Unnecessary copy in mapreduce.lib.input.KeyValueLineRecordReader


 Key: MAPREDUCE-960
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-960
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Chris Douglas
Assignee: Chris Douglas


KeyValueLineRecordReader effects the copy from the line to the key/value by 
creating separate arrays:
{noformat}
  int keyLen = pos;
  byte[] keyBytes = new byte[keyLen];
  System.arraycopy(line, 0, keyBytes, 0, keyLen);
  int valLen = lineLen - keyLen - 1;
  byte[] valBytes = new byte[valLen];
  System.arraycopy(line, pos + 1, valBytes, 0, valLen);
  key.set(keyBytes);
  value.set(valBytes);
{noformat}
Since {{set}} triggers another copy and {{Text}} has a {{set}} overload taking 
{{byte[], off, len}}, the intermediate copy can be avoided.
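
A minimal sketch of the single-copy version follows. So that it runs without
Hadoop on the classpath, {{SimpleText}} is a hypothetical stand-in for 
{{Text}} that mirrors only the {{set(byte[], off, len)}} overload; the method 
name {{splitKeyValue}} is likewise an assumption, and this is not necessarily 
what the eventual patch does.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch; SimpleText stands in for org.apache.hadoop.io.Text.
final class KeyValueSplitSketch {

  /** Owns a private copy of its bytes, as Text does after set(). */
  static final class SimpleText {
    private byte[] bytes = new byte[0];

    /** Copies len bytes starting at off into this object's own buffer. */
    void set(byte[] src, int off, int len) {
      bytes = Arrays.copyOfRange(src, off, off + len);
    }

    @Override
    public String toString() {
      return new String(bytes, StandardCharsets.UTF_8);
    }
  }

  /** Splits line[0, lineLen) around the separator at index pos, one copy per field. */
  static void splitKeyValue(byte[] line, int lineLen, int pos,
                            SimpleText key, SimpleText value) {
    key.set(line, 0, pos);                       // no intermediate keyBytes array
    value.set(line, pos + 1, lineLen - pos - 1); // no intermediate valBytes array
  }

  public static void main(String[] args) {
    byte[] line = "user\tamar".getBytes(StandardCharsets.UTF_8);
    SimpleText key = new SimpleText();
    SimpleText value = new SimpleText();
    splitKeyValue(line, line.length, 4, key, value);
    System.out.println(key + " -> " + value);
  }
}
```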


[jira] Updated: (MAPREDUCE-960) Unnecessary copy in mapreduce.lib.input.KeyValueLineRecordReader


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-960:


Attachment: M960-0.patch

Removed the intermediate buffer and {{KeyValueLineRecordReader::getKeyClass}}, 
which was accidentally copied from mapred in MAPREDUCE-655.

 Unnecessary copy in mapreduce.lib.input.KeyValueLineRecordReader
 

 Key: MAPREDUCE-960
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-960
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Chris Douglas
Assignee: Chris Douglas
 Attachments: M960-0.patch


 KeyValueLineRecordReader effects the copy from the line to the key/value by 
 creating separate arrays:
 {noformat}
   int keyLen = pos;
   byte[] keyBytes = new byte[keyLen];
   System.arraycopy(line, 0, keyBytes, 0, keyLen);
   int valLen = lineLen - keyLen - 1;
   byte[] valBytes = new byte[valLen];
   System.arraycopy(line, pos + 1, valBytes, 0, valLen);
   key.set(keyBytes);
   value.set(valBytes);
 {noformat}
 Since {{set}} triggers another copy and {{Text}} has a {{set}} overload taking 
 {{byte[], off, len}}, the intermediate copy can be avoided.


[jira] Updated: (MAPREDUCE-960) Unnecessary copy in mapreduce.lib.input.KeyValueLineRecordReader


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-960:


Status: Patch Available  (was: Open)

 Unnecessary copy in mapreduce.lib.input.KeyValueLineRecordReader
 

 Key: MAPREDUCE-960
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-960
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Chris Douglas
Assignee: Chris Douglas
 Attachments: M960-0.patch


 KeyValueLineRecordReader effects the copy from the line to the key/value by 
 creating separate arrays:
 {noformat}
   int keyLen = pos;
   byte[] keyBytes = new byte[keyLen];
   System.arraycopy(line, 0, keyBytes, 0, keyLen);
   int valLen = lineLen - keyLen - 1;
   byte[] valBytes = new byte[valLen];
   System.arraycopy(line, pos + 1, valBytes, 0, valLen);
   key.set(keyBytes);
   value.set(valBytes);
 {noformat}
 Since {{set}} triggers another copy and {{Text}} has a {{set}} overload taking 
 {{byte[], off, len}}, the intermediate copy can be avoided.


[jira] Updated: (MAPREDUCE-830) Providing BZip2 splitting support for Text data

[
https://issues.apache.org/jira/browse/MAPREDUCE-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Chris Douglas updated MAPREDUCE-830:

Attachment: M830-3.patch

* Fixed mapreduce.lib.input.LineRecordReader (I missed the filePosition updates
in the last patch)
* Added a unit test for the mapreduce code
* Patched KeyValueLineRecordReader::isSplittable in mapred and mapreduce

Providing BZip2 splitting support for Text data
---

Key: MAPREDUCE-830
URL: https://issues.apache.org/jira/browse/MAPREDUCE-830
Project: Hadoop Map/Reduce
Issue Type: Improvement
Affects Versions: 0.21.0
Reporter: Abdul Qadeer
Assignee: Abdul Qadeer
Fix For: 0.21.0

Attachments: M830-2.patch, M830-3.patch, MapReduce-830-version1.patch

HADOOP-4012 (https://issues.apache.org/jira/browse/HADOOP-4012) is providing
support for handling BZip2-compressed data such that the compressed input file
can be split at arbitrary points. This JIRA uses that functionality in
LineRecordReader. The benefit is that if a user provides BZip2-compressed
Text data, Hadoop will split it so that multiple mappers process it, and
BZip2-compressed data can fully utilize the cluster. Currently, a
BZip2-compressed Text file goes to a single mapper and is not split. The
enhancement in this JIRA therefore provides splitting support and a
considerable performance gain.


[jira] Commented: (MAPREDUCE-830) Providing BZip2 splitting support for Text data


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12752304#action_12752304
 ] 

Chris Douglas commented on MAPREDUCE-830:
-

(also includes a workaround for MAPREDUCE-959, which was getting irritating, 
and updates the unit tests to JUnit4 semantics)

 Providing BZip2 splitting support for Text data
 ---

 Key: MAPREDUCE-830
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-830
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.21.0
Reporter: Abdul Qadeer
Assignee: Abdul Qadeer
 Fix For: 0.21.0

 Attachments: M830-2.patch, M830-3.patch, MapReduce-830-version1.patch


 HADOOP-4012 (https://issues.apache.org/jira/browse/HADOOP-4012) is providing 
 support for handling BZip2-compressed data such that the compressed input file 
 can be split at arbitrary points. This JIRA uses that functionality in 
 LineRecordReader. The benefit is that if a user provides BZip2-compressed 
 Text data, Hadoop will split it so that multiple mappers process it, and 
 BZip2-compressed data can fully utilize the cluster. Currently, a 
 BZip2-compressed Text file goes to a single mapper and is not split. The 
 enhancement in this JIRA therefore provides splitting support and a 
 considerable performance gain.


[jira] Updated: (MAPREDUCE-960) Unnecessary copy in mapreduce.lib.input.KeyValueLineRecordReader


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-960:


Affects Version/s: 0.21.0

 Unnecessary copy in mapreduce.lib.input.KeyValueLineRecordReader
 

 Key: MAPREDUCE-960
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-960
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.21.0
Reporter: Chris Douglas
Assignee: Chris Douglas
 Attachments: M960-0.patch


 KeyValueLineRecordReader effects the copy from the line to the key/value by 
 creating separate arrays:
 {noformat}
   int keyLen = pos;
   byte[] keyBytes = new byte[keyLen];
   System.arraycopy(line, 0, keyBytes, 0, keyLen);
   int valLen = lineLen - keyLen - 1;
   byte[] valBytes = new byte[valLen];
   System.arraycopy(line, pos + 1, valBytes, 0, valLen);
   key.set(keyBytes);
   value.set(valBytes);
 {noformat}
 Since {{set}} triggers another copy and {{Text}} has a {{set}} overload taking 
 {{byte[], off, len}}, the intermediate copy can be avoided.


[jira] Commented: (MAPREDUCE-960) Unnecessary copy in mapreduce.lib.input.KeyValueLineRecordReader