[jira] Created: (MAPREDUCE-1690) Using BuddySystem to reduce the ReduceTask's mem usage in the step of shuffle

2010-04-10 Thread luoli (JIRA)
Using BuddySystem to reduce the ReduceTask's mem usage in the step of shuffle
-

                 Key: MAPREDUCE-1690
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1690
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
            Reporter: luoli




-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (MAPREDUCE-1689) Write MR wire protocols in Avro IDL

2010-04-10 Thread Jeff Hammerbacher (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855674#action_12855674
 ] 

Jeff Hammerbacher commented on MAPREDUCE-1689:
--

Hey Arun,

I think there's a typo in your description. Do you mean "write all MapReduce 
protocols in Avro IDL"?

Thanks,
Jeff

> Write MR wire protocols in Avro IDL
> ---
>
>                 Key: MAPREDUCE-1689
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1689
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: client, jobtracker, task, tasktracker
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>
> As part of the move to AVRO and wire compatibility, write all HDFS
> protocols in AVRO IDL. This is analogous to HDFS-1069.





[jira] Created: (MAPREDUCE-1689) Write MR wire protocols in Avro IDL

2010-04-10 Thread Arun C Murthy (JIRA)
Write MR wire protocols in Avro IDL
---

                 Key: MAPREDUCE-1689
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1689
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: client, jobtracker, task, tasktracker
            Reporter: Arun C Murthy
            Assignee: Arun C Murthy


As part of the move to AVRO and wire compatibility, write all HDFS
protocols in AVRO IDL. This is analogous to HDFS-1069.





[jira] Commented: (MAPREDUCE-1526) Cache the job related information while submitting the job , this would avoid many RPC calls to JobTracker.

2010-04-10 Thread rahul k singh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855541#action_12855541
 ] 

rahul k singh commented on MAPREDUCE-1526:
--

Added a new patch. It addresses the following review comments:

- Restore the behavior of default seq == -1 for GridmixJob and 
GenerateData.
- Why do we need this? Wouldn't noOfRunningJobs always be the same as 
jobMaps.size()?
{noformat}
+  private static AtomicInteger noOfRunningJobs= new AtomicInteger(0);
{noformat}
- The proper logic for addJobStats should be: first check if seq < 0; 
if so, ignore the job. Then, if jobdesc is null, we should throw an 
exception instead of adjusting #maps to 1.
- The implementation of Statistics.add(Job job) is still wrong: you 
should hold the return value of jobMaps.remove() and call 
StatListener.update() with that value iff it is not null.
- Should we eliminate the variable runningJobs in StressJobFactory?
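
For illustration only (this is not the attached patch): a minimal Java sketch of the bookkeeping the review comments above describe. All class names and signatures here (StatisticsSketch, JobStatsSketch, StatListenerSketch, the int key) are hypothetical stand-ins; only the names jobMaps, addJobStats, add and update come from the comments. It assumes the running-job count can be read from the map itself, so no separate counter is kept.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-job record; the real JobStats class may look different. */
final class JobStatsSketch {
  final int seq;
  final int numMaps;

  JobStatsSketch(int seq, int numMaps) {
    this.seq = seq;
    this.numMaps = numMaps;
  }
}

/** Hypothetical listener; the review only names StatListener.update(). */
interface StatListenerSketch {
  void update(JobStatsSketch stats);
}

final class StatisticsSketch {
  private final Map<Integer, JobStatsSketch> jobMaps = new ConcurrentHashMap<>();
  private final List<StatListenerSketch> listeners = new ArrayList<>();

  void addListener(StatListenerSketch listener) {
    listeners.add(listener);
  }

  /** Jobs with seq < 0 are ignored; a missing job description is an error. */
  void addJobStats(int seq, Integer numMapsFromJobDesc) {
    if (seq < 0) {
      return;
    }
    if (numMapsFromJobDesc == null) {
      throw new IllegalArgumentException("no job description for seq " + seq);
    }
    jobMaps.put(seq, new JobStatsSketch(seq, numMapsFromJobDesc));
  }

  /** Called on job completion: hold the removed value, notify only if non-null. */
  void add(int seq) {
    JobStatsSketch removed = jobMaps.remove(seq);
    if (removed != null) {
      for (StatListenerSketch listener : listeners) {
        listener.update(removed);
      }
    }
  }

  /** The running-job count is simply the map size. */
  int runningJobs() {
    return jobMaps.size();
  }
}
```

With this shape, calling add() twice for the same job notifies listeners only once, because the second remove() returns null.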

Minor things:
- The following comments from my previous review were not addressed:
> - I think the following statement should be Log.debug() instead of 
> Log.info() (and be protected by a check of LOG.isDebugEnabled()):
> {noformat}
> -if (LOG.isDebugEnabled()) {
> -  LOG.info(
> +LOG.info(
> System.currentTimeMillis() + " Overloaded is " + 
> Boolean.toString(
>   overloaded) + " incompleteMapTasks " + relOp + " " +
>   OVERLAOD_MAPTASK_MAPSLOT_RATIO + "*mapSlotCapacity" + "(" +
>   incompleteMapTasks + " " + relOp + " " +
>   OVERLAOD_MAPTASK_MAPSLOT_RATIO + "*" +
>   clusterStatus.getMaxMapTasks() + ")");
> -}
> +
> {noformat}
- static List pullDescription(JobContext jobCtxt) can be 
implemented on top of GridmixJob.getJobSeqId
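
For illustration only: a sketch of the guarded debug logging the quoted review comment asks for. It uses java.util.logging so that it is self-contained, which is a substitution; the patch itself uses a LOG object with isDebugEnabled(). The class name, method names and the definition of relOp are assumptions.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class OverloadLogSketch {
  private static final Logger LOG =
      Logger.getLogger(OverloadLogSketch.class.getName());

  /** Builds the message; kept separate so the guard below can skip it entirely. */
  static String describe(boolean overloaded, int incompleteMapTasks,
                         float ratio, int mapSlotCapacity) {
    // Assumption: relOp reflects the outcome of the overload comparison.
    String relOp = overloaded ? ">" : "<=";
    return "Overloaded is " + overloaded
        + " incompleteMapTasks " + relOp + " " + ratio + "*mapSlotCapacity"
        + "(" + incompleteMapTasks + " " + relOp + " "
        + ratio + "*" + mapSlotCapacity + ")";
  }

  /** Logs at debug level (FINE) and only pays for string building when enabled. */
  static void logOverload(boolean overloaded, int incompleteMapTasks,
                          float ratio, int mapSlotCapacity) {
    if (LOG.isLoggable(Level.FINE)) {
      LOG.fine(describe(overloaded, incompleteMapTasks, ratio, mapSlotCapacity));
    }
  }
}
```

The guard matters here because the message is assembled from several concatenations that would otherwise run on every scheduling pass.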

- removed the redundant GridmixJob.getJobSeqId() calls.
- fixed a minor bug in Statistics.addJobStats(Job, JobStats)


> Cache the job related information while submitting the job , this would avoid 
> many RPC calls to JobTracker.
> ---
>
>                 Key: MAPREDUCE-1526
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1526
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/gridmix
>            Reporter: rahul k singh
>         Attachments: 1526-yahadoop-20-101-2.patch, 
>                      1526-yahadoop-20-101-3.patch, 1526-yahadoop-20-101.patch, 
>                      1526-yhadoop-20-101-4.patch, 1526-yhadoop-20-101-4.patch
>
>






[jira] Updated: (MAPREDUCE-1526) Cache the job related information while submitting the job , this would avoid many RPC calls to JobTracker.

2010-04-10 Thread rahul k singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rahul k singh updated MAPREDUCE-1526:
-

Attachment: 1526-yhadoop-20-101-4.patch

> Cache the job related information while submitting the job , this would avoid 
> many RPC calls to JobTracker.
> ---
>
>                 Key: MAPREDUCE-1526
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1526
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/gridmix
>            Reporter: rahul k singh
>         Attachments: 1526-yahadoop-20-101-2.patch, 
>                      1526-yahadoop-20-101-3.patch, 1526-yahadoop-20-101.patch, 
>                      1526-yhadoop-20-101-4.patch, 1526-yhadoop-20-101-4.patch
>
>






[jira] Commented: (MAPREDUCE-1683) Remove JNI calls from ClusterStatus cstr

2010-04-10 Thread Vinod K V (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855540#action_12855540
 ] 

Vinod K V commented on MAPREDUCE-1683:
--

+1 for leaving the memory vals in the detailed info.

Patch for 20 looks good too.

When working on the trunk patch, we should clean up/remove all the useless 
constructors - the constructors are all package private, so no harm in doing a 
cleanup here, I think.

> Remove JNI calls from ClusterStatus cstr
> 
>
>                 Key: MAPREDUCE-1683
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1683
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.2
>            Reporter: Chris Douglas
>         Attachments: MAPREDUCE-1683_yhadoop_20_9.patch
>
>
> The {{ClusterStatus}} constructor makes two JNI calls to the {{Runtime}} to 
> fetch memory information. {{ClusterStatus}} instances are often created 
> inside the {{JobTracker}} to obtain other, unrelated metrics (sometimes from 
> schedulers' inner loops). Given that this information is related to the 
> {{JobTracker}} process and not the cluster, the metrics are also available 
> via {{JvmMetrics}}, and the jsps can gather this information for themselves: 
> these fields can be removed from {{ClusterStatus}}
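
For illustration only: a sketch of the separation the description argues for, with memory figures read from the Runtime on demand rather than in the status object's constructor. ClusterStatusSketch and JvmMemorySketch are hypothetical names, and the fields shown are assumptions, not the real ClusterStatus layout.

```java
/** Hypothetical trimmed-down status object: no memory fields, so construction is cheap. */
final class ClusterStatusSketch {
  final int mapTasks;
  final int maxMapTasks;

  ClusterStatusSketch(int mapTasks, int maxMapTasks) {
    this.mapTasks = mapTasks;
    this.maxMapTasks = maxMapTasks;
  }
}

/** Heap figures for the current JVM, sampled only when a caller asks for them. */
final class JvmMemorySketch {
  final long totalBytes;
  final long maxBytes;

  private JvmMemorySketch(long totalBytes, long maxBytes) {
    this.totalBytes = totalBytes;
    this.maxBytes = maxBytes;
  }

  /** The only place that queries the Runtime; e.g. a status page would call this. */
  static JvmMemorySketch sample() {
    Runtime rt = Runtime.getRuntime();
    return new JvmMemorySketch(rt.totalMemory(), rt.maxMemory());
  }
}
```

Callers in a scheduler's inner loop would then construct only ClusterStatusSketch and never trigger the Runtime queries.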





[jira] Commented: (MAPREDUCE-1505) Cluster class should create the rpc client only when needed

2010-04-10 Thread Vinod K V (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855539#action_12855539
 ] 

Vinod K V commented on MAPREDUCE-1505:
--

I looked at the latest 20 patch. Looks good except that the changes in 
{{ensureState()}} are never reachable.

> Cluster class should create the rpc client only when needed
> ---
>
>                 Key: MAPREDUCE-1505
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1505
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 0.20.2
>            Reporter: Devaraj Das
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1505_yhadoop20.patch, 
>                      MAPREDUCE-1505_yhadoop20_9.patch
>
>
> It will be good to have the org.apache.hadoop.mapreduce.Cluster create the 
> rpc client object only when needed (when a call to the jobtracker is actually 
> required). org.apache.hadoop.mapreduce.Job constructs the Cluster object 
> internally and in many cases the application that created the Job object 
> really wants to look at the configuration only. It'd help to not have these 
> connections to the jobtracker especially when Job is used in the tasks (e.g.,
> Pig calls mapreduce.FileInputFormat.setInputPath in the tasks and that
> requires a Job object to be passed).
> In Hadoop 20, the Job object internally creates the JobClient object, and the 
> same argument applies there too.
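
For illustration only: a sketch of the lazy client creation requested above. LazyClusterSketch, its connector parameter and its method names are hypothetical; the real Cluster and JobClient classes have different constructors and APIs. The point shown is that configuration lookups stay local and the connection is opened on the first call that needs it.

```java
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical stand-in for Cluster/JobClient with a lazily created client of type C. */
final class LazyClusterSketch<C> {
  private final Map<String, String> conf;
  private final Supplier<C> connector; // opens the (expensive) connection
  private C client;                    // stays null until a remote call needs it

  LazyClusterSketch(Map<String, String> conf, Supplier<C> connector) {
    this.conf = conf;
    this.connector = connector;
  }

  /** Local lookup: never opens a connection. */
  String get(String key) {
    return conf.get(key);
  }

  /** Creates the client on first use and reuses it afterwards. */
  synchronized C client() {
    if (client == null) {
      client = connector.get();
    }
    return client;
  }

  synchronized boolean isConnected() {
    return client != null;
  }
}
```

A task that only reads configuration through such an object would never contact the jobtracker.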
