[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832373#action_12832373
 ] 

Chris Douglas commented on MAPREDUCE-1408:
------------------------------------------

The patch looks good. A few notes after a first pass (I have not examined the 
statistics code, yet):
* {{TestGridmixSubmission}} is an expensive test to run; the cluster 
startup/shutdown can be amortized across runs (as can the data generation). If 
this needs to clean up between jobs, consider using {...@before}} and 
{...@after}} instead of making separate tests (i.e., no subclassing the unit 
test). The current patch also makes many fields protected without referencing 
them in subclasses and also makes static, inner classes into member classes for 
reasons that aren't obvious (e.g. {{FilterJobStory}})
* Very minor nits: the coding standards recommend whitespace after commas and 
before and after braces. Control statements also must enclose their blocks in 
braces e.g. "{{if (expr) { stmt }}}", never "{{if (expr) stmt}}", "{{foo(a, b, 
c) {}}" never "{{foo(a, b,c){}}". Some lines in the patch exceed 80 characters
* {{DebugGridmix::createJobMonitor}} should allow IOExceptions to escape and 
fail instead of logging the exception
* {{statistics}} is a more descriptive name than {{collector}} for the 
{{StatsCollector}} references
* It would probably be cleaner to pull the {{REPLAY}} mode into a separate 
class, allowing {{JobFactory}} to be an abstract base.
* The {{JobMonitor::process}} method can probably be removed, since it doesn't 
do any meaningful work (the instance is instead passed to the downstream 
{{StatsCollector}}, which does). The monitor is then only responsible for 
determining when the job is done, not its outcome The unit test can subclass 
the downstream component instead of subclassing {{JobMonitor}}
* Instead of making {{JobFactory::rThread}} protected, adding an abstract 
{{createReaderThead}} would also allow it to remain final
* The {{SerialJobFactory}} includes logic that preserves submission times, but 
it should be submitting the job so that it executes immediately. Since the 
queue between the {{JobFactory}} and {{JobSubmitter}} will never have more than 
1 element in it, it should be sufficient to submit new jobs at the system time.
* The {{SerialJobFactory}} should either be waiting on a condition or add new 
jobs to the submitter while holding the lock. Otherwise it's possible (though 
extremely unlikely) for signals to be missed.
* If a job fails submission in serial mode, wil the {{JobFactory}} be woken up?
* Instead of adding {{GridmixJobSubmissionPolicy::getPolicy}}, callers can use 
{{Configuration::getEnum}} (so callers may specify the default)
* The poll interval in the submission policy enum should be specified as a 
final param on the instance (as the name is), not by an abstract method. The 
{{getName}} method is the same as its {{toString}}, isn't it? Is the "name" 
important?

> Allow customization of job submission policies
> ----------------------------------------------
>
>                 Key: MAPREDUCE-1408
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1408
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/gridmix
>            Reporter: rahul k singh
>         Attachments: 1408-1.patch
>
>
> Currently, gridmix3 replay job submission faithfully. For evaluation 
> purposes, it would be great if we can support other job submission policies 
> such as sequential job submission, or stress job submission.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to