[
https://issues.apache.org/jira/browse/MAPREDUCE-7076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17783878#comment-17783878
]
ASF GitHub Bot commented on MAPREDUCE-7076:
-------------------------------------------
wsu13 commented on PR #1089:
URL: https://github.com/apache/hadoop/pull/1089#issuecomment-1801030384
This modification makes `barrier()` always returns true, and makes
`NNbench_reports.log`'s "maps that missed the barrier" metric always be zero.
In a large test configuration with thousands of mappers, Yarn takes more
than 2 minutes(the default barrier time) to start up all the mappers. The
`barrier()` function is important to make the results meaningful.
@aajisaka
I think we should revisit this commit.
> TestNNBench#testNNBenchCreateReadAndDelete failing in our internal build
> ------------------------------------------------------------------------
>
> Key: MAPREDUCE-7076
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7076
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: test
> Affects Versions: 2.8.0
> Reporter: Rushabh Shah
> Assignee: Kevin Su
> Priority: Minor
> Labels: newbie
> Fix For: 2.10.0, 3.3.0, 3.2.1, 3.1.3
>
>
> TestNNBench#testNNBenchCreateReadAndDelete failed couple of times in our
> internal jenkins build.
> {noformat}
> java.lang.AssertionError: create_write should create the file
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.assertTrue(Assert.java:41)
> at
> org.apache.hadoop.hdfs.TestNNBench.testNNBenchCreateReadAndDelete(TestNNBench.java:55)
> {noformat}
> Below is my analysis for why it didn't create the file.
> {code:java|title=NNBench.java|borderStyle=solid}
> // Some comments here
> public void map(Text key,
> LongWritable value,
> OutputCollector<Text, Text> output,
> Reporter reporter) throws IOException {
> if (barrier()) {
> String fileName = "file_" + value;
> if (op.equals(OP_CREATE_WRITE)) {
> startTimeTPmS = System.currentTimeMillis();
> doCreateWriteOp(fileName, reporter);
> } ...
> } else {
> output.collect(new Text("l:latemaps"), new Text("1"));
> }
> // Below are the relevant parts of barrier() method
> private boolean barrier() {
> ..
> // If the sleep time is greater than 0, then sleep and return
> ...
> LOG.info("Waiting in barrier for: " + sleepTime + " ms");
> return retVal;
> }
> // Below are the relevant parts of the doCreateWriteOp
> private void doCreateWriteOp(String name,
> Reporter reporter) {
> FSDataOutputStream out;
> byte[] buffer = new byte[bytesToWrite];
> for (long l = 0l; l < numberOfFiles; l++) {
> Path filePath = new Path(new Path(baseDir, dataDirName),
> name + "_" + l);
> }
> ....
> }
> {code}
> This file {{BASE_DIR/data/file_0_0}} is getting created only if the map task
> starts before the time mentioned by {{startTime}}.
> Refer the chunk which I pasted above.
> {{map(..)}} --> {{barrier()}} and *only if* {{barrier()}} evaluates to true
> it will call {{doCreateWriteOp}} which will eventually create the file.
> In test case, the delay value is 3 seconds as per {{"-startTime", "" +
> (Time.now() / 1000 + 3)}}
> In this failing test case, I can see the task starting minimum 6 seconds
> after the test case started.
> {noformat}
> 2017-01-27 03:11:15,387 INFO [Thread-4] mapreduce.JobSubmitter
> (JobSubmitter.java:printTokens(289)) - Submitting tokens for job:
> job_local1711545156_0001
> 2017-01-27 03:11:23,405 INFO [Thread-4] mapreduce.Job
> (Job.java:submit(1345)) - The url to track the job: http://localhost:8080/
> {noformat}
> Also when I run this test on my laptop, I see the following line being
> printed.
> {noformat}
> 2017-01-27 17:09:27,982 INFO [LocalJobRunner Map Task Executor #0]
> hdfs.NNBench (NNBench.java:barrier(676)) - Waiting in barrier for: 1018 ms
> {noformat}
> This line will be printed only in {{barrier()}} method and I don't see this
> line in the logs of failed test.
>
> In our environment, the jenkins server was very slow and it took more than 6
> seconds to launch a map task.
> The correct fix in my opinion would be to return true in case there is no
> sleep in {{barrier() method}}. Only in exception, it should return false.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]