[jira] [Created] (MAPREDUCE-7076) TestNNBench#testNNBenchCreateReadAndDelete failing in our internal build
Rushabh S Shah created MAPREDUCE-7076:
Summary: TestNNBench#testNNBenchCreateReadAndDelete failing in our internal build
Key: MAPREDUCE-7076
URL: https://issues.apache.org/jira/browse/MAPREDUCE-7076
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: test
Affects Versions: 2.8.0
Reporter: Rushabh S Shah

TestNNBench#testNNBenchCreateReadAndDelete failed a couple of times in our internal Jenkins build.
{noformat}
java.lang.AssertionError: create_write should create the file
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.apache.hadoop.hdfs.TestNNBench.testNNBenchCreateReadAndDelete(TestNNBench.java:55)
{noformat}
Below is my analysis of why it didn't create the file.
{code:title=NNBench.java|borderStyle=solid}
// Some comments here
public void map(Text key, LongWritable value,
                OutputCollector<Text, Text> output,
                Reporter reporter) throws IOException {
  if (barrier()) {
    String fileName = "file_" + value;
    if (op.equals(OP_CREATE_WRITE)) {
      startTimeTPmS = System.currentTimeMillis();
      doCreateWriteOp(fileName, reporter);
    }
    ...
  } else {
    output.collect(new Text("l:latemaps"), new Text("1"));
  }
}

// Below are the relevant parts of the barrier() method
private boolean barrier() {
  ..
  // If the sleep time is greater than 0, then sleep and return
  ...
  LOG.info("Waiting in barrier for: " + sleepTime + " ms");
  return retVal;
}

// Below are the relevant parts of doCreateWriteOp
private void doCreateWriteOp(String name, Reporter reporter) {
  FSDataOutputStream out;
  byte[] buffer = new byte[bytesToWrite];
  for (long l = 0l; l < numberOfFiles; l++) {
    Path filePath = new Path(new Path(baseDir, dataDirName), name + "_" + l);
  }
}
{code}
The file {{BASE_DIR/data/file_0_0}} is created only if the map task starts before the time given by {{startTime}}. Refer to the chunk pasted above: {{map(..)}} calls {{barrier()}}, and *only if* {{barrier()}} evaluates to true does it call {{doCreateWriteOp}}, which eventually creates the file.
In the test case, the delay is 3 seconds, as per {{"-startTime", "" + (Time.now() / 1000 + 3)}}. In the failing run, I can see the task starting at least 6 seconds after the test case started.
{noformat}
2017-01-27 03:11:15,387 INFO [Thread-4] mapreduce.JobSubmitter (JobSubmitter.java:printTokens(289)) - Submitting tokens for job: job_local1711545156_0001
2017-01-27 03:11:23,405 INFO [Thread-4] mapreduce.Job (Job.java:submit(1345)) - The url to track the job: http://localhost:8080/
{noformat}
Also, when I run this test on my laptop, I see the following line being printed.
{noformat}
2017-01-27 17:09:27,982 INFO [LocalJobRunner Map Task Executor #0] hdfs.NNBench (NNBench.java:barrier(676)) - Waiting in barrier for: 1018 ms
{noformat}
This line is printed only in the {{barrier()}} method, and I don't see it in the logs of the failed test. In our environment the Jenkins server was very slow, and it took more than 6 seconds to launch a map task. The correct fix, in my opinion, is for the {{barrier()}} method to return true even when there is no need to sleep; it should return false only when an exception occurs.
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org
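The proposed fix can be sketched as follows. This is a minimal, standalone model of the barrier logic, not NNBench's actual code; the method takes the start time and current time as explicit parameters for illustration, whereas the real {{barrier()}} reads them from fields.

```java
// Hypothetical sketch of the proposed barrier() fix: proceed (return true)
// whether or not the task started before startTime, and return false only
// if the wait is interrupted.
public class BarrierSketch {

    static boolean barrier(long startTimeMs, long nowMs) {
        long sleepTime = startTimeMs - nowMs;
        boolean retVal = true;            // proposed: default to true
        if (sleepTime > 0) {
            try {
                Thread.sleep(sleepTime);  // wait until the agreed start time
            } catch (InterruptedException e) {
                retVal = false;           // false only on exception
            }
        }
        // A late map task (sleepTime <= 0) no longer returns false here,
        // so doCreateWriteOp() still runs and the test file gets created.
        return retVal;
    }

    public static void main(String[] args) {
        // A map task launched 4 seconds after startTime still proceeds.
        System.out.println(barrier(1000L, 5000L)); // prints "true"
    }
}
```

With this change, a slow Jenkins node that launches the map task after {{startTime}} would merely skip the sleep instead of silently skipping the create/write operation.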
[jira] [Created] (MAPREDUCE-6996) FileInputFormat#getBlockIndex should include file name in the exception.
Rushabh S Shah created MAPREDUCE-6996:
Summary: FileInputFormat#getBlockIndex should include file name in the exception.
Key: MAPREDUCE-6996
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6996
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Rushabh S Shah
Priority: Minor

{code:title=FileInputFormat.java|borderStyle=solid}
// Some comments here
protected int getBlockIndex(BlockLocation[] blkLocations, long offset) {
  ...
  ...
  BlockLocation last = blkLocations[blkLocations.length - 1];
  long fileLength = last.getOffset() + last.getLength() - 1;
  throw new IllegalArgumentException("Offset " + offset +
      " is outside of file (0.." + fileLength + ")");
}
{code}
When the file is open for writing, {{last.getLength()}} and {{last.getOffset()}} will be zero, and we see the following exception stack trace.
{noformat}
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:288)
Caused by: java.lang.IllegalArgumentException: Offset 0 is outside of file (0..-1)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getBlockIndex(FileInputFormat.java:453)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:413)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:265)
... 18 more
{noformat}
It's difficult to debug which file was open, so I am creating this ticket to include the file name in the exception. Since {{FileInputFormat#getBlockIndex}} is protected, we can't change its signature to add the file name to its arguments. The only fix I can think of is:
{code:title=FileInputFormat.java|borderStyle=solid}
public InputSplit[] getSplits(JobConf job, int numSplits) throws IOException {
  ...
  ...
  for (FileStatus file: files) {
    Path path = file.getPath();
    long length = file.getLen();
    if (length != 0) {
      FileSystem fs = path.getFileSystem(job);
      BlockLocation[] blkLocations;
      if (file instanceof LocatedFileStatus) {
        blkLocations = ((LocatedFileStatus) file).getBlockLocations();
      } else {
        blkLocations = fs.getFileBlockLocations(file, 0, length);
      }
      if (isSplitable(fs, path)) {
        long blockSize = file.getBlockSize();
        long splitSize = computeSplitSize(goalSize, minSize, blockSize);

        long bytesRemaining = length;
        while (((double) bytesRemaining) / splitSize > SPLIT_SLOP) {
          String[][] splitHosts = getSplitHostsAndCachedHosts(blkLocations,
              length - bytesRemaining, splitSize, clusterMap);
          splits.add(makeSplit(path, length - bytesRemaining, splitSize,
              splitHosts[0], splitHosts[1]));
          bytesRemaining -= splitSize;
        }

        if (bytesRemaining != 0) {
          String[][] splitHosts = getSplitHostsAndCachedHosts(blkLocations,
              length - bytesRemaining, bytesRemaining, clusterMap);
          splits.add(makeSplit(path, length - bytesRemaining, bytesRemaining,
              splitHosts[0], splitHosts[1]));
        }
      } else {
        String[][] splitHosts = getSplitHostsAndCachedHosts(blkLocations, 0, length, clusterMap);
        splits.add(makeSplit(path, 0, length, splitHosts[0], splitHosts[1]));
      }
    } else {
      // Create empty hosts array for zero length files
      splits.add(makeSplit(path, 0, length, new String[0]));
    }
  }
{code}
Wrap the above chunk in a try-catch block, catch {{IllegalArgumentException}}, and check whether the message is {{Offset 0 is outside of file (0..-1)}}. If it is, add the file name and rethrow the {{IllegalArgumentException}}.
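The proposed catch-and-rethrow can be sketched as below. This is a simplified, hypothetical stand-in, not Hadoop's actual {{FileInputFormat}} code: the inner method only reproduces the failure path for a file whose last block reports offset 0 and length 0 (i.e. a file still open for writing), and the method names are illustrative.

```java
// Hypothetical sketch: catch the IllegalArgumentException raised while
// computing splits and rethrow it with the file's path appended, so the
// offending (still-open) file can be identified from the stack trace.
public class SplitErrorSketch {

    // Stand-in for getBlockIndex's failure path when the last block has
    // offset 0 and length 0, which yields the confusing "(0..-1)" message.
    static int getBlockIndex(long lastOffset, long lastLength, long offset) {
        long fileLength = lastOffset + lastLength - 1;
        throw new IllegalArgumentException(
            "Offset " + offset + " is outside of file (0.." + fileLength + ")");
    }

    // Proposed getSplits()-side wrapper: rethrow with the path included.
    static int getBlockIndexForFile(String path, long offset) {
        try {
            return getBlockIndex(0L, 0L, offset);
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException(
                e.getMessage() + " for file " + path, e);
        }
    }

    public static void main(String[] args) {
        try {
            getBlockIndexForFile("/user/foo/still-open-file", 0L);
        } catch (IllegalArgumentException e) {
            // prints: Offset 0 is outside of file (0..-1) for file /user/foo/still-open-file
            System.out.println(e.getMessage());
        }
    }
}
```

Passing the original exception as the cause preserves the full stack trace while the augmented message tells the operator which file to look at.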
[jira] [Resolved] (MAPREDUCE-6938) Question
[ https://issues.apache.org/jira/browse/MAPREDUCE-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rushabh S Shah resolved MAPREDUCE-6938.
Resolution: Invalid

[~remil] This JIRA board is a bug/improvement/feature tracking system, not a forum for general questions or requests for sample programs. Please send an email to {{gene...@hadoop.apache.org}} or {{u...@hadoop.apache.org}}, and hopefully someone will reply. Thanks, Rushabh Shah.

> Question
>
> Key: MAPREDUCE-6938
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6938
> Project: Hadoop Map/Reduce
> Issue Type: Task
> Reporter: Remil
> Priority: Minor
>
> I need 2 helps.
> 1) need a Java map reducer sample program where multiple parameters
> are passed from mapper to reducer.
> 2) need a Java map reducer program where there is a write to a file inside
> hdfs filesystem as well as a read from a file inside hdfs other than
> the normal input file and output file mentioned in the mapper and reducer.
[jira] [Created] (MAPREDUCE-6633) AM should retry map attempts if the reduce task encounters compression related errors.
Rushabh S Shah created MAPREDUCE-6633:
Summary: AM should retry map attempts if the reduce task encounters compression related errors.
Key: MAPREDUCE-6633
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6633
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 2.7.2
Reporter: Rushabh S Shah
Assignee: Rushabh S Shah

When a reduce task encounters compression related errors, the AM doesn't retry the corresponding map task. In one of the cases we encountered, here is the stack trace.
{noformat}
2016-01-27 13:44:28,915 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#29
at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ArrayIndexOutOfBoundsException
at com.hadoop.compression.lzo.LzoDecompressor.setInput(LzoDecompressor.java:196)
at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:104)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.shuffle(InMemoryMapOutput.java:97)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:537)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:336)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)
{noformat}
In this case, the node on which the map task ran had a bad drive.
If the AM had retried running that map task somewhere else, the job definitely would have succeeded.
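The desired behavior can be sketched as follows. This is a simplified, hypothetical model of AM-side bookkeeping, not Hadoop's actual ShuffleScheduler or TaskAttempt code, and the threshold is illustrative: when a reducer treats a decompression failure on a fetched map output as a fetch failure and reports it, the AM counts failures per map attempt and, past a threshold, reschedules the map on another node.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: the AM tallies fetch failures reported by reducers
// and flags a map attempt for re-execution once the count crosses a
// threshold, instead of letting the reduce task die on a corrupt output.
public class FetchFailureModel {
    static final int MAX_FETCH_FAILURES = 3; // illustrative threshold

    private final Map<String, Integer> failures = new HashMap<>();

    // Called when a reducer reports it could not read a map output
    // (e.g. the LZO decompressor threw). Returns true when the map
    // attempt should be re-run on a different node.
    boolean reportFetchFailure(String mapAttemptId) {
        int count = failures.merge(mapAttemptId, 1, Integer::sum);
        return count >= MAX_FETCH_FAILURES;
    }
}
```

Under this model, a map output corrupted by a bad drive would be re-produced elsewhere after a few failed fetches, rather than repeatedly killing the reduce task.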
[jira] [Created] (MAPREDUCE-5797) The elapsed time for tasks in a failed job that were never started can be way off.
Rushabh S Shah created MAPREDUCE-5797:
Summary: The elapsed time for tasks in a failed job that were never started can be way off.
Key: MAPREDUCE-5797
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5797
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: jobhistoryserver, webapps
Affects Versions: 0.23.9
Reporter: Rushabh S Shah
Assignee: Rushabh S Shah

The elapsed time for tasks in a failed job that were never started can be way off. It looks like we're marking the start time as the beginning of the epoch (i.e., start time = -1), but the finish time is when the task was marked as failed as the whole job failed. That causes the calculated elapsed time of the task to be a ridiculous number of hours. Tasks that fail without any attempts shouldn't have start/finish/elapsed times.
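The needed guard can be sketched as below. This is an illustrative standalone sketch, not the actual JobHistoryServer webapp code; the sentinel constant and method names are assumptions.

```java
// Hypothetical sketch: a task that never started keeps the sentinel start
// time of -1, so computing finishTime - startTime yields an absurd elapsed
// time. Guard for the sentinel and report "N/A" instead.
public class ElapsedTimeSketch {
    static final long NOT_STARTED = -1L; // assumed sentinel for unstarted tasks

    static String elapsed(long startTime, long finishTime) {
        if (startTime == NOT_STARTED || finishTime == NOT_STARTED) {
            return "N/A"; // task failed without any attempts
        }
        return (finishTime - startTime) + " ms";
    }
}
```

Without the guard, a task failed at, say, t = 1391000000000 ms with startTime = -1 would show roughly 44 years of elapsed time on the history page.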
[jira] [Created] (MAPREDUCE-5789) Average Reduce time is incorrect on Job Overview page
Rushabh S Shah created MAPREDUCE-5789:
Summary: Average Reduce time is incorrect on Job Overview page
Key: MAPREDUCE-5789
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5789
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: jobhistoryserver, webapps
Affects Versions: 2.3.0, 0.23.10
Reporter: Rushabh S Shah
Assignee: Rushabh S Shah

The Average Reduce time displayed on the job overview page is incorrect. The reduce time is currently calculated as the difference between finishTime and shuffleFinishTime; it should be the difference between finishTime and sortFinishTime.
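The correction can be expressed as a small sketch. This is an illustrative calculation under assumed field names, not the actual job overview page code: using sortFinishTime as the start of the reduce phase excludes the sort/merge phase that shuffleFinishTime would wrongly include.

```java
// Hypothetical sketch: average reduce time across attempts should measure
// only the reduce phase (finishTime - sortFinishTime), not the reduce plus
// sort phases (finishTime - shuffleFinishTime).
public class AvgReduceTimeSketch {
    static long avgReduceTime(long[] finishTimes, long[] sortFinishTimes) {
        long total = 0;
        for (int i = 0; i < finishTimes.length; i++) {
            total += finishTimes[i] - sortFinishTimes[i]; // reduce phase only
        }
        return total / finishTimes.length;
    }
}
```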