[jira] [Created] (MAPREDUCE-7076) TestNNBench#testNNBenchCreateReadAndDelete failing in our internal build

2018-04-10 Thread Rushabh S Shah (JIRA)
Rushabh S Shah created MAPREDUCE-7076:
-

 Summary: TestNNBench#testNNBenchCreateReadAndDelete failing in our 
internal build
 Key: MAPREDUCE-7076
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7076
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: 2.8.0
Reporter: Rushabh S Shah


TestNNBench#testNNBenchCreateReadAndDelete failed a couple of times in our 
internal Jenkins build.
{noformat}
java.lang.AssertionError: create_write should create the file
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at 
org.apache.hadoop.hdfs.TestNNBench.testNNBenchCreateReadAndDelete(TestNNBench.java:55)
{noformat}

Below is my analysis of why it didn't create the file.
{code:title=NNBench.java|borderStyle=solid}
// Some comments here
  public void map(Text key,
                  LongWritable value,
                  OutputCollector<Text, Text> output,
                  Reporter reporter) throws IOException {
    if (barrier()) {
      String fileName = "file_" + value;
      if (op.equals(OP_CREATE_WRITE)) {
        startTimeTPmS = System.currentTimeMillis();
        doCreateWriteOp(fileName, reporter);
      } ...
    } else {
      output.collect(new Text("l:latemaps"), new Text("1"));
    }
  }

  // Below are the relevant parts of the barrier() method
  private boolean barrier() {
    ...
    // If the sleep time is greater than 0, then sleep and return
    ...
    LOG.info("Waiting in barrier for: " + sleepTime + " ms");
    return retVal;
  }

  // Below are the relevant parts of doCreateWriteOp
  private void doCreateWriteOp(String name,
                               Reporter reporter) {
    FSDataOutputStream out;
    byte[] buffer = new byte[bytesToWrite];
    for (long l = 0l; l < numberOfFiles; l++) {
      Path filePath = new Path(new Path(baseDir, dataDirName),
          name + "_" + l);
      ...
    }
  }
{code}

The file {{BASE_DIR/data/file_0_0}} is created only if the map task starts 
before the time given by {{startTime}}.
Refer to the chunk pasted above:
{{map(..)}} calls {{barrier()}}, and *only if* {{barrier()}} returns true does 
it call {{doCreateWriteOp}}, which eventually creates the file.
In the test case, the delay value is 3 seconds, per {{"-startTime", "" + 
(Time.now() / 1000 + 3)}}.
In this failing test case, I can see the task starting at least 6 seconds after 
the test case started.
{noformat}
2017-01-27 03:11:15,387 INFO  [Thread-4] mapreduce.JobSubmitter 
(JobSubmitter.java:printTokens(289)) - Submitting tokens for job: 
job_local1711545156_0001
2017-01-27 03:11:23,405 INFO  [Thread-4] mapreduce.Job (Job.java:submit(1345)) 
- The url to track the job: http://localhost:8080/
{noformat}

Also, when I run this test on my laptop, I see the following line printed.
{noformat}
2017-01-27 17:09:27,982 INFO  [LocalJobRunner Map Task Executor #0] 
hdfs.NNBench (NNBench.java:barrier(676)) - Waiting in barrier for: 1018 ms
{noformat}
This line is printed only in the {{barrier()}} method, and I don't see it in 
the logs of the failed test.
In our environment, the Jenkins server was very slow and it took more than 6 
seconds to launch a map task.
The correct fix, in my opinion, is for {{barrier()}} to return true when no 
sleep is needed (the start time has already passed); it should return false 
only when the sleep fails with an exception.
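A minimal sketch of that fix, assuming the elided parts of {{barrier()}} 
compute {{sleepTime}} from a configured {{startTime}} field as suggested by the 
log line above:
{code:title=NNBench.java|borderStyle=solid}
  private boolean barrier() {
    long sleepTime = startTime - System.currentTimeMillis();
    boolean retVal = true;
    // If the sleep time is greater than 0, then sleep and return
    if (sleepTime > 0) {
      LOG.info("Waiting in barrier for: " + sleepTime + " ms");
      try {
        Thread.sleep(sleepTime);
      } catch (InterruptedException e) {
        // Only an interrupted sleep marks the barrier as failed; a map
        // task launched after startTime (sleepTime <= 0) now proceeds
        // with its operation instead of being counted as a late map.
        retVal = false;
      }
    }
    return retVal;
  }
{code}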






[jira] [Created] (MAPREDUCE-6996) FileInputFormat#getBlockIndex should include file name in the exception.

2017-11-01 Thread Rushabh S Shah (JIRA)
Rushabh S Shah created MAPREDUCE-6996:
-

 Summary: FileInputFormat#getBlockIndex should include file name in 
the exception.
 Key: MAPREDUCE-6996
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6996
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Rushabh S Shah
Priority: Minor


{code:title=FileInputFormat.java|borderStyle=solid}
// Some comments here
  protected int getBlockIndex(BlockLocation[] blkLocations,
                              long offset) {
    ...
    BlockLocation last = blkLocations[blkLocations.length - 1];
    long fileLength = last.getOffset() + last.getLength() - 1;
    throw new IllegalArgumentException("Offset " + offset +
        " is outside of file (0.." +
        fileLength + ")");
  }
{code}
When the file is open for writing, {{last.getLength()}} and 
{{last.getOffset()}} will be zero, so {{fileLength}} evaluates to 
{{0 + 0 - 1 = -1}} and we see the following exception stack trace.
{noformat}
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:288)
Caused by: java.lang.IllegalArgumentException: Offset 0 is outside of file 
(0..-1)
at 
org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getBlockIndex(FileInputFormat.java:453)
at 
org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:413)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:265)
... 18 more
{noformat}
It's difficult to tell which file was open, so I am creating this ticket to 
include the file name in the exception.
Since {{FileInputFormat#getBlockIndex}} is protected, we can't change that 
method's signature to add the file name to its arguments.
The only fix I can think of is:
{code:title=FileInputFormat.java|borderStyle=solid}
  public InputSplit[] getSplits(JobConf job, int numSplits)
      throws IOException {
    ...
    for (FileStatus file: files) {
      Path path = file.getPath();
      long length = file.getLen();
      if (length != 0) {
        FileSystem fs = path.getFileSystem(job);
        BlockLocation[] blkLocations;
        if (file instanceof LocatedFileStatus) {
          blkLocations = ((LocatedFileStatus) file).getBlockLocations();
        } else {
          blkLocations = fs.getFileBlockLocations(file, 0, length);
        }
        if (isSplitable(fs, path)) {
          long blockSize = file.getBlockSize();
          long splitSize = computeSplitSize(goalSize, minSize, blockSize);

          long bytesRemaining = length;
          while (((double) bytesRemaining)/splitSize > SPLIT_SLOP) {
            String[][] splitHosts = getSplitHostsAndCachedHosts(blkLocations,
                length - bytesRemaining, splitSize, clusterMap);
            splits.add(makeSplit(path, length - bytesRemaining, splitSize,
                splitHosts[0], splitHosts[1]));
            bytesRemaining -= splitSize;
          }

          if (bytesRemaining != 0) {
            String[][] splitHosts = getSplitHostsAndCachedHosts(blkLocations,
                length - bytesRemaining, bytesRemaining, clusterMap);
            splits.add(makeSplit(path, length - bytesRemaining, bytesRemaining,
                splitHosts[0], splitHosts[1]));
          }
        } else {
          String[][] splitHosts = getSplitHostsAndCachedHosts(blkLocations,
              0, length, clusterMap);
          splits.add(makeSplit(path, 0, length, splitHosts[0], splitHosts[1]));
        }
      } else {
        // Create empty hosts array for zero length files
        splits.add(makeSplit(path, 0, length, new String[0]));
      }
    }
{code}
Wrap the above chunk in a try-catch block, catch {{IllegalArgumentException}}, 
and check for the message {{Offset 0 is outside of file (0..-1)}}.
If it matches, add the file name and rethrow the {{IllegalArgumentException}}.
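A hedged sketch of that wrapper (the message check and the rethrown text are 
illustrative, not a committed patch):
{code:title=FileInputFormat.java|borderStyle=solid}
    for (FileStatus file : files) {
      Path path = file.getPath();
      try {
        // ... the existing split computation for this file, as above ...
      } catch (IllegalArgumentException e) {
        if ("Offset 0 is outside of file (0..-1)".equals(e.getMessage())) {
          // getBlockIndex cannot name the file, so add the path here
          // before rethrowing; this identifies the file that was still
          // open for writing.
          throw new IllegalArgumentException(
              e.getMessage() + " for file " + path, e);
        }
        throw e;
      }
    }
{code}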






[jira] [Resolved] (MAPREDUCE-6938) Question

2017-08-15 Thread Rushabh S Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rushabh S Shah resolved MAPREDUCE-6938.
---
Resolution: Invalid

[~remil] This JIRA board is a bug/improvement/feature tracking system, not a 
place for asking general questions.
Please send an email to {{gene...@hadoop.apache.org}} or 
{{u...@hadoop.apache.org}} and hopefully someone will reply.

Thanks,
Rushabh Shah.


> Question
> 
>
> Key: MAPREDUCE-6938
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6938
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>Reporter: Remil
>Priority: Minor
>
> I need help with two things:
> 1) a sample Java MapReduce program where multiple parameters 
> are passed from the mapper to the reducer; and
> 2) a Java MapReduce program that writes to and reads from files inside 
> the HDFS filesystem other than the normal input and output files 
> specified for the mapper and reducer.






[jira] [Created] (MAPREDUCE-6633) AM should retry map attempts if the reduce task encounters compression-related errors.

2016-02-10 Thread Rushabh S Shah (JIRA)
Rushabh S Shah created MAPREDUCE-6633:
-

 Summary: AM should retry map attempts if the reduce task 
encounters compression-related errors.
 Key: MAPREDUCE-6633
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6633
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.7.2
Reporter: Rushabh S Shah
Assignee: Rushabh S Shah


When a reduce task encounters compression-related errors, the AM doesn't retry 
the corresponding map task.
Here is the stack trace from one case we encountered.
{noformat}
2016-01-27 13:44:28,915 WARN [main] org.apache.hadoop.mapred.YarnChild: 
Exception running child : 
org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle 
in fetcher#29
at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ArrayIndexOutOfBoundsException
at 
com.hadoop.compression.lzo.LzoDecompressor.setInput(LzoDecompressor.java:196)
at 
org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:104)
at 
org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85)
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
at 
org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.shuffle(InMemoryMapOutput.java:97)
at 
org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:537)
at 
org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:336)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)
{noformat}
In this case, the node on which the map task ran had a bad drive.
If the AM had retried that map task somewhere else, the job definitely 
would have succeeded.
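One possible direction is for the reduce-side {{Fetcher}} to treat a 
decompression failure like any other failed fetch instead of letting it kill 
the reduce task. A hedged sketch (the {{copyMapOutput}} context and the 
{{scheduler.copyFailed}} call mirror the existing fetch-failure path; catching 
{{ArrayIndexOutOfBoundsException}} here is the assumption, not a committed fix):
{code:title=Fetcher.java|borderStyle=solid}
    try {
      // Decompression of the map output happens inside shuffle(); a
      // corrupt map output (e.g. one written through a bad drive) can
      // surface as a runtime error from the decompressor rather than
      // an IOException.
      mapOutput.shuffle(host, input, compressedLength, decompressedLength,
          metrics, reporter);
    } catch (ArrayIndexOutOfBoundsException e) {
      LOG.warn("Corrupt map output from " + mapId
          + ", reporting fetch failure", e);
      // Report a failed fetch; after enough failures the AM reruns the
      // map attempt on another node instead of failing the reducer.
      scheduler.copyFailed(mapId, host, true, false);
      return new TaskAttemptID[] {mapId};
    }
{code}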





[jira] [Created] (MAPREDUCE-5797) The elapsed time for tasks in a failed job that were never started can be way off.

2014-03-14 Thread Rushabh S Shah (JIRA)
Rushabh S Shah created MAPREDUCE-5797:
-

 Summary: The elapsed time for tasks in a failed job that were 
never started can be way off. 
 Key: MAPREDUCE-5797
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5797
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver, webapps
Affects Versions: 0.23.9
Reporter: Rushabh S Shah
Assignee: Rushabh S Shah


The elapsed time for tasks in a failed job that were never started can be way 
off. It looks like we're marking the start time as the beginning of the epoch 
(i.e., start time = -1), but the finish time is when the task was marked as 
failed because the whole job failed. That causes the calculated elapsed time of 
the task to be a ridiculous number of hours.

Tasks that fail without any attempts shouldn't have start/finish/elapsed times.
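A minimal sketch of the kind of guard the display code needs (a hypothetical 
helper; the real rendering lives in the jobhistoryserver/webapps components):
{code}
  // Elapsed time is meaningless for a task that never started; don't
  // compute "finish time minus beginning of the epoch".
  static long elapsedTime(long startTime, long finishTime) {
    if (startTime <= 0) {
      return -1; // never started: render as N/A, not thousands of hours
    }
    return finishTime - startTime;
  }
{code}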





[jira] [Created] (MAPREDUCE-5789) Average Reduce time is incorrect on Job Overview page

2014-03-10 Thread Rushabh S Shah (JIRA)
Rushabh S Shah created MAPREDUCE-5789:
-

 Summary: Average Reduce time is incorrect on Job Overview page
 Key: MAPREDUCE-5789
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5789
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver, webapps
Affects Versions: 2.3.0, 0.23.10
Reporter: Rushabh S Shah
Assignee: Rushabh S Shah


The Average Reduce time displayed on the job overview page is incorrect.
Previously, reduce time was calculated as the difference between 
{{finishTime}} and {{shuffleFinishTime}}.
It should be the difference between {{finishTime}} and {{sortFinishTime}}.
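A minimal sketch of the corrected computation (the attempt type and accessor 
names are illustrative):
{code}
    // The reduce phase proper begins when the sort finishes, not when
    // the shuffle finishes, so measure from sortFinishTime.
    long totalReduceTime = 0;
    for (TaskAttempt attempt : successfulReduceAttempts) {
      totalReduceTime += attempt.getFinishTime() - attempt.getSortFinishTime();
    }
    long avgReduceTime = successfulReduceAttempts.isEmpty()
        ? 0 : totalReduceTime / successfulReduceAttempts.size();
{code}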


