[ https://issues.apache.org/jira/browse/MAPREDUCE-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Iyappan Srinivasan updated MAPREDUCE-1676:
------------------------------------------
    Attachment: TEST-org.apache.hadoop.mapred.TestDistributedCacheModifiedFile.txt
                TestDistributedCacheModifiedFile.patch

I have used an ArrayList instead of a String array.

The reason for keeping separate test cases rather than combining them is that the code has as many differences as commonalities. TestDistributedCacheModifiedFile.java and TestDistributedCacheUnModifiedFile.java differ in how they collect the tasktrackers where tasks ran. The unmodified test does not need to collect the cleanup task's tasktracker, whereas the modified test must collect that tasktracker as well. Another difference is that the modified test has to modify the DFS file between the two jobs, which is not necessary for the unmodified test. The way the assertions are checked is also different.

If the two tests were combined, the result would need many more if conditions, which would make both pieces of functionality harder to understand and maintain. The very generic functionality used by both has been put in TestUtils.java. So I suggest we keep them separate.

> Create test scenario for "distributed cache file behaviour, when dfs file is modified"
> --------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1676
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1676
>             Project: Hadoop Map/Reduce
>          Issue Type: Test
>    Affects Versions: 0.22.0
>            Reporter: Iyappan Srinivasan
>            Assignee: Iyappan Srinivasan
>         Attachments: TEST-org.apache.hadoop.mapred.TestDistributedCacheModifiedFile.txt,
>                      TEST-org.apache.hadoop.mapred.TestDistributedCacheModifiedFile.txt,
>                      TestDistributedCacheModifiedFile.patch,
>                      TestDistributedCacheModifiedFile.patch,
>                      TestDistributedCacheModifiedFile.patch
>
>
> Verify the Distributed Cache functionality.
> This test scenario covers the behaviour of a distributed cache file that is
> modified before and after being accessed by at most two jobs. Once a job uses
> a distributed cache file, that file is stored in mapred.local.dir. If the next
> job uses the same file, but with a different timestamp, then that file is
> stored again. So, if two jobs choose the same tasktracker for their execution,
> the distributed cache file should be found twice.
> This test case runs a job with a distributed cache file. A handle to the
> tasktracker of each task is obtained and checked for the presence of the
> distributed cache file, with proper permissions, in the proper directory.
> When the job runs again, if any of its tasks lands on a tasktracker that ran
> one of the tasks of the previous job, then the file should be uploaded again
> and the task should not use the old file.
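
The expected behaviour described above can be sketched as a small model. This is not Hadoop code and not the code in the attached patch: the class `LocalCacheSketch`, its methods, and the file path are hypothetical names made up for illustration, and the sketch simply assumes that a tasktracker identifies a localized copy by the DFS path together with the file's modification timestamp.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of one tasktracker's local cache directory; not Hadoop code.
final class LocalCacheSketch {
    // Assumption: a localized copy is identified by DFS path plus modification time.
    private final Set<String> localizedCopies = new HashSet<>();

    /** Stores a copy if this (path, timestamp) pair is new; returns true when a new copy was stored. */
    boolean localize(String dfsPath, long modificationTime) {
        return localizedCopies.add(dfsPath + "@" + modificationTime);
    }

    /** Counts the local copies held for one DFS path, across all timestamps. */
    int copiesOf(String dfsPath) {
        int count = 0;
        for (String key : localizedCopies) {
            if (key.startsWith(dfsPath + "@")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        LocalCacheSketch tracker = new LocalCacheSketch();
        String file = "/user/test/cachefile.txt"; // hypothetical DFS path

        tracker.localize(file, 1000L);                 // first job localizes the file
        tracker.localize(file, 1000L);                 // same timestamp: existing copy is reused
        System.out.println(tracker.copiesOf(file));    // prints 1

        tracker.localize(file, 2000L);                 // file modified on DFS between jobs
        System.out.println(tracker.copiesOf(file));    // prints 2
    }
}
```

In this model the unmodified case corresponds to the repeated call with the same timestamp (one copy), and the modified case corresponds to the call with a new timestamp (two copies on the same tasktracker).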