[jira] Created: (MAPREDUCE-1975) gridmix shows unnecessary InterruptedException
gridmix shows unnecessary InterruptedException -- Key: MAPREDUCE-1975 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1975 Project: Hadoop Map/Reduce Issue Type: Bug Components: contrib/gridmix Reporter: Ravi Gummadi Assignee: Ravi Gummadi Fix For: 0.22.0 The following InterruptedException is seen when gridmix is run, even though the run completes successfully: 10/06/24 20:43:03 INFO gridmix.ReplayJobFactory: START REPLAY @ 11331037109 10/06/24 20:43:03 ERROR gridmix.Statistics: Statistics interrupt while waiting for polling null java.lang.InterruptedException at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2066) at org.apache.hadoop.mapred.gridmix.Statistics$StatCollector.run(Statistics.java:190) 10/06/24 20:43:03 INFO gridmix.Gridmix: Exiting... -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1686) ClassNotFoundException for custom format classes provided in libjars
[ https://issues.apache.org/jira/browse/MAPREDUCE-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amareshwari Sriramadasu updated MAPREDUCE-1686: --- Status: Open (was: Patch Available) Thanks Paul for the patch. Some comments on the patch: * Please remove printStackTrace() calls in catch blocks in StreamUtil. Since StreamUtil.goodClassOrNull is used to find whether the passed mapper/reducer value is class or command, we don't want to print the stacktrace. Also, see MAPREDUCE-571. * The testcase does not pass even after the fix because the path given for the jar is never built. For example, see the testjar directory in src/test/mapred/testjar and how it is built. * In the testcase, loadLibJar() and assert associated with it, seems unnecessary. > ClassNotFoundException for custom format classes provided in libjars > > > Key: MAPREDUCE-1686 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1686 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 0.20.1 >Reporter: Paul Burkhardt >Priority: Minor > Attachments: HADOOP-1686.patch > > > The StreamUtil::goodClassOrNull method assumes user-provided classes have > package names and if not, they are part of the Hadoop Streaming package. For > example, using custom InputFormat or OutputFormat classes without package > names will fail with a ClassNotFound exception which is not indicative given > the classes are provided in the libjars option. Admittedly, most Java > packages should have a package name so this should rarely come up. 
> Possible resolution options: > 1) modify the error message to include the actual classname that was > attempted in the goodClassOrNull method > 2) call the Configuration::getClassByName method first and if class not found > check for default package name and try the call again > {code} > public static Class goodClassOrNull(Configuration conf, String className, > String defaultPackage) { > Class clazz = null; > try { > clazz = conf.getClassByName(className); > } catch (ClassNotFoundException cnf) { > } > if (clazz == null) { > if (className.indexOf('.') == -1 && defaultPackage != null) { > className = defaultPackage + "." + className; > try { > clazz = conf.getClassByName(className); > } catch (ClassNotFoundException cnf) { > } > } > } > return clazz; > } > {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
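Resolution option 2 above can be sketched outside Hadoop. In this minimal sketch, `Class.forName` stands in for `Configuration::getClassByName`, and the default-package fallback only fires when the plain lookup fails; the class `ClassResolver` and its sample arguments are illustrative, not part of the Streaming code.

```java
// Sketch of the lookup order proposed in option 2: try the class name as
// given first, and only fall back to the default package when the plain
// name is unqualified and cannot be loaded. Class.forName stands in for
// Configuration.getClassByName here.
public class ClassResolver {
    public static Class<?> goodClassOrNull(String className, String defaultPackage) {
        Class<?> clazz = tryLoad(className);
        if (clazz == null && className.indexOf('.') == -1 && defaultPackage != null) {
            // Unqualified name not found: retry with the default package prefix.
            clazz = tryLoad(defaultPackage + "." + className);
        }
        return clazz;
    }

    private static Class<?> tryLoad(String name) {
        try {
            return Class.forName(name);
        } catch (ClassNotFoundException cnf) {
            return null; // callers treat null as "not a class name"
        }
    }

    public static void main(String[] args) {
        // Plain "ArrayList" fails, so the default package "java.util" is tried.
        System.out.println(goodClassOrNull("ArrayList", "java.util"));
        // A fully qualified name is resolved directly, no fallback needed.
        System.out.println(goodClassOrNull("java.lang.String", null));
        System.out.println(goodClassOrNull("NoSuchClass", null));
    }
}
```

With this ordering, a custom class shipped in libjars resolves on the first attempt, and the fallback is only a convenience for the built-in Streaming classes.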
[jira] Commented: (MAPREDUCE-1968) Deprecate GridMix v1
[ https://issues.apache.org/jira/browse/MAPREDUCE-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893054#action_12893054 ] Ranjit Mathew commented on MAPREDUCE-1968: -- Yes, that's the ultimate goal. However, GridMix v3 doesn't yet cover CPU/memory-load simulation and does not come out-of-the-box with benchmarking work-loads that can mimic those from GridMix v2. It doesn't cover compression and pipes either. > Deprecate GridMix v1 > > > Key: MAPREDUCE-1968 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1968 > Project: Hadoop Map/Reduce > Issue Type: Task > Components: contrib/gridmix >Reporter: Ranjit Mathew > > GridMix v2 in "src/benchmarks/gridmix2" obsoletes GridMix v1 in > "src/benchmarks/gridmix". > The latter should be deprecated and then removed to reduce the clutter in the > source-tree. > One way of doing this is shown by the "hadoop" script from 0.20.xx that has > been deprecated > in favour of "mapred", for example. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1882) Use Jsch instead of Shell.java
[ https://issues.apache.org/jira/browse/MAPREDUCE-1882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893048#action_12893048 ] Konstantin Boudnik commented on MAPREDUCE-1882: --- And for the trunk? > Use Jsch instead of Shell.java > --- > > Key: MAPREDUCE-1882 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1882 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: test > Environment: herriot framework >Reporter: Balaji Rajagopalan >Assignee: Iyappan Srinivasan > Attachments: 1882-ydist-security-patch.txt, RemoteExecution.patch > > Original Estimate: 24h > Remaining Estimate: 24h > > In herriot (Hadoop system test case development) we often find ourselves resorting to the habit of ssh-ing to a remote node, executing a shell command, and coming back. It would be wiser to use Jsch instead of doing this through Shell.java (Hadoop code), since Jsch provides a nice Java abstraction. This JIRA will only be closed after we import Jsch into the Hadoop build system and also fix all the existing test cases. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1890) Create automated test scenarios for decommissioning of task trackers
[ https://issues.apache.org/jira/browse/MAPREDUCE-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893049#action_12893049 ] Konstantin Boudnik commented on MAPREDUCE-1890: --- trunk patch, please > Create automated test scenarios for decommissioning of task trackers > > > Key: MAPREDUCE-1890 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1890 > Project: Hadoop Map/Reduce > Issue Type: Test > Components: test >Reporter: Iyappan Srinivasan > Attachments: 1890-ydist-security-patch.txt, TestDecomissioning.patch > > > Test scenarios : > 1) Put a healthy slave task tracker in the dfs.exclude file. > 2) As a valid user, decommission a node in the cluster by issuing the > command "hadoop mradmin -refreshNodes" > 3) Make sure that the node is decommissioned. > 4) Now take the task tracker out of the file. > 5) As a valid user, again issue the command "hadoop mradmin -refreshNodes" > 6) Make sure that the node is not in the decommission list. > 7) Bring back that node. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1963) [Herriot] TaskMemoryManager should log process-tree's status while killing tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12893047#action_12893047 ] Konstantin Boudnik commented on MAPREDUCE-1963: --- Please provide a patch for the trunk as well. > [Herriot] TaskMemoryManager should log process-tree's status while killing > tasks > > > Key: MAPREDUCE-1963 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1963 > Project: Hadoop Map/Reduce > Issue Type: Task > Components: test >Reporter: Vinay Kumar Thota >Assignee: Vinay Kumar Thota > Attachments: 1963-ydist-security.patch > > > 1. Execute a streaming job which will increase memory usage beyond configured > memory limits during the map phase. TaskMemoryManager should log the map > task's process-tree's status just before killing the task. > 2. Execute a streaming job which will increase memory usage beyond configured > memory limits during the reduce phase. TaskMemoryManager should log the > reduce task's process-tree's status just before killing the task. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1958) using delegation token over hftp for long running clients (part of hdfs 1296)
[ https://issues.apache.org/jira/browse/MAPREDUCE-1958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boris Shkolnik updated MAPREDUCE-1958: -- Attachment: MAPREDUCE-1958-1.patch ran tests. all passed. > using delegation token over hftp for long running clients (part of hdfs 1296) > - > > Key: MAPREDUCE-1958 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1958 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Boris Shkolnik >Assignee: Boris Shkolnik > Attachments: MAPREDUCE-1958-1.patch > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1974) FairScheduler can preempt the same task many times
FairScheduler can preempt the same task many times -- Key: MAPREDUCE-1974 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1974 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Scott Chen Assignee: Scott Chen In FairScheduler.preemptTasks(), tasks are collected from JobInProgress.runningMapCache. But tasks repeat multiple times in JobInProgress.runningMapCache (on rack, node and cluster). This makes FairScheduler preempt the same task many times. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
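A minimal illustration of the fix direction described above: collecting candidates into a set removes the duplicates that arise when the same task appears under the node, rack, and cluster keys. The types here are simplified stand-ins for the real JobInProgress/FairScheduler structures, not the actual Hadoop API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified stand-in for the duplicate-candidate problem: the same task id
// is cached under node, rack, and cluster keys, so flattening the cache into
// a plain list would name it three times. De-duplicating by task id before
// preempting ensures each task is considered at most once.
public class PreemptDedup {
    public static List<String> candidates(Map<String, List<String>> runningMapCache) {
        Set<String> unique = new LinkedHashSet<>(); // keeps first-seen order
        for (List<String> tasks : runningMapCache.values()) {
            unique.addAll(tasks);
        }
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        Map<String, List<String>> cache = new LinkedHashMap<>();
        cache.put("node1", Arrays.asList("task_0001_m_000001"));
        cache.put("rack1", Arrays.asList("task_0001_m_000001"));
        cache.put("cluster", Arrays.asList("task_0001_m_000001"));
        System.out.println(candidates(cache)); // the task appears once
    }
}
```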
[jira] Updated: (MAPREDUCE-1973) Optimize input split creation
[ https://issues.apache.org/jira/browse/MAPREDUCE-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Burkhardt updated MAPREDUCE-1973: -- Attachment: HADOOP-1973.patch Please review. > Optimize input split creation > - > > Key: MAPREDUCE-1973 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1973 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 0.20.1, 0.20.2 > Environment: Intel Nehalem cluster running Red Hat. >Reporter: Paul Burkhardt >Priority: Minor > Attachments: HADOOP-1973.patch > > > The input split returns the locations that host the file blocks in the split. > The locations are determined by the getBlockLocations method of the > filesystem client which requires a remote connection to the filesystem (i.e. > HDFS). The remote connection is made for each file in the entire input split. > For jobs with many input files the network connections dominate the cost of > writing the input split file. > A job requests a listing of the input files from the remote filesystem and > creates a FileStatus object as a handle for each file in the listing. The > FileStatus object can be imbued with the necessary host information on the > remote end and passed to the client-side in the bulk return of the listing > request. A getHosts method of the FileStatus would then return the locations > for the blocks comprising that file and eliminate the need for another trip > to the remote filesystem. > The INodeFile maintains the blocks for a file and is an obvious choice to be > the originator for the locations of that file. It is also available to the > FSDirectory which first creates the listing of FileStatus objects. We propose > that the block locations be generated by the INodeFile to instantiate the > FileStatus object during the getListing request. > Our tests demonstrated a factor of 2000 speedup for approximately 60,000 > input files. -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
[jira] Created: (MAPREDUCE-1973) Optimize input split creation
Optimize input split creation - Key: MAPREDUCE-1973 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1973 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 0.20.2, 0.20.1 Environment: Intel Nehalem cluster running Red Hat. Reporter: Paul Burkhardt Priority: Minor The input split returns the locations that host the file blocks in the split. The locations are determined by the getBlockLocations method of the filesystem client which requires a remote connection to the filesystem (i.e. HDFS). The remote connection is made for each file in the entire input split. For jobs with many input files the network connections dominate the cost of writing the input split file. A job requests a listing of the input files from the remote filesystem and creates a FileStatus object as a handle for each file in the listing. The FileStatus object can be imbued with the necessary host information on the remote end and passed to the client-side in the bulk return of the listing request. A getHosts method of the FileStatus would then return the locations for the blocks comprising that file and eliminate the need for another trip to the remote filesystem. The INodeFile maintains the blocks for a file and is an obvious choice to be the originator for the locations of that file. It is also available to the FSDirectory which first creates the listing of FileStatus objects. We propose that the block locations be generated by the INodeFile to instantiate the FileStatus object during the getListing request. Our tests demonstrated a factor of 2000 speedup for approximately 60,000 input files. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1968) Deprecate GridMix v1
[ https://issues.apache.org/jira/browse/MAPREDUCE-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892973#action_12892973 ] Owen O'Malley commented on MAPREDUCE-1968: -- I think we should remove both gridmix v1 and v2. It is pretty clear that v3 is the best so far. > Deprecate GridMix v1 > > > Key: MAPREDUCE-1968 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1968 > Project: Hadoop Map/Reduce > Issue Type: Task > Components: contrib/gridmix >Reporter: Ranjit Mathew > > GridMix v2 in "src/benchmarks/gridmix2" obsoletes GridMix v1 in > "src/benchmarks/gridmix". > The latter should be deprecated and then removed to reduce the clutter in the > source-tree. > One way of doing this is shown by the "hadoop" script from 0.20.xx that has > been deprecated > in favour of "mapred", for example. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1288) DistributedCache localizes only once per cache URI
[ https://issues.apache.org/jira/browse/MAPREDUCE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892971#action_12892971 ] Owen O'Malley commented on MAPREDUCE-1288: -- {quote} (2) introduce the concept of group sharing of distributed cache files so as to avoid repetitive downloads for group shared files also. This may be a complex solution after all. {quote} This would be quite complex to get right. In particular, it is difficult to determine which group should have access. If we want to improve it, I'd suggest that we use hardlinks to give each user access to a single copy of the file.. Of course you need to ensure that they do in fact have read access to the original file. *smile* > DistributedCache localizes only once per cache URI > -- > > Key: MAPREDUCE-1288 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: distributed-cache, security, tasktracker >Affects Versions: 0.21.0 >Reporter: Devaraj Das >Priority: Critical > Attachments: MR-1288-bp20-1.patch, MR-1288-bp20-2.patch, > MR-1288-bp20-3.patch > > > As part of the file localization the distributed cache localizer creates a > copy of the file in the corresponding user's private directory. The > localization in DistributedCache assumes the key as the URI of the cachefile > and if it already exists in the map, the localization is not done again. This > means that another user cannot access the same distributed cache file. We > should change the key to include the username so that localization is done > for every user. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
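The hardlink suggestion above can be sketched with java.nio.file. All paths here are hypothetical temp-directory stand-ins for the tracker's cache layout, and a real implementation would first verify the user's read access to the original file, as the comment notes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of the hardlink approach: keep one localized copy of the cache
// file on disk and give each user a hardlink into their private directory.
// Paths are illustrative; a real tracker would check that the user can
// read the original cache file before creating the link.
public class CacheHardLink {
    public static void main(String[] args) throws IOException {
        Path sharedCopy = Files.createTempFile("cachefile", ".jar"); // the single localized copy
        Path userDir = Files.createTempDirectory("user-private");    // a user's private dir
        Path userLink = userDir.resolve("cachefile.jar");

        Files.createLink(userLink, sharedCopy); // both names refer to the same inode

        // Same content is visible under both paths, with no second copy of
        // the data on disk.
        System.out.println(Files.isSameFile(sharedCopy, userLink));
    }
}
```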
[jira] Resolved: (MAPREDUCE-1154) Large-scale, automated test framework for Map-Reduce
[ https://issues.apache.org/jira/browse/MAPREDUCE-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Boudnik resolved MAPREDUCE-1154. --- Resolution: Duplicate This has been addressed as HADOOP-6332 and derived work. > Large-scale, automated test framework for Map-Reduce > --- > > Key: MAPREDUCE-1154 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1154 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: test >Reporter: Arun C Murthy > Attachments: testing.patch > > > HADOOP-6332 proposes a large-scale, automated, junit-based test-framework for > Hadoop. > This jira is meant to track relevant work to Map-Reduce. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1686) ClassNotFoundException for custom format classes provided in libjars
[ https://issues.apache.org/jira/browse/MAPREDUCE-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892952#action_12892952 ] Paul Burkhardt commented on MAPREDUCE-1686: --- Patch has been created and submitted. > ClassNotFoundException for custom format classes provided in libjars > > > Key: MAPREDUCE-1686 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1686 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/streaming >Affects Versions: 0.20.1 >Reporter: Paul Burkhardt >Priority: Minor > Attachments: HADOOP-1686.patch > > > The StreamUtil::goodClassOrNull method assumes user-provided classes have > package names and if not, they are part of the Hadoop Streaming package. For > example, using custom InputFormat or OutputFormat classes without package > names will fail with a ClassNotFound exception which is not indicative given > the classes are provided in the libjars option. Admittedly, most Java > packages should have a package name so this should rarely come up. > Possible resolution options: > 1) modify the error message to include the actual classname that was > attempted in the goodClassOrNull method > 2) call the Configuration::getClassByName method first and if class not found > check for default package name and try the call again > {code} > public static Class goodClassOrNull(Configuration conf, String className, > String defaultPackage) { > Class clazz = null; > try { > clazz = conf.getClassByName(className); > } catch (ClassNotFoundException cnf) { > } > if (clazz == null) { > if (className.indexOf('.') == -1 && defaultPackage != null) { > className = defaultPackage + "." + className; > try { > clazz = conf.getClassByName(className); > } catch (ClassNotFoundException cnf) { > } > } > } > return clazz; > } > {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1288) DistributedCache localizes only once per cache URI
[ https://issues.apache.org/jira/browse/MAPREDUCE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892948#action_12892948 ] Devaraj Das commented on MAPREDUCE-1288: bq. Devaraj, this corner case is exactly what Hemanth was trying to explain earlier on this ticket, starting with comment #4 above Yeah.. i realized that.. That's the reason i stuck to this jira rather than opening a new one :-) bq. As for the approach, we have two options: (1) (this seems to be what the patch is doing) for group shared files, localize them separately for each user. This is a simple solution, but sacrifices the optimization ( may not be too bad?) Yes, I am going with this for now. If needed (after we deploy this patch on our clusters and observe), we can look at proposal (2) in your comment.. > DistributedCache localizes only once per cache URI > -- > > Key: MAPREDUCE-1288 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: distributed-cache, security, tasktracker >Affects Versions: 0.21.0 >Reporter: Devaraj Das >Priority: Critical > Attachments: MR-1288-bp20-1.patch, MR-1288-bp20-2.patch, > MR-1288-bp20-3.patch > > > As part of the file localization the distributed cache localizer creates a > copy of the file in the corresponding user's private directory. The > localization in DistributedCache assumes the key as the URI of the cachefile > and if it already exists in the map, the localization is not done again. This > means that another user cannot access the same distributed cache file. We > should change the key to include the username so that localization is done > for every user. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1288) DistributedCache localizes only once per cache URI
[ https://issues.apache.org/jira/browse/MAPREDUCE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devaraj Das updated MAPREDUCE-1288: --- Attachment: MR-1288-bp20-3.patch Patch addressing Owen's comments. > DistributedCache localizes only once per cache URI > -- > > Key: MAPREDUCE-1288 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: distributed-cache, security, tasktracker >Affects Versions: 0.21.0 >Reporter: Devaraj Das >Priority: Critical > Attachments: MR-1288-bp20-1.patch, MR-1288-bp20-2.patch, > MR-1288-bp20-3.patch > > > As part of the file localization the distributed cache localizer creates a > copy of the file in the corresponding user's private directory. The > localization in DistributedCache assumes the key as the URI of the cachefile > and if it already exists in the map, the localization is not done again. This > means that another user cannot access the same distributed cache file. We > should change the key to include the username so that localization is done > for every user. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-1288) DistributedCache localizes only once per cache URI
[ https://issues.apache.org/jira/browse/MAPREDUCE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892916#action_12892916 ] Owen O'Malley commented on MAPREDUCE-1288: -- It looks good. I'd suggest: 1. change DistributedCache.releaseCache to pass in the current user to TrackerDistributedCacheManager.releaseCache rather than creating a new method. 2. it looks like the constructor for CacheFile can easily throw IOException instead of putting it in a RuntimeException. > DistributedCache localizes only once per cache URI > -- > > Key: MAPREDUCE-1288 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: distributed-cache, security, tasktracker >Affects Versions: 0.21.0 >Reporter: Devaraj Das >Priority: Critical > Attachments: MR-1288-bp20-1.patch, MR-1288-bp20-2.patch > > > As part of the file localization the distributed cache localizer creates a > copy of the file in the corresponding user's private directory. The > localization in DistributedCache assumes the key as the URI of the cachefile > and if it already exists in the map, the localization is not done again. This > means that another user cannot access the same distributed cache file. We > should change the key to include the username so that localization is done > for every user. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1969) Allow raid to use Reed-Solomon erasure codes
[ https://issues.apache.org/jira/browse/MAPREDUCE-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Scott Chen updated MAPREDUCE-1969: -- Description: Currently raid uses one parity block per stripe which corrects one missing block on one stripe. Using Reed-Solomon code, we can add any number of parity blocks to tolerate more missing blocks. This way we can get a good file corrupt probability even if we set the replication to 1. Here are some simple comparisons: 1. No raid, replication = 3: File corruption probability = O(p^3), Storage space = 3x 2. Single parity raid with stripe size = 10, replication = 2: File corruption probability = O(p^4), Storage space = 2.2x 3. Reed-Solomon raid with parity size = 4 and stripe size = 10, replication = 1: File corruption probability = O(p^5), Storage space = 1.4x where p is the missing block probability. Reed-Solomon code can save lots of space without compromising the corruption probability. To achieve this, we need some changes to raid: 1. Add a block placement policy that knows about raid logic and do not put blocks on the same stripe on the same node. 2. Add an automatic block fixing mechanism. The block fixing will replace the replication of under replicated blocks. 3. Allow raid to use general erasure code. It is now hard coded using Xor. 4. Add a Reed-Solomon code implementation We are planing to use it on the older data only. Because setting replication = 1 hurts the data locality. was: Currently raid uses one parity block per stripe which corrects one missing block on one stripe. Using Reed-Solomon code, we can add any number of parity blocks to tolerate more missing blocks. This way we can get a good file corrupt probability even if we set the replication to 1. Here are some simple comparisons: 1. No raid, replication = 3: File corruption probability = O(p^3), Storage space = 3x 2. 
Signal parity raid with stripe size = 10, replication = 2: File corruption probability = O(p^4), Storage space = 2.2x 3. Reed-Solomon raid with parity size = 4 and stripe size = 10, replication = 1: File corruption probability = O(p^5), Storage space = 1.4x where p is the missing block probability. Reed-Solomon code can save lots of space without compromising the corruption probability. To achieve this, we need some changes to raid: 1. Add a block placement policy that knows about raid logic and do not put blocks on the same stripe on the same node. 2. Add an automatic block fixing mechanism. The block fixing will replace the replication of under replicated blocks. 3. Allow raid to use general erasure code. It is now hard coded using Xor. 4. Add a Reed-Solomon code implementation We are planing to use it on the older data only. Because setting replication = 1 hurts the data locality. > Allow raid to use Reed-Solomon erasure codes > > > Key: MAPREDUCE-1969 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1969 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: contrib/raid >Reporter: Scott Chen > Fix For: 0.22.0 > > > Currently raid uses one parity block per stripe which corrects one missing > block on one stripe. > Using Reed-Solomon code, we can add any number of parity blocks to tolerate > more missing blocks. > This way we can get a good file corrupt probability even if we set the > replication to 1. > Here are some simple comparisons: > 1. No raid, replication = 3: > File corruption probability = O(p^3), Storage space = 3x > 2. Single parity raid with stripe size = 10, replication = 2: > File corruption probability = O(p^4), Storage space = 2.2x > 3. Reed-Solomon raid with parity size = 4 and stripe size = 10, replication = > 1: > File corruption probability = O(p^5), Storage space = 1.4x > where p is the missing block probability. > Reed-Solomon code can save lots of space without compromising the corruption > probability. 
> To achieve this, we need some changes to raid: > 1. Add a block placement policy that knows about raid logic and does not put > blocks on the same stripe on the same node. > 2. Add an automatic block fixing mechanism. The block fixing will replace the > replication of under-replicated blocks. > 3. Allow raid to use a general erasure code. It is now hard-coded to use Xor. > 4. Add a Reed-Solomon code implementation. > We are planning to use it on the older data only, > because setting replication = 1 hurts the data locality. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
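The O(p^k) claims in the description can be checked numerically. With replication 1, a stripe of k data blocks plus m parity blocks is unrecoverable when more than m of its k+m blocks are missing, so the exact loss probability is a binomial tail whose leading term is C(k+m, m+1) p^(m+1). The sketch below is an independence-assuming model for that tail, not raid code.

```java
// Numeric check of the corruption-probability claims: with replication 1,
// a stripe of k data + m parity blocks is lost when more than m of the
// k + m blocks are missing, each independently with probability p:
//   P(loss) = sum_{i=m+1}^{k+m} C(k+m, i) p^i (1-p)^(k+m-i) = O(p^(m+1)).
public class StripeLoss {
    static double choose(int n, int r) {
        double c = 1.0;
        for (int i = 0; i < r; i++) c = c * (n - i) / (i + 1);
        return c;
    }

    static double lossProb(int k, int m, double p) {
        int n = k + m;
        double sum = 0.0;
        for (int i = m + 1; i <= n; i++) {
            sum += choose(n, i) * Math.pow(p, i) * Math.pow(1 - p, n - i);
        }
        return sum;
    }

    public static void main(String[] args) {
        double p = 1e-3; // per-block missing probability (illustrative value)
        // Reed-Solomon raid: stripe size 10, parity 4 -> leading term
        // C(14,5) p^5, i.e. O(p^5) as stated in the description.
        System.out.printf("RS(k=10,m=4): exact %.3e, leading term %.3e%n",
                lossProb(10, 4, p), choose(14, 5) * Math.pow(p, 5));
        // Single parity (k=10, m=1) at replication 1 would only be O(p^2);
        // the O(p^4) figure in the description relies on replication 2.
        System.out.printf("parity-1(k=10,m=1): exact %.3e%n", lossProb(10, 1, p));
    }
}
```

For small p the exact tail and the leading term agree closely, which is why the description can compare schemes by their leading exponents alone.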
[jira] Created: (MAPREDUCE-1972) TestUserLogCleanup test can't clean up the toBeDeleted
TestUserLogCleanup test can't clean up the toBeDeleted - Key: MAPREDUCE-1972 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1972 Project: Hadoop Map/Reduce Issue Type: Bug Environment: http://hudson.zones.apache.org/hudson/view/Mapreduce/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/300 Reporter: Giridharan Kesavan All the hudson patch test builds are failing as the Mapreduce-Patch-h4.grid.sp2.yahoo.net/trunk/build/test/logs/userlogs has a folder created by the following test which doesn't seem to have read permission. Running org.apache.hadoop.mapred.TestUserLogCleanup [exec] [junit] 2010-07-14 22:24:54,027 INFO mapred.UserLogCleaner (UserLogCleaner.java:markJobLogsForDeletion(174)) - Adding job_test_0001 for user-log deletion with retainTimeStamp:720 ... [exec] [junit] 2010-07-14 22:24:54,373 WARN util.MRAsyncDiskService (MRAsyncDiskService.java:run(214)) - Failure in deletion of toBeDeleted/2010-07-14_22-24-54.372_6 on /grid/0/hudson/hudson-slave/workspace/Mapreduce-Patch-h4.grid.sp2.yahoo.net/trunk/build/test/logs/userlogs with original name job_20100714203911410_0002 [exec] [junit] 2010-07-14 22:24:54,374 WARN util.MRAsyncDiskService (MRAsyncDiskService.java:run(214)) - Failure in deletion of toBeDeleted/2010-07-14_22-24-54.373_7 on /grid/0/hudson/hudson-slave/workspace/Mapreduce-Patch-h4.grid.sp2.yahoo.net/trunk/build/test/logs/userlogs with original name job_test_0003 [exec] [junit] 2010-07-14 22:24:54,391 WARN util.MRAsyncDiskService (MRAsyncDiskService.java:run(214)) - Failure in deletion of toBeDeleted/2010-07-14_22-24-54.372_6 on /grid/0/hudson/hudson-slave/workspace/Mapreduce-Patch-h4.grid.sp2.yahoo.net/trunk/build/test/logs/userlogs with original name /grid/0/hudson/hudson-slave/workspace/Mapreduce-Patch-h4.grid.sp2.yahoo.net/trunk/build/test/logs/userlogs/toBeDeleted/2010-07-14_22-24-54.372_6 [exec] [junit] 2010-07-14 22:24:54,405 INFO mapred.UserLogCleaner (UserLogCleaner.java:markJobLogsForDeletion(174)) - Adding job_test_0001 for user-log deletion 
with retainTimeStamp:720 .. [exec] [junit] 2010-07-14 22:24:54,441 WARN util.MRAsyncDiskService (MRAsyncDiskService.java:run(214)) - Failure in deletion of toBeDeleted/2010-07-14_22-2 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1964) Running hi Ram jobs when TTs are blacklisted
[ https://issues.apache.org/jira/browse/MAPREDUCE-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Rajagopalan updated MAPREDUCE-1964: -- Attachment: hiRam_bList_y20_1.patch Implemented Vinay's comments. > Running hi Ram jobs when TTs are blacklisted > > > Key: MAPREDUCE-1964 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1964 > Project: Hadoop Map/Reduce > Issue Type: New Feature >Reporter: Balaji Rajagopalan > Attachments: hiRam_bList_y20.patch, hiRam_bList_y20_1.patch > > > More slots are getting reserved for HiRAM job tasks than required. > Blacklist more than 25% of TTs across the job. Run a high-RAM job. No > java.lang.RuntimeException should be displayed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-1971) herriot automation system test case for verification of bug fix to jobhistory
[ https://issues.apache.org/jira/browse/MAPREDUCE-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Rajagopalan updated MAPREDUCE-1971: -- Attachment: concurrent_exp_y20_1.patch Implemented Iyappan's comment. > herriot automation system test case for verification of bug fix to jobhistory > - > > Key: MAPREDUCE-1971 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1971 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Environment: herriot >Reporter: Balaji Rajagopalan > Attachments: concurrent_exp_y20.patch, concurrent_exp_y20_1.patch > > > Run a few jobs and check the job history page. Job history information > should be displayed properly. > Analyze a running job. The values shown in the page should be correct. > Concurrently access jobs in the job history page. No exception should be thrown. > The developed Herriot test case accesses the job tracker directly; the JSP page access does the same.
[jira] Commented: (MAPREDUCE-1971) herriot automation system test case for verification of bug fix to jobhistory
[ https://issues.apache.org/jira/browse/MAPREDUCE-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892726#action_12892726 ] Iyappan Srinivasan commented on MAPREDUCE-1971: --- Remove the extra comment "//{" in runSleepJob. Code looks good otherwise.
[jira] Commented: (MAPREDUCE-1288) DistributedCache localizes only once per cache URI
[ https://issues.apache.org/jira/browse/MAPREDUCE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892724#action_12892724 ] Vinod K V commented on MAPREDUCE-1288: -- Devaraj, this corner case is exactly what Hemanth was trying to explain earlier on this ticket, starting with comment #4 above :) As for the approach, we have two options: (1) (this seems to be what the patch is doing) for group-shared files, localize them separately for each user. This is a simple solution, but sacrifices the optimization (which may not be too bad?). (2) Introduce the concept of group sharing of distributed cache files so as to avoid repetitive downloads for group-shared files as well. This may be a complex solution after all. > DistributedCache localizes only once per cache URI > -- > > Key: MAPREDUCE-1288 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: distributed-cache, security, tasktracker >Affects Versions: 0.21.0 >Reporter: Devaraj Das >Priority: Critical > Attachments: MR-1288-bp20-1.patch, MR-1288-bp20-2.patch > > > As part of file localization, the distributed cache localizer creates a > copy of the file in the corresponding user's private directory. The > localization in DistributedCache uses the URI of the cache file as the key, > and if the key already exists in the map, the localization is not done again. This > means that another user cannot access the same distributed cache file. We > should change the key to include the username so that localization is done > for every user.
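Option (1) above, localizing shared files separately for each user, amounts to making the username part of the localization-map key, as the issue description suggests. A minimal sketch of that keying scheme follows; the class and method names are hypothetical, not Hadoop's actual DistributedCache API, and the "download" is faked with a path string.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch (assumed names): key the localization cache on (URI, user),
// so the same cache file is localized once per user instead of once globally.
public class PerUserCacheKey {
    private final Map<String, String> localized = new HashMap<>();

    private static String key(String cacheUri, String user) {
        return user + "#" + cacheUri; // username included in the key
    }

    /** Returns the localized path, "downloading" at most once per (URI, user). */
    public String localize(String cacheUri, String user) {
        return localized.computeIfAbsent(key(cacheUri, user),
                k -> "/local/" + user + "/" + Integer.toHexString(cacheUri.hashCode()));
    }
}
```

With the URI alone as the key (the bug), the second user would be handed the first user's private copy, which the permission model forbids; with the compound key, each user gets an own localization, at the cost of the shared-download optimization Vinod notes.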
[jira] Updated: (MAPREDUCE-1971) herriot automation system test case for verification of bug fix to jobhistory
[ https://issues.apache.org/jira/browse/MAPREDUCE-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Rajagopalan updated MAPREDUCE-1971: -- Attachment: concurrent_exp_y20.patch First patch for y20.
[jira] Created: (MAPREDUCE-1971) herriot automation system test case for verification of bug fix to jobhistory
herriot automation system test case for verification of bug fix to jobhistory - Key: MAPREDUCE-1971 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1971 Project: Hadoop Map/Reduce Issue Type: New Feature Environment: herriot Reporter: Balaji Rajagopalan Run a few jobs and check the job history page. Job history information should be displayed properly. Analyze a running job. The values shown in the page should be correct. Concurrently access jobs in the job history page. No exception should be thrown. The developed Herriot test case accesses the job tracker directly; the JSP page access does the same.
[jira] Updated: (MAPREDUCE-1834) TestSimulatorDeterministicReplay times out on trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giridharan Kesavan updated MAPREDUCE-1834: -- Status: Patch Available (was: Open) > TestSimulatorDeterministicReplay times out on trunk > -- > > Key: MAPREDUCE-1834 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1834 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/mumak >Affects Versions: 0.21.0 >Reporter: Amareshwari Sriramadasu >Assignee: Hong Tang > Attachments: MAPREDUCE-1834.patch, mr-1834-20100727.patch, > TestSimulatorDeterministicReplay.log > > > TestSimulatorDeterministicReplay times out on trunk. > See the Hudson patch build: > http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/216/testReport/org.apache.hadoop.mapred/TestSimulatorDeterministicReplay/testMain/
[jira] Updated: (MAPREDUCE-1834) TestSimulatorDeterministicReplay times out on trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Giridharan Kesavan updated MAPREDUCE-1834: -- Status: Open (was: Patch Available)
[jira] Updated: (MAPREDUCE-1253) Making Mumak work with Capacity-Scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Tang updated MAPREDUCE-1253: - Priority: Major (was: Minor) > Making Mumak work with Capacity-Scheduler > - > > Key: MAPREDUCE-1253 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1253 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: contrib/mumak >Affects Versions: 0.21.0, 0.22.0 >Reporter: Anirban Dasgupta >Assignee: Anirban Dasgupta > Attachments: MAPREDUCE-1253-20100406.patch, > MAPREDUCE-1253-20100726-2.patch > > Original Estimate: 672h > Remaining Estimate: 672h > > In order to make the capacity-scheduler work in the mumak simulation > environment, we have to replace the job-initialization threads of the > capacity scheduler with classes that perform event-based initialization. We > propose to use aspectj to disable the threads of the JobInitializationPoller > class used by the Capacity Scheduler, and then perform the corresponding > initialization tasks through a simulation job-initialization class that > receives periodic wake-up calls from the simulator engine.
[jira] Commented: (MAPREDUCE-1253) Making Mumak work with Capacity-Scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892690#action_12892690 ] Hong Tang commented on MAPREDUCE-1253: -- All mumak unit tests passed on my local machine after applying patch mr-1834-20100727.patch from MAPREDUCE-1834.
[jira] Updated: (MAPREDUCE-1253) Making Mumak work with Capacity-Scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Tang updated MAPREDUCE-1253: - Attachment: MAPREDUCE-1253-20100726-2.patch
[jira] Commented: (MAPREDUCE-1834) TestSimulatorDeterministicReplay times out on trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892684#action_12892684 ] Hong Tang commented on MAPREDUCE-1834: -- It is a bit tricky to intercept the constructor of ConcurrentHashMap because there is no ConcurrentLinkedHashMap. I implemented a fake concurrent hash map which does not support concurrency (OK because mumak runs as a single thread) and uses a LinkedHashMap for internal storage.
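The idea in the comment above can be illustrated with a sketch: a map that exposes the ConcurrentMap interface but is backed by a LinkedHashMap, so iteration order is deterministic (insertion order) and replay becomes reproducible. The class below is a hypothetical reconstruction, not the actual patch; as the comment says, it provides no real concurrency and is only safe where a single thread touches the map, as in mumak.

```java
import java.util.AbstractMap;
import java.util.LinkedHashMap;
import java.util.Objects;
import java.util.Set;
import java.util.concurrent.ConcurrentMap;

// Sketch (assumed name): ConcurrentMap facade over a LinkedHashMap.
// Deterministic iteration order; NOT thread-safe.
public class FakeConcurrentMap<K, V> extends AbstractMap<K, V>
        implements ConcurrentMap<K, V> {
    private final LinkedHashMap<K, V> backing = new LinkedHashMap<>();

    @Override public Set<Entry<K, V>> entrySet() { return backing.entrySet(); }
    @Override public V put(K k, V v) { return backing.put(k, v); }

    // The four ConcurrentMap operations, implemented without any locking.
    @Override public V putIfAbsent(K k, V v) {
        return backing.containsKey(k) ? backing.get(k) : backing.put(k, v);
    }
    @Override public boolean remove(Object k, Object v) {
        if (backing.containsKey(k) && Objects.equals(backing.get(k), v)) {
            backing.remove(k);
            return true;
        }
        return false;
    }
    @Override public boolean replace(K k, V oldV, V newV) {
        if (backing.containsKey(k) && Objects.equals(backing.get(k), oldV)) {
            backing.put(k, newV);
            return true;
        }
        return false;
    }
    @Override public V replace(K k, V v) {
        return backing.containsKey(k) ? backing.put(k, v) : null;
    }
}
```

Because ConcurrentHashMap makes no iteration-order guarantee, two otherwise identical simulator runs can visit entries in different orders; substituting an insertion-ordered backing map is one way to remove that source of nondeterminism.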
[jira] Updated: (MAPREDUCE-1834) TestSimulatorDeterministicReplay times out on trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Tang updated MAPREDUCE-1834: - Status: Patch Available (was: Open)
[jira] Updated: (MAPREDUCE-1834) TestSimulatorDeterministicReplay times out on trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Tang updated MAPREDUCE-1834: - Attachment: mr-1834-20100727.patch
[jira] Assigned: (MAPREDUCE-1834) TestSimulatorDeterministicReplay times out on trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Tang reassigned MAPREDUCE-1834: Assignee: Hong Tang