[jira] [Created] (MAPREDUCE-4063) make TaggedInputSplit public class for development of MultipleInput of other DB Products extension
make TaggedInputSplit public class for development of MultipleInput of other DB Products extension

Key: MAPREDUCE-4063
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4063
Project: Hadoop Map/Reduce
Issue Type: Improvement
Affects Versions: 0.23.1
Reporter: Muddy Dixon
Priority: Minor

In trunk, org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit is not a public class. This prevents developers from writing MultipleInput extensions for other DB products. As a workaround, I wrote https://github.com/muddydixon/mongo-hadoop/blob/develop/multipleinputs/core/src/main/java/org/apache/hadoop/mapreduce/lib/input/TaggedInputSplitGenerator.java. Unless there is a reason to keep it non-public, TaggedInputSplit should be public.

-- 
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4063) make TaggedInputSplit public class for development of MultipleInput of other DB Products extension
[ https://issues.apache.org/jira/browse/MAPREDUCE-4063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Muddy Dixon updated MAPREDUCE-4063:
Attachment: MAPREDUCE-4063.txt

Key: MAPREDUCE-4063
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4063
Project: Hadoop Map/Reduce
Issue Type: Improvement
Affects Versions: 0.23.1
Reporter: Muddy Dixon
Priority: Minor
Labels: newbie
Attachments: MAPREDUCE-4063.txt
Original Estimate: 0.5h
Remaining Estimate: 0.5h
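To show what extension authors want to reuse, here is a minimal sketch of the tagging idea behind TaggedInputSplit: a wrapper that pairs an underlying split with the names of the input format and mapper that should process it, so a delegating reader and mapper can dispatch per split. Split, RangeSplit and TaggedSplit are hypothetical stand-ins, not Hadoop's classes, and the serialization the real TaggedInputSplit performs is omitted.

```java
import java.util.Objects;

/** Hypothetical minimal stand-in for an input split (not Hadoop's InputSplit). */
interface Split {
    long getLength();
    String[] getLocations();
}

/** Hypothetical split covering part of a named resource (e.g. a DB collection). */
final class RangeSplit implements Split {
    private final String resource;
    private final long length;
    private final String[] hosts;

    RangeSplit(String resource, long length, String... hosts) {
        this.resource = resource;
        this.length = length;
        this.hosts = hosts.clone();
    }

    String getResource() { return resource; }
    @Override public long getLength() { return length; }
    @Override public String[] getLocations() { return hosts.clone(); }
}

/**
 * Sketch of the tagging pattern: wraps another split and records which
 * input format and mapper (by class name) should handle it. Length and
 * locations are delegated, so scheduling sees the same split as before.
 */
final class TaggedSplit implements Split {
    private final Split inner;
    private final String inputFormatClassName;
    private final String mapperClassName;

    TaggedSplit(Split inner, String inputFormatClassName, String mapperClassName) {
        this.inner = Objects.requireNonNull(inner);
        this.inputFormatClassName = Objects.requireNonNull(inputFormatClassName);
        this.mapperClassName = Objects.requireNonNull(mapperClassName);
    }

    Split getInnerSplit() { return inner; }
    String getInputFormatClassName() { return inputFormatClassName; }
    String getMapperClassName() { return mapperClassName; }
    @Override public long getLength() { return inner.getLength(); }
    @Override public String[] getLocations() { return inner.getLocations(); }
}
```

Because a top-level Java class without the public modifier is visible only inside its own package, the linked TaggedInputSplitGenerator workaround appears to live in org.apache.hadoop.mapreduce.lib.input precisely so that it can construct TaggedInputSplit; a public class would let such code live in the extension's own package.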
[jira] [Commented] (MAPREDUCE-3060) Generic shuffle service
[ https://issues.apache.org/jira/browse/MAPREDUCE-3060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237899#comment-13237899 ]

Avner BenHanoch commented on MAPREDUCE-3060:

Please see MAPREDUCE-4049.

Generic shuffle service

Key: MAPREDUCE-3060
URL: https://issues.apache.org/jira/browse/MAPREDUCE-3060
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: mrv2
Affects Versions: 0.23.0
Reporter: Luke Lu
Labels: shuffle
Fix For: 0.24.0

When I was talking to Owen about MAPREDUCE-2600, we came across the shuffle dependency issue (again; I had talked about it with Chris before). NodeManager currently has an implicit dependency (hidden by the service plugin mechanism) on a specific version of the MapReduce shuffle. While this works in many cases, as long as we don't change the shuffle headers or the usage of mapred security tokens, it is nonetheless a hack to make things work. It is generally agreed that the NodeManager should only load generic services that are MapReduce-framework neutral. In this particular case, the right solution seems to be a generic shuffle handler that can serve data for a particular partition securely. The ShuffleHandler currently depends on MapReduce only for task tokens and the shuffle header, which is used only for writing data; i.e., the shuffle handler has no semantic dependency on MapReduce.
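The description argues that a shuffle service only needs to serve one partition's data securely, with no MapReduce semantics. The following is a minimal, framework-neutral sketch of that check, assuming the fetch request is authenticated with an HMAC of its path under a per-application secret (similar in spirit to the token-based URL hashing the MapReduce shuffle uses). GenericShuffleAuth, requestPath and the /shuffle/{app}/{map}/{partition} layout are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/**
 * Hypothetical framework-neutral authentication for partition fetches.
 * The server only checks that the caller knows the application's secret;
 * it needs no knowledge of MapReduce headers or task semantics.
 */
final class GenericShuffleAuth {
    private static final String ALGORITHM = "HmacSHA256";

    private GenericShuffleAuth() {}

    /** Builds the (hypothetical) request path for one map output partition. */
    static String requestPath(String appId, String mapId, int partition) {
        return "/shuffle/" + appId + "/" + mapId + "/" + partition;
    }

    /** Client side: sign the request path with the shared per-application secret. */
    static String sign(String path, byte[] secret) {
        try {
            Mac mac = Mac.getInstance(ALGORITHM);
            mac.init(new SecretKeySpec(secret, ALGORITHM));
            byte[] digest = mac.doFinal(path.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (GeneralSecurityException e) {
            // HmacSHA256 is a required JDK algorithm, so this signals an unusable key.
            throw new IllegalStateException("cannot compute HMAC", e);
        }
    }

    /** Server side: recompute the HMAC and compare in constant time. */
    static boolean verify(String path, String signature, byte[] secret) {
        byte[] expected = sign(path, secret).getBytes(StandardCharsets.US_ASCII);
        byte[] actual = signature.getBytes(StandardCharsets.US_ASCII);
        return MessageDigest.isEqual(expected, actual);
    }
}
```

Since the signature covers the partition number, a valid signature for one partition cannot be replayed to fetch another.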
[jira] [Created] (MAPREDUCE-4064) Job History Link in RM UI is redirecting to the URL which contains Job Id twice
Job History Link in RM UI is redirecting to the URL which contains Job Id twice

Key: MAPREDUCE-4064
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4064
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 0.23.1
Reporter: Devaraj K
Assignee: Devaraj K

The Job History link in the RM UI redirects to a URL that contains the job path twice:
{code:xml}
http://xx.x.x.x:19888/jobhistory/job/job_1332435449546_0001/jobhistory/job/job_1332435449546_0001
{code}
[jira] [Updated] (MAPREDUCE-4064) Job History Link in RM UI is redirecting to the URL which contains Job Id twice
[ https://issues.apache.org/jira/browse/MAPREDUCE-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj K updated MAPREDUCE-4064:
Component/s: mrv2

Key: MAPREDUCE-4064
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4064
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 0.23.1
Reporter: Devaraj K
Assignee: Devaraj K
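The report does not say where the duplicated path comes from. As one defensive option, the sketch below builds the history link from a base address and appends jobhistory/job/{jobId} only when the URL does not already end with it, so applying it twice is harmless. JobHistoryLinks.jobLink is a hypothetical helper, not code from the RM web UI.

```java
/** Hypothetical helper that builds a Job History link exactly once. */
final class JobHistoryLinks {
    private JobHistoryLinks() {}

    /**
     * Appends "jobhistory/job/{jobId}" to the given history server address,
     * unless the address already points at that job, which avoids the
     * doubled path shown in the report.
     */
    static String jobLink(String baseOrTrackingUrl, String jobId) {
        String suffix = "jobhistory/job/" + jobId;
        String trimmed = baseOrTrackingUrl.endsWith("/")
                ? baseOrTrackingUrl.substring(0, baseOrTrackingUrl.length() - 1)
                : baseOrTrackingUrl;
        if (trimmed.endsWith("/" + suffix)) {
            return trimmed; // already a link to this job: do not append again
        }
        return trimmed + "/" + suffix;
    }
}
```

The cleaner fix is still to concatenate the path in only one place; the guard merely makes the link builder idempotent.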
[jira] [Created] (MAPREDUCE-4065) Add .proto files to built tarball
Add .proto files to built tarball

Key: MAPREDUCE-4065
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4065
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: build
Affects Versions: 0.23.2
Reporter: Ralph H Castain
Fix For: 0.23.3

Please add the .proto files to the built tarball so that users can build third-party tools that use protocol buffers without having to do an svn checkout of the source code. Sorry, I don't know more about Maven, or I would provide a patch.
[jira] [Commented] (MAPREDUCE-3540) saveVersion.sh script fails in windows/cygwin (hadoop-yarn-common)
[ https://issues.apache.org/jira/browse/MAPREDUCE-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237975#comment-13237975 ]

Bikas Saha commented on MAPREDUCE-3540:

Is the Unix version of whoami not available on Cygwin? It looks like you are trying to convert the CR/LF line ending of the Windows whoami output to Unix.

saveVersion.sh script fails in windows/cygwin (hadoop-yarn-common)

Key: MAPREDUCE-3540
URL: https://issues.apache.org/jira/browse/MAPREDUCE-3540
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: build
Affects Versions: 0.24.0
Reporter: Alejandro Abdelnur
Fix For: 0.24.0
Attachments: MAPREDUCE-3540.patch

{code}
[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2:exec (generate-version) on project hadoop-yarn-common: Command execution failed. Cannot run program scripts\saveVersion.sh (in directory C:\cygwin\home\tucu\src\hadoop\hadoop-mapreduce-project\hadoop-yarn\hadoop-yarn-common): CreateProcess error=2, The system cannot find the file specified -> [Help 1]
[ERROR]
{code}
[jira] [Commented] (MAPREDUCE-3540) saveVersion.sh script fails in windows/cygwin (hadoop-yarn-common)
[ https://issues.apache.org/jira/browse/MAPREDUCE-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237988#comment-13237988 ]

Mostafa Elhemali commented on MAPREDUCE-3540:

There is a version there, but it outputs a CR/LF line ending. Note that this is a copy-paste of the same workaround in hadoop-common-project/hadoop-common/dev-support/saveVersion.sh.
[jira] [Commented] (MAPREDUCE-4039) Sort Avoidance
[ https://issues.apache.org/jira/browse/MAPREDUCE-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238028#comment-13238028 ]

Schubert Zhang commented on MAPREDUCE-4039:

A patch by Anty is available; could someone review it?

Sort Avoidance

Key: MAPREDUCE-4039
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4039
Project: Hadoop Map/Reduce
Issue Type: New Feature
Components: mrv2
Affects Versions: 0.23.2
Reporter: anty.rao
Priority: Minor
Fix For: 0.23.2
Attachments: MAPREDUCE-4039-branch-0.23.2.patch

Inspired by [Tenzing|http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//pubs/archive/37200.pdf], section 5.1, "MapReduce Enhancements":

{quote}*Sort Avoidance*. Certain operators such as hash join and hash aggregation require shuffling, but not sorting. The MapReduce API was enhanced to automatically turn off sorting for these operations. When sorting is turned off, the mapper feeds data to the reducer which directly passes the data to the Reduce() function bypassing the intermediate sorting step. This makes many SQL operators significantly more efficient.{quote}

Many applications need only aggregation, not sorting, and using sorting to achieve aggregation is costly and inefficient. Without sorting, the application can use a hash table or hash map to aggregate efficiently. However, the application must bear in mind that reduce memory is limited: it is responsible for managing the reducer's memory and guarding against running out of memory. The map-side combiner is not supported; as a workaround, you can do hash aggregation on the map side instead.

The main points of the sort avoidance implementation are:
# Add a boolean configuration parameter ??mapreduce.sort.avoidance?? to turn the sort-avoidance workflow on or off. The two workflows coexist.
# Key/value pairs emitted by the map function are sorted by partition only, using a more efficient sorting algorithm: counting sort.
# The map-side merge uses a byte merge, which simply concatenates bytes from the generated spills (read bytes in, write bytes out) without the key/value serialization/deserialization and comparison overhead that the current version incurs.
# The reducer can start as soon as any map output is available, in contrast to the sort workflow, which must wait until all map outputs are fetched and merged.
# Map output in memory can be consumed directly by the reducer. When the reducer can't keep up with the incoming map outputs, the in-memory merge thread kicks in and merges in-memory map outputs onto disk.
# On-disk files are read sequentially to feed the reducer, in contrast to the current implementation, which reads multiple files concurrently and causes many disk seeks. Map output in memory takes precedence over on-disk files in feeding the reduce function.

I have already implemented this feature on Hadoop CDH3U3 and done some performance evaluation; see [https://github.com/hanborq/hadoop] for details. Now I'm willing to port it to YARN. Comments are welcome.
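Point 2 of the list and the hash-aggregation paragraph above it can be illustrated in memory: a stable counting sort that orders map output by partition without comparing keys, followed by reduce-side hash aggregation over one partition. This is a hypothetical sketch, not code from the attached patch; SortAvoidanceSketch and Rec are invented names, spilling, merging, serialization and the proposed mapreduce.sort.avoidance switch are left out, and summing values stands in for an arbitrary aggregation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, in-memory illustration of two sort-avoidance steps. */
final class SortAvoidanceSketch {
    private SortAvoidanceSketch() {}

    /** A map output record tagged with its reduce partition. */
    static final class Rec {
        final int partition;
        final String key;
        final long value;

        Rec(int partition, String key, long value) {
            this.partition = partition;
            this.key = key;
            this.value = value;
        }
    }

    /**
     * Map side: group records by partition only, using a stable counting sort
     * that runs in O(n + numPartitions) and never compares keys.
     */
    static Rec[] sortByPartition(List<Rec> records, int numPartitions) {
        int[] offset = new int[numPartitions + 1];
        for (Rec r : records) {
            offset[r.partition + 1]++;          // count records per partition
        }
        for (int p = 0; p < numPartitions; p++) {
            offset[p + 1] += offset[p];         // prefix sums: first slot of each partition
        }
        int[] next = Arrays.copyOf(offset, numPartitions);
        Rec[] out = new Rec[records.size()];
        for (Rec r : records) {
            out[next[r.partition]++] = r;       // emission order kept within a partition
        }
        return out;
    }

    /**
     * Reduce side: aggregate one partition's records with a hash map instead
     * of relying on sorted, key-grouped input. Memory grows with the number
     * of distinct keys, which the application must keep bounded.
     */
    static Map<String, Long> hashAggregate(Iterable<Rec> partitionRecords) {
        Map<String, Long> sums = new HashMap<>();
        for (Rec r : partitionRecords) {
            sums.merge(r.key, r.value, Long::sum);
        }
        return sums;
    }
}
```

Counting sort fits here because the partition number is a small integer in [0, numPartitions), so grouping costs one counting pass and one placement pass instead of O(n log n) key comparisons.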
[jira] [Updated] (MAPREDUCE-3353) Need a RM-AM channel to inform AMs about faulty/unhealthy/lost nodes
[ https://issues.apache.org/jira/browse/MAPREDUCE-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-3353:
Resolution: Fixed
Fix Version/s: (was: 0.23.2) 0.23.3
Target Version/s: 0.23.3 (was: 0.23.2)
Status: Resolved (was: Patch Available)

I just committed this. Thanks Bikas!

Need a RM-AM channel to inform AMs about faulty/unhealthy/lost nodes

Key: MAPREDUCE-3353
URL: https://issues.apache.org/jira/browse/MAPREDUCE-3353
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: applicationmaster, mrv2, resourcemanager
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Bikas Saha
Fix For: 0.23.3
Attachments: MAPREDUCE-3353-branch-0.23.patch, MAPREDUCE-3353-branch-0.23.patch, MAPREDUCE-3353-branch-0.23.patch, MAPREDUCE-3353-branch-0.23.patch, MAPREDUCE-3353-branch-0.23.patch, MAPREDUCE-3353-branch-0.23.patch, MAPREDUCE-3353-branch-0.23.patch

When a node is lost or becomes faulty, the AM needs to know about that event so that it can take action, e.g. re-executing map tasks whose intermediate output lives on that faulty node.
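A minimal sketch of the AM-side decision the description mentions: given which node holds each completed map task's output and the set of nodes reported as lost or unhealthy, select the map tasks to re-execute. LostNodeHandler and mapsToRerun are hypothetical names; how the node report travels from the RM to the AM is outside this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical AM-side bookkeeping: when the RM reports nodes as lost or
 * unhealthy, pick the completed map tasks whose output lived there.
 */
final class LostNodeHandler {
    private LostNodeHandler() {}

    /**
     * @param mapOutputNode completed map task id mapped to the node holding its output
     * @param badNodes      nodes reported as lost, faulty or unhealthy
     * @return sorted ids of map tasks whose output must be regenerated
     */
    static List<String> mapsToRerun(Map<String, String> mapOutputNode, Set<String> badNodes) {
        List<String> rerun = new ArrayList<>();
        for (Map.Entry<String, String> e : mapOutputNode.entrySet()) {
            if (badNodes.contains(e.getValue())) {
                rerun.add(e.getKey());
            }
        }
        Collections.sort(rerun); // deterministic order for logging and tests
        return rerun;
    }
}
```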