[jira] [Created] (MAPREDUCE-4063) make TaggedInputSplit public class for development of MultipleInput of other DB Products extension

2012-03-25 Thread Muddy Dixon (Created) (JIRA)
make TaggedInputSplit public class for development of MultipleInput of other DB 
Products extension
--

 Key: MAPREDUCE-4063
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4063
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.23.1
Reporter: Muddy Dixon
Priority: Minor


In trunk, org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit is not a 
public class.

This prevents developing MultipleInput extensions for other DB products.

I wrote a workaround file:
https://github.com/muddydixon/mongo-hadoop/blob/develop/multipleinputs/core/src/main/java/org/apache/hadoop/mapreduce/lib/input/TaggedInputSplitGenerator.java

So unless there is a reason not to, TaggedInputSplit should be made public.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (MAPREDUCE-4063) make TaggedInputSplit public class for development of MultipleInput of other DB Products extension

2012-03-25 Thread Muddy Dixon (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Muddy Dixon updated MAPREDUCE-4063:
---

Attachment: MAPREDUCE-4063.txt

 make TaggedInputSplit public class for development of MultipleInput of other 
 DB Products extension
 --

 Key: MAPREDUCE-4063
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4063
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.23.1
Reporter: Muddy Dixon
Priority: Minor
  Labels: newbie
 Attachments: MAPREDUCE-4063.txt

   Original Estimate: 0.5h
  Remaining Estimate: 0.5h

 In trunk, org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit is not a 
 public class.
 This prevents developing MultipleInput extensions for other DB products.
 I wrote a workaround file:
 https://github.com/muddydixon/mongo-hadoop/blob/develop/multipleinputs/core/src/main/java/org/apache/hadoop/mapreduce/lib/input/TaggedInputSplitGenerator.java
 So unless there is a reason not to, TaggedInputSplit should be made public.





[jira] [Commented] (MAPREDUCE-3060) Generic shuffle service

2012-03-25 Thread Avner BenHanoch (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237899#comment-13237899
 ] 

Avner BenHanoch commented on MAPREDUCE-3060:


Please see MAPREDUCE-4049.

 Generic shuffle service
 ---

 Key: MAPREDUCE-3060
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3060
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Affects Versions: 0.23.0
Reporter: Luke Lu
  Labels: shuffle
 Fix For: 0.24.0


 When I was talking to Owen about MAPREDUCE-2600, we came across (again, 
 talked about it with Chris before) the shuffle dependency issue. NodeManager 
 currently has an implicit (hidden by the service plugin mechanism) dependency 
 on a specific version of mapreduce shuffle. While this works in many cases, 
 as long as we don't change shuffle headers and the usage of mapred security 
 tokens, it's a hack to make things work nonetheless. It's generally agreed 
 upon that nodemanager should only load generic services that are mapreduce 
 framework neutral.
 In this particular case, the right solution seems to be a generic shuffle 
 handler that can serve data for a particular partition securely. The 
 ShuffleHandler currently only depends on mapreduce for task tokens and 
 shuffle header, which is only used for writing data, i.e., the shuffle 
 handler has no semantic dependency on mapreduce.





[jira] [Created] (MAPREDUCE-4064) Job History Link in RM UI is redirecting to the URL which contains Job Id twice

2012-03-25 Thread Devaraj K (Created) (JIRA)
Job History Link in RM UI is redirecting to the URL which contains Job Id twice
---

 Key: MAPREDUCE-4064
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4064
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.23.1
Reporter: Devaraj K
Assignee: Devaraj K


{code:xml}
http://xx.x.x.x:19888/jobhistory/job/job_1332435449546_0001/jobhistory/job/job_1332435449546_0001
{code}





[jira] [Updated] (MAPREDUCE-4064) Job History Link in RM UI is redirecting to the URL which contains Job Id twice

2012-03-25 Thread Devaraj K (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj K updated MAPREDUCE-4064:
-

Component/s: mrv2

 Job History Link in RM UI is redirecting to the URL which contains Job Id 
 twice
 ---

 Key: MAPREDUCE-4064
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4064
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 0.23.1
Reporter: Devaraj K
Assignee: Devaraj K

 {code:xml}
 http://xx.x.x.x:19888/jobhistory/job/job_1332435449546_0001/jobhistory/job/job_1332435449546_0001
 {code}





[jira] [Created] (MAPREDUCE-4065) Add .proto files to built tarball

2012-03-25 Thread Ralph H Castain (Created) (JIRA)
Add .proto files to built tarball
-

 Key: MAPREDUCE-4065
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4065
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: build
Affects Versions: 0.23.2
Reporter: Ralph H Castain
 Fix For: 0.23.3


Please add the .proto files to the built tarball so that users can build 3rd 
party tools that use protocol buffers without having to do an svn checkout of 
the source code.

Sorry I don't know more about Maven, or I would provide a patch.






[jira] [Commented] (MAPREDUCE-3540) saveVersion.sh script fails in windows/cygwin (hadoop-yarn-common)

2012-03-25 Thread Bikas Saha (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237975#comment-13237975
 ] 

Bikas Saha commented on MAPREDUCE-3540:
---

Is the unix version of whoami not available on Cygwin? Looks like you are 
trying to convert the Windows whoami CR/LF to Unix.

 saveVersion.sh script fails in windows/cygwin (hadoop-yarn-common)
 --

 Key: MAPREDUCE-3540
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3540
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: build
Affects Versions: 0.24.0
Reporter: Alejandro Abdelnur
 Fix For: 0.24.0

 Attachments: MAPREDUCE-3540.patch


 {code}
 [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2:exec 
 (generate-version) on project hadoop-yarn-common: Command execution failed. 
 Cannot run program scripts\saveVersion.sh (in directory 
 C:\cygwin\home\tucu\src\hadoop\hadoop-mapreduce-project\hadoop-yarn\hadoop-yarn-common): 
 CreateProcess error=2, The system cannot find the file specified - [Help 1]
 [ERROR]
 {code}





[jira] [Commented] (MAPREDUCE-3540) saveVersion.sh script fails in windows/cygwin (hadoop-yarn-common)

2012-03-25 Thread Mostafa Elhemali (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237988#comment-13237988
 ] 

Mostafa Elhemali commented on MAPREDUCE-3540:
-

There is a version there but it outputs the CR/LF. Note that this is a 
copy-paste of the same workaround in 
hadoop-common-project/hadoop-common/dev-support/saveVersion.sh

 saveVersion.sh script fails in windows/cygwin (hadoop-yarn-common)
 --

 Key: MAPREDUCE-3540
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3540
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: build
Affects Versions: 0.24.0
Reporter: Alejandro Abdelnur
 Fix For: 0.24.0

 Attachments: MAPREDUCE-3540.patch


 {code}
 [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2:exec 
 (generate-version) on project hadoop-yarn-common: Command execution failed. 
 Cannot run program scripts\saveVersion.sh (in directory 
 C:\cygwin\home\tucu\src\hadoop\hadoop-mapreduce-project\hadoop-yarn\hadoop-yarn-common): 
 CreateProcess error=2, The system cannot find the file specified - [Help 1]
 [ERROR]
 {code}





[jira] [Commented] (MAPREDUCE-4039) Sort Avoidance

2012-03-25 Thread Schubert Zhang (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238028#comment-13238028
 ] 

Schubert Zhang commented on MAPREDUCE-4039:
---

A patch from Anty is available. Could someone review it?

 Sort Avoidance
 --

 Key: MAPREDUCE-4039
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4039
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: mrv2
Affects Versions: 0.23.2
Reporter: anty.rao
Priority: Minor
 Fix For: 0.23.2

 Attachments: MAPREDUCE-4039-branch-0.23.2.patch


 Inspired by 
 [Tenzing|http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//pubs/archive/37200.pdf],
  in section 5.1, MapReduce Enhancements:
 {quote}*Sort Avoidance*. Certain operators such as hash join
 and hash aggregation require shuffling, but not sorting. The
 MapReduce API was enhanced to automatically turn off
 sorting for these operations. When sorting is turned off, the
 mapper feeds data to the reducer which directly passes the
 data to the Reduce() function bypassing the intermediate
 sorting step. This makes many SQL operators significantly
 more efficient.{quote}
 Many applications need only aggregation, not sorting. Using sorting to 
 achieve aggregation is costly and inefficient. Without sorting, an 
 application can use a hash table or hash map to aggregate efficiently. 
 However, the application must keep in mind that reduce memory is limited: it 
 is responsible for managing the reducer's memory itself and for guarding 
 against running out of memory. A map-side combiner is not supported; as a 
 workaround, you can do hash aggregation on the map side.
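 The hash-based aggregation described above can be illustrated with a minimal, 
 standalone sketch. It does not use the Hadoop API or the attached patch; the 
 class name, the method names, and the cap on distinct keys are hypothetical 
 and only show how unsorted input can be aggregated while guarding memory.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: aggregates unsorted (key, value) pairs with a hash map. */
public class HashAggregationSketch {

    /** Hypothetical cap on distinct keys held in memory; not a Hadoop setting. */
    private final int maxKeysInMemory;
    private final Map<String, Long> sums = new HashMap<>();

    public HashAggregationSketch(int maxKeysInMemory) {
        this.maxKeysInMemory = maxKeysInMemory;
    }

    /** Adds one record; input order does not matter, so no sort is needed. */
    public void accept(String key, long value) {
        if (!sums.containsKey(key) && sums.size() >= maxKeysInMemory) {
            // A real reducer would flush or spill partial results here
            // instead of failing; this sketch only marks the decision point.
            throw new IllegalStateException("too many distinct keys in memory");
        }
        sums.merge(key, value, Long::sum);
    }

    /** Returns the aggregated sum per key. */
    public Map<String, Long> result() {
        return sums;
    }

    public static void main(String[] args) {
        HashAggregationSketch agg = new HashAggregationSketch(1000);
        // Keys arrive in arbitrary order, as they would without a sort phase.
        String[] keys = {"b", "a", "b", "c", "a", "b"};
        for (String k : keys) {
            agg.accept(k, 1L);
        }
        System.out.println(agg.result().get("b")); // prints 3
    }
}
```

 The cap stands in for whatever memory accounting a real application would 
 use; the point is only that the application, not the framework, has to 
 decide what happens when the table grows too large.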
 The main points of the sort avoidance implementation are as follows:
 # Add a boolean configuration parameter, ??mapreduce.sort.avoidance??, to 
 turn the sort avoidance workflow on or off. The two workflows coexist.
 # Key/value pairs emitted by the map function are sorted by partition only, 
 using a more efficient sorting algorithm: counting sort.
 # The map-side merge uses a byte-level merge that simply concatenates bytes 
 from the generated spills (read bytes in, write bytes out), without the 
 overhead of key/value serialization/deserialization and comparison that the 
 current version incurs.
 # Reduce can start as soon as any map output is available, in contrast to 
 the sort workflow, which must wait until all map outputs are fetched and 
 merged.
 # Map output in memory can be consumed directly by reduce. When reduce can't 
 keep up with the incoming map outputs, the in-memory merge thread kicks in 
 and merges in-memory map outputs onto disk.
 # On-disk files are read sequentially to feed reduce, in contrast to the 
 current implementation, which reads multiple files concurrently and causes 
 many disk seeks. Map output in memory takes precedence over on-disk files in 
 feeding the reduce function.
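 The second point in the list above (ordering map output by partition only) 
 can be sketched with a counting sort. This is a standalone illustration, not 
 code from the patch: the class and method names are hypothetical, and records 
 are represented only by their partition numbers.

```java
import java.util.Arrays;

/** Illustrative only: orders records by partition number with a counting sort. */
public class PartitionCountingSort {

    /**
     * Returns record indices grouped by partition, in ascending partition order.
     * Records within a partition keep their original order (the sort is stable)
     * and keys are never compared.
     */
    public static int[] sortByPartition(int[] partitions, int numPartitions) {
        // Pass 1: count the records in each partition.
        int[] start = new int[numPartitions + 1];
        for (int p : partitions) {
            start[p + 1]++;
        }
        // Prefix sums turn the counts into each partition's first output slot.
        for (int p = 0; p < numPartitions; p++) {
            start[p + 1] += start[p];
        }
        // Pass 2: place each record index into its partition's next free slot.
        int[] order = new int[partitions.length];
        for (int i = 0; i < partitions.length; i++) {
            order[start[partitions[i]]++] = i;
        }
        return order;
    }

    public static void main(String[] args) {
        // Partition number assigned to each of six emitted records.
        int[] partitions = {2, 0, 1, 0, 2, 1};
        // prints [1, 3, 2, 5, 0, 4]
        System.out.println(Arrays.toString(sortByPartition(partitions, 3)));
    }
}
```

 The sort runs in time linear in the number of records plus the number of 
 partitions, which is why it can be cheaper than a comparison sort on keys.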
 I have already implemented this feature on top of Hadoop CDH3U3 and done 
 some performance evaluation; see [https://github.com/hanborq/hadoop] for 
 details. Now I would like to port it to YARN. Comments are welcome.





[jira] [Updated] (MAPREDUCE-3353) Need a RM-AM channel to inform AMs about faulty/unhealthy/lost nodes

2012-03-25 Thread Arun C Murthy (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated MAPREDUCE-3353:
-

  Resolution: Fixed
   Fix Version/s: 0.23.3  (was: 0.23.2)
Target Version/s: 0.23.3  (was: 0.23.2)
  Status: Resolved  (was: Patch Available)

I just committed this. Thanks Bikas!

 Need a RM-AM channel to inform AMs about faulty/unhealthy/lost nodes
 -

 Key: MAPREDUCE-3353
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3353
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster, mrv2, resourcemanager
Affects Versions: 0.23.0
Reporter: Vinod Kumar Vavilapalli
Assignee: Bikas Saha
 Fix For: 0.23.3

 Attachments: MAPREDUCE-3353-branch-0.23.patch, 
 MAPREDUCE-3353-branch-0.23.patch, MAPREDUCE-3353-branch-0.23.patch, 
 MAPREDUCE-3353-branch-0.23.patch, MAPREDUCE-3353-branch-0.23.patch, 
 MAPREDUCE-3353-branch-0.23.patch, MAPREDUCE-3353-branch-0.23.patch


 When a node is lost or turns faulty, the AM needs to know about that event 
 so that it can take action, e.g. re-executing map tasks whose intermediate 
 output lives on that faulty node.
