[jira] Commented: (MAPREDUCE-1764) FairScheduler locality delay may put heavy pressure on Jobtracker

2010-05-13 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867424#action_12867424
 ] 

Joydeep Sen Sarma commented on MAPREDUCE-1764:
--

It seems better to find out why the index is not helping (assuming it's 
actually being used) rather than adding another cache on top.

> FairScheduler locality delay may put heavy pressure on Jobtracker
> -
>
> Key: MAPREDUCE-1764
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1764
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.22.0
>Reporter: Scott Chen
>Assignee: Dmytro Molkov
> Fix For: 0.22.0
>
>
> The FairScheduler locality delay feature delays the scheduling of jobs until it 
> gets good locality.
> This greatly improves the locality of the tasks and reduces the cost of network traffic.
> We have observed the following problem with the FairScheduler locality delay:
> Some of our machines hold only older data, and some newly added machines do not 
> have important data.
> When these machines send a heartbeat, the JT scans tasks to find jobs that have the right 
> locality.
> Often, these machines scan all of the tasks of all the jobs and do 
> not get any tasks.
> Scanning all the tasks on the JT is very costly. This makes the JT very slow.
> These machines also often do not get scheduled, which hurts cluster 
> utilization.
> Any ideas?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1743) conf.get("map.input.file") returns null when using MultipleInputs in Hadoop 0.20

2010-05-13 Thread luo Yi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867397#action_12867397
 ] 

luo Yi commented on MAPREDUCE-1743:
---

The following code may get the real file name from the TaggedInputSplit. 
Because TaggedInputSplit is a Hadoop-internal class, you should put your class in 
the org.apache.hadoop.mapred.lib package:

{code:title=TaggedInputSplitGetName.java|borderStyle=solid}
// Inside map(): emit the path of the file backing the current split.
InputSplit is = reporter.getInputSplit();
String name = is.getClass().getName();
if (name.equals("org.apache.hadoop.mapred.FileSplit")) {
  // Plain FileSplit: the path is available directly.
  FileSplit fs = (FileSplit) is;
  String path = fs.getPath().toString();
  word.set(path);
  output.collect(word, one);
}
if (name.equals("org.apache.hadoop.mapred.lib.TaggedInputSplit")) {
  // With MultipleInputs the real split is wrapped in a TaggedInputSplit; unwrap it.
  TaggedInputSplit tis = (TaggedInputSplit) is;
  InputSplit iis = tis.getInputSplit();
  String iname = iis.getClass().getName();
  word.set(iname);
  output.collect(word, one);
  if (iname.equals("org.apache.hadoop.mapred.FileSplit")) {
    FileSplit fs = (FileSplit) iis;
    // the path from the TaggedInputSplit should be prefixed by "convert: "
    String path = "convert: " + fs.getPath().toString();
    word.set(path);
    output.collect(word, one);
  }
}
{code}

The output file gives me:

{noformat}
$ grep 'convert' testout/part-0 |head -n 5
convert: hdfs://myowndir/pt=2010051300/attempt_201003291206_327196_r_00_01
convert: hdfs://myowndir/pt=2010051300/attempt_201003291206_327196_r_01_01
convert: hdfs://myowndir/pt=2010051300/attempt_201003291206_327196_r_02_01
convert: hdfs://myowndir/pt=2010051300/attempt_201003291206_327196_r_03_01
convert: hdfs://myowndir/pt=2010051300/attempt_201003291206_327196_r_04_01
{noformat}

You may give it a try.

> conf.get("map.input.file") returns null when using MultipleInputs in Hadoop 
> 0.20
> 
>
> Key: MAPREDUCE-1743
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1743
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.20.2
>Reporter: Yuanyuan Tian
>
> There is a problem in getting the input file name in the mapper when using 
> MultipleInputs in Hadoop 0.20. I need to use MultipleInputs to support 
> different formats for the inputs to my MapReduce job. And inside each 
> mapper, I also need to know the exact input file that the mapper is 
> processing. However, conf.get("map.input.file") returns null. Can anybody 
> help me solve this problem? Thanks in advance.
> public class Test extends Configured implements Tool {
>   static class InnerMapper extends MapReduceBase implements Mapper
>   {
>     public void configure(JobConf conf)
>     {
>       String inputName = conf.get("map.input.file");
>       ...
>     }
>   }
>
>   public int run(String[] arg0) throws Exception {
>     JobConf job;
>     job = new JobConf(Test.class);
>     ...
>     MultipleInputs.addInputPath(job, new Path("A"), TextInputFormat.class);
>     MultipleInputs.addInputPath(job, new Path("B"), SequenceFileInputFormat.class);
>     ...
>   }
> }




[jira] Updated: (MAPREDUCE-1354) Incremental enhancements to the JobTracker for better scalability

2010-05-13 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-1354:
---

Status: Patch Available  (was: Open)

> Incremental enhancements to the JobTracker for better scalability
> -
>
> Key: MAPREDUCE-1354
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1354
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Reporter: Devaraj Das
>Assignee: Dick King
>Priority: Critical
> Attachments: mapreduce-1354--2010-03-10.patch, 
> mapreduce-1354--2010-05-13.patch, MAPREDUCE-1354_yhadoop20.patch, 
> MAPREDUCE-1354_yhadoop20.patch, MAPREDUCE-1354_yhadoop20.patch, 
> MAPREDUCE-1354_yhadoop20.patch, MAPREDUCE-1354_yhadoop20.patch, 
> MAPREDUCE-1354_yhadoop20.patch, MAPREDUCE-1354_yhadoop20.patch, 
> mr-1354-y20.patch
>
>
> It'd be nice to have the JobTracker object not be locked while accessing the 
> HDFS for reading the jobconf file and while writing the jobinfo file in the 
> submitJob method. We should see if we can avoid taking the lock altogether.




[jira] Updated: (MAPREDUCE-1354) Incremental enhancements to the JobTracker for better scalability

2010-05-13 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-1354:
---

Status: Open  (was: Patch Available)

Patch looks fine.
Canceling patch to submit for hudson, as trunk compiles now.

> Incremental enhancements to the JobTracker for better scalability
> -
>
> Key: MAPREDUCE-1354
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1354
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Reporter: Devaraj Das
>Assignee: Dick King
>Priority: Critical
> Attachments: mapreduce-1354--2010-03-10.patch, 
> mapreduce-1354--2010-05-13.patch, MAPREDUCE-1354_yhadoop20.patch, 
> MAPREDUCE-1354_yhadoop20.patch, MAPREDUCE-1354_yhadoop20.patch, 
> MAPREDUCE-1354_yhadoop20.patch, MAPREDUCE-1354_yhadoop20.patch, 
> MAPREDUCE-1354_yhadoop20.patch, MAPREDUCE-1354_yhadoop20.patch, 
> mr-1354-y20.patch
>
>
> It'd be nice to have the JobTracker object not be locked while accessing the 
> HDFS for reading the jobconf file and while writing the jobinfo file in the 
> submitJob method. We should see if we can avoid taking the lock altogether.




[jira] Commented: (MAPREDUCE-913) TaskRunner crashes with NPE resulting in held up slots, UNINITIALIZED tasks and hung TaskTracker

2010-05-13 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867390#action_12867390
 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-913:
---

The test failure for TestNodeRefresh is not related to the patch. The test failed 
because the JVM exited abnormally. The same test passed on my machine.

> TaskRunner crashes with NPE resulting in held up slots, UNINITIALIZED tasks 
> and hung TaskTracker
> 
>
> Key: MAPREDUCE-913
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-913
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker
>Affects Versions: 0.20.1
>Reporter: Vinod K V
>Assignee: Amareshwari Sriramadasu
>Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: mapreduce-913-1.patch, MAPREDUCE-913-20091119.1.txt, 
> MAPREDUCE-913-20091119.2.txt, MAPREDUCE-913-20091120.1.txt, patch-913-1.txt, 
> patch-913-2.txt, patch-913.txt
>
>





[jira] Assigned: (MAPREDUCE-1018) Document changes to the memory management and scheduling model

2010-05-13 Thread Hemanth Yamijala (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hemanth Yamijala reassigned MAPREDUCE-1018:
---

Assignee: Hemanth Yamijala  (was: rahul k singh)

Assigning to myself to take forward. I've started work on it, but please bear 
with slow progress.

> Document changes to the memory management and scheduling model
> --
>
> Key: MAPREDUCE-1018
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1018
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.21.0
>Reporter: Hemanth Yamijala
>Assignee: Hemanth Yamijala
>Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: MAPRED-1018-1.patch, MAPRED-1018-2.patch, 
> MAPRED-1018-3.patch, MAPRED-1018-4.patch.txt, MAPRED-1018-5.patch.txt, 
> MAPRED-1018-6.patch.txt, MAPRED-1018-commons.patch
>
>
> Changes were made to the configuration, monitoring and scheduling of 
> high-RAM jobs. These must be documented in mapred-defaults.xml and also in the 
> Forrest documentation.




[jira] Updated: (MAPREDUCE-1789) MapReduce trunk fails to compile following HADOOP-6600

2010-05-13 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-1789:
-

Status: Resolved  (was: Patch Available)
Resolution: Invalid

This was fixed by MAPREDUCE-1539

> MapReduce trunk fails to compile following HADOOP-6600
> --
>
> Key: MAPREDUCE-1789
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1789
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: build
>Reporter: Tom White
>Assignee: Tom White
>Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: MAPREDUCE-1789.patch
>
>
> A few classes need updating following the change to KerberosInfo introduced 
> in HADOOP-6600




[jira] Resolved: (MAPREDUCE-1539) authorization checks for inter-server protocol (based on HADOOP-6600)

2010-05-13 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko resolved MAPREDUCE-1539.


Hadoop Flags: [Reviewed]
  Resolution: Fixed

I just committed this. Thank you Boris.

> authorization checks for inter-server protocol (based on HADOOP-6600)
> -
>
> Key: MAPREDUCE-1539
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1539
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Boris Shkolnik
>Assignee: Boris Shkolnik
> Attachments: MAPREDUCE-1539-1.patch, MAPREDUCE-1539-2.patch, 
> MAPREDUCE-1539-3.patch, MAPREDUCE-1539-5.patch
>
>
> authorization checks for inter-server protocol (based on HADOOP-6600)




[jira] Updated: (MAPREDUCE-1539) authorization checks for inter-server protocol (based on HADOOP-6600)

2010-05-13 Thread Boris Shkolnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boris Shkolnik updated MAPREDUCE-1539:
--

Attachment: MAPREDUCE-1539-5.patch

> authorization checks for inter-server protocol (based on HADOOP-6600)
> -
>
> Key: MAPREDUCE-1539
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1539
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Boris Shkolnik
>Assignee: Boris Shkolnik
> Attachments: MAPREDUCE-1539-1.patch, MAPREDUCE-1539-2.patch, 
> MAPREDUCE-1539-3.patch, MAPREDUCE-1539-5.patch
>
>
> authorization checks for inter-server protocol (based on HADOOP-6600)




[jira] Commented: (MAPREDUCE-1442) StackOverflowError when JobHistory parses a really long line

2010-05-13 Thread Dick King (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867324#action_12867324
 ] 

Dick King commented on MAPREDUCE-1442:
--

I reviewed Luke's change, and it looks correct to me.

I agree with Luke that {{trunk}} does not have this problem and does not need 
this patch or any revision of this patch.

-dk


> StackOverflowError when JobHistory parses a really long line
> 
>
> Key: MAPREDUCE-1442
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1442
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.20.1
>Reporter: bc Wong
>Assignee: Luke Lu
> Attachments: mr-1442-y20s-v1.patch, overflow.history
>
>
> JobHistory.parseLine() fails with StackOverflowError on a really big COUNTER 
> value, triggered via the web interface. See attached file.




[jira] Updated: (MAPREDUCE-1785) Add streaming config option for not emitting the key

2010-05-13 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated MAPREDUCE-1785:
---

Attachment: mapreduce-1785-1.patch

Patch attached.  
* Adds stream.map.input.ignoreKey for toggling key emission. The default 
behavior is unchanged.
* Updated streaming.xml docs and added test coverage in TestStreamingKeyValue
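
As a rough illustration of what such a toggle does, the sketch below either prepends the key and a tab or emits the value alone. It is not the code in the attached patch: the class and method names are made up for this example, and only the idea of a boolean switch comes from the description above.

```java
// Hypothetical sketch of a key-emission toggle; not the actual PipeMapper code.
final class KeyEmissionSketch {
    // Formats one input record as the line handed to the streaming process.
    // When ignoreKey is true only the value is written; otherwise key, TAB, value.
    static String formatRecord(String key, String value, boolean ignoreKey) {
        if (ignoreKey) {
            return value;
        }
        return key + "\t" + value;
    }

    public static void main(String[] args) {
        // Same record, with and without the key in front of the value.
        System.out.println(formatRecord("0", "hello world", true));
        System.out.println(formatRecord("0", "hello world", false));
    }
}
```

Keeping the decision in one place like this is what lets an existing streaming program see the same input lines regardless of which input format produced the record.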

> Add streaming config option for not emitting the key
> 
>
> Key: MAPREDUCE-1785
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1785
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: contrib/streaming
>Affects Versions: 0.22.0
>Reporter: Eli Collins
>Assignee: Eli Collins
>Priority: Minor
> Fix For: 0.22.0
>
> Attachments: mapreduce-1785-1.patch
>
>
> PipeMapper currently does not emit the key when using TextInputFormat. If you 
> switch to input formats (eg LzoTextInputFormat) the key will be emitted. We 
> should add an option so users can explicitly make streaming not emit the key 
> so they can change input formats without breaking or having to modify their 
> existing programs.




[jira] Updated: (MAPREDUCE-1789) MapReduce trunk fails to compile following HADOOP-6600

2010-05-13 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated MAPREDUCE-1789:
-

Status: Patch Available  (was: Open)

> MapReduce trunk fails to compile following HADOOP-6600
> --
>
> Key: MAPREDUCE-1789
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1789
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: build
>Reporter: Tom White
>Assignee: Tom White
>Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: MAPREDUCE-1789.patch
>
>
> A few classes need updating following the change to KerberosInfo introduced 
> in HADOOP-6600




[jira] Updated: (MAPREDUCE-1789) MapReduce trunk fails to compile following HADOOP-6600

2010-05-13 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated MAPREDUCE-1789:
-

Attachment: MAPREDUCE-1789.patch

Patch fixing compilation errors.

> MapReduce trunk fails to compile following HADOOP-6600
> --
>
> Key: MAPREDUCE-1789
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1789
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: build
>Reporter: Tom White
>Assignee: Tom White
>Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: MAPREDUCE-1789.patch
>
>
> A few classes need updating following the change to KerberosInfo introduced 
> in HADOOP-6600




[jira] Created: (MAPREDUCE-1789) MapReduce trunk fails to compile following HADOOP-6600

2010-05-13 Thread Tom White (JIRA)
MapReduce trunk fails to compile following HADOOP-6600
--

 Key: MAPREDUCE-1789
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1789
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: build
Reporter: Tom White
Assignee: Tom White
Priority: Blocker
 Fix For: 0.21.0


A few classes need updating following the change to KerberosInfo introduced in 
HADOOP-6600




[jira] Commented: (MAPREDUCE-1764) FairScheduler locality delay may put heavy pressure on Jobtracker

2010-05-13 Thread Scott Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867300#action_12867300
 ] 

Scott Chen commented on MAPREDUCE-1764:
---

Joydeep:

Matei and I had some discussion and we have also looked at the code.
In JobInProgress, such a HashMap of node->[tasks] and rack->[tasks] 
already exists.
It is not clear to me why this is so slow.

I agree with your point that we should not leave the slots idle, especially 
when the cluster is full of jobs.
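
For reference, the kind of index being discussed can be sketched as below. This is only an illustration of the idea, assuming plain string host names and task IDs. The class and method names are hypothetical and this is not the actual JobInProgress code. With such a map, a heartbeat from a host that holds no input data becomes a single lookup that returns an empty list rather than a scan over every task.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a node -> [tasks] locality index; not the real JobInProgress.
final class LocalityIndexSketch {
    private final Map<String, List<String>> tasksByNode = new HashMap<>();

    // Records that taskId has an input replica on each of the given hosts.
    void addTask(String taskId, List<String> hosts) {
        for (String host : hosts) {
            tasksByNode.computeIfAbsent(host, h -> new ArrayList<>()).add(taskId);
        }
    }

    // Returns the tasks with local data on host, or an empty list if there are none.
    List<String> localTasks(String host) {
        return tasksByNode.getOrDefault(host, Collections.<String>emptyList());
    }

    public static void main(String[] args) {
        LocalityIndexSketch index = new LocalityIndexSketch();
        index.addTask("task_0", Arrays.asList("hostA", "hostB"));
        index.addTask("task_1", Arrays.asList("hostB"));
        System.out.println(index.localTasks("hostB"));   // tasks with data on hostB
        System.out.println(index.localTasks("newHost")); // empty: nothing is local here
    }
}
```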

> FairScheduler locality delay may put heavy pressure on Jobtracker
> -
>
> Key: MAPREDUCE-1764
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1764
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.22.0
>Reporter: Scott Chen
>Assignee: Dmytro Molkov
> Fix For: 0.22.0
>
>
> The FairScheduler locality delay feature delays the scheduling of jobs until it 
> gets good locality.
> This greatly improves the locality of the tasks and reduces the cost of network traffic.
> We have observed the following problem with the FairScheduler locality delay:
> Some of our machines hold only older data, and some newly added machines do not 
> have important data.
> When these machines send a heartbeat, the JT scans tasks to find jobs that have the right 
> locality.
> Often, these machines scan all of the tasks of all the jobs and do 
> not get any tasks.
> Scanning all the tasks on the JT is very costly. This makes the JT very slow.
> These machines also often do not get scheduled, which hurts cluster 
> utilization.
> Any ideas?




[jira] Commented: (MAPREDUCE-1761) FairScheduler should allow separate configuration of node and rack locality wait time

2010-05-13 Thread Matei Zaharia (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867299#action_12867299
 ] 

Matei Zaharia commented on MAPREDUCE-1761:
--

Looks good, thanks!

> FairScheduler should allow separate configuration of node and rack locality 
> wait time
> -
>
> Key: MAPREDUCE-1761
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1761
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 0.22.0
>Reporter: Scott Chen
>Assignee: Scott Chen
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1761-v1.1.txt, MAPREDUCE-1761-v1.2.txt, 
> MAPREDUCE-1761.txt
>
>
> It would be nice if we could separately assign the rack locality wait time.
> In our use case, we would set the node locality wait to zero and wait only for rack 
> locality.
> I propose that we add two parameters
> mapred.fairscheduler.locality.delay.nodetorack
> mapred.fairscheduler.locality.delay.racktoany
> This allows specifying the wait time on each stage.
> And we can use
> mapred.fairscheduler.locality.delay
> as the default value of the above fields so that this is backward compatible.
> Thoughts?
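
The fallback described in the proposal can be sketched as follows. This is a minimal illustration, not the attached patch: it uses java.util.Properties as a stand-in for the Hadoop configuration object, and the class name, method name and the final hard-coded fallback value are assumptions made for this example. Only the three property names come from the proposal.

```java
import java.util.Properties;

// Sketch of the proposed fallback; java.util.Properties stands in for the Hadoop configuration.
final class LocalityDelaySketch {
    static final String DEFAULT_KEY = "mapred.fairscheduler.locality.delay";
    static final String NODE_TO_RACK_KEY = "mapred.fairscheduler.locality.delay.nodetorack";
    static final String RACK_TO_ANY_KEY = "mapred.fairscheduler.locality.delay.racktoany";

    // Returns the stage-specific delay if set, else the shared delay, else fallback.
    static long delay(Properties conf, String stageKey, long fallback) {
        String shared = conf.getProperty(DEFAULT_KEY, Long.toString(fallback));
        return Long.parseLong(conf.getProperty(stageKey, shared));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(DEFAULT_KEY, "5000");
        conf.setProperty(NODE_TO_RACK_KEY, "0");
        System.out.println(delay(conf, NODE_TO_RACK_KEY, 0L)); // stage-specific value wins
        System.out.println(delay(conf, RACK_TO_ANY_KEY, 0L));  // falls back to the shared value
    }
}
```

Because an unset stage key resolves to the existing property, a cluster that only sets mapred.fairscheduler.locality.delay would behave as before under this scheme.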




[jira] Updated: (MAPREDUCE-1354) Incremental enhancements to the JobTracker for better scalability

2010-05-13 Thread Dick King (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1354:
-

Attachment: mapreduce-1354--2010-05-13.patch

I honored the last two comments by Amareshwari [and ignored the one he invited 
me to ignore] and this is the patch, but as I write this, {{trunk}} does not 
compile, so I'm not resubmitting this patch just yet.

Rather than taking the Big Lock, I chose to turn {{nextJobId}} into an 
{{AtomicInteger}} .
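
A minimal sketch of that approach is shown below. It is not the code in the attached patch: the class name, the method name and the starting value of 1 are assumptions for illustration. The point is that getAndIncrement hands out distinct ids without a synchronized block.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: allocating job ids without holding a coarse lock.
final class JobIdAllocatorSketch {
    // Starting value of 1 is an assumption for this example.
    private final AtomicInteger nextJobId = new AtomicInteger(1);

    // Atomically returns the current id and advances the counter.
    int allocate() {
        return nextJobId.getAndIncrement();
    }

    public static void main(String[] args) throws InterruptedException {
        final JobIdAllocatorSketch ids = new JobIdAllocatorSketch();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    ids.allocate();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        // 4 threads x 1000 allocations starting at 1, so the next id is 4001.
        System.out.println(ids.allocate());
    }
}
```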

I agree that the {{ugi == null}} test is dead.

When {{trunk}} builds again I'll test this patch and submit it.

-dk


> Incremental enhancements to the JobTracker for better scalability
> -
>
> Key: MAPREDUCE-1354
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1354
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Reporter: Devaraj Das
>Assignee: Dick King
>Priority: Critical
> Attachments: mapreduce-1354--2010-03-10.patch, 
> mapreduce-1354--2010-05-13.patch, MAPREDUCE-1354_yhadoop20.patch, 
> MAPREDUCE-1354_yhadoop20.patch, MAPREDUCE-1354_yhadoop20.patch, 
> MAPREDUCE-1354_yhadoop20.patch, MAPREDUCE-1354_yhadoop20.patch, 
> MAPREDUCE-1354_yhadoop20.patch, MAPREDUCE-1354_yhadoop20.patch, 
> mr-1354-y20.patch
>
>
> It'd be nice to have the JobTracker object not be locked while accessing the 
> HDFS for reading the jobconf file and while writing the jobinfo file in the 
> submitJob method. We should see if we can avoid taking the lock altogether.




[jira] Commented: (MAPREDUCE-1539) authorization checks for inter-server protocol (based on HADOOP-6600)

2010-05-13 Thread Boris Shkolnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867258#action_12867258
 ] 

Boris Shkolnik commented on MAPREDUCE-1539:
---

Ran tests manually; all passed.

> authorization checks for inter-server protocol (based on HADOOP-6600)
> -
>
> Key: MAPREDUCE-1539
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1539
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Boris Shkolnik
>Assignee: Boris Shkolnik
> Attachments: MAPREDUCE-1539-1.patch, MAPREDUCE-1539-2.patch, 
> MAPREDUCE-1539-3.patch
>
>
> authorization checks for inter-server protocol (based on HADOOP-6600)




[jira] Commented: (MAPREDUCE-1788) o.a.h.mapreduce.Job shouldn't make a copy of the JobConf

2010-05-13 Thread Aaron Kimball (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867256#action_12867256
 ] 

Aaron Kimball commented on MAPREDUCE-1788:
--

Related: MAPREDUCE-1486

> o.a.h.mapreduce.Job shouldn't make a copy of the JobConf
> 
>
> Key: MAPREDUCE-1788
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1788
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.21.0
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
>Priority: Blocker
>
> Having o.a.h.mapreduce.Job make a copy of the passed-in JobConf has several 
> issues: any modifications made by various pieces such as InputSplit etc. are 
> not reflected back, which causes issues for frameworks built on top.




[jira] Created: (MAPREDUCE-1788) o.a.h.mapreduce.Job shouldn't make a copy of the JobConf

2010-05-13 Thread Arun C Murthy (JIRA)
o.a.h.mapreduce.Job shouldn't make a copy of the JobConf


 Key: MAPREDUCE-1788
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1788
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 0.21.0
Reporter: Arun C Murthy
Assignee: Arun C Murthy
Priority: Blocker


Having o.a.h.mapreduce.Job make a copy of the passed-in JobConf has several 
issues: any modifications made by various pieces such as InputSplit etc. are 
not reflected back, which causes issues for frameworks built on top.




[jira] Commented: (MAPREDUCE-1781) option "-D mapred.tasktracker.map.tasks.maximum=1" does not work when no of mappers is bigger than no of nodes - always spawns 2 mapers/node

2010-05-13 Thread Hemanth Yamijala (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867205#action_12867205
 ] 

Hemanth Yamijala commented on MAPREDUCE-1781:
-

bq. - is it possible to specify that I want 4 mappers/processors or am I 
limited to a static value at the startup of Hadoop?

In general, the tasktracker configuration can be different on each node. 
However, that makes managing configurations much harder. Does that work for you 
for now, though?

bq. which parameters are set at startup and which at job runtime.

OK. Possibly you should file a JIRA asking for this to be explained. But the 
general rule of thumb is that configurations whose names contain the names of 
daemons like 'tasktracker' will be start-up only parameters. Configurations 
whose names contain 'job' or 'task' can be overridden per job.

> option "-D mapred.tasktracker.map.tasks.maximum=1" does not work when no of 
> mappers is bigger than no of nodes - always spawns 2 mapers/node
> 
>
> Key: MAPREDUCE-1781
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1781
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/streaming
>Affects Versions: 0.20.2
> Environment: Debian Lenny x64, and Hadoop 0.20.2, 2GB RAM
>Reporter: Tudor Vlad
>
> Hello
> I am a new user of Hadoop and I have some trouble using Hadoop Streaming and 
> the "-D mapred.tasktracker.map.tasks.maximum" option. 
> I'm experimenting with an unmanaged application (C++) which I want to run 
> over several nodes in 2 scenarios:
> 1) the number of maps (input splits) is equal to the number of nodes
> 2) the number of maps is a multiple of the number of nodes (5, 10, 20, ...)
> Initially, when running the tests in scenario 1, I would sometimes get 2 
> processes/node on half the nodes. However, I fixed this by adding the option "-D 
> mapred.tasktracker.map.tasks.maximum=1", so everything works fine.
> In the case of scenario 2 (more maps than nodes) this directive no longer 
> works, and I always get 2 processes/node. I even tested with 
> maximum=5 and I still get 2 processes/node.
> The entire command I use is:
> /usr/bin/time --format="-duration:\t%e |\t-MFaults:\t%F |\t-ContxtSwitch:\t%w" \
>  /opt/hadoop/bin/hadoop jar /opt/hadoop/contrib/streaming/hadoop-0.20.2-streaming.jar \
>  -D mapred.tasktracker.map.tasks.maximum=1 \
>  -D mapred.map.tasks=30 \
>  -D mapred.reduce.tasks=0 \
>  -D io.file.buffer.size=5242880 \
>  -libjars "/opt/hadoop/contrib/streaming/hadoop-7debug.jar" \
>  -input input/test \
>  -output out1 \
>  -mapper "/opt/jobdata/script_1k" \
>  -inputformat "me.MyInputFormat"
> Why is this happening and how can I make it work properly (i.e. be able to 
> limit exactly how many mappers I can have at one time per node)?
> Thank you in advance




[jira] Commented: (MAPREDUCE-1442) StackOverflowError when JobHistory parses a really long line

2010-05-13 Thread Luke Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867204#action_12867204
 ] 

Luke Lu commented on MAPREDUCE-1442:


Amarsri, my test case for the old parser exploits details of the old file 
format and weaknesses of Java's backtracking NFA regex implementation. The new 
implementation in trunk uses the standard JSON format and a mature JSON parser 
(Jackson) with about 700 tests. It would be counterproductive for me to add 
tests there, since they would not have any material impact on the test coverage 
of the new parser.
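
To illustrate the weakness mentioned above: a backtracking regex matcher can use stack depth that grows with the length of the matched text, so a very long value can exhaust the stack, whereas a hand-written loop uses constant stack. The sketch below is not the attached patch and not the real JobHistory code. It assumes a KEY="value" layout with backslash escapes, and the class and method names are hypothetical.

```java
// Hypothetical sketch: finding the end of a quoted value with a loop instead of a regex.
final class QuotedValueScannerSketch {
    // Returns the index of the closing quote of a value whose opening quote is at 'start',
    // treating backslash as an escape character; returns -1 if the value is unterminated.
    static int findClosingQuote(String line, int start) {
        for (int i = start + 1; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character
            } else if (c == '"') {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Build a line with a one-million-character value containing an escaped quote.
        StringBuilder sb = new StringBuilder("COUNTERS=\"");
        for (int i = 0; i < 1_000_000; i++) {
            sb.append('x');
        }
        sb.append("\\\"tail\"");
        String line = sb.toString();
        int open = line.indexOf('"');
        // The closing quote is the last character of the line.
        System.out.println(findClosingQuote(line, open) == line.length() - 1);
    }
}
```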

> StackOverflowError when JobHistory parses a really long line
> 
>
> Key: MAPREDUCE-1442
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1442
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.20.1
>Reporter: bc Wong
>Assignee: Luke Lu
> Attachments: mr-1442-y20s-v1.patch, overflow.history
>
>
> JobHistory.parseLine() fails with StackOverflowError on a really big COUNTER 
> value, triggered via the web interface. See attached file.




[jira] Commented: (MAPREDUCE-1787) Remove verbose logging from the Groups class

2010-05-13 Thread Boris Shkolnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867202#action_12867202
 ] 

Boris Shkolnik commented on MAPREDUCE-1787:
---

moved by mistake. moving back to COMMON.

> Remove verbose logging from the Groups class
> 
>
> Key: MAPREDUCE-1787
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1787
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Boris Shkolnik
> Attachments: HADOOP-6598-BP20-Fix.patch, HADOOP-6598-BP20.patch, 
> HADOOP-6598.patch
>
>
> {quote}
> 2010-02-25 08:30:52,269 INFO  security.Groups (Groups.java:(60)) - Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=30
> ...
> 2010-02-25 08:30:57,872 INFO  security.Groups (Groups.java:getGroups(76)) - Returning cached groups for 'oom'
> {quote}
> should both be demoted to debug level.




[jira] Moved: (MAPREDUCE-1787) Remove verbose logging from the Groups class

2010-05-13 Thread Boris Shkolnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boris Shkolnik moved HADOOP-6598 to MAPREDUCE-1787:
---

Project: Hadoop Map/Reduce  (was: Hadoop Common)
Key: MAPREDUCE-1787  (was: HADOOP-6598)

> Remove verbose logging from the Groups class
> 
>
> Key: MAPREDUCE-1787
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1787
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Boris Shkolnik
> Attachments: HADOOP-6598-BP20-Fix.patch, HADOOP-6598-BP20.patch, 
> HADOOP-6598.patch
>
>
> {quote}
> 2010-02-25 08:30:52,269 INFO  security.Groups (Groups.java:(60)) - Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=30
> ...
> 2010-02-25 08:30:57,872 INFO  security.Groups (Groups.java:getGroups(76)) - Returning cached groups for 'oom'
> {quote}
> should both be demoted to debug level.




[jira] Commented: (MAPREDUCE-913) TaskRunner crashes with NPE resulting in held up slots, UNINITIALIZED tasks and hung TaskTracker

2010-05-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867133#action_12867133
 ] 

Hadoop QA commented on MAPREDUCE-913:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12444388/patch-913-2.txt
  against trunk revision 943372.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/185/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/185/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/185/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/185/console

This message is automatically generated.

> TaskRunner crashes with NPE resulting in held up slots, UNINITIALIZED tasks 
> and hung TaskTracker
> 
>
> Key: MAPREDUCE-913
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-913
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker
>Affects Versions: 0.20.1
>Reporter: Vinod K V
>Assignee: Amareshwari Sriramadasu
>Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: mapreduce-913-1.patch, MAPREDUCE-913-20091119.1.txt, 
> MAPREDUCE-913-20091119.2.txt, MAPREDUCE-913-20091120.1.txt, patch-913-1.txt, 
> patch-913-2.txt, patch-913.txt
>
>





[jira] Updated: (MAPREDUCE-1741) Automate the test scenario of job related files are moved from history directory to done directory

2010-05-13 Thread Iyappan Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iyappan Srinivasan updated MAPREDUCE-1741:
--

Attachment: TestJobHistoryLocation.patch

Updated patch attached. It removes the test scenario that adds 100 files to 
the done directory and then checks whether the history files are still moved 
to the done directory, because that scenario does not add value to the 
functionality. Discussed this with Sharad.

> Automate the test scenario of  job related files are moved from history 
> directory to done directory
> ---
>
> Key: MAPREDUCE-1741
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1741
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: test
>Affects Versions: 0.22.0
>Reporter: Iyappan Srinivasan
> Fix For: 0.22.0
>
> Attachments: TestJobHistoryLocation.patch, 
> TestJobHistoryLocation.patch, TestJobHistoryLocation.patch
>
>
> Job related files are moved from history directory to done directory, when
> 1) Job succeeds
> 2) Job is killed
> 3) When 100 files are put in the done directory
> 4) When multiple jobs are completed at the same time, some successful, some 
> failed.
> Also, two files, conf.xml and job files should be present in the done 
> directory.




[jira] Updated: (MAPREDUCE-1677) Test scenario for a distributed cache file behaviour when the file is private

2010-05-13 Thread Iyappan Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iyappan Srinivasan updated MAPREDUCE-1677:
--

Attachment: TestDistributedCachePrivateFile.patch

This addresses the scenario in which the user who submits the job is different 
from the user who started the jobtracker/tasktracker daemons. In that case, the 
directory and file permissions will differ.

> Test scenario for a distributed cache file behaviour  when the file is private
> --
>
> Key: MAPREDUCE-1677
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1677
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: test
>Affects Versions: 0.22.0
>Reporter: Iyappan Srinivasan
>Assignee: Iyappan Srinivasan
> Attachments: 
> TEST-org.apache.hadoop.mapred.TestDistributedCachePrivateFile.txt, 
> TestDistributedCachePrivateFile.patch, TestDistributedCachePrivateFile.patch, 
> TestDistributedCachePrivateFile.patch, TestDistributedCachePrivateFile.patch, 
> TestDistributedCachePrivateFile.patch
>
>
> Verify the Distributed Cache functionality.
> This test scenario covers distributed cache file behaviour when the file is 
> private. Once a job uses a distributed cache file with private permissions, 
> that file is stored in mapred.local.dir, under the directory which has the 
> same name as the job submitter's username. The directory has 700 permissions 
> and the file under it should have 777 permissions.
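
As a rough illustration of the permission check described above (700 on the per-user directory, 777 on the cached file), here is a small sketch using java.nio on a local POSIX filesystem. The class and method names are hypothetical, and this is not the code in TestDistributedCachePrivateFile.patch, which runs against a real cluster.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Hypothetical helper; assumes a filesystem that supports POSIX permissions.
final class PrivateCachePermissionCheck {

    // True if the path's permissions match an rwx string such as "rwx------".
    static boolean hasPermissions(Path path, String expected) throws IOException {
        Set<PosixFilePermission> actual = Files.getPosixFilePermissions(path);
        return actual.equals(PosixFilePermissions.fromString(expected));
    }

    // Checks the two expectations stated in the issue description.
    static boolean looksLikePrivateCacheEntry(Path userDir, Path cachedFile) throws IOException {
        return hasPermissions(userDir, "rwx------")      // 700 on the user's directory
            && hasPermissions(cachedFile, "rwxrwxrwx");  // 777 on the file under it
    }
}
```

A caller would pass the per-user directory under mapred.local.dir and the localized file; how those paths are discovered on a tasktracker is outside this sketch.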




[jira] Updated: (MAPREDUCE-913) TaskRunner crashes with NPE resulting in held up slots, UNINITIALIZED tasks and hung TaskTracker

2010-05-13 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-913:
--

Status: Patch Available  (was: Open)

> TaskRunner crashes with NPE resulting in held up slots, UNINITIALIZED tasks 
> and hung TaskTracker
> 
>
> Key: MAPREDUCE-913
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-913
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker
>Affects Versions: 0.20.1
>Reporter: Vinod K V
>Assignee: Amareshwari Sriramadasu
>Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: mapreduce-913-1.patch, MAPREDUCE-913-20091119.1.txt, 
> MAPREDUCE-913-20091119.2.txt, MAPREDUCE-913-20091120.1.txt, patch-913-1.txt, 
> patch-913-2.txt, patch-913.txt
>
>





[jira] Updated: (MAPREDUCE-913) TaskRunner crashes with NPE resulting in held up slots, UNINITIALIZED tasks and hung TaskTracker

2010-05-13 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-913:
--

Attachment: patch-913-2.txt

Patch updated to trunk

> TaskRunner crashes with NPE resulting in held up slots, UNINITIALIZED tasks 
> and hung TaskTracker
> 
>
> Key: MAPREDUCE-913
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-913
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker
>Affects Versions: 0.20.1
>Reporter: Vinod K V
>Assignee: Amareshwari Sriramadasu
>Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: mapreduce-913-1.patch, MAPREDUCE-913-20091119.1.txt, 
> MAPREDUCE-913-20091119.2.txt, MAPREDUCE-913-20091120.1.txt, patch-913-1.txt, 
> patch-913-2.txt, patch-913.txt
>
>





[jira] Updated: (MAPREDUCE-1713) Utilities for system tests specific.

2010-05-13 Thread Vinay Kumar Thota (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay Kumar Thota updated MAPREDUCE-1713:
-

Attachment: 1713-ydist-security.patch

I understand your point now and have moved those two methods into JTClient. 
Uploaded the latest patch, which addresses all the comments.

> Utilities for system tests specific.
> 
>
> Key: MAPREDUCE-1713
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1713
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1713-ydist-security.patch, 1713-ydist-security.patch, 
> 1713-ydist-security.patch, 1713-ydist-security.patch, 
> 1713-ydist-security.patch, systemtestutils_MR1713.patch, 
> utilsforsystemtest_1713.patch
>
>
> 1.  A method for restarting the daemon with a new configuration.
>   public static void restartCluster(Hashtable props, String confFile) throws Exception;
> 2.  A method for resetting the daemon to the default configuration.
>   public void resetCluster() throws Exception;
> 3.  A method for waiting until the daemon stops.
>   public void waitForClusterToStop() throws Exception;
> 4.  A method for waiting until the daemon starts.
>   public void waitForClusterToStart() throws Exception;
> 5.  A method for checking whether the job has started.
>   public boolean isJobStarted(JobID id) throws IOException;
> 6.  A method for checking whether the task has started.
>   public boolean isTaskStarted(TaskInfo taskInfo) throws IOException;
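
The issue lists only the method signatures. As a sketch of how the waiting methods (items 3 and 4) might be built, the helper below polls a condition until it holds or a timeout expires. The names ClusterWaitSketch and waitFor, and the timeout and interval parameters, are assumptions for illustration and do not come from the attached patches.

```java
import java.util.function.BooleanSupplier;

// Hypothetical polling helper; a real waitForClusterToStart() would supply
// a condition that pings the daemon.
final class ClusterWaitSketch {

    // Polls the condition every intervalMillis until it returns true or
    // timeoutMillis has elapsed. Returns whether the condition was met.
    static boolean waitFor(BooleanSupplier condition, long timeoutMillis, long intervalMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without the condition becoming true
            }
            Thread.sleep(intervalMillis);
        }
    }
}
```

Returning a boolean rather than throwing on timeout leaves the decision to the test: a test that waits for the cluster to stop can fail with its own message when waitFor returns false.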
