[jira] Updated: (MAPREDUCE-802) Simplify the job updated event notification between Jobtracker and schedulers

2009-08-07 Thread Sreekanth Ramakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sreekanth Ramakrishnan updated MAPREDUCE-802:
-

Attachment: eventmodel-1.patch

Attaching the patch which makes changes in the event model as described in the 
[comment|https://issues.apache.org/jira/browse/MAPREDUCE-802?focusedCommentId=12738226&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12738226]

I have introduced {{JobSchedulingInfoIndex}} for removal based on the old 
{{JobSchedulingInfo}}, as I understand the job updates happen while the 
{{JobTracker}} lock is held.
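
For illustration only (this is not the attached patch), a minimal sketch of why removal has to be keyed on the old scheduling info while the {{JobTracker}} lock is held; the class and field names below are hypothetical:

{code}
// Hypothetical sketch: a scheduler queue keyed by an immutable snapshot of the
// scheduling fields. Once the JobTracker mutates the job (e.g. its priority),
// a key rebuilt from the live object would no longer match, so removal must
// use the old snapshot (or an index built from it). All of this runs while the
// JobTracker lock is held.
import java.util.Map;
import java.util.TreeMap;

class SchedulerQueueSketch {
  static final class SchedulingKey implements Comparable<SchedulingKey> {
    final int priority;      // assumed: lower value sorts first
    final long startTime;
    final String jobId;
    SchedulingKey(int priority, long startTime, String jobId) {
      this.priority = priority; this.startTime = startTime; this.jobId = jobId;
    }
    public int compareTo(SchedulingKey o) {
      if (priority != o.priority) return priority < o.priority ? -1 : 1;
      if (startTime != o.startTime) return startTime < o.startTime ? -1 : 1;
      return jobId.compareTo(o.jobId);
    }
  }

  private final Map<SchedulingKey, Object /* JobInProgress */> queue =
      new TreeMap<SchedulingKey, Object>();

  // Called with the JobTracker lock held: remove by the key captured *before*
  // the update, then re-insert under the new key.
  synchronized void jobUpdated(SchedulingKey oldKey, SchedulingKey newKey, Object job) {
    queue.remove(oldKey);
    queue.put(newKey, job);
  }
}
{code}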

 Simplify the job updated event notification between Jobtracker and schedulers
 -

 Key: MAPREDUCE-802
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-802
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobtracker
Reporter: Hemanth Yamijala
Assignee: Sreekanth Ramakrishnan
 Attachments: eventmodel-1.patch


 HADOOP-4053 and HADOOP-4149 added events so that updates to a job's state or 
 properties (such as its run state or priority) are notified to the scheduler. 
 We've seen some issues with this framework, such as the following:
 - Events are not raised correctly in all places. If a new code path is added 
 to kill a job, raising the corresponding event is easily missed.
 - Events are raised with incorrect event data. For example, the start time 
 value is typically missed.
 The resulting contract break between the jobtracker and schedulers has led to 
 problems in the capacity scheduler, where jobs remain stuck in the queue 
 without ever being removed, and so on.
 It has proven complicated to get this right in the framework, and fixes have 
 typically still left dangling cases, or new code paths introduce new bugs.
 This JIRA is about simplifying the interaction model so that it is 
 more robust and works well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-832) Too man y WARN messages about deprecated memorty config variables in JobTacker log

2009-08-07 Thread Karam Singh (JIRA)
Too man y WARN messages about deprecated memorty config variables in JobTacker 
log
--

 Key: MAPREDUCE-832
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-832
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.1
Reporter: Karam Singh


When a user submits a mapred job using the old memory config variable 
(mapred.task.maxvmem), the following message appears too many times in the JobTracker logs:
[
WARN org.apache.hadoop.mapred.JobConf: The variable mapred.task.maxvmem is no 
longer used instead use  mapred.job.map.memory.mb and 
mapred.job.reduce.memory.mb
]
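
For illustration, a hedged sketch of one way to keep such a deprecation warning from flooding the log, namely emitting it only once per deprecated key per JVM (the helper class below is hypothetical, not the actual JobConf code):

{code}
// Hypothetical helper, not the actual JobConf code.
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

class DeprecationWarner {
  private static final Log LOG = LogFactory.getLog(DeprecationWarner.class);
  private static final Set<String> WARNED =
      Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());

  // Logs the deprecation warning only the first time a given key is seen.
  static void warnOnce(String oldKey, String replacement) {
    if (WARNED.add(oldKey)) {
      LOG.warn("The variable " + oldKey + " is no longer used. Instead, use "
          + replacement);
    }
  }
}
{code}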


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-833) Jobclient does not print any warning message when old memory config variable used with -D option from command line

2009-08-07 Thread Karam Singh (JIRA)
Jobclient does not print any warning message when old memory config variable 
used with -D option from command line
--

 Key: MAPREDUCE-833
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-833
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.1
Reporter: Karam Singh




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-834) When TaskTracker config uses old memory management values its memory monitoring is disabled.

2009-08-07 Thread Karam Singh (JIRA)
When TaskTracker config uses old memory management values its memory monitoring 
is disabled.
--

 Key: MAPREDUCE-834
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-834
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Karam Singh


TaskTracker memory config values:
mapred.tasktracker.vmem.reserved=8589934592
mapred.task.default.maxvmem=2147483648
mapred.task.limit.maxvmem=4294967296
mapred.tasktracker.pmem.reserved=2147483648
The TaskTracker then starts up as follows:
   2009-08-05 12:39:03,308 WARN 
org.apache.hadoop.mapred.TaskTracker: The variable 
mapred.tasktracker.vmem.reserved is no longer used
2009-08-05 12:39:03,308 WARN 
org.apache.hadoop.mapred.TaskTracker: The variable 
mapred.tasktracker.pmem.reserved is no longer used
2009-08-05 12:39:03,308 WARN 
org.apache.hadoop.mapred.TaskTracker: The variable mapred.task.default.maxvmem 
is no longer used
2009-08-05 12:39:03,308 WARN 
org.apache.hadoop.mapred.TaskTracker: The variable mapred.task.limit.maxvmem is 
no longer used
2009-08-05 12:39:03,308 INFO 
org.apache.hadoop.mapred.TaskTracker: Starting thread: Map-events fetcher for 
all reduce tasks on tracker_name
2009-08-05 12:39:03,309 INFO 
org.apache.hadoop.mapred.TaskTracker:  Using MemoryCalculatorPlugin : 
org.apache.hadoop.util.linuxmemorycalculatorplu...@19be4777
2009-08-05 12:39:03,311 WARN 
org.apache.hadoop.mapred.TaskTracker: TaskTracker's totalMemoryAllottedForTasks 
is -1. TaskMemoryManager is disabled.
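
For illustration, a hedged sketch of a backward-compatibility shim that would derive a value for a new MB-based key from the deprecated byte-based key instead of silently disabling monitoring. Only the key names quoted in the warnings above are taken from the logs; treating the old default as a usable fallback is an assumption:

{code}
// Hypothetical shim, not the TaskTracker code.
import org.apache.hadoop.conf.Configuration;

class MemoryConfigShim {
  static void applyOldDefaults(Configuration conf) {
    long oldBytes = conf.getLong("mapred.task.default.maxvmem", -1);
    long newMb = conf.getLong("mapred.job.map.memory.mb", -1);
    if (newMb == -1 && oldBytes != -1) {
      // e.g. 2147483648 bytes becomes 2048 MB instead of falling through to -1,
      // which is what currently disables the TaskMemoryManager.
      conf.setLong("mapred.job.map.memory.mb", oldBytes / (1024 * 1024));
    }
  }
}
{code}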



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Reopened: (MAPREDUCE-796) Encountered ClassCastException on tasktracker while running wordcount with MultithreadedMapRunner

2009-08-07 Thread Amar Kamat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat reopened MAPREDUCE-796:
--


 Encountered ClassCastException on tasktracker while running wordcount with 
 MultithreadedMapRunner
 ---

 Key: MAPREDUCE-796
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-796
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: examples
Affects Versions: 0.20.1
Reporter: Suman Sehgal
Assignee: Amar Kamat

 ClassCastException for OutOfMemoryError is encountered on tasktracker while 
 running wordcount example with MultithreadedMapRunner. 
 Stack trace :
 =
 java.lang.ClassCastException: java.lang.OutOfMemoryError cannot be cast to 
 java.lang.RuntimeException
   at 
 org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper.run(MultithreadedMapper.java:149)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:581)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:303)
   at org.apache.hadoop.mapred.Child.main(Child.java:170)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (MAPREDUCE-796) Encountered ClassCastException on tasktracker while running wordcount with MultithreadedMapRunner

2009-08-07 Thread Amar Kamat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat reassigned MAPREDUCE-796:


Assignee: Amar Kamat

 Encountered ClassCastException on tasktracker while running wordcount with 
 MultithreadedMapRunner
 ---

 Key: MAPREDUCE-796
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-796
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: examples
Affects Versions: 0.20.1
Reporter: Suman Sehgal
Assignee: Amar Kamat

 ClassCastException for OutOfMemoryError is encountered on tasktracker while 
 running wordcount example with MultithreadedMapRunner. 
 Stack trace :
 =
 java.lang.ClassCastException: java.lang.OutOfMemoryError cannot be cast to 
 java.lang.RuntimeException
   at 
 org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper.run(MultithreadedMapper.java:149)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:581)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:303)
   at org.apache.hadoop.mapred.Child.main(Child.java:170)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-796) Encountered ClassCastException on tasktracker while running wordcount with MultithreadedMapRunner

2009-08-07 Thread Amar Kamat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat updated MAPREDUCE-796:
-

Attachment: MAPREDUCE-796-v1.0.patch

Attaching a simple fix.
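
For context, a minimal sketch of the kind of handling involved, i.e. rethrowing the worker thread's failure according to its actual type instead of casting it to RuntimeException (this is only a sketch, not necessarily what the attached patch does):

{code}
// Hypothetical helper illustrating the type-aware rethrow.
import java.io.IOException;

class ThrowableRethrowSketch {
  static void rethrow(Throwable th) throws IOException {
    if (th instanceof Error) {
      throw (Error) th;                   // e.g. OutOfMemoryError
    } else if (th instanceof RuntimeException) {
      throw (RuntimeException) th;
    } else if (th instanceof IOException) {
      throw (IOException) th;
    } else {
      throw new RuntimeException(th);     // wrap anything else
    }
  }
}
{code}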

 Encountered ClassCastException on tasktracker while running wordcount with 
 MultithreadedMapRunner
 ---

 Key: MAPREDUCE-796
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-796
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: examples
Affects Versions: 0.20.1
Reporter: Suman Sehgal
Assignee: Amar Kamat
 Attachments: MAPREDUCE-796-v1.0.patch


 ClassCastException for OutOfMemoryError is encountered on tasktracker while 
 running wordcount example with MultithreadedMapRunner. 
 Stack trace :
 =
 java.lang.ClassCastException: java.lang.OutOfMemoryError cannot be cast to 
 java.lang.RuntimeException
   at 
 org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper.run(MultithreadedMapper.java:149)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:581)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:303)
   at org.apache.hadoop.mapred.Child.main(Child.java:170)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-832) Too many WARN messages about deprecated memorty config variables in JobTacker log

2009-08-07 Thread Karam Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karam Singh updated MAPREDUCE-832:
--

Summary: Too many WARN messages about deprecated memorty config variables 
in JobTacker log  (was: Too man y WARN messages about deprecated memorty config 
variables in JobTacker log)

 Too many WARN messages about deprecated memorty config variables in JobTacker 
 log
 -

 Key: MAPREDUCE-832
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-832
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.1
Reporter: Karam Singh

 When a user submits a mapred job using the old memory config variable 
 (mapred.task.maxvmem), the following message appears too many times in the JobTracker logs:
 [
 WARN org.apache.hadoop.mapred.JobConf: The variable mapred.task.maxvmem is no 
 longer used instead use  mapred.job.map.memory.mb and 
 mapred.job.reduce.memory.mb
 ]

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-796) Encountered ClassCastException on tasktracker while running wordcount with MultithreadedMapRunner

2009-08-07 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12740493#action_12740493
 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-796:
---

+1

 Encountered ClassCastException on tasktracker while running wordcount with 
 MultithreadedMapRunner
 ---

 Key: MAPREDUCE-796
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-796
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: examples
Affects Versions: 0.20.1
Reporter: Suman Sehgal
Assignee: Amar Kamat
 Attachments: MAPREDUCE-796-v1.0.patch


 ClassCastException for OutOfMemoryError is encountered on tasktracker while 
 running wordcount example with MultithreadedMapRunner. 
 Stack trace :
 =
 java.lang.ClassCastException: java.lang.OutOfMemoryError cannot be cast to 
 java.lang.RuntimeException
   at 
 org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper.run(MultithreadedMapper.java:149)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:581)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:303)
   at org.apache.hadoop.mapred.Child.main(Child.java:170)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-767) to remove mapreduce dependency on commons-cli2

2009-08-07 Thread Amar Kamat (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12740496#action_12740496
 ] 

Amar Kamat commented on MAPREDUCE-767:
--

Tested this patch with examples mentioned in [streaming 
docs|http://hadoop.apache.org/common/docs/r0.20.0/streaming.html]. All cases 
seem to pass. Doing further testing.

 to remove mapreduce dependency on commons-cli2
 --

 Key: MAPREDUCE-767
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-767
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/streaming
Reporter: Giridharan Kesavan
Assignee: Amar Kamat
 Attachments: MAPREDUCE-767-v1.1.patch


 mapreduce, streaming and eclipse plugin depends on common-cli2 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-779) Add node health failures into JobTrackerStatistics

2009-08-07 Thread Sreekanth Ramakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sreekanth Ramakrishnan updated MAPREDUCE-779:
-

Status: Patch Available  (was: Open)

 Add node health failures into JobTrackerStatistics
 --

 Key: MAPREDUCE-779
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-779
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobtracker
Reporter: Sreekanth Ramakrishnan
Assignee: Sreekanth Ramakrishnan
 Attachments: mapreduce-779-1.patch, mapreduce-779-2.patch, 
 mapreduce-779-3.patch, mapreduce-779-4.patch


 Add the node health failure counts into {{JobTrackerStatistics}}.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-814) Move completed Job history files to HDFS

2009-08-07 Thread Sharad Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sharad Agarwal updated MAPREDUCE-814:
-

Attachment: 814_v5.patch

Incorporated Devaraj's offline comments. Minimized the jobtracker init changes. 
Passing filesystem handle in JobHistory#getJobHistoryFileName

 Move completed Job history files to HDFS
 

 Key: MAPREDUCE-814
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-814
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: jobtracker
Reporter: Sharad Agarwal
Assignee: Sharad Agarwal
 Attachments: 814_v1.patch, 814_v2.patch, 814_v3.patch, 814_v4.patch, 
 814_v5.patch


 Currently completed job history files remain on the jobtracker node. Having 
 the files available on HDFS will enable clients to access these files more 
 easily.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-370) Change org.apache.hadoop.mapred.lib.MultipleOutputs to use new api.

2009-08-07 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-370:
--

Attachment: patch-370.txt

Attaching an early patch.

The patch does the following:
1. Adds an API in org.apache.hadoop.mapreduce.lib.output.FileOutputFormat to 
get a RecordWriter for a given file name; the current API does not support 
passing a file name.

2. Adds org.apache.hadoop.mapreduce.lib.output.MultipleOutputs with the 
following API:
{code}
public class MultipleOutputs<KEYOUT, VALUEOUT> {

  public MultipleOutputs(TaskInputOutputContext context);

  // Adds a named output for the job.
  public static void addNamedOutput(Job job, String namedOutput,
      Class<? extends FileOutputFormat> outputFormatClass,
      Class<?> keyClass, Class<?> valueClass);

  // Enables counters for named outputs
  public static void setCountersEnabled(Job job, boolean enabled);

  // Writes to a named output.
  // Writes to an output file name that depends on key, value, context and namedOutput.
  // Gets the record writer from the output format added for the named output.
  public <K, V> void write(String namedOutput, K key, V value)
      throws IOException, InterruptedException;

  // Writes to an output file name that depends on key, value and context.
  // Gets the record writer from the job's output format.
  // The job's output format should be a FileOutputFormat.
  public void write(KEYOUT key, VALUEOUT value)
      throws IOException, InterruptedException;

  protected <K, V> String generateOutputName(K key, V value,
      TaskAttemptContext context, String name);

  protected <K, V> K generateActualKey(K key, V value);
  protected <K, V> V generateActualValue(K key, V value);
}
{code}

Users can add named outputs and their corresponding OutputFormat and output 
key/value types using addNamedOutput. 
The generateOutputName API can be overridden by the user to give the final 
output name; this gives the user complete control over the output name. 
Generating a unique file name can be done once the user supplies this name 
(it can be done in the framework itself), as done in the patch. This lets the 
counter feature count the number of records written to each output name. The 
same method can be used to plug in the functionality of multiNamedOutputs.

I illustrated using the API in the added test case (see also the usage sketch below).

3. Deprecates org.apache.hadoop.mapred.lib.Multiple*Output*
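
For illustration, a hedged usage sketch based only on the API listed above; the reducer, output name and driver hook here are made up:

{code}
// Hypothetical reducer using the proposed API; "short" is a made-up output name.
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCountWithNamedOutputs
    extends Reducer<Text, LongWritable, Text, LongWritable> {

  private MultipleOutputs<Text, LongWritable> mos;

  protected void setup(Context context) {
    mos = new MultipleOutputs<Text, LongWritable>(context);
  }

  protected void reduce(Text key, Iterable<LongWritable> values, Context context)
      throws IOException, InterruptedException {
    long sum = 0;
    for (LongWritable v : values) sum += v.get();
    if (key.getLength() <= 3) {
      mos.write("short", key, new LongWritable(sum));   // routed to the named output
    } else {
      context.write(key, new LongWritable(sum));        // job's default output
    }
  }

  // Driver side: register the named output once per job.
  public static void configure(Job job) {
    MultipleOutputs.addNamedOutput(job, "short", TextOutputFormat.class,
        Text.class, LongWritable.class);
    MultipleOutputs.setCountersEnabled(job, true);
  }
}
{code}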



 Change org.apache.hadoop.mapred.lib.MultipleOutputs to use new api.
 ---

 Key: MAPREDUCE-370
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-370
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Attachments: patch-370.txt




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (MAPREDUCE-796) Encountered ClassCastException on tasktracker while running wordcount with MultithreadedMapRunner

2009-08-07 Thread Devaraj Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das resolved MAPREDUCE-796.
---

   Resolution: Fixed
Fix Version/s: 0.20.1
 Hadoop Flags: [Reviewed]

I just committed this. Thanks, Amar!

 Encountered ClassCastException on tasktracker while running wordcount with 
 MultithreadedMapRunner
 ---

 Key: MAPREDUCE-796
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-796
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: examples
Affects Versions: 0.20.1
Reporter: Suman Sehgal
Assignee: Amar Kamat
 Fix For: 0.20.1

 Attachments: MAPREDUCE-796-v1.0.patch


 ClassCastException for OutOfMemoryError is encountered on tasktracker while 
 running wordcount example with MultithreadedMapRunner. 
 Stack trace :
 =
 java.lang.ClassCastException: java.lang.OutOfMemoryError cannot be cast to 
 java.lang.RuntimeException
   at 
 org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper.run(MultithreadedMapper.java:149)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:581)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:303)
   at org.apache.hadoop.mapred.Child.main(Child.java:170)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-750) Extensible ConnManager factory API

2009-08-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12740511#action_12740511
 ] 

Hadoop QA commented on MAPREDUCE-750:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12415690/MAPREDUCE-750.2.patch
  against trunk revision 801517.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 5 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/451/console

This message is automatically generated.

 Extensible ConnManager factory API
 --

 Key: MAPREDUCE-750
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-750
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/sqoop
Reporter: Aaron Kimball
Assignee: Aaron Kimball
 Attachments: MAPREDUCE-750.2.patch, MAPREDUCE-750.patch


 Sqoop uses the ConnFactory class to instantiate a ConnManager implementation 
 based on the connect string and other arguments supplied by the user. This 
 allows per-database logic to be encapsulated in different ConnManager 
 instances, and dynamically chosen based on which database the user is 
 actually importing from. But adding new ConnManager implementations requires 
 modifying the source of a common ConnFactory class. An indirection layer 
 should be used to delegate instantiation to a number of factory 
 implementations which can be specified in the static configuration or at 
 runtime.
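
For illustration, a hedged sketch of the indirection layer described above (this is not Sqoop's actual factory API; all names below are hypothetical):

{code}
// Hypothetical names throughout; this is only the shape of the indirection layer.
import java.util.List;

interface ConnManager { /* per-database import logic */ }

interface ConnManagerFactory {
  // Returns a manager for this connect string, or null to let the next factory try.
  ConnManager accept(String connectString);
}

class ConnFactorySketch {
  private final List<ConnManagerFactory> factories;   // e.g. loaded from configuration

  ConnFactorySketch(List<ConnManagerFactory> factories) {
    this.factories = factories;
  }

  ConnManager getManager(String connectString) {
    for (ConnManagerFactory f : factories) {
      ConnManager m = f.accept(connectString);
      if (m != null) {
        return m;
      }
    }
    throw new IllegalArgumentException("No ConnManager accepts " + connectString);
  }
}
{code}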

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-370) Change org.apache.hadoop.mapred.lib.MultipleOutputs to use new api.

2009-08-07 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12740509#action_12740509
 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-370:
---

bq. To achieve this, I think we could port MultipleOutputs, and change the 
semantics of getCollector() in the multi name case, so that the multi name is 
the full name of the name of the output file. This method is typically invoked 
in the reduce() method, where the key and value are available, and can be used 
to form the name.
Tom, are you saying that we should not have a protected method 
generateOutputName(), which could be overridden to provide the functionality? 
If so, we should have a way to find out whether it is a namedOutput (I meant 
multiNamedOutputs) or an arbitrary name, to know which output format should be 
used for writing.
We should have something like:
{code}
  public <K, V> void write(String namedOutput, String outputPath, K key, V value)
      throws IOException, InterruptedException;
  public <K, V> void write(String outputPath, K key, V value)
      throws IOException, InterruptedException;
{code}

bq. Applications that want to add a unique suffix can call 
FileOutputFormat#getUniqueFile() themselves.
This should be done by the framework to support counters as  explained earlier.

 Change org.apache.hadoop.mapred.lib.MultipleOutputs to use new api.
 ---

 Key: MAPREDUCE-370
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-370
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Attachments: patch-370.txt




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-478) separate jvm param for mapper and reducer

2009-08-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12740513#action_12740513
 ] 

Hadoop QA commented on MAPREDUCE-478:
-

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12415805/MAPREDUCE-478_1_20090806_yhadoop20.patch
  against trunk revision 801954.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 19 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/452/console

This message is automatically generated.

 separate jvm param for mapper and reducer
 -

 Key: MAPREDUCE-478
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-478
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Koji Noguchi
Assignee: Arun C Murthy
Priority: Minor
 Fix For: 0.21.0

 Attachments: HADOOP-5684_0_20090420.patch, 
 MAPREDUCE-478_0_20090804.patch, MAPREDUCE-478_0_20090804_yhadoop20.patch, 
 MAPREDUCE-478_1_20090806.patch, MAPREDUCE-478_1_20090806_yhadoop20.patch


 Memory footprint of mapper and reducer can differ. 
 It would be nice if we can pass different jvm param (mapred.child.java.opts) 
 for mappers and reducers.
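
For illustration, a hedged sketch of what such knobs could look like; the per-map and per-reduce key names below are assumptions, with the existing mapred.child.java.opts as the fallback:

{code}
// Hypothetical key names for the map/reduce-specific options.
import org.apache.hadoop.mapred.JobConf;

class ChildOptsSketch {
  static String childOptsFor(JobConf conf, boolean isMap) {
    String common = conf.get("mapred.child.java.opts", "-Xmx200m");
    return isMap
        ? conf.get("mapred.map.child.java.opts", common)      // e.g. -Xmx512m
        : conf.get("mapred.reduce.child.java.opts", common);  // e.g. -Xmx1024m
  }
}
{code}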

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-375) Change org.apache.hadoop.mapred.lib.NLineInputFormat and org.apache.hadoop.mapred.MapFileOutputFormat to use new api.

2009-08-07 Thread Devaraj Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated MAPREDUCE-375:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

I just committed this. Thanks, Amareshwari!

  Change org.apache.hadoop.mapred.lib.NLineInputFormat and 
 org.apache.hadoop.mapred.MapFileOutputFormat to use new api.
 --

 Key: MAPREDUCE-375
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-375
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Fix For: 0.21.0

 Attachments: patch-375-1.txt, patch-375-2.txt, patch-375.txt




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-757) JobConf will not be deleted from the logs folder if job retires from finalizeJob()

2009-08-07 Thread Amar Kamat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat updated MAPREDUCE-757:
-

Attachment: MAPREDUCE-757-v2.0-branch-0.20.patch

Attaching a patch for branch 0.20.

 JobConf will not be deleted from the logs folder if job retires from 
 finalizeJob()
 --

 Key: MAPREDUCE-757
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-757
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Reporter: Amar Kamat
Assignee: Amar Kamat
 Attachments: MAPREDUCE-757-v1.0.patch, 
 MAPREDUCE-757-v2.0-branch-0.20.patch, MAPREDUCE-757-v2.0.patch


 MAPREDUCE-130 fixed the case where the job is retired from the retire jobs 
 thread. But jobs can also retire when the num-job-per-user limit is exceeded. 
 In such cases the conf file will not be deleted.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-835) hadoop-mapred examples, test and tools jar files are not being packaged when running ant binary or bin-package

2009-08-07 Thread Karam Singh (JIRA)
hadoop-mapred examples, test and tools jar files are not being packaged when 
running ant binary or bin-package


 Key: MAPREDUCE-835
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-835
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.21.0
Reporter: Karam Singh


When checking the mapreduce trunk:
If you run the ant binary or ant bin-package commands, 
hadoop-mapred-test-0.21.0-dev.jar, hadoop-mapred-examples-0.21.0-dev.jar and 
hadoop-mapred-tools-0.21.0-dev.jar are not included in the tar or in the 
build/hadoop-mapred-0.21.0-dev package directory, although they are present 
under the build directory.

For ant tar and ant package they are packaged correctly into the 
build/hadoop-mapred-0.21.0-dev directory.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-836) Examples of hadoop pipes are not packaged even when -Dcompile.native=yes -Dcompile.c++=yes options are used while running ant package or tar or similar commands.

2009-08-07 Thread Karam Singh (JIRA)
Examples of hadoop pipes are not packaged even when -Dcompile.native=yes -Dcompile.c++=yes 
options are used while running ant package or tar or similar commands.
-

 Key: MAPREDUCE-836
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-836
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: examples
Affects Versions: 0.20.1, 0.21.0
Reporter: Karam Singh


Examples of hadoop pipes and python are not packaged even when 
-Dcompile.native=yes -Dcompile.c++=yes options are used while running ant 
package or tar or similar commands. 
The pipes examples are compiled and copied under build/c++-examples but are not 
being packaged. The same is the case with the python examples.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-805) Deadlock in Jobtracker

2009-08-07 Thread Amar Kamat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat updated MAPREDUCE-805:
-

Attachment: MAPREDUCE-805-v1.7.patch

Attaching a patch incorporating Devaraj's offline comments. Result of 
test-patch 
 [exec] +1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 21 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.


 Deadlock in Jobtracker
 --

 Key: MAPREDUCE-805
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-805
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Michael Tamm
 Attachments: MAPREDUCE-805-v1.1.patch, MAPREDUCE-805-v1.2.patch, 
 MAPREDUCE-805-v1.3.patch, MAPREDUCE-805-v1.6.patch, MAPREDUCE-805-v1.7.patch


 We are running a hadoop cluster (version 0.20.0) and have detected the 
 following deadlock on our jobtracker:
 {code}
 IPC Server handler 51 on 9001:
   at 
 org.apache.hadoop.mapred.JobInProgress.getCounters(JobInProgress.java:943)
   - waiting to lock 0x7f2b6fb46130 (a 
 org.apache.hadoop.mapred.JobInProgress)
   at 
 org.apache.hadoop.mapred.JobTracker.getJobCounters(JobTracker.java:3102)
   - locked 0x7f2b5f026000 (a org.apache.hadoop.mapred.JobTracker)
   at sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
  pool-1-thread-2:
   at org.apache.hadoop.mapred.JobTracker.finalizeJob(JobTracker.java:2017)
   - waiting to lock 0x7f2b5f026000 (a 
 org.apache.hadoop.mapred.JobTracker)
   at 
 org.apache.hadoop.mapred.JobInProgress.garbageCollect(JobInProgress.java:2483)
   - locked 0x7f2b6fb46130 (a org.apache.hadoop.mapred.JobInProgress)
   at 
 org.apache.hadoop.mapred.JobInProgress.terminateJob(JobInProgress.java:2152)
   - locked 0x7f2b6fb46130 (a org.apache.hadoop.mapred.JobInProgress)
   at 
 org.apache.hadoop.mapred.JobInProgress.terminate(JobInProgress.java:2169)
   - locked 0x7f2b6fb46130 (a org.apache.hadoop.mapred.JobInProgress)
   at org.apache.hadoop.mapred.JobInProgress.fail(JobInProgress.java:2245)
   - locked 0x7f2b6fb46130 (a org.apache.hadoop.mapred.JobInProgress)
   at 
 org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:86)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:619)
 {code}
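
For illustration (not the attached patch): the dump above is a classic lock-order inversion, where one thread takes the JobTracker lock then the JobInProgress lock and the other takes them in the opposite order. A minimal sketch of the consistent-ordering rule that removes the cycle:

{code}
// Stand-in objects only; not the JobTracker/JobInProgress code.
class LockOrderSketch {
  private final Object jobTrackerLock = new Object();  // stands in for the JobTracker
  private final Object jobLock = new Object();          // stands in for a JobInProgress

  // Every path that needs both locks takes the JobTracker lock first.
  void getJobCounters() {
    synchronized (jobTrackerLock) {
      synchronized (jobLock) {
        // read counters ...
      }
    }
  }

  void failAndGarbageCollect() {
    synchronized (jobTrackerLock) {   // outer lock acquired *before* the job lock
      synchronized (jobLock) {
        // terminate the job; finalizeJob() then runs with the JobTracker lock already held
      }
    }
  }
}
{code}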

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-479) Add reduce ID to shuffle clienttrace

2009-08-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12740584#action_12740584
 ] 

Hadoop QA commented on MAPREDUCE-479:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12415747/MAPREDUCE-479-4.patch
  against trunk revision 801959.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/453/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/453/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/453/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/453/console

This message is automatically generated.

 Add reduce ID to shuffle clienttrace
 

 Key: MAPREDUCE-479
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-479
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 0.21.0
Reporter: Jiaqi Tan
Assignee: Jiaqi Tan
Priority: Minor
 Fix For: 0.21.0

 Attachments: HADOOP-6013.patch, MAPREDUCE-479-1.patch, 
 MAPREDUCE-479-2.patch, MAPREDUCE-479-3.patch, MAPREDUCE-479-4.patch, 
 MAPREDUCE-479.patch


 Current clienttrace messages from shuffles note only the destination map ID 
 but not the source reduce ID. Having both source and destination ID of each 
 shuffle enables full tracing of execution. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-798) MRUnit should be able to test a succession of MapReduce passes

2009-08-07 Thread Aaron Kimball (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12740609#action_12740609
 ] 

Aaron Kimball commented on MAPREDUCE-798:
-

test failures are in streaming

 MRUnit should be able to test a succession of MapReduce passes
 --

 Key: MAPREDUCE-798
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-798
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Aaron Kimball
Assignee: Aaron Kimball
 Attachments: MAPREDUCE-798.2.patch, MAPREDUCE-798.patch


 MRUnit can currently test that the inputs to a given (mapper, reducer) job 
 produce certain outputs at the end of the reducer. It would be good to 
 support more end-to-end tests of a series of MapReduce jobs that form a 
 longer pipeline surrounding some data.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-814) Move completed Job history files to HDFS

2009-08-07 Thread Sharad Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12740613#action_12740613
 ] 

Sharad Agarwal commented on MAPREDUCE-814:
--

test patch and ant test passed.

 Move completed Job history files to HDFS
 

 Key: MAPREDUCE-814
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-814
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: jobtracker
Reporter: Sharad Agarwal
Assignee: Sharad Agarwal
 Attachments: 814_v1.patch, 814_v2.patch, 814_v3.patch, 814_v4.patch, 
 814_v5.patch


 Currently completed job history files remain on the jobtracker node. Having 
 the files available on HDFS will enable clients to access these files more 
 easily.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-796) Encountered ClassCastException on tasktracker while running wordcount with MultithreadedMapRunner

2009-08-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12740658#action_12740658
 ] 

Hudson commented on MAPREDUCE-796:
--

Integrated in Hadoop-Mapreduce-trunk #41 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/41/])
MAPREDUCE-796. Fixes a ClassCastException in an exception log in MultiThreadedMapRunner. 
Contributed by Amar Kamat.


 Encountered ClassCastException on tasktracker while running wordcount with 
 MultithreadedMapRunner
 ---

 Key: MAPREDUCE-796
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-796
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: examples
Affects Versions: 0.20.1
Reporter: Suman Sehgal
Assignee: Amar Kamat
 Fix For: 0.20.1

 Attachments: MAPREDUCE-796-v1.0.patch


 ClassCastException for OutOfMemoryError is encountered on tasktracker while 
 running wordcount example with MultithreadedMapRunner. 
 Stack trace :
 =
 java.lang.ClassCastException: java.lang.OutOfMemoryError cannot be cast to 
 java.lang.RuntimeException
   at 
 org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper.run(MultithreadedMapper.java:149)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:581)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:303)
   at org.apache.hadoop.mapred.Child.main(Child.java:170)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-375) Change org.apache.hadoop.mapred.lib.NLineInputFormat and org.apache.hadoop.mapred.MapFileOutputFormat to use new api.

2009-08-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12740657#action_12740657
 ] 

Hudson commented on MAPREDUCE-375:
--

Integrated in Hadoop-Mapreduce-trunk #41 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/41/])
MAPREDUCE-375. Change org.apache.hadoop.mapred.lib.NLineInputFormat and 
org.apache.hadoop.mapred.MapFileOutputFormat to use new api. Contributed by 
Amareshwari Sriramadasu.


  Change org.apache.hadoop.mapred.lib.NLineInputFormat and 
 org.apache.hadoop.mapred.MapFileOutputFormat to use new api.
 --

 Key: MAPREDUCE-375
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-375
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Fix For: 0.21.0

 Attachments: patch-375-1.txt, patch-375-2.txt, patch-375.txt




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-837) harchive fail when output directory has URI with default port of 8020

2009-08-07 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12740692#action_12740692
 ] 

Koji Noguchi commented on MAPREDUCE-837:


hadoop archive -archiveName abc.har /user/knoguchi/abc 
hdfs://mynamenode:8020/user/knoguchi

In 0.18, the job fails with
{noformat}
09/08/07 19:41:57 INFO mapred.JobClient: Task Id :
attempt_200908071938_0001_m_00_2, Status : FAILED
Failed to rename output with the exception: java.io.IOException: Can not get the
relative path: base =
hdfs://mynamenode:8020/user/knoguchi/abc.har/_temporary/_attempt_200908071938_0001_m_00_2
child =
hdfs://mynamenode/user/knoguchi/abc.har/_temporary/_attempt_200908071938_0001_m_00_2/part-0
at org.apache.hadoop.mapred.Task.getFinalPath(Task.java:590)
at org.apache.hadoop.mapred.Task.moveTaskOutputs(Task.java:603)
at org.apache.hadoop.mapred.Task.moveTaskOutputs(Task.java:621)
at org.apache.hadoop.mapred.Task.saveTaskOutput(Task.java:565)
at
org.apache.hadoop.mapred.JobTracker$TaskCommitQueue.run(JobTracker.java:2616)
{noformat}

In 0.20, it logs the above warning but the job succeeds with an empty output 
directory (which is worse).

I'll create a separate Jira for the 0.20 job-succeeding part.
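
A small illustration (hypothetical class, not the harchive code) of why the relative-path computation trips up: the base URI carries the default port explicitly while the child URI does not, so a plain prefix comparison never matches even though both name the same filesystem:

{code}
// Hypothetical demo class.
import java.net.URI;

public class DefaultPortMismatch {
  public static void main(String[] args) {
    URI base  = URI.create("hdfs://mynamenode:8020/user/knoguchi/abc.har/_temporary");
    URI child = URI.create("hdfs://mynamenode/user/knoguchi/abc.har/_temporary/part-0");
    System.out.println(child.toString().startsWith(base.toString()));  // false
    System.out.println(base.getAuthority() + " vs " + child.getAuthority());
    // prints: mynamenode:8020 vs mynamenode
  }
}
{code}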



 harchive fail when output directory has URI with default port of 8020
 -

 Key: MAPREDUCE-837
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-837
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: harchive
Affects Versions: 0.20.1
Reporter: Koji Noguchi
Priority: Minor

 % hadoop archive -archiveName abc.har /user/knoguchi/abc 
 hdfs://mynamenode:8020/user/knoguchi
 doesn't work on 0.18 nor 0.20

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-838) Task succeeds even when committer.commitTask fails with IOException

2009-08-07 Thread Koji Noguchi (JIRA)
Task succeeds even when committer.commitTask fails with IOException
---

 Key: MAPREDUCE-838
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-838
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task
Affects Versions: 0.20.1
Reporter: Koji Noguchi


In MAPREDUCE-837, the job succeeded with empty output even though all the tasks 
were throwing IOException at committer.commitTask.

{noformat}
2009-08-07 17:51:47,458 INFO org.apache.hadoop.mapred.TaskRunner: Task 
attempt_200907301448_8771_r_00_0 is allowed to commit now
2009-08-07 17:51:47,466 WARN org.apache.hadoop.mapred.TaskRunner: Failure 
committing: java.io.IOException: Can not get the relative path: \
base = 
hdfs://mynamenode:8020/user/knoguchi/test2.har/_temporary/_attempt_200907301448_8771_r_00_0
 \
child = 
hdfs://mynamenode/user/knoguchi/test2.har/_temporary/_attempt_200907301448_8771_r_00_0/_index
  at 
org.apache.hadoop.mapred.FileOutputCommitter.getFinalPath(FileOutputCommitter.java:150)
  at 
org.apache.hadoop.mapred.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:106)
  at 
org.apache.hadoop.mapred.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:126)
  at 
org.apache.hadoop.mapred.FileOutputCommitter.commitTask(FileOutputCommitter.java:86)
  at 
org.apache.hadoop.mapred.OutputCommitter.commitTask(OutputCommitter.java:171)
  at org.apache.hadoop.mapred.Task.commit(Task.java:768)
  at org.apache.hadoop.mapred.Task.done(Task.java:692)
  at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
  at org.apache.hadoop.mapred.Child.main(Child.java:170)

2009-08-07 17:51:47,468 WARN org.apache.hadoop.mapred.TaskRunner: Failure 
asking whether task can commit: java.io.IOException: \
Can not get the relative path: base = 
hdfs://mynamenode:8020/user/knoguchi/test2.har/_temporary/_attempt_200907301448_8771_r_00_0
 \
child = 
hdfs://mynamenode/user/knoguchi/test2.har/_temporary/_attempt_200907301448_8771_r_00_0/_index
  at 
org.apache.hadoop.mapred.FileOutputCommitter.getFinalPath(FileOutputCommitter.java:150)
  at 
org.apache.hadoop.mapred.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:106)
  at 
org.apache.hadoop.mapred.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:126)
  at 
org.apache.hadoop.mapred.FileOutputCommitter.commitTask(FileOutputCommitter.java:86)
  at 
org.apache.hadoop.mapred.OutputCommitter.commitTask(OutputCommitter.java:171)
  at org.apache.hadoop.mapred.Task.commit(Task.java:768)
  at org.apache.hadoop.mapred.Task.done(Task.java:692)
  at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
  at org.apache.hadoop.mapred.Child.main(Child.java:170)

2009-08-07 17:51:47,469 INFO org.apache.hadoop.mapred.TaskRunner: Task 
attempt_200907301448_8771_r_00_0 is allowed to commit now
2009-08-07 17:51:47,472 INFO org.apache.hadoop.mapred.TaskRunner: Task 
'attempt_200907301448_8771_r_00_0' done.


{noformat}
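
For illustration, a hedged sketch of the behaviour the task runner needs, i.e. a commitTask() failure must fail the attempt rather than let it report success; the helper below is hypothetical, not the actual Task code:

{code}
// Hypothetical helper, not the actual Task.done()/commit() code.
import java.io.IOException;
import org.apache.hadoop.mapred.OutputCommitter;
import org.apache.hadoop.mapred.TaskAttemptContext;

class CommitOrFail {
  static void commitOrFail(OutputCommitter committer, TaskAttemptContext context)
      throws IOException {
    try {
      committer.commitTask(context);
    } catch (IOException ioe) {
      // Clean up any partially moved output and surface the failure so the
      // attempt is marked FAILED rather than reported as done.
      committer.abortTask(context);
      throw ioe;
    }
  }
}
{code}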


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-837) harchive fail when output directory has URI with default port of 8020

2009-08-07 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12740697#action_12740697
 ] 

Koji Noguchi commented on MAPREDUCE-837:


bq. I'll create a separate Jira for the 0.20 job succeeding part.

Created MAPREDUCE-838

 harchive fail when output directory has URI with default port of 8020
 -

 Key: MAPREDUCE-837
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-837
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: harchive
Affects Versions: 0.20.1
Reporter: Koji Noguchi
Priority: Minor

 % hadoop archive -archiveName abc.har /user/knoguchi/abc 
 hdfs://mynamenode:8020/user/knoguchi
 doesn't work on 0.18 nor 0.20

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-839) unit test TestMiniMRChildTask fails on mac os-x

2009-08-07 Thread Hong Tang (JIRA)
unit test TestMiniMRChildTask fails on mac os-x
---

 Key: MAPREDUCE-839
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-839
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.21.0
Reporter: Hong Tang
Priority: Minor


The unit test TestMiniMRChildTask fails on Mac OS-X (10.5.8)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-825) JobClient completion poll interval of 5s causes slow tests in local mode

2009-08-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12740726#action_12740726
 ] 

Hadoop QA commented on MAPREDUCE-825:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12415772/MAPREDUCE-825.2.patch
  against trunk revision 801959.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/454/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/454/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/454/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/454/console

This message is automatically generated.

 JobClient completion poll interval of 5s causes slow tests in local mode
 

 Key: MAPREDUCE-825
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-825
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Aaron Kimball
Assignee: Aaron Kimball
Priority: Minor
 Attachments: completion-poll-interval.patch, MAPREDUCE-825.2.patch


 The JobClient.NetworkedJob.waitForCompletion() method polls for job 
 completion every 5 seconds. When running a set of short tests in 
 pseudo-distributed mode, this is unnecessarily slow and causes lots of wasted 
 time. When bandwidth is not scarce, setting the poll interval to 100 ms 
 results in a 4x speedup in some tests.  This interval should be parametrized 
 to allow users to control the interval for testing purposes.
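
For illustration, a hedged sketch of the proposed parametrization; the configuration key name below is an assumption, not necessarily what the patch uses:

{code}
// The key name "jobclient.completion.poll.interval" is an assumption.
import org.apache.hadoop.conf.Configuration;

class CompletionPollSketch {
  static int completionPollIntervalMillis(Configuration conf) {
    // Default stays at the current 5 seconds; tests can drop it to ~100 ms.
    return conf.getInt("jobclient.completion.poll.interval", 5000);
  }
}
{code}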

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-839) unit test TestMiniMRChildTask fails on mac os-x

2009-08-07 Thread Hong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12740727#action_12740727
 ] 

Hong Tang commented on MAPREDUCE-839:
-

The problem was discovered on Mac OS-X, but I tried to list the root causes, 
which could also affect non-Mac-OS-X platforms:

Line 66:  assertEquals(tmp, new 
Path(System.getProperty("java.io.tmpdir")).makeQualified(localFs).toString());
expected = file:/[private/]tmp/hadoop-htang/map..., actual = 
file:/[]tmp/hadoop-htang/map
Root cause: on Mac OS-X, /tmp is a symlink to /private/tmp. The test would 
probably also fail on normal unix systems if /tmp is symlinked.

Line 160:   assertTrue("LD doesnt contain pwd",  
System.getenv("LD_LIBRARY_PATH").contains(pwd));
Root cause: the environment variable for dynamic libraries on Mac OS-X is 
DYLD_LIBRARY_PATH instead of LD_LIBRARY_PATH
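
For illustration, a hedged sketch of an OS-aware way the test could pick the dynamic-library environment variable (the helper below is hypothetical):

{code}
// Hypothetical helper.
class DynamicLibPathSketch {
  static String dynamicLibraryPathVar() {
    String os = System.getProperty("os.name").toLowerCase();
    return os.contains("mac") ? "DYLD_LIBRARY_PATH" : "LD_LIBRARY_PATH";
  }
}
{code}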


 unit test TestMiniMRChildTask fails on mac os-x
 ---

 Key: MAPREDUCE-839
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-839
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.21.0
Reporter: Hong Tang
Priority: Minor

 The unit test TestMiniMRChildTask fails on Mac OS-X (10.5.8)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-825) JobClient completion poll interval of 5s causes slow tests in local mode

2009-08-07 Thread Aaron Kimball (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12740743#action_12740743
 ] 

Aaron Kimball commented on MAPREDUCE-825:
-

Failures are in streaming only.

 JobClient completion poll interval of 5s causes slow tests in local mode
 

 Key: MAPREDUCE-825
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-825
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Aaron Kimball
Assignee: Aaron Kimball
Priority: Minor
 Attachments: completion-poll-interval.patch, MAPREDUCE-825.2.patch


 The JobClient.NetworkedJob.waitForCompletion() method polls for job 
 completion every 5 seconds. When running a set of short tests in 
 pseudo-distributed mode, this is unnecessarily slow and causes lots of wasted 
 time. When bandwidth is not scarce, setting the poll interval to 100 ms 
 results in a 4x speedup in some tests.  This interval should be parametrized 
 to allow users to control the interval for testing purposes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-840) DBInputFormat leaves open transaction

2009-08-07 Thread Aaron Kimball (JIRA)
DBInputFormat leaves open transaction
-

 Key: MAPREDUCE-840
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-840
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Aaron Kimball
Assignee: Aaron Kimball
Priority: Minor


DBInputFormat.getSplits() does not call connection.commit() after the COUNT query. 
This can leave an open transaction against the database which interferes with 
other connections to the same table.
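
For illustration, a hedged sketch of the kind of change described, committing the read-only transaction once the COUNT result has been consumed (this is not necessarily the attached patch):

{code}
// Hypothetical helper showing only the commit placement.
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class CountQuerySketch {
  static long countRecords(Connection connection, String tableName) throws SQLException {
    Statement st = connection.createStatement();
    try {
      ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM " + tableName);
      rs.next();
      long count = rs.getLong(1);
      rs.close();
      connection.commit();   // end the transaction opened by the COUNT query
      return count;
    } finally {
      st.close();
    }
  }
}
{code}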

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-840) DBInputFormat leaves open transaction

2009-08-07 Thread Aaron Kimball (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Kimball updated MAPREDUCE-840:


Attachment: MAPREDUCE-840.patch

Attaching a trivial patch for this issue. No new tests because I've only seen 
this issue manifest when interacting with postgresql. I've verified that with 
this fix in place, it works with postgresql. The TestDBJob unit test also works.

 DBInputFormat leaves open transaction
 -

 Key: MAPREDUCE-840
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-840
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Aaron Kimball
Assignee: Aaron Kimball
Priority: Minor
 Attachments: MAPREDUCE-840.patch


 DBInputFormat.getSplits() does not call connection.commit() after the COUNT query. 
 This can leave an open transaction against the database which interferes with 
 other connections to the same table.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-840) DBInputFormat leaves open transaction

2009-08-07 Thread Aaron Kimball (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Kimball updated MAPREDUCE-840:


Status: Patch Available  (was: Open)

 DBInputFormat leaves open transaction
 -

 Key: MAPREDUCE-840
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-840
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Aaron Kimball
Assignee: Aaron Kimball
Priority: Minor
 Attachments: MAPREDUCE-840.patch


 DBInputFormat.getSplits() does not call connection.commit() after the COUNT query. 
 This can leave an open transaction against the database which interferes with 
 other connections to the same table.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-750) Extensible ConnManager factory API

2009-08-07 Thread Aaron Kimball (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Kimball updated MAPREDUCE-750:


Status: Patch Available  (was: Open)

 Extensible ConnManager factory API
 --

 Key: MAPREDUCE-750
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-750
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/sqoop
Reporter: Aaron Kimball
Assignee: Aaron Kimball
 Attachments: MAPREDUCE-750.2.patch, MAPREDUCE-750.3.patch, 
 MAPREDUCE-750.patch


 Sqoop uses the ConnFactory class to instantiate a ConnManager implementation 
 based on the connect string and other arguments supplied by the user. This 
 allows per-database logic to be encapsulated in different ConnManager 
 instances, and dynamically chosen based on which database the user is 
 actually importing from. But adding new ConnManager implementations requires 
 modifying the source of a common ConnFactory class. An indirection layer 
 should be used to delegate instantiation to a number of factory 
 implementations which can be specified in the static configuration or at 
 runtime.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-750) Extensible ConnManager factory API

2009-08-07 Thread Aaron Kimball (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Kimball updated MAPREDUCE-750:


Attachment: MAPREDUCE-750.3.patch

New patch resync'd with trunk

 Extensible ConnManager factory API
 --

 Key: MAPREDUCE-750
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-750
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/sqoop
Reporter: Aaron Kimball
Assignee: Aaron Kimball
 Attachments: MAPREDUCE-750.2.patch, MAPREDUCE-750.3.patch, 
 MAPREDUCE-750.patch


 Sqoop uses the ConnFactory class to instantiate a ConnManager implementation 
 based on the connect string and other arguments supplied by the user. This 
 allows per-database logic to be encapsulated in different ConnManager 
 instances, and dynamically chosen based on which database the user is 
 actually importing from. But adding new ConnManager implementations requires 
 modifying the source of a common ConnFactory class. An indirection layer 
 should be used to delegate instantiation to a number of factory 
 implementations which can be specified in the static configuration or at 
 runtime.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-750) Extensible ConnManager factory API

2009-08-07 Thread Aaron Kimball (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Kimball updated MAPREDUCE-750:


Status: Open  (was: Patch Available)

 Extensible ConnManager factory API
 --

 Key: MAPREDUCE-750
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-750
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/sqoop
Reporter: Aaron Kimball
Assignee: Aaron Kimball
 Attachments: MAPREDUCE-750.2.patch, MAPREDUCE-750.3.patch, 
 MAPREDUCE-750.patch


 Sqoop uses the ConnFactory class to instantiate a ConnManager implementation 
 based on the connect string and other arguments supplied by the user. This 
 allows per-database logic to be encapsulated in different ConnManager 
 instances, and dynamically chosen based on which database the user is 
 actually importing from. But adding new ConnManager implementations requires 
 modifying the source of a common ConnFactory class. An indirection layer 
 should be used to delegate instantiation to a number of factory 
 implementations which can be specified in the static configuration or at 
 runtime.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-799) Some of MRUnit's self-tests were not being run

2009-08-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12740808#action_12740808
 ] 

Hadoop QA commented on MAPREDUCE-799:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12414378/MAPREDUCE-799.patch
  against trunk revision 801959.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 9 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/455/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/455/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/455/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/455/console

This message is automatically generated.

 Some of MRUnit's self-tests were not being run
 --

 Key: MAPREDUCE-799
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-799
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Aaron Kimball
Assignee: Aaron Kimball
 Attachments: MAPREDUCE-799.patch


 Due to method naming issues, some test cases were not being executed.
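
For context, a sketch of how a naming issue can silently hide tests, assuming 
JUnit 3-style test classes for MRUnit's self-tests; the class and method names 
below are illustrative, not taken from the patch.

{code:java}
// Illustrative only: under JUnit 3, the runner discovers test methods by name,
// so a public void no-arg method that does not start with "test" never runs.
import junit.framework.TestCase;

public class ExampleSelfTest extends TestCase {

  public void testEmptyInput() {          // picked up and run
    assertTrue(true);
  }

  public void emptyInputWithCombiner() {  // compiles fine, but silently skipped
    assertTrue(true);
  }
}
{code}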

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-799) Some of MRUnit's self-tests were not being run

2009-08-07 Thread Aaron Kimball (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12740816#action_12740816
 ] 

Aaron Kimball commented on MAPREDUCE-799:
-

The contrib test failures are only in streaming.

 Some of MRUnit's self-tests were not being run
 --

 Key: MAPREDUCE-799
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-799
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Aaron Kimball
Assignee: Aaron Kimball
 Attachments: MAPREDUCE-799.patch


 Due to method naming issues, some test cases were not being executed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-64) Map-side sort is hampered by io.sort.record.percent

2009-08-07 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-64?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12740837#action_12740837
 ] 

Todd Lipcon commented on MAPREDUCE-64:
--

Hi Arun,

Have you guys worked on this at all already? I'm interested in playing around 
with rewriting part of the map-side sort to get rid of this tunable. Like you 
said, for a lot of applications the default values are *way* off. 350K records 
in 95MB = 271 bytes average record size, which is probably larger than the 
records in the majority of jobs we see in practice. If you have already worked 
on this I don't want to duplicate your effort, but if not, I think it would be 
a good step towards better average performance without expert tuning.

 Map-side sort is hampered by io.sort.record.percent
 ---

 Key: MAPREDUCE-64
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-64
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Arun C Murthy

 Currently io.sort.record.percent is a fairly obscure, per-job configurable, 
 expert-level parameter which controls how much accounting space is available 
 for records in the map-side sort buffer (io.sort.mb). Typical values for 
 io.sort.mb (100) and io.sort.record.percent (0.05) imply that we can store 
 ~350,000 records in the buffer before necessitating a sort/combine/spill.
 However, for many applications which deal with small records, e.g. the 
 world-famous wordcount and its family, this implies we can only use 5-10% of 
 io.sort.mb, i.e. 5-10 MB, before we spill, in spite of having _much_ more 
 memory available in the sort buffer. Wordcount, for example, results in ~12 
 spills (given an HDFS block size of 64M). The presence of a combiner 
 exacerbates the problem by piling on extra serialization/deserialization of 
 records too...
 Sure, jobs can configure io.sort.record.percent, but it's tedious and 
 obscure; we really can do better by getting the framework to automagically 
 pick it by using all available memory (up to io.sort.mb) for either the data 
 or accounting.
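
 To make those figures concrete, a quick back-of-the-envelope check follows. 
 The 16 accounting bytes per record is an assumption (the figure usually quoted 
 for the MapOutputBuffer of this era), not something stated in this issue.

{code:java}
// Back-of-the-envelope check of the defaults above; the 16 accounting bytes
// per record is an assumption, not taken from this issue.
public class SortBufferMath {
  public static void main(String[] args) {
    long ioSortBytes = 100L * 1024 * 1024;                        // io.sort.mb = 100
    double recordPercent = 0.05;                                  // io.sort.record.percent
    long accountingBytes = (long) (ioSortBytes * recordPercent);  // ~5 MB
    long maxRecords = accountingBytes / 16;                       // ~327,680, i.e. "~350,000"
    long dataBytes = ioSortBytes - accountingBytes;               // ~95 MB left for record data

    // Wordcount-like records of ~20 serialized bytes hit the accounting limit
    // after only ~6.5 MB of data: the "5-10% of io.sort.mb" in the description.
    long wordcountDataUsed = maxRecords * 20L;

    // Records must average roughly dataBytes / maxRecords (~300 bytes, the same
    // ballpark as the ~271 bytes cited in the comment above) before the data
    // buffer, rather than the accounting buffer, is what forces a spill.
    long breakEvenRecordSize = dataBytes / maxRecords;

    System.out.println("max records before spill: " + maxRecords);
    System.out.println("data used by ~20-byte records: " + wordcountDataUsed + " bytes");
    System.out.println("break-even record size: " + breakEvenRecordSize + " bytes");
  }
}
{code}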

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.