[jira] Commented: (MAPREDUCE-856) Localized files from DistributedCache should have right access-control

2009-08-24 Thread Hemanth Yamijala (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12746749#action_12746749
 ] 

Hemanth Yamijala commented on MAPREDUCE-856:


Looked at the patch. I have a few comments:

- Make Localizer an instance class; in general, that's a more flexible 
design, and the localizer needs to maintain state anyway.
- I would recommend that initializeUserDirs take the taskcontroller instead of 
the tasktracker, as the localizer does not need the entire tasktracker 
interface, at least for now.
- In HADOOP-4491, if the user directory could not be created on any disk, we 
failed localization. I think that's a useful feature to have.
- Synchronization w.r.t. user localization needs to be looked at. 
-- It is possible right now that while user localization is in progress for a 
user, another task for the same user gets launched before the localization 
completes. 
-- Also, is the object on which we are locking guaranteed to be a unique 
instance for every user?
- A race condition exists between creation and deletion of user directories. Say 
a job requires a user dir and has not yet localized files (and consequently 
hasn't acquired the synchronization lock). If deletion starts at that time, it 
could delete the user dir.
- Also, I think it would be good to check for cleaning up user directories at a 
much slower pace, as this involves some costly operations.
- I think JobConf.setUserAndGroupNamesForJob need not be static. Also, it would 
be nice to document that this is mainly used in test cases.
- User directory can be 570. So also distributed cache directory (no need even 
for setuid, right?)
- The changes in MAPREDUCE-871 need to be synced up in this patch as well.
- Some tests like TestTaskControllerSetup are disabled. Can you please re-enable 
them?
- Permission checks are needed for the user, jobcache and archive directories.
- Test cases should also confirm that directories in localized distributed cache 
paths are being set to the right permissions. 
- Can we use testManagerFlow to have templates that can be overridden by the 
LinuxTaskController test class?
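
To make the two synchronization questions above concrete, here is a minimal sketch of one way to get a single lock instance per user and to make later tasks wait for localization to finish. The class and method names (UserLocalizationGuard, ensureLocalized) are hypothetical and are not taken from the patch.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: one canonical state object per user, so every task of
// that user synchronizes on the same instance.
final class UserLocalizationGuard {
    private static final class UserState {
        boolean localized; // guarded by the UserState monitor
    }

    private final ConcurrentMap<String, UserState> states = new ConcurrentHashMap<>();

    /**
     * Runs the localization action at most once per user. Callers for the same
     * user block until the first caller has finished, so no task proceeds
     * against a half-initialized user directory.
     * Returns true if this call performed the localization.
     */
    boolean ensureLocalized(String user, Runnable localizeAction) {
        // computeIfAbsent yields exactly one UserState instance per user name.
        UserState state = states.computeIfAbsent(user, u -> new UserState());
        synchronized (state) {
            if (state.localized) {
                return false;
            }
            localizeAction.run();
            state.localized = true;
            return true;
        }
    }
}
```

Under the same assumption, a directory-deletion path would have to take the same per-user lock before removing anything, which is one way to close the creation/deletion race described above.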

 Localized files from DistributedCache should have right access-control
 --

 Key: MAPREDUCE-856
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-856
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: tasktracker
Reporter: Arun C Murthy
Assignee: Vinod K V
 Attachments: MAPREDUCE-856-20090820.txt, MAPREDUCE-856-20090821.txt




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-909) Shell$ExitCodeException while killing/failing a task.

2009-08-24 Thread Suman Sehgal (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12746759#action_12746759
 ] 

Suman Sehgal commented on MAPREDUCE-909:


This exception is suppressed in 0.21, but it should appear in the 0.21 logs 
as well when a task is killed or failed.

 Shell$ExitCodeException while killing/failing a task.
 -

 Key: MAPREDUCE-909
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-909
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.21.0
Reporter: Suman Sehgal
Priority: Minor

 Encountered Shell$ExitCodeException in TT logs while killing/failing a job 
 on 0.20.1
 Stack Trace:
 =
 2009-08-22 16:37:05,867 INFO org.apache.hadoop.mapred.TaskTracker: About to purge task: attempt_200908200732_0541_m_03_1
 2009-08-22 16:37:06,030 WARN org.apache.hadoop.mapred.LinuxTaskController: Exception thrown while launching task JVM : 
 org.apache.hadoop.util.Shell$ExitCodeException:
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:245)
 at org.apache.hadoop.util.Shell.run(Shell.java:172)
 at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:365)
 at org.apache.hadoop.mapred.LinuxTaskController.launchTaskJVM(LinuxTaskController.java:156)
 at org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:397)
 at org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:386)
 2009-08-22 16:37:06,030 WARN org.apache.hadoop.mapred.LinuxTaskController: Exit code from task is : 143
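
As a reading aid for the exit code in the log above: by common shell convention, an exit status above 128 means the process was terminated by a signal, and 143 corresponds to signal 15 (SIGTERM), which fits a task being killed. The helper below is a hypothetical sketch for decoding such codes; it is not part of Hadoop.

```java
// Hypothetical helper, not part of Hadoop.
final class ExitCodes {
    /**
     * Returns the signal number implied by a shell-style exit status
     * (status - 128), or -1 if the status does not encode a signal.
     */
    static int signalFromExitCode(int exitCode) {
        return (exitCode > 128 && exitCode < 256) ? exitCode - 128 : -1;
    }
}
```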




[jira] Updated: (MAPREDUCE-909) Shell$ExitCodeException while killing/failing a task.

2009-08-24 Thread Suman Sehgal (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suman Sehgal updated MAPREDUCE-909:
---

Priority: Trivial  (was: Minor)

 Shell$ExitCodeException while killing/failing a task.
 -

 Key: MAPREDUCE-909
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-909
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.21.0
Reporter: Suman Sehgal
Priority: Trivial

 Encountered Shell$ExitCodeException in TT logs while killing/failing a job 
 on 0.20.1
 Stack Trace:
 =
 2009-08-22 16:37:05,867 INFO org.apache.hadoop.mapred.TaskTracker: About to purge task: attempt_200908200732_0541_m_03_1
 2009-08-22 16:37:06,030 WARN org.apache.hadoop.mapred.LinuxTaskController: Exception thrown while launching task JVM : 
 org.apache.hadoop.util.Shell$ExitCodeException:
 at org.apache.hadoop.util.Shell.runCommand(Shell.java:245)
 at org.apache.hadoop.util.Shell.run(Shell.java:172)
 at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:365)
 at org.apache.hadoop.mapred.LinuxTaskController.launchTaskJVM(LinuxTaskController.java:156)
 at org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.runChild(JvmManager.java:397)
 at org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.run(JvmManager.java:386)
 2009-08-22 16:37:06,030 WARN org.apache.hadoop.mapred.LinuxTaskController: Exit code from task is : 143




[jira] Commented: (MAPREDUCE-861) Modify queue configuration format and parsing to support a hierarchy of queues.

2009-08-24 Thread rahul k singh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12746763#action_12746763
 ] 

rahul k singh commented on MAPREDUCE-861:
-

A small correction to the above:
getLeafQueues is actually getLeafQueueNames().

Some more clarification on the names of queues:

1. The name of a queue would be of the form parent.child.grandChild. 
For example:
{code:xml}
<queue>
  <name>q</name>
  <queue>
    <name>p</name>
  </queue>
</queue>
{code}

In the above example, there are 2 queues: q is a root-level queue and p is 
a child of q.
The name of queue q would be q.
The name of queue p would be q.p. 
We would always use this fully qualified name in the implementation.

Users cannot name a queue like <queue-name>.<queue-name>, as . is used as the 
separator.
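
The naming rule above can be sketched as follows. This is only an illustration: QueueNode and qualifiedNames are hypothetical names, and the separator is passed in as a parameter rather than fixed, so the sketch does not depend on which character is finally chosen.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of fully qualified queue names; not the patch's code.
final class QueueNode {
    final String name;
    final List<QueueNode> children = new ArrayList<>();

    QueueNode(String name, String separator) {
        // A simple name must not contain the separator, or qualified names
        // would become ambiguous.
        if (name.isEmpty() || name.contains(separator)) {
            throw new IllegalArgumentException("Invalid queue name: " + name);
        }
        this.name = name;
    }

    QueueNode addChild(QueueNode child) {
        children.add(child);
        return this;
    }

    /** Collects the fully qualified names of this queue and all descendants. */
    static List<String> qualifiedNames(QueueNode root, String separator) {
        List<String> out = new ArrayList<>();
        collect(root, root.name, separator, out);
        return out;
    }

    private static void collect(QueueNode node, String qualified, String sep, List<String> out) {
        out.add(qualified);
        for (QueueNode child : node.children) {
            collect(child, qualified + sep + child.name, sep, out);
        }
    }
}
```

For the two-queue example, a root q with a child p and "." as the separator yields the names q and q.p, while a simple name such as a.b is rejected.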







 Modify queue configuration format and parsing to support a hierarchy of 
 queues.
 ---

 Key: MAPREDUCE-861
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-861
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Hemanth Yamijala
Assignee: rahul k singh

 MAPREDUCE-853 proposes to introduce a hierarchy of queues into the Map/Reduce 
 framework. This JIRA is for defining changes to the configuration related to 
 queues. 
 The current format for defining a queue and its properties is as follows: 
 mapred.queue.<queue-name>.<property-name>. For example, 
 mapred.queue.<queue-name>.acl-submit-job. The reason for using this verbose 
 format was to be able to reuse the Configuration parser in Hadoop. However, 
 administrators currently using the queue configuration have already indicated 
 a very strong desire for a more manageable format. Since this becomes more 
 unwieldy with hierarchical queues, this may be a good time to introduce a new 
 format for representing queue configuration.




[jira] Commented: (MAPREDUCE-861) Modify queue configuration format and parsing to support a hierarchy of queues.

2009-08-24 Thread rahul k singh (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12746771#action_12746771
 ] 

rahul k singh commented on MAPREDUCE-861:
-

After a local discussion about the separator, there was an agreement over : .


 Modify queue configuration format and parsing to support a hierarchy of 
 queues.
 ---

 Key: MAPREDUCE-861
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-861
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Hemanth Yamijala
Assignee: rahul k singh

 MAPREDUCE-853 proposes to introduce a hierarchy of queues into the Map/Reduce 
 framework. This JIRA is for defining changes to the configuration related to 
 queues. 
 The current format for defining a queue and its properties is as follows: 
 mapred.queue.<queue-name>.<property-name>. For example, 
 mapred.queue.<queue-name>.acl-submit-job. The reason for using this verbose 
 format was to be able to reuse the Configuration parser in Hadoop. However, 
 administrators currently using the queue configuration have already indicated 
 a very strong desire for a more manageable format. Since this becomes more 
 unwieldy with hierarchical queues, this may be a good time to introduce a new 
 format for representing queue configuration.




[jira] Updated: (MAPREDUCE-430) Task stuck in cleanup with OutOfMemoryErrors

2009-08-24 Thread Amar Kamat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat updated MAPREDUCE-430:
-

Attachment: MAPREDUCE-430-v1.11.patch

Attaching a patch that does what was last discussed.
This is what the patch does:
- The tasktracker now provides fatalError() to report fatal errors from the child
- Child/ReduceTask/MapTask now catch Throwable and invoke 
umbilical.fatalError(). If this fails, System.exit() is invoked. 
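
The control flow described in the two points above might look roughly like the sketch below. Only the name fatalError() comes from the comment; the Umbilical interface, its signature, the ChildRunner class and the exit status are assumptions made for illustration. The exit call is injected so the fallback can be exercised without terminating the JVM.

```java
import java.io.IOException;
import java.util.function.IntConsumer;

// Hypothetical stand-in for the child-to-tasktracker umbilical; the
// signature of fatalError() is assumed.
interface Umbilical {
    void fatalError(String taskId, String message) throws IOException;
}

final class ChildRunner {
    /**
     * Runs the task body; on any Throwable, reports it through the umbilical.
     * If the report itself fails, terminates via the supplied exit hook
     * (System::exit in a real child JVM).
     */
    static void runTask(String taskId, Runnable body, Umbilical umbilical, IntConsumer exit) {
        try {
            body.run();
        } catch (Throwable t) {
            try {
                umbilical.fatalError(taskId, String.valueOf(t));
            } catch (Throwable reportFailure) {
                exit.accept(1); // exit status 1 is an assumed value
            }
        }
    }
}
```

In a real child process the last argument would be System::exit; passing it in keeps the sketch testable.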

 Result of test-patch
 [exec] +1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 6 new or modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.

Running ant-tests.

 Task stuck in cleanup with OutOfMemoryErrors
 

 Key: MAPREDUCE-430
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-430
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Amareshwari Sriramadasu
Assignee: Amar Kamat
 Fix For: 0.20.1

 Attachments: MAPREDUCE-430-v1.11.patch, 
 MAPREDUCE-430-v1.6-branch-0.20.patch, MAPREDUCE-430-v1.6.patch, 
 MAPREDUCE-430-v1.7.patch, MAPREDUCE-430-v1.8.patch


 Observed a task with an OutOfMemory error, stuck in cleanup.




[jira] Resolved: (MAPREDUCE-767) to remove mapreduce dependency on commons-cli2

2009-08-24 Thread Devaraj Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das resolved MAPREDUCE-767.
---

Resolution: Fixed

I committed to trunk a fix to handle -debug (that was missed in the earlier 
patch). 
I committed the patch for 0.20 to the 0.20 branch. Thanks, Amar!

 to remove mapreduce dependency on commons-cli2
 --

 Key: MAPREDUCE-767
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-767
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/streaming
Affects Versions: 0.20.1
Reporter: Giridharan Kesavan
Assignee: Amar Kamat
 Fix For: 0.20.1

 Attachments: MAPREDUCE-767-v1.1.patch, MAPREDUCE-767-v1.2.patch, 
 MAPREDUCE-767-v1.3-branch-0.20.patch, MAPREDUCE-767-v1.3.patch


 mapreduce, streaming and the eclipse plugin depend on commons-cli2 




[jira] Commented: (MAPREDUCE-768) Configuration information should generate dump in a standard format.

2009-08-24 Thread Hemanth Yamijala (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12746780#action_12746780
 ] 

Hemanth Yamijala commented on MAPREDUCE-768:


bq. Because Config can pull in JVM properties, you do need to do the expansion 
on the host that is using the configuration.

The current scope of this JIRA is to do the dump on the host that is using the 
configuration. Hence, this is covered in HADOOP-6184.

bq. It seems sensible to make this a general purpose Tools option,, print my 
config to stdout, so that anyone using any tool can see the values
bq. It's also handy to be able to ask a remote service endpoint for their 
config -any node, master or slave, should be able to serve up the config to 
someone it trusts. Which introduces one small problem -only users with admin 
rights should be allowed to see the configurations, in case they contain 
passwords or other sensitive topics.

These two are good points and I think we should do them as incremental work. I 
recommend we think about filing another JIRA for them after this goes in.

 Configuration information should generate dump in a standard format.
 

 Key: MAPREDUCE-768
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-768
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: rahul k singh
 Attachments: MAPREDUCE-768-1.patch, MAPREDUCE-768-2.patch, 
 MAPREDUCE-768.patch


  We need to generate the configuration dump in a standard format.




[jira] Updated: (MAPREDUCE-807) Stray user files in mapred.system.dir with permissions other than 777 can prevent the jobtracker from starting up.

2009-08-24 Thread Amar Kamat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat updated MAPREDUCE-807:
-

Attachment: MAPREDUCE-807-v1.6-branch-0.20.patch

Attaching a patch for branch 0.20. 

 Stray user files in mapred.system.dir with permissions other than 777 can 
 prevent the jobtracker from starting up.
 --

 Key: MAPREDUCE-807
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-807
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Reporter: Amar Kamat
Assignee: Amar Kamat
Priority: Blocker
 Attachments: MAPRED-807-v1.1.patch, MAPRED-807-v1.2.patch, 
 MAPRED-807-v1.3.patch, MAPRED-807-v1.4.patch, MAPRED-807-v1.6.patch, 
 MAPREDUCE-807-v1.6-branch-0.20.patch


 With restart disabled, the jobtracker does a _rm -rf_ of the 
 mapred.system.dir. If the mapred.system.dir contains user files with 
 permissions other than 777 then the jobtracker gets stuck in a loop trying to 
 delete the mapred.system.dir (and each time failing with 
 AccessControlException).




[jira] Updated: (MAPREDUCE-768) Configuration information should generate dump in a standard format.

2009-08-24 Thread V.V.Chaitanya Krishna (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

V.V.Chaitanya Krishna updated MAPREDUCE-768:


Attachment: MAPREDUCE-768-3.patch

The previous patch is not compatible with the recent updates in mapreduce. 
Uploading a patch with this issue resolved.

 Configuration information should generate dump in a standard format.
 

 Key: MAPREDUCE-768
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-768
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: rahul k singh
 Attachments: MAPREDUCE-768-1.patch, MAPREDUCE-768-2.patch, 
 MAPREDUCE-768-3.patch, MAPREDUCE-768.patch


  We need to generate the configuration dump in a standard format.




[jira] Commented: (MAPREDUCE-768) Configuration information should generate dump in a standard format.

2009-08-24 Thread Hemanth Yamijala (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12746786#action_12746786
 ] 

Hemanth Yamijala commented on MAPREDUCE-768:


I think we need a new patch, because the one currently on the JIRA does not 
apply.

However, I briefly looked at the patch and have a few minor comments:

- I think JobTracker.dumpConfiguration should not take JobConf as a parameter. 
It should create one inside the call.
- Similarly, QueueManager.dumpConfiguration should also not take a JobConf. 
Further, it should not load the default resources, because otherwise, the 
JobTracker's configuration would get dumped twice.
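
The two suggestions above can be illustrated with a small self-contained sketch. ConfSketch and DumpSketch are hypothetical stand-ins, not the JobConf, JobTracker or QueueManager classes, and the property entries are placeholder values chosen for the example. The loadDefaults flag mirrors the idea of constructing a configuration without default resources, which Hadoop's Configuration supports through a boolean constructor argument.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a configuration object.
final class ConfSketch {
    final Map<String, String> props = new TreeMap<>();

    /** loadDefaults=false models a configuration created without default resources. */
    ConfSketch(boolean loadDefaults) {
        if (loadDefaults) {
            props.put("mapred.job.tracker", "local"); // placeholder default entry
        }
    }

    String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }
}

final class DumpSketch {
    /** The tracker-level dump creates its own configuration instead of taking one. */
    static String dumpTrackerConfiguration() {
        return new ConfSketch(true).dump();
    }

    /** The queue-level dump skips defaults so tracker settings are not repeated. */
    static String dumpQueueConfiguration() {
        ConfSketch queueConf = new ConfSketch(false);
        queueConf.props.put("mapred.queue.names", "default"); // placeholder queue-only entry
        return queueConf.dump();
    }
}
```

Concatenating the two dumps then lists each property once; had the queue-level configuration loaded defaults as well, the tracker entry would appear twice.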


 Configuration information should generate dump in a standard format.
 

 Key: MAPREDUCE-768
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-768
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: rahul k singh
 Attachments: MAPREDUCE-768-1.patch, MAPREDUCE-768-2.patch, 
 MAPREDUCE-768-3.patch, MAPREDUCE-768.patch


  We need to generate the configuration dump in a standard format.




[jira] Commented: (MAPREDUCE-370) Change org.apache.hadoop.mapred.lib.MultipleOutputs to use new api.

2009-08-24 Thread Tom White (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12746801#action_12746801
 ] 

Tom White commented on MAPREDUCE-370:
-

Could the counter name be based on the named output, rather than the base 
filename?

bq. if user doesn't give unique name for the output file, there are chances 
that output will be garbled.

This is true, but like MultipleOutputFormat it would be up to the application 
to give unique names to the output files. Most users would use the simpler form 
that takes a named output and lets MultipleOutputs construct the output 
filename {{{namedOutput}-(m|r)-{part-number}}}, but this change I'm proposing 
would allow advanced users to control the precise filename of the outputs.

 Change org.apache.hadoop.mapred.lib.MultipleOutputs to use new api.
 ---

 Key: MAPREDUCE-370
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-370
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Fix For: 0.21.0

 Attachments: patch-370-1.txt, patch-370.txt







[jira] Updated: (MAPREDUCE-807) Stray user files in mapred.system.dir with permissions other than 777 can prevent the jobtracker from starting up.

2009-08-24 Thread Amar Kamat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat updated MAPREDUCE-807:
-

Attachment: MAPREDUCE-807-v1.7-branch-0.20.patch
MAPRED-807-v1.7.patch

Attaching new patches after Devaraj's offline comments.

 Stray user files in mapred.system.dir with permissions other than 777 can 
 prevent the jobtracker from starting up.
 --

 Key: MAPREDUCE-807
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-807
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Reporter: Amar Kamat
Assignee: Amar Kamat
Priority: Blocker
 Attachments: MAPRED-807-v1.1.patch, MAPRED-807-v1.2.patch, 
 MAPRED-807-v1.3.patch, MAPRED-807-v1.4.patch, MAPRED-807-v1.6.patch, 
 MAPRED-807-v1.7.patch, MAPREDUCE-807-v1.6-branch-0.20.patch, 
 MAPREDUCE-807-v1.7-branch-0.20.patch


 With restart disabled, the jobtracker does a _rm -rf_ of the 
 mapred.system.dir. If the mapred.system.dir contains user files with 
 permissions other than 777 then the jobtracker gets stuck in a loop trying to 
 delete the mapred.system.dir (and each time failing with 
 AccessControlException).




[jira] Commented: (MAPREDUCE-370) Change org.apache.hadoop.mapred.lib.MultipleOutputs to use new api.

2009-08-24 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12746811#action_12746811
 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-370:
---

bq. Could the counter name be based on the named output, rather than the base 
filename?
Possible. But counters will be maintained only for named outputs. 

bq. but this change I'm proposing would allow advanced users to control the 
precise filename of the outputs.
I think these users can override FileOutputFormat.getDefaultWorkFile to control 
the precise filename.

 Change org.apache.hadoop.mapred.lib.MultipleOutputs to use new api.
 ---

 Key: MAPREDUCE-370
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-370
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Fix For: 0.21.0

 Attachments: patch-370-1.txt, patch-370.txt







[jira] Updated: (MAPREDUCE-370) Change org.apache.hadoop.mapred.lib.MultipleOutputs to use new api.

2009-08-24 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-370:
--

Status: Patch Available  (was: Open)

 Change org.apache.hadoop.mapred.lib.MultipleOutputs to use new api.
 ---

 Key: MAPREDUCE-370
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-370
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Fix For: 0.21.0

 Attachments: patch-370-1.txt, patch-370-2.txt, patch-370.txt







[jira] Commented: (MAPREDUCE-370) Change org.apache.hadoop.mapred.lib.MultipleOutputs to use new api.

2009-08-24 Thread Tom White (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12746816#action_12746816
 ] 

Tom White commented on MAPREDUCE-370:
-

bq. I think these users can override FileOutputFormat.getDefaultWorkFile to 
control the precise filename.

This is true. So to have complete control over the output filename you would 
call the write method with a base output path of the name you want (possibly 
using the key and value to construct it). You would then override 
FileOutputFormat.getDefaultWorkFile() to omit the {m,r}-n suffix.

We could make this slightly easier in the future perhaps (by putting it in the 
MultipleOutputs API, for example), but I think the current approach is 
reasonable.
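
The override idea discussed above can be sketched with stand-in classes. These are not the Hadoop FileOutputFormat or MultipleOutputs classes: SketchOutputFormat, ExactNameOutputFormat and workFileName are hypothetical, and the five-digit zero padding of the part number is an assumption made for the example.

```java
import java.util.Locale;

// Hypothetical stand-ins for illustration; these are not the Hadoop classes.
class SketchOutputFormat {
    /** Default: base name plus a task-type and part-number suffix. */
    String workFileName(String baseName, boolean isMap, int partition) {
        // Five-digit zero padding is an assumption made for this sketch.
        return String.format(Locale.ROOT, "%s-%s-%05d", baseName, isMap ? "m" : "r", partition);
    }
}

final class ExactNameOutputFormat extends SketchOutputFormat {
    /** Override: use the application-supplied base name unchanged. */
    @Override
    String workFileName(String baseName, boolean isMap, int partition) {
        return baseName;
    }
}
```

With the override in place, two tasks that pass the same base name would write to the same file, so uniqueness becomes the application's responsibility, as noted earlier in the thread.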

 Change org.apache.hadoop.mapred.lib.MultipleOutputs to use new api.
 ---

 Key: MAPREDUCE-370
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-370
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Fix For: 0.21.0

 Attachments: patch-370-1.txt, patch-370-2.txt, patch-370.txt







[jira] Resolved: (MAPREDUCE-807) Stray user files in mapred.system.dir with permissions other than 777 can prevent the jobtracker from starting up.

2009-08-24 Thread Devaraj Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das resolved MAPREDUCE-807.
---

   Resolution: Fixed
Fix Version/s: 0.20.1

I just committed this. Thanks, Amar!

 Stray user files in mapred.system.dir with permissions other than 777 can 
 prevent the jobtracker from starting up.
 --

 Key: MAPREDUCE-807
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-807
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Reporter: Amar Kamat
Assignee: Amar Kamat
Priority: Blocker
 Fix For: 0.20.1

 Attachments: MAPRED-807-v1.1.patch, MAPRED-807-v1.2.patch, 
 MAPRED-807-v1.3.patch, MAPRED-807-v1.4.patch, MAPRED-807-v1.6.patch, 
 MAPRED-807-v1.7.patch, MAPREDUCE-807-v1.6-branch-0.20.patch, 
 MAPREDUCE-807-v1.7-branch-0.20.patch


 With restart disabled, the jobtracker does a _rm -rf_ of the 
 mapred.system.dir. If the mapred.system.dir contains user files with 
 permissions other than 777 then the jobtracker gets stuck in a loop trying to 
 delete the mapred.system.dir (and each time failing with 
 AccessControlException).




[jira] Updated: (MAPREDUCE-370) Change org.apache.hadoop.mapred.lib.MultipleOutputs to use new api.

2009-08-24 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-370:
--

Attachment: patch-370-2.txt

Patch changing checkTokenName() and checkbaseOutputPath() to be private. 

 Change org.apache.hadoop.mapred.lib.MultipleOutputs to use new api.
 ---

 Key: MAPREDUCE-370
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-370
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Fix For: 0.21.0

 Attachments: patch-370-1.txt, patch-370-2.txt, patch-370.txt







[jira] Commented: (MAPREDUCE-856) Localized files from DistributedCache should have right access-control

2009-08-24 Thread Hemanth Yamijala (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12746822#action_12746822
 ] 

Hemanth Yamijala commented on MAPREDUCE-856:


bq. User directory can be 570. So also distributed cache directory (no need 
even for setuid, right ?)

I meant setgid. However, that may be required, as we realized in an internal 
discussion.
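
For readers less familiar with octal modes, the sketch below decodes a mode such as 570 into its symbolic form and checks for the setgid bit (octal 2000). ModeSketch is a hypothetical helper; only PosixFilePermissions.fromString is a standard JDK call.

```java
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Hypothetical helper for reading octal permission modes.
final class ModeSketch {
    /** Converts the low nine bits of an octal mode such as 0570 to "r-xrwx---". */
    static String toSymbolic(int mode) {
        char[] flags = {'r', 'w', 'x'};
        StringBuilder sb = new StringBuilder(9);
        for (int bit = 8; bit >= 0; bit--) {
            boolean set = (mode & (1 << bit)) != 0;
            sb.append(set ? flags[(8 - bit) % 3] : '-');
        }
        return sb.toString();
    }

    /** True if the setgid bit (octal 02000) is present in the mode. */
    static boolean hasSetgid(int mode) {
        return (mode & 02000) != 0;
    }

    /** Maps the mode to the JDK's permission set via its symbolic form. */
    static Set<PosixFilePermission> toPermissions(int mode) {
        return PosixFilePermissions.fromString(toSymbolic(mode));
    }
}
```

Mode 570 therefore grants read and execute to the owner, full access to the group, and nothing to others; 2570 is the same with setgid added.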

 Localized files from DistributedCache should have right access-control
 --

 Key: MAPREDUCE-856
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-856
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: tasktracker
Reporter: Arun C Murthy
Assignee: Vinod K V
 Attachments: MAPREDUCE-856-20090820.txt, MAPREDUCE-856-20090821.txt







[jira] Commented: (MAPREDUCE-476) extend DistributedCache to work locally (LocalJobRunner)

2009-08-24 Thread Tom White (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12746824#action_12746824
 ] 

Tom White commented on MAPREDUCE-476:
-

Sorry Philip, but I've just noticed that the testFileSystemOtherThanDefault() 
test from TestDistributedCache (introduced in HADOOP-5635) got missed during 
the move to TestTrackerDistributedCacheManager.

 extend DistributedCache to work locally (LocalJobRunner)
 

 Key: MAPREDUCE-476
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-476
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: sam rash
Assignee: Philip Zeyliger
Priority: Minor
 Attachments: HADOOP-2914-v1-full.patch, 
 HADOOP-2914-v1-since-4041.patch, HADOOP-2914-v2.patch, HADOOP-2914-v3.patch, 
 MAPREDUCE-476-20090814.1.txt, MAPREDUCE-476-20090818.txt, 
 MAPREDUCE-476-v2-vs-v3.patch, MAPREDUCE-476-v2-vs-v3.try2.patch, 
 MAPREDUCE-476-v2-vs-v4.txt, MAPREDUCE-476-v2.patch, MAPREDUCE-476-v3.patch, 
 MAPREDUCE-476-v3.try2.patch, MAPREDUCE-476-v4-requires-MR711.patch, 
 MAPREDUCE-476-v5-requires-MR711.patch, MAPREDUCE-476-v7.patch, 
 MAPREDUCE-476-v8.patch, MAPREDUCE-476.patch, v6-to-v7.patch


 The DistributedCache does not work locally when using the outlined recipe at 
 http://hadoop.apache.org/core/docs/r0.16.0/api/org/apache/hadoop/filecache/DistributedCache.html
  
 Ideally, LocalJobRunner would take care of populating the JobConf and copying 
 remote files to the local file system (http, assume hdfs = default fs = local 
 fs when doing local development).




[jira] Updated: (MAPREDUCE-798) MRUnit should be able to test a succession of MapReduce passes

2009-08-24 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated MAPREDUCE-798:


   Resolution: Fixed
Fix Version/s: 0.21.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

I've just committed this. Thanks Aaron!

 MRUnit should be able to test a succession of MapReduce passes
 --

 Key: MAPREDUCE-798
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-798
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Aaron Kimball
Assignee: Aaron Kimball
 Fix For: 0.21.0

 Attachments: MAPREDUCE-798.2.patch, MAPREDUCE-798.3.patch, 
 MAPREDUCE-798.patch


 MRUnit can currently test that the inputs to a given (mapper, reducer) job 
 produce certain outputs at the end of the reducer. It would be good to 
 support more end-to-end tests of a series of MapReduce jobs that form a 
 longer pipeline surrounding some data.




[jira] Updated: (MAPREDUCE-807) Stray user files in mapred.system.dir with permissions other than 777 can prevent the jobtracker from starting up.

2009-08-24 Thread Amar Kamat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat updated MAPREDUCE-807:
-

Release Note: The JobTracker tries to delete the mapred.system.dir when it 
is starting up (with the job recovery disabled). The fix provided by this jira 
is that JobTracker will fail (bail out) with AccessControlException if it fails 
to delete files/directories in mapred.system.dir due to access control issues.
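
The behavioural change in the release note, failing once instead of retrying forever, can be sketched as below. Deleter and SystemDirCleaner are hypothetical names, and SecurityException is used only as a stand-in for an access-control failure; the actual fix works against the Hadoop file system API and AccessControlException.

```java
import java.util.List;

// Hypothetical sketch; Deleter stands in for a file-system delete call.
interface Deleter {
    /** Deletes one path; throws SecurityException when access is denied. */
    void delete(String path);
}

final class SystemDirCleaner {
    /**
     * Deletes the given paths once, in order. The first access-control failure
     * propagates to the caller, so startup aborts instead of retrying forever.
     * Returns the number of paths deleted.
     */
    static int cleanOnce(List<String> paths, Deleter deleter) {
        int deleted = 0;
        for (String path : paths) {
            deleter.delete(path); // may throw; deliberately not caught or retried
            deleted++;
        }
        return deleted;
    }
}
```

The caller, here the startup code, decides what to do with the exception; in this sketch it simply surfaces to the administrator rather than being swallowed in a loop.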

 Stray user files in mapred.system.dir with permissions other than 777 can 
 prevent the jobtracker from starting up.
 --

 Key: MAPREDUCE-807
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-807
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Reporter: Amar Kamat
Assignee: Amar Kamat
Priority: Blocker
 Fix For: 0.20.1

 Attachments: MAPRED-807-v1.1.patch, MAPRED-807-v1.2.patch, 
 MAPRED-807-v1.3.patch, MAPRED-807-v1.4.patch, MAPRED-807-v1.6.patch, 
 MAPRED-807-v1.7.patch, MAPREDUCE-807-v1.6-branch-0.20.patch, 
 MAPREDUCE-807-v1.7-branch-0.20.patch


 With restart disabled, the jobtracker does a _rm -rf_ of the 
 mapred.system.dir. If the mapred.system.dir contains user files with 
 permissions other than 777 then the jobtracker gets stuck in a loop trying to 
 delete the mapred.system.dir (and each time failing with 
 AccessControlException).




[jira] Updated: (MAPREDUCE-807) Stray user files in mapred.system.dir with permissions other than 777 can prevent the jobtracker from starting up.

2009-08-24 Thread Amar Kamat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat updated MAPREDUCE-807:
-

Description: With restart disabled, the jobtracker does a _rm -rf_ of the 
mapred.system.dir. If the mapred.system.dir contains user files with 
permissions other than 777 then the jobtracker gets stuck in a loop trying to 
delete the mapred.system.dir (and each time failing with 
AccessControlException). The JobTracker admin has to manually clean up the 
mapred.system.dir if this happens.  (was: With restart disabled, the jobtracker 
does a _rm -rf_ of the mapred.system.dir. If the mapred.system.dir contains 
user files with permissions other than 777 then the jobtracker gets stuck in a 
loop trying to delete the mapred.system.dir (and each time failing with 
AccessControlException).)

 Stray user files in mapred.system.dir with permissions other than 777 can 
 prevent the jobtracker from starting up.
 --

 Key: MAPREDUCE-807
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-807
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Reporter: Amar Kamat
Assignee: Amar Kamat
Priority: Blocker
 Fix For: 0.20.1

 Attachments: MAPRED-807-v1.1.patch, MAPRED-807-v1.2.patch, 
 MAPRED-807-v1.3.patch, MAPRED-807-v1.4.patch, MAPRED-807-v1.6.patch, 
 MAPRED-807-v1.7.patch, MAPREDUCE-807-v1.6-branch-0.20.patch, 
 MAPREDUCE-807-v1.7-branch-0.20.patch


 With restart disabled, the jobtracker does a _rm -rf_ of the 
 mapred.system.dir. If the mapred.system.dir contains user files with 
 permissions other than 777 then the jobtracker gets stuck in a loop trying to 
 delete the mapred.system.dir (and each time failing with 
 AccessControlException). The JobTracker admin has to manually clean up the 
 mapred.system.dir if this happens.




[jira] Updated: (MAPREDUCE-318) Refactor reduce shuffle code

2009-08-24 Thread Jothi Padmanabhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jothi Padmanabhan updated MAPREDUCE-318:


Attachment: mapred-318-24Aug.patch

New patch with review comments incorporated. 
Also fixed some findbugs warnings. 

 Refactor reduce shuffle code
 

 Key: MAPREDUCE-318
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-318
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: HADOOP-5233_api.patch, HADOOP-5233_part0.patch, 
 mapred-318-14Aug.patch, mapred-318-20Aug.patch, mapred-318-24Aug.patch, 
 mapred-318-common.patch


 The reduce shuffle code has become very complex and entangled. I think we 
 should move it out of ReduceTask and into a separate package 
 (org.apache.hadoop.mapred.task.reduce). Details to follow.




[jira] Updated: (MAPREDUCE-430) Task stuck in cleanup with OutOfMemoryErrors

2009-08-24 Thread Amar Kamat (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Kamat updated MAPREDUCE-430:
-

Attachment: MAPREDUCE-430-v1.12-branch-0.20.patch
MAPREDUCE-430-v1.12.patch

Attaching a new patch for review. Result of test-patch:
 [exec] +1 overall.
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 6 new or modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.



 Task stuck in cleanup with OutOfMemoryErrors
 

 Key: MAPREDUCE-430
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-430
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Amareshwari Sriramadasu
Assignee: Amar Kamat
 Fix For: 0.20.1

 Attachments: MAPREDUCE-430-v1.11.patch, 
 MAPREDUCE-430-v1.12-branch-0.20.patch, MAPREDUCE-430-v1.12.patch, 
 MAPREDUCE-430-v1.6-branch-0.20.patch, MAPREDUCE-430-v1.6.patch, 
 MAPREDUCE-430-v1.7.patch, MAPREDUCE-430-v1.8.patch


 Observed a task with OutOfMemory error, stuck in cleanup.




[jira] Commented: (MAPREDUCE-679) XML-based metrics as JSP servlet for JobTracker

2009-08-24 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12746861#action_12746861
 ] 

Steve Loughran commented on MAPREDUCE-679:
--

TldLocationsCache is a cache for globally defined taglibs 

http://tomcat.apache.org/tomcat-5.5-doc/jasper/docs/api/org/apache/jasper/compiler/TldLocationsCache.html

source is here:
http://svn.apache.org/repos/asf/tomcat/tc6.0.x/trunk/java/org/apache/jasper/compiler/TldLocationsCache.java

Looking at the source, the message comes from {{{processWebDotXml()}}}; it 
doesn't do any harm, except that it doesn't bother parsing any web.xml-defined 
content if web.xml is nowhere to be found. It's a warning, not an error. 

There is a servlet context property, org.apache.catalina.deploy.alt_dd, which 
can be used to identify an alternate deployment descriptor, but I have no idea 
how to set that from command line jspc.

Recommendation: ignore the warning.

 XML-based metrics as JSP servlet for JobTracker
 ---

 Key: MAPREDUCE-679
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-679
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: jobtracker
Reporter: Aaron Kimball
Assignee: Aaron Kimball
 Attachments: example-jobtracker-completed-job.xml, 
 example-jobtracker-running-job.xml, MAPREDUCE-679.2.patch, 
 MAPREDUCE-679.3.patch, MAPREDUCE-679.patch


 In HADOOP-4559, a general REST API for reporting metrics was proposed but 
 work seems to have stalled. In the interim, we have a simple XML translation 
 of the existing JobTracker status page which provides the same metrics 
 (including the tables of running/completed/failed jobs) as the human-readable 
 page. This is a relatively lightweight addition to provide some 
 machine-understandable metrics reporting.




[jira] Commented: (MAPREDUCE-430) Task stuck in cleanup with OutOfMemoryErrors

2009-08-24 Thread Amar Kamat (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12746878#action_12746878
 ] 

Amar Kamat commented on MAPREDUCE-430:
--

All tests (core + contrib) passed except TestReduceFetch which timed out.

 Task stuck in cleanup with OutOfMemoryErrors
 

 Key: MAPREDUCE-430
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-430
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Amareshwari Sriramadasu
Assignee: Amar Kamat
 Fix For: 0.20.1

 Attachments: MAPREDUCE-430-v1.11.patch, 
 MAPREDUCE-430-v1.12-branch-0.20.patch, MAPREDUCE-430-v1.12.patch, 
 MAPREDUCE-430-v1.6-branch-0.20.patch, MAPREDUCE-430-v1.6.patch, 
 MAPREDUCE-430-v1.7.patch, MAPREDUCE-430-v1.8.patch


 Observed a task with OutOfMemory error, stuck in cleanup.




[jira] Commented: (MAPREDUCE-370) Change org.apache.hadoop.mapred.lib.MultipleOutputs to use new api.

2009-08-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12746887#action_12746887
 ] 

Hadoop QA commented on MAPREDUCE-370:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12417469/patch-370-2.txt
  against trunk revision 807123.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/508/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/508/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/508/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/508/console

This message is automatically generated.

 Change org.apache.hadoop.mapred.lib.MultipleOutputs to use new api.
 ---

 Key: MAPREDUCE-370
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-370
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Fix For: 0.21.0

 Attachments: patch-370-1.txt, patch-370-2.txt, patch-370.txt







[jira] Commented: (MAPREDUCE-699) Several streaming test cases seem to be failing

2009-08-24 Thread Nigel Daley (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12746929#action_12746929
 ] 

Nigel Daley commented on MAPREDUCE-699:
---

I just blew away the Hudson workspace for this patch build to see if that fixes 
it.

 Several streaming test cases seem to be failing
 ---

 Key: MAPREDUCE-699
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-699
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Reporter: Jothi Padmanabhan

 ant test is failing several streaming tests with the following error
 Error Message
 java.lang.NullPointerException
   at org.apache.commons.cli.GnuParser.flatten(GnuParser.java:110)
   at org.apache.commons.cli.Parser.parse(Parser.java:143)
   at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:374)
   at org.apache.hadoop.util.GenericOptionsParser.init(GenericOptionsParser.java:153)
   at org.apache.hadoop.util.GenericOptionsParser.init(GenericOptionsParser.java:138)
   at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1314)
   at org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:414)
   at org.apache.hadoop.hdfs.MiniDFSCluster.init(MiniDFSCluster.java:278)
   at org.apache.hadoop.hdfs.MiniDFSCluster.init(MiniDFSCluster.java:119)
   at org.apache.hadoop.streaming.TestMultipleCachefiles.testMultipleCachefiles(TestMultipleCachefiles.java:68)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at junit.framework.TestCase.runTest(TestCase.java:168)
   at junit.framework.TestCase.runBare(TestCase.java:134)
   at junit.framework.TestResult$1.protect(TestResult.java:110)
   at junit.framework.TestResult.runProtected(TestResult.java:128)
   at junit.framework.TestResult.run(TestResult.java:113)
   at junit.framework.TestCase.run(TestCase.java:124)
   at junit.framework.TestSuite.runTest(TestSuite.java:232)
   at junit.framework.TestSuite.run(TestSuite.java:227)
   at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:79)
   at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39)
   at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:420)
   at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:911)
   at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:768)

 Stacktrace
 junit.framework.AssertionFailedError: java.lang.NullPointerException
   at org.apache.commons.cli.GnuParser.flatten(GnuParser.java:110)
   at org.apache.commons.cli.Parser.parse(Parser.java:143)
   at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:374)
   at org.apache.hadoop.util.GenericOptionsParser.init(GenericOptionsParser.java:153)
   at org.apache.hadoop.util.GenericOptionsParser.init(GenericOptionsParser.java:138)
   at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1314)
   at org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:414)
   at org.apache.hadoop.hdfs.MiniDFSCluster.init(MiniDFSCluster.java:278)
   at org.apache.hadoop.hdfs.MiniDFSCluster.init(MiniDFSCluster.java:119)
   at org.apache.hadoop.streaming.TestMultipleCachefiles.testMultipleCachefiles(TestMultipleCachefiles.java:68)
   at org.apache.hadoop.streaming.TestMultipleCachefiles.failTrace(TestMultipleCachefiles.java:141)
   at org.apache.hadoop.streaming.TestMultipleCachefiles.testMultipleCachefiles(TestMultipleCachefiles.java:133)
 The following are links to two such failures
 http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/337/testReport/
 http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/336/testReport/




[jira] Updated: (MAPREDUCE-476) extend DistributedCache to work locally (LocalJobRunner)

2009-08-24 Thread Philip Zeyliger (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Zeyliger updated MAPREDUCE-476:
--

Status: Patch Available  (was: Open)

 extend DistributedCache to work locally (LocalJobRunner)
 

 Key: MAPREDUCE-476
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-476
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: sam rash
Assignee: Philip Zeyliger
Priority: Minor
 Attachments: HADOOP-2914-v1-full.patch, 
 HADOOP-2914-v1-since-4041.patch, HADOOP-2914-v2.patch, HADOOP-2914-v3.patch, 
 MAPREDUCE-476-20090814.1.txt, MAPREDUCE-476-20090818.txt, 
 MAPREDUCE-476-v2-vs-v3.patch, MAPREDUCE-476-v2-vs-v3.try2.patch, 
 MAPREDUCE-476-v2-vs-v4.txt, MAPREDUCE-476-v2.patch, MAPREDUCE-476-v3.patch, 
 MAPREDUCE-476-v3.try2.patch, MAPREDUCE-476-v4-requires-MR711.patch, 
 MAPREDUCE-476-v5-requires-MR711.patch, MAPREDUCE-476-v7.patch, 
 MAPREDUCE-476-v8.patch, MAPREDUCE-476-v9.patch, MAPREDUCE-476.patch, 
 v6-to-v7.patch


 The DistributedCache does not work locally when using the outlined recipe at 
 http://hadoop.apache.org/core/docs/r0.16.0/api/org/apache/hadoop/filecache/DistributedCache.html
  
 Ideally, LocalJobRunner would take care of populating the JobConf and copying 
 remote files to the local file system (http, assume hdfs = default fs = local 
 fs when doing local development.




[jira] Updated: (MAPREDUCE-476) extend DistributedCache to work locally (LocalJobRunner)

2009-08-24 Thread Philip Zeyliger (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Zeyliger updated MAPREDUCE-476:
--

Attachment: MAPREDUCE-476-v9.patch

Well-spotted, Tom.  I've restored the missing test.

 extend DistributedCache to work locally (LocalJobRunner)
 

 Key: MAPREDUCE-476
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-476
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: sam rash
Assignee: Philip Zeyliger
Priority: Minor
 Attachments: HADOOP-2914-v1-full.patch, 
 HADOOP-2914-v1-since-4041.patch, HADOOP-2914-v2.patch, HADOOP-2914-v3.patch, 
 MAPREDUCE-476-20090814.1.txt, MAPREDUCE-476-20090818.txt, 
 MAPREDUCE-476-v2-vs-v3.patch, MAPREDUCE-476-v2-vs-v3.try2.patch, 
 MAPREDUCE-476-v2-vs-v4.txt, MAPREDUCE-476-v2.patch, MAPREDUCE-476-v3.patch, 
 MAPREDUCE-476-v3.try2.patch, MAPREDUCE-476-v4-requires-MR711.patch, 
 MAPREDUCE-476-v5-requires-MR711.patch, MAPREDUCE-476-v7.patch, 
 MAPREDUCE-476-v8.patch, MAPREDUCE-476-v9.patch, MAPREDUCE-476.patch, 
 v6-to-v7.patch


 The DistributedCache does not work locally when using the outlined recipe at 
 http://hadoop.apache.org/core/docs/r0.16.0/api/org/apache/hadoop/filecache/DistributedCache.html
  
 Ideally, LocalJobRunner would take care of populating the JobConf and copying 
 remote files to the local file system (http, assume hdfs = default fs = local 
 fs when doing local development.




[jira] Updated: (MAPREDUCE-895) FileSystem::ListStatus will now throw FileNotFoundException, MapRed needs updated

2009-08-24 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated MAPREDUCE-895:
--

Release Note: The semantics for dealing with non-existent paths passed to 
FileSystem::listStatus() were updated and solidified in HADOOP-6201 and 
HDFS-538.  Existing code within MapReduce that relied on the previous behavior 
of some FileSystem implementations of returning null has been updated to catch 
or propagate a FileNotFoundException, per the method's contract.

Adding release note.
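
As a rough illustration of what this change means for calling code, here is a 
minimal sketch. The class ListStatusContract is a hypothetical in-memory 
stand-in, not Hadoop's FileSystem; only the shape of the contract (throw 
FileNotFoundException for a missing path, never return null) is taken from the 
release note above.

```java
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory file system; only the listStatus contract matters.
class ListStatusContract {
    private final Map<String, String[]> dirs = new HashMap<>();

    // Registers a directory and its children.
    void mkdir(String path, String... children) {
        dirs.put(path, children);
    }

    // Throws FileNotFoundException for a missing path; never returns null.
    String[] listStatus(String path) throws FileNotFoundException {
        String[] children = dirs.get(path);
        if (children == null) {
            throw new FileNotFoundException("Path does not exist: " + path);
        }
        return children;
    }

    // A caller that used to test for null now catches the exception instead.
    int countChildren(String path) {
        try {
            return listStatus(path).length;
        } catch (FileNotFoundException e) {
            return 0; // this caller chooses to treat a missing directory as empty
        }
    }
}
```

Whether a caller should catch the exception, as countChildren does here, or 
propagate it depends on whether a missing path is an expected condition at 
that call site.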

 FileSystem::ListStatus will now throw FileNotFoundException, MapRed needs 
 updated
 -

 Key: MAPREDUCE-895
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-895
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.21.0

 Attachments: MAPREDUCE-895.patch


 HADOOP-6201 (and HDFS-538) determined that the semantics of 
 FileSystem::ListStatus are not correct and that the actual file system classes 
 vary in their implementations, with some throwing an exception and some 
 returning null. Fixing this will require adjusting code that calls this method. 




[jira] Commented: (MAPREDUCE-679) XML-based metrics as JSP servlet for JobTracker

2009-08-24 Thread Aaron Kimball (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12746997#action_12746997
 ] 

Aaron Kimball commented on MAPREDUCE-679:
-

Good enough. Is this ready to be committed?

 XML-based metrics as JSP servlet for JobTracker
 ---

 Key: MAPREDUCE-679
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-679
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: jobtracker
Reporter: Aaron Kimball
Assignee: Aaron Kimball
 Attachments: example-jobtracker-completed-job.xml, 
 example-jobtracker-running-job.xml, MAPREDUCE-679.2.patch, 
 MAPREDUCE-679.3.patch, MAPREDUCE-679.patch


 In HADOOP-4559, a general REST API for reporting metrics was proposed but 
 work seems to have stalled. In the interim, we have a simple XML translation 
 of the existing JobTracker status page which provides the same metrics 
 (including the tables of running/completed/failed jobs) as the human-readable 
 page. This is a relatively lightweight addition to provide some 
 machine-understandable metrics reporting.




[jira] Created: (MAPREDUCE-910) MRUnit should support counters

2009-08-24 Thread Aaron Kimball (JIRA)
MRUnit should support counters
--

 Key: MAPREDUCE-910
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-910
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Aaron Kimball
Assignee: Aaron Kimball


incrCounter() is currently a dummy stub method in MRUnit that does nothing. 
Would be good for the mock reporter/context implementations to support counters.




[jira] Updated: (MAPREDUCE-910) MRUnit should support counters

2009-08-24 Thread Aaron Kimball (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Kimball updated MAPREDUCE-910:


Attachment: MAPREDUCE-910.patch

Attaching patch which provides this functionality.

All TestDriver implementations have a getCounters() method which returns the 
counters used by that test. The user can then verify that the actual counts 
meet their expected values.
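
The idea can be sketched as below. This is an illustration only: 
MockCounterReporter and its getCounter method are hypothetical names chosen for 
the example and are not the API added by the patch. The sketch shows a mock 
reporter that records increments rather than discarding them, so a test can 
read the totals back.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical mock reporter; names are illustrative, not MRUnit's API.
class MockCounterReporter {
    private final Map<String, Long> counters = new HashMap<>();

    // Records the increment instead of silently discarding it.
    void incrCounter(String group, String name, long amount) {
        counters.merge(group + ":" + name, amount, Long::sum);
    }

    // Returns the accumulated value, or 0 if the counter was never touched.
    long getCounter(String group, String name) {
        return counters.getOrDefault(group + ":" + name, 0L);
    }
}
```

A test would run the mapper or reducer against such a reporter and then compare 
the recorded totals with the expected counts.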

 MRUnit should support counters
 --

 Key: MAPREDUCE-910
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-910
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Aaron Kimball
Assignee: Aaron Kimball
 Attachments: MAPREDUCE-910.patch


 incrCounter() is currently a dummy stub method in MRUnit that does nothing. 
 Would be good for the mock reporter/context implementations to support 
 counters.




[jira] Updated: (MAPREDUCE-910) MRUnit should support counters

2009-08-24 Thread Aaron Kimball (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Kimball updated MAPREDUCE-910:


Status: Patch Available  (was: Open)

 MRUnit should support counters
 --

 Key: MAPREDUCE-910
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-910
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Aaron Kimball
Assignee: Aaron Kimball
 Attachments: MAPREDUCE-910.patch


 incrCounter() is currently a dummy stub method in MRUnit that does nothing. 
 Would be good for the mock reporter/context implementations to support 
 counters.




[jira] Updated: (MAPREDUCE-901) Move Framework Counters into a TaskMetric structure

2009-08-24 Thread Devaraj Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated MAPREDUCE-901:
--

Attachment: 901_1.patch

Attaching a patch for review. I am still testing the patch. Also, a little bit 
of cleanup is required, especially w.r.t. naming variables/fields in the 
classes. I will do that in a follow-up patch.

Some points on the approach:
1) Defined a class TaskMetrics that has methods for updating the counters 
defined in o.a.h.mapreduce.TaskCounter.java. It also provides a utility method 
to update framework Counters that aren't defined in TaskCounter.java. Examples 
of such counters are the counters that the framework defines in the 
countergroup FileSystemCounters. For the TaskCounter counters, the RPC is 
optimized. For the framework counters like the FileSystemCounters, RPC uses the 
Counters serialization. 
2) The above is serialized out as part of TaskStatus object in the heartbeats.
3) In TaskInProgress.java, the TIP's Counters is updated with the above 
counters obtained in the heartbeat.
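
To illustrate why fixed fields can be cheaper on the wire than name-keyed 
counters, here is a minimal sketch. It is not the code in the patch: the class 
TaskMetricsSketch, its field names, and the wire format (plain 8-byte longs via 
DataOutputStream) are all assumptions made for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Map;

// Hypothetical sketch; field names and wire format are illustrative only.
class TaskMetricsSketch {
    long inputRecords;
    long inputBytes;
    long outputRecords;

    // Fixed fields: only the values go on the wire, in a known order.
    byte[] serialize() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeLong(inputRecords);
            out.writeLong(inputBytes);
            out.writeLong(outputRecords);
        }
        return buf.toByteArray();
    }

    // Generic counters: every entry carries its name as well as its value.
    static byte[] serializeGeneric(Map<String, Long> counters) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(counters.size());
            for (Map.Entry<String, Long> e : counters.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeLong(e.getValue());
            }
        }
        return buf.toByteArray();
    }
}
```

With three metrics, the fixed layout is 24 bytes, while the generic layout also 
pays for an entry count and for each counter name on every heartbeat.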

Would really appreciate a review on this one.

And yes, this looks like a good thing to have for the jiras MAPREDUCE-220 and 
MAPREDUCE-718.

 Move Framework Counters into a TaskMetric structure
 ---

 Key: MAPREDUCE-901
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-901
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: task
Affects Versions: 0.21.0
Reporter: Owen O'Malley
Assignee: Devaraj Das
 Fix For: 0.21.0

 Attachments: 901_1.patch


 I think we should move all of the Counters that the framework updates into a 
 single class called TaskMetrics. TaskMetrics would have specific fields for 
 each of the metrics like input records, input bytes, output records, etc.
 It would both reduce the serialized size of the heartbeats (by shrinking the 
 Counters down to just the user's counters) and decrease the latency for 
 updates to the JobTracker (since Counters are sent at most 1/minute instead 
 of 1/heartbeat).




[jira] Updated: (MAPREDUCE-775) Add input/output formatters for Vertica clustered ADBMS.

2009-08-24 Thread Omer Trajman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omer Trajman updated MAPREDUCE-775:
---

Status: Patch Available  (was: Open)

 Add input/output formatters for Vertica clustered ADBMS.
 

 Key: MAPREDUCE-775
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-775
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Omer Trajman
 Fix For: 0.21.0

 Attachments: MAPREDUCE-775.patch


 Add native support for Vertica as an input or output format taking advantage 
 of parallel read and write properties of the DBMS.
  
 On the input side allow for parametrized queries (a la prepared statements) 
 and create a split for each combination of parameters.  Also support the 
 parameter list to be generated from a sql statement.  For example - return 
 metrics for all dimensions that meet criteria X with one input split for each 
 dimension.  Divide the read among any number of hosts in the Vertica cluster.
  
 On the output side, support Vertica streaming load to any number of hosts in 
 the Vertica cluster.  Output may be to a different cluster than input.
  
 Also includes Input and Output formatters that support streaming interface.
 Code has been tested and run on live systems under 19 and 20.  Patch for 21 
 with new API will be ready end of this week.  




[jira] Updated: (MAPREDUCE-775) Add input/output formatters for Vertica clustered ADBMS.

2009-08-24 Thread Omer Trajman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Omer Trajman updated MAPREDUCE-775:
---

Status: Open  (was: Patch Available)

Fixing issues with new patch.  I seem to have replaced the original instead of 
adding a .N.patch - sorry for the confusion.

 Add input/output formatters for Vertica clustered ADBMS.
 

 Key: MAPREDUCE-775
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-775
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Omer Trajman
 Fix For: 0.21.0

 Attachments: MAPREDUCE-775.patch


 Add native support for Vertica as an input or output format taking advantage 
 of parallel read and write properties of the DBMS.
  
 On the input side allow for parametrized queries (a la prepared statements) 
 and create a split for each combination of parameters.  Also support the 
 parameter list to be generated from a sql statement.  For example - return 
 metrics for all dimensions that meet criteria X with one input split for each 
 dimension.  Divide the read among any number of hosts in the Vertica cluster.
  
 On the output side, support Vertica streaming load to any number of hosts in 
 the Vertica cluster.  Output may be to a different cluster than input.
  
 Also includes Input and Output formatters that support streaming interface.
 Code has been tested and run on live systems under 19 and 20.  Patch for 21 
 with new API will be ready end of this week.  




[jira] Commented: (MAPREDUCE-476) extend DistributedCache to work locally (LocalJobRunner)

2009-08-24 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12747092#action_12747092
 ] 

Philip Zeyliger commented on MAPREDUCE-476:
---

The failing test is 
org.apache.hadoop.mapred.TestRecoveryManager.testRestartCount.  I think 
that's failing everywhere, not just here.

-- Philip

 extend DistributedCache to work locally (LocalJobRunner)
 

 Key: MAPREDUCE-476
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-476
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: sam rash
Assignee: Philip Zeyliger
Priority: Minor
 Attachments: HADOOP-2914-v1-full.patch, 
 HADOOP-2914-v1-since-4041.patch, HADOOP-2914-v2.patch, HADOOP-2914-v3.patch, 
 MAPREDUCE-476-20090814.1.txt, MAPREDUCE-476-20090818.txt, 
 MAPREDUCE-476-v2-vs-v3.patch, MAPREDUCE-476-v2-vs-v3.try2.patch, 
 MAPREDUCE-476-v2-vs-v4.txt, MAPREDUCE-476-v2.patch, MAPREDUCE-476-v3.patch, 
 MAPREDUCE-476-v3.try2.patch, MAPREDUCE-476-v4-requires-MR711.patch, 
 MAPREDUCE-476-v5-requires-MR711.patch, MAPREDUCE-476-v7.patch, 
 MAPREDUCE-476-v8.patch, MAPREDUCE-476-v9.patch, MAPREDUCE-476.patch, 
 v6-to-v7.patch


 The DistributedCache does not work locally when using the outlined recipe at 
 http://hadoop.apache.org/core/docs/r0.16.0/api/org/apache/hadoop/filecache/DistributedCache.html
  
 Ideally, LocalJobRunner would take care of populating the JobConf and copying 
 remote files to the local file sytem (http, assume hdfs = default fs = local 
 fs when doing local development.




[jira] Updated: (MAPREDUCE-906) Updated Sqoop documentation

2009-08-24 Thread Aaron Kimball (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Kimball updated MAPREDUCE-906:


Status: Patch Available  (was: Open)

 Updated Sqoop documentation
 ---

 Key: MAPREDUCE-906
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-906
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: contrib/sqoop
Reporter: Aaron Kimball
Assignee: Aaron Kimball
 Attachments: MAPREDUCE-906.patch


 Here's the latest documentation for Sqoop, in both user-guide and manpage 
 form. Built with asciidoc.




[jira] Updated: (MAPREDUCE-875) Make DBRecordReader execute queries lazily

2009-08-24 Thread Aaron Kimball (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Kimball updated MAPREDUCE-875:


Status: Open  (was: Patch Available)

 Make DBRecordReader execute queries lazily
 --

 Key: MAPREDUCE-875
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-875
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Aaron Kimball
Assignee: Aaron Kimball
 Attachments: MAPREDUCE-875.2.patch, MAPREDUCE-875.patch


 DBInputFormat's DBRecordReader executes the user's SQL query in the 
 constructor. If the query is long-running, this can cause task timeout. The 
 user is unable to spawn a background thread (e.g., in a MapRunnable) to 
 inform Hadoop of on-going progress. 
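
 One way to read this request is as lazy initialization: the reader keeps the 
 query but runs it only when the first row is asked for. The sketch below is 
 not the attached patch and uses neither Hadoop nor JDBC. LazyRecordReader, 
 its Supplier-based "query" and the executions() helper are all hypothetical 
 names chosen for illustration.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

/**
 * Hypothetical stand-in for a record reader; not Hadoop's DBRecordReader.
 * The "query" is any Supplier that produces rows when invoked.
 */
class LazyRecordReader<T> {
    private final Supplier<Iterator<T>> query; // deferred work, e.g. running SQL
    private Iterator<T> results;               // stays null until the query runs
    private int executions = 0;                // number of times the query ran

    LazyRecordReader(Supplier<Iterator<T>> query) {
        // The constructor only stores the query; nothing slow happens here.
        this.query = query;
    }

    /** Runs the query on first use and reuses the result afterwards. */
    private Iterator<T> results() {
        if (results == null) {
            results = query.get();
            executions++;
        }
        return results;
    }

    /** Returns the next row, or null once the results are exhausted. */
    T next() {
        Iterator<T> it = results();
        return it.hasNext() ? it.next() : null;
    }

    int executions() {
        return executions;
    }
}

class LazyRecordReaderDemo {
    public static void main(String[] args) {
        LazyRecordReader<String> reader =
            new LazyRecordReader<>(() -> Arrays.asList("row1", "row2").iterator());
        System.out.println("ran before first read: " + reader.executions());
        System.out.println("first row: " + reader.next());
        System.out.println("ran after first read: " + reader.executions());
    }
}
```

 With this shape the slow call happens inside next(), so by then the caller 
 has had a chance to set up whatever progress reporting it needs.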




[jira] Updated: (MAPREDUCE-875) Make DBRecordReader execute queries lazily

2009-08-24 Thread Aaron Kimball (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Kimball updated MAPREDUCE-875:


Status: Patch Available  (was: Open)

Recycling the patch again; it seems to have been dropped from the queue.

 Make DBRecordReader execute queries lazily
 --

 Key: MAPREDUCE-875
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-875
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Aaron Kimball
Assignee: Aaron Kimball
 Attachments: MAPREDUCE-875.2.patch, MAPREDUCE-875.patch


 DBInputFormat's DBRecordReader executes the user's SQL query in the 
 constructor. If the query is long-running, this can cause task timeout. The 
 user is unable to spawn a background thread (e.g., in a MapRunnable) to 
 inform Hadoop of on-going progress. 




[jira] Commented: (MAPREDUCE-901) Move Framework Counters into a TaskMetric structure

2009-08-24 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12747133#action_12747133
 ] 

Arun C Murthy commented on MAPREDUCE-901:
-

Hmm... at the risk of sounding completely lame, I can't seem to find the 
definition of TaskMetrics or TaskCounters - did you forget to include them 
in the patch?

From the description it seems like TaskMetrics is related to Counters; maybe I 
should wait to see the patch. Anyway, I was hoping TaskMetrics would be a 
Writable and not related to Counters at all.

 Move Framework Counters into a TaskMetric structure
 ---

 Key: MAPREDUCE-901
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-901
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: task
Affects Versions: 0.21.0
Reporter: Owen O'Malley
Assignee: Devaraj Das
 Fix For: 0.21.0

 Attachments: 901_1.patch


 I think we should move all of the Counters that the framework updates into a 
 single class called TaskMetrics. TaskMetrics would have specific fields for 
 each of the metrics like input records, input bytes, output records, etc.
 It would both reduce the serialized size of the heartbeats (by shrinking the 
 Counters down to just the user's counters) and decrease the latency for 
 updates to the JobTracker (since Counters are sent at most 1/minute instead 
 of 1/heartbeat).
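
 To make the "specific fields" idea concrete, here is a minimal sketch of such 
 a record. It is not the attached patch: TaskMetricsSketch, its three fields 
 and the toBytes/fromBytes helpers are hypothetical, and write/readFields only 
 mirror the shape of Hadoop's Writable interface using java.io so the example 
 stands alone.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical fixed-field metrics record; field names are illustrative. */
class TaskMetricsSketch {
    long inputRecords;
    long inputBytes;
    long outputRecords;

    /** Writes each metric as a fixed 8-byte long; no counter names are sent. */
    void write(DataOutput out) throws IOException {
        out.writeLong(inputRecords);
        out.writeLong(inputBytes);
        out.writeLong(outputRecords);
    }

    /** Reads the metrics back in the same order they were written. */
    void readFields(DataInput in) throws IOException {
        inputRecords = in.readLong();
        inputBytes = in.readLong();
        outputRecords = in.readLong();
    }

    /** Convenience helper for the example: serialize to a byte array. */
    byte[] toBytes() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            write(new DataOutputStream(buffer));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Convenience helper for the example: rebuild a record from bytes. */
    static TaskMetricsSketch fromBytes(byte[] bytes) {
        TaskMetricsSketch metrics = new TaskMetricsSketch();
        try {
            metrics.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return metrics;
    }
}
```

 In this sketch the three metrics serialize to 24 bytes, since the field 
 order stands in for the names that a generic counter map would have to carry.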




[jira] Commented: (MAPREDUCE-910) MRUnit should support counters

2009-08-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12747162#action_12747162
 ] 

Hadoop QA commented on MAPREDUCE-910:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12417505/MAPREDUCE-910.patch
  against trunk revision 807165.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 13 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/510/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/510/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/510/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/510/console

This message is automatically generated.

 MRUnit should support counters
 --

 Key: MAPREDUCE-910
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-910
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Aaron Kimball
Assignee: Aaron Kimball
 Attachments: MAPREDUCE-910.patch


 incrCounter() is currently a dummy stub method in MRUnit that does nothing. 
 Would be good for the mock reporter/context implementations to support 
 counters.
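
 As a rough illustration of what counter support in a mock could look like, 
 the sketch below keeps increments in memory so a test can read them back. It 
 is not MRUnit's code or the attached patch; CountingMockReporter and 
 getCounter are hypothetical names, and only the incrCounter(group, name, 
 amount) shape is modelled on Hadoop's Reporter.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical mock reporter that records counter increments in memory. */
class CountingMockReporter {
    // group name -> (counter name -> accumulated value)
    private final Map<String, Map<String, Long>> counters = new HashMap<>();

    /** Adds amount to the counter identified by group and name. */
    void incrCounter(String group, String name, long amount) {
        counters.computeIfAbsent(group, g -> new HashMap<>())
                .merge(name, amount, Long::sum);
    }

    /** Returns the accumulated value, or 0 if the counter was never touched. */
    long getCounter(String group, String name) {
        Map<String, Long> byName = counters.get(group);
        return byName == null ? 0L : byName.getOrDefault(name, 0L);
    }
}
```

 A test would pass the mock to the mapper or reducer under test, run it, and 
 then assert on getCounter for the counters it expects to have changed.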




[jira] Commented: (MAPREDUCE-910) MRUnit should support counters

2009-08-24 Thread Aaron Kimball (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12747165#action_12747165
 ] 

Aaron Kimball commented on MAPREDUCE-910:
-

Unrelated test failure.

 MRUnit should support counters
 --

 Key: MAPREDUCE-910
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-910
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Aaron Kimball
Assignee: Aaron Kimball
 Attachments: MAPREDUCE-910.patch


 incrCounter() is currently a dummy stub method in MRUnit that does nothing. 
 Would be good for the mock reporter/context implementations to support 
 counters.




[jira] Updated: (MAPREDUCE-901) Move Framework Counters into a TaskMetric structure

2009-08-24 Thread Devaraj Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated MAPREDUCE-901:
--

Attachment: 901_1.patch

That was my bad. *sigh*
Attached is the correct patch. TaskMetrics has a Counters field, but that's 
mostly to take care of counters related to FileSystemCounters, which depend 
on the FileSystem in use, etc.

 Move Framework Counters into a TaskMetric structure
 ---

 Key: MAPREDUCE-901
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-901
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: task
Affects Versions: 0.21.0
Reporter: Owen O'Malley
Assignee: Devaraj Das
 Fix For: 0.21.0

 Attachments: 901_1.patch, 901_1.patch


 I think we should move all of the Counters that the framework updates into a 
 single class called TaskMetrics. TaskMetrics would have specific fields for 
 each of the metrics like input records, input bytes, output records, etc.
 It would both reduce the serialized size of the heartbeats (by shrinking the 
 Counters down to just the user's counters) and decrease the latency for 
 updates to the JobTracker (since Counters are sent at most 1/minute instead 
 of 1/heartbeat).




[jira] Commented: (MAPREDUCE-768) Configuration information should generate dump in a standard format.

2009-08-24 Thread Hemanth Yamijala (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12747204#action_12747204
 ] 

Hemanth Yamijala commented on MAPREDUCE-768:


The Javadocs are out of sync for both APIs, JobTracker.dumpConfiguration and 
QueueManager.dumpConfiguration. Other than that, +1.

 Configuration information should generate dump in a standard format.
 

 Key: MAPREDUCE-768
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-768
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: rahul k singh
 Attachments: MAPREDUCE-768-1.patch, MAPREDUCE-768-2.patch, 
 MAPREDUCE-768-3.patch, MAPREDUCE-768.patch


  We need to generate the configuration dump in a standard format.




[jira] Commented: (MAPREDUCE-824) Support a hierarchy of queues in the capacity scheduler

2009-08-24 Thread Hemanth Yamijala (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12747207#action_12747207
 ] 

Hemanth Yamijala commented on MAPREDUCE-824:


This is getting better. I do have some more feedback:

- Make updateStatsOnRunningJob, addRunningJob, removeRunningJob and 
removeWaitingJob private.
- The ASF license header should come first in the source file.
- Replace sortJobQueues with an inline method.
- QueueHierarchyBuilder is creating a new instance of the 
CapacityTaskScheduler, which is unnecessary.
- The static builder instance also seems unnecessary.
- In QueueHierarchyBuilder, when checking for the separator char, the 
IllegalArgumentException must show the queue name which failed the check.
- Discuss: Back dependency between QueueHierarchyBuilder and Scheduler - can 
this be avoided?
- AbstractQueue does not override equals, while hashCode is overridden. Also, 
the toString API was previously printing other information. I'd only asked for 
the name of the queue to be prepended to it, not for the other information to 
be removed.
- It is a little confusing that the number of slots being asserted after task 
assignment does not include the currently scheduled task. I recommend moving 
the asserts before the assignment.
- Root should always be set up only in a certain way. I would recommend that 
there be a single static instance of root, which is always obtained from the 
capacity scheduler, even in tests.
- In testMaxCapacity, rt.update in tests should send in the capacity of the 
clusters to be in sync.
- getTaskDataView() need not be in TaskSchedulingContext. Since it is static, 
it can be called directly from other classes like the scheduler, passing the 
type.
- AbstractQueue.addChildren should be addChild.

Some of the earlier comments have not been addressed:
- APIs in JobQueuesManager and JobQueue can still be folded.
- mapTSI and reduceTSI member variables of JobQueue are not needed.
- AbstractQueue.getChildren is still public.
- getCapacity() should never return the max capacity. It should always 
return the current capacity or limit, whichever is smaller.




 Support a hierarchy of queues in the capacity scheduler
 ---

 Key: MAPREDUCE-824
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-824
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: contrib/capacity-sched
Reporter: Hemanth Yamijala
 Attachments: HADOOP-824-1.patch, HADOOP-824-2.patch, 
 HADOOP-824-3.patch, HADOOP-824-4.patch, HADOOP-824-5.patch


 Currently in Capacity Scheduler, cluster capacity is divided among the queues 
 based on the queue capacity. These queues typically represent an organization 
 and the capacity of the queue represents the capacity the organization is 
 entitled to. Most organizations are large and need to divide their capacity 
 among sub-organizations they have. Or they may want to divide the capacity 
 based on a category or type of jobs they run. This JIRA covers the 
 requirements and other details to provide the above feature.




[jira] Commented: (MAPREDUCE-370) Change org.apache.hadoop.mapred.lib.MultipleOutputs to use new api.

2009-08-24 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12747209#action_12747209
 ] 

Amareshwari Sriramadasu commented on MAPREDUCE-370:
---

The -1 on core tests is due to the TestRecoveryManager test failure (MAPREDUCE-880).

 Change org.apache.hadoop.mapred.lib.MultipleOutputs to use new api.
 ---

 Key: MAPREDUCE-370
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-370
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
Reporter: Amareshwari Sriramadasu
Assignee: Amareshwari Sriramadasu
 Fix For: 0.21.0

 Attachments: patch-370-1.txt, patch-370-2.txt, patch-370.txt







[jira] Updated: (MAPREDUCE-777) A method for finding and tracking jobs from the new API

2009-08-24 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated MAPREDUCE-777:


Status: Open  (was: Patch Available)

I'm not happy with this patch. I need to go through it in more depth, but:

1. The setters mostly look right, although some of them are missing the 
assertion that the job is in the setup phase.

2. The getters should move to JobContext.

3. I think JobClient is a bad name for the job browser. Something like 
JobBrowser is probably clearer.
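
For point 1, the kind of assertion I mean looks roughly like the sketch 
below. It is illustrative only: JobSketch, its State enum and ensureSetup are 
hypothetical names, not the API in the patch.

```java
/** Hypothetical job class showing a setup-phase guard on setters. */
class JobSketch {
    enum State { SETUP, RUNNING }

    private State state = State.SETUP;
    private String jobName = "";

    /** Guard shared by every setter: configuration is frozen after submit. */
    private void ensureSetup() {
        if (state != State.SETUP) {
            throw new IllegalStateException(
                "Job is in state " + state + ", expected " + State.SETUP);
        }
    }

    void setJobName(String name) {
        ensureSetup();
        this.jobName = name;
    }

    String getJobName() {
        return jobName;
    }

    /** Moves the job out of the setup phase; setters fail from here on. */
    void submit() {
        ensureSetup();
        state = State.RUNNING;
    }
}
```

Routing every setter through one guard makes a missing check easy to spot 
in review.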



 A method for finding and tracking jobs from the new API
 ---

 Key: MAPREDUCE-777
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-777
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: client
Reporter: Owen O'Malley
Assignee: Amareshwari Sriramadasu
 Fix For: 0.21.0

 Attachments: patch-777-1.txt, patch-777-2.txt, patch-777.txt


 We need to create a replacement interface for the JobClient API in the new 
 interface. In particular, the user needs to be able to query and track jobs 
 that were launched by other processes.




[jira] Commented: (MAPREDUCE-775) Add input/output formatters for Vertica clustered ADBMS.

2009-08-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12747222#action_12747222
 ] 

Hadoop QA commented on MAPREDUCE-775:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12415514/MAPREDUCE-775.patch
  against trunk revision 807165.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 16 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/511/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/511/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/511/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-vesta.apache.org/511/console

This message is automatically generated.

 Add input/output formatters for Vertica clustered ADBMS.
 

 Key: MAPREDUCE-775
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-775
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Omer Trajman
 Fix For: 0.21.0

 Attachments: MAPREDUCE-775.patch


 Add native support for Vertica as an input or output format taking advantage 
 of parallel read and write properties of the DBMS.
  
 On the input side, allow for parametrized queries (a la prepared statements) 
 and create a split for each combination of parameters.  Also support 
 generating the parameter list from a SQL statement.  For example, return 
 metrics for all dimensions that meet criteria X with one input split for each 
 dimension.  Divide the read among any number of hosts in the Vertica cluster.
  
 On the output side, support Vertica streaming load to any number of hosts in 
 the Vertica cluster.  Output may be to a different cluster than input.
  
 Also includes Input and Output formatters that support the streaming interface.
 Code has been tested and run on live systems under 19 and 20.  Patch for 21 
 with the new API will be ready by the end of this week.  
