[jira] Updated: (MAPREDUCE-1871) Create automated test scenario for "Collect information about number of tasks succeeded / total per time unit for a tasktracker"

2010-07-22 Thread Iyappan Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iyappan Srinivasan updated MAPREDUCE-1871:
--

Attachment: 1871-ydist-security-patch.txt

That patch is stale, so I am creating a new patch with exactly the same code.

> Create automated test scenario for "Collect information about number of tasks 
> succeeded / total per time unit for a tasktracker"
> 
>
> Key: MAPREDUCE-1871
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1871
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: test
>Reporter: Iyappan Srinivasan
>Assignee: Iyappan Srinivasan
> Attachments: 1871-ydist-security-patch.txt, 
> 1871-ydist-security-patch.txt, 1871-ydist-security-patch.txt, 
> 1871-ydist-security-patch.txt, 1871-ydist-security-patch.txt, 
> 1871-ydist-security-patch.txt, 1871-ydist-security-patch.txt, 
> 1871-ydist-security-patch.txt, 1871-ydist-security-patch.txt, 
> MAPREDUCE-1871.patch, MAPREDUCE-1871.patch, MAPREDUCE-1871.patch, 
> MAPREDUCE-1871.patch
>
>
> Create automated test scenario for "Collect information about number of tasks 
> succeeded / total per time unit for a tasktracker"
> 1) Verification of all the above mentioned fields with the specified TTs. 
> Total no. of tasks and successful tasks should be equal to the corresponding 
> no. of tasks specified in TTs logs
> 2)  Fail a task on tasktracker.  Node UI should update the status of tasks on 
> that TT accordingly. 
> 3)  Kill a task on tasktracker.  Node UI should update the status of tasks on 
> that TT accordingly
> 4) Positive: Run simultaneous jobs and check if all the fields are populated 
> with proper values of tasks.  Node UI should have correct values for all the 
> fields mentioned above. 
> 5) Check the fields across a one-hour window.  Fields related to the hour 
> should be updated after every hour.
> 6) Check the fields across a one-day window.  Fields related to the day 
> should be updated after every day.
> 7) Restart a TT and bring it back.  UI should retain the field values.
> 8) Positive: Run a bunch of jobs with 0 maps and 0 reduces simultaneously.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1919) [Herriot] Test for verification of per cache file ref count.

2010-07-22 Thread Vinay Kumar Thota (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay Kumar Thota updated MAPREDUCE-1919:
-

Attachment: 1919-ydist-security.patch

Addressed Cos's comments.

> [Herriot] Test for verification of per cache file ref  count.
> -
>
> Key: MAPREDUCE-1919
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1919
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1919-ydist-security.patch, 1919-ydist-security.patch, 
> MAPREDUCE-1919.patch
>
>
> It covers the following scenarios.
> 1. Run the job with two distributed cache files and verify whether the job 
> succeeds or not.
> 2. Run the job with distributed cache files and remove one cache file from 
> the DFS once it is localized. Verify whether the job fails or not.
> 3. Run the job with two distributed cache files where the size of one file 
> is larger than local.cache.size. Verify whether the job succeeds or not.




[jira] Updated: (MAPREDUCE-1955) Because of changes in JobInProgress.java, JobInProgressAspect.aj also needs to change.

2010-07-22 Thread Iyappan Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iyappan Srinivasan updated MAPREDUCE-1955:
--

Attachment: 1955-ydist-security-patch.txt

That patch became stale. Attaching a new patch with exactly the same changes.

> Because of changes in JobInProgress.java, JobInProgressAspect.aj also needs 
> to change.
> --
>
> Key: MAPREDUCE-1955
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1955
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: test
>Reporter: Iyappan Srinivasan
>Assignee: Iyappan Srinivasan
> Attachments: 1955-ydist-security-patch.txt, 
> JobInProgressAspectaj.patch, MAPREDUCE-1955.patch
>
>
> Because of changes in JobInProgress.java, JobInProgressAspect.aj also needs 
> to change.
> A variable taskInited is changed from Boolean to boolean in 
> JobInProgress.java. So JobInProgressAspect.aj  also needs to change too.




[jira] Updated: (MAPREDUCE-1957) [Herriot] Test Job cache directories cleanup after job completes.

2010-07-22 Thread Vinay Kumar Thota (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay Kumar Thota updated MAPREDUCE-1957:
-

Attachment: 1957-ydist-security.patch

Addressed Balaji's comments.

> [Herriot] Test Job cache directories cleanup after job completes.
> -
>
> Key: MAPREDUCE-1957
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1957
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1957-ydist-security.patch, 1957-ydist-security.patch, 
> 1957-ydist-security.patch
>
>
> Test the job cache directories cleanup after the job completes. The test 
> covers the following scenarios.
> 1. Submit a job and create folders and files in work folder with  
> non-writable permissions under task attempt id folder. Wait till the job 
> completes and verify whether the files and folders are cleaned up or not.
> 2. Submit a job and create folders and files in work folder with  
> non-writable permissions under task attempt id folder. Kill the job and 
> verify whether the files and folders are cleaned up or not.
> 3. Submit a job and create folders and files in work folder with  
> non-writable permissions under task attempt id folder. Fail the job and 
> verify whether the files and folders are cleaned up or not.




[jira] Updated: (MAPREDUCE-1933) Create automated testcase for tasktracker dealing with corrupted disk.

2010-07-22 Thread Iyappan Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iyappan Srinivasan updated MAPREDUCE-1933:
--

Attachment: MAPREDUCE-1933.patch

cancel.delegation token string literal replaced with a constant. I couldn't find a 
constant for mapred.local.dir.


> Create automated testcase for tasktracker dealing with corrupted disk.
> --
>
> Key: MAPREDUCE-1933
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1933
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: test
>Reporter: Iyappan Srinivasan
>Assignee: Iyappan Srinivasan
> Attachments: 1933-ydist-security-patch.txt, MAPREDUCE-1933.patch, 
> MAPREDUCE-1933.patch, TestCorruptedDiskJob.java
>
>
> After the TaskTracker has already run some tasks successfully, "corrupt" a 
> disk by making the corresponding mapred.local.dir unreadable/unwritable. 
> Make sure that jobs continue to succeed even though some tasks scheduled 
> there fail. 
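The "corrupt a disk" step above can be simulated in an automated test by revoking directory permissions. The following is a minimal standalone sketch, not the attached test's actual code; directory names are illustrative, and the permission calls may be no-ops when running as root:

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;

public class CorruptLocalDirDemo {
    public static void main(String[] args) throws Exception {
        // Stand-in for one of the configured mapred.local.dir entries.
        Path localDir = Files.createTempDirectory("mapred-local");
        File dir = localDir.toFile();

        // Simulate a corrupted disk: tasks can no longer read or write here.
        dir.setReadable(false, false);
        dir.setWritable(false, false);
        System.out.println("dir usable: " + (dir.canRead() && dir.canWrite()));

        // Restore permissions so the temp directory can be cleaned up.
        dir.setReadable(true, false);
        dir.setWritable(true, false);
        Files.delete(localDir);
    }
}
```

A real Herriot test would point the TaskTracker's mapred.local.dir at such a directory and then assert that jobs still succeed despite task failures on that node.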




[jira] Commented: (MAPREDUCE-1901) Jobs should not submit the same jar files over and over again

2010-07-22 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891471#action_12891471
 ] 

Joydeep Sen Sarma commented on MAPREDUCE-1901:
--

Thanks for taking a look. I think there are some differences (and potentially 
some overlap as well) with what we are trying to do here:

1. The jobclient in this approach computes the md5 of jars/files/archives (when 
a special option is enabled) and then automatically submits these jars as 
shared objects by putting them in a global namespace, where (md5, file-name) 
identifies the shared object (instead of (jobid, file-name, file-timestamp)).

2. It treats shared objects as immutable, meaning that we never look up the 
timestamp of the backing object in HDFS during task localization/validation. 
This saves time during task setup.

3. Reasonable effort has been put into bypassing as many HDFS calls as possible 
in step 1. The client gets a listing of all shared objects and their md5 
signatures in one shot. Because of the immutability assumption, individual 
file timestamps are never required, which saves HDFS calls.

4. Finally, there is built-in code to garbage-collect the shared namespace (in 
HDFS) by deleting old shared objects that have not been recently accessed.
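Point 1's content signature can be illustrated with the stock MessageDigest API. This is only a sketch of the idea described in the comment; the class name and key layout below are hypothetical, not the patch's actual code:

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;

public class SharedObjectKey {
    // Stream the file through MD5 so large jars are not read into memory.
    static String md5Hex(Path file) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            for (int n; (n = in.read(buf)) > 0; ) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        Path jar = Files.createTempFile("demo", ".jar");
        Files.write(jar, "identical jar bytes".getBytes());
        // Identical content always yields the same (md5, file-name) key,
        // regardless of which job uploaded it.
        System.out.println(md5Hex(jar) + "_" + jar.getFileName());
    }
}
```

Because the key depends only on content and name, two jobs uploading the same jar resolve to the same shared object without consulting per-file timestamps.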

So I believe the scope of this effort is somewhat different (based on looking 
at the last patch for MAPREDUCE-744).

The difference here is that all applications (like Hive) using the libjars 
etc. options provided in Hadoop automatically share jars with each other (when 
they set this option). The applications don't have to do anything special 
(like figuring out the right global identifier in HDFS for their jars).

Our primary use case is Hive. Hive submits multiple jars for each Hadoop job, 
and users can add more. At any given time we have at least 4-5 official 
versions of Hive being used to submit jobs; in addition, Hive developers are 
building custom versions and submitting jobs with them. Total jobs submitted 
per day is in the tens of thousands.

With this patch we automatically get sharing of jars, with zero administration 
overhead of managing a global namespace amongst many versions of our software 
libraries. I believe there's nothing Hive-specific here. We use Hadoop jar/file 
resources just like hadoop-streaming and other map-reduce jobs.

Before embarking on this venture, we looked at the Hadoop code and tried to 
find out whether a similar facility existed. We noticed an md5 class, but no 
uses of it. If there is existing functionality to the above effect, we would 
love to pick it up (less work for us). Otherwise, I think this is very useful 
functionality that would be good to have in the Hadoop framework.

If you can look at the patch a bit, that might also help in understanding the 
differences.

> Jobs should not submit the same jar files over and over again
> -
>
> Key: MAPREDUCE-1901
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1901
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Joydeep Sen Sarma
> Attachments: 1901.PATCH
>
>
> Currently each Hadoop job uploads the required resources 
> (jars/files/archives) to a new location in HDFS. Map-reduce nodes involved in 
> executing this job would then download these resources into local disk.
> In an environment where most of the users are using a standard set of jars 
> and files (because they are using a framework like Hive/Pig) - the same jars 
> keep getting uploaded and downloaded repeatedly. The overhead of this 
> protocol (primarily in terms of end-user latency) is significant when:
> - the jobs are small (and, conversely, large in number)
> - Namenode is under load (meaning hdfs latencies are high and made worse, in 
> part, by this protocol)
> Hadoop should provide a way for jobs in a cooperative environment to not 
> submit the same files over and again. Identifying and caching execution 
> resources by a content signature (md5/sha) would be a good alternative to 
> have available.




[jira] Updated: (MAPREDUCE-1933) Create automated testcase for tasktracker dealing with corrupted disk.

2010-07-22 Thread Iyappan Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iyappan Srinivasan updated MAPREDUCE-1933:
--

Attachment: MAPREDUCE-1933.patch

patch for trunk

> Create automated testcase for tasktracker dealing with corrupted disk.
> --
>
> Key: MAPREDUCE-1933
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1933
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: test
>Reporter: Iyappan Srinivasan
>Assignee: Iyappan Srinivasan
> Attachments: 1933-ydist-security-patch.txt, MAPREDUCE-1933.patch, 
> TestCorruptedDiskJob.java
>
>
> After the TaskTracker has already run some tasks successfully, "corrupt" a 
> disk by making the corresponding mapred.local.dir unreadable/unwritable. 
> Make sure that jobs continue to succeed even though some tasks scheduled 
> there fail. 




[jira] Commented: (MAPREDUCE-1933) Create automated testcase for tasktracker dealing with corrupted disk.

2010-07-22 Thread Balaji Rajagopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891468#action_12891468
 ] 

Balaji Rajagopalan commented on MAPREDUCE-1933:
---

+1

> Create automated testcase for tasktracker dealing with corrupted disk.
> --
>
> Key: MAPREDUCE-1933
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1933
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: test
>Reporter: Iyappan Srinivasan
>Assignee: Iyappan Srinivasan
> Attachments: 1933-ydist-security-patch.txt, TestCorruptedDiskJob.java
>
>
> After the TaskTracker has already run some tasks successfully, "corrupt" a 
> disk by making the corresponding mapred.local.dir unreadable/unwritable. 
> Make sure that jobs continue to succeed even though some tasks scheduled 
> there fail. 




[jira] Updated: (MAPREDUCE-1933) Create automated testcase for tasktracker dealing with corrupted disk.

2010-07-22 Thread Iyappan Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iyappan Srinivasan updated MAPREDUCE-1933:
--

Attachment: 1933-ydist-security-patch.txt

patch for 20.1.xxx

Review comments addressed:
1) String literals are no longer used.
2) JTClient::isJobStopped is used.





> Create automated testcase for tasktracker dealing with corrupted disk.
> --
>
> Key: MAPREDUCE-1933
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1933
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: test
>Reporter: Iyappan Srinivasan
>Assignee: Iyappan Srinivasan
> Attachments: 1933-ydist-security-patch.txt, TestCorruptedDiskJob.java
>
>
> After the TaskTracker has already run some tasks successfully, "corrupt" a 
> disk by making the corresponding mapred.local.dir unreadable/unwritable. 
> Make sure that jobs continue to succeed even though some tasks scheduled 
> there fail. 




[jira] Commented: (MAPREDUCE-1901) Jobs should not submit the same jar files over and over again

2010-07-22 Thread Vinod K V (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891461#action_12891461
 ] 

Vinod K V commented on MAPREDUCE-1901:
--

Apologies for not looking at this issue before.

Distributed cache already has support for sharing files/archives via 
MAPREDUCE-744. It went into 0.21; maybe all you need is a back-port.

The requirements for this issue can be met simply by making the job jar files 
on DFS public and adding them to the distributed cache as files/archives to be 
put on the task's classpath. I don't see anything else needed.

> Jobs should not submit the same jar files over and over again
> -
>
> Key: MAPREDUCE-1901
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1901
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Joydeep Sen Sarma
> Attachments: 1901.PATCH
>
>
> Currently each Hadoop job uploads the required resources 
> (jars/files/archives) to a new location in HDFS. Map-reduce nodes involved in 
> executing this job would then download these resources into local disk.
> In an environment where most of the users are using a standard set of jars 
> and files (because they are using a framework like Hive/Pig) - the same jars 
> keep getting uploaded and downloaded repeatedly. The overhead of this 
> protocol (primarily in terms of end-user latency) is significant when:
> - the jobs are small (and, conversely, large in number)
> - Namenode is under load (meaning hdfs latencies are high and made worse, in 
> part, by this protocol)
> Hadoop should provide a way for jobs in a cooperative environment to not 
> submit the same files over and again. Identifying and caching execution 
> resources by a content signature (md5/sha) would be a good alternative to 
> have available.




[jira] Resolved: (MAPREDUCE-1566) Need to add a mechanism to import tokens and secrets into a submitted job.

2010-07-22 Thread Devaraj Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das resolved MAPREDUCE-1566.


  Assignee: Jitendra Nath Pandey  (was: Owen O'Malley)
Resolution: Fixed

I just committed this. Thanks, Jitendra & Owen!

> Need to add a mechanism to import tokens and secrets into a submitted job.
> --
>
> Key: MAPREDUCE-1566
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1566
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: security
>Reporter: Owen O'Malley
>Assignee: Jitendra Nath Pandey
> Fix For: 0.22.0
>
> Attachments: mr-1566-1.1.patch, mr-1566-1.patch, MR-1566.1.patch, 
> MR-1566.2.patch, MR-1566.3.patch
>
>
> We need to include tokens and secrets into a submitted job. I propose adding 
> a configuration attribute that when pointed at a token storage file will 
> include the tokens and secrets from that token storage file.




[jira] Commented: (MAPREDUCE-1961) [gridmix3] ConcurrentModificationException when shutting down Gridmix

2010-07-22 Thread Hong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891431#action_12891431
 ] 

Hong Tang commented on MAPREDUCE-1961:
--

It seems that StatsCollectorThread is still active when we call 
"clusterStatlisteners.clear();" from "Statistics.shutdown()".

A simple fix is to replace "clusterStatlisteners.clear();" with 
"clusterStatlisteners = new ArrayList<...>()", i.e., swap in a fresh list 
instead of clearing the one being iterated.

While we are at it, we should probably also fix the same problem with 
jobStatListeners.
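The failure mode and the swap-instead-of-clear fix can be demonstrated with plain ArrayList semantics. This is a standalone illustration; Gridmix's actual listener types are not reproduced here:

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

public class ListenerSwapDemo {
    public static void main(String[] args) {
        // In-place clear() invalidates a live iterator, which is the
        // ConcurrentModificationException the StatsCollectorThread hits:
        List<String> listeners = new ArrayList<>();
        listeners.add("clusterStatsListener");
        Iterator<String> it = listeners.iterator();
        listeners.clear(); // structural modification while 'it' is live
        boolean threw = false;
        try {
            it.next();
        } catch (ConcurrentModificationException e) {
            threw = true;
        }
        System.out.println("clear() during iteration threw CME: " + threw);

        // The proposed fix: publish a fresh list instead of clearing the
        // old one, so an iterator over the old list remains valid.
        List<String> listeners2 = new ArrayList<>();
        listeners2.add("clusterStatsListener");
        Iterator<String> it2 = listeners2.iterator();
        listeners2 = new ArrayList<>(); // swap; the old list is untouched
        System.out.println("old iterator still valid: " + it2.next());
    }
}
```

For the swap to be safe across threads, the field holding the list would also need to be volatile (or its reads synchronized) so the collector thread observes the replacement.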

> [gridmix3] ConcurrentModificationException when shutting down Gridmix
> -
>
> Key: MAPREDUCE-1961
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1961
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Hong Tang
>
> We observed the following exception occasionally at the end of the Gridmix 
> run:
> {code}
> Exception in thread "StatsCollectorThread" 
> java.util.ConcurrentModificationException
>   at 
> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>   at java.util.AbstractList$Itr.next(AbstractList.java:343)
>   at 
> org.apache.hadoop.mapred.gridmix.Statistics$StatCollector.updateAndNotifyClusterStatsListeners(Statistics.java:220)
>   at 
> org.apache.hadoop.mapred.gridmix.Statistics$StatCollector.run(Statistics.java:205)
> {code}




[jira] Assigned: (MAPREDUCE-1961) [gridmix3] ConcurrentModificationException when shutting down Gridmix

2010-07-22 Thread Hong Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Tang reassigned MAPREDUCE-1961:


Assignee: Hong Tang

> [gridmix3] ConcurrentModificationException when shutting down Gridmix
> -
>
> Key: MAPREDUCE-1961
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1961
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Hong Tang
>Assignee: Hong Tang
>
> We observed the following exception occasionally at the end of the Gridmix 
> run:
> {code}
> Exception in thread "StatsCollectorThread" 
> java.util.ConcurrentModificationException
>   at 
> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>   at java.util.AbstractList$Itr.next(AbstractList.java:343)
>   at 
> org.apache.hadoop.mapred.gridmix.Statistics$StatCollector.updateAndNotifyClusterStatsListeners(Statistics.java:220)
>   at 
> org.apache.hadoop.mapred.gridmix.Statistics$StatCollector.run(Statistics.java:205)
> {code}




[jira] Created: (MAPREDUCE-1961) [gridmix3] ConcurrentModificationException when shutting down Gridmix

2010-07-22 Thread Hong Tang (JIRA)
[gridmix3] ConcurrentModificationException when shutting down Gridmix
-

 Key: MAPREDUCE-1961
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1961
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Hong Tang


We observed the following exception occasionally at the end of the Gridmix run:

{code}
Exception in thread "StatsCollectorThread" 
java.util.ConcurrentModificationException
  at 
java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
  at java.util.AbstractList$Itr.next(AbstractList.java:343)
  at 
org.apache.hadoop.mapred.gridmix.Statistics$StatCollector.updateAndNotifyClusterStatsListeners(Statistics.java:220)
  at 
org.apache.hadoop.mapred.gridmix.Statistics$StatCollector.run(Statistics.java:205)
{code}




[jira] Updated: (MAPREDUCE-1718) job conf key for the services name of DelegationToken for HFTP url is constructed incorrectly in HFTPFileSystem

2010-07-22 Thread Boris Shkolnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boris Shkolnik updated MAPREDUCE-1718:
--

Attachment: MAPREDUCE-1718-3.patch

> job conf key for the services name of DelegationToken for HFTP url is 
> constructed incorrectly in HFTPFileSystem
> ---
>
> Key: MAPREDUCE-1718
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1718
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Boris Shkolnik
>Assignee: Boris Shkolnik
> Attachments: MAPREDUCE-1718-2.patch, MAPREDUCE-1718-3.patch, 
> MAPREDUCE-1718-BP20-1.patch, MAPREDUCE-1718-BP20-2.patch
>
>
> The key (built in TokenCache) is hdfs.service.host_HOSTNAME.PORT, but 
> in HftpFileSystem it is sometimes built as hdfs.service.host_IP.PORT.
> Fix: change it to always be the IP.
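The described fix is essentially a normalization step before building the key. A hedged sketch follows; the key format is taken from the description above, while the class and helper names are hypothetical:

```java
import java.net.InetAddress;

public class HftpServiceKey {
    // Always resolve the host to an IP so both TokenCache and
    // HftpFileSystem construct the identical hdfs.service.host_IP.PORT key.
    static String serviceKey(String host, int port) throws Exception {
        String ip = InetAddress.getByName(host).getHostAddress();
        return "hdfs.service.host_" + ip + "." + port;
    }

    public static void main(String[] args) throws Exception {
        // A hostname and its literal IP now normalize to comparable keys.
        System.out.println(serviceKey("localhost", 50070));
        System.out.println(serviceKey("127.0.0.1", 50070));
    }
}
```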




[jira] Commented: (MAPREDUCE-1901) Jobs should not submit the same jar files over and over again

2010-07-22 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891393#action_12891393
 ] 

Joydeep Sen Sarma commented on MAPREDUCE-1901:
--

Arun and other Hadoopers: it might take JJ some time to get the patch for 
trunk ready. If you have some cycles, it would be good to vet the general 
approach by looking at the patch for 20. I think the code for trunk differs 
primarily in security-related aspects (from a quick glance).

We have started testing this patch internally, and it should be in production 
in a couple of weeks.

> Jobs should not submit the same jar files over and over again
> -
>
> Key: MAPREDUCE-1901
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1901
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Joydeep Sen Sarma
> Attachments: 1901.PATCH
>
>
> Currently each Hadoop job uploads the required resources 
> (jars/files/archives) to a new location in HDFS. Map-reduce nodes involved in 
> executing this job would then download these resources into local disk.
> In an environment where most of the users are using a standard set of jars 
> and files (because they are using a framework like Hive/Pig) - the same jars 
> keep getting uploaded and downloaded repeatedly. The overhead of this 
> protocol (primarily in terms of end-user latency) is significant when:
> - the jobs are small (and, conversely, large in number)
> - Namenode is under load (meaning hdfs latencies are high and made worse, in 
> part, by this protocol)
> Hadoop should provide a way for jobs in a cooperative environment to not 
> submit the same files over and again. Identifying and caching execution 
> resources by a content signature (md5/sha) would be a good alternative to 
> have available.




[jira] Updated: (MAPREDUCE-1936) [gridmix3] Make Gridmix3 more customizable.

2010-07-22 Thread Hong Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Tang updated MAPREDUCE-1936:
-

Attachment: mr-1936-delta-20.1xx.patch

Delta patch incorporating changes reflecting Chris's comments.

> [gridmix3] Make Gridmix3 more customizable.
> ---
>
> Key: MAPREDUCE-1936
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1936
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: contrib/gridmix
>Reporter: Hong Tang
>Assignee: Hong Tang
> Fix For: 0.22.0
>
> Attachments: mr-1936-20100715.patch, mr-1936-20100720.patch, 
> mr-1936-delta-20.1xx.patch, mr-1936-yhadoop-20.1xx.patch
>
>
> I'd like to make gridmix3 more customizable. Specifically, the proposed 
> customizations include:
> - add (random) location information for each sleep map task.
> - make the parameters used in stress submission load throttling configurable.




[jira] Commented: (MAPREDUCE-1566) Need to add a mechanism to import tokens and secrets into a submitted job.

2010-07-22 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891389#action_12891389
 ] 

Jitendra Nath Pandey commented on MAPREDUCE-1566:
-

When the credentials object reads a token from a file, it clears all existing 
credentials. Therefore, merging the two would require another job and mapper 
implementation; it will be cleaner to have a separate test. Also, the 
mechanism to pass tokens via a file is not much related to TokenCache, except 
that it uses the obtainTokensFromNamenodeInternal method.

> Also, in the mapper, we should look at the credentials via the APIs (as is 
> done in TestTokenCache) instead of reading the file

The map task in the new test gets the tokens from the credentials and verifies 
them against the tokens in the file; therefore it also reads the file.

> Need to add a mechanism to import tokens and secrets into a submitted job.
> --
>
> Key: MAPREDUCE-1566
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1566
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: security
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 0.22.0
>
> Attachments: mr-1566-1.1.patch, mr-1566-1.patch, MR-1566.1.patch, 
> MR-1566.2.patch, MR-1566.3.patch
>
>
> We need to include tokens and secrets into a submitted job. I propose adding 
> a configuration attribute that when pointed at a token storage file will 
> include the tokens and secrets from that token storage file.




[jira] Updated: (MAPREDUCE-1733) Authentication between pipes processes and java counterparts.

2010-07-22 Thread Devaraj Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated MAPREDUCE-1733:
---

Status: Resolved  (was: Patch Available)
Resolution: Fixed

I just committed this. Thanks, Jitendra!

> Authentication between pipes processes and java counterparts.
> -
>
> Key: MAPREDUCE-1733
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1733
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Fix For: 0.22.0
>
> Attachments: MR-1733-y20.1.patch, MR-1733-y20.2.patch, 
> MR-1733-y20.3.patch, MR-1733.5.patch, MR-1733.6.patch
>
>
> The connection between a pipe process and its parent java process should be 
> authenticated.




[jira] Updated: (MAPREDUCE-1073) Progress reported for pipes tasks is incorrect.

2010-07-22 Thread Dick King (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1073:
-

Attachment: MAPREDUCE-1073--yhadoop20--2010-07-22--1530.patch

I revised the patch so that it does not add an API to read and set the 
property that tells MapTask.TrackedRecordReader not to record progress as it 
reads the input; instead, the property is read and set "by hand" in the code. 
Since this is a pipes-specific feature, it should be handled only by a focused 
attribute, which I have renamed to mapred.pipes.disable.record.reader.progress.

In 
https://issues.apache.org/jira/browse/MAPREDUCE-1073?focusedCommentId=12891327&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12891327
, the API for marking a job as having mappers that report their own progress 
is now simply to set the property, which I have renamed from 
{{mapred.job.disable.record.reader.progress}} to 
{{mapred.pipes.disable.record.reader.progress}}, because this is a pipes-only 
concept.
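As a sketch, opting a pipes job into self-reported progress would then be a one-line configuration change. The JobConf usage and the job class name below are illustrative assumptions, not taken from the patch:

```java
// Hypothetical driver fragment; assumes the 0.20-era JobConf API.
// MyPipesJob is a placeholder class name.
JobConf conf = new JobConf(MyPipesJob.class);
conf.setBoolean("mapred.pipes.disable.record.reader.progress", true);
```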

> Progress reported for pipes tasks is incorrect.
> ---
>
> Key: MAPREDUCE-1073
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1073
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: pipes
>Affects Versions: 0.20.1
>Reporter: Sreekanth Ramakrishnan
>Assignee: Dick King
> Attachments: mapreduce-1073--2010-03-31.patch, 
> mapreduce-1073--2010-04-06.patch, 
> MAPREDUCE-1073--yhadoop20--2010-07-22--1530.patch, 
> MAPREDUCE-1073--yhadoop20--2010-07-22.patch, MAPREDUCE-1073_yhadoop20.patch
>
>
> Currently in pipes, 
> {{org.apache.hadoop.mapred.pipes.PipesMapRunner.run(RecordReader, 
> OutputCollector, Reporter)}} we do the following:
> {code}
> while (input.next(key, value)) {
>   downlink.mapItem(key, value);
>   if(skipping) {
> downlink.flush();
>   }
> }
> {code}
> This would result in consumption of all the records for current task and 
> taking task progress to 100% whereas the actual pipes application would be 
> trailing behind. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1960) Limit the size of jobconf.

2010-07-22 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated MAPREDUCE-1960:
-

Attachment: MAPREDUCE-1960-yahoo-hadoop-0.20S.patch

This patch adds a limit parameter at the jobtracker and throws an exception on 
job submission if the job.xml exceeds this limit. The limit defaults to 10MB, 
so any job whose job.xml is larger than 10MB will fail on submission.

This patch is for yahoo-0.20 branch with tests. Will upload a patch for trunk 
soon.
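As a rough sketch of the submission-time check described above (the class and method names here are illustrative, not from the attached patch, which would read the actual limit from the jobtracker configuration):

```java
// Minimal sketch of a job.xml size limit enforced at submission time.
// Names are illustrative; the real patch would read the limit from the
// jobtracker configuration and throw a Hadoop-specific exception.
public class JobConfSizeCheck {
    // Default limit of 10MB, matching the comment above.
    static final long DEFAULT_MAX_JOBCONF_SIZE = 10L * 1024 * 1024;

    /** Rejects a job whose serialized job.xml exceeds the limit. */
    static void checkJobConfSize(long jobXmlBytes, long limitBytes) {
        if (jobXmlBytes > limitBytes) {
            throw new IllegalArgumentException("job.xml is " + jobXmlBytes
                + " bytes, exceeding the limit of " + limitBytes
                + " bytes; the job is rejected at submission time");
        }
    }

    public static void main(String[] args) {
        checkJobConfSize(5L * 1024 * 1024, DEFAULT_MAX_JOBCONF_SIZE); // passes
        try {
            checkJobConfSize(11L * 1024 * 1024, DEFAULT_MAX_JOBCONF_SIZE);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected oversized job.xml");
        }
    }
}
```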



> Limit the size of jobconf.
> --
>
> Key: MAPREDUCE-1960
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1960
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Mahadev konar
>Assignee: Mahadev konar
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1960-yahoo-hadoop-0.20S.patch
>
>
> In some of our production clusters, users have huge job.xml's that bring down 
> the jobtracker. This jira is to put a limit on the size of the jobconf, so 
> that we don't blow up the memory on the jobtracker.




[jira] Updated: (MAPREDUCE-1901) Jobs should not submit the same jar files over and over again

2010-07-22 Thread Junjie Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junjie Liang updated MAPREDUCE-1901:


Attachment: 1901.PATCH

Patch for version 20.2
=

Set "mapred.cache.shared.enabled" to "true" to enable cache files to be shared 
across jobs.
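Assuming the standard Hadoop site-configuration mechanism, enabling this would be a fragment like the following (only the property name is taken from the comment above; the default is presumably false):

```xml
<!-- Hypothetical mapred-site.xml fragment enabling shared cache files. -->
<property>
  <name>mapred.cache.shared.enabled</name>
  <value>true</value>
</property>
```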

> Jobs should not submit the same jar files over and over again
> -
>
> Key: MAPREDUCE-1901
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1901
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Joydeep Sen Sarma
> Attachments: 1901.PATCH
>
>
> Currently each Hadoop job uploads the required resources 
> (jars/files/archives) to a new location in HDFS. Map-reduce nodes involved in 
> executing this job would then download these resources into local disk.
> In an environment where most of the users are using a standard set of jars 
> and files (because they are using a framework like Hive/Pig) - the same jars 
> keep getting uploaded and downloaded repeatedly. The overhead of this 
> protocol (primarily in terms of end-user latency) is significant when:
> - the jobs are small (and, conversely, large in number)
> - Namenode is under load (meaning hdfs latencies are high and made worse, in 
> part, by this protocol)
> Hadoop should provide a way for jobs in a cooperative environment to not 
> submit the same files over and over again. Identifying and caching execution 
> resources by a content signature (md5/sha) would be a good alternative to 
> have available.
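As a sketch of that alternative, a content signature for a resource could be computed like this (a hypothetical helper, not part of any attached patch):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ResourceSignature {
    /** Hex MD5 digest of a resource's bytes; equal digests identify
     *  an already-cached copy that need not be uploaded again. */
    static String md5Hex(byte[] bytes) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(bytes)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError("MD5 is a required JDK algorithm", e);
        }
    }

    public static void main(String[] args) {
        // Identical payloads hash to the same cache key.
        byte[] payload = "example.jar contents".getBytes(StandardCharsets.UTF_8);
        System.out.println(md5Hex(payload).equals(md5Hex(payload))); // prints "true"
    }
}
```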




[jira] Created: (MAPREDUCE-1960) Limit the size of jobconf.

2010-07-22 Thread Mahadev konar (JIRA)
Limit the size of jobconf.
--

 Key: MAPREDUCE-1960
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1960
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Mahadev konar
Assignee: Mahadev konar
 Fix For: 0.22.0


In some of our production clusters, users have huge job.xml's that bring down 
the jobtracker. This jira is to put a limit on the size of the jobconf, so that 
we don't blow up the memory on the jobtracker.




[jira] Updated: (MAPREDUCE-1733) Authentication between pipes processes and java counterparts.

2010-07-22 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated MAPREDUCE-1733:


Attachment: MR-1733.6.patch

The earlier patch had an issue with automatically generated configure scripts. 
The rest of the patch is exactly the same.

> Authentication between pipes processes and java counterparts.
> -
>
> Key: MAPREDUCE-1733
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1733
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Fix For: 0.22.0
>
> Attachments: MR-1733-y20.1.patch, MR-1733-y20.2.patch, 
> MR-1733-y20.3.patch, MR-1733.5.patch, MR-1733.6.patch
>
>
> The connection between a pipe process and its parent java process should be 
> authenticated.




[jira] Updated: (MAPREDUCE-1898) [Herriot] Implement a functionality for getting the job summary information of a job.

2010-07-22 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated MAPREDUCE-1898:
--

Status: Patch Available  (was: Open)

+1 patch looks good. Let's re-verify before commit. Also, since the current 
{{test-patch}} doesn't verify Herriot tests, it'd be awesome to have a comment 
from the author on how it works.

> [Herriot] Implement a functionality for getting the job summary information 
> of a job.
> -
>
> Key: MAPREDUCE-1898
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1898
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Affects Versions: 0.20.1
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1898-ydist-security.patch, 1898-ydist-security.patch, 
> 1898-ydist-security.patch, 1898-ydist-security.patch, MAPREDUCE-1898.patch, 
> MAPREDUCE-1898.patch, MAPREDUCE-1898.patch
>
>
> Implement a method for getting the job summary details of a job. The job 
> summary should include:
> jobId, startTime, launchTime, finishTime, numMaps, numSlotsPerMap, 
> numReduces, numSlotsPerReduce, user, queue, status, mapSlotSeconds, 
> reduceSlotSeconds, clusterMapCapacity, clusterReduceCapacity.




[jira] Updated: (MAPREDUCE-1898) [Herriot] Implement a functionality for getting the job summary information of a job.

2010-07-22 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated MAPREDUCE-1898:
--

Status: Open  (was: Patch Available)

> [Herriot] Implement a functionality for getting the job summary information 
> of a job.
> -
>
> Key: MAPREDUCE-1898
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1898
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Affects Versions: 0.20.1
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1898-ydist-security.patch, 1898-ydist-security.patch, 
> 1898-ydist-security.patch, 1898-ydist-security.patch, MAPREDUCE-1898.patch, 
> MAPREDUCE-1898.patch, MAPREDUCE-1898.patch
>
>
> Implement a method for getting the job summary details of a job. The job 
> summary should include:
> jobId, startTime, launchTime, finishTime, numMaps, numSlotsPerMap, 
> numReduces, numSlotsPerReduce, user, queue, status, mapSlotSeconds, 
> reduceSlotSeconds, clusterMapCapacity, clusterReduceCapacity.




[jira] Commented: (MAPREDUCE-1919) [Herriot] Test for verification of per cache file ref count.

2010-07-22 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891331#action_12891331
 ] 

Konstantin Boudnik commented on MAPREDUCE-1919:
---

The comment from 7/14 still isn't addressed.

> [Herriot] Test for verification of per cache file ref  count.
> -
>
> Key: MAPREDUCE-1919
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1919
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1919-ydist-security.patch, MAPREDUCE-1919.patch
>
>
> It covers the following scenarios.
> 1. Run the job with two distributed cache files and verify whether the job 
> succeeds or not.
> 2. Run the job with distributed cache files and remove one cache file from 
> the DFS when it is localized. Verify whether the job fails or not.
> 3. Run the job with two distributed cache files where the size of one file is 
> larger than local.cache.size. Verify whether the job succeeds or not.




[jira] Commented: (MAPREDUCE-1871) Create automated test scenario for "Collect information about number of tasks succeeded / total per time unit for a tasktracker"

2010-07-22 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891333#action_12891333
 ] 

Konstantin Boudnik commented on MAPREDUCE-1871:
---

Looks good. +1 upon the usual verification.

> Create automated test scenario for "Collect information about number of tasks 
> succeeded / total per time unit for a tasktracker"
> 
>
> Key: MAPREDUCE-1871
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1871
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: test
>Reporter: Iyappan Srinivasan
>Assignee: Iyappan Srinivasan
> Attachments: 1871-ydist-security-patch.txt, 
> 1871-ydist-security-patch.txt, 1871-ydist-security-patch.txt, 
> 1871-ydist-security-patch.txt, 1871-ydist-security-patch.txt, 
> 1871-ydist-security-patch.txt, 1871-ydist-security-patch.txt, 
> 1871-ydist-security-patch.txt, MAPREDUCE-1871.patch, MAPREDUCE-1871.patch, 
> MAPREDUCE-1871.patch, MAPREDUCE-1871.patch
>
>
> Create automated test scenario for "Collect information about number of tasks 
> succeeded / total per time unit for a tasktracker"
> 1) Verification of all the above mentioned fields with the specified TTs. 
> Total no. of tasks and successful tasks should be equal to the corresponding 
> no. of tasks specified in TTs logs
> 2)  Fail a task on tasktracker.  Node UI should update the status of tasks on 
> that TT accordingly. 
> 3)  Kill a task on tasktracker.  Node UI should update the status of tasks on 
> that TT accordingly
> 4) Positive Run simultaneous jobs and check if all the fields are populated 
> with proper values of tasks.  Node UI should have correct values for all the 
> fields mentioned above. 
> 5)  Check the fields across a one-hour window.  Fields related to the hour 
> should be updated after every hour.
> 6) Check the fields across a one-day window.  Fields related to the day 
> should be updated after every day.
> 7) Restart a TT and bring it back.  UI should retain the field values.  
> 8) Positive Run a bunch of jobs with 0 maps and 0 reduces simultaneously.




[jira] Commented: (MAPREDUCE-1827) [Herriot] Task Killing/Failing tests for a streaming job.

2010-07-22 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891330#action_12891330
 ] 

Konstantin Boudnik commented on MAPREDUCE-1827:
---

Please use named constants for the parameters like these:
{noformat}
+String runtimeArgs [] = {
+"-D", "mapred.job.name=Numbers Sum",
+"-D", "mapred.map.tasks=1",
+"-D", "mapred.reduce.tasks=1",
+"-D", "mapred.map.max.attempts=1",
+"-D", "mapred.reduce.max.attempts=1"};
{noformat}

Looks good otherwise.
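One way to address the comment, sketched with illustrative constant names (not the actual patch code):

```java
// The repeated literals from the review comment hoisted into named
// constants, so the intent of each -D argument is explicit.
public class StreamingJobArgs {
    static final String JOB_NAME = "Numbers Sum";
    static final int MAP_TASKS = 1;
    static final int REDUCE_TASKS = 1;
    static final int MAX_MAP_ATTEMPTS = 1;
    static final int MAX_REDUCE_ATTEMPTS = 1;

    /** Builds the streaming job's runtime arguments from the constants. */
    static String[] runtimeArgs() {
        return new String[] {
            "-D", "mapred.job.name=" + JOB_NAME,
            "-D", "mapred.map.tasks=" + MAP_TASKS,
            "-D", "mapred.reduce.tasks=" + REDUCE_TASKS,
            "-D", "mapred.map.max.attempts=" + MAX_MAP_ATTEMPTS,
            "-D", "mapred.reduce.max.attempts=" + MAX_REDUCE_ATTEMPTS };
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", runtimeArgs()));
    }
}
```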

> [Herriot] Task Killing/Failing tests for a streaming job.
> -
>
> Key: MAPREDUCE-1827
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1827
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1827-ydist-security.patch, 1827-ydist-security.patch, 
> MAPREDUCE-1827.patch
>
>
> 1. Set the sleep time for the tasks to 3 seconds and kill a task of the 
> streaming job using SIGKILL. After that, verify whether the task is killed 
> after 3 seconds or not, and also verify whether the job succeeds or not.
> 2. Set the maximum attempts for the maps and reducers to one. Make a task 
> fail and verify whether the task fails or not. Also verify whether the job 
> fails or not.




[jira] Commented: (MAPREDUCE-1073) Progress reported for pipes tasks is incorrect.

2010-07-22 Thread Dick King (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891328#action_12891328
 ] 

Dick King commented on MAPREDUCE-1073:
--

In my previous comment I should have said that this patch addresses BOTH 
points, and is complete modulo a forward port.

> Progress reported for pipes tasks is incorrect.
> ---
>
> Key: MAPREDUCE-1073
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1073
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: pipes
>Affects Versions: 0.20.1
>Reporter: Sreekanth Ramakrishnan
>Assignee: Dick King
> Attachments: mapreduce-1073--2010-03-31.patch, 
> mapreduce-1073--2010-04-06.patch, 
> MAPREDUCE-1073--yhadoop20--2010-07-22.patch, MAPREDUCE-1073_yhadoop20.patch
>
>
> Currently in pipes, 
> {{org.apache.hadoop.mapred.pipes.PipesMapRunner.run(RecordReader, 
> OutputCollector, Reporter)}} we do the following:
> {code}
> while (input.next(key, value)) {
>   downlink.mapItem(key, value);
>   if(skipping) {
> downlink.flush();
>   }
> }
> {code}
> This would result in consumption of all the records for current task and 
> taking task progress to 100% whereas the actual pipes application would be 
> trailing behind. 




[jira] Updated: (MAPREDUCE-1073) Progress reported for pipes tasks is incorrect.

2010-07-22 Thread Dick King (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dick King updated MAPREDUCE-1073:
-

Attachment: MAPREDUCE-1073--yhadoop20--2010-07-22.patch

The previous versions of this attachment missed one point.

The basic problem is that with the existing code base the progress is based on 
the records read from the input split, but there is buffering in the way pipes 
works.  This makes the tasks appear to have made more progress than they 
deserve to have made, in jobs where the input splits are small.

To make speculation work under pipes with small input splits, two conditions 
have to be met:

1: The pipes code has to have an API to report progress, and has to use it.  
The old patch met this goal.  You incant {{(&context)->setProgress(float)}} 
within {{HadoopPipes::Mapper.map(HadoopPipes::MapContext& context)}} .  This 
does require that you have a way of measuring progress, which I consider likely 
because this is only needed when the input splits are small, which implies that 
the "input data" is really a signal to get the real data somewhere else [or to 
generate it].

2: The job has to be able to say that the progress that would otherwise be 
inferred from input split reads has to be ignored.  This newest version of the 
patch does that; you can either call 
{{JobConf.setRecordReaderProgressDisabled(true)}}, or set the attribute 
{{mapred.job.disable.record.reader.progress}} to {{true}} .

This patch addresses the second point.  I did not mark it available because it 
needs a forward port.  I attached it to this issue for comments, and for the 
record.
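Point 2 can be modeled with a plain {{java.util.Properties}} standing in for the real {{JobConf}} (the property name is the one used in this comment; the setter name mirrors the one mentioned above, and this is an illustration rather than the patch itself):

```java
import java.util.Properties;

// Toy model of the second condition: a job-level switch telling the
// framework to ignore progress inferred from input-split reads.
public class PipesProgressConfig {
    static final String DISABLE_RR_PROGRESS =
        "mapred.job.disable.record.reader.progress";

    /** Stand-in for JobConf.setRecordReaderProgressDisabled(true). */
    static void setRecordReaderProgressDisabled(Properties conf, boolean v) {
        conf.setProperty(DISABLE_RR_PROGRESS, Boolean.toString(v));
    }

    /** The framework would consult this before counting record-reader
     *  reads toward task progress. Defaults to false. */
    static boolean recordReaderProgressDisabled(Properties conf) {
        return Boolean.parseBoolean(
            conf.getProperty(DISABLE_RR_PROGRESS, "false"));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        setRecordReaderProgressDisabled(conf, true);
        System.out.println(recordReaderProgressDisabled(conf)); // prints "true"
    }
}
```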

> Progress reported for pipes tasks is incorrect.
> ---
>
> Key: MAPREDUCE-1073
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1073
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: pipes
>Affects Versions: 0.20.1
>Reporter: Sreekanth Ramakrishnan
>Assignee: Dick King
> Attachments: mapreduce-1073--2010-03-31.patch, 
> mapreduce-1073--2010-04-06.patch, 
> MAPREDUCE-1073--yhadoop20--2010-07-22.patch, MAPREDUCE-1073_yhadoop20.patch
>
>
> Currently in pipes, 
> {{org.apache.hadoop.mapred.pipes.PipesMapRunner.run(RecordReader, 
> OutputCollector, Reporter)}} we do the following:
> {code}
> while (input.next(key, value)) {
>   downlink.mapItem(key, value);
>   if(skipping) {
> downlink.flush();
>   }
> }
> {code}
> This would result in consumption of all the records for current task and 
> taking task progress to 100% whereas the actual pipes application would be 
> trailing behind. 




[jira] Commented: (MAPREDUCE-1566) Need to add a mechanism to import tokens and secrets into a submitted job.

2010-07-22 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891288#action_12891288
 ] 

Devaraj Das commented on MAPREDUCE-1566:


Couldn't we enhance TestTokenCache instead of adding a new test? Also, in the 
mapper, we should look at the credentials via the APIs (as is done in 
TestTokenCache) instead of reading the file...

> Need to add a mechanism to import tokens and secrets into a submitted job.
> --
>
> Key: MAPREDUCE-1566
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1566
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: security
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 0.22.0
>
> Attachments: mr-1566-1.1.patch, mr-1566-1.patch, MR-1566.1.patch, 
> MR-1566.2.patch, MR-1566.3.patch
>
>
> We need to include tokens and secrets into a submitted job. I propose adding 
> a configuration attribute that when pointed at a token storage file will 
> include the tokens and secrets from that token storage file.




[jira] Commented: (MAPREDUCE-1566) Need to add a mechanism to import tokens and secrets into a submitted job.

2010-07-22 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891267#action_12891267
 ] 

Jitendra Nath Pandey commented on MAPREDUCE-1566:
-

ant test was run manually. All tests pass except TestRumenJobTraces, which is 
unrelated.

> Need to add a mechanism to import tokens and secrets into a submitted job.
> --
>
> Key: MAPREDUCE-1566
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1566
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: security
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 0.22.0
>
> Attachments: mr-1566-1.1.patch, mr-1566-1.patch, MR-1566.1.patch, 
> MR-1566.2.patch, MR-1566.3.patch
>
>
> We need to include tokens and secrets into a submitted job. I propose adding 
> a configuration attribute that when pointed at a token storage file will 
> include the tokens and secrets from that token storage file.




[jira] Updated: (MAPREDUCE-1959) Should use long name for token renewer on the client side

2010-07-22 Thread Kan Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kan Zhang updated MAPREDUCE-1959:
-

Attachment: m1959-01.patch

A trivial patch.

> Should use long name for token renewer on the client side
> -
>
> Key: MAPREDUCE-1959
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1959
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: security
>Reporter: Kan Zhang
>Assignee: Kan Zhang
> Attachments: m1959-01.patch
>
>
> When getting a delegation token from a NN, a client needs to specify the 
> renewer for the token. For use on a MapRed cluster, JT should be specified as 
> the renewer. However, in the current code, the client maps JT's long name 
> (Kerberos principal name) to cluster-internal short name and then sets the 
> short name as the renewer. This is undesirable for 2 reasons. 1) It's 
> unnecessary since NN (or JT) converts client-supplied renewer from long to 
> short name anyway. 2) In principle, the mapping from long to short name 
> should be done on the server. This is consistent with the authentication 
> case, where the client uses the same long name to authenticate to multiple 
> servers and servers map client's long name to their own internal short names. 
> It facilitates using the same job client to get delegation tokens from 
> multiple NN's, which may have different mapping rules for JT.




[jira] Updated: (MAPREDUCE-1959) Should use long name for token renewer on the client side

2010-07-22 Thread Kan Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kan Zhang updated MAPREDUCE-1959:
-

Status: Patch Available  (was: Open)

> Should use long name for token renewer on the client side
> -
>
> Key: MAPREDUCE-1959
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1959
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: security
>Reporter: Kan Zhang
>Assignee: Kan Zhang
> Attachments: m1959-01.patch
>
>
> When getting a delegation token from a NN, a client needs to specify the 
> renewer for the token. For use on a MapRed cluster, JT should be specified as 
> the renewer. However, in the current code, the client maps JT's long name 
> (Kerberos principal name) to cluster-internal short name and then sets the 
> short name as the renewer. This is undesirable for 2 reasons. 1) It's 
> unnecessary since NN (or JT) converts client-supplied renewer from long to 
> short name anyway. 2) In principle, the mapping from long to short name 
> should be done on the server. This is consistent with the authentication 
> case, where the client uses the same long name to authenticate to multiple 
> servers and servers map client's long name to their own internal short names. 
> It facilitates using the same job client to get delegation tokens from 
> multiple NN's, which may have different mapping rules for JT.




[jira] Updated: (MAPREDUCE-1899) [Herriot] Test jobsummary information for different jobs.

2010-07-22 Thread Vinay Kumar Thota (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay Kumar Thota updated MAPREDUCE-1899:
-

Attachment: 1899-ydist-security.patch

> [Herriot] Test jobsummary information for different jobs.
> -
>
> Key: MAPREDUCE-1899
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1899
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1899-ydist-security.patch, 1899-ydist-security.patch, 
> 1899-ydist-security.patch
>
>
> Test the following scenarios.
> 1. Verify the job summary information for killed job.
> 2. Verify the job summary information for failed job.
> 3. Verify the job queue information in job summary after job has successfully 
> completed.
> 4. Verify the job summary information for high ram jobs.




[jira] Created: (MAPREDUCE-1959) Should use long name for token renewer on the client side

2010-07-22 Thread Kan Zhang (JIRA)
Should use long name for token renewer on the client side
-

 Key: MAPREDUCE-1959
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1959
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: security
Reporter: Kan Zhang
Assignee: Kan Zhang


When getting a delegation token from a NN, a client needs to specify the 
renewer for the token. For use on a MapRed cluster, JT should be specified as 
the renewer. However, in the current code, the client maps JT's long name 
(Kerberos principal name) to cluster-internal short name and then sets the 
short name as the renewer. This is undesirable for 2 reasons. 1) It's 
unnecessary since NN (or JT) converts client-supplied renewer from long to 
short name anyway. 2) In principle, the mapping from long to short name should 
be done on the server. This is consistent with the authentication case, where 
the client uses the same long name to authenticate to multiple servers and 
servers map client's long name to their own internal short names. It 
facilitates using the same job client to get delegation tokens from multiple 
NN's, which may have different mapping rules for JT.




[jira] Updated: (MAPREDUCE-1686) ClassNotFoundException for custom format classes provided in libjars

2010-07-22 Thread Paul Burkhardt (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Burkhardt updated MAPREDUCE-1686:
--

   Status: Patch Available  (was: Open)
Affects Version/s: 0.20.1
   (was: 0.20.2)

> ClassNotFoundException for custom format classes provided in libjars
> 
>
> Key: MAPREDUCE-1686
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1686
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/streaming
>Affects Versions: 0.20.1
>Reporter: Paul Burkhardt
>Priority: Minor
> Attachments: HADOOP-1686.patch
>
>
> The StreamUtil::goodClassOrNull method assumes user-provided classes have 
> package names and if not, they are part of the Hadoop Streaming package. For 
> example, using custom InputFormat or OutputFormat classes without package 
> names will fail with a ClassNotFound exception which is not indicative given 
> the classes are provided in the libjars option. Admittedly, most Java 
> classes should have a package name, so this should rarely come up.
> Possible resolution options:
> 1) modify the error message to include the actual classname that was 
> attempted in the goodClassOrNull method
> 2) call the Configuration::getClassByName method first and if class not found 
> check for default package name and try the call again
> {code}
> public static Class goodClassOrNull(Configuration conf, String className,
>                                     String defaultPackage) {
>   Class clazz = null;
>   try {
>     clazz = conf.getClassByName(className);
>   } catch (ClassNotFoundException cnf) {
>   }
>   if (clazz == null) {
>     if (className.indexOf('.') == -1 && defaultPackage != null) {
>       className = defaultPackage + "." + className;
>       try {
>         clazz = conf.getClassByName(className);
>       } catch (ClassNotFoundException cnf) {
>       }
>     }
>   }
>   return clazz;
> }
> {code}




[jira] Updated: (MAPREDUCE-1686) ClassNotFoundException for custom format classes provided in libjars

2010-07-22 Thread Paul Burkhardt (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Burkhardt updated MAPREDUCE-1686:
--

Attachment: HADOOP-1686.patch

> ClassNotFoundException for custom format classes provided in libjars
> 
>
> Key: MAPREDUCE-1686
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1686
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/streaming
>Affects Versions: 0.20.2
>Reporter: Paul Burkhardt
>Priority: Minor
> Attachments: HADOOP-1686.patch
>
>
> The StreamUtil::goodClassOrNull method assumes user-provided classes have 
> package names and if not, they are part of the Hadoop Streaming package. For 
> example, using custom InputFormat or OutputFormat classes without package 
> names will fail with a ClassNotFound exception which is not indicative given 
> the classes are provided in the libjars option. Admittedly, most Java 
> classes should have a package name, so this should rarely come up.
> Possible resolution options:
> 1) modify the error message to include the actual classname that was 
> attempted in the goodClassOrNull method
> 2) call the Configuration::getClassByName method first and if class not found 
> check for default package name and try the call again
> {code}
> public static Class goodClassOrNull(Configuration conf, String className,
>                                     String defaultPackage) {
>   Class clazz = null;
>   try {
>     clazz = conf.getClassByName(className);
>   } catch (ClassNotFoundException cnf) {
>   }
>   if (clazz == null) {
>     if (className.indexOf('.') == -1 && defaultPackage != null) {
>       className = defaultPackage + "." + className;
>       try {
>         clazz = conf.getClassByName(className);
>       } catch (ClassNotFoundException cnf) {
>       }
>     }
>   }
>   return clazz;
> }
> {code}




[jira] Updated: (MAPREDUCE-1898) [Herriot] Implement a functionality for getting the job summary information of a job.

2010-07-22 Thread Vinay Kumar Thota (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay Kumar Thota updated MAPREDUCE-1898:
-

Status: Patch Available  (was: Open)

> [Herriot] Implement a functionality for getting the job summary information 
> of a job.
> -
>
> Key: MAPREDUCE-1898
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1898
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Affects Versions: 0.20.1
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1898-ydist-security.patch, 1898-ydist-security.patch, 
> 1898-ydist-security.patch, 1898-ydist-security.patch, MAPREDUCE-1898.patch, 
> MAPREDUCE-1898.patch, MAPREDUCE-1898.patch
>
>
> Implement a method for getting the job summary details of a job. The job 
> summary should include:
> jobId, startTime, launchTime, finishTime, numMaps, numSlotsPerMap, 
> numReduces, numSlotsPerReduce, user, queue, status, mapSlotSeconds, 
> reduceSlotSeconds, clusterMapCapacity, clusterReduceCapacity.




[jira] Updated: (MAPREDUCE-1898) [Herriot] Implement a functionality for getting the job summary information of a job.

2010-07-22 Thread Vinay Kumar Thota (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay Kumar Thota updated MAPREDUCE-1898:
-

Attachment: MAPREDUCE-1898.patch

> [Herriot] Implement a functionality for getting the job summary information 
> of a job.
> -
>
> Key: MAPREDUCE-1898
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1898
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Affects Versions: 0.20.1
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1898-ydist-security.patch, 1898-ydist-security.patch, 
> 1898-ydist-security.patch, 1898-ydist-security.patch, MAPREDUCE-1898.patch, 
> MAPREDUCE-1898.patch, MAPREDUCE-1898.patch
>
>
> Implement a method for getting the job summary details of a job. The job 
> summary should include:
> jobId, startTime, launchTime, finishTime, numMaps, numSlotsPerMap, 
> numReduces, numSlotsPerReduce, user, queue, status, mapSlotSeconds, 
> reduceSlotSeconds, clusterMapCapacity, clusterReduceCapacity.




[jira] Updated: (MAPREDUCE-1898) [Herriot] Implement a functionality for getting the job summary information of a job.

2010-07-22 Thread Vinay Kumar Thota (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay Kumar Thota updated MAPREDUCE-1898:
-

Status: Open  (was: Patch Available)

> [Herriot] Implement a functionality for getting the job summary information 
> of a job.
> -
>
> Key: MAPREDUCE-1898
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1898
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Affects Versions: 0.20.1
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1898-ydist-security.patch, 1898-ydist-security.patch, 
> 1898-ydist-security.patch, 1898-ydist-security.patch, MAPREDUCE-1898.patch, 
> MAPREDUCE-1898.patch
>
>
> Implement a method for getting the job summary details of a job. The job 
> summary should include:
> jobId, startTime, launchTime, finishTime, numMaps, numSlotsPerMap, 
> numReduces, numSlotsPerReduce, user, queue, status, mapSlotSeconds, 
> reduceSlotSeconds, clusterMapCapacity, clusterReduceCapacity.




[jira] Commented: (MAPREDUCE-1871) Create automated test scenario for "Collect information about number of tasks succeeded / total per time unit for a tasktracker"

2010-07-22 Thread Balaji Rajagopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891083#action_12891083
 ] 

Balaji Rajagopalan commented on MAPREDUCE-1871:
---

+1

> Create automated test scenario for "Collect information about number of tasks 
> succeeded / total per time unit for a tasktracker"
> 
>
> Key: MAPREDUCE-1871
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1871
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: test
>Reporter: Iyappan Srinivasan
>Assignee: Iyappan Srinivasan
> Attachments: 1871-ydist-security-patch.txt, 
> 1871-ydist-security-patch.txt, 1871-ydist-security-patch.txt, 
> 1871-ydist-security-patch.txt, 1871-ydist-security-patch.txt, 
> 1871-ydist-security-patch.txt, 1871-ydist-security-patch.txt, 
> 1871-ydist-security-patch.txt, MAPREDUCE-1871.patch, MAPREDUCE-1871.patch, 
> MAPREDUCE-1871.patch, MAPREDUCE-1871.patch
>
>
> Create automated test scenario for "Collect information about number of tasks 
> succeeded / total per time unit for a tasktracker"
> 1) Verification of all the above mentioned fields with the specified TTs. 
> Total no. of tasks and successful tasks should be equal to the corresponding 
> no. of tasks specified in TTs logs
> 2)  Fail a task on tasktracker.  Node UI should update the status of tasks on 
> that TT accordingly. 
> 3)  Kill a task on tasktracker.  Node UI should update the status of tasks on 
> that TT accordingly
> 4) Positive: Run simultaneous jobs and check if all the fields are populated 
> with proper values of tasks.  Node UI should have correct values for all the 
> fields mentioned above. 
> 5)  Check the fields across a one-hour window.  Fields related to the hour 
> should be updated after every hour.
> 6) Check the fields across a one-day window.  Fields related to the day 
> should be updated after every day.
> 7) Restart a TT and bring it back.  UI should retain the field values.  
> 8) Positive: Run a bunch of jobs with 0 maps and 0 reduces simultaneously.




[jira] Commented: (MAPREDUCE-1827) [Herriot] Task Killing/Failing tests for a streaming job.

2010-07-22 Thread Balaji Rajagopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891081#action_12891081
 ] 

Balaji Rajagopalan commented on MAPREDUCE-1827:
---

+String [] expExcludeList = {"java.net.ConnectException",
+"java.io.IOException"};
+cluster = MRCluster.createCluster(conf);
+cluster.setExcludeExpList(expExcludeList);

You need to remove expExcludeList since cluster restart is not part of this test 
case. Otherwise it looks good. 

> [Herriot] Task Killing/Failing tests for a streaming job.
> -
>
> Key: MAPREDUCE-1827
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1827
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1827-ydist-security.patch, 1827-ydist-security.patch, 
> MAPREDUCE-1827.patch
>
>
> 1. Set the sleep time for the tasks to 3 seconds and kill a task of the 
> streaming job using SIGKILL. After that, verify whether the task is killed 
> after 3 seconds or not, and also verify whether the job succeeded or not.
> 2. Set the maximum attempts for the maps and reduces to one. Make the task 
> fail and verify whether the task failed or not. Also verify whether the job 
> failed or not.




[jira] Commented: (MAPREDUCE-1827) [Herriot] Task Killing/Failing tests for a streaming job.

2010-07-22 Thread Balaji Rajagopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891082#action_12891082
 ] 

Balaji Rajagopalan commented on MAPREDUCE-1827:
---

I see you are using ToolRunner, and you still need expExcludeList. +1

> [Herriot] Task Killing/Failing tests for a streaming job.
> -
>
> Key: MAPREDUCE-1827
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1827
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1827-ydist-security.patch, 1827-ydist-security.patch, 
> MAPREDUCE-1827.patch
>
>
> 1. Set the sleep time for the tasks to 3 seconds and kill a task of the 
> streaming job using SIGKILL. After that, verify whether the task is killed 
> after 3 seconds or not, and also verify whether the job succeeded or not.
> 2. Set the maximum attempts for the maps and reduces to one. Make the task 
> fail and verify whether the task failed or not. Also verify whether the job 
> failed or not.




[jira] Commented: (MAPREDUCE-1919) [Herriot] Test for verification of per cache file ref count.

2010-07-22 Thread Balaji Rajagopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891079#action_12891079
 ] 

Balaji Rajagopalan commented on MAPREDUCE-1919:
---

+1

> [Herriot] Test for verification of per cache file ref count.
> -
>
> Key: MAPREDUCE-1919
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1919
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1919-ydist-security.patch, MAPREDUCE-1919.patch
>
>
> It covers the following scenarios.
> 1. Run the job with two distributed cache files and verify whether the job 
> succeeded or not.
> 2. Run the job with distributed cache files and remove one cache file from 
> the DFS when it is localized. Verify whether the job failed or not.
> 3. Run the job with two distributed cache files, where the size of one file 
> should be larger than local.cache.size. Verify whether the job succeeded or 
> not.




[jira] Updated: (MAPREDUCE-1898) [Herriot] Implement a functionality for getting the job summary information of a job.

2010-07-22 Thread Vinay Kumar Thota (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay Kumar Thota updated MAPREDUCE-1898:
-

Attachment: MAPREDUCE-1898.patch

> [Herriot] Implement a functionality for getting the job summary information 
> of a job.
> -
>
> Key: MAPREDUCE-1898
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1898
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1898-ydist-security.patch, 1898-ydist-security.patch, 
> 1898-ydist-security.patch, 1898-ydist-security.patch, MAPREDUCE-1898.patch, 
> MAPREDUCE-1898.patch
>
>
> Implement a method for getting the job summary details of a job. The job 
> summary should include:
> jobId, startTime, launchTime, finishTime, numMaps, numSlotsPerMap, 
> numReduces, numSlotsPerReduce, user, queue, status, mapSlotSeconds, 
> reduceSlotSeconds, clusterMapCapacity, clusterReduceCapacity.




[jira] Updated: (MAPREDUCE-1898) [Herriot] Implement a functionality for getting the job summary information of a job.

2010-07-22 Thread Vinay Kumar Thota (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay Kumar Thota updated MAPREDUCE-1898:
-

   Status: Patch Available  (was: Open)
 Hadoop Flags: [Reviewed]
Affects Version/s: 0.20.1

> [Herriot] Implement a functionality for getting the job summary information 
> of a job.
> -
>
> Key: MAPREDUCE-1898
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1898
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Affects Versions: 0.20.1
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1898-ydist-security.patch, 1898-ydist-security.patch, 
> 1898-ydist-security.patch, 1898-ydist-security.patch, MAPREDUCE-1898.patch, 
> MAPREDUCE-1898.patch
>
>
> Implement a method for getting the job summary details of a job. The job 
> summary should include:
> jobId, startTime, launchTime, finishTime, numMaps, numSlotsPerMap, 
> numReduces, numSlotsPerReduce, user, queue, status, mapSlotSeconds, 
> reduceSlotSeconds, clusterMapCapacity, clusterReduceCapacity.




[jira] Updated: (MAPREDUCE-1898) [Herriot] Implement a functionality for getting the job summary information of a job.

2010-07-22 Thread Vinay Kumar Thota (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay Kumar Thota updated MAPREDUCE-1898:
-

Attachment: 1898-ydist-security.patch

Addressed the comments.

> [Herriot] Implement a functionality for getting the job summary information 
> of a job.
> -
>
> Key: MAPREDUCE-1898
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1898
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1898-ydist-security.patch, 1898-ydist-security.patch, 
> 1898-ydist-security.patch, 1898-ydist-security.patch, MAPREDUCE-1898.patch
>
>
> Implement a method for getting the job summary details of a job. The job 
> summary should include:
> jobId, startTime, launchTime, finishTime, numMaps, numSlotsPerMap, 
> numReduces, numSlotsPerReduce, user, queue, status, mapSlotSeconds, 
> reduceSlotSeconds, clusterMapCapacity, clusterReduceCapacity.




[jira] Commented: (MAPREDUCE-1957) [Herriot] Test Job cache directories cleanup after job completes.

2010-07-22 Thread Balaji Rajagopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891049#action_12891049
 ] 

Balaji Rajagopalan commented on MAPREDUCE-1957:
---

+FinishTaskControlAction action = new FinishTaskControlAction(taskId);
+if (ttClient != null ) {
+  ttClient.getProxy().sendAction(action);
+  String localDirs[] = ttClient.getMapredLocalDirs();
+  TaskAttemptID taskAttID = new TaskAttemptID(taskId, 0);
+  return createFilesInTaskDir(localDirs, jobId, taskAttID, ttClient);

The order of the create-file and signal-task-completion steps is wrong: you will 
have to create the files first and then signal the tasks for completion. 
Otherwise the code looks good, nice job.
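In other words, the corrected order would look roughly like this (a sketch against the Herriot names quoted above, not a tested change):

```
// Create the files in the task's local dirs first...
String[] localDirs = ttClient.getMapredLocalDirs();
TaskAttemptID taskAttID = new TaskAttemptID(taskId, 0);
boolean created = createFilesInTaskDir(localDirs, jobId, taskAttID, ttClient);
// ...and only then signal the task to finish.
FinishTaskControlAction action = new FinishTaskControlAction(taskId);
ttClient.getProxy().sendAction(action);
return created;
```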

> [Herriot] Test Job cache directories cleanup after job completes.
> -
>
> Key: MAPREDUCE-1957
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1957
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1957-ydist-security.patch, 1957-ydist-security.patch
>
>
> Test the job cache directories cleanup after the job completes. The test 
> covers the following scenarios.
> 1. Submit a job and create folders and files in work folder with  
> non-writable permissions under task attempt id folder. Wait till the job 
> completes and verify whether the files and folders are cleaned up or not.
> 2. Submit a job and create folders and files in work folder with  
> non-writable permissions under task attempt id folder. Kill the job and 
> verify whether the files and folders are cleaned up or not.
> 3. Submit a job and create folders and files in work folder with  
> non-writable permissions under task attempt id folder. Fail the job and 
> verify whether the files and folders are cleaned up or not.




[jira] Updated: (MAPREDUCE-1871) Create automated test scenario for "Collect information about number of tasks succeeded / total per time unit for a tasktracker"

2010-07-22 Thread Iyappan Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iyappan Srinivasan updated MAPREDUCE-1871:
--

Attachment: 1871-ydist-security-patch.txt

Cleared some extra files that got added with the 20.1.xxx patch.

> Create automated test scenario for "Collect information about number of tasks 
> succeeded / total per time unit for a tasktracker"
> 
>
> Key: MAPREDUCE-1871
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1871
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: test
>Reporter: Iyappan Srinivasan
>Assignee: Iyappan Srinivasan
> Attachments: 1871-ydist-security-patch.txt, 
> 1871-ydist-security-patch.txt, 1871-ydist-security-patch.txt, 
> 1871-ydist-security-patch.txt, 1871-ydist-security-patch.txt, 
> 1871-ydist-security-patch.txt, 1871-ydist-security-patch.txt, 
> 1871-ydist-security-patch.txt, MAPREDUCE-1871.patch, MAPREDUCE-1871.patch, 
> MAPREDUCE-1871.patch, MAPREDUCE-1871.patch
>
>
> Create automated test scenario for "Collect information about number of tasks 
> succeeded / total per time unit for a tasktracker"
> 1) Verification of all the above mentioned fields with the specified TTs. 
> Total no. of tasks and successful tasks should be equal to the corresponding 
> no. of tasks specified in TTs logs
> 2)  Fail a task on tasktracker.  Node UI should update the status of tasks on 
> that TT accordingly. 
> 3)  Kill a task on tasktracker.  Node UI should update the status of tasks on 
> that TT accordingly
> 4) Positive: Run simultaneous jobs and check if all the fields are populated 
> with proper values of tasks.  Node UI should have correct values for all the 
> fields mentioned above. 
> 5)  Check the fields across a one-hour window.  Fields related to the hour 
> should be updated after every hour.
> 6) Check the fields across a one-day window.  Fields related to the day 
> should be updated after every day.
> 7) Restart a TT and bring it back.  UI should retain the field values.  
> 8) Positive: Run a bunch of jobs with 0 maps and 0 reduces simultaneously.




[jira] Commented: (MAPREDUCE-1912) [Rumen] Add a driver for Rumen tool

2010-07-22 Thread Amar Kamat (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891039#action_12891039
 ] 

Amar Kamat commented on MAPREDUCE-1912:
---

Nicholas, the next version of the patch has these changes. I will upload the 
latest patch shortly.

> [Rumen] Add a driver for Rumen tool 
> 
>
> Key: MAPREDUCE-1912
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1912
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: tools/rumen
>Affects Versions: 0.22.0
>Reporter: Amar Kamat
>Assignee: Amar Kamat
> Fix For: 0.22.0
>
> Attachments: mapreduce-1912-v1.1.patch
>
>
> Rumen, as a tool, has two entry points:
> - Trace builder
> - Folder
> It would be nice to have a single driver program with 'trace-builder' and 
> 'folder' as its options.
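Such a driver typically dispatches on the first command-line argument. A minimal standalone sketch of the dispatch pattern (class name, usage string, and messages are illustrative, not the committed Rumen driver):

```java
public class RumenToolDriver {

    // Dispatch to a sub-tool based on the first command-line argument.
    public static int run(String[] args) {
        if (args.length == 0) {
            System.err.println("Usage: rumen <trace-builder|folder> [options]");
            return 1;
        }
        switch (args[0]) {
            case "trace-builder":
                // Would hand the remaining args to the TraceBuilder tool.
                System.out.println("running TraceBuilder");
                return 0;
            case "folder":
                // Would hand the remaining args to the Folder tool.
                System.out.println("running Folder");
                return 0;
            default:
                System.err.println("Unknown tool: " + args[0]);
                return 1;
        }
    }

    public static void main(String[] args) {
        run(new String[] {"trace-builder"});  // demo invocation
    }
}
```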

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1957) [Herriot] Test Job cache directories cleanup after job completes.

2010-07-22 Thread Vinay Kumar Thota (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay Kumar Thota updated MAPREDUCE-1957:
-

Description: 
Test the job cache directories cleanup after the job completes. The test covers 
the following scenarios.

1. Submit a job and create folders and files in work folder with  non-writable 
permissions under task attempt id folder. Wait till the job completes and 
verify whether the files and folders are cleaned up or not.

2. Submit a job and create folders and files in work folder with  non-writable 
permissions under task attempt id folder. Kill the job and verify whether the 
files and folders are cleaned up or not.

3. Submit a job and create folders and files in work folder with  non-writable 
permissions under task attempt id folder. Fail the job and verify whether the 
files and folders are cleaned up or not.


  was:
Test the job cache directories cleanup after job completes.



> [Herriot] Test Job cache directories cleanup after job completes.
> -
>
> Key: MAPREDUCE-1957
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1957
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1957-ydist-security.patch, 1957-ydist-security.patch
>
>
> Test the job cache directories cleanup after the job completes. The test 
> covers the following scenarios.
> 1. Submit a job and create folders and files in work folder with  
> non-writable permissions under task attempt id folder. Wait till the job 
> completes and verify whether the files and folders are cleaned up or not.
> 2. Submit a job and create folders and files in work folder with  
> non-writable permissions under task attempt id folder. Kill the job and 
> verify whether the files and folders are cleaned up or not.
> 3. Submit a job and create folders and files in work folder with  
> non-writable permissions under task attempt id folder. Fail the job and 
> verify whether the files and folders are cleaned up or not.
