[jira] Commented: (MAPREDUCE-323) Improve the way job history files are managed

2010-07-23 Thread Dick King (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891928#action_12891928
 ] 

Dick King commented on MAPREDUCE-323:
-

   A PROPOSAL

   introduction

The way the completed job history file system now works is that when a job is 
started, an empty history file is created by the job tracker.  The name of the 
file contains enough information about the job to let an application tell 
whether the file documents a job that satisfies a search criterion.  In 
particular, it includes the job tracker instance ID, the job ID, the user name, 
and the job name.

As the job progresses, records get added to the file, and when it's finished 
[either successfully or failed] the file is moved to another directory, the 
completed job history files directory [the "DONE directory"]. Currently this 
directory has a simple flat structure.  If an application [in particular, the 
job history browser] wants some job histories, it reads this directory and 
chooses the files with names that indicate that the files will meet the 
criteria.  In practical cases this can include hundreds of thousands or even a 
million files.  Note that each job is represented by two files, the history 
file and the config file, doubling the burden on the name node.

 proposal

I would like to implement a simple data base to solve this problem.  My 
proposal has the following features:

1: The DONE directory will contain subdirectories, each containing a few 
hundred or a thousand files.

2: At any time, the job tracker will be filling one of the DONE directory's 
subdirectories.  All the rest are closed out, never to be added to again.

3: The subdirectories have a naming scheme so they're created in 
lexicographic order.  We would like to use subdirectory names like 
2010-07-23-0000, etc. [the four digits are a serial number, not an HHMM field].

4: When the job tracker decides to bind off a subdirectory and start a new one, 
it creates a new index file in the subdirectory it's closing out.  That index 
is a simple list of the history files the directory contains.

4a: The job tracker starts a new subdirectory whenever the first history file 
is copied on a given day, and whenever the current subdirectory would otherwise 
contain more than a certain number of files. 

4b: Perhaps the files can be renamed?  These files' names are a few dozen 
characters each, and in a system that has run a half million jobs the names 
collectively occupy 100+ megabytes in the name node.  Significant, but not 
decisive. 

4b1: 4b would require that rumen understand indices.

5: The processing is:

5a: [optional] create a new short name for every file in the subdirectory 
that's being closed out

5a1: The job tracker keeps this information in memory.  It doesn't need to read 
the directory.

5b: Write out the index file in a temporary location {{temp-index}} within the 
directory it's indexing.

5b1: The index contains all of the names in text form [if 5a is not used] or all 
pairs of { long name, short name } in text form, if we are shortening the names.

5c: rename the temp-index file to {{index}} when it's done

5d: [optional] If we choose file renaming, delete all of the long names.
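
Steps 5b-5c are the usual write-then-rename trick for publishing a file 
atomically: readers either see no index at all or a complete one. A minimal 
sketch against the Hadoop FileSystem API (class and method names here are 
illustrative, not from an actual patch):

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DoneSubdirIndexer {
  // 5b/5c: write the list of history-file names to temp-index, then
  // rename it to index so the index appears atomically.
  public static void publishIndex(FileSystem fs, Path subdir,
                                  Iterable<String> historyFileNames) throws IOException {
    Path temp = new Path(subdir, "temp-index");
    FSDataOutputStream out = fs.create(temp, true);
    try {
      for (String name : historyFileNames) {
        out.writeBytes(name + "\n");   // 5b1: one name per line, in text form
      }
    } finally {
      out.close();
    }
    if (!fs.rename(temp, new Path(subdir, "index"))) {   // 5c
      throw new IOException("could not publish index in " + subdir);
    }
  }
}
{code}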

6: When doing a search, we 

6a: determine all subdirectories of the DONE directory

6b: see which ones have an index

6c: read each index that exists, and

6d: read all of the files, for the subdirectories that don't have indices yet.
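
A matching sketch of the search flow in 6a-6d (again with illustrative names; 
{{readNamesFromIndex}} is a hypothetical helper):

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DoneDirectoryScanner {
  // 6a-6d: use the per-subdirectory index when it exists, otherwise fall
  // back to listing the (still-open) subdirectory itself.
  public static void scan(FileSystem fs, Path doneDir) throws IOException {
    for (FileStatus sub : fs.listStatus(doneDir)) {      // 6a
      if (!sub.isDir()) {
        continue;
      }
      Path index = new Path(sub.getPath(), "index");
      if (fs.exists(index)) {                            // 6b
        readNamesFromIndex(fs, index);                   // 6c
      } else {
        fs.listStatus(sub.getPath());                    // 6d
      }
    }
  }

  // Hypothetical helper: parse one history-file name per line from the index.
  private static void readNamesFromIndex(FileSystem fs, Path index) {
    // ...
  }
}
{code}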

7: To aid retirement of old job history files, the job tracker always binds off 
the current subdirectory when the date changes, even if it doesn't have very 
many files, and we retire files on date boundaries, a subdirectory at a time. 
The relevant date is the date that the file is being moved, which is normally a 
short time after the job is completed.

8: [optional] We may want to consolidate the indices of a completed day in a 
per-day index written as a file directly under the done directory. 

> Improve the way job history files are managed
> -
>
> Key: MAPREDUCE-323
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-323
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.21.0, 0.22.0
>Reporter: Amar Kamat
>Assignee: Dick King
>Priority: Critical
>
> Today all the jobhistory files are dumped in one _job-history_ folder. This 
> can cause problems when there is a need to search the history folder 
> (job-recovery etc). It would be nice if we group all the jobs under a _user_ 
> folder. So all the jobs for user _amar_ will go in _history-folder/amar/_. 
> Jobs can be categorized using various features like _jobid, date, jobname_ 
> 

[jira] Assigned: (MAPREDUCE-1966) Fix tracker blacklisting

2010-07-23 Thread Greg Roelofs (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Roelofs reassigned MAPREDUCE-1966:
---

Assignee: Greg Roelofs

> Fix tracker blacklisting 
> -
>
> Key: MAPREDUCE-1966
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1966
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Reporter: Arun C Murthy
>Assignee: Greg Roelofs
>
> The current heuristic of rolling up a fixed number of job failures per tracker 
> isn't working well; we need better designs/heuristics.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1966) Fix tracker blacklisting

2010-07-23 Thread Greg Roelofs (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891905#action_12891905
 ] 

Greg Roelofs commented on MAPREDUCE-1966:
-

There's an ambiguity between sick nodes (typically due to failing hardware, 
either hard drive or memory or occasionally NIC/network switch) and nodes that 
have been rendered unresponsive due to user abuse.  The existing blacklist 
heuristics touch on this, but they're a bit ad hoc, and there's not much 
visibility on the internal state at any given time.

One improvement would be to track the per-node, per-job blacklisting history in 
a sliding window that's divided into buckets of some suitable granularity.  Bad 
hardware would tend to show up as an elevated fault level on one node (or a few 
nodes) for an extended period--i.e., multiple buckets--while abusive jobs would 
tend to show up as a spike (ideally) or at least a limited-duration jump in 
faults (one or a few buckets) across many nodes.
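
To make the bucket idea concrete, a toy per-node fault window (granularity, 
sizes, and names are all mine, not from any patch): bad hardware shows up as 
non-zero counts across many buckets, while an abusive job shows up as a spike 
confined to one or two buckets on many nodes at once.

{code}
public class NodeFaultWindow {
  private final long bucketMillis;   // e.g. 15 minutes per bucket
  private final int[] buckets;       // ring buffer of per-bucket fault counts
  private long lastBucket = -1;

  public NodeFaultWindow(int numBuckets, long bucketMillis) {
    this.buckets = new int[numBuckets];
    this.bucketMillis = bucketMillis;
  }

  public void recordFault(long nowMillis) {
    long b = nowMillis / bucketMillis;
    if (b != lastBucket) {
      // zero out any buckets skipped over since the last recorded fault
      for (long i = Math.max(lastBucket + 1, b - buckets.length + 1); i <= b; i++) {
        buckets[(int) (i % buckets.length)] = 0;
      }
      lastBucket = b;
    }
    buckets[(int) (b % buckets.length)]++;
  }

  // Sustained trouble (likely hardware): faults spread over many buckets.
  public int bucketsWithFaults() {
    int n = 0;
    for (int c : buckets) {
      if (c > 0) n++;
    }
    return n;
  }
}
{code}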

Because the heuristics are open to argument even among experts (which would not 
include me), and because automatic, hardcoded blacklisting has the potential to 
wipe out a good fraction of a cluster for the wrong reasons, it would seem best 
to convert the heuristic form of blacklisting to an advisory mode (i.e., 
"graylisting") until the behavior is better understood.

> Fix tracker blacklisting 
> -
>
> Key: MAPREDUCE-1966
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1966
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Reporter: Arun C Murthy
>
> The current heuristic of rolling up a fixed number of job failures per tracker 
> isn't working well; we need better designs/heuristics.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1967) When a reducer fails on DFS quota, the job should fail immediately

2010-07-23 Thread Dick King (JIRA)
When a reducer fails on DFS quota, the job should fail immediately
--

 Key: MAPREDUCE-1967
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1967
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Dick King


Suppose an M/R job has so much output that the user is certain to exceed their 
quota.  Then some of the reducers will succeed but the job will get into a 
state where the remaining reducers squabble over the remaining space.  The 
remaining reducers will nibble at the remaining space, and finally one reducer 
will fail on quota.  Its output file will be erased, and the other reducers 
will collectively consume that space until one of _them_ fails on quota.  Since 
the incomplete reducer that fails on quota is "chosen" randomly, the tasks will 
accumulate their failures at similar rates, and the system will have made a 
substantial futile investment.

I would like to say that if a single reducer fails on DFS quota, the job should 
be failed.  There may be a corner case that induces us to think that we 
shouldn't be quite this stringent, but at least we shouldn't have to await four 
failures by one task before shutting the job down.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1961) [gridmix3] ConcurrentModificationException when shutting down Gridmix

2010-07-23 Thread Hong Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Tang updated MAPREDUCE-1961:
-

Status: Patch Available  (was: Open)

> [gridmix3] ConcurrentModificationException when shutting down Gridmix
> -
>
> Key: MAPREDUCE-1961
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1961
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Hong Tang
>Assignee: Hong Tang
> Attachments: mr-1961-20100723.patch
>
>
> We observed the following exception occasionally at the end of the Gridmix 
> run:
> {code}
> Exception in thread "StatsCollectorThread" 
> java.util.ConcurrentModificationException
>   at 
> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>   at java.util.AbstractList$Itr.next(AbstractList.java:343)
>   at 
> org.apache.hadoop.mapred.gridmix.Statistics$StatCollector.updateAndNotifyClusterStatsListeners(Statistics.java:220)
>   at 
> org.apache.hadoop.mapred.gridmix.Statistics$StatCollector.run(Statistics.java:205)
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1961) [gridmix3] ConcurrentModificationException when shutting down Gridmix

2010-07-23 Thread Hong Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Tang updated MAPREDUCE-1961:
-

Attachment: mr-1961-20100723.patch

Trivial patch that uses CopyOnWriteArrayList to avoid concurrent modification. 
No unit test included as it is hard to reproduce the synchronization bug 
through unit tests.
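
For reference, the failure mode and the fix in miniature (a generic 
illustration, not the actual Gridmix code): an ArrayList iterator throws 
ConcurrentModificationException if another thread mutates the list 
mid-iteration, while a CopyOnWriteArrayList iterator walks an immutable 
snapshot.

{code}
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class ListenerRegistry {
  // Shared between the stats-collector thread and threads registering listeners.
  private final List<Runnable> listeners = new CopyOnWriteArrayList<Runnable>();

  public void addListener(Runnable l) {
    listeners.add(l);
  }

  // Safe even if addListener runs concurrently: the for-each loop iterates
  // over a snapshot, so no ConcurrentModificationException is possible.
  public void notifyListeners() {
    for (Runnable l : listeners) {
      l.run();
    }
  }
}
{code}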

> [gridmix3] ConcurrentModificationException when shutting down Gridmix
> -
>
> Key: MAPREDUCE-1961
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1961
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Hong Tang
>Assignee: Hong Tang
> Attachments: mr-1961-20100723.patch
>
>
> We observed the following exception occasionally at the end of the Gridmix 
> run:
> {code}
> Exception in thread "StatsCollectorThread" 
> java.util.ConcurrentModificationException
>   at 
> java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
>   at java.util.AbstractList$Itr.next(AbstractList.java:343)
>   at 
> org.apache.hadoop.mapred.gridmix.Statistics$StatCollector.updateAndNotifyClusterStatsListeners(Statistics.java:220)
>   at 
> org.apache.hadoop.mapred.gridmix.Statistics$StatCollector.run(Statistics.java:205)
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1718) job conf key for the services name of DelegationToken for HFTP url is constructed incorrectly in HFTPFileSystem

2010-07-23 Thread Devaraj Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated MAPREDUCE-1718:
---

   Status: Resolved  (was: Patch Available)
Fix Version/s: 0.22.0
   Resolution: Fixed

I just committed this. Thanks, Boris!

> job conf key for the services name of DelegationToken for HFTP url is 
> constructed incorrectly in HFTPFileSystem
> ---
>
> Key: MAPREDUCE-1718
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1718
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Boris Shkolnik
>Assignee: Boris Shkolnik
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1718-2.patch, MAPREDUCE-1718-3.patch, 
> MAPREDUCE-1718-4.patch, MAPREDUCE-1718-4.patch, MAPREDUCE-1718-BP20-1.patch, 
> MAPREDUCE-1718-BP20-2.patch
>
>
> the key (built in TokenCache) is hdfs.service.host_HOSTNAME.PORT, but 
> in HftpFileSystem it is sometimes built as hdfs.service.host_IP.PORT.
> Fix: change it to always be IP.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1901) Jobs should not submit the same jar files over and over again

2010-07-23 Thread Junjie Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891836#action_12891836
 ] 

Junjie Liang commented on MAPREDUCE-1901:
-

To supplement Joydeep's comment:

We are trying to reduce the number of calls to the NameNode through the 
following optimizations:

1) Currently, files loaded through the hadoop libjars/files/archives mechanism 
are copied onto HDFS and removed on every job. This is inefficient when most 
jobs are submitted from only 3-4 versions of hive; the files should rightfully 
persist in HDFS to be reused. Hence the idea of decoupling files from their 
jobId to make them sharable across jobs.

2) If files are identified with their md5 checksums, we no longer need to 
verify file modification time in the TT. This saves another call to the 
NameNode to get the FileStatus object.
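
A sketch of computing such a content signature for a local jar with standard 
JDK classes (the class name and the HDFS layout in the comment are 
illustrative assumptions):

{code}
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ContentSignature {
  // Hex MD5 of a local file; the resource would then live at a shared HDFS
  // path keyed by this digest, e.g. <shared-dir>/files/<md5>/hive.jar.
  public static String md5Hex(String localPath)
      throws IOException, NoSuchAlgorithmException {
    MessageDigest md = MessageDigest.getInstance("MD5");
    InputStream in = new DigestInputStream(new FileInputStream(localPath), md);
    try {
      byte[] buf = new byte[8192];
      while (in.read(buf) != -1) {
        // reading through the DigestInputStream updates the digest
      }
    } finally {
      in.close();
    }
    StringBuilder hex = new StringBuilder();
    for (byte b : md.digest()) {
      hex.append(String.format("%02x", b));
    }
    return hex.toString();
  }
}
{code}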

The reduction in the number of calls to the NameNode is small, but over a large 
number of jobs we believe it will make a noticeable difference.

> Jobs should not submit the same jar files over and over again
> -
>
> Key: MAPREDUCE-1901
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1901
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Joydeep Sen Sarma
> Attachments: 1901.PATCH
>
>
> Currently each Hadoop job uploads the required resources 
> (jars/files/archives) to a new location in HDFS. Map-reduce nodes involved in 
> executing this job would then download these resources into local disk.
> In an environment where most of the users are using a standard set of jars 
> and files (because they are using a framework like Hive/Pig) - the same jars 
> keep getting uploaded and downloaded repeatedly. The overhead of this 
> protocol (primarily in terms of end-user latency) is significant when:
> - the jobs are small (and, conversely, large in number)
> - Namenode is under load (meaning hdfs latencies are high and made worse, in 
> part, by this protocol)
> Hadoop should provide a way for jobs in a cooperative environment to not 
> submit the same files over and again. Identifying and caching execution 
> resources by a content signature (md5/sha) would be a good alternative to 
> have available.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1966) Fix tracker blacklisting

2010-07-23 Thread Arun C Murthy (JIRA)
Fix tracker blacklisting 
-

 Key: MAPREDUCE-1966
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1966
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobtracker
Reporter: Arun C Murthy


The current heuristic of rolling up a fixed number of job failures per tracker 
isn't working well; we need better designs/heuristics.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1718) job conf key for the services name of DelegationToken for HFTP url is constructed incorrectly in HFTPFileSystem

2010-07-23 Thread Boris Shkolnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boris Shkolnik updated MAPREDUCE-1718:
--

Attachment: MAPREDUCE-1718-4.patch

Merged with trunk.
Ran the tests; all passed (except TestRumenJobTraces - see MAPREDUCE-1925).

> job conf key for the services name of DelegationToken for HFTP url is 
> constructed incorrectly in HFTPFileSystem
> ---
>
> Key: MAPREDUCE-1718
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1718
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Boris Shkolnik
>Assignee: Boris Shkolnik
> Attachments: MAPREDUCE-1718-2.patch, MAPREDUCE-1718-3.patch, 
> MAPREDUCE-1718-4.patch, MAPREDUCE-1718-4.patch, MAPREDUCE-1718-BP20-1.patch, 
> MAPREDUCE-1718-BP20-2.patch
>
>
> the key (built in TokenCache) is hdfs.service.host_HOSTNAME.PORT, but 
> in HftpFileSystem it is sometimes built as hdfs.service.host_IP.PORT.
> Fix: change it to always be IP.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1901) Jobs should not submit the same jar files over and over again

2010-07-23 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891824#action_12891824
 ] 

Joydeep Sen Sarma commented on MAPREDUCE-1901:
--

> The DistributedCache already tracks mtimes for files

Right - that's what I am saying. If you consider objects as immutable, then you 
don't have to track and look up mtimes. Part of the goal here is to not have to 
look up mtimes again and again. If you have an object with a matching md5 
already localized, you are done. (But we can't use the names alone for that: 
names can collide; md5 cannot, or nearly so. So we name objects based on their 
content signature (md5) - which is what a content-addressable store/cache does.)

> Admin installs pig/hive on hdfs:
> /share/hive/v1/hive.jar
> /share/hive/v2/hive.jar

That's not how hive works (or how hadoop streaming works). People deploy hive 
on NFS filers or local disks, and users run hive jobs from these installation 
points; there's no hdfs involvement anywhere. People add jars to hive or hadoop 
streaming from their personal or shared folders. When people run hive jobs, 
they are not writing java; there is no .setRemoteJar() code for them to write.

hive loads the required jars (from the install directory) into hadoop via the 
hadoop libjars/files/archives functionality. Different hive clients are not 
aware of each other (ditto for hadoop streaming). Most of the hive clients run 
from common install points, but people may be running from personal install 
points with altered builds.

With what we have done in this patch, all these uncoordinated clients 
automatically share jars with each other, because the name for the shared 
object is now derived from the content of the object. We are still leveraging 
the distributed cache, but we are naming objects based on their contents. 
Junjie tells me we can leverage the 'shared' objects namespace from trunk (in 
20 we added our own shared namespace).

Because the names are based on a strong content signature, we can make the 
assumption of immutability. As I have tried to point out many times, when 
objects are immutable, one can make optimizations and skip timestamp-based 
validation; the latter requires hdfs lookups and creates load and latency.

Note that we need zero application changes for this sharing and zero admin 
overhead, so all sorts of hadoop users will automatically start getting the 
benefit of shared jars without writing any code and without any special admin 
recipe.

Isn't that good?


> Jobs should not submit the same jar files over and over again
> -
>
> Key: MAPREDUCE-1901
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1901
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Joydeep Sen Sarma
> Attachments: 1901.PATCH
>
>
> Currently each Hadoop job uploads the required resources 
> (jars/files/archives) to a new location in HDFS. Map-reduce nodes involved in 
> executing this job would then download these resources into local disk.
> In an environment where most of the users are using a standard set of jars 
> and files (because they are using a framework like Hive/Pig) - the same jars 
> keep getting uploaded and downloaded repeatedly. The overhead of this 
> protocol (primarily in terms of end-user latency) is significant when:
> - the jobs are small (and, conversely, large in number)
> - Namenode is under load (meaning hdfs latencies are high and made worse, in 
> part, by this protocol)
> Hadoop should provide a way for jobs in a cooperative environment to not 
> submit the same files over and again. Identifying and caching execution 
> resources by a content signature (md5/sha) would be a good alternative to 
> have available.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1925) TestRumenJobTraces fails in trunk

2010-07-23 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-1925:
-

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

+1

I committed this. Thanks, Ravi!

> TestRumenJobTraces fails in trunk
> -
>
> Key: MAPREDUCE-1925
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1925
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tools/rumen
>Affects Versions: 0.22.0
>Reporter: Amareshwari Sriramadasu
>Assignee: Ravi Gummadi
> Fix For: 0.22.0
>
> Attachments: 1925.patch, 1925.v1.1.patch, 1925.v1.patch, 
> 1925.v2.1.patch, 1925.v2.patch
>
>
> TestRumenJobTraces failed with following error:
> Error Message
> the gold file contains more text at line 1 expected:<56> but was:<0>
> Stacktrace
>   at 
> org.apache.hadoop.tools.rumen.TestRumenJobTraces.testHadoop20JHParser(TestRumenJobTraces.java:294)
> Full log of the failure is available at 
> http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/292/testReport/org.apache.hadoop.tools.rumen/TestRumenJobTraces/testHadoop20JHParser/

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1901) Jobs should not submit the same jar files over and over again

2010-07-23 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891812#action_12891812
 ] 

Arun C Murthy commented on MAPREDUCE-1901:
--

Joydeep - Maybe we are talking past each other, yet ...

The DistributedCache _already_ tracks mtimes for files. Each TT, via the 
DistributedCache, localizes the file based on the (path, mtime) pair.

This seems sufficient for the use case as I understand it.

Here is the flow:

Admin installs pig/hive on hdfs:
/share/hive/v1/hive.jar
/share/hive/v2/hive.jar

The pig/hive framework, in fact, any MR job then does:

JobConf job = new JobConf();
job.setRemoteJar(new Path("/share/hive/v1/hive.jar"));
JobClient.runJob(job);


That's it. The JobClient has the smarts to use 
DistributedCache.addArchiveToClassPath as the implementation of 
JobConf.setRemoteJar.

If you want a new version of hive.jar, you change hive to use 
/share/hive/v2/hive.jar.

What am I missing here?

> Jobs should not submit the same jar files over and over again
> -
>
> Key: MAPREDUCE-1901
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1901
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Joydeep Sen Sarma
> Attachments: 1901.PATCH
>
>
> Currently each Hadoop job uploads the required resources 
> (jars/files/archives) to a new location in HDFS. Map-reduce nodes involved in 
> executing this job would then download these resources into local disk.
> In an environment where most of the users are using a standard set of jars 
> and files (because they are using a framework like Hive/Pig) - the same jars 
> keep getting uploaded and downloaded repeatedly. The overhead of this 
> protocol (primarily in terms of end-user latency) is significant when:
> - the jobs are small (and, conversely, large in number)
> - Namenode is under load (meaning hdfs latencies are high and made worse, in 
> part, by this protocol)
> Hadoop should provide a way for jobs in a cooperative environment to not 
> submit the same files over and again. Identifying and caching execution 
> resources by a content signature (md5/sha) would be a good alternative to 
> have available.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1965) Add info for job failure on jobtracker UI.

2010-07-23 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated MAPREDUCE-1965:
-

Fix Version/s: 0.22.0

> Add info for job failure on jobtracker UI.
> --
>
> Key: MAPREDUCE-1965
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1965
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Mahadev konar
>Assignee: Mahadev konar
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1965-yahoo-hadoop-0.20S.patch
>
>
> MAPREDUCE-1521 added a field to jobstatus to mark the reason for failure of 
> the job. This information needs to be displayed on the jobtracker UI.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1965) Add info for job failure on jobtracker UI.

2010-07-23 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated MAPREDUCE-1965:
-

Attachment: MAPREDUCE-1965-yahoo-hadoop-0.20S.patch

This patch adds the failure info to the jobtracker UI.

> Add info for job failure on jobtracker UI.
> --
>
> Key: MAPREDUCE-1965
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1965
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Mahadev konar
>Assignee: Mahadev konar
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1965-yahoo-hadoop-0.20S.patch
>
>
> MAPREDUCE-1521 added a field to jobstatus to mark the reason for failure of 
> the job. This information needs to be displayed on the jobtracker UI.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1965) Add info for job failure on jobtracker UI.

2010-07-23 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated MAPREDUCE-1965:
-

Attachment: MAPREDUCE-1965-yahoo-hadoop-0.20S.patch

forgot --no-prefix.

> Add info for job failure on jobtracker UI.
> --
>
> Key: MAPREDUCE-1965
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1965
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Mahadev konar
>Assignee: Mahadev konar
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1965-yahoo-hadoop-0.20S.patch, 
> MAPREDUCE-1965-yahoo-hadoop-0.20S.patch
>
>
> MAPREDUCE-1521 added a field to jobstatus to mark the reason for failure of 
> the job. This information needs to be displayed on the jobtracker UI.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1965) Add info for job failure on jobtracker UI.

2010-07-23 Thread Mahadev konar (JIRA)
Add info for job failure on jobtracker UI.
--

 Key: MAPREDUCE-1965
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1965
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Mahadev konar
Assignee: Mahadev konar
 Attachments: MAPREDUCE-1965-yahoo-hadoop-0.20S.patch

MAPREDUCE-1521 added a field to jobstatus to mark the reason for failure of the 
job. This information needs to be displayed on the jobtracker UI.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1718) job conf key for the services name of DelegationToken for HFTP url is constructed incorrectly in HFTPFileSystem

2010-07-23 Thread Boris Shkolnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boris Shkolnik updated MAPREDUCE-1718:
--

Attachment: MAPREDUCE-1718-4.patch

Merged with trunk.
Modified the test to verify the value set in the conf.

> job conf key for the services name of DelegationToken for HFTP url is 
> constructed incorrectly in HFTPFileSystem
> ---
>
> Key: MAPREDUCE-1718
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1718
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Boris Shkolnik
>Assignee: Boris Shkolnik
> Attachments: MAPREDUCE-1718-2.patch, MAPREDUCE-1718-3.patch, 
> MAPREDUCE-1718-4.patch, MAPREDUCE-1718-BP20-1.patch, 
> MAPREDUCE-1718-BP20-2.patch
>
>
> the key (built in TokenCache) is hdfs.service.host_HOSTNAME.PORT, but 
> in HftpFileSystem it is sometimes built as hdfs.service.host_IP.PORT.
> Fix: change it to always be IP.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1919) [Herriot] Test for verification of per cache file ref count.

2010-07-23 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891735#action_12891735
 ] 

Konstantin Boudnik commented on MAPREDUCE-1919:
---

Looks good. Please do the same for the trunk and validate through 
{{test-patch}} and by running the test in a cluster.

> [Herriot] Test for verification of per cache file ref count.
> -
>
> Key: MAPREDUCE-1919
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1919
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1919-ydist-security.patch, 1919-ydist-security.patch, 
> MAPREDUCE-1919.patch
>
>
> It covers the following scenarios.
> 1. Run the job with two distributed cache files and verify whether the job 
> succeeds or not.
> 2. Run the job with distributed cache files and remove one cache file from 
> the DFS once it is localized. Verify whether the job fails or not.
> 3. Run the job with two distributed cache files where the size of one file 
> is larger than local.cache.size. Verify whether the job succeeds or 
> not.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1955) Because of changes in JobInProgress.java, JobInProgressAspect.aj also needs to change.

2010-07-23 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891732#action_12891732
 ] 

Konstantin Boudnik commented on MAPREDUCE-1955:
---

I'm looking at the MR trunk code and I see {{protected AtomicBoolean 
tasksInited = new AtomicBoolean(false);}}. Besides, there's {{public boolean 
inited()}} to access it. So I don't see how this patch makes any sense for 
trunk.



> Because of changes in JobInProgress.java, JobInProgressAspect.aj also needs 
> to change.
> --
>
> Key: MAPREDUCE-1955
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1955
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: test
>Reporter: Iyappan Srinivasan
>Assignee: Iyappan Srinivasan
> Attachments: 1955-ydist-security-patch.txt, 
> JobInProgressAspectaj.patch, MAPREDUCE-1955.patch
>
>
> Because of changes in JobInProgress.java, JobInProgressAspect.aj also needs 
> to change.
> The variable taskInited changed from Boolean to boolean in 
> JobInProgress.java, so JobInProgressAspect.aj needs to change too.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1809) Ant build changes for Streaming system tests in contrib projects.

2010-07-23 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891726#action_12891726
 ] 

Konstantin Boudnik commented on MAPREDUCE-1809:
---

Looks OK; please verify as usual.

> Ant build changes for Streaming system tests in contrib projects.
> -
>
> Key: MAPREDUCE-1809
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1809
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: build
>Affects Versions: 0.21.0
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1809-ydist-security.patch, 1809-ydist-security.patch, 
> MAPREDUCE-1809.patch, MAPREDUCE-1809.patch, MAPREDUCE-1809.patch, 
> MAPREDUCE-1809.patch
>
>
> Implementing a new target (test-system) in the build-contrib.xml file for 
> executing the system tests that are in contrib projects. Also adding a 
> 'subant' target in aop.xml that calls the build-contrib.xml file for system 
> tests.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1933) Create automated testcase for tasktracker dealing with corrupted disk.

2010-07-23 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891725#action_12891725
 ] 

Konstantin Boudnik commented on MAPREDUCE-1933:
---

Let's see...
- {{find src -name '*java' | xargs grep 'mapred.*.local.dir'}} shows that 
{noformat}
src/java/org/apache/hadoop/mapreduce/util/ConfigUtil.java:
Configuration.addDeprecation("mapred.local.dir", 
{noformat}
- also it finds 
{noformat}
src/java/org/apache/hadoop/mapreduce/MRConfig.java:  
  public static final String LOCAL_DIR = "mapreduce.cluster.local.dir";
{noformat}

Hope it helps.

> Create automated testcase for tasktracker dealing with corrupted disk.
> --
>
> Key: MAPREDUCE-1933
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1933
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: test
>Reporter: Iyappan Srinivasan
>Assignee: Iyappan Srinivasan
> Attachments: 1933-ydist-security-patch.txt, MAPREDUCE-1933.patch, 
> MAPREDUCE-1933.patch, TestCorruptedDiskJob.java
>
>
> After the TaskTracker has already run some tasks successfully, "corrupt" a 
> disk by making the corresponding mapred.local.dir unreadable/unwritable. 
> Make sure that jobs continue to succeed even though some tasks scheduled 
> there fail. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1960) Limit the size of jobconf.

2010-07-23 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated MAPREDUCE-1960:
-

Attachment: MAPREDUCE-1960-yahoo-hadoop-0.20S.patch

Changed the default limit to 5 MB.

> Limit the size of jobconf.
> --
>
> Key: MAPREDUCE-1960
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1960
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Mahadev konar
>Assignee: Mahadev konar
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1960-yahoo-hadoop-0.20S.patch, 
> MAPREDUCE-1960-yahoo-hadoop-0.20S.patch, 
> MAPREDUCE-1960-yahoo-hadoop-0.20S.patch
>
>
> In some of our production clusters, users have huge job.xml's that bring down 
> the jobtracker. This jira is to put a limit on the size of the jobconf, so 
> that we don't blow up the memory on the jobtracker.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1925) TestRumenJobTraces fails in trunk

2010-07-23 Thread Hong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891687#action_12891687
 ] 

Hong Tang commented on MAPREDUCE-1925:
--

Patch looks good to me. +1.

> TestRumenJobTraces fails in trunk
> -
>
> Key: MAPREDUCE-1925
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1925
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tools/rumen
>Affects Versions: 0.22.0
>Reporter: Amareshwari Sriramadasu
>Assignee: Ravi Gummadi
> Fix For: 0.22.0
>
> Attachments: 1925.patch, 1925.v1.1.patch, 1925.v1.patch, 
> 1925.v2.1.patch, 1925.v2.patch
>
>
> TestRumenJobTraces failed with following error:
> Error Message
> the gold file contains more text at line 1 expected:<56> but was:<0>
> Stacktrace
>   at 
> org.apache.hadoop.tools.rumen.TestRumenJobTraces.testHadoop20JHParser(TestRumenJobTraces.java:294)
> Full log of the failure is available at 
> http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/292/testReport/org.apache.hadoop.tools.rumen/TestRumenJobTraces/testHadoop20JHParser/

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1901) Jobs should not submit the same jar files over and over again

2010-07-23 Thread Joydeep Sen Sarma (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891674#action_12891674
 ] 

Joydeep Sen Sarma commented on MAPREDUCE-1901:
--

@Arun - you are right - this is a layer above the distributed cache for the 
most part. Take a look at our use case (bottom of my previous comments). 
Essentially we are extending the Distributed Cache a bit to be a 
content-addressable cache. I do not think our use case is directly supported by 
Hadoop for this purpose - and we are hoping to make the change in the framework 
(instead of Hive) because there's nothing Hive-specific here and whatever we 
are doing will be directly leveraged by other apps.

Sharing != content-addressable. An NFS filer can be globally shared - but it's 
not content-addressable. An EMC Centera (amongst others) is. Sorry - terrible 
examples - trying to come up with something quickly.

Will address Vinod's comments later - we have taken race considerations into 
account.

> Jobs should not submit the same jar files over and over again
> -
>
> Key: MAPREDUCE-1901
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1901
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Joydeep Sen Sarma
> Attachments: 1901.PATCH
>
>
> Currently each Hadoop job uploads the required resources 
> (jars/files/archives) to a new location in HDFS. Map-reduce nodes involved in 
> executing this job would then download these resources into local disk.
> In an environment where most of the users are using a standard set of jars 
> and files (because they are using a framework like Hive/Pig) - the same jars 
> keep getting uploaded and downloaded repeatedly. The overhead of this 
> protocol (primarily in terms of end-user latency) is significant when:
> - the jobs are small (and, conversely, large in number)
> - Namenode is under load (meaning hdfs latencies are high and made worse, in 
> part, by this protocol)
> Hadoop should provide a way for jobs in a cooperative environment to not 
> submit the same files over and again. Identifying and caching execution 
> resources by a content signature (md5/sha) would be a good alternative to 
> have available.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1960) Limit the size of jobconf.

2010-07-23 Thread Mahadev konar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mahadev konar updated MAPREDUCE-1960:
-

Attachment: MAPREDUCE-1960-yahoo-hadoop-0.20S.patch

Updated patch, which throws the exception in JobInProgress rather than the 
jobtracker.
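
For illustration only, the kind of guard this implies during job 
initialization (all names and the 5 MB default here are assumptions; the 
actual change is in the attached patch):

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class JobConfSizeCheck {
  // Reject the job while it is being initialized if the submitted job.xml
  // exceeds the configured limit (5 MB by default in this sketch).
  public static void checkJobConfSize(FileSystem fs, Path jobXml, long maxBytes)
      throws IOException {
    FileStatus st = fs.getFileStatus(jobXml);
    if (st.getLen() > maxBytes) {
      throw new IOException("job.xml is " + st.getLen() + " bytes, exceeding the "
          + maxBytes + "-byte limit; failing the job instead of the jobtracker");
    }
  }
}
{code}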

> Limit the size of jobconf.
> --
>
> Key: MAPREDUCE-1960
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1960
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Mahadev konar
>Assignee: Mahadev konar
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1960-yahoo-hadoop-0.20S.patch, 
> MAPREDUCE-1960-yahoo-hadoop-0.20S.patch
>
>
> In some of our production clusters, users have huge job.xml's that bring down 
> the jobtracker. This jira is to put a limit on the size of the jobconf, so 
> that we don't blow up the memory on the jobtracker.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1270) Hadoop C++ Extention

2010-07-23 Thread Doug Cutting (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891662#action_12891662
 ] 

Doug Cutting commented on MAPREDUCE-1270:
-

Looks like BSD:

http://www.boost.org/LICENSE_1_0.txt

So we'd just need to append it to LICENSE.txt, noting there which files are 
under this license.

> Hadoop C++ Extention
> 
>
> Key: MAPREDUCE-1270
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1270
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
>Affects Versions: 0.20.1
> Environment:  hadoop linux
>Reporter: Wang Shouyan
> Attachments: HADOOP-HCE-1.0.0.patch, HCE InstallMenu.pdf, HCE 
> Performance Report.pdf, HCE Tutorial.pdf, Overall Design of Hadoop C++ 
> Extension.doc
>
>
>   Hadoop C++ extension is an internal project at Baidu. We started it for these 
> reasons:
>1  To provide a C++ API. We mostly used Streaming before, and we also tried 
> PIPES, but we did not find PIPES more efficient than Streaming. So we think a 
> new C++ extension is needed for us.
>2  Even using PIPES or Streaming, it is hard to control the memory of the 
> hadoop map/reduce Child JVM.
>3  It costs so much to read/write/sort TB/PB data in Java. When using PIPES 
> or Streaming, a pipe or socket is not efficient for carrying such huge data.
>What we want to do: 
>1 We do not use the map/reduce Child JVM to do any data processing; it just 
> prepares the environment, starts the C++ mapper, tells the mapper which split 
> it should deal with, and reads reports from the mapper until it finishes. The 
> mapper will read records, invoke the user-defined map, do the partitioning, 
> write spills, and combine and merge into file.out. We think these operations 
> can be done by C++ code.
>2 The reducer is similar to the mapper: it is started after the sort 
> finishes, reads from the sorted files, invokes the user-defined reduce, and 
> writes to the user-defined record writer.
>3 We also intend to rewrite shuffle and sort in C++, for efficiency and 
> memory control.
>At first, 1 and 2; then 3.
>What's the difference from PIPES:
>1 Yes, we will reuse most PIPES code.
>2 And we should do it more completely: nothing changes in scheduling and 
> management, but everything in execution.
> *UPDATE:*
> Now you can get a test version of HCE from this link 
> http://docs.google.com/leaf?id=0B5xhnqH1558YZjcxZmI0NzEtODczMy00NmZiLWFkNjAtZGM1MjZkMmNkNWFk&hl=zh_CN&pli=1
> This is a full package with all the hadoop source code.
> Following the document "HCE InstallMenu.pdf" in the attachments, you can build 
> and deploy it in your cluster.
> The attachment "HCE Tutorial.pdf" will lead you through writing your first HCE 
> program and gives other specifications of the interface.
> The attachment "HCE Performance Report.pdf" gives a performance report of HCE 
> compared to Java MapRed and Pipes.
> Any comments are welcome.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1901) Jobs should not submit the same jar files over and over again

2010-07-23 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891640#action_12891640
 ] 

Arun C Murthy commented on MAPREDUCE-1901:
--

To reiterate: 
Pre-security - Artifacts in DistributedCache are _already_ shared across jobs, 
no changes needed.
Post-security - MAPREDUCE-774 allows for a shared distributed cache across jobs 
too.

> Jobs should not submit the same jar files over and over again
> -
>
> Key: MAPREDUCE-1901
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1901
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Joydeep Sen Sarma
> Attachments: 1901.PATCH
>
>
> Currently each Hadoop job uploads the required resources 
> (jars/files/archives) to a new location in HDFS. Map-reduce nodes involved in 
> executing this job would then download these resources into local disk.
> In an environment where most of the users are using a standard set of jars 
> and files (because they are using a framework like Hive/Pig) - the same jars 
> keep getting uploaded and downloaded repeatedly. The overhead of this 
> protocol (primarily in terms of end-user latency) is significant when:
> - the jobs are small (and, conversely, large in number)
> - Namenode is under load (meaning hdfs latencies are high and made worse, in 
> part, by this protocol)
> Hadoop should provide a way for jobs in a cooperative environment to not 
> submit the same files over and again. Identifying and caching execution 
> resources by a content signature (md5/sha) would be a good alternative to 
> have available.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1901) Jobs should not submit the same jar files over and over again

2010-07-23 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891633#action_12891633
 ] 

Arun C Murthy commented on MAPREDUCE-1901:
--

bq. I'm proposing a change in the way files are stored in HDFS. Instead of 
storing files in /jobid/files or /jobid/archives, we store them directly in 
{mapred.system.dir}/files and {mapred.system.dir}/archives. This removes the 
association between a file and the job ID, so that files can be persistent 
across jobs.

I'm confused here. The distributed-cache does not write any files to HDFS; it 
is merely configured with a set of files to be copied from HDFS to the compute 
node. Why are we making these changes?

> Jobs should not submit the same jar files over and over again
> -
>
> Key: MAPREDUCE-1901
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1901
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Joydeep Sen Sarma
> Attachments: 1901.PATCH
>
>
> Currently each Hadoop job uploads the required resources 
> (jars/files/archives) to a new location in HDFS. Map-reduce nodes involved in 
> executing this job would then download these resources into local disk.
> In an environment where most of the users are using a standard set of jars 
> and files (because they are using a framework like Hive/Pig) - the same jars 
> keep getting uploaded and downloaded repeatedly. The overhead of this 
> protocol (primarily in terms of end-user latency) is significant when:
> - the jobs are small (and, conversely, large in number)
> - Namenode is under load (meaning hdfs latencies are high and made worse, in 
> part, by this protocol)
> Hadoop should provide a way for jobs in a cooperative environment to not 
> submit the same files over and again. Identifying and caching execution 
> resources by a content signature (md5/sha) would be a good alternative to 
> have available.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1270) Hadoop C++ Extention

2010-07-23 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891596#action_12891596
 ] 

Allen Wittenauer commented on MAPREDUCE-1270:
-

This patch appears to contain code from the C++ Boost library. Someone needs to 
do the legwork to determine the legality of the patch.

> Hadoop C++ Extention
> 
>
> Key: MAPREDUCE-1270
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1270
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
>Affects Versions: 0.20.1
> Environment:  hadoop linux
>Reporter: Wang Shouyan
> Attachments: HADOOP-HCE-1.0.0.patch, HCE InstallMenu.pdf, HCE 
> Performance Report.pdf, HCE Tutorial.pdf, Overall Design of Hadoop C++ 
> Extension.doc
>
>
>   Hadoop C++ extension is an internal project at Baidu. We started it for these 
> reasons:
>1  To provide a C++ API. We mostly used Streaming before, and we also tried 
> PIPES, but we did not find PIPES more efficient than Streaming. So we think a 
> new C++ extension is needed for us.
>2  Even using PIPES or Streaming, it is hard to control the memory of the 
> hadoop map/reduce Child JVM.
>3  It costs so much to read/write/sort TB/PB data in Java. When using PIPES 
> or Streaming, a pipe or socket is not efficient for carrying such huge data.
>What we want to do: 
>1 We do not use the map/reduce Child JVM to do any data processing; it just 
> prepares the environment, starts the C++ mapper, tells the mapper which split 
> it should deal with, and reads reports from the mapper until it finishes. The 
> mapper will read records, invoke the user-defined map, do the partitioning, 
> write spills, and combine and merge into file.out. We think these operations 
> can be done by C++ code.
>2 The reducer is similar to the mapper: it is started after the sort 
> finishes, reads from the sorted files, invokes the user-defined reduce, and 
> writes to the user-defined record writer.
>3 We also intend to rewrite shuffle and sort in C++, for efficiency and 
> memory control.
>At first, 1 and 2; then 3.
>What's the difference from PIPES:
>1 Yes, we will reuse most PIPES code.
>2 And we should do it more completely: nothing changes in scheduling and 
> management, but everything in execution.
> *UPDATE:*
> Now you can get a test version of HCE from this link 
> http://docs.google.com/leaf?id=0B5xhnqH1558YZjcxZmI0NzEtODczMy00NmZiLWFkNjAtZGM1MjZkMmNkNWFk&hl=zh_CN&pli=1
> This is a full package with all the hadoop source code.
> Following the document "HCE InstallMenu.pdf" in the attachments, you can build 
> and deploy it in your cluster.
> The attachment "HCE Tutorial.pdf" will lead you through writing your first HCE 
> program and gives other specifications of the interface.
> The attachment "HCE Performance Report.pdf" gives a performance report of HCE 
> compared to Java MapRed and Pipes.
> Any comments are welcome.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1962) [Herriot] IOException throws and it fails with token expired while running the tests.

2010-07-23 Thread Vinay Kumar Thota (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay Kumar Thota updated MAPREDUCE-1962:
-

Summary: [Herriot] IOException throws and it fails with token expired while 
running the tests.  (was: IOException throws and it fails with token expired 
while running the tests.)

> [Herriot] IOException throws and it fails with token expired while running 
> the tests.
> -
>
> Key: MAPREDUCE-1962
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1962
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
>
> An IOException is thrown and the tests fail because the token has expired. I 
> could see this issue in a secure cluster. 
> This issue has been resolved by setting the following attribute in the 
> configuration before running the tests.
> mapreduce.job.complete.cancel.delegation.tokens=false
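
For reference, the same workaround set programmatically (a sketch; the 
property name is the one given above):

{code}
import org.apache.hadoop.conf.Configuration;

public class TokenTestSetup {
  public static Configuration testConf() {
    Configuration conf = new Configuration();
    // Keep delegation tokens valid after each job completes, so subsequent
    // jobs in the Herriot test run do not fail with expired tokens.
    conf.setBoolean("mapreduce.job.complete.cancel.delegation.tokens", false);
    return conf;
  }
}
{code}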

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1809) Ant build changes for Streaming system tests in contrib projects.

2010-07-23 Thread Vinay Kumar Thota (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay Kumar Thota updated MAPREDUCE-1809:
-

Attachment: 1809-ydist-security.patch

> Ant build changes for Streaming system tests in contrib projects.
> -
>
> Key: MAPREDUCE-1809
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1809
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: build
>Affects Versions: 0.21.0
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1809-ydist-security.patch, 1809-ydist-security.patch, 
> MAPREDUCE-1809.patch, MAPREDUCE-1809.patch, MAPREDUCE-1809.patch, 
> MAPREDUCE-1809.patch
>
>
> Implementing a new target (test-system) in the build-contrib.xml file for 
> executing the system tests that are in contrib projects. Also adding a 
> 'subant' target in aop.xml that calls the build-contrib.xml file for system 
> tests.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1925) TestRumenJobTraces fails in trunk

2010-07-23 Thread Ravi Gummadi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891552#action_12891552
 ] 

Ravi Gummadi commented on MAPREDUCE-1925:
-

Hudson does not seem to be responding, so I ran ant test and test-patch myself.

All unit tests passed except the known failure of MAPREDUCE-1834.

ant test-patch gave:

 [exec] +1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] +1 tests included.  The patch appears to include 5 new or 
modified tests.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.


> TestRumenJobTraces fails in trunk
> -
>
> Key: MAPREDUCE-1925
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1925
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tools/rumen
>Affects Versions: 0.22.0
>Reporter: Amareshwari Sriramadasu
>Assignee: Ravi Gummadi
> Fix For: 0.22.0
>
> Attachments: 1925.patch, 1925.v1.1.patch, 1925.v1.patch, 
> 1925.v2.1.patch, 1925.v2.patch
>
>
> TestRumenJobTraces failed with following error:
> Error Message
> the gold file contains more text at line 1 expected:<56> but was:<0>
> Stacktrace
>   at 
> org.apache.hadoop.tools.rumen.TestRumenJobTraces.testHadoop20JHParser(TestRumenJobTraces.java:294)
> Full log of the failure is available at 
> http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/292/testReport/org.apache.hadoop.tools.rumen/TestRumenJobTraces/testHadoop20JHParser/

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1270) Hadoop C++ Extention

2010-07-23 Thread Dong Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Yang updated MAPREDUCE-1270:
-

Attachment: HADOOP-HCE-1.0.0.patch

HCE-1.0.0.patch for mapreduce trunk (revision 963075)

> Hadoop C++ Extention
> 
>
> Key: MAPREDUCE-1270
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1270
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
>Affects Versions: 0.20.1
> Environment:  hadoop linux
>Reporter: Wang Shouyan
> Attachments: HADOOP-HCE-1.0.0.patch, HCE InstallMenu.pdf, HCE 
> Performance Report.pdf, HCE Tutorial.pdf, Overall Design of Hadoop C++ 
> Extension.doc
>
>
>   Hadoop C++ Extension is an internal project at Baidu. We started it for 
> these reasons:
>1. To provide a C++ API. We mostly used Streaming before, and we also tried 
> PIPES, but we did not find PIPES to be more efficient than Streaming. So we 
> think a new C++ extension is needed for us.
>2. Even when using PIPES or Streaming, it is hard to control the memory of 
> the Hadoop map/reduce child JVM.
>3. Reading/writing/sorting TB/PB-scale data in Java is very costly, and when 
> using PIPES or Streaming, a pipe or socket is not efficient for carrying such 
> huge volumes of data.
>   What we want to do: 
>1. We do not use the map/reduce child JVM to do any data processing; it just 
> prepares the environment, starts the C++ mapper, tells the mapper which 
> split it should deal with, and reads reports from the mapper until it 
> finishes. The mapper reads records, invokes the user-defined map, does the 
> partitioning, writes spills, combines, and merges into file.out. We think 
> these operations can be done by C++ code.
>2. The reducer is similar to the mapper; it is started after the sort 
> finishes, reads from the sorted files, invokes the user-defined reduce, and 
> writes to the user-defined record writer.
>3. We also intend to rewrite shuffle and sort in C++, for efficiency and 
> memory control.
>   At first, 1 and 2, then 3.
>   What's the difference from PIPES:
>1. Yes, we will reuse most of the PIPES code.
>2. And we should do it more completely: nothing changes in scheduling and 
> management, but everything does in execution.
> *UPDATE:*
> A test version of HCE is now available from this link: 
> http://docs.google.com/leaf?id=0B5xhnqH1558YZjcxZmI0NzEtODczMy00NmZiLWFkNjAtZGM1MjZkMmNkNWFk&hl=zh_CN&pli=1
> This is a full package with all the Hadoop source code.
> Following the document "HCE InstallMenu.pdf" in the attachments, you can 
> build and deploy it in your cluster.
> The attachment "HCE Tutorial.pdf" will lead you through writing your first 
> HCE program and gives other specifications of the interface.
> The attachment "HCE Performance Report.pdf" gives a performance report of 
> HCE compared to Java MapRed and Pipes.
> Any comments are welcome.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1270) Hadoop C++ Extention

2010-07-23 Thread Dong Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891544#action_12891544
 ] 

Dong Yang commented on MAPREDUCE-1270:
--

Here is HADOOP-HCE-1.0.0.patch for mapreduce trunk (revision 963075), which 
includes the Hadoop C++ Extension (HCE for short) changes against 
mapreduce-963075.

The steps for using this patch are as follows:
1. Download HADOOP-HCE-1.0.0.patch
2. svn co -r 963075 http://svn.apache.org/repos/asf/hadoop/mapreduce/trunk 
trunk-963075
3. cd trunk-963075
4. patch -p0 < HADOOP-HCE-1.0.0.patch
5. sh build.sh (needs java, forrest and ant)

HCE includes Java and C++ code, which depends on libhdfs, so build.sh first 
checks out HDFS trunk and builds it.

> Hadoop C++ Extention
> 
>
> Key: MAPREDUCE-1270
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1270
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
>Affects Versions: 0.20.1
> Environment:  hadoop linux
>Reporter: Wang Shouyan
> Attachments: HCE InstallMenu.pdf, HCE Performance Report.pdf, HCE 
> Tutorial.pdf, Overall Design of Hadoop C++ Extension.doc
>
>
>   Hadoop C++ Extension is an internal project at Baidu. We started it for 
> these reasons:
>1. To provide a C++ API. We mostly used Streaming before, and we also tried 
> PIPES, but we did not find PIPES to be more efficient than Streaming. So we 
> think a new C++ extension is needed for us.
>2. Even when using PIPES or Streaming, it is hard to control the memory of 
> the Hadoop map/reduce child JVM.
>3. Reading/writing/sorting TB/PB-scale data in Java is very costly, and when 
> using PIPES or Streaming, a pipe or socket is not efficient for carrying such 
> huge volumes of data.
>   What we want to do: 
>1. We do not use the map/reduce child JVM to do any data processing; it just 
> prepares the environment, starts the C++ mapper, tells the mapper which 
> split it should deal with, and reads reports from the mapper until it 
> finishes. The mapper reads records, invokes the user-defined map, does the 
> partitioning, writes spills, combines, and merges into file.out. We think 
> these operations can be done by C++ code.
>2. The reducer is similar to the mapper; it is started after the sort 
> finishes, reads from the sorted files, invokes the user-defined reduce, and 
> writes to the user-defined record writer.
>3. We also intend to rewrite shuffle and sort in C++, for efficiency and 
> memory control.
>   At first, 1 and 2, then 3.
>   What's the difference from PIPES:
>1. Yes, we will reuse most of the PIPES code.
>2. And we should do it more completely: nothing changes in scheduling and 
> management, but everything does in execution.
> *UPDATE:*
> A test version of HCE is now available from this link: 
> http://docs.google.com/leaf?id=0B5xhnqH1558YZjcxZmI0NzEtODczMy00NmZiLWFkNjAtZGM1MjZkMmNkNWFk&hl=zh_CN&pli=1
> This is a full package with all the Hadoop source code.
> Following the document "HCE InstallMenu.pdf" in the attachments, you can 
> build and deploy it in your cluster.
> The attachment "HCE Tutorial.pdf" will lead you through writing your first 
> HCE program and gives other specifications of the interface.
> The attachment "HCE Performance Report.pdf" gives a performance report of 
> HCE compared to Java MapRed and Pipes.
> Any comments are welcome.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1902) job jar file is not distributed via DistributedCache

2010-07-23 Thread Vinod K V (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891532#action_12891532
 ] 

Vinod K V commented on MAPREDUCE-1902:
--

Both are equally efficient, I think, unless you also bring in sharing of job 
jars across jobs.

It'd definitely help code reuse.

I checked trunk and realized that only a minor difference exists between the 
present way and the dist-cache way. We also un-jar the job.jar so that classes 
inside sub-directories (according to a job-configurable pattern), e.g. lib/ 
and classes/, are also made available on the class-path; a sketch of that step 
follows. Accommodating this should be straightforward.
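
For illustration, a minimal sketch of that un-jar step using plain JDK APIs 
(this is not the actual framework code; the paths and chosen entries are 
illustrative):

{noformat}
import java.io.*;
import java.util.Collections;
import java.util.jar.*;

// Sketch: extract job.jar into a working directory so that entries under
// lib/ and classes/ can be put on the task class-path.
public class UnJarSketch {
  public static void unJar(File jarFile, File toDir) throws IOException {
    JarFile jar = new JarFile(jarFile);
    try {
      for (JarEntry entry : Collections.list(jar.entries())) {
        if (entry.isDirectory()) continue;
        File out = new File(toDir, entry.getName());
        out.getParentFile().mkdirs();
        InputStream in = jar.getInputStream(entry);
        OutputStream os = new FileOutputStream(out);
        byte[] buf = new byte[8192];
        for (int n; (n = in.read(buf)) > 0; ) os.write(buf, 0, n);
        os.close();
        in.close();
      }
    } finally {
      jar.close();
    }
  }

  public static void main(String[] args) throws IOException {
    File workDir = new File("jobcache/work");   // hypothetical work dir
    unJar(new File("job.jar"), workDir);
    // Class-path entries the task JVM would then see:
    System.out.println(new File(workDir, "classes").getPath());
    System.out.println(new File(workDir, "lib").getPath());
  }
}
{noformat}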

> job jar file is not distributed via DistributedCache
> 
>
> Key: MAPREDUCE-1902
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1902
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Joydeep Sen Sarma
>
> The main jar file for a job is not distributed via the distributed cache. It 
> would be more efficient if that were the case.
> It would also allow us to comprehensively tackle the inefficiencies in the 
> distribution of jar files and such (see MAPREDUCE-1901).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1901) Jobs should not submit the same jar files over and over again

2010-07-23 Thread Vinod K V (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891529#action_12891529
 ] 

Vinod K V commented on MAPREDUCE-1901:
--

bq. Currently, auxiliary files added through DistributedCache.addCacheFiles  
and DistributedCache.addCacheArchive end up in {mapred.system.dir}/job_id/files 
or {mapred.system.dir}/job_id/archives. The /job_id directory is then removed 
after every job, which is why files cannot be reused across jobs.
That is only true for private distributed cache files. Artifacts which are 
already public on the DFS don't go to the mapred system directory at all and 
are reusable across users/jobs.
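
For context, a usage sketch of the client-side API in question (the DFS paths 
are illustrative; the calls are those in 
org.apache.hadoop.filecache.DistributedCache):

{noformat}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;

// Sketch: a client points the job at artifacts already uploaded to DFS.
// If those files are world-readable ("public"), they can be localized once
// per node and reused across users/jobs.
Configuration conf = new Configuration();
DistributedCache.addCacheFile(new URI("/shared/libs/common.jar"), conf);
DistributedCache.addCacheArchive(new URI("/shared/libs/udfs.zip"), conf);
{noformat}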

bq. 2. it treats shared objects as immutable. meaning that we never look up the 
timestamp of the backing object in hdfs during task localization/validation. 
this saves time during task setup.
bq. 3. reasonable effort has been put to bypass as many hdfs calls as possible 
in step 1. the client gets a listing of all shared objects and their md5 
signatures in one shot. because of the immutability assumption - individual 
file stamps are never required and save hdfs calls.
I think this is orthogonal. If md5 checksums are preferred over 
timestamp-based checks for the sake of lessening DFS accesses, that can be 
done separately within the current design, no? Distributed cache files 
originally did rely on the md5 checksums of files/jars that HDFS itself used 
to maintain. However, that changed via HADOOP-1084, when those checksums paved 
the way for block-level CRCs.
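
As an aside, the content-signature idea itself is simple to sketch with plain 
JDK APIs (this is not the patch's code; the shared-path layout is assumed):

{noformat}
import java.io.*;
import java.security.MessageDigest;

// Sketch: identify a jar by the MD5 of its bytes, so that two uploads of the
// same jar map to one immutable cached artifact, e.g.
// /shared/jars/<md5>/myjob.jar (hypothetical layout).
public class ContentSignature {
  public static String md5Hex(File f) throws Exception {
    MessageDigest md = MessageDigest.getInstance("MD5");
    InputStream in = new FileInputStream(f);
    try {
      byte[] buf = new byte[8192];
      for (int n; (n = in.read(buf)) > 0; ) md.update(buf, 0, n);
    } finally {
      in.close();
    }
    StringBuilder sb = new StringBuilder();
    for (byte b : md.digest()) sb.append(String.format("%02x", b));
    return sb.toString();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(md5Hex(new File(args[0])));
  }
}
{noformat}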

bq. 4. finally - there is inbuilt code to do garbage collection of the shared 
namespace (in hdfs) by deleting old shared objects that have not been recently 
accessed.
This is where I think it gets tricky. First, garbage collection of the DFS 
namespace would have to be accompanied by the same on individual TTs - more 
complexity.

There are race conditions too. It's not clear how the JobTracker is prevented 
from expiring shared cache files/jars when some JobClient has already marked, 
or is in the process of marking, those artifacts for use by its job. 
Guaranteeing such synchronization across the JobTracker and JobClients is 
difficult and, at best, brittle. Leaving the synchronization issues unsolved 
would only mean leaving the tasks/job to fail later, which is not desirable.

bq. the difference here is that all applications (like Hive) using libjars etc. 
options provided in hadoop automatically share jars with each other (when they 
set this option). the applications don't have to do anything special (like 
figuring out the right global identifier in hdfs for their jars).
That seems like a valid use-case. But as I mentioned above, because of the 
complexity and race conditions, this seems like the wrong place to develop it.

I think the core problem is trying to perform a service (sharing of files) 
that strictly belongs to the layer above mapreduce - maintaining the share 
list doesn't seem like the JT's responsibility. The current way of leaving it 
to the users to decide which files are public (and hence shareable) and which 
are not, and how and when they are purged, keeps things saner from the 
mapreduce framework's point of view. What do you think?

bq. if u can look at the patch a bit - that might help understand the 
differences as well
I looked at the patch. And I am still not convinced. Yet, that is.

> Jobs should not submit the same jar files over and over again
> -
>
> Key: MAPREDUCE-1901
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1901
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Joydeep Sen Sarma
> Attachments: 1901.PATCH
>
>
> Currently each Hadoop job uploads the required resources 
> (jars/files/archives) to a new location in HDFS. Map-reduce nodes involved in 
> executing this job would then download these resources into local disk.
> In an environment where most of the users are using a standard set of jars 
> and files (because they are using a framework like Hive/Pig) - the same jars 
> keep getting uploaded and downloaded repeatedly. The overhead of this 
> protocol (primarily in terms of end-user latency) is significant when:
> - the jobs are small (and, conversely, large in number)
> - Namenode is under load (meaning hdfs latencies are high and made worse, in 
> part, by this protocol)
> Hadoop should provide a way for jobs in a cooperative environment to not 
> submit the same files over and over again. Identifying and caching execution 
> resources by a content signature (md5/sha) would be a good alternative to 
> have available.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1809) Ant build changes for Streaming system tests in contrib projects.

2010-07-23 Thread Balaji Rajagopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891521#action_12891521
 ] 

Balaji Rajagopalan commented on MAPREDUCE-1809:
---

+1

> Ant build changes for Streaming system tests in contrib projects.
> -
>
> Key: MAPREDUCE-1809
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1809
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: build
>Affects Versions: 0.21.0
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1809-ydist-security.patch, MAPREDUCE-1809.patch, 
> MAPREDUCE-1809.patch, MAPREDUCE-1809.patch, MAPREDUCE-1809.patch
>
>
> Implementing a new target (test-system) in the build-contrib.xml file for 
> executing the system tests that are in contrib projects. Also adding a 
> 'subant' target in aop.xml that calls the build-contrib.xml file for system 
> tests.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1964) Running hi Ram jobs when TTs are blacklisted

2010-07-23 Thread Vinay Kumar Thota (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891519#action_12891519
 ] 

Vinay Kumar Thota commented on MAPREDUCE-1964:
--

1. Please add a brief description of the class.

2. Add javadoc for each public method.

3.
{noformat}
 +Assert.assertEquals("Job has not been succeeded", 
+  jInfo.getStatus().getRunState(), JobStatus.SUCCEEDED);
{noformat}
Don't use the above assertion in the helper method; leave that check up to 
the test.

4.
{noformat} 
+  private int runTool(Configuration job, Tool tool, 
+  String[] jobArgs) throws Exception {
+  int returnStatus = ToolRunner.run(job, tool, jobArgs);
+  return returnStatus;
+  }
{noformat}
Instead of writing a separate wrapper method, call ToolRunner.run directly.

5.
{noformat}
+JobID jobId = helper.runHighRamJob(conf,jobClient,remoteJTClient);
{noformat}

final HighRamJobHelper helper = new HighRamJobHelper();
JobID jobId = helper.runHighRamJob(conf,jobClient,remoteJTClient);

Make the helper final and use it locally instead of defining it globally, 
because it is used in only one place in the class.
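
To make the suggestions concrete, here is a sketch of how the relevant test 
fragments might look after the changes (the helper, client, and job-info 
names are taken from the patch under review; the surrounding test body is 
assumed):

{noformat}
// Suggestion 4: call ToolRunner directly instead of a one-line wrapper.
int returnStatus = ToolRunner.run(job, tool, jobArgs);

// Suggestion 5: keep the helper local and final, since it is used once.
final HighRamJobHelper helper = new HighRamJobHelper();
JobID jobId = helper.runHighRamJob(conf, jobClient, remoteJTClient);

// Suggestion 3: assert success in the test itself, not in the helper.
// (JUnit's assertEquals takes the expected value first.)
Assert.assertEquals("Job has not succeeded",
    JobStatus.SUCCEEDED, jInfo.getStatus().getRunState());
{noformat}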




> Running hi Ram jobs when TTs are blacklisted
> 
>
> Key: MAPREDUCE-1964
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1964
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Balaji Rajagopalan
> Attachments: hiRam_bList_y20.patch
>
>
> More slots are getting reserved for HiRAM job tasks than required. 
> Blacklist more than 25% of TTs across the job.  Run a high RAM job.  No 
> java.lang.RuntimeException should be displayed. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1964) Running hi Ram jobs when TTs are blacklisted

2010-07-23 Thread Balaji Rajagopalan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balaji Rajagopalan updated MAPREDUCE-1964:
--

Attachment: hiRam_bList_y20.patch

First patch for review

> Running hi Ram jobs when TTs are blacklisted
> 
>
> Key: MAPREDUCE-1964
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1964
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Balaji Rajagopalan
> Attachments: hiRam_bList_y20.patch
>
>
> More slots are getting reserved for HiRAM job tasks than required. 
> Blacklist more than 25% of TTs across the job.  Run a high RAM job.  No 
> java.lang.RuntimeException should be displayed. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1964) Running hi Ram jobs when TTs are blacklisted

2010-07-23 Thread Balaji Rajagopalan (JIRA)
Running hi Ram jobs when TTs are blacklisted


 Key: MAPREDUCE-1964
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1964
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Reporter: Balaji Rajagopalan


More slots are getting reserved for HiRAM job tasks than required.

Blacklist more than 25% of TTs across the job.  Run a high RAM job.  No 
java.lang.RuntimeException should be displayed. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1963) [Herriot] TaskMemoryManager should log process-tree's status while killing tasks

2010-07-23 Thread Vinay Kumar Thota (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay Kumar Thota updated MAPREDUCE-1963:
-

Attachment: 1963-ydist-security.patch

Patch for the Yahoo! security dist branch.

> [Herriot] TaskMemoryManager should log process-tree's status while killing 
> tasks
> 
>
> Key: MAPREDUCE-1963
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1963
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1963-ydist-security.patch
>
>
> 1. Execute a streaming job which will increase memory usage beyond the 
> configured memory limits during the map phase. TaskMemoryManager should log 
> the map task's process-tree status just before killing the task. 
> 2. Execute a streaming job which will increase memory usage beyond the 
> configured memory limits during the reduce phase. TaskMemoryManager should 
> log the reduce task's process-tree status just before killing the task.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1963) [Herriot] TaskMemoryManager should log process-tree's status while killing tasks

2010-07-23 Thread Vinay Kumar Thota (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay Kumar Thota updated MAPREDUCE-1963:
-

Assignee: Vinay Kumar Thota

> [Herriot] TaskMemoryManager should log process-tree's status while killing 
> tasks
> 
>
> Key: MAPREDUCE-1963
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1963
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
>
> 1. Execute a streaming job which will increase memory usage beyond the 
> configured memory limits during the map phase. TaskMemoryManager should log 
> the map task's process-tree status just before killing the task. 
> 2. Execute a streaming job which will increase memory usage beyond the 
> configured memory limits during the reduce phase. TaskMemoryManager should 
> log the reduce task's process-tree status just before killing the task.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1963) [Herriot] TaskMemoryManager should log process-tree's status while killing tasks

2010-07-23 Thread Vinay Kumar Thota (JIRA)
[Herriot] TaskMemoryManager should log process-tree's status while killing tasks


 Key: MAPREDUCE-1963
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1963
 Project: Hadoop Map/Reduce
  Issue Type: Task
  Components: test
Reporter: Vinay Kumar Thota


1. Execute a streaming job which will increase memory usage beyond the 
configured memory limits during the map phase. TaskMemoryManager should log 
the map task's process-tree status just before killing the task. 

2. Execute a streaming job which will increase memory usage beyond the 
configured memory limits during the reduce phase. TaskMemoryManager should 
log the reduce task's process-tree status just before killing the task.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1957) [Herriot] Test Job cache directories cleanup after job completes.

2010-07-23 Thread Balaji Rajagopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891501#action_12891501
 ] 

Balaji Rajagopalan commented on MAPREDUCE-1957:
---

+1

> [Herriot] Test Job cache directories cleanup after job completes.
> -
>
> Key: MAPREDUCE-1957
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1957
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>  Components: test
>Reporter: Vinay Kumar Thota
>Assignee: Vinay Kumar Thota
> Attachments: 1957-ydist-security.patch, 1957-ydist-security.patch, 
> 1957-ydist-security.patch
>
>
> Test the job cache directories cleanup after the job completes. The test 
> covers the following scenarios.
> 1. Submit a job and create folders and files in work folder with  
> non-writable permissions under task attempt id folder. Wait till the job 
> completes and verify whether the files and folders are cleaned up or not.
> 2. Submit a job and create folders and files in work folder with  
> non-writable permissions under task attempt id folder. Kill the job and 
> verify whether the files and folders are cleaned up or not.
> 3. Submit a job and create folders and files in work folder with  
> non-writable permissions under task attempt id folder. Fail the job and 
> verify whether the files and folders are cleaned up or not.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1925) TestRumenJobTraces fails in trunk

2010-07-23 Thread Ravi Gummadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Gummadi updated MAPREDUCE-1925:


Status: Patch Available  (was: Open)

> TestRumenJobTraces fails in trunk
> -
>
> Key: MAPREDUCE-1925
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1925
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tools/rumen
>Affects Versions: 0.22.0
>Reporter: Amareshwari Sriramadasu
>Assignee: Ravi Gummadi
> Fix For: 0.22.0
>
> Attachments: 1925.patch, 1925.v1.1.patch, 1925.v1.patch, 
> 1925.v2.1.patch, 1925.v2.patch
>
>
> TestRumenJobTraces failed with the following error:
> Error Message
> the gold file contains more text at line 1 expected:<56> but was:<0>
> Stacktrace
>   at 
> org.apache.hadoop.tools.rumen.TestRumenJobTraces.testHadoop20JHParser(TestRumenJobTraces.java:294)
> Full log of the failure is available at 
> http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/292/testReport/org.apache.hadoop.tools.rumen/TestRumenJobTraces/testHadoop20JHParser/

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (MAPREDUCE-1962) IOException throws and it fails with token expired while running the tests.

2010-07-23 Thread Vinay Kumar Thota (JIRA)
IOException throws and it fails with token expired while running the tests.
---

 Key: MAPREDUCE-1962
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1962
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Reporter: Vinay Kumar Thota
Assignee: Vinay Kumar Thota


An IOException is thrown and the tests fail because the delegation token has 
expired. I could see this issue on a secure cluster.

This issue has been resolved by setting the following attribute in the 
configuration before running the tests:
mapreduce.job.complete.cancel.delegation.tokens=false
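
For instance, a minimal sketch of applying the workaround in the test setup 
(the property name is as above; the surrounding setup code is assumed):

{noformat}
import org.apache.hadoop.conf.Configuration;

// Sketch: stop delegation tokens from being cancelled when a job completes,
// so later tests reusing the tokens don't hit the expiry IOException.
Configuration conf = new Configuration();
conf.setBoolean("mapreduce.job.complete.cancel.delegation.tokens", false);
// ... then submit the test jobs with this conf ...
{noformat}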

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1925) TestRumenJobTraces fails in trunk

2010-07-23 Thread Ravi Gummadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Gummadi updated MAPREDUCE-1925:


Attachment: 1925.v2.1.patch

Attaching a new patch that removes the dependency on InputDemuxer in 
getRewindableInputStream().

> TestRumenJobTraces fails in trunk
> -
>
> Key: MAPREDUCE-1925
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1925
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tools/rumen
>Affects Versions: 0.22.0
>Reporter: Amareshwari Sriramadasu
>Assignee: Ravi Gummadi
> Fix For: 0.22.0
>
> Attachments: 1925.patch, 1925.v1.1.patch, 1925.v1.patch, 
> 1925.v2.1.patch, 1925.v2.patch
>
>
> TestRumenJobTraces failed with following error:
> Error Message
> the gold file contains more text at line 1 expected:<56> but was:<0>
> Stacktrace
>   at 
> org.apache.hadoop.tools.rumen.TestRumenJobTraces.testHadoop20JHParser(TestRumenJobTraces.java:294)
> Full log of the failure is available at 
> http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/292/testReport/org.apache.hadoop.tools.rumen/TestRumenJobTraces/testHadoop20JHParser/

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1871) Create automated test scenario for "Collect information about number of tasks succeeded / total per time unit for a tasktracker"

2010-07-23 Thread Iyappan Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iyappan Srinivasan updated MAPREDUCE-1871:
--

   Status: Patch Available  (was: Open)
Affects Version/s: 0.21.0

> Create automated test scenario for "Collect information about number of tasks 
> succeeded / total per time unit for a tasktracker"
> 
>
> Key: MAPREDUCE-1871
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1871
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: test
>Affects Versions: 0.21.0
>Reporter: Iyappan Srinivasan
>Assignee: Iyappan Srinivasan
> Attachments: 1871-ydist-security-patch.txt, 
> 1871-ydist-security-patch.txt, 1871-ydist-security-patch.txt, 
> 1871-ydist-security-patch.txt, 1871-ydist-security-patch.txt, 
> 1871-ydist-security-patch.txt, 1871-ydist-security-patch.txt, 
> 1871-ydist-security-patch.txt, 1871-ydist-security-patch.txt, 
> MAPREDUCE-1871.patch, MAPREDUCE-1871.patch, MAPREDUCE-1871.patch, 
> MAPREDUCE-1871.patch, MAPREDUCE-1871.patch
>
>
> Create automated test scenario for "Collect information about number of tasks 
> succeeded / total per time unit for a tasktracker"
> 1) Verification of all the above-mentioned fields with the specified TTs. 
> The total no. of tasks and successful tasks should be equal to the 
> corresponding no. of tasks specified in the TTs' logs.
> 2) Fail a task on a tasktracker. The Node UI should update the status of 
> tasks on that TT accordingly. 
> 3) Kill a task on a tasktracker. The Node UI should update the status of 
> tasks on that TT accordingly.
> 4) Positive: Run simultaneous jobs and check if all the fields are populated 
> with proper task values. The Node UI should have correct values for all the 
> fields mentioned above. 
> 5) Check the fields across a one-hour window. Fields related to the hour 
> should be updated every hour.
> 6) Check the fields across a one-day window. Fields related to the day 
> should be updated every day.
> 7) Restart a TT and bring it back. The UI should retain the field values.  
> 8) Positive: Run a bunch of jobs with 0 maps and 0 reduces simultaneously.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1871) Create automated test scenario for "Collect information about number of tasks succeeded / total per time unit for a tasktracker"

2010-07-23 Thread Iyappan Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Iyappan Srinivasan updated MAPREDUCE-1871:
--

Attachment: MAPREDUCE-1871.patch

Patch for trunk, re-attached so that it is on top and gets picked up when the 
issue is made patch-available.

> Create automated test scenario for "Collect information about number of tasks 
> succeeded / total per time unit for a tasktracker"
> 
>
> Key: MAPREDUCE-1871
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1871
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>  Components: test
>Reporter: Iyappan Srinivasan
>Assignee: Iyappan Srinivasan
> Attachments: 1871-ydist-security-patch.txt, 
> 1871-ydist-security-patch.txt, 1871-ydist-security-patch.txt, 
> 1871-ydist-security-patch.txt, 1871-ydist-security-patch.txt, 
> 1871-ydist-security-patch.txt, 1871-ydist-security-patch.txt, 
> 1871-ydist-security-patch.txt, 1871-ydist-security-patch.txt, 
> MAPREDUCE-1871.patch, MAPREDUCE-1871.patch, MAPREDUCE-1871.patch, 
> MAPREDUCE-1871.patch, MAPREDUCE-1871.patch
>
>
> Create automated test scenario for "Collect information about number of tasks 
> succeeded / total per time unit for a tasktracker"
> 1) Verification of all the above-mentioned fields with the specified TTs. 
> The total no. of tasks and successful tasks should be equal to the 
> corresponding no. of tasks specified in the TTs' logs.
> 2) Fail a task on a tasktracker. The Node UI should update the status of 
> tasks on that TT accordingly. 
> 3) Kill a task on a tasktracker. The Node UI should update the status of 
> tasks on that TT accordingly.
> 4) Positive: Run simultaneous jobs and check if all the fields are populated 
> with proper task values. The Node UI should have correct values for all the 
> fields mentioned above. 
> 5) Check the fields across a one-hour window. Fields related to the hour 
> should be updated every hour.
> 6) Check the fields across a one-day window. Fields related to the day 
> should be updated every day.
> 7) Restart a TT and bring it back. The UI should retain the field values.  
> 8) Positive: Run a bunch of jobs with 0 maps and 0 reduces simultaneously.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-1925) TestRumenJobTraces fails in trunk

2010-07-23 Thread Ravi Gummadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Gummadi updated MAPREDUCE-1925:


Status: Open  (was: Patch Available)

> TestRumenJobTraces fails in trunk
> -
>
> Key: MAPREDUCE-1925
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1925
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tools/rumen
>Affects Versions: 0.22.0
>Reporter: Amareshwari Sriramadasu
>Assignee: Ravi Gummadi
> Fix For: 0.22.0
>
> Attachments: 1925.patch, 1925.v1.1.patch, 1925.v1.patch, 1925.v2.patch
>
>
> TestRumenJobTraces failed with the following error:
> Error Message
> the gold file contains more text at line 1 expected:<56> but was:<0>
> Stacktrace
>   at 
> org.apache.hadoop.tools.rumen.TestRumenJobTraces.testHadoop20JHParser(TestRumenJobTraces.java:294)
> Full log of the failure is available at 
> http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/292/testReport/org.apache.hadoop.tools.rumen/TestRumenJobTraces/testHadoop20JHParser/

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.