[jira] [Updated] (MAPREDUCE-2407) Make Gridmix emulate usage of Distributed Cache files

2011-05-23 Thread Ravi Gummadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Gummadi updated MAPREDUCE-2407:


  Resolution: Fixed
Release Note: Makes Gridmix emulate HDFS based distributed cache files and 
local file system based distributed cache files.
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

I just committed this to trunk.

> Make Gridmix emulate usage of Distributed Cache files
> -
>
> Key: MAPREDUCE-2407
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2407
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/gridmix
>Affects Versions: 0.23.0
>Reporter: Ravi Gummadi
>Assignee: Ravi Gummadi
> Fix For: 0.23.0
>
> Attachments: 2407.patch, 2407.v1.1.patch, 2407.v1.patch
>
>
> Currently Gridmix emulates disk IO load only. This JIRA is to make Gridmix 
> emulate Distributed Cache load as defined by the job-trace.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-2407) Make Gridmix emulate usage of Distributed Cache files

2011-05-23 Thread Ravi Gummadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Gummadi updated MAPREDUCE-2407:


Status: Patch Available  (was: Open)

> Make Gridmix emulate usage of Distributed Cache files
> -
>
> Key: MAPREDUCE-2407
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2407
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/gridmix
>Affects Versions: 0.23.0
>Reporter: Ravi Gummadi
>Assignee: Ravi Gummadi
> Fix For: 0.23.0
>
> Attachments: 2407.patch, 2407.v1.1.patch, 2407.v1.patch
>
>
> Currently Gridmix emulates disk IO load only. This JIRA is to make Gridmix 
> emulate Distributed Cache load as defined by the job-trace.



[jira] [Updated] (MAPREDUCE-2407) Make Gridmix emulate usage of Distributed Cache files

2011-05-23 Thread Ravi Gummadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Gummadi updated MAPREDUCE-2407:


Attachment: 2407.v1.1.patch

Attaching a new patch that addresses Amar's minor offline comments.

> Make Gridmix emulate usage of Distributed Cache files
> -
>
> Key: MAPREDUCE-2407
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2407
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/gridmix
>Affects Versions: 0.23.0
>Reporter: Ravi Gummadi
>Assignee: Ravi Gummadi
> Fix For: 0.23.0
>
> Attachments: 2407.patch, 2407.v1.1.patch, 2407.v1.patch
>
>
> Currently Gridmix emulates disk IO load only. This JIRA is to make Gridmix 
> emulate Distributed Cache load as defined by the job-trace.



[jira] [Updated] (MAPREDUCE-2407) Make Gridmix emulate usage of Distributed Cache files

2011-05-23 Thread Ravi Gummadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Gummadi updated MAPREDUCE-2407:


Status: Open  (was: Patch Available)

> Make Gridmix emulate usage of Distributed Cache files
> -
>
> Key: MAPREDUCE-2407
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2407
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/gridmix
>Affects Versions: 0.23.0
>Reporter: Ravi Gummadi
>Assignee: Ravi Gummadi
> Fix For: 0.23.0
>
> Attachments: 2407.patch, 2407.v1.patch
>
>
> Currently Gridmix emulates disk IO load only. This JIRA is to make Gridmix 
> emulate Distributed Cache load as defined by the job-trace.



[jira] [Updated] (MAPREDUCE-2407) Make Gridmix emulate usage of Distributed Cache files

2011-05-20 Thread Ravi Gummadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Gummadi updated MAPREDUCE-2407:


Status: Patch Available  (was: Open)

> Make Gridmix emulate usage of Distributed Cache files
> -
>
> Key: MAPREDUCE-2407
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2407
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/gridmix
>Affects Versions: 0.23.0
>Reporter: Ravi Gummadi
>Assignee: Ravi Gummadi
> Fix For: 0.23.0
>
> Attachments: 2407.patch, 2407.v1.patch
>
>
> Currently Gridmix emulates disk IO load only. This JIRA is to make Gridmix 
> emulate Distributed Cache load as defined by the job-trace.



[jira] [Updated] (MAPREDUCE-2407) Make Gridmix emulate usage of Distributed Cache files

2011-05-20 Thread Ravi Gummadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Gummadi updated MAPREDUCE-2407:


Attachment: 2407.v1.patch

Attaching new patch fixing the release audit warning.

> Make Gridmix emulate usage of Distributed Cache files
> -
>
> Key: MAPREDUCE-2407
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2407
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/gridmix
>Affects Versions: 0.23.0
>Reporter: Ravi Gummadi
>Assignee: Ravi Gummadi
> Fix For: 0.23.0
>
> Attachments: 2407.patch, 2407.v1.patch
>
>
> Currently Gridmix emulates disk IO load only. This JIRA is to make Gridmix 
> emulate Distributed Cache load as defined by the job-trace.



[jira] [Updated] (MAPREDUCE-2407) Make Gridmix emulate usage of Distributed Cache files

2011-05-20 Thread Ravi Gummadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Gummadi updated MAPREDUCE-2407:


Status: Open  (was: Patch Available)

> Make Gridmix emulate usage of Distributed Cache files
> -
>
> Key: MAPREDUCE-2407
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2407
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/gridmix
>Affects Versions: 0.23.0
>Reporter: Ravi Gummadi
>Assignee: Ravi Gummadi
> Fix For: 0.23.0
>
> Attachments: 2407.patch, 2407.v1.patch
>
>
> Currently Gridmix emulates disk IO load only. This JIRA is to make Gridmix 
> emulate Distributed Cache load as defined by the job-trace.



[jira] [Updated] (MAPREDUCE-2407) Make Gridmix emulate usage of Distributed Cache files

2011-05-19 Thread Ravi Gummadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Gummadi updated MAPREDUCE-2407:


Fix Version/s: 0.23.0
Affects Version/s: 0.23.0
   Status: Patch Available  (was: Open)

> Make Gridmix emulate usage of Distributed Cache files
> -
>
> Key: MAPREDUCE-2407
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2407
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/gridmix
>Affects Versions: 0.23.0
>Reporter: Ravi Gummadi
>Assignee: Ravi Gummadi
> Fix For: 0.23.0
>
> Attachments: 2407.patch
>
>
> Currently Gridmix emulates disk IO load only. This JIRA is to make Gridmix 
> emulate Distributed Cache load as defined by the job-trace.



[jira] [Updated] (MAPREDUCE-2407) Make Gridmix emulate usage of Distributed Cache files

2011-05-12 Thread Ravi Gummadi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Gummadi updated MAPREDUCE-2407:


Attachment: 2407.patch

Attaching patch that adds emulation of distributed cache load in gridmix 
simulated jobs.

At a high level, this patch does the following:

(1) A new gridmix configuration property, 
"gridmix.distributed-cache-emulation.enable", is added; its default value is 
true. Setting it to false disables emulation of distributed cache load. 
Irrespective of this property's setting, with the -generate option, 
distributed cache files are generated on HDFS by gridmix.
Distributed cache emulation is disabled when '-' is given as the input trace 
(i.e. stdin stream instead of a file).
Distributed Cache Emulation is disabled for the case where  is on local 
file system.
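
The conditions in (1) can be sketched roughly as below. This is an illustrative Python sketch, not the Java code in the patch: the function name and its parameters are hypothetical, and only the property name and its default of true come from the description above.

```python
# Property name taken from the patch description; everything else here is
# a hypothetical illustration, not Gridmix code.
EMULATION_PROPERTY = "gridmix.distributed-cache-emulation.enable"


def should_emulate_dist_cache(conf, trace_path, path_on_local_fs):
    """Return True if distributed cache load should be emulated.

    conf             -- dict of configuration properties (strings)
    trace_path       -- the input trace argument; '-' means stdin
    path_on_local_fs -- True if the relevant path is on the local file system
    """
    # The property defaults to true when it is not set.
    enabled = str(conf.get(EMULATION_PROPERTY, "true")).strip().lower() == "true"
    if not enabled:
        return False
    # A trace read from stdin disables emulation.
    if trace_path == "-":
        return False
    # A path on the local file system disables emulation.
    if path_on_local_fs:
        return False
    return True
```

Note that this check only covers emulation; per the description, -generate still produces the distributed cache files on HDFS regardless of the property's value.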

(2) The behavior of the -generate option is changed. The -generate option now 
means (a) generate input data in the directory
/input/ and (b) generate the distributed cache data needed for emulation of 
the distributed cache load of this
trace file in the directory /distributedCache/.
For (a), the existing GenerateData MR job is used.
For (b), a new MR job, GenerateDistCacheData, is added; it runs after 
GenerateData and before submission of the simulated jobs.

With the -generate option, (a) existence of the /input/ directory gives an 
error, similar to the current behavior, and
(b) existence of the /gridmixDistCache/ directory is not an error and leads 
to generation of only the missing distributed cache files under 
/gridmixDistCache/ for the specific trace file. If all the needed 
distributed cache files are already
there, then submission of the GenerateDistCacheData job is skipped.

Without -generate option, if emulation of distributed cache load is enabled, 
then gridmix checks if all the needed distributed cache files are available 
under /distributedCache/ and emits an error if any of the expected 
files are missing.
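
The with/without -generate behavior for distributed cache files can be sketched as follows. This is a hypothetical Python illustration: the function name, its arguments and the choice of exception are assumptions, not taken from the patch.

```python
def plan_dist_cache_generation(needed, existing, generate):
    """Decide which distributed cache files must be generated.

    needed   -- paths of files required by the trace
    existing -- paths already present in the distributed cache directory
    generate -- True if the -generate option was given

    Returns the sorted list of files to generate. An empty list means the
    generation job can be skipped.
    """
    missing = sorted(set(needed) - set(existing))
    if generate:
        # Only the missing files are generated; existing ones are reused.
        return missing
    if missing:
        # Without -generate, any missing file is an error.
        raise FileNotFoundError(
            "missing distributed cache files: " + ", ".join(missing))
    return []
```

For example, with needed files a and b and only a present, -generate yields ["b"], while running without -generate raises an error.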

(3) setupDistCacheEmulation : Read the trace file and build a list of 
distributed cache file paths and their file sizes. The
file paths are the mapped paths on the simulated cluster. They are mapped from 
the original cluster's paths to the simulated cluster's
paths using
{code}MD5Hash(filePath+timestamp){code} for public distributed cache files
and
{code}MD5Hash(filePath+timestamp+username){code} for private distributed cache 
files.
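
The mapping in (3) can be illustrated with a small Python sketch. It uses hashlib's MD5 as a stand-in for Hadoop's MD5Hash; the exact string concatenation, the encoding and the function name are assumptions made for this illustration.

```python
import hashlib


def mapped_name(file_path, timestamp, username=None):
    """Map an original-cluster cache file to a name on the simulated cluster.

    Public files hash (filePath + timestamp); private files also append the
    user name, so each user's private file gets its own mapped name.
    """
    key = f"{file_path}{timestamp}"
    if username is not None:
        key += username
    # Assumption: UTF-8 encoding and a hex digest as the resulting name.
    return hashlib.md5(key.encode("utf-8")).hexdigest()


# A public file maps to one name no matter who uses it.
public_name = mapped_name("/apps/lib.jar", 1305800000000)
# The same private path maps to different names for different users.
alice_name = mapped_name("/user/x/data.txt", 1305800000000, "alice")
bob_name = mapped_name("/user/x/data.txt", 1305800000000, "bob")
```

Because the user name is part of the hash input for private files, two users with the same original path and timestamp do not share a file on the simulated cluster.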

This list of mapped file paths along with the file sizes is written to a 
special file,
/distributedCache/_distCacheFiles.txt, and the file name can be 
configured using
"gridmix.distcache.file.list".

So this means all distributed cache files in the gridmix simulated jobs are 
public distributed cache files but for each private distributed cache file of a 
user of the original cluster (i.e. from trace file), there will be a different 
public distributed cache file on gridmix simulated cluster.

(4) GenerateDistCacheData : The MR job (launched by gridmix if -generate option 
is seen) that generates distributed cache data files on HDFS. Input to this job 
is the special file _distCacheFiles.txt that contains the distributed cache 
file paths and their sizes.
Each map() call generates one distributed cache file.
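
The "one file per map() call" idea in (4) can be sketched as below. This is a local-file-system Python illustration only: the real job writes to HDFS, and the function name, the zero-byte filler and the chunk size are all assumptions.

```python
import os


def generate_one_file(root, relative_path, size, chunk=64 * 1024):
    """Create one file of exactly `size` bytes under `root`.

    Stands in for a single map() call, which receives one (path, size)
    entry from the list file and writes that one file.
    """
    target = os.path.join(root, relative_path.lstrip("/"))
    os.makedirs(os.path.dirname(target), exist_ok=True)
    remaining = size
    with open(target, "wb") as out:
        # Write in bounded chunks so large files do not need large buffers.
        while remaining > 0:
            n = min(chunk, remaining)
            out.write(b"\0" * n)
            remaining -= n
    return target
```

Only the file size matters for load emulation in this sketch, so the content is arbitrary filler.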

(5) configureDistCacheFiles : The mapped distributed cache files' paths are 
set in the simulated jobs' configurations so that the MapReduce framework 
takes care of adding the actual distributed cache load, equivalent to the 
original cluster's distributed cache load.
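
Step (5) can be pictured with the Python sketch below, which models a job configuration as a plain dict. The property name "mapreduce.job.cache.files" is, to my understanding, the one Hadoop uses for the comma-separated list of cache files, but treat it and the function name as assumptions; the patch may set further properties that are not shown here.

```python
# Assumed Hadoop property holding the comma-separated cache file list.
CACHE_FILES = "mapreduce.job.cache.files"


def configure_dist_cache_files(job_conf, mapped_paths):
    """Return a copy of job_conf with the mapped cache files configured."""
    conf = dict(job_conf)
    if mapped_paths:
        # The framework localizes these files when the job's tasks run,
        # which is what produces the distributed cache load.
        conf[CACHE_FILES] = ",".join(mapped_paths)
    return conf
```

A job with no distributed cache files in the trace is left unchanged.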

> Make Gridmix emulate usage of Distributed Cache files
> -
>
> Key: MAPREDUCE-2407
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2407
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/gridmix
>Reporter: Ravi Gummadi
>Assignee: Ravi Gummadi
> Attachments: 2407.patch
>
>
> Currently Gridmix emulates disk IO load only. This JIRA is to make Gridmix 
> emulate Distributed Cache load as defined by the job-trace.
