[ https://issues.apache.org/jira/browse/MAPREDUCE-2112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918070#action_12918070 ]

Olga Natkovich commented on MAPREDUCE-2112:
-------------------------------------------

It is important that the tool support different column distributions so that
we can simulate different kinds of data with it.
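The following minimal Java sketch shows what per-column distributions could look like. It assumes that each column is described by an independent distribution and that rows are written as tab-separated text. The names ColumnDistribution, UniformColumn, ZipfColumn, RowGenerator and DataGenSketch are hypothetical and are not taken from the PigMix2 generator. The sketch relies only on java.util.Random, so it carries no GPL dependency. Spreading the work over a Hadoop cluster is not shown.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical interface (not an existing Hadoop/Pig API): one value
// distribution per column.
interface ColumnDistribution {
    long next(Random rng);
}

// Uniformly distributed integers in [min, max].
final class UniformColumn implements ColumnDistribution {
    private final int min;
    private final int max;

    UniformColumn(int min, int max) {
        if (max < min || (long) max - min + 1 > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("invalid range");
        }
        this.min = min;
        this.max = max;
    }

    @Override
    public long next(Random rng) {
        // nextInt(bound) is uniform in [0, bound), so the result is in [min, max].
        return min + rng.nextInt(max - min + 1);
    }
}

// Zipf-distributed ranks 1..n with exponent s (rank 1 is the most frequent).
final class ZipfColumn implements ColumnDistribution {
    private final double[] cdf;

    ZipfColumn(int n, double s) {
        if (n < 1) throw new IllegalArgumentException("n < 1");
        cdf = new double[n];
        double sum = 0.0;
        for (int k = 1; k <= n; k++) {
            sum += 1.0 / Math.pow(k, s); // unnormalized weight of rank k
            cdf[k - 1] = sum;
        }
        for (int i = 0; i < n; i++) {
            cdf[i] /= sum; // normalize so cdf[n - 1] == 1.0
        }
    }

    @Override
    public long next(Random rng) {
        double u = rng.nextDouble();
        // Find the first index whose cumulative probability exceeds u.
        int idx = Arrays.binarySearch(cdf, u);
        idx = (idx >= 0) ? idx + 1 : -idx - 1;
        return Math.min(idx, cdf.length - 1) + 1L; // ranks are 1-based
    }
}

// Emits tab-separated rows, one value per configured column.
final class RowGenerator {
    private final List<ColumnDistribution> columns;
    private final Random rng;

    RowGenerator(List<ColumnDistribution> columns, long seed) {
        this.columns = columns;
        this.rng = new Random(seed); // fixed seed => reproducible output
    }

    String nextRow() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < columns.size(); i++) {
            if (i > 0) sb.append('\t');
            sb.append(columns.get(i).next(rng));
        }
        return sb.toString();
    }
}

class DataGenSketch {
    public static void main(String[] args) {
        List<ColumnDistribution> schema = Arrays.asList(
                new UniformColumn(1, 1000), // e.g. a uniformly spread id
                new ZipfColumn(50, 1.1));   // e.g. a skewed category
        RowGenerator gen = new RowGenerator(schema, 42L);
        for (int i = 0; i < 5; i++) {
            System.out.println(gen.nextRow());
        }
    }
}
```

The Zipf column precomputes a cumulative table and samples it with binary search. This costs O(n) memory per column, so for columns with very many distinct values a table-free sampler would be a better fit. Fixing the seed makes a run reproducible, which helps when the same data set has to be regenerated for repeated benchmark runs.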

> Create a Common Data-Generator for Testing Hadoop
> -------------------------------------------------
>
>                 Key: MAPREDUCE-2112
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2112
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Ranjit Mathew
>            Priority: Minor
>
> It is useful to have a common data-generator for testing Hadoop and related
> projects. Such a tool should be able to generate data in a specified format
> and should be able to use a Hadoop cluster to speed up data generation. This
> tool can then be used across Hadoop (e.g. GridMix3), Pig, Hive, etc.,
> reducing the need for each project to invent something like this itself.
> We can use the data-generator used in PigMix2 (PIG-200) as a starting point.
> It is described in [http://wiki.apache.org/pig/DataGeneratorHadoop]. Since it
> depends on the SDSU Java library ([http://www.eli.sdsu.edu/java-SDSU/]),
> which is released under the GNU GPL, it has to be modified to eliminate this
> dependency before it can be included in Apache Hadoop.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
