[ https://issues.apache.org/jira/browse/MAPREDUCE-2112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918070#action_12918070 ]
Olga Natkovich commented on MAPREDUCE-2112:
-------------------------------------------

It is important that the tool supports different column distributions so that we can simulate different kinds of data with it.

> Create a Common Data-Generator for Testing Hadoop
> -------------------------------------------------
>
>                 Key: MAPREDUCE-2112
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2112
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Ranjit Mathew
>            Priority: Minor
>
> It is useful to have a common data-generator for testing Hadoop and related projects. Such a tool
> should be able to generate data in a specified format and should be able to use a Hadoop cluster
> for speeding up the data-generation. This tool can then be used across Hadoop (e.g. GridMix3),
> Pig, Hive, etc., reducing the need for each project to invent something like this itself.
> We can use the data-generator used in PigMix2 (PIG-200) as a starting point. It is described
> in [http://wiki.apache.org/pig/DataGeneratorHadoop]. Since it depends on the SDSU
> Java library ([http://www.eli.sdsu.edu/java-SDSU/]) released under the GNU GPL, it has to be
> modified a bit to eliminate this dependency before it can be included in Apache Hadoop.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.