[ https://issues.apache.org/jira/browse/PIG-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Romain Rigaux updated PIG-1404:
-------------------------------

    Description:
The goal is to provide a simple xUnit framework that enables our Pig scripts to be easily:
  - unit tested
  - regression tested
  - quickly prototyped

For example:

TestCase
{code}
  @Test
  public void testTop3Queries() {
    String[] args = {
        "n=3",
        };
    test = new PigTest("top_queries.pig", args);

    String[] input = {
        "yahoo\t10",
        "twitter\t7",
        "facebook\t10",
        "yahoo\t15",
        "facebook\t5",
        ....
    };

    String[] output = {
        "(yahoo,25L)",
        "(facebook,15L)",
        "(twitter,7L)",
    };

    test.assertOutput("data", input, "queries_limit", output);
  }
{code}

top_queries.pig
{code}
data =
    LOAD '$input'
    AS (query:CHARARRAY, count:INT);

    ...

queries_sum =
    FOREACH queries_group
    GENERATE
        group AS query,
        SUM(queries.count) AS count;

    ...

queries_limit = LIMIT queries_ordered $n;
STORE queries_limit INTO '$output';
{code}

There are 3 modes:
* LOCAL (if the "pigunit.exectype.local" property is present)
* MAPREDUCE (uses the cluster specified in the classpath, same as HADOOP_CONF_DIR)
** automatic mini cluster (default)
** pointing to an existing cluster (if the "pigunit.exectype.cluster" property is present)

For now, it would be nice to see how this idea could be integrated in Piggybank and if PigParser/PigServer could improve their interfaces in order to make PigUnit simple.

Other components based on PigUnit could be built later:
  - standalone MiniCluster
  - notion of workspaces for each test
  - standalone utility that reads test configuration and generates a test report...

It is a first prototype, open to suggestions, and can definitely benefit from feedback.

How to test, in pig_trunk:
{code}
Apply patch
$pig_trunk ant compile-test
$pig_trunk ant
If you use the MiniCluster, the HADOOP_CONF_DIR to put on the classpath is: ~/pigtest/conf.
$pig_trunk/contrib/piggybank/java ant test -Dtest.timeout=999999
{code}
(it takes 15 min in MAPREDUCE minicluster, tests will need to be split in the future between 'unit' and 'integration')

Many examples are in:
{code}
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/pigunit/TestPigTest.java
{code}

  was:
The goal is to provide a simple xUnit framework that enables our Pig scripts to be easily:
  - unit tested
  - regression tested
  - quickly prototyped

For example:

TestCase
{code}
  @Test
  public void testTop3Queries() {
    String[] args = {
        "n=3",
        };
    test = new PigTest("top_queries.pig", args);

    String[] input = {
        "yahoo\t10",
        "twitter\t7",
        "facebook\t10",
        "yahoo\t15",
        "facebook\t5",
        ....
    };

    String[] output = {
        "(yahoo,25L)",
        "(facebook,15L)",
        "(twitter,7L)",
    };

    test.assertOutput("data", input, "queries_limit", output);
  }
{code}

top_queries.pig
{code}
data =
    LOAD '$input'
    AS (query:CHARARRAY, count:INT);

    ...

queries_sum =
    FOREACH queries_group
    GENERATE
        group AS query,
        SUM(queries.count) AS count;

    ...

queries_limit = LIMIT queries_ordered $n;
STORE queries_limit INTO '$output';
{code}

There are 3 modes:
* LOCAL (if the "pigunit.exectype.local" property is present)
* MAPREDUCE (uses the cluster specified in the classpath, same as HADOOP_CONF_DIR)
** automatic mini cluster (default)
** pointing to an existing cluster (if the "pigunit.exectype.cluster" property is present)

For now, it would be nice to see how this idea could be integrated in Piggybank and if PigParser/PigServer could improve their interfaces in order to make PigUnit simple.

Other components based on PigUnit could be built later:
  - standalone MiniCluster
  - notion of workspaces for each test
  - standalone utility that reads test configuration and generates a test report...

It is a first prototype, open to suggestions, and can definitely benefit from feedback.

How to test, in pig_trunk:
{code}
Apply patch
$pig_trunk ant
$pig_trunk/contrib/piggybank/java ant test -Dtest.timeout=999999
{code}
(it takes 15 min in MAPREDUCE minicluster, tests will need to be split in the future between 'unit' and 'integration')

Many examples are in:
{code}
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/pigunit/TestPigTest.java
{code}


> PigUnit - Pig script testing simplified.
> -----------------------------------------
>
>                 Key: PIG-1404
>                 URL: https://issues.apache.org/jira/browse/PIG-1404
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Romain Rigaux
>             Fix For: 0.8.0
>
>         Attachments: PIG-1404.patch
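
As a rough illustration of the three execution modes listed in the description, the self-contained Java sketch below resolves a mode from the two property names it mentions ("pigunit.exectype.local" and "pigunit.exectype.cluster"). It is not the code from the patch: the class name PigUnitModeSketch, the Mode enum and the resolve method are hypothetical, "present" is taken to mean "has a non-null value", and letting the local property win when both are set is an assumption.

```java
import java.util.Properties;

// Hypothetical sketch, not taken from PIG-1404.patch.
public class PigUnitModeSketch {

    // The three modes named in the issue description.
    enum Mode { LOCAL, MAPREDUCE_MINI_CLUSTER, MAPREDUCE_EXISTING_CLUSTER }

    // Resolve the mode from a set of properties.
    // Assumption: "pigunit.exectype.local" takes precedence if both are set.
    static Mode resolve(Properties props) {
        if (props.getProperty("pigunit.exectype.local") != null) {
            return Mode.LOCAL;
        }
        if (props.getProperty("pigunit.exectype.cluster") != null) {
            return Mode.MAPREDUCE_EXISTING_CLUSTER;
        }
        // Neither property present: automatic mini cluster (the stated default).
        return Mode.MAPREDUCE_MINI_CLUSTER;
    }

    public static void main(String[] args) {
        // Reads JVM system properties, e.g. set with -Dpigunit.exectype.local=true
        System.out.println(resolve(System.getProperties()));
    }
}
```

Under these assumptions, running the class with -Dpigunit.exectype.local=true prints LOCAL, and running it with neither property set falls back to the mini cluster default.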