[ 
https://issues.apache.org/jira/browse/PIG-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12865340#action_12865340
 ] 

Alan Gates commented on PIG-1404:
---------------------------------

This looks really cool.  All the examples of how to use it are very nice.  I 
have a few questions:

# It looks like commons.lang.StringUtils can be pulled from maven, so we'll 
want to add that to the ivy files.
# I don't understand what the purpose of the re-implementation of GruntParser 
is.  Could you explain that a bit?
# (This one is for other pig developers) Is Piggybank the right place for this 
or should we put it under test?  I think this will be really useful for Pig 
users in setting up automated tests of their Pig Latin scripts.  Should we 
support it outright rather than put it in piggybank and risk having it go 
unmaintained?

> PigUnit - Pig script testing simplified. 
> -----------------------------------------
>
>                 Key: PIG-1404
>                 URL: https://issues.apache.org/jira/browse/PIG-1404
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Romain Rigaux
>             Fix For: 0.8.0
>
>         Attachments: commons-lang-2.4.jar, PIG-1404.patch
>
>
> The goal is to provide a simple xUnit framework that enables our Pig scripts 
> to be easily:
>   - unit tested
>   - regression tested
>   - quickly prototyped
> No cluster set up is required.
> For example:
> TestCase
> {code}
>   @Test
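>   public void testTop3Queries() throws Exception {
>     // Parameters substituted into the script (here $n).
>     String[] args = {
>         "n=3",
>         };
>     // 'test' is assumed to be a PigTest field declared on the test class.
>     test = new PigTest("top_queries.pig", args);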
>     String[] input = {
>         "yahoo\t10",
>         "twitter\t7",
>         "facebook\t10",
>         "yahoo\t15",
>         "facebook\t5",
>         ....
>     };
>     String[] output = {
>         "(yahoo,25L)",
>         "(facebook,15L)",
>         "(twitter,7L)",
>     };
>     test.assertOutput("data", input, "queries_limit", output);
>   }
> {code}
> top_queries.pig
> {code}
> data =
>     LOAD '$input'
>     AS (query:CHARARRAY, count:INT);
>      
>     ... 
>     
> queries_sum = 
>     FOREACH queries_group 
>     GENERATE 
>         group AS query, 
>         SUM(queries.count) AS count;
>         
>     ...
>             
> queries_limit = LIMIT queries_ordered $n;
> STORE queries_limit INTO '$output';
> {code}
> There are 3 modes:
> * LOCAL (if the "pigunit.exectype.local" property is present)
> * MAPREDUCE (uses the cluster specified in the classpath, same as 
> HADOOP_CONF_DIR)
> ** automatic mini cluster (the default; the HADOOP_CONF_DIR to put on 
> the classpath is then ~/pigtest/conf)
> ** pointing to an existing cluster (if the "pigunit.exectype.cluster" property 
> is present)
> For now, it would be nice to see how this idea could be integrated into 
> Piggybank and whether PigParser/PigServer could improve their interfaces to 
> make PigUnit simpler.
> Other components based on PigUnit could be built later:
>   - standalone MiniCluster
>   - notion of workspaces for each test
>   - standalone utility that reads test configuration and generates a test 
> report...
> It is a first prototype, open to suggestions, and would definitely benefit 
> from feedback.
> How to test, in pig_trunk:
> {code}
> Apply patch
> $pig_trunk ant compile-test
> $pig_trunk ant
> $pig_trunk/contrib/piggybank/java ant test -Dtest.timeout=999999
> {code}
> (it takes 15 min in MAPREDUCE mini cluster mode; tests will need to be split 
> between 'unit' and 'integration' in the future)
> Many examples are in:
> {code}
> contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/pigunit/TestPigTest.java
> {code}
> When used standalone, do not forget to put commons-lang-2.4.jar and the 
> HADOOP_CONF_DIR of your cluster on your CLASSPATH.
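
The three execution modes listed in the description could be selected with a simple property lookup. The sketch below is only an illustration under stated assumptions, not the code in PIG-1404.patch: the class name, the enum, and the rule that the local property wins when both properties are set are all hypothetical. Only the two property names and the mini cluster default come from the description.

```java
import java.util.Properties;

public class ExecModeSketch {

    /** Hypothetical names for the three modes described in the issue. */
    public enum Mode { LOCAL, MINI_CLUSTER, EXISTING_CLUSTER }

    /**
     * Picks a mode from the presence of the two properties named in the
     * description. Assumption: "pigunit.exectype.local" takes precedence
     * if both properties are present; the description does not say.
     */
    public static Mode select(Properties props) {
        if (props.getProperty("pigunit.exectype.local") != null) {
            return Mode.LOCAL;
        }
        if (props.getProperty("pigunit.exectype.cluster") != null) {
            return Mode.EXISTING_CLUSTER;
        }
        // Neither property set: fall back to the automatic mini cluster,
        // which the description states is the default.
        return Mode.MINI_CLUSTER;
    }

    public static void main(String[] args) {
        // Reads JVM system properties, e.g. java -Dpigunit.exectype.local=true ...
        System.out.println(select(System.getProperties()));
    }
}
```

Under this sketch, running the tests with -Dpigunit.exectype.local=true would select LOCAL and avoid starting a mini cluster.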

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
