[
https://issues.apache.org/jira/browse/PIG-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Romain Rigaux updated PIG-1404:
-------------------------------
Description:
The goal is to provide a simple xUnit framework that enables our Pig scripts to
be easily:
- unit tested
- regression tested
- quickly prototyped
No cluster setup is required.
For example:
TestCase
{code}
@Test
public void testTop3Queries() {
  String[] args = {
    "n=3",
  };
  PigTest test = new PigTest("top_queries.pig", args);

  String[] input = {
    "yahoo\t10",
    "twitter\t7",
    "facebook\t10",
    "yahoo\t15",
    "facebook\t5",
    ....
  };

  String[] output = {
    "(yahoo,25L)",
    "(facebook,15L)",
    "(twitter,7L)",
  };

  test.assertOutput("data", input, "queries_limit", output);
}
{code}
top_queries.pig
{code}
data =
    LOAD '$input'
    AS (query:CHARARRAY, count:INT);

...

queries_sum =
    FOREACH queries_group
    GENERATE
        group AS query,
        SUM(queries.count) AS count;

...

queries_limit = LIMIT queries_ordered $n;

STORE queries_limit INTO '$output';
{code}
There are 3 modes (a sketch of how a test selects one follows this list):
* LOCAL (if the "pigunit.exectype.local" property is present)
* MAPREDUCE (uses the cluster specified in the classpath, same as HADOOP_CONF_DIR)
** automatic mini cluster (the default; the HADOOP_CONF_DIR to have on the classpath will be: ~/pigtest/conf)
** an existing cluster (if the "pigunit.exectype.cluster" property is present)
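A minimal sketch of how a test could select the mode, assuming these properties are read as plain JVM system properties (only the property names come from this description; the rest is illustrative):
{code}
// Hypothetical JUnit 4 fixture: force LOCAL mode by defining the
// "pigunit.exectype.local" system property before any test runs.
// The same property could be passed to the test JVM with
// -Dpigunit.exectype.local=true instead.
@BeforeClass
public static void forceLocalMode() {
    System.setProperty("pigunit.exectype.local", "true");
    // To target an existing cluster instead, set "pigunit.exectype.cluster"
    // and put that cluster's HADOOP_CONF_DIR on the classpath.
}
{code}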
For now, it would be nice to see how this idea could be integrated into Piggybank
and whether PigParser/PigServer could improve their interfaces in order to make
PigUnit simple.
Other components based on PigUnit could be built later:
- standalone MiniCluster
- notion of workspaces for each test
- standalone utility that reads test configuration and generates a test
report...
This is a first prototype; it is open to suggestions and can definitely benefit
from feedback.
How to test, in pig_trunk:
{code}
Apply patch
$pig_trunk ant compile-test
$pig_trunk ant
$pig_trunk/contrib/piggybank/java ant test -Dtest.timeout=999999
{code}
(it takes 15 min with the MAPREDUCE mini cluster; tests will need to be split in
the future between 'unit' and 'integration')
Many examples are in:
{code}
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/pigunit/TestPigTest.java
{code}
When used standalone, do not forget to add commons-lang-2.4.jar and the
HADOOP_CONF_DIR of your cluster to your CLASSPATH (a sketch follows below).
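As a rough sketch of what a standalone run could look like (assuming JUnit 4; every jar name and path below is hypothetical, only commons-lang-2.4.jar and the cluster HADOOP_CONF_DIR come from this description; the test class name is taken from the example path above):
{code}
# Hypothetical standalone invocation; adjust jar names and paths to your installation.
export HADOOP_CONF_DIR=/path/to/your/cluster/conf
export CLASSPATH=pig.jar:piggybank.jar:commons-lang-2.4.jar:junit.jar:test-classes:$HADOOP_CONF_DIR
java -cp $CLASSPATH org.junit.runner.JUnitCore org.apache.pig.piggybank.test.pigunit.TestPigTest
{code}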
> PigUnit - Pig script testing simplified.
> -----------------------------------------
>
> Key: PIG-1404
> URL: https://issues.apache.org/jira/browse/PIG-1404
> Project: Pig
> Issue Type: New Feature
> Reporter: Romain Rigaux
> Fix For: 0.8.0
>
> Attachments: commons-lang-2.4.jar, PIG-1404.patch
>
>
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.