[
https://issues.apache.org/jira/browse/PIG-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Romain Rigaux updated PIG-1404:
-------------------------------
Description:
The goal is to provide a simple xUnit framework that enables our Pig scripts to
be easily:
- unit tested
- regression tested
- quickly prototyped
No cluster setup is required.
For example:
TestCase
{code}
@Test
public void testTop3Queries() {
  String[] args = {
    "n=3",
  };
  PigTest test = new PigTest("top_queries.pig", args);

  String[] input = {
    "yahoo\t10",
    "twitter\t7",
    "facebook\t10",
    "yahoo\t15",
    "facebook\t5",
    ....
  };

  String[] output = {
    "(yahoo,25L)",
    "(facebook,15L)",
    "(twitter,7L)",
  };

  test.assertOutput("data", input, "queries_limit", output);
}
{code}
top_queries.pig
{code}
data =
    LOAD '$input'
    AS (query:CHARARRAY, count:INT);

...

queries_sum =
    FOREACH queries_group
    GENERATE
        group AS query,
        SUM(queries.count) AS count;

...

queries_limit = LIMIT queries_ordered $n;

STORE queries_limit INTO '$output';
{code}
There are 3 modes (a sketch of how a test selects one follows this list):
* LOCAL (if the "pigunit.exectype.local" property is present)
* MAPREDUCE (uses the cluster specified in the classpath, same as HADOOP_CONF_DIR)
** automatic mini cluster (the default; the HADOOP_CONF_DIR to have on the classpath will be: ~/pigtest/conf)
** an existing cluster (if the "pigunit.exectype.cluster" property is present)
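A minimal sketch of how a test could select the mode, assuming these properties are read as plain JVM system properties (only the property names come from this description; the rest is illustrative):
{code}
// Hypothetical JUnit 4 fixture: force LOCAL mode by defining the
// "pigunit.exectype.local" system property before any test runs.
// The same property could be passed to the test JVM with
// -Dpigunit.exectype.local=true instead.
@BeforeClass
public static void forceLocalMode() {
    System.setProperty("pigunit.exectype.local", "true");
    // To target an existing cluster instead, set "pigunit.exectype.cluster"
    // and put that cluster's HADOOP_CONF_DIR on the classpath.
}
{code}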
For now, it would be nice to see how this idea could be integrated into Piggybank
and whether PigParser/PigServer could improve their interfaces in order to make
PigUnit simple.
Other components based on PigUnit could be built later:
- standalone MiniCluster
- notion of workspaces for each test
- standalone utility that reads test configuration and generates a test
report...
This is a first prototype; it is open to suggestions and can definitely benefit
from feedback.
How to test, in pig_trunk:
{code}
Apply patch
$pig_trunk ant compile-test
$pig_trunk ant
$pig_trunk/contrib/piggybank/java ant test -Dtest.timeout=999999
{code}
(it takes 15 min with the MAPREDUCE mini cluster; tests will need to be split in
the future between 'unit' and 'integration')
Many examples are in:
{code}
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/pigunit/TestPigTest.java
{code}
When used standalone, do not forget to add commons-lang-2.4.jar and the
HADOOP_CONF_DIR of your cluster to your CLASSPATH (a sketch follows below).
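As a rough sketch of what a standalone run could look like (assuming JUnit 4; every jar name and path below is hypothetical, only commons-lang-2.4.jar and the cluster HADOOP_CONF_DIR come from this description; the test class name is taken from the example path above):
{code}
# Hypothetical standalone invocation; adjust jar names and paths to your installation.
export HADOOP_CONF_DIR=/path/to/your/cluster/conf
export CLASSPATH=pig.jar:piggybank.jar:commons-lang-2.4.jar:junit.jar:test-classes:$HADOOP_CONF_DIR
java -cp $CLASSPATH org.junit.runner.JUnitCore org.apache.pig.piggybank.test.pigunit.TestPigTest
{code}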
> PigUnit - Pig script testing simplified.
> -----------------------------------------
>
> Key: PIG-1404
> URL: https://issues.apache.org/jira/browse/PIG-1404
> Project: Pig
> Issue Type: New Feature
> Reporter: Romain Rigaux
> Fix For: 0.8.0
>
> Attachments: commons-lang-2.4.jar, PIG-1404.patch
>
>
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.