[jira] Created: (PIG-1485) Need a means to register jars in PigLatin scripts without using the register keyword.

2010-07-07 Thread Sameer M (JIRA)
Need a means to register jars in PigLatin scripts without using the register 
keyword.
---

 Key: PIG-1485
 URL: https://issues.apache.org/jira/browse/PIG-1485
 Project: Pig
  Issue Type: New Feature
Reporter: Sameer M


Hi

Currently, to instruct Pig to add a jar to its classpath and also make it 
available to the map/reduce jobs, we have to use the register keyword with the 
path to a jar.

The problem with this approach is that it hardcodes the Pig Latin script to a 
specific jar version, so the script is not forward compatible with upgrades of 
that jar: every version change requires editing the script.
I can see the value of this keyword in an interactive session, i.e. when using 
the Grunt shell, but it feels like the wrong thing to do in Pig Latin script 
files.

It would be great if there were an alternative way to expose a jar to Pig, such 
as the classpath. 
This would help scripts do away with the need to embed registers in them and 
make them agnostic to jar file names.

The benefit of using the classpath is that there are many hooks for configuring 
it from different environments: $CLASSPATH when Pig is invoked from the shell, 
testing frameworks such as Maven and JUnit, which can inject entries into the 
classpath, and workflow frameworks such as Oozie.
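To make the classpath idea concrete, here is a minimal sketch in plain Java of resolving a jar from the classpath by artifact name rather than by a hard-coded, versioned file name. This is not an existing Pig API: the `findJars` helper and the `<artifact>-<version>.jar` naming convention it matches are assumptions for illustration only.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class ClasspathJarResolver {
    // Hypothetical helper: returns the classpath entries whose file name is
    // "<artifact>.jar" or "<artifact>-<version>.jar", so a caller can ask for
    // "myudfs" without knowing which version is installed.
    static List<String> findJars(String artifact, String classpath) {
        Pattern jarName = Pattern.compile(
                Pattern.quote(artifact) + "(-[0-9][\\w.-]*)?\\.jar");
        List<String> matches = new ArrayList<>();
        for (String entry : classpath.split(Pattern.quote(File.pathSeparator))) {
            String fileName = new File(entry).getName();
            if (jarName.matcher(fileName).matches()) {
                matches.add(entry);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Inspect the classpath of the running JVM (e.g. set via $CLASSPATH).
        String classpath = System.getProperty("java.class.path");
        System.out.println(findJars("myudfs", classpath));
    }
}
```

Under this hypothetical scheme, upgrading myudfs-1.2.jar to myudfs-1.3.jar would change only the classpath, not the script.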

Thanks
Sameer

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1404) PigUnit - Pig script testing simplified.

2010-07-04 Thread Sameer M (JIRA)

[ https://issues.apache.org/jira/browse/PIG-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885094#action_12885094 ]

Sameer M commented on PIG-1404:
---

Hi Guys

FWIW, if it's not too late, it would be great if this API could be part of 
pig-core.jar itself.

Thanks
Sameer

 PigUnit - Pig script testing simplified. 
 -

 Key: PIG-1404
 URL: https://issues.apache.org/jira/browse/PIG-1404
 Project: Pig
  Issue Type: New Feature
Reporter: Romain Rigaux
Assignee: Romain Rigaux
 Fix For: 0.8.0

 Attachments: commons-lang-2.4.jar, PIG-1404-2.patch, 
 PIG-1404-3-doc.patch, PIG-1404-3.patch, PIG-1404.patch


 The goal is to provide a simple xUnit framework that enables our Pig scripts 
 to be easily:
   - unit tested
   - regression tested
   - quickly prototyped
 No cluster set up is required.
 For example:
 TestCase
 {code}
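   @Test
   public void testTop3Queries() throws Exception {
     String[] args = {
         "n=3",
     };
     PigTest test = new PigTest("top_queries.pig", args);
     String[] input = {
         "yahoo\t10",
         "twitter\t7",
         "facebook\t10",
         "yahoo\t15",
         "facebook\t5",
     };
     String[] output = {
         "(yahoo,25L)",
         "(facebook,15L)",
         "(twitter,7L)",
     };
     // Replace alias "data" with the rows above and compare the tuples
     // produced by alias "queries_limit" with the expected output.
     test.assertOutput("data", input, "queries_limit", output);
   }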
 {code}
 top_queries.pig
 {code}
 data =
 LOAD '$input'
 AS (query:CHARARRAY, count:INT);
  
 ... 
 
 queries_sum = 
 FOREACH queries_group 
 GENERATE 
 group AS query, 
 SUM(queries.count) AS count;
 
 ...
 
 queries_limit = LIMIT queries_ordered $n;
 STORE queries_limit INTO '$output';
 {code}
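 The expected tuples in the test case follow from simple arithmetic on the input rows: yahoo 10 + 15 = 25, facebook 10 + 5 = 15, twitter 7. The sketch below reproduces that computation in plain Java, assuming the elided part of top_queries.pig groups by query and orders by the summed count in descending order before the LIMIT. It is not PigUnit code; `topQueries` is a hypothetical helper that renders each result in the same form as the test's expected strings.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class TopQueriesSketch {
    // Sums the count per query, sorts by total (descending), keeps the first n,
    // and formats each entry like the expected strings, e.g. "(yahoo,25L)".
    static List<String> topQueries(String[] rows, int n) {
        Map<String, Long> sums = new HashMap<>();
        for (String row : rows) {
            String[] fields = row.split("\t"); // query<TAB>count
            sums.merge(fields[0], Long.parseLong(fields[1]), Long::sum);
        }
        return sums.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(n)
                .map(e -> "(" + e.getKey() + "," + e.getValue() + "L)")
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String[] input = {"yahoo\t10", "twitter\t7", "facebook\t10", "yahoo\t15", "facebook\t5"};
        // prints [(yahoo,25L), (facebook,15L), (twitter,7L)]
        System.out.println(topQueries(input, 3));
    }
}
```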
 There are 3 modes:
 * LOCAL (if the pigunit.exectype.local property is present)
 * MAPREDUCE (uses the cluster specified in the classpath, same as 
 HADOOP_CONF_DIR)
 ** automatic mini cluster (the default; the HADOOP_CONF_DIR to put on the 
 classpath is ~/pigtest/conf)
 ** pointing to an existing cluster (if the pigunit.exectype.cluster property 
 is present)
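 The mode selection described above could be expressed roughly as follows. This is a minimal sketch, not PigUnit's actual code: the two property names come from the description, while the `Mode` enum, the `selectMode` helper, and the precedence given to the local property when both are set are assumptions.

```java
import java.util.Properties;

class PigUnitModeSketch {
    enum Mode { LOCAL, MAPREDUCE_MINI_CLUSTER, MAPREDUCE_EXISTING_CLUSTER }

    // Assumed precedence: the local property wins, then the cluster property;
    // with neither present, the automatic mini cluster is the default.
    static Mode selectMode(Properties props) {
        if (props.containsKey("pigunit.exectype.local")) {
            return Mode.LOCAL;
        }
        if (props.containsKey("pigunit.exectype.cluster")) {
            return Mode.MAPREDUCE_EXISTING_CLUSTER;
        }
        return Mode.MAPREDUCE_MINI_CLUSTER;
    }

    public static void main(String[] args) {
        // e.g. java -Dpigunit.exectype.local=true PigUnitModeSketch
        System.out.println(selectMode(System.getProperties()));
    }
}
```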
 For now, it would be nice to see how this idea could be integrated into 
 Piggybank and whether PigParser/PigServer could improve their interfaces to 
 make PigUnit simpler.
 Other components based on PigUnit could be built later:
   - standalone MiniCluster
   - notion of workspaces for each test
   - standalone utility that reads test configuration and generates a test 
 report...
 It is a first prototype, open to suggestions, and can definitely benefit from 
 feedback.
 How to test, in pig_trunk:
 {code}
 Apply patch
 $pig_trunk ant compile-test
 $pig_trunk ant
 $pig_trunk/contrib/piggybank/java ant test -Dtest.timeout=99
 {code}
 (it takes 15 minutes in the MAPREDUCE mini cluster; tests will need to be split 
 in the future into 'unit' and 'integration')
 Many examples are in:
 {code}
 contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/pigunit/TestPigTest.java
 {code}
 When used standalone, do not forget to put commons-lang-2.4.jar and your 
 cluster's HADOOP_CONF_DIR on your CLASSPATH.




[jira] Created: (PIG-1481) PigServer throws exception if it cannot find hadoop-site.xml or core-site.xml

2010-07-02 Thread Sameer M (JIRA)
PigServer throws exception if it cannot find hadoop-site.xml or core-site.xml
-

 Key: PIG-1481
 URL: https://issues.apache.org/jira/browse/PIG-1481
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Sameer M


Hi

We've been using the Hadoop MiniCluster to do unit testing of our pig scripts 
in the following way.

MiniCluster minicluster = MiniCluster.buildCluster(2,2);
pigServer = new  PigServer(ExecType.MAPREDUCE, minicluster.getProperties());

This has been working fine for 0.6 and 0.7. 

However, in trunk (0.8) it looks like a change causes an exception to be thrown 
if neither hadoop-site.xml nor core-site.xml is found on the classpath.

org.apache.pig.backend.executionengine.ExecException: ERROR 4010: Cannot find hadoop configurations in classpath (neither hadoop-site.xml nor core-site.xml was found in the classpath).If you plan to use local mode, please put -x local option in command line
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:149)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:114)
    at org.apache.pig.impl.PigContext.connect(PigContext.java:177)
    at org.apache.pig.PigServer.init(PigServer.java:215)
    at org.apache.pig.PigServer.init(PigServer.java:204)
    at org.apache.pig.PigServer.init(PigServer.java:200)


The problem seems to be in 
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine, line 148:

if( hadoop_site == null && core_site == null ) {
    throw new ExecException("Cannot find hadoop configurations in classpath (neither hadoop-site.xml nor core-site.xml was found in the classpath)." +
            "If you plan to use local mode, please put -x local option in command line",
            4010);
}
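For reference, the failing condition can be reproduced outside Pig. The sketch below assumes, as the error message suggests, that the check is a classpath resource lookup; `hadoopConfigOnClasspath` is a hypothetical helper, not Pig's code. It returns false in exactly the case where ERROR 4010 is thrown: neither hadoop-site.xml nor core-site.xml is visible to the given class loader.

```java
import java.net.URL;

class HadoopConfigCheckSketch {
    // Mirrors the quoted condition: the error fires only when *neither*
    // configuration file can be found as a classpath resource.
    static boolean hadoopConfigOnClasspath(ClassLoader loader) {
        URL hadoopSite = loader.getResource("hadoop-site.xml");
        URL coreSite = loader.getResource("core-site.xml");
        return hadoopSite != null || coreSite != null;
    }

    public static void main(String[] args) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (hadoopConfigOnClasspath(loader)) {
            System.out.println("Hadoop configuration found on the classpath");
        } else {
            System.out.println("ERROR 4010 would be raised: no hadoop-site.xml or core-site.xml on the classpath");
        }
    }
}
```

In the setup described in this report, the cluster configuration is passed in through minicluster.getProperties() rather than found as an XML file on the classpath, which is why this lookup fails even though a usable cluster exists.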

We would like to keep using MAPREDUCE mode with the MiniCluster, since we have a 
lot of unit tests built on that setup.

Can this check be removed from this level?

Thanks
Sameer
