[jira] Created: (PIG-1485) Need a means to register jars in PigLatin scripts without using the register keyword.
Need a means to register jars in PigLatin scripts without using the register keyword.
---------------------------------------------------------------------------------------

Key: PIG-1485
URL: https://issues.apache.org/jira/browse/PIG-1485
Project: Pig
Issue Type: New Feature
Reporter: Sameer M

Hi

Currently, to instruct Pig to add a jar to its classpath and also make it available to the map/reduce jobs, we have to use the register keyword with the path to a jar.

The problem with this approach is that it hardcodes a specific jar version into the Pig Latin script, so the script breaks whenever that jar is upgraded to a new version.

I can see the value of this keyword in an interactive session, i.e. when using the Grunt shell, but it feels like the wrong thing to do in PigLatin script files.

It would be great if there were an alternative method to expose a jar to Pig, such as using the classpath. This would let scripts do away with embedded registers and make them agnostic to jar file names.

The benefit of using the classpath is that there are lots of hooks to configure it from different environments, e.g. $CLASSPATH when invoked from a shell; build and testing tools like Maven and JUnit can also inject entries into the classpath, and so can frameworks like Oozie.

Thanks
Sameer

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
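As an illustration of the idea only (this is not an existing Pig feature), a wrapper could derive the register statements from the classpath instead of hardcoding them in the script. The sketch below is plain Java with no Pig dependency; the class name `ClasspathRegister`, the method `registerStatements`, and the `myudfs-` prefix are all hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Pig: builds REGISTER statements from a
// classpath string so that the script itself never names a jar version.
public class ClasspathRegister {

    // Returns one REGISTER statement per classpath entry whose file name
    // starts with the given prefix and ends with ".jar".
    static List<String> registerStatements(String classpath, String separator, String prefix) {
        List<String> statements = new ArrayList<>();
        for (String entry : classpath.split(Pattern.quote(separator))) {
            String name = new File(entry).getName();
            if (name.startsWith(prefix) && name.endsWith(".jar")) {
                statements.add("REGISTER " + entry + ";");
            }
        }
        return statements;
    }

    public static void main(String[] args) {
        // Use the classpath of the running JVM; "myudfs-" is a made-up prefix.
        String classpath = System.getProperty("java.class.path");
        for (String statement : registerStatements(classpath, File.pathSeparator, "myudfs-")) {
            System.out.println(statement);
        }
    }
}
```

The generated lines could be prepended to a script before it is submitted, which keeps the version number in the environment configuration rather than in the script.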
[jira] Commented: (PIG-1404) PigUnit - Pig script testing simplified.
[ https://issues.apache.org/jira/browse/PIG-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885094#action_12885094 ]

Sameer M commented on PIG-1404:
-------------------------------

Hi Guys

FWIW, if it's not too late, it would be great if this API could be part of the pig-core.jar itself.

Thanks
Sameer

PigUnit - Pig script testing simplified.
-----------------------------------------

Key: PIG-1404
URL: https://issues.apache.org/jira/browse/PIG-1404
Project: Pig
Issue Type: New Feature
Reporter: Romain Rigaux
Assignee: Romain Rigaux
Fix For: 0.8.0
Attachments: commons-lang-2.4.jar, PIG-1404-2.patch, PIG-1404-3-doc.patch, PIG-1404-3.patch, PIG-1404.patch

The goal is to provide a simple xUnit framework that enables our Pig scripts to be easily:
- unit tested
- regression tested
- quickly prototyped

No cluster set up is required.

For example:

TestCase
{code}
@Test
public void testTop3Queries() {
  String[] args = {
    "n=3",
  };
  test = new PigTest("top_queries.pig", args);

  String[] input = {
    "yahoo\t10",
    "twitter\t7",
    "facebook\t10",
    "yahoo\t15",
    "facebook\t5",
  };

  String[] output = {
    "(yahoo,25L)",
    "(facebook,15L)",
    "(twitter,7L)",
  };

  test.assertOutput("data", input, "queries_limit", output);
}
{code}

top_queries.pig
{code}
data = LOAD '$input' AS (query:CHARARRAY, count:INT);
...
queries_sum = FOREACH queries_group GENERATE group AS query, SUM(queries.count) AS count;
...
queries_limit = LIMIT queries_ordered $n;
STORE queries_limit INTO '$output';
{code}

There are 3 modes:
* LOCAL (if the pigunit.exectype.local property is present)
* MAPREDUCE (uses the cluster specified in the classpath, same as HADOOP_CONF_DIR)
** automatic mini cluster (the default; the HADOOP_CONF_DIR to have in the classpath will be: ~/pigtest/conf)
** pointing to an existing cluster (if the pigunit.exectype.cluster property is present)

For now, it would be nice to see how this idea could be integrated in Piggybank and if PigParser/PigServer could improve their interfaces in order to make PigUnit simple.

Other components based on PigUnit could be built later:
- standalone MiniCluster
- notion of workspaces for each test
- standalone utility that reads test configuration and generates a test report
...

It is a first prototype, open to suggestions, and can definitely benefit from feedback.

How to test, in pig_trunk:
{code}
Apply patch
$pig_trunk ant compile-test
$pig_trunk ant
$pig_trunk/contrib/piggybank/java ant test -Dtest.timeout=99
{code}
(it takes 15 min in MAPREDUCE minicluster mode; tests will need to be split in the future between 'unit' and 'integration')

Many examples are in:
{code}
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/pigunit/TestPigTest.java
{code}

When used standalone, do not forget to put commons-lang-2.4.jar and the HADOOP_CONF_DIR of your cluster in your CLASSPATH.
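
The mode selection described in the issue (LOCAL, automatic mini cluster, existing cluster) could be sketched roughly as follows. This is not PigUnit's actual code: the class and enum names are hypothetical, and checking the local property before the cluster property is an assumption, since the description does not say which one wins when both are set.

```java
import java.util.Properties;

// Hypothetical sketch of the three PigUnit modes described above.
public class PigUnitMode {

    enum Mode { LOCAL, MINI_CLUSTER, EXISTING_CLUSTER }

    // Picks a mode from the presence of the two properties named in the issue.
    static Mode select(Properties props) {
        if (props.containsKey("pigunit.exectype.local")) {
            return Mode.LOCAL;             // assumption: local takes precedence
        }
        if (props.containsKey("pigunit.exectype.cluster")) {
            return Mode.EXISTING_CLUSTER;
        }
        return Mode.MINI_CLUSTER;          // the default per the description
    }

    public static void main(String[] args) {
        // e.g. java -Dpigunit.exectype.local=true PigUnitMode
        System.out.println(select(System.getProperties()));
    }
}
```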
[jira] Created: (PIG-1481) PigServer throws exception if it cannot find hadoop-site.xml or core-site.xml
PigServer throws exception if it cannot find hadoop-site.xml or core-site.xml
------------------------------------------------------------------------------

Key: PIG-1481
URL: https://issues.apache.org/jira/browse/PIG-1481
Project: Pig
Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Sameer M

Hi

We've been using the Hadoop MiniCluster to do unit testing of our pig scripts in the following way.

    MiniCluster minicluster = MiniCluster.buildCluster(2,2);
    pigServer = new PigServer(ExecType.MAPREDUCE, minicluster.getProperties());

This has been working fine for 0.6 and 0.7. However, in the trunk (0.8) it looks like there is a change due to which an exception is thrown if neither hadoop-site.xml nor core-site.xml is found in the classpath.

org.apache.pig.backend.executionengine.ExecException: ERROR 4010: Cannot find hadoop configurations in classpath (neither hadoop-site.xml nor core-site.xml was found in the classpath).If you plan to use local mode, please put -x local option in command line
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:149)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.init(HExecutionEngine.java:114)
    at org.apache.pig.impl.PigContext.connect(PigContext.java:177)
    at org.apache.pig.PigServer.init(PigServer.java:215)
    at org.apache.pig.PigServer.init(PigServer.java:204)
    at org.apache.pig.PigServer.init(PigServer.java:200)

The problem seems to be in org.apache.pig.backend.hadoop.executionengine.HExecutionEngine, line 148:

    if( hadoop_site == null && core_site == null ) {
        throw new ExecException("Cannot find hadoop configurations in classpath (neither hadoop-site.xml nor core-site.xml was found in the classpath)." +
                "If you plan to use local mode, please put -x local option in command line", 4010);
    }

We would like to use the mapreduce mode with the minicluster, and we have a lot of unit tests that rely on that setup. Can this check be removed from this level?

Thanks
Sameer
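
One possible shape for a relaxed check, offered only as a sketch and not as the actual Pig fix: treat the configuration as present if either XML file is on the classpath or the caller has passed in properties that already identify a cluster. The class and method names are hypothetical, and using `fs.default.name` and `mapred.job.tracker` as the keys that the MiniCluster properties would carry is an assumption.

```java
import java.util.Properties;

// Hypothetical sketch, not Pig code: accept caller-supplied cluster
// properties as an alternative to hadoop-site.xml / core-site.xml.
public class HadoopConfigCheck {

    static boolean hasClusterConfig(ClassLoader loader, Properties props) {
        // Same lookup the existing check performs: is either file on the classpath?
        boolean onClasspath = loader.getResource("hadoop-site.xml") != null
                || loader.getResource("core-site.xml") != null;
        // Assumed keys that explicitly supplied properties would contain.
        boolean inProperties = props.getProperty("fs.default.name") != null
                && props.getProperty("mapred.job.tracker") != null;
        return onClasspath || inProperties;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("fs.default.name", "hdfs://localhost:9000");
        props.setProperty("mapred.job.tracker", "localhost:9001");
        ClassLoader loader = HadoopConfigCheck.class.getClassLoader();
        System.out.println(hasClusterConfig(loader, props));
    }
}
```

With a check along these lines, the exception would only be raised when there is neither a configuration file nor explicit connection properties, which would keep the MiniCluster-based tests working.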