Okay, no attachments. Try this gist: http://gist.github.com/484135

On Jul 21, 2010, at 12:02 AM, Corbin Hoenes wrote:

> Trying to attach the PigRunner class in case that helps give you a start
> using register script.
>
> On Jul 20, 2010, at 11:56 PM, Corbin Hoenes wrote:
>
>> Hey Todd, we run against entire pig scripts with some helper classes we
>> built. Basically they preprocess the variables, then call register script,
>> but the test looks like this:
>>
>> @Before
>> public void setUp() throws Exception {
>>     Helper.delete(OUT_FILE);
>>     runner = new PigRunner();
>> }
>>
>>
>> @Test
>> public void testRecordCount() throws Exception {
>>     runner.execute("myscript.pig", "param1=foo","param2=bar");
>>
>>     Iterator<Tuple> tuples = runner.getPigServer().openIterator("foo");
>>     assertEquals(41L, Helper.countTuples(tuples));
>> }
>>
>> It's been very useful for us to test this way. Would love to see more
>> chatter about other techniques.
>>
>> On Jul 20, 2010, at 3:26 PM, ToddG wrote:
>>
>>> I'd like to include running various PIG scripts in my continuous build
>>> system. Of course, I'll only use small datasets for this, and in the
>>> beginning, I'll only target a local machine instance. However, this brings
>>> up several questions:
>>>
>>> Q: What's the best way to run PIG from java? Here's what I'm doing,
>>> following a pattern I found in some of the pig tests:
>>>
>>> 1. Create Pig resources in a base class (shamelessly copied from
>>> PigExecTestCase):
>>>
>>> protected MiniCluster cluster;
>>> protected PigServer pigServer;
>>>
>>> @Before
>>> public void setUp() throws Exception {
>>>
>>>     String execTypeString = System.getProperty("test.exectype");
>>>     if(execTypeString!=null && execTypeString.length()>0){
>>>         execType = PigServer.parseExecType(execTypeString);
>>>     }
>>>     if(execType == MAPREDUCE) {
>>>         cluster = MiniCluster.buildCluster();
>>>         pigServer = new PigServer(MAPREDUCE, cluster.getProperties());
>>>     } else {
>>>         pigServer = new PigServer(LOCAL);
>>>     }
>>> }
>>>
>>> 2. Test classes subclass this to get access to the MiniCluster and
>>> PigServer (copied from TestPigSplit):
>>>
>>> @Test
>>> public void notestLongEvalSpec() throws Exception{
>>>     inputFileName = "notestLongEvalSpec-input.txt";
>>>     createInput(new String[] {"0\ta"});
>>>
>>>     pigServer.registerQuery("a = load '" + inputFileName + "';");
>>>     for (int i=0; i< 500; i++){
>>>         pigServer.registerQuery("a = filter a by $0 == '1';");
>>>     }
>>>     Iterator<Tuple> iter = pigServer.openIterator("a");
>>>     while (iter.hasNext()){
>>>         throw new Exception();
>>>     }
>>> }
>>>
>>> 3. ERROR
>>>
>>> This pattern works for simple PIG directives, but when I load up entire
>>> pig scripts, which have REGISTER and DEFINE directives,
>>> pigServer.registerQuery() fails with:
>>>
>>> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error
>>> during parsing. Unrecognized alias REGISTER
>>> at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1170)
>>> at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114)
>>> at org.apache.pig.PigServer.registerQuery(PigServer.java:425)
>>> at org.apache.pig.PigServer.registerQuery(PigServer.java:441)
>>> at com.audiencescience.apollo.reporting.NetworkRevenueReportTest.shouldParseNetworkRevenueReportScript(NetworkRevenueReportTest.java:74)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>
>>> Any suggestions?
>>>
>>> -Todd
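
Corbin's message above says the helper classes "preprocess the variables" before calling register script. The thread does not show that code, so what follows is only a rough sketch of what such a step might look like, in plain Java with no Pig dependency. The class name ParamSketch and the methods parseParams and substitute are hypothetical; they are not taken from the gist or from Pig itself. It assumes parameters arrive as "name=value" strings (as in the runner.execute call) and that the script refers to them as $name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Pig or of the PigRunner gist.
class ParamSketch {

    // Matches $name where name starts with a letter or underscore,
    // so positional references such as $0 are left alone.
    private static final Pattern PARAM =
            Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)");

    // Turns "param1=foo", "param2=bar" into an ordered name -> value map.
    static Map<String, String> parseParams(String... pairs) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : pairs) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException(
                        "Expected name=value, got: " + pair);
            }
            params.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return params;
    }

    // Replaces each $name that has a value in params.
    // Assumption of this sketch: unknown names are left untouched.
    static String substitute(String script, Map<String, String> params) {
        Matcher m = PARAM.matcher(script);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = params.get(m.group(1));
            String replacement = (value != null) ? value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With something like this, a runner could read the script file into a string, call substitute on it with the map from parseParams, and then hand the result to whatever registration call it uses. Leaving $0 alone matters because the filter statements in Todd's test use positional references.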