Okay, no attachments. Try this gist:

http://gist.github.com/484135

On Jul 21, 2010, at 12:02 AM, Corbin Hoenes wrote:

> Trying to attach the PigRunner class in case that helps give you a start 
> using register script.
> 
> 
> 
> On Jul 20, 2010, at 11:56 PM, Corbin Hoenes wrote:
> 
>> Hey Todd, we run entire Pig scripts using some helper classes we built. 
>> Basically, they preprocess the variables and then call register script. 
>> The test looks like this:
>> 
>>   @Before
>>   public void setUp() throws Exception {
>>       Helper.delete(OUT_FILE);
>>       runner = new PigRunner();
>>   }
>> 
>> 
>>   @Test
>>   public void testRecordCount() throws Exception {
>>      runner.execute("myscript.pig", "param1=foo","param2=bar");
>> 
>>      Iterator<Tuple> tuples = runner.getPigServer().openIterator("foo");
>>      assertEquals(41L, Helper.countTuples(tuples));
>>   }
>> 
>> It's been very useful for us to test this way.  Would love to see more 
>> chatter about other techniques.
>> 
>> On Jul 20, 2010, at 3:26 PM, ToddG wrote:
>> 
>> 
>>> I'd like to include running various PIG scripts in my continuous build 
>>> system. Of course, I'll only use small datasets for this, and in the 
>>> beginning, I'll only target a local machine instance. However, this brings 
>>> up several questions:
>>> 
>>> 
>>> Q: What's the best way to run PIG from Java? Here's what I'm doing, 
>>> following a pattern I found in some of the pig tests:
>>> 
>>> 1. Create Pig resources in a base class (shamelessly copied from 
>>> PigExecTestCase):
>>> 
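>>>  protected MiniCluster cluster;
>>>  protected PigServer pigServer;
>>>  // Assumed declaration (missing from the snippet); LOCAL and MAPREDUCE
>>>  // are taken to be static imports from org.apache.pig.ExecType.
>>>  protected ExecType execType = LOCAL;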
>>> 
>>>  @Before
>>>  public void setUp() throws Exception {
>>> 
>>>      String execTypeString = System.getProperty("test.exectype");
>>>      if(execTypeString!=null && execTypeString.length()>0){
>>>          execType = PigServer.parseExecType(execTypeString);
>>>      }
>>>      if(execType == MAPREDUCE) {
>>>          cluster = MiniCluster.buildCluster();
>>>          pigServer = new PigServer(MAPREDUCE, cluster.getProperties());
>>>      } else {
>>>          pigServer = new PigServer(LOCAL);
>>>      }
>>>  }
>>> 
>>> 2. Test classes subclass this to get access to the MiniCluster and 
>>> PigServer (copied from TestPigSplit):
>>> 
>>>  @Test
>>>  public void notestLongEvalSpec() throws Exception{
>>>      inputFileName = "notestLongEvalSpec-input.txt";
>>>      createInput(new String[] {"0\ta"});
>>> 
>>>      pigServer.registerQuery("a = load '" + inputFileName + "';");
>>>      for (int i=0; i< 500; i++){
>>>          pigServer.registerQuery("a = filter a by $0 == '1';");
>>>      }
>>>      Iterator<Tuple> iter = pigServer.openIterator("a");
>>>      while (iter.hasNext()){
>>>          throw new Exception();
>>>      }
>>>  }
>>> 
>>> 3. ERROR
>>> 
>>> This pattern works for simple PIG directives, but I want to load entire 
>>> pig scripts that contain REGISTER and DEFINE directives. When I do, 
>>> pigServer.registerQuery() fails with:
>>> 
>>> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Unrecognized alias REGISTER
>>>  at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1170)
>>>  at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114)
>>>  at org.apache.pig.PigServer.registerQuery(PigServer.java:425)
>>>  at org.apache.pig.PigServer.registerQuery(PigServer.java:441)
>>>  at com.audiencescience.apollo.reporting.NetworkRevenueReportTest.shouldParseNetworkRevenueReportScript(NetworkRevenueReportTest.java:74)
>>>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>> 
>>> Any suggestions?
>>> 
>>> -Todd
>> 
> 
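
The helper classes mentioned in the thread preprocess script variables before handing the script to Pig. The gist holds the actual PigRunner; the sketch below is not that class. It is only a minimal, hypothetical illustration of the substitution step, assuming parameters arrive as "key=value" strings (as in the runner.execute(...) call above) and appear in the script as $name. The class name ScriptParams and both method names are made up for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not the PigRunner from the gist. */
public class ScriptParams {

    // Matches $name where name starts with a letter or underscore,
    // so positional references such as $0 are left untouched.
    private static final Pattern PARAM =
            Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)");

    /** Turns "key=value" strings into an ordered map. */
    public static Map<String, String> parse(String... pairs) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        for (String pair : pairs) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException(
                        "Expected key=value but got: " + pair);
            }
            params.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return params;
    }

    /** Replaces each known $key in the script text; unknown names are kept as-is. */
    public static String substitute(String script, Map<String, String> params) {
        Matcher m = PARAM.matcher(script);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = params.get(m.group(1));
            String replacement = (value != null) ? value : m.group(0);
            // quoteReplacement keeps '$' and '\' in the value literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = parse("param1=foo", "param2=bar");
        String script = "a = load '$param1';\nb = filter a by $0 == '$param2';";
        System.out.println(substitute(script, params));
    }
}
```

The substituted text would then be handed to PigServer in whatever way your Pig version supports; that part is left out here because it depends on the Pig release in use.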
