[ 
https://issues.apache.org/jira/browse/PIG-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877585#action_12877585
 ] 

Richard Ding commented on PIG-1333:
-----------------------------------

Thanks for reviewing the patch. My comments are inline.

bq. (1) General comment. This patch is very large and combines multiple 
different issues that could have been separated into multiple patches to make 
it easier to review and test

In this patch JobStats and OutputStats replaced the HJob as the return value of 
execution of a pig script. This results in several deprecated/removed classes. 
I also go through the unit tests to replace most of the deprecated methods used 
there and this causes modification of  many test classes. 

bq. (2) We are missing script level feature collection. (I see the one at job 
level.) For each script, we want to collect overall script features such as 
different operators: join, order by, etc., is it a multiquery, does it have 
UDF. Also, we would want to know if combiner was used and whether the script 
spilled but maybe both of those can be at the job level.

I'll add the script level features, in addition to the job level feature. The 
value of job level feature (pig.job.feature is the key in job xml) will be a 
string, but the value of script level feature (pig.script.features) will be a 
bit set represented by a long.

bq. (3) We need to add separate comment to the JIRA marked as documentation 
that describes PigRunner since it is a new interface that we need to include in 
0.8.0 documentation.

Will do.

bq. (4) MapReduceLauncher. Why was exception handling and temp store handling 
code removed?

Exception handling is still there. Handling of temp stores for failed jobs is 
moved to JobStats.  

bq. (5) OutputStats assumes that location is a path which might not be true for 
non-file stores.

It isn't necessary to use path in OutputStats (just trying to get a more 
readable short name for location). I'll move it.

bq. (6) ScriptState: There are maps/hashes optimized for enums 
(http://java.sun.com/j2se/1.5.0/docs/api/java/util/EnumMap.html)

Dmitriy also commented on this. I'll modified the usage of emums based on the 
comments.

bq. (7) Why JobStats is derived from an operator?

The JobGraph of the script is a derived class from BaseOperatorPlan (in 
experimental package) whose elements are Operators. JobStats is derived from 
Operator in order to use the methods on the base class. 

bq. (8) Why did JOB_NAME_PREFIX got removed from PigContext?

JOB_NAME_PREFIX is still there. What is removed is the private member 'jobName' 
which isn't used in the class and whose getter and setter had been removed long 
time ago. 

bq. (9) Why do we need to synchronize getTemporaryFile?

I'll remove the synchronize on this method. It was copied from the 
corresponding deprecated method.

> API interface to Pig
> --------------------
>
>                 Key: PIG-1333
>                 URL: https://issues.apache.org/jira/browse/PIG-1333
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Olga Natkovich
>            Assignee: Richard Ding
>             Fix For: 0.8.0
>
>         Attachments: PIG-1333.patch
>
>
> It would be nice to make Pig more friendly for applications like workflow 
> that would be executing pig scripts on user behalf.
> Currently, they would have to use pig command line to execute the code; 
> however, this has limitation on the kind of output that would be delivered. 
> For instance, it is hard to produce error information that is easy to use 
> programatically or collect statistics.
> The proposal is to create a class that mimics the behavior of the Main but 
> gives users a status object back. The the main code of pig would look 
> somethig like:
> public static void main(String args[])
> {
>     PigStatus ps = PigMain.exec(args);
>     exit (PigStatus.rc);
> }
> We need to define the following:
> - Content of PigStatus. It should at least include
>    * return code
>    * error string
>    * exception 
>    * statistics
> - A way to propagate the status class through pig code

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to