[ 
https://issues.apache.org/jira/browse/PIG-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12872031#action_12872031
 ] 

Ashutosh Chauhan commented on PIG-1427:
---------------------------------------

A useful feature. A couple of comments:

1. Currently, on timeouts and errors you always return null. It would be 
useful if the user could specify a default return value in the annotation, 
to be returned in those cases. For example, if my regex fails on an input 
String, I want to get an empty String back. Something like:
{code}
@MonitoredUDF(timeUnit = TimeUnit.MILLISECONDS, duration = 500,
              defaultReturnValue = "")
{code}
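
To make the suggestion concrete, here is a rough, standalone sketch of how an 
annotation-supplied default could be returned on a timeout or an error. It 
does not use the patch's actual classes: the TimeLimited annotation, the 
SlowRegexUdf class and the callMonitored helper are all hypothetical names 
made up for illustration, and the real MonitoredUDF implementation may well 
work differently.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class MonitoredDefaultSketch {

    // Hypothetical stand-in for the proposed annotation; not the one in the patch.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface TimeLimited {
        TimeUnit timeUnit() default TimeUnit.SECONDS;
        long duration() default 10;
        String defaultReturnValue() default "";
    }

    // Hypothetical UDF-like class: sleeps to simulate a slow regex evaluation.
    @TimeLimited(timeUnit = TimeUnit.MILLISECONDS, duration = 500, defaultReturnValue = "")
    static class SlowRegexUdf implements Callable<String> {
        private final long sleepMillis;

        SlowRegexUdf(long sleepMillis) {
            this.sleepMillis = sleepMillis;
        }

        @Override
        public String call() throws Exception {
            Thread.sleep(sleepMillis);
            return "matched";
        }
    }

    // Runs the call on a worker thread and falls back to the annotated default
    // if it times out, throws, or the calling thread is interrupted.
    static String callMonitored(Callable<String> udf) {
        TimeLimited limit = udf.getClass().getAnnotation(TimeLimited.class);
        if (limit == null) {
            throw new IllegalArgumentException("class is not annotated with @TimeLimited");
        }
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<String> future = executor.submit(udf);
        try {
            return future.get(limit.duration(), limit.timeUnit());
        } catch (TimeoutException | ExecutionException e) {
            future.cancel(true); // interrupt the runaway evaluation
            return limit.defaultReturnValue();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return limit.defaultReturnValue();
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Fast call: finishes well within the 500 ms limit.
        System.out.println("[" + callMonitored(new SlowRegexUdf(10)) + "]");
        // Slow call: exceeds the limit, so the annotated default is returned.
        System.out.println("[" + callMonitored(new SlowRegexUdf(5000)) + "]");
    }
}
```

With the 500 ms limit above, the first call should return "matched" and the 
second should return the empty default. Note that annotation members can only 
be primitives, Strings, Classes, enums, annotations, or arrays of those, so a 
single String-typed default would not cover every possible UDF return type.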

2. It seems that the PigHadoopLogger.getReporter() method was accidentally 
removed in 0.7 and trunk, and it needs to be restored. It would be really nice 
to see in the UI how many of my input records are faulty. Since it is a small 
change, I think you can add that getter back in and then update the 
appropriate counters. 

> Monitor and kill runaway UDFs
> -----------------------------
>
>                 Key: PIG-1427
>                 URL: https://issues.apache.org/jira/browse/PIG-1427
>             Project: Pig
>          Issue Type: New Feature
>    Affects Versions: 0.8.0
>            Reporter: Dmitriy V. Ryaboy
>            Assignee: Dmitriy V. Ryaboy
>         Attachments: monitoredUdf.patch
>
>
> As a safety measure, it is sometimes useful to monitor UDFs as they execute. 
> It is often preferable to return null or some other default value instead of 
> timing out a runaway evaluation and killing a job. We have in the past seen 
> complex regular expressions lead to job failures due to just half a dozen 
> (out of millions) particularly obnoxious strings.
> It would be great to give Pig users a lightweight way of enabling UDF 
> monitoring.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
