[ https://issues.apache.org/jira/browse/PIG-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dmitriy V. Ryaboy updated PIG-1427:
-----------------------------------

    Attachment: monitoredUdf.patch

The attached patch is a basic sketch of the proposed implementation. It uses the Guava library ( http://code.google.com/p/guava-libraries/ ). I tested with r03, but I see that r04 is out now and may be preferable. The real patch will include the appropriate Ivy changes, as well as all the Apache headers and other niceties.

The idea is to create a @MonitoredUDF annotation that a UDF author can add to an EvalFunc. If the annotation is present on the EvalFunc, its evaluation is wrapped in a Java Future, executed in a separate thread, and monitored with a timeout.

The most basic usage is possible even now: just add @MonitoredUDF to the class definitions of EvalFuncs you expect might time out, and try it. For ease of testing, the timeout interval can be set at millisecond granularity.

This is based heavily on Florian Leibert's implementation of the same concept.

Please take a look and comment.

> Monitor and kill runaway UDFs
> -----------------------------
>
>                 Key: PIG-1427
>                 URL: https://issues.apache.org/jira/browse/PIG-1427
>             Project: Pig
>          Issue Type: New Feature
>    Affects Versions: 0.8.0
>            Reporter: Dmitriy V. Ryaboy
>            Assignee: Dmitriy V. Ryaboy
>         Attachments: monitoredUdf.patch
>
>
> As a safety measure, it is sometimes useful to monitor UDFs as they execute.
> It is often preferable to return null or some other default value instead of
> timing out a runaway evaluation and killing a job. We have in the past seen
> complex regular expressions lead to job failures due to just half a dozen
> (out of millions) particularly obnoxious strings.
> It would be great to give Pig users a lightweight way of enabling UDF
> monitoring.
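
For readers who want to see the shape of the idea described above (run the evaluation in a separate thread behind a Future, wait with a timeout, and fall back to a default value instead of failing the job), here is a minimal sketch. It is not the attached patch and does not use Pig or Guava: it relies only on java.util.concurrent, and the class name MonitoredEvalSketch and the method evalWithTimeout are hypothetical names chosen for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper for illustration only; not part of Pig or of monitoredUdf.patch.
class MonitoredEvalSketch {

    // Runs the task on a worker thread and waits up to the given timeout.
    // Returns defaultValue if the task does not finish in time.
    static <T> T evalWithTimeout(Callable<T> task, long timeout, TimeUnit unit, T defaultValue) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "monitored-udf");
            t.setDaemon(true); // a stuck evaluation should not keep the JVM alive
            return t;
        });
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            // cancel(true) interrupts the worker; this only stops tasks that
            // respond to interruption, so a truly runaway task may keep running.
            future.cancel(true);
            return defaultValue;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            future.cancel(true);
            return defaultValue;
        } catch (ExecutionException e) {
            // The evaluation itself failed; surface the original cause.
            throw new RuntimeException(e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // A fast evaluation completes within the timeout.
        String fast = evalWithTimeout(() -> "ok", 500, TimeUnit.MILLISECONDS, null);

        // A slow evaluation is abandoned and replaced by the default (null here).
        String slow = evalWithTimeout(() -> {
            Thread.sleep(5_000);
            return "late";
        }, 50, TimeUnit.MILLISECONDS, null);

        System.out.println(fast); // prints "ok"
        System.out.println(slow); // prints "null"
    }
}
```

Creating an executor per call keeps the sketch short. A real implementation would more likely reuse one executor across invocations and read the timeout and default value from the annotation. Those details are assumptions here, not something the patch description confirms.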