[ https://issues.apache.org/jira/browse/PIG-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Giuseppe Santoro updated PIG-3619:
----------------------------------

    Attachment: xpath2.patch

I have tried to use this UDF, but I get exceptions related to the function mapping definition. The mapping declares only one parameter, whereas the UDF takes at least two mandatory parameters and one optional parameter. I have fixed this in a new patch, xpath2.patch, attached to this ticket. I have run the patched UDF with hundreds of XPath queries and it works well, including when the optional parameter is supplied.

> Provide XPath function
> ----------------------
>
>                 Key: PIG-3619
>                 URL: https://issues.apache.org/jira/browse/PIG-3619
>             Project: Pig
>          Issue Type: Improvement
>          Components: piggybank
>            Reporter: Saad Patel
>            Assignee: Saad Patel
>             Fix For: 0.13.0
>
>         Attachments: xpath.patch, xpath2.patch
>
>
> XML is often loaded using XMLLoader with a record boundary tag as one of the
> parameters. A common use case is to then extract data from those records.
> XPath would allow those extractions to be done very easily. I'm proposing a
> patch that adds simple XPath support as a UDF.
> Example usage of the XPath UDF would be:
> {code}
> extractions = FOREACH xmlrecords GENERATE XPath(record, 'book/author'),
> XPath(record, 'book/title');
> {code}
> The proposed UDF also caches the last XML document. This helps performance
> when multiple consecutive XPath extractions are run on the same XML
> document, as in the example above.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
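
For anyone following the caching remark in the description, here is a minimal sketch of how a "cache the last XML document" strategy might look. It is not the code from xpath.patch or xpath2.patch: the class name CachingXPathSketch, its evaluate method, and the parse counter are all hypothetical, and it uses only the JDK's javax.xml APIs rather than Pig's EvalFunc, so that it runs on its own.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical illustration only; not the UDF attached to this ticket.
class CachingXPathSketch {
    private final XPath xpath = XPathFactory.newInstance().newXPath();
    private String lastXml;      // text of the most recently parsed record
    private Document lastDoc;    // parsed DOM for lastXml
    private int parseCount = 0;  // counts real parses, for demonstration

    // Evaluates an XPath expression against the record, re-parsing the
    // XML only when it differs from the previously seen record.
    String evaluate(String xml, String expression) {
        try {
            if (lastDoc == null || !xml.equals(lastXml)) {
                DocumentBuilder builder =
                        DocumentBuilderFactory.newInstance().newDocumentBuilder();
                lastDoc = builder.parse(new InputSource(new StringReader(xml)));
                lastXml = xml;
                parseCount++;
            }
            // Returns the string value of the first matching node.
            return xpath.evaluate(expression, lastDoc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    int getParseCount() {
        return parseCount;
    }

    public static void main(String[] args) {
        CachingXPathSketch udf = new CachingXPathSketch();
        String record = "<book><author>Melville</author><title>Moby-Dick</title></book>";
        // Two consecutive extractions on the same record share one parse.
        System.out.println(udf.evaluate(record, "book/author"));
        System.out.println(udf.evaluate(record, "book/title"));
        System.out.println("parses: " + udf.getParseCount());
    }
}
```

The two calls in main mirror the two XPath(record, ...) columns in the Pig example: because the record text is unchanged between them, the second call reuses the cached DOM instead of parsing again. Comparing the full record string is the simplest possible cache key; a real UDF might choose something cheaper.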