[jira] [Updated] (PIG-3619) Provide XPath function
[ https://issues.apache.org/jira/browse/PIG-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Giuseppe Santoro updated PIG-3619:
--
Attachment: xpath2.patch

Provide XPath function
--
Key: PIG-3619
URL: https://issues.apache.org/jira/browse/PIG-3619
Project: Pig
Issue Type: Improvement
Components: piggybank
Reporter: Saad Patel
Assignee: Saad Patel
Fix For: 0.13.0
Attachments: xpath.patch, xpath2.patch

XML is often loaded using XMLLoader with a record boundary tag as one of the parameters. A common use case is then to extract data from those records, and XPath would make those extractions very easy. I'm proposing a patch that adds simple XPath support as a UDF. Example usage of the XPath UDF would be:

{code}
extractions = FOREACH xmlrecords GENERATE XPath(record, 'book/author'), XPath(record, 'book/title');
{code}

The proposed UDF also caches the last XML document. This improves performance when several consecutive XPath extractions run on the same XML document, as in the example above.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
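The "cache the last XML document" idea from the description can be illustrated with a minimal Java sketch that uses only the JDK's `javax.xml.xpath` API. It is not the code from xpath.patch: the class name `CachingXPathEvaluator`, the string-equality cache check, the null handling, and the sample record are all assumptions made for illustration. Consecutive calls with an identical XML string reuse the parsed DOM, so the two extractions in the Pig example would trigger only one parse.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/**
 * Hypothetical sketch (not the patch's code): evaluates an XPath expression
 * against an XML string and keeps only the most recently parsed document,
 * so consecutive calls on the same record skip re-parsing.
 */
class CachingXPathEvaluator {
    private final DocumentBuilder builder;
    private final XPath xpath = XPathFactory.newInstance().newXPath();

    // Single-entry cache: the last XML string seen and its parsed DOM.
    private String lastXml;
    private Document lastDocument;
    private int parseCount; // number of real parses, for illustration only

    CachingXPathEvaluator() throws Exception {
        builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    String evaluate(String xml, String expression) throws Exception {
        if (xml == null || expression == null) {
            return null; // assumed behavior: null in, null out
        }
        // Re-parse only when the input differs from the cached document.
        if (lastDocument == null || !xml.equals(lastXml)) {
            lastDocument = builder.parse(new InputSource(new StringReader(xml)));
            lastXml = xml;
            parseCount++;
        }
        // Returns the XPath string value of the result
        // (for a node-set, the first node in document order).
        return xpath.evaluate(expression, lastDocument);
    }

    int getParseCount() {
        return parseCount;
    }

    public static void main(String[] args) throws Exception {
        CachingXPathEvaluator eval = new CachingXPathEvaluator();
        // Made-up sample record shaped like the ticket's book example.
        String record = "<book><author>Jane Doe</author><title>Pig Basics</title></book>";
        System.out.println(eval.evaluate(record, "book/author")); // Jane Doe
        System.out.println(eval.evaluate(record, "book/title"));  // Pig Basics
        System.out.println(eval.getParseCount());                 // 1
    }
}
```

Comparing the incoming string with the cached one is linear in its length, but that is usually much cheaper than building a new DOM. Keeping a single entry matches the ticket's wording ("the last xml document") and bounds memory to one document.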
[jira] [Updated] (PIG-3619) Provide XPath function
[ https://issues.apache.org/jira/browse/PIG-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Giuseppe Santoro updated PIG-3619:
--
Attachment: xpath2.patch

I have tried to use this UDF, but I get exceptions related to the function mapping definition. It declares just one parameter, while there are at least two mandatory parameters and one optional one. I have fixed that issue in my new patch, xpath2.patch, which is attached to this ticket. I have been running this UDF with hundreds of XPath queries, and it works really well, including with the optional parameter.
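The problem Giuseppe describes concerns the argument signatures the UDF advertises to Pig (presumably its `EvalFunc.getArgToFuncMapping()` override). If only one signature is declared, calls with a different argument list do not resolve. The idea can be sketched without Pig as a plain-Java analogy: the function lists every signature it accepts, and a call resolves only if it matches one. This is not Pig's resolution logic. The class `XPathSignatures` and the `Boolean` type of the optional third argument are assumptions, since the comment does not name the optional parameter.

```java
/**
 * Hypothetical, Pig-free analogy of a UDF function mapping: the function
 * advertises every argument signature it accepts, and a call resolves only
 * when its runtime argument types match one of them. The Boolean type of
 * the optional third argument is an assumption for illustration.
 */
class XPathSignatures {
    // (xml, xpath) and (xml, xpath, optional flag). Declaring only one
    // of these would reject every call that uses the other form.
    static final Class<?>[][] ACCEPTED = {
        {String.class, String.class},
        {String.class, String.class, Boolean.class},
    };

    /** Returns true if the argument types match one accepted signature. */
    static boolean resolves(Object... args) {
        for (Class<?>[] signature : ACCEPTED) {
            if (signature.length != args.length) {
                continue; // wrong arity for this signature
            }
            boolean matches = true;
            for (int i = 0; i < args.length; i++) {
                if (!signature[i].isInstance(args[i])) {
                    matches = false; // null or wrong type at position i
                    break;
                }
            }
            if (matches) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(resolves("<book/>", "book/author"));       // true
        System.out.println(resolves("<book/>", "book/author", true)); // true
        System.out.println(resolves("<book/>"));                      // false
    }
}
```

Listing one entry per accepted arity mirrors the fix described above: the two-argument and three-argument forms each need their own declared signature.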
[jira] [Updated] (PIG-3619) Provide XPath function
[ https://issues.apache.org/jira/browse/PIG-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Giuseppe Santoro updated PIG-3619:
--
Attachment: (was: xpath2.patch)
[jira] [Updated] (PIG-3619) Provide XPath function
[ https://issues.apache.org/jira/browse/PIG-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-3619:
---
Fix Version/s: 0.13.0

Updating FixVersion.
[jira] [Updated] (PIG-3619) Provide XPath function
[ https://issues.apache.org/jira/browse/PIG-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-3619:
Assignee: Saad Patel
[jira] [Updated] (PIG-3619) Provide XPath function
[ https://issues.apache.org/jira/browse/PIG-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saad Patel updated PIG-3619:
Attachment: xpath.patch

The XPath UDF and XPathTest are both included. Usage is also documented in the Javadoc.
[jira] [Updated] (PIG-3619) Provide XPath function
[ https://issues.apache.org/jira/browse/PIG-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saad Patel updated PIG-3619:
Description:
XML is often loaded using XMLLoader with a record boundary tag as one of the parameters. A common use case is then to extract data from those records, and XPath would make those extractions very easy. I'm proposing a patch that adds simple XPath support as a UDF. Example usage of the XPath UDF would be:

{code}
extractions = FOREACH xmlrecords GENERATE XPath(record, 'book/author'), XPath(record, 'book/title');
{code}

The proposed UDF also caches the last XML document. This improves performance when several consecutive XPath extractions run on the same XML document, as in the example above.

was:
XML is often loaded using XMLLoader with a record boundary tag as one of the parameters. A common use case is then to extract data from those records, and XPath would make those extractions very easy. I'm proposing a patch that adds simple XPath support as a UDF. Example usage of the XPath UDF would be:

{code}
extractions = FOREACH xmlrecords GENERATE XPath(record, 'book/author'), XPath(record, 'book/title');
{code}