[jira] [Updated] (PIG-3619) Provide XPath function

2014-09-13 Thread Giuseppe Santoro (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giuseppe Santoro updated PIG-3619:
--
Attachment: xpath2.patch

 Provide XPath function
 --

 Key: PIG-3619
 URL: https://issues.apache.org/jira/browse/PIG-3619
 Project: Pig
  Issue Type: Improvement
  Components: piggybank
Reporter: Saad Patel
Assignee: Saad Patel
 Fix For: 0.13.0

 Attachments: xpath.patch, xpath2.patch


 XML is often loaded using XMLLoader with a record boundary tag as one of the 
 parameters. A common use case is to then extract data from those records. 
 XPath would make those extractions very easy. I'm proposing a 
 patch that adds simple XPath support as a UDF.
 Example usage of the XPath UDF would be:
 {code}
 extractions = FOREACH xmlrecords GENERATE XPath(record, 'book/author'), 
 XPath(record, 'book/title');
 {code}
 The proposed UDF also caches the last XML document. This improves 
 performance when multiple consecutive XPath extractions are run on the same 
 XML document, as in the example above.
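
 The caching idea described above could look roughly like the following in
 plain Java. This is only a sketch under stated assumptions, not the code in
 xpath.patch: the class name CachingXPathSketch, the extract method, and the
 parse counter are hypothetical, and the Pig EvalFunc wiring is left out so
 the example runs on the JDK alone.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical illustration of "cache the last XML document"; not the patch itself.
final class CachingXPathSketch {
    private final XPath xpath = XPathFactory.newInstance().newXPath();
    private String lastXml;    // text of the most recently parsed record
    private Document lastDoc;  // parsed DOM, reused while the record text is unchanged
    private int parseCount;    // counts real parses, only to make the caching visible

    // Evaluates the XPath expression against the record and returns the string value.
    String extract(String xml, String expression) throws Exception {
        if (lastDoc == null || !xml.equals(lastXml)) {
            // Cache miss: parse the record and remember it for the next call.
            lastDoc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            lastXml = xml;
            parseCount++;
        }
        return xpath.evaluate(expression, lastDoc);
    }

    int getParseCount() {
        return parseCount;
    }

    public static void main(String[] args) throws Exception {
        String record = "<book><author>Jane Doe</author><title>Pig Basics</title></book>";
        CachingXPathSketch udf = new CachingXPathSketch();
        System.out.println(udf.extract(record, "book/author"));
        System.out.println(udf.extract(record, "book/title"));
        System.out.println("parses: " + udf.getParseCount());
    }
}
```

 In this sketch the two extractions on the same record trigger a single parse,
 which mirrors the two XPath calls inside one FOREACH ... GENERATE statement.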



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PIG-3619) Provide XPath function

2014-09-13 Thread Giuseppe Santoro (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giuseppe Santoro updated PIG-3619:
--
Attachment: xpath2.patch

I tried to use this UDF but got exceptions related to the function mapping 
definition. The mapping defines just one parameter, while the UDF takes at 
least two mandatory parameters and one optional one. I have fixed that issue 
in my new patch, xpath2.patch, which is attached to this ticket. I have been 
running this UDF with hundreds of XPath queries, and it works well, even with 
the optional parameter.

 Provide XPath function
 --

 Key: PIG-3619
 URL: https://issues.apache.org/jira/browse/PIG-3619
 Project: Pig
  Issue Type: Improvement
  Components: piggybank
Reporter: Saad Patel
Assignee: Saad Patel
 Fix For: 0.13.0

 Attachments: xpath.patch, xpath2.patch







[jira] [Updated] (PIG-3619) Provide XPath function

2014-09-13 Thread Giuseppe Santoro (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giuseppe Santoro updated PIG-3619:
--
Attachment: (was: xpath2.patch)

 Provide XPath function
 --

 Key: PIG-3619
 URL: https://issues.apache.org/jira/browse/PIG-3619
 Project: Pig
  Issue Type: Improvement
  Components: piggybank
Reporter: Saad Patel
Assignee: Saad Patel
 Fix For: 0.13.0

 Attachments: xpath.patch







[jira] [Updated] (PIG-3619) Provide XPath function

2014-06-17 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3619:
---

Fix Version/s: 0.13.0

Updating FixVersion.

 Provide XPath function
 --

 Key: PIG-3619
 URL: https://issues.apache.org/jira/browse/PIG-3619
 Project: Pig
  Issue Type: Improvement
  Components: piggybank
Reporter: Saad Patel
Assignee: Saad Patel
 Fix For: 0.13.0

 Attachments: xpath.patch







[jira] [Updated] (PIG-3619) Provide XPath function

2013-12-13 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3619:


Assignee: Saad Patel

 Provide XPath function
 --

 Key: PIG-3619
 URL: https://issues.apache.org/jira/browse/PIG-3619
 Project: Pig
  Issue Type: Improvement
  Components: piggybank
Reporter: Saad Patel
Assignee: Saad Patel
 Attachments: xpath.patch







[jira] [Updated] (PIG-3619) Provide XPath function

2013-12-12 Thread Saad Patel (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saad Patel updated PIG-3619:


Attachment: xpath.patch

The XPath UDF and XPathTest are both included. Usage is also documented in 
the Javadocs.

 Provide XPath function
 --

 Key: PIG-3619
 URL: https://issues.apache.org/jira/browse/PIG-3619
 Project: Pig
  Issue Type: Improvement
  Components: piggybank
Reporter: Saad Patel
 Attachments: xpath.patch







[jira] [Updated] (PIG-3619) Provide XPath function

2013-12-12 Thread Saad Patel (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saad Patel updated PIG-3619:


Description: 
XML is often loaded using XMLLoader with a record boundary tag as one of the 
parameters. A common use case is to then extract data from those records. XPath 
would make those extractions very easy. I'm proposing a patch 
that adds simple XPath support as a UDF.

Example usage of the XPath UDF would be:

{code}
extractions = FOREACH xmlrecords GENERATE XPath(record, 'book/author'), 
XPath(record, 'book/title');
{code}

The proposed UDF also caches the last XML document. This improves 
performance when multiple consecutive XPath extractions are run on the same 
XML document, as in the example above.

  was:
XML is often loaded using XMLLoader with a record boundary tag as one of the 
parameters. A common use case is to then extract data from those records. XPath 
would make those extractions very easy. I'm proposing a patch 
that adds simple XPath support as a UDF.

Example usage of the XPath UDF would be:

{code}
extractions = FOREACH xmlrecords GENERATE XPath(record, 'book/author'), 
XPath(record, 'book/title');
{code}


 Provide XPath function
 --

 Key: PIG-3619
 URL: https://issues.apache.org/jira/browse/PIG-3619
 Project: Pig
  Issue Type: Improvement
  Components: piggybank
Reporter: Saad Patel
 Attachments: xpath.patch




