[ https://issues.apache.org/jira/browse/PIG-1284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846130#action_12846130 ]
Alan Gates commented on PIG-1284:
---------------------------------

I'll take a look at the patch.

> pig UDF is lacking XMLLoader. Plan to add the XMLLoader
> -------------------------------------------------------
>
>                 Key: PIG-1284
>                 URL: https://issues.apache.org/jira/browse/PIG-1284
>             Project: Pig
>          Issue Type: New Feature
>    Affects Versions: 0.7.0
>            Reporter: Alok Singh
>             Fix For: 0.7.0
>
>         Attachments: pigudf_xmlLoader.patch, pigudf_xmlLoader.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Hi All,
> We are planning to add an XMLLoader UDF to the piggybank repository.
> Here is the proposal, with the user docs:
>
> XMLLoader is a load function for XML files. It implements the LoadFunc
> interface, which is used to parse records from a dataset.
> It takes an XML tag name as its argument and uses that tag to split the
> input dataset into multiple records.
>
> For example, if the input XML (input.xml) looks like this:
>
> <configuration>
>     <property>
>         <name> foobar </name>
>         <value> barfoo </value>
>     </property>
>     <ignoreProperty>
>         <name> foo </name>
>     </ignoreProperty>
>     <property>
>         <name> justname </name>
>     </property>
> </configuration>
>
> and your Pig script looks like this:
>
> -- load the jar files
> register loader.jar;
> -- load the dataset using XMLLoader
> -- A is a bag of tuples, each containing a single atom, doc (see the output)
> A = load '/user/aloks/pig/input.xml' using loader.XMLLoader('property') as (doc:chararray);
> -- dump the result
> dump A;
>
> then you will get the output:
>
> (<property>
>         <name> foobar </name>
>         <value> barfoo </value>
>     </property>)
> (<property>
>         <name> justname </name>
>     </property>)
>
> where each () indicates one record.
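
To make the splitting behaviour in the proposal concrete, here is a minimal Python sketch of the semantics only. It is not the code in the attached patch (the actual loader would be a Java LoadFunc). The function name split_by_tag is hypothetical, and the regex approach assumes the record tag is never nested inside an element of the same name.

```python
import re

# Sample input taken from the proposal's input.xml.
SAMPLE_XML = """<configuration>
    <property>
        <name> foobar </name>
        <value> barfoo </value>
    </property>
    <ignoreProperty>
        <name> foo </name>
    </ignoreProperty>
    <property>
        <name> justname </name>
    </property>
</configuration>
"""


def split_by_tag(xml_text, tag):
    """Return each <tag>...</tag> span in xml_text as one record string.

    Hypothetical helper for illustration: it does not handle a record tag
    nested inside itself, and it does not validate the XML.
    """
    # Match the opening tag (with optional attributes), then lazily consume
    # everything up to the first matching closing tag. DOTALL lets '.' span
    # newlines, since a record usually covers several lines.
    pattern = re.compile(
        r"<{0}(?:\s[^>]*)?>.*?</{0}>".format(re.escape(tag)),
        re.DOTALL,
    )
    return pattern.findall(xml_text)


records = split_by_tag(SAMPLE_XML, "property")
for record in records:
    # Mimic the way `dump A;` wraps each record in parentheses.
    print("(" + record + ")")
```

As in the proposal's example, the <ignoreProperty> element is skipped because only elements whose name is exactly the given tag become records.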