pig UDF is lacking XMLLoader. Plan to add the XMLLoader
-------------------------------------------------------

                 Key: PIG-1284
                 URL: https://issues.apache.org/jira/browse/PIG-1284
             Project: Pig
          Issue Type: New Feature
            Reporter: Alok Singh
             Fix For: 0.7.0


Hi All,

 We are planning to add an XMLLoader UDF to the piggybank repository.

Here is the proposal, with the user docs:

 XMLLoader is a load function for XML files.
 It will implement the LoadFunc interface, which is used to parse records
 from a dataset.
 It takes an XML tag name (xmlTag) as its argument and uses that tag to split
 the input dataset into multiple records.


 For example, if the input XML (input.xml) looks like this:
 <configuration>
 <property>
 <name> foobar </name>
 <value> barfoo </value>
 </property>
 <ignoreProperty>
 <name> foo </name>
 </ignoreProperty>
 <property>
 <name> justname </name>
 </property>
 </configuration>

 And your Pig script looks like this:

 -- register the jar that contains the loader
 register loader.jar;
 -- load the dataset using XMLLoader
 -- A is a bag of tuples; each tuple contains a single atom, doc (see the output)
 A = load '/user/aloks/pig/input.xml' using loader.XMLLoader('property') as (doc:chararray);
 -- dump the result
 dump A;


 Then you will get this output:

(<property>
<name> foobar </name>
<value> barfoo </value>
</property>)
(<property>
<name> justname </name>
</property>)

Each () pair encloses one record.
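
To make the splitting behaviour concrete, here is a minimal sketch in plain Java of how records could be cut out of the input by tag name. This is not the proposed piggybank code and does not use the Pig LoadFunc API; the class name XmlTagSplitter and the method split are hypothetical, and the sketch assumes the tag carries no attributes and is never nested inside itself.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of Pig or piggybank. */
public class XmlTagSplitter {

    /**
     * Returns every <tag>...</tag> block found in xml, in document order.
     * Assumption: the tag has no attributes and is not nested inside itself.
     */
    public static List<String> split(String xml, String tag) {
        String open = "<" + tag + ">";
        String close = "</" + tag + ">";
        List<String> records = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = xml.indexOf(open, from);
            if (start < 0) {
                break; // no further opening tag
            }
            int end = xml.indexOf(close, start + open.length());
            if (end < 0) {
                break; // unterminated block: ignore it
            }
            end += close.length();
            records.add(xml.substring(start, end));
            from = end; // continue scanning after this record
        }
        return records;
    }

    public static void main(String[] args) {
        // Same content as input.xml in the example above.
        String xml = String.join("\n",
            "<configuration>",
            "<property>",
            "<name> foobar </name>",
            "<value> barfoo </value>",
            "</property>",
            "<ignoreProperty>",
            "<name> foo </name>",
            "</ignoreProperty>",
            "<property>",
            "<name> justname </name>",
            "</property>",
            "</configuration>");
        // Print each record wrapped in (), mirroring the dump output.
        for (String record : split(xml, "property")) {
            System.out.println("(" + record + ")");
        }
    }
}
```

Running main on the sample input prints the two property blocks, each wrapped in (), and skips the ignoreProperty block because its tag name does not match. A real loader would also have to deal with attributes, nested tags and records that span input split boundaries, none of which this sketch handles.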

 
