Yep, we do.
We have an XML Writable that uses XUM behind the scenes. It has a getDom method and a getNode(xquery) method. In readIn we read the byte array and create the XUM DOM object from it. Write simply triggers BinaryCodec.serialize, and we write the bytes out. The same approach would work if you de/serialize the XML as text. We found that to be slower than XUM, but it is quite stable, whereas XUM has other issues (you need to use BinaryCodec as a JVM singleton, etc.).
However, in general this works pretty well.
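
To make the idea concrete, here is a minimal sketch of the text-based variant only, not our actual class. It assumes the XML is stored as UTF-8 bytes and parsed with the JDK's own DOM parser instead of XUM. The class name XmlTextWritable and the set method are made up for illustration. The write/readFields pair mirrors the shape of Hadoop's Writable interface but does not import Hadoop, so in a real job you would add "implements Writable". getNode takes an XPath expression here, because the JDK ships XPath but no XQuery engine.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical name; same method shape as Hadoop's Writable, but Hadoop-free.
class XmlTextWritable {
    private byte[] xmlBytes = new byte[0]; // raw XML, UTF-8 encoded
    private Document dom;                  // parsed lazily, dropped on change

    // Hypothetical helper to load XML text into the record.
    public void set(String xml) {
        xmlBytes = xml.getBytes(StandardCharsets.UTF_8);
        dom = null;
    }

    // Write a length prefix followed by the raw XML bytes.
    public void write(DataOutput out) throws IOException {
        out.writeInt(xmlBytes.length);
        out.write(xmlBytes);
    }

    // Read the length prefix, then exactly that many bytes.
    public void readFields(DataInput in) throws IOException {
        xmlBytes = new byte[in.readInt()];
        in.readFully(xmlBytes);
        dom = null;
    }

    // Parse the bytes into a DOM on first access.
    public Document getDom() throws Exception {
        if (dom == null) {
            dom = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
        }
        return dom;
    }

    // Evaluate an XPath expression (not XQuery) and return the first match.
    public Node getNode(String xpath) throws Exception {
        return (Node) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, getDom(), XPathConstants.NODE);
    }
}
```

Inside a map or reduce method you would then call something like value.getNode("/log/entry[@id='7']") on each record. Parsing lazily keeps records that are never queried cheap, since they are only copied as bytes.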
Stefan



On Jun 23, 2008, at 9:38 PM, Kayla Jay wrote:

Hi

Just wondering if anyone out there works with, manipulates, and stores XML data using Hadoop? I've seen some threads about XML RecordReaders and people who use the StreamXmlRecordReader to do splits. But has anyone implemented a query framework that uses the Hadoop layer to query against the XML in their map/reduce jobs?

I want to know if anyone has executed an XQuery or XPath within a Hadoop job to find something within the XML stored in Hadoop?

I can't find any samples or anyone else out there who uses XML data vs. traditional log text data.

Are there any use cases of using Hadoop to work with XML and then run queries against that XML in a distributed manner?

Thanks.




~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
101tec Inc.
Menlo Park, California, USA
http://www.101tec.com

