Yep, we do.
We have an XML Writable that uses XUM behind the scenes. It has
getDom and getNode(xquery) methods. In readFields we read the byte
array and create the XUM DOM object from it.
write simply calls BinaryCodec.serialize, and we write the bytes
out.
The same approach would also work if you serialize and deserialize the
XML as text. We found text slower than XUM, but it is quite stable,
whereas XUM has other issues (for example, you need to use BinaryCodec
as a JVM singleton).
In general, though, this works pretty well.
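A minimal sketch of the text-based variant, under some assumptions. XUM and its BinaryCodec are replaced here by the JDK's own DOM parser and Transformer, and getNode takes an XPath expression (javax.xml.xpath) instead of XQuery. The class name XmlWritable and its parse helper are hypothetical. The class mirrors Hadoop's Writable contract (write(DataOutput) / readFields(DataInput)) but does not declare `implements org.apache.hadoop.io.Writable`, so it compiles without Hadoop. In a real job you would add that clause.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

/** Hypothetical XML record following Hadoop's Writable method contract; text serialization, not XUM. */
public class XmlWritable {
    private Document dom;

    public XmlWritable() {}                      // Writables need a no-arg constructor
    public XmlWritable(Document dom) { this.dom = dom; }

    public Document getDom() { return dom; }

    /** Evaluates an XPath expression and returns the first matching node, or null if none. */
    public Node getNode(String xpath) throws IOException {
        try {
            return (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, dom, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IOException(e);
        }
    }

    /** Serializes the DOM as XML text and writes it as a length-prefixed byte array. */
    public void write(DataOutput out) throws IOException {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(dom), new StreamResult(bos));
            byte[] bytes = bos.toByteArray();
            out.writeInt(bytes.length);
            out.write(bytes);
        } catch (IOException e) {
            throw e;
        } catch (Exception e) {
            throw new IOException(e);
        }
    }

    /** Reads the length-prefixed byte array and rebuilds the DOM from it. */
    public void readFields(DataInput in) throws IOException {
        byte[] bytes = new byte[in.readInt()];
        in.readFully(bytes);
        dom = parse(bytes);
    }

    /** Parses raw XML bytes into a DOM Document. */
    public static Document parse(byte[] bytes) throws IOException {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes));
        } catch (IOException e) {
            throw e;
        } catch (Exception e) {
            throw new IOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] xml = "<order id=\"42\"><item sku=\"A1\">2</item></order>"
                .getBytes(StandardCharsets.UTF_8);
        XmlWritable original = new XmlWritable(parse(xml));

        // Round-trip through write/readFields, as the framework would between tasks.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        original.write(new DataOutputStream(buffer));
        XmlWritable copy = new XmlWritable();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));

        System.out.println(copy.getNode("/order/item/@sku").getNodeValue()); // prints A1
    }
}
```

Creating the factories on every call keeps the sketch simple but is wasteful. That per-call setup cost is part of why a shared, reused codec (like the singleton mentioned above for BinaryCodec) matters in practice.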
Stefan
On Jun 23, 2008, at 9:38 PM, Kayla Jay wrote:
Hi
Just wondering if anyone out there works with, manipulates, and
stores XML data using Hadoop? I've seen some threads about XML
RecordReaders and people who use StreamXmlRecordReader to
do splits. But has anyone implemented a query framework that
uses the Hadoop layer to query against the XML in their map/reduce
jobs?
I want to know if anyone has run an XQuery or XPath expression within
a Hadoop job to find something within the XML stored in Hadoop.
I can't find any samples or anyone else out there who uses XML data
rather than traditional text log data.
Are there any use cases of using Hadoop to work with XML and then
query that XML in a distributed manner?
Thanks.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
101tec Inc.
Menlo Park, California, USA
http://www.101tec.com