Kasper,
This might work.
One issue that I see is that MetaModel seems to take a very XML-centric
view of things while Drill takes a pretty JSON-centric view of things.
The point at which I think this might cause problems is that Drill
currently has trouble when it sees records like
<record><item>1</item></record>
<record><item>2</item><item>3</item></record>
This is fine as far as XML is concerned, but if you think about it in terms
of JSON, it is probably best to view these records as
{"item":[1]}
{"item":[2,3]}
Unfortunately, from the first record, there is no way to tell that it
should not be viewed as
{"item":1}
Do you have a suggestion that would help with this?
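
To make the ambiguity concrete, here is a minimal sketch in Python. The
function names and the `repeated_tags` hint are purely my own illustration
(they are not part of Drill or MetaModel), and it assumes flat records with
integer values. The naive converter only uses a list once it has seen a tag
repeat, so the two records above come out with different types for "item".
The second converter is told up front which tags repeat, which is roughly
the kind of schema hint that would avoid the problem.

```python
import json
import xml.etree.ElementTree as ET


def naive_convert(xml_text):
    """Convert one flat <record>; use a list only when a tag repeats."""
    out = {}
    for child in ET.fromstring(xml_text):
        value = int(child.text)
        if child.tag not in out:
            out[child.tag] = value  # first occurrence: stored as a scalar
        elif isinstance(out[child.tag], list):
            out[child.tag].append(value)
        else:
            # second occurrence: promote the scalar to a list
            out[child.tag] = [out[child.tag], value]
    return out


def convert_with_hint(xml_text, repeated_tags):
    """Same conversion, but tags in repeated_tags (hypothetical hint) are always lists."""
    out = {}
    for child in ET.fromstring(xml_text):
        value = int(child.text)
        if child.tag in repeated_tags:
            out.setdefault(child.tag, []).append(value)
        else:
            out[child.tag] = value
    return out


records = [
    "<record><item>1</item></record>",
    "<record><item>2</item><item>3</item></record>",
]
naive = [naive_convert(r) for r in records]
hinted = [convert_with_hint(r, {"item"}) for r in records]
for n, h in zip(naive, hinted):
    print(json.dumps(n), "vs", json.dumps(h))
```

With the hint, both records yield a list for "item"; without it, the first
record yields a scalar and the second a list, which is the schema change
that trips Drill up.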
On Sun, Oct 18, 2015 at 8:41 AM, Kasper Sørensen <
[email protected]> wrote:
> Hi there,
>
> Sorry for barging in, but maybe this is a place where Drill and MetaModel
> could benefit from each other? We've considered that before at least ...
>
> MetaModel already has support for both DOM and SAX based XML querying. They
> basically inherit some characteristics from DOM and SAX respectively:
>
> - In the DOM variant we can infer a schema and all the user has to do is
> select a XML file/resource anywhere.
> - In the SAX variant the user has to specify which paths in the XML
> document should represent logical "tables" and what paths represent their
> columns.
>
> See [1] for more info. Hope this might be of interest to integrate into
> Drill?
>
> Best regards,
> Kasper Sørensen (from the MetaModel project)
>
> [1] http://wiki.apache.org/metamodel/examples/XmlTableMapping
>
> 2015-10-18 0:35 GMT+02:00 Magnus Pierre <[email protected]>:
>
> > Well, very few lines of code imho. And simple. Been able to parse pretty
> > deep structures with no issues so far. Performance? 10-15 5mb xml's in less
> > than a second on my laptop but then I run it using Storm with some
> > parallelism in place. Don't know if it's good or bad. I'll share the code
> > next time I use computer. You don't need to use it, but it works at least.
> >
> > /M
> > Den 17 okt 2015 10:43 em skrev "Matt Burgess" <[email protected]>:
> >
> > > If the converter is clean and performant then I'm sure the community
> > > (including me) is interested :)
> > >
> > > However I wonder if Drill can afford to add a translation layer between
> > > data formats, could we be better served with similar parsing in Drill for
> > > XML as we do for JSON, or can it be pushed down far enough (to the parser)
> > > to not make a noticeable difference (which is what I think Julian is
> > > implying)?
> > >
> > > Sent from my iPhone
> > >
> > > > On Oct 17, 2015, at 1:41 PM, Magnus Pierre <[email protected]> wrote:
> > > >
> > > > Hello,
> > > >
> > > > Just wrote a simple sax implementation that converts xml to json and that
> > > > is able to deal with decently complex xml's, that I currently use in Storm.
> > > > Takes attributes, and everything.
> > > >
> > > > I can share it with the community if interesting.
> > > >
> > > > /Magnus
> > > > Den 17 okt 2015 7:02 em skrev "Julian Hyde" <[email protected]>:
> > > >
> > > >> Seems to me the biggest problem is to make drill understand the nested
> > > >> structure of an xml document. That work has been done for json, so let's
> > > >> build on it. Suppose there was a translator that converted xml to json
> > > >> (adding attributes for things that json lacks, such as namespaces, text,
> > > >> element tags). Drill knows how to handle json, even if it is a bit verbose.
> > > >> The translator could be applied on the fly.
> > > >>
> > > >> Julian
> > > >>
> > > >>
> > > >>
> > > >> Sent from my iPad
> > > >>>> On Oct 16, 2015, at 2:31 PM, Stefán Baxter <
> > [email protected]
> > > >
> > > >>> wrote:
> > > >>>
> > > >>> Hi,
> > > >>>
> > > >>> It's not possible but there has been some talk here about supporting it.
> > > >>> If I remember correctly it's rather complicated and not really feasible.
> > > >>> (I'm just a newbie so don't take my words for it)
> > > >>>
> > > >>>
> > > >>> Regards,
> > > >>> -Stefan
> > > >>>
> > > >>> On Fri, Oct 16, 2015 at 8:54 PM, Daniel Ajo <
> > > [email protected]
> > > >>>
> > > >>> wrote:
> > > >>>
> > > >>>> Hey there,
> > > >>>>
> > > >>>> I was wondering if it is possible to query XML files using Apache Drill?
> > > >>>>
> > > >>>> I see there are several formats, and maybe it would work using an xpath
> > > >>>> query of some sorts, but just wondering if it would work to directly query
> > > >>>> it using some sort of plug-in.
> > > >>>>
> > > >>>> Well, let me know,
> > > >>>>
> > > >>>> Daniel Ajo