@Jacques - I might be interested, but I am new to Drill - just saw the demo
at Apache: Big Data EU and felt that what we do in MetaModel and what you
do in Drill is hugely complementary. But I need someone to help me get
started and thought this might be a good spot for it.

@Ted - The mapping is specified as shown on the wiki page linked at [1].
The first argument of the XmlSaxTableDef instantiation defines the scope of
a record. I agree that every <record> in your example should be a record in
the dataset as well. Today our mapping has no function for mapping a field
that represents the list of <item>s, but that's a function we could add IMO.
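
To illustrate the idea of one row per <record> with a list-valued field for
its <item>s, here is a minimal sketch. It is not MetaModel code and does not
use XmlSaxTableDef; it uses Python's standard XML parser, and the <root>
wrapper element, the function name and the "row_id" column are assumptions
made only for this example.

```python
import xml.etree.ElementTree as ET

# Ted's two example records, wrapped in a hypothetical <root> element so
# that the snippet is well-formed XML.
XML = (
    "<root>"
    "<record><item>1</item></record>"
    "<record><item>2</item><item>3</item></record>"
    "</root>"
)


def records_with_item_lists(xml_text):
    """Map every <record> to one row; "item" is always a list."""
    root = ET.fromstring(xml_text)
    rows = []
    for row_id, record in enumerate(root.findall("record")):
        # Collect all <item> children, even when there is only one, so a
        # single-item record and a multi-item record share the same type.
        items = [int(item.text) for item in record.findall("item")]
        rows.append({"row_id": row_id, "item": items})
    return rows


rows = records_with_item_lists(XML)
```

Because the mapping declares "item" as a list up front, the first record
comes out as a one-element list rather than a scalar, which is the ambiguity
Ted describes further down in the thread.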

What I mean by mapping two tables is to have two XmlSaxTableDef instances,
one for <record> and one for <item>. Then you could essentially join them
a la:

  SELECT ... FROM /record r INNER JOIN /record/item i ON r.row_id = i.index(/record)

But I think this approach is probably less desirable given the way you guys
think about XML documents as a single table/source. In MetaModel we allow
for both use cases, but I guess most users would also find it more natural
to just have a single table mapped.
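
The two-table mapping and the join can be sketched as follows. Again this is
only an illustration in plain Python, not MetaModel's API: the <root>
wrapper, the function names and the "record_index" and "value" columns are
hypothetical, with "record_index" standing in for the index(/record)
expression in the query above.

```python
import xml.etree.ElementTree as ET

# Same two example records, wrapped in a hypothetical <root> element.
XML = (
    "<root>"
    "<record><item>1</item></record>"
    "<record><item>2</item><item>3</item></record>"
    "</root>"
)


def build_tables(xml_text):
    """Build one table of <record> rows and one table of <item> rows."""
    root = ET.fromstring(xml_text)
    record_table, item_table = [], []
    for row_id, record in enumerate(root.findall("record")):
        record_table.append({"row_id": row_id})
        for item in record.findall("item"):
            # Each item row remembers the index of its parent record.
            item_table.append({"record_index": row_id, "value": int(item.text)})
    return record_table, item_table


def inner_join(record_table, item_table):
    """Inner join on record.row_id = item.record_index."""
    known_ids = {r["row_id"] for r in record_table}
    return [
        {"row_id": i["record_index"], "item": i["value"]}
        for i in item_table
        if i["record_index"] in known_ids
    ]


record_table, item_table = build_tables(XML)
joined = inner_join(record_table, item_table)
```

The joined result has one row per item, so it carries the same information
as mapping <item> as the record granularity, just assembled at query time.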

Yes, MetaModel's way of coping with this is to map towards a relational
model. We do offer richer data types such as list and key-value map to
allow nested structures within a record. But MetaModel's row format is
bound to a set of columns, not to a document. Basically that has been our
way of making SQL queries fit well with the underlying datastore.

[1] http://wiki.apache.org/metamodel/examples/XmlTableMapping

2015-10-19 2:56 GMT+02:00 Ted Dunning <ted.dunn...@gmail.com>:

> Kasper,
>
> How is the mapping you suggest specified?
>
> In my example, I meant for there to be many records in a file and each
> record element to be a record insofar as Drill is concerned.  I also didn't
> include other information that presumably would make it more interesting to
> talk about a record element as a unit.
>
> Your suggestion (1) is essentially to denest the records, but that loses
> the nice hierarchical structure expressed in the original that so easily
> could be expressed in the JSON data model.
>
> For your option (2), what do you mean by map 2 tables?  Does MetaModel
> inherently assume that all output is purely relational?
>
>
>
>
> On Sun, Oct 18, 2015 at 1:18 PM, Kasper Sørensen <
> i.am.kasper.soren...@gmail.com> wrote:
>
> > Hi Ted,
> >
> > Actually in MetaModel you then have two choices with your mapping to
> > table format.
> >
> > 1) Either map the "item" as the granularity of a record. That way you
> > will get three rows - one for each item. On the last two of the rows you
> > would have the same values for any element that is registered at the
> > <record> scope.
> >
> > 2) You can also map 2 tables instead - one for <record> and one for
> > <item> and then join them as you like.
> >
> >
> > 2015-10-18 20:24 GMT+02:00 Ted Dunning <ted.dunn...@gmail.com>:
> >
> > > Kasper,
> > >
> > > This might work.
> > >
> > > One issue that I see is that MetaModel seems to take a very XML-centric
> > > view of things while Drill takes a pretty JSON view of things.
> > >
> > > The point at which I think that this might cause problems is that Drill
> > > currently has trouble when it sees records like
> > >
> > > <record><item>1</item></record>
> > > <record><item>2</item><item>3</item></record>
> > >
> > > This is fine as far as XML is concerned, but if you think about it in
> > > terms of JSON, it is probably best to view these records as
> > >
> > > {"item":[1]}
> > > {"item":[2,3]}
> > >
> > > Unfortunately, from the first record, there is no way to tell that it
> > > should not be viewed as
> > >
> > > {"item":1}
> > >
> > > Do you have a suggestion that would help with this?
> > >
> > >
> > > On Sun, Oct 18, 2015 at 8:41 AM, Kasper Sørensen <
> > > i.am.kasper.soren...@gmail.com> wrote:
> > >
> > > > Hi there,
> > > >
> > > > Sorry for barging in, but maybe this is a place where Drill and
> > > > MetaModel could benefit from each other? We've considered that before
> > > > at least ...
> > > >
> > > > MetaModel already has support for both DOM and SAX based XML querying.
> > > > They basically inherit some characteristics from DOM and SAX
> > > > respectively:
> > > >
> > > >  - In the DOM variant we can infer a schema and all the user has to do
> > > >    is select an XML file/resource anywhere.
> > > >  - In the SAX variant the user has to specify which paths in the XML
> > > >    document should represent logical "tables" and what paths represent
> > > >    their columns.
> > > >
> > > > See [1] for more info. Hope this might be of interest to integrate
> > > > into Drill?
> > > >
> > > > Best regards,
> > > > Kasper Sørensen (from the MetaModel project)
> > > >
> > > > [1] http://wiki.apache.org/metamodel/examples/XmlTableMapping
> > > >
> > > > 2015-10-18 0:35 GMT+02:00 Magnus Pierre <mpie...@maprtech.com>:
> > > >
> > > > > > Well, very few lines of code imho. And simple. Been able to parse
> > > > > > pretty deep structures with no issues so far. Performance? 10-15
> > > > > > 5 MB XML files in less than a second on my laptop, but then I run
> > > > > > it using Storm with some parallelism in place. Don't know if it's
> > > > > > good or bad. I'll share the code next time I use a computer. You
> > > > > > don't need to use it, but it works at least.
> > > > >
> > > > > /M
> > > > > > On 17 Oct 2015 10:43 PM, "Matt Burgess" <mattyb...@gmail.com> wrote:
> > > > >
> > > > > > > If the converter is clean and performant then I'm sure the
> > > > > > > community (including me) is interested :)
> > > > > > >
> > > > > > > However I wonder if Drill can afford to add a translation layer
> > > > > > > between data formats. Could we be better served with similar
> > > > > > > parsing in Drill for XML as we do for JSON, or can it be pushed
> > > > > > > down far enough (to the parser) to not make a noticeable
> > > > > > > difference (which is what I think Julian is implying)?
> > > > > >
> > > > > > Sent from my iPhone
> > > > > >
> > > > > > > > On Oct 17, 2015, at 1:41 PM, Magnus Pierre <mpie...@maprtech.com> wrote:
> > > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > > Just wrote a simple SAX implementation that converts XML to
> > > > > > > > JSON and that is able to deal with decently complex XML
> > > > > > > > documents, which I currently use in Storm. Takes attributes,
> > > > > > > > and everything.
> > > > > > > >
> > > > > > > > I can share it with the community if interesting.
> > > > > > > >
> > > > > > > > /Magnus
> > > > > > > > On 17 Oct 2015 7:02 PM, "Julian Hyde" <jul...@hydromatic.net> wrote:
> > > > > > >
> > > > > > > >> Seems to me the biggest problem is to make Drill understand the
> > > > > > > >> nested structure of an XML document. That work has been done for
> > > > > > > >> JSON, so let's build on it. Suppose there was a translator that
> > > > > > > >> converted XML to JSON (adding attributes for things that JSON
> > > > > > > >> lacks, such as namespaces, text, element tags). Drill knows how
> > > > > > > >> to handle JSON, even if it is a bit verbose. The translator could
> > > > > > > >> be applied on the fly.
> > > > > > >>
> > > > > > >> Julian
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >> Sent from my iPad
> > > > > > > >>> On Oct 16, 2015, at 2:31 PM, Stefán Baxter <ste...@activitystream.com> wrote:
> > > > > > >>>
> > > > > > >>> Hi,
> > > > > > >>>
> > > > > > > >>> It's not possible but there has been some talk here about
> > > > > > > >>> supporting it. If I remember correctly it's rather complicated
> > > > > > > >>> and not really feasible. (I'm just a newbie so don't take my
> > > > > > > >>> word for it)
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> Regards,
> > > > > > >>> -Stefan
> > > > > > >>>
> > > > > > > >>> On Fri, Oct 16, 2015 at 8:54 PM, Daniel Ajo <daniel....@abarcahealth.com> wrote:
> > > > > > >>>
> > > > > > >>>> Hey there,
> > > > > > >>>>
> > > > > > > >>>> I was wondering if it is possible to query XML files using
> > > > > > > >>>> Apache Drill?
> > > > > > > >>>>
> > > > > > > >>>> I see there are several formats, and maybe it would work using
> > > > > > > >>>> an XPath query of some sort, but just wondering if it would
> > > > > > > >>>> work to directly query it using some sort of plug-in.
> > > > > > >>>>
> > > > > > >>>> Well, let me know,
> > > > > > >>>>
> > > > > > >>>> Daniel Ajo
> > > > > > >>>> *********************************************************
> > > > > > >> CONFIDENTIALITY
> > > > > > >>>> NOTE: This electronic transmission contains information
> > > belonging
> > > > to
> > > > > > >> Abarca
> > > > > > >>>> Health LLC, which is confidential or legally privileged. If
> > you
> > > > are
> > > > > > not
> > > > > > >> the
> > > > > > >>>> intended recipient, please immediately advise the sender by
> > > reply
> > > > > > >> e-mail or
> > > > > > >>>> telephone that this message has been inadvertently
> transmitted
> > > to
> > > > > you
> > > > > > >> and
> > > > > > >>>> delete this e-mail from your system. If you have received
> this
> > > > > > >> transmission
> > > > > > >>>> in error, you are hereby notified that any disclosure,
> > copying,
> > > > > > >>>> distribution or the taking of any action in reliance on the
> > > > contents
> > > > > > of
> > > > > > >> the
> > > > > > >>>> information is strictly prohibited.
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
>
