@Jacques - I might be interested, but I am new to Drill. I just saw the demo at Apache: Big Data EU and felt that what we do in MetaModel and what you do in Drill are hugely complementary. But I need someone to help me get started, and thought this might be a good spot for it.
@Ted - The mapping is specified as you can see in the wiki page linked at [1]. The first argument of the XmlSaxTableDef instantiation defines the scope of a record. I agree that every <record> in your example should be a record in the dataset as well. Today we don't have a function in our mapping for mapping a field that represents the list of <item>s, but that's a function we could add IMO.

What I mean by mapping two tables is to have two XmlSaxTableDef instances, one for <record> and one for <item>. Then you could essentially join them a la:

    SELECT ... FROM /record r INNER JOIN /record/item i ON r.row_id = i.index(/record)

But I think this approach is probably less desirable given the way you guys think about XML documents as a single table/source. In MetaModel we allow for both use cases, but I guess most users would also find it more natural to just have a single table mapped.

Yes, MetaModel's way of coping with this kind of stuff is to map towards a relational model. We do offer richer data types such as list and key-value map to allow nested structures within a record. But MetaModel's row format is bound to a set of columns, not a document or such. Basically that has been our way of enabling SQL queries to fit well with the underlying datastore.

[1] http://wiki.apache.org/metamodel/examples/XmlTableMapping

2015-10-19 2:56 GMT+02:00 Ted Dunning <ted.dunn...@gmail.com>:

> Kasper,
>
> How is the mapping you suggest specified?
>
> In my example, I meant for there to be many records in a file and each
> record element to be a record insofar as Drill is concerned. I also didn't
> include other information that presumably would make it more interesting to
> talk about a record element as a unit.
>
> Your suggestion (1) is essentially to denest the records, but that loses
> the nice hierarchical structure expressed in the original that so easily
> could be expressed in the JSON data model.
>
> For your option (2), what do you mean by map 2 tables?
> Does MetaModel inherently assume that all output is purely relational?
>
> On Sun, Oct 18, 2015 at 1:18 PM, Kasper Sørensen <
> i.am.kasper.soren...@gmail.com> wrote:
>
> > Hi Ted,
> >
> > Actually in MetaModel you then have two choices with your mapping to table
> > format.
> >
> > 1) Either map the "item" as the granularity of a record. That way you will
> > get three rows - one for each item. On the last of the two rows you would
> > have the same values for any element that is registered at the <record>
> > scope.
> >
> > 2) You can also map 2 tables instead - one for <record> and one for <item>
> > and then join them as you like.
> >
> > 2015-10-18 20:24 GMT+02:00 Ted Dunning <ted.dunn...@gmail.com>:
> >
> > > Kasper,
> > >
> > > This might work.
> > >
> > > One issue that I see is that Metamodel seems to take a very XML centric
> > > view of things while Drill takes a pretty JSON view of things.
> > >
> > > The point at which I think that this might cause problems is that Drill
> > > currently has troubles when it sees a records like
> > >
> > > <record><item>1</item></record>
> > > <record><item>2</item><item>3</item></record>
> > >
> > > This is fine as far as XML is concerned, but if you think about it in terms
> > > of JSON, it is probably best to view these records as
> > >
> > > {"item":[1]}
> > > {"item":[2,3]}
> > >
> > > Unfortunately, from the first record, there is no way to tell that it
> > > should not be viewed as
> > >
> > > {"item":1}
> > >
> > > Do you have a suggestion that would help with this?
> > >
> > > On Sun, Oct 18, 2015 at 8:41 AM, Kasper Sørensen <
> > > i.am.kasper.soren...@gmail.com> wrote:
> > >
> > > > Hi there,
> > > >
> > > > Sorry for barging in, but maybe this is a place where Drill and MetaModel
> > > > could benefit from each other? We've considered that before at least ...
> > > >
> > > > MetaModel already has support for both DOM and SAX based XML querying.
> > > > They basically inherit some characteristics from DOM and SAX respectively:
> > > >
> > > > - In the DOM variant we can infer a schema and all the user has to do is
> > > > select a XML file/resource anywhere.
> > > > - In the SAX variant the user has to specify which paths in the XML
> > > > document should represent logical "tables" and what paths represent their
> > > > columns.
> > > >
> > > > See [1] for more info. Hope this might be of interest to integrate into
> > > > Drill?
> > > >
> > > > Best regards,
> > > > Kasper Sørensen (from the MetaModel project)
> > > >
> > > > [1] http://wiki.apache.org/metamodel/examples/XmlTableMapping
> > > >
> > > > 2015-10-18 0:35 GMT+02:00 Magnus Pierre <mpie...@maprtech.com>:
> > > >
> > > > > Well, very few lines of code imho. And simple. Been able to parse pretty
> > > > > deep structures with no issues so far. Performance? 10-15 5mb xml's in less
> > > > > than a second on my laptop but then I run it using Storm with some
> > > > > parallelism in place. Don't know if it's good or bad. I'll share the code
> > > > > next time I use computer. You don't need to use it, but it works at least.
> > > > >
> > > > > /M
> > > > > On 17 Oct 2015 10:43 PM, "Matt Burgess" <mattyb...@gmail.com> wrote:
> > > > >
> > > > > > If the converter is clean and performant then I'm sure the community
> > > > > > (including me) is interested :)
> > > > > >
> > > > > > However I wonder if Drill can afford to add a translation layer between
> > > > > > data formats, could we be better served with similar parsing in Drill for
> > > > > > XML as we do for JSON, or can it be pushed down far enough (to the parser)
> > > > > > to not make a noticeable difference (which is what I think Julian is
> > > > > > implying)?
> > > > > >
> > > > > > Sent from my iPhone
> > > > > >
> > > > > > > On Oct 17, 2015, at 1:41 PM, Magnus Pierre <mpie...@maprtech.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > Just wrote a simple sax implementation that converts xml to json and that
> > > > > > > is able to deal with decently complex xml's, that I currently use in Storm.
> > > > > > > Takes attributes, and everything.
> > > > > > >
> > > > > > > I can share it with the community if interesting.
> > > > > > >
> > > > > > > /Magnus
> > > > > > > On 17 Oct 2015 7:02 PM, "Julian Hyde" <jul...@hydromatic.net> wrote:
> > > > > > >
> > > > > > >> Seems to me the biggest problem is to make drill understand the nested
> > > > > > >> structure of an xml document. That work has been done for json, so let's
> > > > > > >> build on it. Suppose there was a translator that converted xml to json
> > > > > > >> (adding attributes for things that json lacks, such as namespaces, text,
> > > > > > >> element tags). Drill knows how to handle json, even if it is a bit verbose.
> > > > > > >> The translator could be applied on the fly.
> > > > > > >>
> > > > > > >> Julian
> > > > > > >>
> > > > > > >> Sent from my iPad
> > > > > > >>>> On Oct 16, 2015, at 2:31 PM, Stefán Baxter <ste...@activitystream.com>
> > > > > > >>> wrote:
> > > > > > >>>
> > > > > > >>> Hi,
> > > > > > >>>
> > > > > > >>> It's not possible but there has been some talk here about supporting it.
> > > > > > >>> If I remember correctly it's rather complicated and not really feasible.
> > > > > > >>> (I'm just a newbie so don't take my words for it)
> > > > > > >>>
> > > > > > >>> Regards,
> > > > > > >>> -Stefan
> > > > > > >>>
> > > > > > >>> On Fri, Oct 16, 2015 at 8:54 PM, Daniel Ajo <daniel....@abarcahealth.com>
> > > > > > >>> wrote:
> > > > > > >>>
> > > > > > >>>> Hey there,
> > > > > > >>>>
> > > > > > >>>> I was wondering if it is possible to query XML files using Apache Drill?
> > > > > > >>>>
> > > > > > >>>> I see there are several formats, and maybe it would work using an xpath
> > > > > > >>>> query of some sorts, but just wondering if it would work to directly query
> > > > > > >>>> it using some sort of plug-in.
> > > > > > >>>>
> > > > > > >>>> Well, let me know,
> > > > > > >>>>
> > > > > > >>>> Daniel Ajo
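
For anyone following the thread, here is a minimal sketch of the three shapes discussed above for Ted's two-record example: one row per <item>, two joinable tables, and a single table with a list-typed column. It is plain Python using the standard library, not MetaModel or Drill code, and it does not use the XmlSaxTableDef API. The <root> wrapper element, the record_id key, and all function names are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Ted's two records, wrapped in a hypothetical <root> element so that the
# snippet is a single well-formed XML document.
XML = """<root>
<record><item>1</item></record>
<record><item>2</item><item>3</item></record>
</root>"""


def load_records(xml_text):
    """Return the <record> elements that are direct children of the root."""
    return ET.fromstring(xml_text).findall("record")


def rows_per_item(records):
    """Option 1: one row per <item>, tagged with the parent record's position."""
    return [
        {"record_id": rid, "item": int(item.text)}
        for rid, rec in enumerate(records)
        for item in rec.findall("item")
    ]


def two_tables(records):
    """Option 2: one table for <record>, one for <item>, joinable on record_id."""
    record_table = [{"record_id": rid} for rid, _ in enumerate(records)]
    item_table = rows_per_item(records)
    return record_table, item_table


def list_column(records):
    """Single table where <item> is always mapped to a list, even for one value."""
    return [
        {"item": [int(item.text) for item in rec.findall("item")]}
        for rec in records
    ]


records = load_records(XML)
flat = rows_per_item(records)
record_table, item_table = two_tables(records)
nested = list_column(records)

# Inner join of the two tables on record_id (hypothetical key name).
joined = [
    {**r, **i}
    for r in record_table
    for i in item_table
    if r["record_id"] == i["record_id"]
]
```

Always building a list for <item> sidesteps the ambiguity Ted points out, since the first record comes out as {"item": [1]} rather than {"item": 1}, but it only works if the mapping knows up front that <item> may repeat.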