Hi Julian,

It seems like minds are beginning to converge here. I went to the Apache Roadshow in DC, which is where I learned about DFDL, and I immediately thought this was a really interesting possibility.
I'd love to see if we could foster some collaboration between the various projects on this. From the Drill side of things, it would make it SO much easier to get Drill to read (and by extension query) various data types. I'd be willing to contribute time from the Drill side, but I will definitely need help understanding how DFDL works.
--C

> On Nov 3, 2019, at 8:01 AM, Julian Feinauer <j.feina...@pragmaticminds.de> wrote:
>
> Hi Charles,
>
> this is an interesting idea and in fact we also discussed the same matter for
> Calcite at ApacheCon NA.
> But, I agree that it would be really powerful together with a complete
> Runtime like Drill.
>
> Julian
>
> From: Charles Givre <cgi...@gmail.com>
> Reply-To: "us...@daffodil.apache.org" <us...@daffodil.apache.org>
> Date: Wednesday, October 30, 2019 at 19:38
> To: "Costello, Roger L." <coste...@mitre.org>
> Cc: "us...@daffodil.apache.org" <us...@daffodil.apache.org>
> Subject: Re: Use cases for DFDL
>
> +1
>
>> On Oct 30, 2019, at 2:36 PM, Costello, Roger L. <coste...@mitre.org> wrote:
>>
>> Excellent! Okay, here’s the use case:
>>
>> A Daffodil extension could be created for Apache Drill so that you could
>> parse any kind of data with Daffodil using a DFDL schema, and then you could
>> use ANSI SQL to query the data, join it with other data, do analysis, etc.,
>> just as if it came from a database. So, instead of parsing data to XML and
>> then using XPath to pull out data, you could instead parse data to Apache
>> Drill's data representation and then use ANSI SQL to pull out data, and even
>> combine it with other non-Daffodil data types. The advantage for this would
>> be that it would make it very easy to enable Drill to query new data types
>> (i.e., simply by using a DFDL schema) and it would enable users to easily query
>> this data without having to load it into another system.
>>
>> How’s that, Charles?
>>
>> /Roger
>>
>> From: Charles Givre <cgi...@gmail.com>
>> Sent: Wednesday, October 30, 2019 2:28 PM
>> To: Costello, Roger L. <coste...@mitre.org>
>> Cc: us...@daffodil.apache.org
>> Subject: [EXT] Re: Use cases for DFDL
>>
>> Close... One minor nit is that Drill doesn't use a "query-like" syntax. It
>> is regular ANSI SQL. IMHO, this would be a really great
>> collaboration of the two communities.
>> --C
>>
>>> On Oct 30, 2019, at 1:10 PM, Costello, Roger L. <coste...@mitre.org> wrote:
>>>
>>> Thanks again Charles. Is the following use case description correct?
>>>
>>> A Daffodil extension could be created for Apache Drill so that you could
>>> parse any kind of data with Daffodil using a DFDL schema, and then you
>>> could use Apache Drill's query-like syntax and rich capabilities to query
>>> parts of that data, join it with other data, do analysis, etc., just as if
>>> it came from a database. So, instead of parsing data to XML and then using
>>> XPath to pull out data, you could instead parse data to Apache Drill's data
>>> representation and then use Drill's rich data-query capabilities to pull out
>>> data, and even combine it with other non-Daffodil data types. The advantage
>>> for this would be that it would make it very easy to enable Drill to query
>>> new data types (i.e., simply by using a DFDL schema) and it would enable users
>>> to easily query this data without having to load it into another system.
>>>
>>> Is that correct?
>>>
>>> /Roger
>>>
>>> From: Charles Givre <cgi...@gmail.com>
>>> Sent: Wednesday, October 30, 2019 12:19 PM
>>> To: Costello, Roger L. <coste...@mitre.org>
>>> Cc: us...@daffodil.apache.org
>>> Subject: [EXT] Re: Use cases for DFDL
>>>
>>> Not exactly...
>>> I was thinking of using DFDL to enable Drill to create a schema for data
>>> that Drill cannot read. If DFDL can be used to describe the schema, a
>>> plugin could be written for Drill that mirrors this schema and ultimately
>>> reads the data files. Drill wouldn't be populating any database, but
>>> rather directly querying the data.
>>>
>>> The advantage for this would be that it would make it very easy to enable
>>> Drill to query new data types (i.e., simply by using a DFDL schema) and it
>>> would enable users to easily query this data w/o having to load it into
>>> another system. Does that make sense?
>>> -- C
>>>
>>>> On Oct 30, 2019, at 12:13 PM, Costello, Roger L. <coste...@mitre.org> wrote:
>>>>
>>>> Thanks Charles. Let me see if I understand the use case correctly.
>>>>
>>>> Use DFDL to parse data to populate a database and then use Apache Drill to
>>>> query the database.
>>>>
>>>> Is that correct?
>>>>
>>>> /Roger
>>>>
>>>> From: Charles Givre <cgi...@gmail.com>
>>>> Sent: Wednesday, October 30, 2019 12:01 PM
>>>> To: us...@daffodil.apache.org
>>>> Subject: [EXT] Re: Use cases for DFDL
>>>>
>>>> To add to this discussion, I'm the PMC chair for Apache Drill. I think a
>>>> compelling use case for DFDL would be enabling Drill to query data based
>>>> on a DFDL schema. This same concept could be applied to other SQL query
>>>> engines such as Presto and/or Impala.
>>>>
>>>> IMHO, this would facilitate the analysis of data sets supported by DFDL.
>>>> -- C
>>>>
>>>>> On Oct 30, 2019, at 11:53 AM, Costello, Roger L. <coste...@mitre.org> wrote:
>>>>>
>>>>> Thanks Mike!
>>>>> I updated the slide:
>>>>>
>>>>> <image002.png>
>>>>>
>>>>> From: Beckerle, Mike <mbecke...@tresys.com>
>>>>> Sent: Wednesday, October 30, 2019 11:45 AM
>>>>> To: us...@daffodil.apache.org
>>>>> Subject: [EXT] Re: Use cases for DFDL
>>>>>
>>>>> I would not pick on RDF data stores as the target.
>>>>>
>>>>> Parsing data to populate a database (any variety) is the actual case. The
>>>>> fact that we did do one project involving RDF is why I cited that example
>>>>> in particular, but pulling data into any data store/database begins with
>>>>> the ability to parse the data, and then process it into suitable form.
>>>>>
>>>>> This is an incomplete list so perhaps this slide title should be "Example
>>>>> Use Cases for DFDL"?
>>>>>
>>>>> ...mikeb
>>>>>
>>>>> From: Costello, Roger L. <coste...@mitre.org>
>>>>> Sent: Monday, October 28, 2019 10:41 AM
>>>>> To: us...@daffodil.apache.org
>>>>> Subject: Use cases for DFDL
>>>>>
>>>>> Hi Folks,
>>>>>
>>>>> I created a slide of use cases. See below. Do you agree with the slide?
>>>>> Anything you would add, delete, or change? /Roger
>>>>>
>>>>> <image003.png>