Re: Apache Drill

Ted Dunning Sun, 18 Oct 2015 17:53:56 -0700

Inline

On Sun, Oct 18, 2015 at 11:37 AM, Julian Hyde <jh...@apache.org> wrote:


> ...
> My proposed “solution” — and I suspect you’re not going to like it — is to
> ignore, for now, harder XML problems and focus on the easier ones.


Hmm... I think that this may or may not be easy. But it is really important.


> A lot of XML documents do not have repeating scalar values. They are
> collections of records, perhaps with nested records or nested collections
> of records.


The scalar-ness of my example was just a simplification. The same problem
occurs every time there is a list that sometimes contains 1 element.
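
To make the one-element case concrete, here is a minimal sketch in Python. The function `naive_to_json` and the sample documents are hypothetical illustrations, not anything from Drill; the sketch assumes a mapper that only turns a tag into a list once it sees the tag repeat.

```python
import xml.etree.ElementTree as ET

def naive_to_json(elem):
    """Hypothetical naive mapper: a child tag becomes a list only if it repeats."""
    children = list(elem)
    if not children:
        # Leaf element: keep its text, discarding surrounding whitespace.
        return (elem.text or "").strip()
    out = {}
    for child in children:
        value = naive_to_json(child)
        if child.tag in out:
            # Second occurrence: promote the existing value to a list.
            if not isinstance(out[child.tag], list):
                out[child.tag] = [out[child.tag]]
            out[child.tag].append(value)
        else:
            out[child.tag] = value
    return out

two = ET.fromstring(
    "<dept><employee><name>Ann</name></employee>"
    "<employee><name>Bob</name></employee></dept>"
)
one = ET.fromstring("<dept><employee><name>Ann</name></employee></dept>")

many_shape = naive_to_json(two)    # "employee" maps to a list of maps
single_shape = naive_to_json(one)  # "employee" maps to a single map
```

The two results have different types under the same key, so a query written against one document breaks on the other. Nothing about scalars is involved.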


> Whitespace can be safely thrown away. Namespaces are not used.


Fine.


> A lot of data is in XML format because XML was the only option considered,
> not because the data structure pushed the limits of what XML’s rich model
> can express.
>

True.


> I think 90% of cases can be handled using a simple XML-to-JSON mapper that
> takes hints such as that the “employee” tag is to become a list of JSON
> maps and the “salary” and “name” tags are to be treated as attributes.
>

Great.

The real question is whether or not the XML community already has such a
hinting mechanism.  Or is Drill about to reinvent that?
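
For reference, the kind of hinted mapper described in the quoted text could look roughly like the sketch below. The names `hinted_to_json`, `list_tags`, and `scalar_tags` are assumptions made up for illustration; this is not an existing Drill API, nor an established XML hinting format. It also assumes, as above, that whitespace is discarded and namespaces are not used.

```python
import xml.etree.ElementTree as ET

def hinted_to_json(elem, list_tags, scalar_tags):
    """Hypothetical hinted mapper.

    list_tags:   tags that always become a list, even with one occurrence.
    scalar_tags: tags whose text becomes a plain field of the parent map.
    """
    if elem.tag in scalar_tags or len(elem) == 0:
        # Scalar hint or leaf element: keep only the stripped text.
        return (elem.text or "").strip()
    out = {}
    for child in elem:
        value = hinted_to_json(child, list_tags, scalar_tags)
        if child.tag in list_tags:
            # The hint fixes the shape regardless of how many elements appear.
            out.setdefault(child.tag, []).append(value)
        else:
            out[child.tag] = value
    return out

doc = ET.fromstring(
    "<dept><employee><name>Ann</name><salary>100</salary></employee></dept>"
)
result = hinted_to_json(doc, list_tags={"employee"}, scalar_tags={"name", "salary"})
```

With the hint in place, a single `employee` element still yields a one-element list, so the result has the same shape whether there is one record or many.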


>
> I really think that if we focus on the harder cases we’ll end up with the
> wrong solution.
>

No doubt.  This isn't one of those.
