Hi Ted,

I like the "schema auto-detect" idea.

As we discussed in a prior thread, caching the schema is a nice add-on once we
have defined the schema-on-read mechanism. Maybe we first get it to work with a
user-provided schema. Then, as an enhancement, we offer to infer the schema by
scanning data.

There are some ambiguities that schema inference can't resolve: given the records
{x: "1002"} and {x: 1003}, should x be an Int or a Varchar?

Still, if Drill could provide a guess at the schema, and the user could refine
it, we'd have a very elegant solution.
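To make the ambiguity concrete, here is a minimal Python sketch of the "Drill guesses, the user refines" flow. It is not Drill code: the type names, the hypothetical `infer_schema` and `resolve_schema` helpers, and the policy of refusing to guess when a field shows more than one type are all assumptions for illustration. A user-supplied type always takes precedence, which also covers the "user-provided schema first" step.

```python
from typing import Any, Dict, Iterable, List, Optional, Set


def sql_type(value: Any) -> str:
    """Map a parsed JSON value to a coarse SQL type name (illustrative mapping)."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "INT"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, str):
        return "VARCHAR"
    return "ANY"  # maps, lists, etc. are out of scope for this sketch


def infer_schema(records: Iterable[Dict[str, Any]]) -> Dict[str, Set[str]]:
    """Scan records and collect every type observed for each top-level field."""
    observed: Dict[str, Set[str]] = {}
    for record in records:
        for field, value in record.items():
            if value is None:
                continue  # a null says nothing about the field's type
            observed.setdefault(field, set()).add(sql_type(value))
    return observed


def resolve_schema(observed: Dict[str, Set[str]],
                   user_hints: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Pick one type per field: user hints win, single observed types are
    accepted, and conflicting types are reported instead of guessed."""
    schema: Dict[str, str] = dict(user_hints or {})
    unresolved: List[str] = []
    for field, types in observed.items():
        if field in schema:
            continue  # the user already decided this field
        if len(types) == 1:
            schema[field] = next(iter(types))
        else:
            unresolved.append(f"{field}: {sorted(types)}")
    if unresolved:
        raise ValueError("ambiguous fields, please supply a type: "
                         + "; ".join(unresolved))
    return schema


records = [{"x": "1002"}, {"x": 1003}]
observed = infer_schema(records)  # x was seen as both VARCHAR and INT
print(resolve_schema(observed, {"x": "VARCHAR"}))  # {'x': 'VARCHAR'}
```

Without the hint, `resolve_schema(observed)` raises a `ValueError` naming `x`, which is exactly the point where the user would step in to refine Drill's guess.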


Thanks,
- Paul

 

    On Wednesday, August 15, 2018, 5:35:06 PM PDT, Ted Dunning 
<ted.dunn...@gmail.com> wrote:  
 
 This is a bold statement.

And there are variants of it that could give users nearly the same
experience that we have now. For instance, we could cache discovered schemas
for old files and discover (and cache) the schema for any new file we see
before actually running a query. That gives us pretty much the
flexibility of schema on read without as much of the burden.
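A rough Python sketch of that variant: schemas are discovered per file, cached, and reused until the file changes, and any new or modified file is scanned before the query starts. The `SchemaCache` class, the `discover` callback, and the choice of modification time plus size as the invalidation key are assumptions for illustration, not an existing Drill API.

```python
import os
from typing import Callable, Dict, Iterable, Tuple

Schema = Dict[str, str]


class SchemaCache:
    """Per-file schema cache, invalidated when a file's mtime or size changes.

    `discover` is a hypothetical callable that scans one file and returns its
    schema; in a real engine it would be the expensive inference pass.
    """

    def __init__(self, discover: Callable[[str], Schema]) -> None:
        self._discover = discover
        # path -> ((mtime_ns, size), schema discovered for that version)
        self._entries: Dict[str, Tuple[Tuple[int, int], Schema]] = {}

    def schema_for(self, path: str) -> Schema:
        st = os.stat(path)
        version = (st.st_mtime_ns, st.st_size)
        cached = self._entries.get(path)
        if cached is not None and cached[0] == version:
            return cached[1]  # old, unchanged file: no scan needed
        schema = self._discover(path)  # new or modified file: scan it once
        self._entries[path] = (version, schema)
        return schema

    def prepare_query(self, paths: Iterable[str]) -> Dict[str, Schema]:
        """Ensure every input file has a known schema before the query runs."""
        return {path: self.schema_for(path) for path in paths}
```

Keying on mtime and size is cheap but heuristic: a file rewritten with the same size within the filesystem's timestamp granularity would keep a stale schema, so a content hash is safer at the cost of reading the file.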



On Wed, Aug 15, 2018 at 5:02 PM weijie tong <tongweijie...@gmail.com> wrote:

> Hi all:
>  I hope this statement does not seem too rash to you.
>  Drill claims to be a schema-free distributed SQL engine. Supporting that
> takes a lot of work in the execution engine, especially for JSON-like
> storage formats. It makes bugs easier to introduce and the code logic
> uglier. I wonder whether we should still insist on this, since we are
> designing the metadata system with DRILL-6552.
>    Traditionally, people are used to designing their table schema before
> firing a SQL query. I don't think skipping that step saves people much
> time. The popularity of other systems like Spark is not due to a lack of
> schema declarations. I think we should be brave enough to make the right
> decision about whether to keep insisting on this feature, which seems not
> so important but is a burden.
>    Thanks.
>
  
