Hi dev,

Currently,  we are used to creating a table by listing all physical columns or 
using like syntax to reuse the table schema in Catalogs.  
However, in our company there are many cases that the messages in the external 
systems are with very complex schema. The worst
case is that some protobuf data has even thousands of fields in it. 

In these cases, listing fields in the DDL will be a very hard work. Creating 
and updating such complex schema in Catalogs will also cost a lot.
Therefore, I’d like to introduce an ability for detecting table schema from 
external files in DDL. 

A good precedent from SnowFlake[1] works like below:

CREATE TABLE mytable
  USING TEMPLATE (
    SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*))
      FROM TABLE(
        INFER_SCHEMA(
          LOCATION=>'@mystage/json/',
          FILE_FORMAT=>'my_json_format'
        )
      ));

The INFER_SCHEMA is a table function to 'automatically detects the file 
metadata schema in a set of staged data files that contain 
semi-structured data and retrieves the column definitions.’ The files can be in 
Parquet, Avro, ORC, JSON, and CSV.

We don’t need to follow the syntax, but the functionality is exactly what I 
want. In addition, the file can be more than just semi-structured data
file. It can be metadata file. For example, a .proto file, a .thrift file.

As it will be a big feature which deserves a FLIP to describe it in detail. I'm 
forward to your feedback and suggestions before I start to do it.

Best,
Yisha

[1]https://docs.snowflake.com/en/sql-reference/functions/infer_schema 
<https://docs.snowflake.com/en/sql-reference/functions/infer_schema>

Reply via email to