Hi Yisha,

Thanks for driving this discussion. I think it's a valuable feature, so
please feel free to go ahead.

Best,
Jane

On Tue, Jan 16, 2024 at 6:58 PM Benchao Li <libenc...@apache.org> wrote:

> Thanks Yisha for bringing up this discussion. Schema inference is a
> very interesting and useful feature, especially when it comes to
> formats with well-defined schemas such as Protobuf/Parquet. I'm
> looking forward to the FLIP.
>
> Yisha Zhou <zhouyi...@bytedance.com.invalid> wrote on Mon, Jan 15, 2024 at 16:29:
> >
> > Hi dev,
> >
> > Currently, we create a table either by listing all physical columns or
> > by using the LIKE syntax to reuse a table schema from a Catalog.
> > However, in our company there are many cases where the messages in
> > external systems have very complex schemas. In the worst case, some
> > protobuf data has thousands of fields.
> >
> > In these cases, listing every field in the DDL is very laborious.
> > Creating and updating such complex schemas in Catalogs is also costly.
> > Therefore, I'd like to introduce the ability to detect a table schema
> > from external files in DDL.
> >
> > A good precedent from Snowflake[1] works as follows:
> >
> > CREATE TABLE mytable
> >   USING TEMPLATE (
> >     SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*))
> >       FROM TABLE(
> >         INFER_SCHEMA(
> >           LOCATION=>'@mystage/json/',
> >           FILE_FORMAT=>'my_json_format'
> >         )
> >       ));
> >
> > INFER_SCHEMA is a table function that 'automatically detects the file
> > metadata schema in a set of staged data files that contain
> > semi-structured data and retrieves the column definitions.' The files
> > can be in Parquet, Avro, ORC, JSON, or CSV format.
> >
> > We don't need to follow the syntax, but the functionality is exactly
> > what I want. In addition, the file can be more than just a
> > semi-structured data file. It can also be a metadata file, for example
> > a .proto or .thrift file.
> >
> > This will be a big feature that deserves a FLIP to describe it in
> > detail. I'm looking forward to your feedback and suggestions before I
> > start writing it.
> >
> > Best,
> > Yisha
> >
> > [1] https://docs.snowflake.com/en/sql-reference/functions/infer_schema
>
>
>
> --
>
> Best,
> Benchao Li
>
