Hi Yisha, Thanks for driving this discussion. I think it's a valuable feature, and feel free to go ahead.
Best, Jane On Tue, Jan 16, 2024 at 6:58 PM Benchao Li <libenc...@apache.org> wrote: > Thanks Yisha for bringing up this discussion. Schema inferring is a > very interesting and useful feature, especially when it comes to > formats with well defined schemas such as Protobuf/Parquet. I'm > looking forward to the FLIP. > > Yisha Zhou <zhouyi...@bytedance.com.invalid> 于2024年1月15日周一 16:29写道: > > > > Hi dev, > > > > Currently, we are used to creating a table by listing all physical > columns or using like syntax to reuse the table schema in Catalogs. > > However, in our company there are many cases that the messages in the > external systems are with very complex schema. The worst > > case is that some protobuf data has even thousands of fields in it. > > > > In these cases, listing fields in the DDL will be a very hard work. > Creating and updating such complex schema in Catalogs will also cost a lot. > > Therefore, I’d like to introduce an ability for detecting table schema > from external files in DDL. > > > > A good precedent from SnowFlake[1] works like below: > > > > CREATE TABLE mytable > > USING TEMPLATE ( > > SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*)) > > FROM TABLE( > > INFER_SCHEMA( > > LOCATION=>'@mystage/json/', > > FILE_FORMAT=>'my_json_format' > > ) > > )); > > > > The INFER_SCHEMA is a table function to 'automatically detects the file > metadata schema in a set of staged data files that contain > > semi-structured data and retrieves the column definitions.’ The files > can be in Parquet, Avro, ORC, JSON, and CSV. > > > > We don’t need to follow the syntax, but the functionality is exactly > what I want. In addition, the file can be more than just semi-structured > data > > file. It can be metadata file. For example, a .proto file, a .thrift > file. > > > > As it will be a big feature which deserves a FLIP to describe it in > detail. I'm forward to your feedback and suggestions before I start to do > it. > > > > Best, > > Yisha > > > > [1]https://docs.snowflake.com/en/sql-reference/functions/infer_schema < > https://docs.snowflake.com/en/sql-reference/functions/infer_schema> > > > > -- > > Best, > Benchao Li >