Re: flink1.16 sql gateway hive2

2023-03-26 Thread Shengkai Fang
Fang Yong is correct. We have also added documentation on how to configure the hiveserver2 endpoint [1].

[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/hive-compatibility/hiveserver2/#setting-up



Re: flink1.16 sql gateway hive2

2023-03-26 Thread Shammon FY
Hi

To start a gateway with the hiveserver2 protocol, you need to put the flink-connector-hive_${scala.binary.version} jar into the gateway's lib directory.

Best,
Shammon FY


On Sun, Mar 26, 2023 at 12:07 PM guanyq  wrote:

> I started Flink and Hive locally, but hit the following exception when starting the SQL gateway. Are any other steps required?
> ./bin/sql-gateway.sh start-foreground
> -Dsql-gateway.endpoint.type=hiveserver2
> -Dsql-gateway.endpoint.hiveserver2.catalog.hive-conf-dir=/usr/local/app/apache-hive-3.1.2-bin/conf
>
>
> Exception details:
>
> Available factory identifiers are:
> rest
> at
> org.apache.flink.table.factories.FactoryUtil.discoverFactory(FactoryUtil.java:545)
> ~[flink-table-api-java-uber-1.16.0.jar:1.16.0]
> at
> org.apache.flink.table.gateway.api.endpoint.SqlGatewayEndpointFactoryUtils.createSqlGatewayEndpoint(SqlGatewayEndpointFactoryUtils.java:65)
> ~[flink-table-api-java-uber-1.16.0.jar:1.16.0]
> at org.apache.flink.table.gateway.SqlGateway.start(SqlGateway.java:72)
> [flink-sql-gateway-1.16.0.jar:1.16.0]
> at
> org.apache.flink.table.gateway.SqlGateway.startSqlGateway(SqlGateway.java:118)
> [flink-sql-gateway-1.16.0.jar:1.16.0]
> at org.apache.flink.table.gateway.SqlGateway.main(SqlGateway.java:98)
> [flink-sql-gateway-1.16.0.jar:1.16.0]
> Exception in thread "main"
> org.apache.flink.table.gateway.api.utils.SqlGatewayException: Failed to
> start the endpoints.
> at org.apache.flink.table.gateway.SqlGateway.start(SqlGateway.java:79)
> at
> org.apache.flink.table.gateway.SqlGateway.startSqlGateway(SqlGateway.java:118)
> at org.apache.flink.table.gateway.SqlGateway.main(SqlGateway.java:98)
> Caused by: org.apache.flink.table.api.ValidationException: Could not find
> any factory for identifier 'hiveserver2' that implements
> 'SqlGatewayEndpointFactory' in the classpath.
> Available factory identifiers are:
> rest
> at
> org.apache.flink.table.factories.FactoryUtil.discoverFactory(FactoryUtil.java:545)
> at
> org.apache.flink.table.gateway.api.endpoint.SqlGatewayEndpointFactoryUtils.createSqlGatewayEndpoint(SqlGatewayEndpointFactoryUtils.java:65)
> at org.apache.flink.table.gateway.SqlGateway.start(SqlGateway.java:72)
> ... 2 more
>
>


Re: flink watermark 乱序数据问题

2023-03-26 Thread Shammon FY
Hi

Using withTimestampAssigner only defines how timestamps are extracted for generating watermarks; it does not affect the data stream itself. Whether data beyond the specified bound is processed can be controlled when defining the window: use allowedLateness to define the maximum allowed lateness, and window data arriving later than that is dropped directly.

Best,
Shammon FY

On Sat, Mar 25, 2023 at 12:28 AM crazy <2463829...@qq.com.invalid> wrote:

> Hi, in the program below, when Flink builds the watermark strategy, can the out-of-orderness
> duration given to forBoundedOutOfOrderness cause data loss? For example, if a record's event
> time is more than 5 ms out of order, will it still enter streamTS?
>
>
> SingleOutputStreamOperator<ClickEvent> streamTS = mySource.assignTimestampsAndWatermarks(
>     WatermarkStrategy.<ClickEvent>forBoundedOutOfOrderness(Duration.ofMillis(5))
>         .withTimestampAssigner(
>             new SerializableTimestampAssigner<ClickEvent>() {
>                 @Override
>                 public long extractTimestamp(ClickEvent event, long recordTimestamp) {
>                     return event.getDateTime();
>                 }
>             }));
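To make Shammon's point concrete, here is a minimal, self-contained simulation of a bounded-out-of-orderness watermark (plain Java, no Flink dependency; Flink's real generator emits maxTimestamp - bound - 1, simplified here). A record whose timestamp is already behind the watermark is "late", but the watermark itself never drops it; only a window without sufficient allowedLateness will.

```java
// Plain-Java sketch (no Flink dependency) of how a bounded-out-of-orderness
// watermark classifies records. The event timestamps and 5 ms bound mirror
// the question above.
public class WatermarkSim {
    static final long BOUND_MS = 5;

    // A record is "late" if the watermark has already advanced past its timestamp.
    static boolean isLate(long ts, long currentWatermark) {
        return ts <= currentWatermark;
    }

    public static void main(String[] args) {
        long maxTs = Long.MIN_VALUE;
        long watermark = Long.MIN_VALUE;
        for (long ts : new long[] {100, 103, 110, 104, 101}) {
            boolean late = isLate(ts, watermark);
            maxTs = Math.max(maxTs, ts);
            watermark = maxTs - BOUND_MS; // watermark trails the max timestamp seen
            System.out.println("ts=" + ts + " late=" + late + " watermark=" + watermark);
        }
        // ts=104 arrives after 110 pushed the watermark to 105, so it is late;
        // it still flows through the stream, and is only discarded by a window
        // whose allowedLateness it exceeds.
    }
}
```

So a record more than 5 ms out of order does enter streamTS; whether it is dropped is decided later, per window, by allowedLateness.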


Re: Flink CEP Resource Utilisation Optimisation

2023-03-26 Thread Abhishek Singla
Thanks, yes there were a lot of keys in the test input. In fact, every
event has a unique key which is not repeated in subsequent events.

>


Re: Flink CEP Resource Utilisation Optimisation

2023-03-26 Thread Geng Biao
I see your point. Are there lots of different keys in your test input? If that is the case, the CEP operator in 1.15.0 will not clean up some intermediate state (partial matches are cleaned due to timeout, but some computation states are leaked). This is fixed in Flink 1.16 (FLINK-31017) by Juntao Hu.
Best,
Biao



Re: Flink CEP Resource Utilisation Optimisation

2023-03-26 Thread Abhishek Singla
Thanks, Geng for the quick and actionable response.

I will definitely try this with Flink version >= 1.16.0 and get back with
the observations.

Regarding the checkpoint size issue, my concern is: if there is no more
state, shouldn't the checkpoint size be far less than 2 GB? I was
expecting it to be only a few MBs. Is there something I am missing here?

Regards,
Abhishek Singla



Re: Flink CEP Resource Utilisation Optimisation

2023-03-26 Thread Geng Biao
Hi Abhishek,

Thanks for sharing the experiment! As for the performance question, I believe you could try Flink CEP with version >= 1.16.0, which includes the optimization introduced in FLINK-23890. This optimization removes a large number of timer registrations, which can increase throughput significantly. In our own experiment, given the same parallelism settings, the same job on 1.16.0 required much less CPU than on 1.15.x (~100% -> ~30%). In fact, due to the implementation, the optimization should make CEP roughly 10x better. If you must use Flink 1.15.0 for some reason, you may cherry-pick the relevant change and recompile the CEP library yourself. The change does not depend on framework changes, so it should not require much effort.
As for the checkpoint size issue, the CEP operator stores intermediate matching results in state. So if there are no new events, there are no new partial matches and the CEP operator will not use more state.

Best,
Biao Geng



Re: Flink CEP Resource Utilisation Optimisation

2023-03-26 Thread simple
Unsubscribe



Sent from my iPhone


-- Original --
From: Abhishek Singla 

Flink CEP Resource Utilisation Optimisation

2023-03-26 Thread Abhishek Singla
Hi Team,

*Flink Version:* 1.15.0
*Java Version:* 1.8
*Standalone Cluster*
*Task Manager:* AWS EC2 of Instance Type c5n.4xlarge (vCPU 16, Memory 42
Gb, 8 slots per TM)
*CEP Scenario:* Kafka Event A followed by Kafka Event B within 10 mins
*Throughput:* 20k events per second for Event A, 0 for Kafka Event B
*State Backend:* FsStateBackend
*Unaligned Checkpoints:* Enabled
*asynchronousSnapshots:* true

While testing this scenario (Kafka Event A followed by Kafka Event B within 10 mins)
in a load-test environment, it took 20 TM nodes to achieve this throughput;
otherwise either CPU utilization would reach its peak or backpressure would be
observed because the output buffers were full. The checkpoint size is only 6.75 GB;
the state stored within the CEP operator should be much smaller, since we use
unaligned checkpointing.

I am looking for input on whether achieving this throughput really requires this
many resources, and if not, what the issue could be.

One more issue I found: if the throughput of Event A drops to zero, the checkpoint
size still stays around 2 GB even after hours. Is this expected?

Regards,
Abhishek Singla
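As an illustration of why unique keys inflate state in this scenario, the sketch below (plain Java, not the Flink CEP library) models the per-key state that a pattern "A followedBy B within 10 minutes" must keep: each unmatched A is a partial match held in state until B arrives or the window expires. With 20k unique-key A events per second and no B events, partial matches accumulate at exactly that rate.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (not the Flink CEP library) of per-key pattern state for
// "A followedBy B within 10 minutes": every A without a matching B yet is a
// partial match that must live in state until B arrives or the window times out.
public class CepStateSketch {
    static final long WITHIN_MS = 10 * 60 * 1000L;
    final Map<String, Long> partialMatches = new HashMap<>(); // key -> ts of pending A

    // Returns true when this event completes a full A->B match for its key.
    boolean onEvent(String key, String type, long ts) {
        Long aTs = partialMatches.get(key);
        if (aTs != null && ts - aTs > WITHIN_MS) { // pattern window expired
            partialMatches.remove(key);
            aTs = null;
        }
        if ("A".equals(type)) {
            partialMatches.put(key, ts); // new partial match enters state
            return false;
        }
        if ("B".equals(type) && aTs != null) {
            partialMatches.remove(key);  // match complete, state released
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        CepStateSketch cep = new CepStateSketch();
        // Unique keys, only A events: state grows with every single record.
        for (int i = 0; i < 1000; i++) {
            cep.onEvent("key-" + i, "A", i);
        }
        System.out.println("pending partial matches: " + cep.partialMatches.size());
    }
}
```

In 1.15.0, as Geng notes, the analogous computation state was not always released even after the timeout (FLINK-31017), so with unique keys the real operator can retain even more than this model suggests.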


Re: Are metadata columns required to get declared in the table's schema?

2023-03-26 Thread Hang Ruan
Hi, Jie,

If you don't need these metadata columns, you don't have to declare them in
the table; undeclared metadata columns will not be read from the source and
will not be written to the sink. You can query a table that has no metadata
column declarations. It depends on your requirements.

Best,
Hang



Re: Are metadata columns required to get declared in the table's schema?

2023-03-26 Thread Jie Han
Thank you for your response.
Actually, I noticed that the doc says 'However, declaring a metadata column in a
table's schema is optional'.
So does it mean that we don't need to declare it when we don't query it, rather
than that we can query it without declaring it?

Best,
Jay








Re: Are metadata columns required to get declared in the table's schema?

2023-03-26 Thread Hang Ruan
P.S.: By DDL I mean the CREATE TABLE statement.

Best,
Hang



Re: Are metadata columns required to get declared in the table's schema?

2023-03-26 Thread Hang Ruan
Hi, Jie,

In Flink, if we want to access a metadata column, we need to declare it in
the DDL.
More details could be found here[1].

Best,
Hang

[1]
https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/table/sql/create/#columns
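A sketch of the declaration Hang describes, using the Kafka connector's 'timestamp' metadata key as an example; the table name, fields, and connector options are illustrative assumptions:

```sql
CREATE TABLE t (
  id BIGINT,
  payload STRING,
  -- metadata column: must be declared here before it can be referenced in queries
  event_time TIMESTAMP_LTZ(3) METADATA FROM 'timestamp'
) WITH (
  'connector' = 'kafka',
  'topic' = 'events',
  'properties.bootstrap.servers' = 'localhost:9092',
  'format' = 'json'
);

-- Once declared, it is queried like any other column:
-- SELECT id, event_time FROM t;
```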

>


Re: Table API function and expression vs SQL

2023-03-26 Thread ravi_suryavanshi.yahoo.com via user
Thanks a lot Hang and Mate.

On Saturday, 25 March 2023 at 06:21:49 pm IST, Mate Czagany wrote:
 
Hi,

Please also keep in mind that restoring existing Table API jobs from savepoints
when upgrading to a newer minor version of Flink, e.g. 1.16 -> 1.17, is not
supported, as the topology might change between these versions due to optimizer
changes.
See here for more information:
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/concepts/overview/#stateful-upgrades-and-evolution

Regards,
Mate

Hang Ruan wrote on Sat, Mar 25, 2023 at 13:38:

Hi,

I think the SQL job is better. Flink SQL jobs can be easily shared with others
for debugging, and SQL is more suitable for unified stream and batch processing.
For the small portion of jobs that cannot be expressed in SQL, we choose the
DataStream API.

Best,
Hang
ravi_suryavanshi.yahoo.com via user wrote on Fri, Mar 24, 2023 at 17:25:

Hello Team,

Need your advice on which method is recommended, considering that I don't want
to change my query code when Flink is updated/upgraded to a higher version.

I am seeking advice on whether to write queries using Java code (Table API
functions and expressions) or pure SQL. I am assuming that SQL will not be
impacted by an upgrade to a higher version.

Thanks and Regards,
Ravi

  

Are metadata columns required to get declared in the table's schema?

2023-03-26 Thread Jie Han
Hi community, I want to query a metadata column from my table t. Do I need to 
declare it in the table schema explicitly? 

In Spark, metadata columns are hidden columns, which means we don't need to
declare them in the table DDL; we only reference them explicitly in the query.
For instance, select *, _metadata from t.