Re: SQL-gateway Failed to Run

2023-07-05 Thread liu ron
Hi, Xiaolong

As Shammon says, I think you should check the exception info of the Flink
cluster first to confirm the root cause.
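
One way to make "confirm the root cause" concrete is to read the `Caused by:` chain of a stack trace from the outermost exception down to the innermost one. The following is only a minimal sketch: the helper name `exception_chain` and the `SAMPLE` text are illustrative assumptions (the sample is condensed from the gateway log in this thread, with wrapped lines joined), and it is not part of Flink. Note that the gateway-side log shown in this thread may not contain the real root cause at all, which is why the JobManager log still needs to be checked.

```python
import re

# Matches the first exception line and every "Caused by:" line of a Java stack trace.
# Group 1 is the exception class, group 2 is the (optional) message.
CAUSE_RE = re.compile(r"^(?:Caused by: )?([\w.$]+(?:Exception|Error))(?::\s*(.*))?$")


def exception_chain(log_text):
    """Return [(exception_class, message), ...] from outermost to innermost."""
    chain = []
    for raw in log_text.splitlines():
        line = raw.strip()
        # Skip stack frames ("at ...") and elided frames ("... 13 more").
        if line.startswith("at ") or line.startswith("..."):
            continue
        match = CAUSE_RE.match(line)
        if match:
            chain.append((match.group(1), match.group(2) or ""))
    return chain


# Hypothetical input: a condensed copy of the trace posted in this thread.
SAMPLE = """\
org.apache.flink.table.api.TableException: Failed to execute sql
    at org.apache.flink.table.api.internal.TableEnvironmentImpl.executeQueryOperation(TableEnvironmentImpl.java:974)
Caused by: org.apache.flink.util.FlinkException: Failed to execute job 'collect'.
    at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2212)
    ... 13 more
Caused by: java.lang.RuntimeException: Error while waiting for job to be initialized
    at org.apache.flink.client.ClientUtils.waitUntilJobInitializationFinished(ClientUtils.java:151)
"""

if __name__ == "__main__":
    for depth, (cls, msg) in enumerate(exception_chain(SAMPLE)):
        print("  " * depth + cls + ": " + msg)
```

The last entry of the returned list is the innermost cause visible in that log. If it is still a generic client-side error, as it is in the sample, the next place to look is the log of the JobManager pod itself.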

Best,
Ron

On Tue, Jul 4, 2023 at 4:44 PM Shammon FY wrote:

> Hi Xiaolong,
>
> I think you may need to check the error log in the flink cluster to find
> out the root cause.
>
> Best,
> Shammon FY
>
> On Tue, Jul 4, 2023 at 3:38 PM Xiaolong Wang 
> wrote:
>
>> The flink web ui is fine until I run the Hive query. After that the flink
>> deployment is down and the web UI is not accessible.
>>
>> On Tue, Jul 4, 2023 at 9:13 AM Shammon FY  wrote:
>>
>>> Hi Xiaolong,
>>>
>>> From the exception, it seems that the Flink session cluster is not
>>> running properly. Can you open the Flink web UI and check whether
>>> everything is OK?
>>>
>>> Best,
>>> Shammon FY

Re: SQL-gateway Failed to Run

2023-07-04 Thread Shammon FY
Hi Xiaolong,

I think you may need to check the error log in the flink cluster to find
out the root cause.

Best,
Shammon FY

On Tue, Jul 4, 2023 at 3:38 PM Xiaolong Wang 
wrote:

> The flink web ui is fine until I run the Hive query. After that the flink
> deployment is down and the web UI is not accessible.
>
> On Tue, Jul 4, 2023 at 9:13 AM Shammon FY  wrote:
>
>> Hi Xiaolong,
>>
>> From the exception, it seems that the Flink session cluster is not
>> running properly. Can you open the Flink web UI and check whether
>> everything is OK?
>>
>> Best,
>> Shammon FY

Re: SQL-gateway Failed to Run

2023-07-03 Thread Shammon FY
Hi Xiaolong,

From the exception, it seems that the Flink session cluster is not
running properly. Can you open the Flink web UI and check whether
everything is OK?

Best,
Shammon FY


SQL-gateway Failed to Run

2023-07-03 Thread Xiaolong Wang
Hi,
I've been testing the Flink SQL Gateway with some simple Hive queries, but
ran into some exceptions.


Environment Description:
Runs on: Kubernetes
Deployment Mode: Session Mode (created by the flink-kubernetes-operator)
Steps to reproduce:
1. Apply a `FlinkDeployment` for a Flink session cluster to the Flink operator
```
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: flink-session-cluster-example
  namespace: xxx
spec:
  image: xxx/flink:1.17-sql-gateway-dev
  flinkVersion: v1_17
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
    pipeline.max-parallelism: "1000"
    state.backend.type: rocksdb
    state.backend.incremental: "true"
    state.checkpoints.dir: xxx
    execution.checkpointing.interval: 1m
    execution.checkpointing.timeout: 30m
    high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
    high-availability.storageDir: xxx
    akka.framesize: 20971520b
    execution.checkpointing.externalized-checkpoint-retention: RETAIN_ON_CANCELLATION
    taskmanager.memory.managed.fraction: "0.2"
    kubernetes.hadoop.conf.config-map.name: xxx
  serviceAccount: default
  podTemplate:
    apiVersion: v1
    kind: Pod
    metadata:
      name: pod-template
    spec:
      serviceAccount: default
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "4096m"
      cpu: 1
```
This image has been built with a `hadoop dependency`, and the deployment
references an existing `hadoop configmap`.

2. Log in to the JobManager pod and run the following:
`./bin/sql-gateway.sh start-foreground
-Dsql-gateway.endpoint.type=hiveserver2
-Dsql-gateway.endpoint.hiveserver2.catalog.hive-conf-dir=/opt/flink/hive-conf`

3. Start Beeline, connect to the SQL Gateway, and run a simple Hive query:
`select count(1) from simple_demo_output where dt = '2021-08-14';`

4. The SQL Gateway fails with the following logs:
```
2023-07-03 06:27:11,078 INFO  org.apache.flink.client.program.rest.RestClusterClient [] - Submitting job 'collect' (4c99c40392cb935d3df94891655d2ce5).
2023-07-03 06:27:15,092 INFO  org.apache.flink.client.program.rest.RestClusterClient [] - Successfully submitted job 'collect' (4c99c40392cb935d3df94891655d2ce5) to 'http://flink-session-cluster-example-rest.realtime-streaming:8081'.
2023-07-03 06:27:15,879 ERROR org.apache.flink.table.gateway.service.operation.OperationManager [] - Failed to execute the operation 7613f663-8641-428c-b3d2-ec77a12fa6ee.
org.apache.flink.table.api.TableException: Failed to execute sql
    at org.apache.flink.table.api.internal.TableEnvironmentImpl.executeQueryOperation(TableEnvironmentImpl.java:974) ~[flink-table-api-java-uber-1.17.1.jar:1.17.1]
    at org.apache.flink.table.api.internal.TableEnvironmentImpl.executeInternal(TableEnvironmentImpl.java:1422) ~[flink-table-api-java-uber-1.17.1.jar:1.17.1]
    at org.apache.flink.table.gateway.service.operation.OperationExecutor.executeOperation(OperationExecutor.java:431) ~[flink-sql-gateway-1.17.1.jar:1.17.1]
    at org.apache.flink.table.gateway.service.operation.OperationExecutor.executeStatement(OperationExecutor.java:195) ~[flink-sql-gateway-1.17.1.jar:1.17.1]
    at org.apache.flink.table.gateway.service.SqlGatewayServiceImpl.lambda$executeStatement$1(SqlGatewayServiceImpl.java:212) ~[flink-sql-gateway-1.17.1.jar:1.17.1]
    at org.apache.flink.table.gateway.service.operation.OperationManager.lambda$submitOperation$1(OperationManager.java:119) ~[flink-sql-gateway-1.17.1.jar:1.17.1]
    at org.apache.flink.table.gateway.service.operation.OperationManager$Operation.lambda$run$0(OperationManager.java:258) ~[flink-sql-gateway-1.17.1.jar:1.17.1]
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) [?:?]
    at java.util.concurrent.FutureTask.run(Unknown Source) [?:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) [?:?]
    at java.util.concurrent.FutureTask.run(Unknown Source) [?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) [?:?]
    at java.lang.Thread.run(Unknown Source) [?:?]
Caused by: org.apache.flink.util.FlinkException: Failed to execute job 'collect'.
    at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2212) ~[flink-dist-1.17.1.jar:1.17.1]
    at org.apache.flink.table.planner.delegation.DefaultExecutor.executeAsync(DefaultExecutor.java:95) ~[flink-table-planner_2.12-1.17.1.jar:1.17.1]
    at org.apache.flink.table.api.internal.TableEnvironmentImpl.executeQueryOperation(TableEnvironmentImpl.java:955) ~[flink-table-api-java-uber-1.17.1.jar:1.17.1]
    ... 13 more
Caused by: java.lang.RuntimeException: Error while waiting for job to be initialized
    at org.apache.flink.client.ClientUtils.waitUntilJobInitializationFinished(ClientUtils.java:151) ~[flink-dist-1.17.1.jar:1.17.1]
at