[jira] [Commented] (FLINK-25318) Improvement of scheduler and execution for Flink OLAP

Piotr Nowojski (Jira) Thu, 16 Dec 2021 02:52:04 -0800


    [ 
https://issues.apache.org/jira/browse/FLINK-25318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460615#comment-17460615
 ]


Piotr Nowojski commented on FLINK-25318:
----------------------------------------

Hi all. Thanks for taking up this interesting initiative. So far we, the Flink 
developers, have not paid much attention to short-lived jobs/queries, and that 
has often influenced our past decisions. It would be interesting to see how 
much demand there is for such use cases in Flink and how much we can improve 
Flink in this regard.

I would like to point out two things. 

# If we care about something, it should be tested, otherwise the feature might 
be lost or accidentally removed. Here the feature is performance, and as such I 
think ideally (whenever it's feasible) every change that you make should be 
backed up by a visible benchmark improvement.
# For quite some time we have been maintaining [various micro benchmarks 
|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=115511847]. 
Since they are micro benchmarks, some of them submit a job with a small bounded 
input and simply measure how long it takes to process that input; those jobs 
are expected to finish in under 1s. This was accidentally testing OLAP use 
cases. That was not our purpose, just a side effect of trying to make the 
benchmarks run quickly. Whenever we detected a performance regression caused by 
slower startup/initialisation (for example FLINK-23593), we most often simply 
ignored it or just extended the length of the test. If OLAP support is 
something that we want to tackle seriously, it would be great to have more 
support from the OLAP devs in investigating and policing this kind of issue in 
the future. Help with that would be very welcome by the community.

> Improvement of scheduler and execution for Flink OLAP
> -----------------------------------------------------
>
>                 Key: FLINK-25318
>                 URL: https://issues.apache.org/jira/browse/FLINK-25318
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination, Runtime / Network
>    Affects Versions: 1.14.0, 1.12.5, 1.13.3
>            Reporter: Shammon
>            Priority: Major
>              Labels: Umbrella
>             Fix For: 1.15.0
>
>
> We use Flink to perform OLAP queries. We launch a Flink session cluster, 
> submit batch jobs to the cluster as OLAP queries, and fetch the jobs' 
> results. OLAP jobs are generally small queries which finish within seconds or 
> milliseconds, and users always submit multiple jobs to the session cluster 
> concurrently. We found that the QPS and latency of jobs are greatly affected 
> when tens of jobs are running, even when there is little data in each query. 
> We will give benchmark results for the latest version later.
> After discussing with [~xtsong], and thanks to his advice, we created this 
> issue to track and manage Flink OLAP related improvements. More users and 
> developers are welcome; feel free to create Flink OLAP related subtasks 
> here, thanks.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
