Flink 1.14 Table API & SQL: how is TTL handled for aggregations over ever-growing dimensions?

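2021-12-10 post by guanyq
Could someone please advise:
Requirement: use Flink SQL to count the orders accepted per province per day. This kind of dimension keeps growing, so how do I set a TTL so that only one week of dimension data is kept in state?
An ever-growing set of dimension keys is likely to cause an out-of-memory error. Is the Flink SQL TTL configuration documented anywhere on the official website?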
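In Flink 1.14 this is controlled by the `table.exec.state.ttl` option (default 0 ms, meaning state is never cleaned up), listed among the execution options on the Table API "Configuration" page; it can also be set programmatically with `TableConfig#setIdleStateRetention(Duration)`. It applies to all stateful operators of the job. The Java sketch below does not use Flink at all: it is a stdlib-only simulation with hypothetical names (`IdleStateTtlSketch`, `add`, `evictIdle`) showing why grouping by (day, province) plus a 7-day retention keeps state bounded. Once a day is over, its keys stop being updated and expire a week later. Flink's real cleanup is driven by its state backend, not by an eager scan like this one.

```java
import java.time.Duration;
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/**
 * Stdlib-only simulation of idle-state retention for a per-(day, province)
 * order count; NOT Flink code, all names are hypothetical. A key whose state
 * has not been updated for longer than the retention is removed, which is the
 * idea behind Flink SQL's table.exec.state.ttl.
 */
public class IdleStateTtlSketch {

    /** Count plus the time of the last update for one grouping key. */
    private static final class KeyState {
        long count;
        long lastUpdateMillis;
    }

    private final long retentionMillis;
    private final Map<String, KeyState> state = new HashMap<>();

    public IdleStateTtlSketch(Duration retention) {
        this.retentionMillis = retention.toMillis();
    }

    /** Counts one accepted order for (day, province) at time nowMillis; returns the new count. */
    public long add(LocalDate day, String province, long nowMillis) {
        evictIdle(nowMillis);
        KeyState s = state.computeIfAbsent(day + "|" + province, k -> new KeyState());
        s.count++;
        s.lastUpdateMillis = nowMillis;
        return s.count;
    }

    /** Eagerly removes every key idle for longer than the retention. */
    public void evictIdle(long nowMillis) {
        Iterator<Map.Entry<String, KeyState>> it = state.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMillis - it.next().getValue().lastUpdateMillis > retentionMillis) {
                it.remove();
            }
        }
    }

    /** Number of grouping keys currently held in state. */
    public int size() {
        return state.size();
    }

    public static void main(String[] args) {
        IdleStateTtlSketch counter = new IdleStateTtlSketch(Duration.ofDays(7));
        long dayMillis = Duration.ofDays(1).toMillis();
        LocalDate start = LocalDate.of(2021, 12, 1);
        // Ten days of orders from two provinces: without a TTL, state would hold 20 keys.
        for (int i = 0; i < 10; i++) {
            counter.add(start.plusDays(i), "Zhejiang", i * dayMillis);
            counter.add(start.plusDays(i), "Jiangsu", i * dayMillis);
        }
        System.out.println(counter.size()); // 16: only days 2..9 remain
    }
}
```

Running `main` prints 16: the keys for the first two days were last updated more than seven days before the latest event and have been evicted, so state stays at roughly one week of (day, province) keys no matter how long the job runs.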



 





 

Re: Flink local build hangs

2021-12-10 post by Yuepeng Pan



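The image didn't come through. You could upload it to an image hosting service, or paste some of the original output instead.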













On 2021-12-11 11:19:51, "Jeff" wrote:

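Following the officially recommended Maven build command, mvn install -Dfast -DskipTests -Dscala-2.12 -T 1C,
my local build always hangs at flink-table-runtime-blink without any error message, as shown in the image below.
Is there any way to fix this?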




 





 

Flink local build hangs

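2021-12-10 post by Jeff
Following the officially recommended Maven build command, mvn install -Dfast -DskipTests -Dscala-2.12 -T 1C,
my local build always hangs at flink-table-runtime-blink without any error message, as shown in the image below.
Is there any way to fix this?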




 

flinksql????????

2021-12-10 post by ??????
flinksqlA_now:AA_now??A??
 
 
// SQL
StatementSet stmtSet = tenv.createStatementSet();
stmtSet.addInsertSql(insertSqlMongoDB);
stmtSet.addInsertSql(insertSql);
stmtSet.execute();
//
/** ?? */
{
    MongoUtil2 = MongoUtil2??();
    MongoCollection<Document> oldData = instance.getCollection(db,
            "t_up_tag_data_" + mongoKey);
    MongoCollection

Re: [DISCUSS] Change some default config values of blocking shuffle

2021-12-10 post by 刘建刚
Glad to see this proposal. In our tests, we also found that, for small jobs,
the changed configs do not improve performance much, just as in your test. I
have some suggestions:

   - These configs can affect memory usage. Will the related memory
   configs be changed as well?
   - Could you share the TPC-DS results for the different configs? Even
   though the default values are changing, different users may still want to
   tune them for their own workloads, and your experience would help a lot.

Best,
Liu Jiangang

Yun Gao wrote on Fri, Dec 10, 2021 at 17:20:

> Hi Yingjie,
>
> Many thanks for drafting the FLIP and starting the discussion!
>
> I'd like to double-check taskmanager.network.sort-shuffle.min-parallelism:
> since other frameworks such as Spark use sort-based shuffle in all cases,
> is our current situation still different from theirs?
>
> Best,
> Yun


Re: [DISCUSS] Change some default config values of blocking shuffle

2021-12-10 post by Yun Gao
Hi Yingjie,

Many thanks for drafting the FLIP and starting the discussion!

I'd like to double-check taskmanager.network.sort-shuffle.min-parallelism:
since other frameworks such as Spark use sort-based shuffle in all cases, is
our current situation still different from theirs?

Best,
Yun



Re: [DISCUSS] Change some default config values of blocking shuffle

2021-12-10 post by Yingjie Cao
Hi dev & users:

I have created a FLIP [1] for it; feedback is highly appreciated.

Best,
Yingjie

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-199%3A+Change+some+default+config+values+of+blocking+shuffle+for+better+usability

Yingjie Cao wrote on Fri, Dec 3, 2021 at 17:02:

> Hi dev & users,
>
> We propose to change some default values of blocking shuffle to improve
> the out-of-the-box user experience (streaming is not affected). The
> default values we want to change are as follows:
>
> 1. Data compression
> (taskmanager.network.blocking-shuffle.compression.enabled): Currently, the
> default value is 'false'. Data compression usually reduces both disk and
> network I/O, which is good for performance, and it also saves storage
> space. We propose to change the default value to 'true'.
>
> 2. Default shuffle implementation
> (taskmanager.network.sort-shuffle.min-parallelism): Currently, the default
> value is 'Integer.MAX_VALUE', which means that by default Flink jobs always
> use hash-shuffle. However, for high parallelism, sort-shuffle is better for
> both stability and performance, so we propose to reduce the default value
> to a suitably smaller one, for example, 128. (We tested 128, 256, 512, and
> 1024 with TPC-DS, and 128 performed best.)
>
> 3. Read buffer of sort-shuffle
> (taskmanager.memory.framework.off-heap.batch-shuffle.size): Currently, the
> default value is '32M'. When the default was originally chosen, both '32M'
> and '64M' worked in tests, and we cautiously chose the smaller one.
> However, it was recently reported on the mailing list that the default
> value is not enough and causes buffer request timeouts. We have already
> created a ticket to improve this behavior, and we also propose to increase
> the default value to '64M', which should help as well.
>
> 4. Sort buffer size of sort-shuffle
> (taskmanager.network.sort-shuffle.min-buffers): Currently, the default
> value is '64', which means 64 network buffers (32k per buffer by default).
> This default is quite modest and can limit performance. We propose to
> increase it to a larger value, for example, 512 (with this value, the
> default TM and network buffer configuration can still serve more than 10
> result partitions concurrently).
>
> We have already tested these default values together on a cluster with the
> TPC-DS benchmark, and both performance and stability improved a lot. These
> changes can help improve the out-of-the-box experience of blocking shuffle.
> What do you think about these changes? Are there any concerns? If there are
> no objections, I will make these changes soon.
>
> Best,
> Yingjie
>
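The two numeric defaults in items 2 and 4 can be illustrated with a small stdlib-only Java sketch. It is not Flink code: the class and method names are hypothetical, and it assumes, as a simplification, that a blocking result partition switches to sort-shuffle when its number of subpartitions (roughly the downstream parallelism) is at least `taskmanager.network.sort-shuffle.min-parallelism`. The memory figures use the 32 KiB default buffer size stated in item 4.

```java
/**
 * Stdlib-only illustration of two proposed blocking-shuffle defaults; not Flink code.
 * Names are hypothetical. Assumption: a blocking result partition uses sort-shuffle
 * when numSubpartitions >= taskmanager.network.sort-shuffle.min-parallelism.
 */
public class BlockingShuffleDefaultsSketch {

    /** Current default of taskmanager.network.sort-shuffle.min-parallelism. */
    public static final int CURRENT_MIN_PARALLELISM = Integer.MAX_VALUE;
    /** Value proposed in the discussion. */
    public static final int PROPOSED_MIN_PARALLELISM = 128;
    /** Default network buffer size (32 KiB), as stated in item 4. */
    public static final long BUFFER_SIZE_BYTES = 32L * 1024;

    /** Returns true if the assumed threshold rule selects sort-shuffle. */
    public static boolean usesSortShuffle(int numSubpartitions, int minParallelism) {
        return numSubpartitions >= minParallelism;
    }

    /** Minimum sort buffer memory of one sort-shuffle result partition, in bytes. */
    public static long sortBufferBytes(int minBuffers) {
        return minBuffers * BUFFER_SIZE_BYTES;
    }

    private static String shuffleName(int numSubpartitions, int minParallelism) {
        return usesSortShuffle(numSubpartitions, minParallelism) ? "sort" : "hash";
    }

    public static void main(String[] args) {
        for (int p : new int[] {64, 128, 1024}) {
            System.out.printf("parallelism %d: current=%s, proposed=%s%n",
                    p, shuffleName(p, CURRENT_MIN_PARALLELISM), shuffleName(p, PROPOSED_MIN_PARALLELISM));
        }
        long mib = 1024L * 1024;
        System.out.println("min-buffers=64  -> " + sortBufferBytes(64) / mib + " MiB");  // 2 MiB
        System.out.println("min-buffers=512 -> " + sortBufferBytes(512) / mib + " MiB"); // 16 MiB
    }
}
```

With the current default (`Integer.MAX_VALUE`) the check never passes for realistic parallelism, so every case prints `hash`; with 128, the 128 and 1024 cases switch to `sort`. Raising `min-buffers` from 64 to 512 raises the minimum sort buffer of one result partition from 2 MiB to 16 MiB.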