subject:"pyflink query statement is very slow to return data; doesn't the where clause filter the data?"

Re: pyflink query statement is very slow to return data; doesn't the where clause filter the data?

2020-12-23 Thread r pp

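Where does table a appear in the SQL statement?
Is filtering really what you are concerned about? If you know your business data well and have learned that Flink 1.11 does not filter first, why not filter the data yourself as an optimization?
If the problem is not filtering but a large table joined with a small table, or a large table joined with a large table, could you look at broadcasting or tuning the parallelism instead?

Shouldn't you analyze the business problem first, and only then check whether Flink 1.12 can solve it?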
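
The "filter it yourself" suggestion above can be sketched as follows. This is a minimal illustration, not the poster's actual setup: it uses Python's built-in sqlite3 instead of Flink so that it runs anywhere, and it assumes that columns a, b, c belong to table1 and d, e to table2 (the thread does not say). The sample rows are invented. It shows only that moving the predicates into subqueries ahead of the join returns the same rows as the original query.

```python
import sqlite3

# In-memory stand-ins for the two tables. The column-to-table mapping is an
# assumption: a, b, c in table1 and d, e in table2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (a INTEGER, b TEXT, c TEXT);
CREATE TABLE table2 (d INTEGER, e TEXT);
""")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", [
    (20160815, "103", "203"),
    (20170101, "103", "203"),
    (20150101, "103", "203"),  # outside the date range
    (20160901, "999", "203"),  # wrong value of b
])
conn.executemany("INSERT INTO table2 VALUES (?, ?)", [
    (20160815, "AC"),
    (20170101, "AC"),
    (20160901, "AC"),
    (20150101, "ZZ"),
])

# Shape of the query from the thread: join first, filter afterwards.
original = """
SELECT a, b, c
FROM table1 LEFT JOIN table2 ON a = d
WHERE b = '103' AND c = '203' AND e = 'AC'
  AND a BETWEEN 20160701 AND 20170307
ORDER BY a
"""

# Hand-filtered variant: each table is reduced before the join.
# Because the WHERE clause requires e = 'AC', unmatched left rows are
# discarded anyway, so an inner join gives the same result here.
prefiltered = """
SELECT t1.a, t1.b, t1.c
FROM (SELECT a, b, c FROM table1
      WHERE b = '103' AND c = '203'
        AND a BETWEEN 20160701 AND 20170307) AS t1
JOIN (SELECT d FROM table2 WHERE e = 'AC') AS t2
  ON t1.a = t2.d
ORDER BY t1.a
"""

rows_original = conn.execute(original).fetchall()
rows_prefiltered = conn.execute(prefiltered).fetchall()
print(rows_original == rows_prefiltered)
```

The same query shape could be tried in the Flink job. Whether it actually helps there depends on whether the connector still reads the whole table from the database, which this sketch does not address.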

肖越 <18242988...@163.com> wrote on Thursday, 2020-12-24 at 11:16 AM:

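> The connector reads the entire table from the database, and I execute:
> env.sql_query("select a , b, c from table1 left join table2 on a = d where
> b = '103' and c = '203' and e = 'AC' and a between 20160701 and 20170307
> order a")
> Table a is very large, around 10 million rows, but only 250 rows match, and running it locally takes 10 minutes~
> I have read that in Flink 1.11 the where clause does not filter the data first. Does Flink 1.12 still have this problem? How can I optimize this?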

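pyflink query statement is very slow to return data; doesn't the where clause filter the data?

2020-12-23 Thread 肖越

The connector reads the entire table from the database, and I execute:
env.sql_query("select a , b, c from table1 left join table2 on a = d where b = 
'103' and c = '203' and e = 'AC' and a between 20160701 and 20170307 order a")
Table a is very large, around 10 million rows, but only 250 rows match, and running it locally takes 10 minutes~
I have read that in Flink 1.11 the where clause does not filter the data first. Does Flink 1.12 still have this problem? How can I optimize this?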

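pyflink query statement is very slow to return data; doesn't the where clause filter the data?

2020-12-23 Thread 肖越

The connector reads the entire table from the database, and I execute:
env.sql_query("select a , b, c from table1 left join table2 on a = d where b = 
'103' and c = '203' and e = 'AC' and a between 20160701 and 20170307 order by 
biz_date")
Table a is very large, around 10 million rows, but only 250 rows match, and running it locally takes 10 minutes!
I have read that in Flink 1.11 the where clause does not filter the data first. Does Flink 1.12 still have this problem? How can I optimize this?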
