[ https://issues.apache.org/jira/browse/FLINK-29617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17616855#comment-17616855 ]
luoyuxia edited comment on FLINK-29617 at 10/13/22 9:06 AM:
------------------------------------------------------------

[~dangshazi] Thanks for raising this and for the detailed explanation. I would much appreciate it if you could take the ticket. If you don't have time, maybe I can help take it.

I'm fine with both suggestions, but I prefer suggestion 2, since suggestion 1 would introduce a new option that users may hardly know about.

I have one question: have you tried these suggestions? If so, how much improvement does each of them bring?

Btw, the uploaded images are not displayed. Could you please upload them again?


was (Author: luoyuxia):
[~dangshazi] Thanks for raising this and for the detailed explanation. I would much appreciate it if you could take the ticket.

I'm fine with both suggestions, but I prefer suggestion 2, since suggestion 1 would introduce a new option that users may hardly know about.

I have one question: have you tried these suggestions? If so, how much improvement does each of them bring?

Btw, the uploaded images are not displayed. Could you please upload them again?

> Cost too much time to start SourceCoordinator of hdfsFileSource when start JobMaster
> ------------------------------------------------------------------------------------
>
>                 Key: FLINK-29617
>                 URL: https://issues.apache.org/jira/browse/FLINK-29617
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / FileSystem, Runtime / Coordination
>    Affects Versions: 1.15.2
>            Reporter: LI Mingkun
>            Priority: Major
>              Labels: coordination, file-system
>
> h1. Scenario:
> Our user uses a Flink batch job to compact the small files of one day. Flink version: 1.15
> He split the pipeline into 24 parts, one for each hour.
> So there are 24 sources.
>
> I find that it takes too much time to start the SourceCoordinator of hdfsFileSource when starting the JobMaster,
> as follows:
>
> !https://mail.google.com/mail/u/0?ui=2&ik=488d9ac3dd&attid=0.1&permmsgid=msg-a:r-3013789195315215531&th=183cb292e567fd9f&view=fimg&fur=ip&sz=s0-l75-ft&attbid=ANGjdJ9SVAoAslMUGQdVQJ_ccmEf4LxhaONYKJvS_V8nvijvT3JXw_VlyRBAEE9EQhTtWdYPa4TLCO5rxjXGrTDK2_PGHX4RZDPTQTJ0LwKXAUr4BYlMhYZsjcrY9eo&disp=emb&realattid=ii_l95bh7qy0|width=542,height=260!
>
> h1. Root Cause:
> I found the root cause after checking:
> # AbstractFileSource calls enumerateSplits in createEnumerator
> # NotSplittingRecursiveEnumerator needs to get the fileblockLocation of every fileblock, which is a heavy IO operation
>
> !https://mail.google.com/mail/u/0?ui=2&ik=488d9ac3dd&attid=0.3&permmsgid=msg-a:r-3013789195315215531&th=183cb292e567fd9f&view=fimg&fur=ip&sz=s0-l75-ft&attbid=ANGjdJ8AoT071eCNMb_q3uJtcbrUmZnYbg3ucnDelMlRRPn7WLlXOBGj650srQk9vhqKyJEANvpOWoxHuH6jNHt7g6go8JkeRUZKc81yqT0yzzz7tbBciTe-YnRVQ7w&disp=emb&realattid=ii_l95bp1832|width=542,height=456!
>
> !https://mail.google.com/mail/u/0?ui=2&ik=488d9ac3dd&attid=0.2&permmsgid=msg-a:r-3013789195315215531&th=183cb292e567fd9f&view=fimg&fur=ip&sz=s0-l75-ft&attbid=ANGjdJ9phsX1nauTsx3xWje_YJM4uUaOLXKHcXKsm7WJquPQQGC7bQTni3OhQB5HtGYVOvrD-3Kbp9LURfUj6OiIUgsZU1AImSL0vj27cnDcf7HpVpLpaqdADtpoABU&disp=emb&realattid=ii_l95bjh1g1|width=526,height=542!
>
> h1. Suggestion
> # Add an option to FileSource to disable the location fetcher
> # Move the location fetcher into the IOExecutor



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
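
For reference, suggestion 2 in the quoted description (moving the per-file location lookup off the thread that starts the coordinator and onto an I/O executor) could look roughly like the sketch below. This is only an illustration under stated assumptions, not Flink code: the class AsyncLocationLookup, the type LocatedFile, the method locateAll and the locationFetcher parameter are all hypothetical names, and the fetcher function merely stands in for the slow block-location call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical sketch; none of the names in this class are Flink APIs. */
public class AsyncLocationLookup {

    /** A file path paired with the hosts that store its blocks. */
    public static final class LocatedFile {
        public final String path;
        public final List<String> hosts;

        public LocatedFile(String path, List<String> hosts) {
            this.path = path;
            this.hosts = hosts;
        }
    }

    /**
     * Submits one location lookup per file to ioExecutor and returns immediately.
     * The returned future completes once every lookup has finished; results keep
     * the order of the input paths.
     */
    public static CompletableFuture<List<LocatedFile>> locateAll(
            List<String> paths,
            Function<String, List<String>> locationFetcher, // stand-in for the slow lookup
            ExecutorService ioExecutor) {
        List<CompletableFuture<LocatedFile>> pending = new ArrayList<>();
        for (String path : paths) {
            pending.add(CompletableFuture.supplyAsync(
                    () -> new LocatedFile(path, locationFetcher.apply(path)), ioExecutor));
        }
        return CompletableFuture
                .allOf(pending.toArray(new CompletableFuture[0]))
                // All lookups are done here, so join() does not block.
                .thenApply(ignored -> pending.stream()
                        .map(CompletableFuture::join)
                        .collect(Collectors.toList()));
    }

    public static void main(String[] args) {
        ExecutorService ioExecutor = Executors.newFixedThreadPool(4);
        try {
            // Fake fetcher: a real one would ask the file system for block locations.
            Function<String, List<String>> fakeFetcher =
                    path -> Collections.singletonList("host-for-" + path);
            List<LocatedFile> located = locateAll(
                    Arrays.asList("/data/hour=00/part-0", "/data/hour=01/part-0"),
                    fakeFetcher,
                    ioExecutor).join(); // the demo waits here; a caller need not
            for (LocatedFile file : located) {
                System.out.println(file.path + " -> " + file.hosts);
            }
        } finally {
            ioExecutor.shutdown();
        }
    }
}
```

In this sketch the calling thread only pays the cost of submitting the tasks, so with 24 sources the lookups would overlap instead of running one after another during startup. Whether that fits the actual enumerator lifecycle would still need to be checked against the Flink code.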