Re: [DISCUSS] FLIP-304: Pluggable failure handling for Apache Flink

2023-03-27 Thread Gen Luo
the interface. +1 for that proposal > > > > > >> > > > > > >> - async operation: I think David is right. An async interface > makes > > > > the > > > > > >> listener implementations more robust when it comes to heavy IO > > >

Re: [DISCUSS] FLIP-304: Pluggable failure handling for Apache Flink

2023-03-21 Thread Gen Luo
Hi Panagiotis, Thanks for the proposal. It's useful to enrich the information so that users can be more clear why the job is failing, especially platform developers who need to provide the information to their end users. And for the very FLIP, I'd prefer the naming `FailureEnricher` proposed by

Re: [DISCUSS]Introduce a time-segment based restart strategy

2022-11-25 Thread Gen Luo
Hi all, Sorry for the late jumping in. To meet Weihua's need, Dong's proposal seems pretty fine, but the modification it requires, I'm afraid, is not really easy. RestartBackoffTimeStrategy is quite a simple interface. The strategy even doesn't know which task is failing, not to mention the

[jira] [Created] (FLINK-29927) AkkaUtils#getAddress may cause memory leak

2022-11-08 Thread Gen Luo (Jira)
Gen Luo created FLINK-29927: --- Summary: AkkaUtils#getAddress may cause memory leak Key: FLINK-29927 URL: https://issues.apache.org/jira/browse/FLINK-29927 Project: Flink Issue Type: Bug

[jira] [Created] (FLINK-28864) DynamicPartitionPruningRule#isNewSource should check if the source used by the DataStreamScanProvider is actually a new sourc

2022-08-08 Thread Gen Luo (Jira)
Gen Luo created FLINK-28864: --- Summary: DynamicPartitionPruningRule#isNewSource should check if the source used by the DataStreamScanProvider is actually a new sourc Key: FLINK-28864 URL: https://issues.apache.org/jira

Re: [DISCUSS] Replace Attempt column with Attempt Number on the subtask list page of the Web UI

2022-07-20 Thread Gen Luo
mpt id that is > different from the corresponding attempt number in REST, metrics and logs. > It adds burden to users to do the mapping in troubleshooting. Mis-mapping > can be easy to happen and result in a waste of efforts and wrong > conclusion. > > Therefore, +1 for this proposal. >

[DISCUSS] Replace Attempt column with Attempt Number on the subtask list page of the Web UI

2022-07-20 Thread Gen Luo
Hi everyone, I'd like to propose a change on the Web UI to replace the Attempt column with an Attempt Number column on the subtask list page. >From the very beginning, the attempt number shown is calculated at the frontend by subtask.attempt + 1, which means the attempt number shown on the web

[jira] [Created] (FLINK-28589) Enhance Web UI for Speculative Execution

2022-07-18 Thread Gen Luo (Jira)
Gen Luo created FLINK-28589: --- Summary: Enhance Web UI for Speculative Execution Key: FLINK-28589 URL: https://issues.apache.org/jira/browse/FLINK-28589 Project: Flink Issue Type: Sub-task

[jira] [Created] (FLINK-28588) Enhance REST API for Speculative Execution

2022-07-18 Thread Gen Luo (Jira)
Gen Luo created FLINK-28588: --- Summary: Enhance REST API for Speculative Execution Key: FLINK-28588 URL: https://issues.apache.org/jira/browse/FLINK-28588 Project: Flink Issue Type: Sub-task

[jira] [Created] (FLINK-28587) FLIP-249: Flink Web UI Enhancement for Speculative Execution

2022-07-18 Thread Gen Luo (Jira)
Gen Luo created FLINK-28587: --- Summary: FLIP-249: Flink Web UI Enhancement for Speculative Execution Key: FLINK-28587 URL: https://issues.apache.org/jira/browse/FLINK-28587 Project: Flink Issue

[RESULT][VOTE] FLIP-249: Flink Web UI Enhancement for Speculative Execution

2022-07-15 Thread Gen Luo
Hi everyone, I’m happy to announce that FLIP-249[1] has been accepted, with 4 approving votes, 3 of which are binding[2]: - Zhu Zhu (binding) - Lijie Wang - Jing Zhang (binding) - Yun Gao (binding) There is no disapproving vote. [1]

Re: [DISCUSS] FLIP-249: Flink Web UI Enhancement for Speculative Execution

2022-07-13 Thread Gen Luo
ated operation to check all vertex and all subtasks list > page. > It's better to have an easier way to know whether the job contains > speculative executions > even after the job finished. > Maybe the point could be took into consideration in the next version. > > Best, &

Re: [VOTE] FLIP-249: Flink Web UI Enhancement for Speculative Execution

2022-07-13 Thread Gen Luo
Hi Jing, I have replied in the discussion thread about the questions. Hope that would be helpful. Best, Gen On Tue, Jul 12, 2022 at 8:43 PM Jing Zhang wrote: > Hi, Gen Luo, > > I left two minor questions in the DISCUSS thread. > Sorry for jumping into the discussion so late. >

Re: [DISCUSS] FLIP-249: Flink Web UI Enhancement for Speculative Execution

2022-07-13 Thread Gen Luo
ow to know whether the job contains speculative execution instances > after the job finished? > Do we have to check each subtasks of all vertex one by one? > > Best, > Jing Zhang > > Gen Luo 于2022年7月11日周一 22:31写道: > > > Hi, everyone. > > > > Thanks for you

[VOTE] FLIP-249: Flink Web UI Enhancement for Speculative Execution

2022-07-11 Thread Gen Luo
Hi everyone, Thanks for all the feedback so far. Based on the discussion [1], we seem to have consensus. So, I would like to start a vote on FLIP-249 [2]. The vote will last for at least 72 hours unless there is an objection or insufficient votes. [1]

Re: [DISCUSS] FLIP-249: Flink Web UI Enhancement for Speculative Execution

2022-07-11 Thread Gen Luo
Hi, everyone. Thanks for your feedback. If there are no more concerns or comments, I will start the vote tomorrow. Gen Luo 于 2022年7月11日周一 11:12写道: > Hi Lijie and Zhu, > > Thanks for the suggestion. I agree that the name "Blocked Free Slots" is > more clear to users. >

Re: [DISCUSS] FLIP-249: Flink Web UI Enhancement for Speculative Execution

2022-07-10 Thread Gen Luo
cked, I think it's OK to call this state "available". > > 2. free and blocked, I think it's not appropriate to call "blocked" > > directly, because "blocked" should include both the "free and blocked" > and > > "in-use and bl

Re: [DISCUSS] FLIP-249: Flink Web UI Enhancement for Speculative Execution

2022-07-08 Thread Gen Luo
The set will have one only element in > non-speculative cases though. In this way, we can have a unified > processing for ArchivedExecutionVertex in speculative/non-speculative > cases. > > Thanks, > Zhu > > Gen Luo 于2022年7月5日周二 15:10写道: > > > > > Hi everyone,

[DISCUSS] FLIP-249: Flink Web UI Enhancement for Speculative Execution

2022-07-05 Thread Gen Luo
Hi everyone, The speculative execution for batch jobs has been proposed and accepted in FLIP-168[1], as well as the related blocklist mechanism in FLIP-224[2]. As a follow-up step, the Flink Web UI needs to be enhanced to display the related information if the speculative execution mechanism is

[jira] [Created] (FLINK-28240) NettyShuffleMetricFactory#RequestedMemoryUsageMetric#getValue may throw ArithmeticException when the total segments of NetworkBufferPool is 0

2022-06-24 Thread Gen Luo (Jira)
Gen Luo created FLINK-28240: --- Summary: NettyShuffleMetricFactory#RequestedMemoryUsageMetric#getValue may throw ArithmeticException when the total segments of NetworkBufferPool is 0 Key: FLINK-28240 URL: https

Re: [VOTE] FLIP-232: Add Retry Support For Async I/O In DataStream API

2022-05-30 Thread Gen Luo
+1 (non-binding) On Mon, May 30, 2022 at 3:50 PM Jark Wu wrote: > +1 (binding) > > Best, > Jark > > On Mon, 30 May 2022 at 15:40, Lincoln Lee wrote: > > > Dear Flink developers, > > > > Thanks for all your feedback for FLIP-232: Add Retry Support For Async > I/O > > In DataStream API[1] on the

Re: [DISCUSS] FLIP-232: Add Retry Support For Async I/O In DataStream API

2022-05-23 Thread Gen Luo
: > Thanks Gen Luo! > > Agree with you that prefer the simpler design. > > I’d like to share my thoughts on this choice: whether store the retry state > or not only affect the recovery logic, not the per-record processing, so I > just compare the two: > 1. w/ retry state: simple rec

Re: [DISCUSS] FLIP-232: Add Retry Support For Async I/O In DataStream API

2022-05-23 Thread Gen Luo
, in which the retry state is still possible to add if the need really arises in the future, but I respect your decision. 2. I think adding a currentAttempts parameter to the method is good enough. Lincoln Lee 于 2022年5月23日周一 14:52写道: > Hi Gen Luo, > Thanks a lot for your feedback!

Re: [DISCUSS] FLIP-232: Add Retry Support For Async I/O In DataStream API

2022-05-22 Thread Gen Luo
Thank Lincoln for the proposal! The FLIP looks good to me. I'm in favor of the timer based implementation, and I'd like to share some thoughts. I'm thinking if we have to store the retry status in the state. I suppose the retrying requests can just submit as the first attempt when the job

[jira] [Created] (FLINK-26610) FileSink can not upgrade from 1.13 if the uid of the origin sink is not set.

2022-03-11 Thread Gen Luo (Jira)
Gen Luo created FLINK-26610: --- Summary: FileSink can not upgrade from 1.13 if the uid of the origin sink is not set. Key: FLINK-26610 URL: https://issues.apache.org/jira/browse/FLINK-26610 Project: Flink

[jira] [Created] (FLINK-26580) FileSink CompactCoordinator add illegal committable as toCompacted.

2022-03-10 Thread Gen Luo (Jira)
Gen Luo created FLINK-26580: --- Summary: FileSink CompactCoordinator add illegal committable as toCompacted. Key: FLINK-26580 URL: https://issues.apache.org/jira/browse/FLINK-26580 Project: Flink

[jira] [Created] (FLINK-26564) CompactCoordinatorStateHandler doesn't properly handle the cleanup-in-progress requests.

2022-03-09 Thread Gen Luo (Jira)
Gen Luo created FLINK-26564: --- Summary: CompactCoordinatorStateHandler doesn't properly handle the cleanup-in-progress requests. Key: FLINK-26564 URL: https://issues.apache.org/jira/browse/FLINK-26564

[jira] [Created] (FLINK-26440) CompactorOperatorStateHandler can not work with unaligned checkpoint

2022-03-01 Thread Gen Luo (Jira)
Gen Luo created FLINK-26440: --- Summary: CompactorOperatorStateHandler can not work with unaligned checkpoint Key: FLINK-26440 URL: https://issues.apache.org/jira/browse/FLINK-26440 Project: Flink

[jira] [Created] (FLINK-26394) CheckpointCoordinator.isTriggering can not be reset if a checkpoint expires while the checkpointCoordinator task is queuing in the SourceCoordinator executor.

2022-02-28 Thread Gen Luo (Jira)
Gen Luo created FLINK-26394: --- Summary: CheckpointCoordinator.isTriggering can not be reset if a checkpoint expires while the checkpointCoordinator task is queuing in the SourceCoordinator executor. Key: FLINK-26394

[jira] [Created] (FLINK-26235) CompactingFileWriter and PendingFileRecoverable should not be exposed to users.

2022-02-17 Thread Gen Luo (Jira)
Gen Luo created FLINK-26235: --- Summary: CompactingFileWriter and PendingFileRecoverable should not be exposed to users. Key: FLINK-26235 URL: https://issues.apache.org/jira/browse/FLINK-26235 Project: Flink

[jira] [Created] (FLINK-26180) Update docs to introduce the compaction for FileSink

2022-02-16 Thread Gen Luo (Jira)
Gen Luo created FLINK-26180: --- Summary: Update docs to introduce the compaction for FileSink Key: FLINK-26180 URL: https://issues.apache.org/jira/browse/FLINK-26180 Project: Flink Issue Type: Sub

Re: [DISCUSS] Checkpointing (partially) failing jobs

2022-02-08 Thread Gen Luo
> Best, >> Piotrek >> >> wt., 8 lut 2022 o 11:44 Chesnay Schepler napisał(a): >> >> > Could someone expand on these operational issues you're facing when >> > achieving this via separate jobs? >> > >> > I feel like we're skipping a st

Re: [DISCUSS] Checkpointing (partially) failing jobs

2022-02-08 Thread Gen Luo
e no different than no data processed between CP8 and CP10" > > 2. I've noticed that from this question there is a gap between > "*allow aborted/failed checkpoint in independent sub-graph*" and > my intention: "*independent sub-graph checkpointing indepently*" >

Re: [DISCUSS] Checkpointing (partially) failing jobs

2022-02-07 Thread Gen Luo
think that a pipelined region that is failed > and > > > > cannot create a new checkpoint is more or less the same as a > pipelined > > > > region that didn't get new input or a very very slow pipelined region > > > which > > > > couldn't read new

Re: [DISCUSS] Checkpointing (partially) failing jobs

2022-02-07 Thread Gen Luo
gt; Cheers, > Till > > On Mon, Feb 7, 2022 at 8:55 AM Gen Luo wrote: > > > Hi Gyula, > > > > Thanks for sharing the idea. As Yuan mentioned, I think we can discuss > this > > within two scopes. One is the job subgraph, the other is the execution > > subg

Re: [DISCUSS] Checkpointing (partially) failing jobs

2022-02-06 Thread Gen Luo
Hi Gyula, Thanks for sharing the idea. As Yuan mentioned, I think we can discuss this within two scopes. One is the job subgraph, the other is the execution subgraph, which I suppose is the same as PipelineRegion. An idea is to individually checkpoint the PipelineRegions, for the recovering in a

Re: [VOTE] FLIP-205: Support cache in DataStream for Batch Processing

2022-01-20 Thread Gen Luo
+1 (non-binding) On Thu, Jan 20, 2022 at 3:26 PM Yun Gao wrote: > +1 (binding) > > Thanks Xuannan for driving this! > > Best, > Yun > > > -- > From:David Morávek > Send Time:2022 Jan. 20 (Thu.) 15:12 > To:dev > Subject:Re: [VOTE]

Re: [DISCUSS] FLIP-205: Support cache in DataStream for Batch Processing

2022-01-06 Thread Gen Luo
he > > Flink application consists of multiple batch jobs and the batch jobs > > share some intermediate results, so users can use cache to avoid > > re-computation. The intermediate result is not meaningful outside of > > the application. And the cache will be discarded a

Re: [DISCUSS] FLIP-205: Support cache in DataStream for Batch Processing

2021-12-30 Thread Gen Luo
Hi Xuannan, I found FLIP-188[1] that is aiming to introduce a built-in dynamic table storage, which provides a unified changelog & table representation. Tables stored there can be used in further ad-hoc queries. To my understanding, it's quite like an implementation of caching in Table API, and

[jira] [Created] (FLINK-24965) Improper usage of Map.Entry after Entry Iterator.remove in TaskLocaStateStoreImpl#pruneCheckpoints

2021-11-19 Thread Gen Luo (Jira)
Gen Luo created FLINK-24965: --- Summary: Improper usage of Map.Entry after Entry Iterator.remove in TaskLocaStateStoreImpl#pruneCheckpoints Key: FLINK-24965 URL: https://issues.apache.org/jira/browse/FLINK-24965

Re: [DISCUSS] FLIP-185: Shorter heartbeat timeout and interval default values

2021-07-22 Thread Gen Luo
system >> is >> > under heavy load they may block more than a few seconds, and having our >> app >> > killed because of a short timeout is not an option. >> > >> > >> > >> > That’s why I’m not in favor of very short timeouts… Because

Re: [DISCUSS] FLIP-185: Shorter heartbeat timeout and interval default values

2021-07-21 Thread Gen Luo
Hi, Thanks for driving this @Till Rohrmann . I would give +1 on reducing the heartbeat timeout and interval, though I'm not sure if 15s and 3s would be enough either. IMO, except for the standalone cluster, where the heartbeat mechanism in Flink is totally relied, reducing the heartbeat can also

Re: Job Recovery Time on TM Lost

2021-07-08 Thread Gen Luo
by task's report and jobmaster's probe. When >> a task fails because of the connection, it reports to the jobmaster. The >> jobmaster will try to confirm the liveness of the unconnected >> taskmanager for certain times by config. If the jobmaster find the >> taskmanager unconnected o

Re: Job Recovery Time on TM Lost

2021-07-06 Thread Gen Luo
could say that one can disable reacting to a failed heartbeat RPC as it > is currently the case. > > We currently have a discussion about this on this PR [1]. Maybe you wanna > join the discussion there and share your insights. > > [1] https://github.com/apache/flink/pull/16357 > > C

Re: Job Recovery Time on TM Lost

2021-07-05 Thread Gen Luo
mote system (e.g. a couple of lost heartbeat > messages). > > Cheers, > Till > > On Mon, Jul 5, 2021 at 10:00 AM Gen Luo wrote: > >> As far as I know, a TM will report connection failure once its connected >> TM is lost. I suppose JM can believe the report and fail the

Re: Job Recovery Time on TM Lost

2021-07-05 Thread Gen Luo
t as the heartbeat interval. > > [1] https://issues.apache.org/jira/browse/FLINK-23209 > > Cheers, > Till > > On Fri, Jul 2, 2021 at 9:33 AM Gen Luo wrote: > >> Thanks for sharing, Till and Yang. >> >> @Lu >> Sorry but I don't know how to explain the new te

Re: Job Recovery Time on TM Lost

2021-07-02 Thread Gen Luo
t.timeout` time before the system recovers. Even >>>>>> if >>>>>> the cancellation happens fast (e.g. by having configured a low >>>>>> akka.ask.timeout), then Flink will still try to deploy tasks onto the >>>>>>

[jira] [Created] (FLINK-23216) RM keeps allocating and freeing slots after a TM lost until its heartbeat timeout

2021-07-02 Thread Gen Luo (Jira)
Gen Luo created FLINK-23216: --- Summary: RM keeps allocating and freeing slots after a TM lost until its heartbeat timeout Key: FLINK-23216 URL: https://issues.apache.org/jira/browse/FLINK-23216 Project

Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-06-17 Thread Gen Luo
epending on the > > > replies, we will decide if we shall delete it in Flink1.9 or > > > deprecate in the next release after 1.9. > > > > > > [1] > > > > > > > > > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/SURVEY-Usage-of-fli

Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-05-21 Thread Gen Luo
; > I hope I summarised our discussion correctly. > > > On 17. May 2019, at 12:20, Gen Luo wrote: > > > > Thanks for your reply. > > > > For the first question, it's not strictly necessary. But I perfer not to > > have a TableEnvironment argument in Estimator.fit()

Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-05-17 Thread Gen Luo
Thanks for your reply. For the first question, it's not strictly necessary. But I perfer not to have a TableEnvironment argument in Estimator.fit() or Transformer.transform(), which is not part of machine learning concept, and may make our API not as clean and pretty as other systems do. I would

Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

2019-05-17 Thread Gen Luo
It's better not to depend on flink-table-planner indeed. It's currently needed for 3 points: registering udagg, judging the tableEnv batch or streaming, converting table to dataSet to collect data. Most of these requirements can be fulfilled by flink-table-api-java-bridge and

Apply for contributor permission

2019-05-09 Thread Gen Luo
Hi, Could someone add me as a contributor? My JIRA username is c4e. Thanks!