the interface. +1 for that proposal
> > > > > >>
> > > > > >> - async operation: I think David is right. An async interface
> > > > > >> makes the listener implementations more robust when it comes to
> > > > > >> heavy IO
Hi Panagiotis,
Thanks for the proposal.
It's useful to enrich the information so that users can understand more
clearly why the job is failing, especially platform developers who
need to surface the information to their end users.
As for this FLIP, I'd prefer the naming `FailureEnricher`
proposed by
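To make the async point above concrete, here is a minimal sketch of what such an enricher could look like. The interface name `FailureEnricher` comes from this thread, but the method shape, the `CompletableFuture` return type, and the label keys are assumptions for illustration, not the actual FLIP API:

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical shape of the enricher: an async method returning labels,
// so heavy IO (log scraping, external lookups) stays off the main thread.
interface FailureEnricher {
    CompletableFuture<Map<String, String>> processFailure(Throwable cause);
}

public class FailureEnricherSketch {
    // A toy enricher that classifies the failure by exception type.
    static final FailureEnricher TYPE_ENRICHER = cause ->
            CompletableFuture.supplyAsync(
                    () -> Map.of("failure.type", cause.getClass().getSimpleName()));

    public static void main(String[] args) {
        Map<String, String> labels =
                TYPE_ENRICHER.processFailure(new IllegalStateException("boom")).join();
        System.out.println(labels); // {failure.type=IllegalStateException}
    }
}
```

Because the method returns a future, a slow enricher implementation only delays its own result rather than blocking the caller.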
Hi all,
Sorry for the late jumping in.
To meet Weihua's need, Dong's proposal seems pretty fine, but the
modification it requires is, I'm afraid, not really easy.
RestartBackoffTimeStrategy is quite a simple interface. The strategy
doesn't even know which task is failing, not to mention the
Gen Luo created FLINK-29927:
---
Summary: AkkaUtils#getAddress may cause memory leak
Key: FLINK-29927
URL: https://issues.apache.org/jira/browse/FLINK-29927
Project: Flink
Issue Type: Bug
Gen Luo created FLINK-28864:
---
Summary: DynamicPartitionPruningRule#isNewSource should check if
the source used by the DataStreamScanProvider is actually a new source
Key: FLINK-28864
URL: https://issues.apache.org/jira
mpt id that is
> different from the corresponding attempt number in REST, metrics and logs.
> It adds a burden on users to do the mapping in troubleshooting. Mis-mapping
> can easily happen and result in wasted effort and wrong
> conclusions.
>
> Therefore, +1 for this proposal.
>
Hi everyone,
I'd like to propose a change on the Web UI to replace the Attempt column
with an Attempt Number column on the subtask list page.
From the very beginning, the attempt number shown is calculated at the
frontend by subtask.attempt + 1, which means the attempt number shown on
the web
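The off-by-one described above can be illustrated with a toy sketch (class and method names are hypothetical, only the `attempt + 1` calculation comes from the proposal):

```java
public class AttemptNumberSketch {
    // REST API, metrics and logs all expose the raw, 0-based attempt number.
    static int restAttemptNumber(int attempt) {
        return attempt;
    }

    // The old subtask page showed subtask.attempt + 1, so the UI's "2"
    // corresponded to attempt 1 everywhere else; showing the raw number
    // removes the mapping users had to do while troubleshooting.
    static int legacyUiAttempt(int attempt) {
        return attempt + 1;
    }

    public static void main(String[] args) {
        int attempt = 1; // the second execution attempt, 0-based
        System.out.println("REST/logs: " + restAttemptNumber(attempt)
                + ", legacy UI: " + legacyUiAttempt(attempt));
    }
}
```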
Gen Luo created FLINK-28589:
---
Summary: Enhance Web UI for Speculative Execution
Key: FLINK-28589
URL: https://issues.apache.org/jira/browse/FLINK-28589
Project: Flink
Issue Type: Sub-task
Gen Luo created FLINK-28588:
---
Summary: Enhance REST API for Speculative Execution
Key: FLINK-28588
URL: https://issues.apache.org/jira/browse/FLINK-28588
Project: Flink
Issue Type: Sub-task
Gen Luo created FLINK-28587:
---
Summary: FLIP-249: Flink Web UI Enhancement for Speculative
Execution
Key: FLINK-28587
URL: https://issues.apache.org/jira/browse/FLINK-28587
Project: Flink
Issue
Hi everyone,
I’m happy to announce that FLIP-249[1] has been accepted, with 4 approving
votes, 3 of which are binding[2]:
- Zhu Zhu (binding)
- Lijie Wang
- Jing Zhang (binding)
- Yun Gao (binding)
There is no disapproving vote.
[1]
ated operation to check all vertex and all subtasks list
> page.
> It's better to have an easier way to know whether the job contains
> speculative executions
> even after the job has finished.
> Maybe this point could be taken into consideration in the next version.
>
> Best,
>
Hi Jing,
I have replied in the discussion thread about the questions. Hope that
would be helpful.
Best,
Gen
On Tue, Jul 12, 2022 at 8:43 PM Jing Zhang wrote:
> Hi, Gen Luo,
>
> I left two minor questions in the DISCUSS thread.
> Sorry for jumping into the discussion so late.
>
How to know whether the job contains speculative execution instances
> after the job finished?
> Do we have to check each subtask of every vertex one by one?
>
> Best,
> Jing Zhang
>
> Gen Luo wrote on Mon, Jul 11, 2022 at 22:31:
>
> > Hi, everyone.
> >
> > Thanks for you
Hi everyone,
Thanks for all the feedback so far. Based on the discussion [1], we seem to
have consensus. So, I would like to start a vote on FLIP-249 [2].
The vote will last for at least 72 hours unless there is an objection or
insufficient votes.
[1]
Hi, everyone.
Thanks for your feedback.
If there are no more concerns or comments, I will start the vote tomorrow.
Gen Luo wrote on Mon, Jul 11, 2022 at 11:12:
> Hi Lijie and Zhu,
>
> Thanks for the suggestion. I agree that the name "Blocked Free Slots" is
> more clear to users.
>
cked, I think it's OK to call this state "available".
> > 2. free and blocked, I think it's not appropriate to call it "blocked"
> > directly, because "blocked" should include both the "free and blocked"
> and
> > "in-use and blocked"
The set will have only one element in
> non-speculative cases though. In this way, we can have a unified
> processing for ArchivedExecutionVertex in speculative/non-speculative
> cases.
>
> Thanks,
> Zhu
>
> Gen Luo wrote on Tue, Jul 5, 2022 at 15:10:
>
> >
> > Hi everyone,
Hi everyone,
The speculative execution for batch jobs has been proposed and accepted in
FLIP-168[1], as well as the related blocklist mechanism in FLIP-224[2]. As
a follow-up step, the Flink Web UI needs to be enhanced to display the
related information if the speculative execution mechanism is
Gen Luo created FLINK-28240:
---
Summary:
NettyShuffleMetricFactory#RequestedMemoryUsageMetric#getValue may throw
ArithmeticException when the total segments of NetworkBufferPool is 0
Key: FLINK-28240
URL: https
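The issue title points at a division by zero. A simplified stand-in for such a usage metric and its guard is sketched below (names and the exact formula are hypothetical; the actual factory code may differ):

```java
public class UsageMetricSketch {
    // Simplified stand-in for the usage metric: a percentage computed from
    // requested vs. total segments. When the pool has 0 total segments the
    // unguarded division throws ArithmeticException; the fix is to guard it.
    static int usagePercentage(long requestedSegments, long totalSegments) {
        if (totalSegments == 0) {
            return 0; // report 0% usage instead of dividing by zero
        }
        return (int) (100 * requestedSegments / totalSegments);
    }

    public static void main(String[] args) {
        System.out.println(usagePercentage(5, 0));  // 0 (guarded)
        System.out.println(usagePercentage(5, 10)); // 50
    }
}
```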
+1 (non-binding)
On Mon, May 30, 2022 at 3:50 PM Jark Wu wrote:
> +1 (binding)
>
> Best,
> Jark
>
> On Mon, 30 May 2022 at 15:40, Lincoln Lee wrote:
>
> > Dear Flink developers,
> >
> > Thanks for all your feedback for FLIP-232: Add Retry Support For Async
> I/O
> > In DataStream API[1] on the
:
> Thanks Gen Luo!
>
> Agree with you that the simpler design is preferable.
>
> I'd like to share my thoughts on this choice: whether to store the retry
> state or not affects only the recovery logic, not the per-record
> processing, so I just compare the two:
> 1. w/ retry state: simple rec
, in which the retry state
can still be added if the need really arises in the future, but I
respect your decision.
2. I think adding a currentAttempts parameter to the method is good enough.
Lincoln Lee wrote on Mon, May 23, 2022 at 14:52:
> Hi Gen Luo,
> Thanks a lot for your feedback!
Thanks Lincoln for the proposal!
The FLIP looks good to me. I'm in favor of the timer based implementation,
and I'd like to share some thoughts.
I'm wondering whether we have to store the retry status in the state. I suppose
the retrying requests can just be submitted as first attempts when the job
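A minimal sketch of the stateless alternative argued for here, under the assumption of a hypothetical request wrapper (not Flink's actual classes): on recovery, in-flight requests are simply re-submitted as fresh first attempts instead of restoring their retry counters from state.

```java
public class RetrySketch {
    static final int MAX_ATTEMPTS = 3;

    // Hypothetical in-flight request wrapper; the attempt counter lives only
    // in memory and is deliberately NOT written into checkpointed state.
    static final class PendingRequest {
        final String key;
        int currentAttempts = 1;

        PendingRequest(String key) {
            this.key = key;
        }
    }

    static boolean shouldRetry(PendingRequest r, boolean lastAttemptFailed) {
        return lastAttemptFailed && r.currentAttempts < MAX_ATTEMPTS;
    }

    // On restore, every in-flight request is re-submitted as a fresh first
    // attempt: the counter is reset rather than recovered from state.
    static PendingRequest restore(PendingRequest inFlight) {
        return new PendingRequest(inFlight.key);
    }

    public static void main(String[] args) {
        PendingRequest r = new PendingRequest("req-1");
        r.currentAttempts = MAX_ATTEMPTS; // retries exhausted before failover
        System.out.println(restore(r).currentAttempts); // 1
    }
}
```

The trade-off discussed in the thread is visible here: a request that had exhausted its retries before the failover gets its budget back, which is accepted in exchange for a simpler design.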
Gen Luo created FLINK-26610:
---
Summary: FileSink can not upgrade from 1.13 if the uid of the
origin sink is not set.
Key: FLINK-26610
URL: https://issues.apache.org/jira/browse/FLINK-26610
Project: Flink
Gen Luo created FLINK-26580:
---
Summary: FileSink CompactCoordinator add illegal committable as
toCompacted.
Key: FLINK-26580
URL: https://issues.apache.org/jira/browse/FLINK-26580
Project: Flink
Gen Luo created FLINK-26564:
---
Summary: CompactCoordinatorStateHandler doesn't properly handle
the cleanup-in-progress requests.
Key: FLINK-26564
URL: https://issues.apache.org/jira/browse/FLINK-26564
Gen Luo created FLINK-26440:
---
Summary: CompactorOperatorStateHandler can not work with unaligned
checkpoint
Key: FLINK-26440
URL: https://issues.apache.org/jira/browse/FLINK-26440
Project: Flink
Gen Luo created FLINK-26394:
---
Summary: CheckpointCoordinator.isTriggering can not be reset if a
checkpoint expires while the checkpointCoordinator task is queuing in the
SourceCoordinator executor.
Key: FLINK-26394
Gen Luo created FLINK-26235:
---
Summary: CompactingFileWriter and PendingFileRecoverable should
not be exposed to users.
Key: FLINK-26235
URL: https://issues.apache.org/jira/browse/FLINK-26235
Project: Flink
Gen Luo created FLINK-26180:
---
Summary: Update docs to introduce the compaction for FileSink
Key: FLINK-26180
URL: https://issues.apache.org/jira/browse/FLINK-26180
Project: Flink
Issue Type: Sub
> Best,
>> Piotrek
>>
>> On Tue, Feb 8, 2022 at 11:44 Chesnay Schepler wrote:
>>
>> > Could someone expand on these operational issues you're facing when
>> > achieving this via separate jobs?
>> >
>> > I feel like we're skipping a step
e no different than no data processed between CP8 and CP10"
>
> 2. I've noticed from this question that there is a gap between
> "*allow aborted/failed checkpoint in independent sub-graph*" and
> my intention: "*independent sub-graph checkpointing independently*"
>
think that a pipelined region that is failed and
> > > > cannot create a new checkpoint is more or less the same as a pipelined
> > > > region that didn't get new input or a very very slow pipelined region
> > > > which couldn't read new
> Cheers,
> Till
>
> On Mon, Feb 7, 2022 at 8:55 AM Gen Luo wrote:
>
> > Hi Gyula,
> >
> > Thanks for sharing the idea. As Yuan mentioned, I think we can discuss
> this
> > within two scopes. One is the job subgraph, the other is the execution
> > subgraph
Hi Gyula,
Thanks for sharing the idea. As Yuan mentioned, I think we can discuss this
within two scopes. One is the job subgraph, the other is the execution
subgraph, which I suppose is the same as PipelineRegion.
An idea is to individually checkpoint the PipelineRegions, for the
recovering in a
+1 (non-binding)
On Thu, Jan 20, 2022 at 3:26 PM Yun Gao
wrote:
> +1 (binding)
>
> Thanks Xuannan for driving this!
>
> Best,
> Yun
>
>
> --
> From: David Morávek
> Send Time: 2022 Jan. 20 (Thu.) 15:12
> To: dev
> Subject: Re: [VOTE]
he
> > Flink application consists of multiple batch jobs and the batch jobs
> > share some intermediate results, so users can use cache to avoid
> > re-computation. The intermediate result is not meaningful outside of
> > the application. And the cache will be discarded a
Hi Xuannan,
I found FLIP-188[1] that is aiming to introduce a built-in dynamic table
storage, which provides a unified changelog & table representation. Tables
stored there can be used in further ad-hoc queries. To my understanding,
it's quite like an implementation of caching in Table API, and
Gen Luo created FLINK-24965:
---
Summary: Improper usage of Map.Entry after Entry Iterator.remove
in TaskLocalStateStoreImpl#pruneCheckpoints
Key: FLINK-24965
URL: https://issues.apache.org/jira/browse/FLINK-24965
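The pitfall behind this issue is general Java behavior: the `Map.Entry` contract leaves an entry's behavior undefined once the backing map has been modified, e.g. via `Iterator.remove()`. A sketch of the safe pattern, with a simplified stand-in for the pruning loop (not Flink's actual code):

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class EntryAfterRemoveSketch {
    // Safe pruning: copy what you need out of the entry *before* calling
    // remove(), and never dereference the entry afterwards.
    public static long removeSmallerThan(Map<Long, String> checkpoints, long threshold) {
        long removed = 0;
        Iterator<Map.Entry<Long, String>> it = checkpoints.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, String> entry = it.next();
            long checkpointId = entry.getKey(); // read before remove()
            if (checkpointId < threshold) {
                it.remove(); // entry must not be used after this point
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Map<Long, String> m = new HashMap<>();
        m.put(1L, "a");
        m.put(2L, "b");
        m.put(3L, "c");
        System.out.println(removeSmallerThan(m, 3L)); // 2
        System.out.println(m.size()); // 1
    }
}
```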
system
>> > is under heavy load they may block more than a few seconds, and having
>> > our app killed because of a short timeout is not an option.
>> >
>> >
>> >
>> > That’s why I’m not in favor of very short timeouts… Because
Hi,
Thanks for driving this @Till Rohrmann . I would
give +1 on reducing the heartbeat timeout and interval, though I'm not sure
if 15s and 3s would be enough either.
IMO, except for the standalone cluster, where the heartbeat mechanism in
Flink is totally relied on, reducing the heartbeat can also
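For reference, the interval/timeout pair debated here maps to Flink's existing configuration options (values in milliseconds). The 15s/3s figures below are the values under discussion in this thread, not the current defaults, which are larger:

```yaml
# flink-conf.yaml -- proposed values from this thread, not the shipped defaults
heartbeat.interval: 3000    # ms between heartbeat requests
heartbeat.timeout: 15000    # ms before a heartbeat target is considered lost
```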
by task's report and jobmaster's probe. When
>> a task fails because of the connection, it reports to the jobmaster. The
>> jobmaster will try to confirm the liveness of the unconnected
>> taskmanager for certain times by config. If the jobmaster finds the
>> taskmanager unconnected o
could say that one can disable reacting to a failed heartbeat RPC, as is
> currently the case.
>
> We currently have a discussion about this on this PR [1]. Maybe you wanna
> join the discussion there and share your insights.
>
> [1] https://github.com/apache/flink/pull/16357
>
> C
mote system (e.g. a couple of lost heartbeat
> messages).
>
> Cheers,
> Till
>
> On Mon, Jul 5, 2021 at 10:00 AM Gen Luo wrote:
>
>> As far as I know, a TM will report connection failure once its connected
>> TM is lost. I suppose JM can believe the report and fail the
t as the heartbeat interval.
>
> [1] https://issues.apache.org/jira/browse/FLINK-23209
>
> Cheers,
> Till
>
> On Fri, Jul 2, 2021 at 9:33 AM Gen Luo wrote:
>
>> Thanks for sharing, Till and Yang.
>>
>> @Lu
>> Sorry but I don't know how to explain the new te
t.timeout` time before the system recovers. Even
>>>>>> if
>>>>>> the cancellation happens fast (e.g. by having configured a low
>>>>>> akka.ask.timeout), then Flink will still try to deploy tasks onto the
>>>>>>
Gen Luo created FLINK-23216:
---
Summary: RM keeps allocating and freeing slots after a TM lost
until its heartbeat timeout
Key: FLINK-23216
URL: https://issues.apache.org/jira/browse/FLINK-23216
Project
epending on the
> > > replies, we will decide if we shall delete it in Flink 1.9 or
> > > deprecate in the next release after 1.9.
> > >
> > > [1]
> > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/SURVEY-Usage-of-fli
> I hope I summarised our discussion correctly.
>
> > On 17. May 2019, at 12:20, Gen Luo wrote:
> >
> > Thanks for your reply.
> >
> > For the first question, it's not strictly necessary. But I prefer not to
> > have a TableEnvironment argument in Estimator.fit()
Thanks for your reply.
For the first question, it's not strictly necessary. But I prefer not to
have a TableEnvironment argument in Estimator.fit() or
Transformer.transform(), which is not part of the machine learning concept,
and may make our API less clean than other systems'. I would
It's better not to depend on flink-table-planner indeed. It's currently
needed for 3 points: registering a UDAGG, judging whether the tableEnv is
batch or streaming, and converting a table to a DataSet to collect data.
Most of these requirements can be fulfilled by flink-table-api-java-bridge and
Hi,
Could someone add me as a contributor? My JIRA username is c4e.
Thanks!