[jira] [Created] (FLINK-23355) Using Flink SQL, how to implement sequential inserts into multiple sinks?
Bo Wang created FLINK-23355:
---
Summary: Using Flink SQL, how to implement sequential inserts into multiple sinks?
Key: FLINK-23355
URL: https://issues.apache.org/jira/browse/FLINK-23355
Project: Flink
Issue Type: Technical Debt
Components: Table SQL / API
Affects Versions: 1.12.0
Reporter: Bo Wang

I want to use Flink SQL to insert into multiple sinks sequentially. For example, I have one Kafka source table, source_table_one, and two sink tables: a database table sink_db_one and a Kafka table sink_kafka_two. When an element arrives from source_table_one, I want to first run "insert into table sink_db_one select * from source_table_one", and only after the element has been inserted, run "insert into table sink_kafka_two select * from source_table_one" as a second stage. A web system will listen on the sink_kafka_two topic and query the data from the sink_db_one table.

Does anyone have an idea how to implement this?

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
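For context, Flink's statement sets (StatementSet in the Table API; STATEMENT SET in the SQL client of newer releases) can write one source to both sinks in a single job, but they do not order the sinks relative to each other, so the sequencing asked for here would still have to be enforced outside SQL, e.g. by running the first INSERT as its own job and waiting for it to finish before submitting the second. As a sketch only (reusing the table names from the issue; availability of this syntax depends on the Flink version):

```sql
-- Both INSERTs run as one job; Flink gives no guarantee that the
-- database write finishes before the Kafka write.
BEGIN STATEMENT SET;
INSERT INTO sink_db_one SELECT * FROM source_table_one;
INSERT INTO sink_kafka_two SELECT * FROM source_table_one;
END;
```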
Re: [ANNOUNCE] Kete Young is now part of the Flink PMC
Congratulations Kurt!

Best,
Bo WANG

On Tue, Jul 23, 2019 at 5:24 PM Robert Metzger wrote:
> Hi all,
>
> On behalf of the Flink PMC, I'm happy to announce that Kete Young is now part of the Apache Flink Project Management Committee (PMC).
>
> Kete has been a committer since February 2017, working a lot on Table API / SQL. He's currently co-managing the 1.9 release! Thanks a lot for your work for Flink!
>
> Congratulations & Welcome Kurt!
>
> Best,
> Robert
Re: [ANNOUNCE] Zhijiang Wang has been added as a committer to the Flink project
Congratulations Zhijiang!

Best,
Bo WANG

On Mon, Jul 22, 2019 at 10:12 PM Robert Metzger wrote:
> Hey all,
>
> We've added another committer to the Flink project: Zhijiang Wang.
>
> Congratulations Zhijiang!
>
> Best,
> Robert
> (on behalf of the Flink PMC)
Re: [ANNOUNCE] Jincheng Sun is now part of the Flink PMC
Congratulations Jincheng! Well deserved!

On Tue, Jun 25, 2019 at 10:10 AM Kurt Young wrote:
> Congratulations Jincheng!
>
> Best,
> Kurt
>
> On Tue, Jun 25, 2019 at 9:56 AM LakeShen wrote:
> > Congratulations! Jincheng Sun
> >
> > Best,
> > LakeShen
> >
> > On Mon, Jun 24, 2019 at 11:09 PM Robert Metzger wrote:
> > > Hi all,
> > >
> > > On behalf of the Flink PMC, I'm happy to announce that Jincheng Sun is now part of the Apache Flink Project Management Committee (PMC).
> > >
> > > Jincheng has been a committer since July 2017. He has been very active on Flink's Table API / SQL component, as well as helping with releases.
> > >
> > > Congratulations & Welcome Jincheng!
> > >
> > > Best,
> > > Robert
Re: [DISCUSS] Adaptive Parallelism of Job Vertex
Thanks Till for the comments. We will implement a new scheduler supporting adaptive parallelism on top of the new scheduling framework. Based on these scheduling interfaces, we could do the work in parallel.

On Tue, Apr 16, 2019 at 11:18 PM Till Rohrmann wrote:
> Hi Bo Wang,
>
> thanks for proposing this design document. I think it is an interesting idea to improve Flink's execution efficiency.
>
> At the moment, the community is actively working on making Flink's scheduler pluggable. Once this is possible, we could try this feature out by implementing a scheduler which supports adaptive parallelism without affecting the existing code. I think this would be a nice approach to further evaluate and benchmark the implications of such a strategy. What do you think?
>
> Cheers,
> Till
>
> On Mon, Apr 8, 2019 at 10:28 AM Bo WANG wrote:
> > Hi all,
> >
> > In distributed computing systems, execution parallelism is vital for both resource efficiency and execution performance. In Flink, execution parallelism is a pre-specified parameter, which is usually an empirical value and thus might not be optimal for the varying amount of data processed by each task.
> >
> > Furthermore, a fixed parallelism cannot scale to varying data sizes, which are common in production clusters, since we may not frequently change the cluster configuration.
> >
> > Thus, we propose to adaptively determine the execution parallelism of each vertex at runtime, based on the actual input data size and an ideal data size processed by each task. The ideal data size is a pre-specified parameter chosen according to the properties of the operator.
> >
> > The design doc is ready: https://docs.google.com/document/d/1ZxnoJ3SOxUk1PL2xC1t-kepq28Pg20IL6eVKUUOWSKY/edit?usp=sharing; any comments are highly appreciated.
[DISCUSS] Adaptive Parallelism of Job Vertex
Hi all,

In distributed computing systems, execution parallelism is vital for both resource efficiency and execution performance. In Flink, execution parallelism is a pre-specified parameter, which is usually an empirical value and thus might not be optimal for the varying amount of data processed by each task.

Furthermore, a fixed parallelism cannot scale to varying data sizes, which are common in production clusters, since we may not frequently change the cluster configuration.

Thus, we propose to adaptively determine the execution parallelism of each vertex at runtime, based on the actual input data size and an ideal data size processed by each task. The ideal data size is a pre-specified parameter chosen according to the properties of the operator.

The design doc is ready: https://docs.google.com/document/d/1ZxnoJ3SOxUk1PL2xC1t-kepq28Pg20IL6eVKUUOWSKY/edit?usp=sharing; any comments are highly appreciated.
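The proposal boils down to deriving each vertex's parallelism from its actual input size and the per-task ideal size. A minimal sketch of that calculation (the function name, byte values, and the max-parallelism cap are illustrative assumptions, not part of the design doc):

```python
import math

def adaptive_parallelism(actual_input_bytes, ideal_bytes_per_task, max_parallelism):
    """Derive a vertex's runtime parallelism from its actual input size.

    Each task should process roughly ideal_bytes_per_task bytes, so the
    vertex needs ceil(actual / ideal) tasks, clamped to [1, max_parallelism].
    """
    if actual_input_bytes <= 0:
        return 1
    needed = math.ceil(actual_input_bytes / ideal_bytes_per_task)
    return max(1, min(needed, max_parallelism))

# 10 GiB of input with an ideal task size of 128 MiB -> 80 tasks
print(adaptive_parallelism(10 * 1024**3, 128 * 1024**2, 256))  # -> 80
```

The clamp is what lets the same job scale down on small inputs and stay bounded on unexpectedly large ones, without changing any cluster configuration.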
Re: [DISCUSS] Shall we make SpillableSubpartition repeatedly readable to support fine grained recovery
> > > ...because the proposed extension is actually necessary for proper batch fault tolerance (independent of the DataSet or Query Processor stack).
> > >
> > > I am adding Kurt to this thread - maybe he can help us find that out.
> > >
> > > On Thu, Jan 24, 2019 at 2:36 PM Piotr Nowojski wrote:
> > > > Hi,
> > > >
> > > > I'm not sure how much effort we will be willing to invest in the existing batch stack. We are currently focusing on the support of bounded DataStreams (already done in Blink and will be merged to Flink soon) and unifying batch & stream under the DataStream API.
> > > >
> > > > Piotrek
> > > >
> > > > > On 23 Jan 2019, at 04:45, Bo WANG wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > When running the batch WordCount example, I configured the job execution mode as BATCH_FORCED and the failover strategy as region, and I manually injected some errors to make the execution fail in different phases. In some cases, the job could recover from the failover and succeed, but in other cases, the job retried several times and failed.
> > > > >
> > > > > Example:
> > > > > - If the failure occurred before the task read data, e.g., it failed before invokable.invoke() in Task.java, failover could succeed.
> > > > > - If the failure occurred after the task had read data, failover did not work.
> > > > >
> > > > > Problem diagnosis:
> > > > > Running the example described above, each ExecutionVertex is defined as a restart region, and the ResultPartitionType between executions is BLOCKING. Thus, SpillableSubpartition and SpillableSubpartitionView are used to write/read shuffle data, and data blocks are described as BufferConsumers stored in a list called buffers. When a task requires input data from the SpillableSubpartitionView, BufferConsumers are REMOVED from buffers. Thus, when a failure occurred after data had been read, some BufferConsumers had already been released. Although tasks retried, the input data was incomplete.
> > > > >
> > > > > Fix proposal:
> > > > > - A BufferConsumer should not be removed from buffers until the consuming ExecutionVertex is terminal.
> > > > > - A SpillableSubpartition should not be released until the consuming ExecutionVertex is terminal.
> > > > > - A SpillableSubpartition could create multiple SpillableSubpartitionViews, each corresponding to an ExecutionAttempt.
> > > > >
> > > > > Best,
> > > > > Bo
[DISCUSS] Shall we make SpillableSubpartition repeatedly readable to support fine grained recovery
Hi all,

When running the batch WordCount example, I configured the job execution mode as BATCH_FORCED and the failover strategy as region, and I manually injected some errors to make the execution fail in different phases. In some cases, the job could recover from the failover and succeed, but in other cases, the job retried several times and failed.

Example:
- If the failure occurred before the task read data, e.g., it failed before invokable.invoke() in Task.java, failover could succeed.
- If the failure occurred after the task had read data, failover did not work.

Problem diagnosis:
Running the example described above, each ExecutionVertex is defined as a restart region, and the ResultPartitionType between executions is BLOCKING. Thus, SpillableSubpartition and SpillableSubpartitionView are used to write/read shuffle data, and data blocks are described as BufferConsumers stored in a list called buffers. When a task requires input data from the SpillableSubpartitionView, BufferConsumers are REMOVED from buffers. Thus, when a failure occurred after data had been read, some BufferConsumers had already been released. Although tasks retried, the input data was incomplete.

Fix proposal:
- A BufferConsumer should not be removed from buffers until the consuming ExecutionVertex is terminal.
- A SpillableSubpartition should not be released until the consuming ExecutionVertex is terminal.
- A SpillableSubpartition could create multiple SpillableSubpartitionViews, each corresponding to an ExecutionAttempt.

Best,
Bo
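The fix proposal can be illustrated with a toy model (plain Python, not Flink code; the class and method names are made up for illustration): the subpartition retains its buffers instead of removing them on read, and each execution attempt gets its own view with an independent read position, so a retried attempt can re-read everything from the start.

```python
class RetainingSubpartition:
    """Toy model of the proposal: buffers are retained (not removed on read)
    until release() is called once the consuming vertex is terminal, and each
    ExecutionAttempt gets its own view with an independent read index."""

    def __init__(self):
        self.buffers = []      # retained until release()
        self.released = False

    def add(self, buffer):
        self.buffers.append(buffer)

    def create_view(self):
        # One view per ExecutionAttempt; all views share the same buffers.
        return SubpartitionView(self)

    def release(self):
        # Only called once the consuming ExecutionVertex is terminal.
        self.buffers.clear()
        self.released = True


class SubpartitionView:
    def __init__(self, parent):
        self.parent = parent
        self.index = 0         # per-attempt read position

    def poll(self):
        if self.index >= len(self.parent.buffers):
            return None
        buf = self.parent.buffers[self.index]  # read without removing
        self.index += 1
        return buf


part = RetainingSubpartition()
for b in (b"a", b"b", b"c"):
    part.add(b)

first_attempt = part.create_view()
assert [first_attempt.poll() for _ in range(3)] == [b"a", b"b", b"c"]

# A retried attempt gets a fresh view and can re-read all the data,
# which is exactly what the removal-on-read behavior breaks today.
retry = part.create_view()
assert retry.poll() == b"a"
```

The contrast with the reported bug is the poll method: the current code removes the BufferConsumer from the list, so a second view sees an incomplete input; here the list survives until an explicit release.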
Apply for flink contributor permission
Hi everyone,

We would like to contribute to JIRA issue FLINK-10644. Would anyone kindly give me and my team member contributor permissions? Our JIRA IDs: Ryantaocer, eaglewatcher.

Best regards,
Bo