Re: [DISCUSS] FLIP-27: Refactor Source Interface
Hi Becket, Thanks for the progress update! I have a question about the #OperatorCoordinator. Will there be any communication between different #OperatorCoordinators (now, or in a future plan)? That way it might be able to cover some cases of FLIP-17[1], like initializing static data before the main input is processed. Of course it requires more thought; I just wanted to share some ideas I have in mind. +1 to the FLIP and the detailed design! [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-17+Side+Inputs+for+DataStream+API Best, Jiayi Liao

Original Message Sender: Stephan Ewen Recipient: dev Date: Wednesday, Dec 4, 2019 18:25 Subject: Re: [DISCUSS] FLIP-27: Refactor Source Interface

Thanks, Becket, for updating this. I agree with moving the aspects you mentioned into separate FLIPs - this one was becoming unwieldy in size. +1 to the FLIP in its current state. It's a very detailed write-up, nicely done!

On Wed, Dec 4, 2019 at 7:38 AM Becket Qin wrote:
Hi all, Sorry for the long-belated update. I have updated the FLIP-27 wiki page with the latest proposals. Some noticeable changes include: 1. A new generic communication mechanism between the SplitEnumerator and the SourceReader. 2. Some detailed API method signature changes. We left a few things out of this FLIP and will address them in separate FLIPs, including: 1. Per-split event time. 2. Event time alignment. 3. Fine-grained failover for SplitEnumerator failures. Please let us know if you have any questions. Thanks, Jiangjie (Becket) Qin

On Sat, Nov 16, 2019 at 6:10 AM Stephan Ewen wrote:
Hi Łukasz! Becket and I are working hard on figuring out the last details and implementing the first PoC. We will update the FLIP, hopefully next week. There is a fair chance that a first version of this will be in 1.10, but I think it will take another release to battle-test it and migrate the connectors. Best, Stephan

On Fri, Nov 15, 2019 at 11:14 AM Łukasz Jędrzejewski wrote:
Hi, This proposal looks very promising for us. Do you have any plans for which Flink release it is going to be included in? We are thinking of using the DataSet API for our future use cases, but on the other hand the DataSet API is going to be deprecated, so using the proposed bounded data streams solution could be more viable in the long term. Thanks, Łukasz

On 2019/10/01 15:48:03, Thomas Weise wrote:
Thanks for putting together this proposal! I see that the "Per Split Event Time" and "Event Time Alignment" sections are still TBD. It would probably be good to flesh those out a bit before proceeding too far, as the event time alignment will probably influence the interaction with the split reader, specifically ReaderStatus emitNext(SourceOutput output). We currently have only one implementation of event time alignment, in the Kinesis consumer. The synchronization in that case takes place as the last step before records are emitted downstream (RecordEmitter). With the currently proposed interfaces, the equivalent can be implemented in the reader loop, although note that in the Kinesis consumer the per-shard threads push records. Synchronization has not been implemented for the Kafka consumer yet.
https://issues.apache.org/jira/browse/FLINK-12675

When I looked at it, I realized that the implementation will look quite different from Kinesis, because it needs to take place in the pull part, where records are taken from the Kafka client. Due to the multiplexing it cannot be done by blocking the split thread, as it currently works for Kinesis. Reading from individual Kafka partitions needs to be controlled via pause/resume on the Kafka client. To take on that responsibility, the split thread would need to be aware of the watermarks, or at least of whether it should or should not continue to consume a given split, and this may require a different SourceReader or SourceOutput interface. Thanks, Thomas

On Fri, Jul 26, 2019 at 1:39 AM Biao Liu wrote:
Hi Stephan, Thank you for the feedback! Will take a look at your branch before discussing publicly.

On Fri, Jul 26, 2019 at 12:01 AM Stephan Ewen wrote:
Hi Biao! Thanks for reviving this. I would like to join this discussion, but am quite occupied with the 1.9 release, so can we maybe pause this
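To make Thomas's pause/resume point concrete, here is a minimal sketch in Java of how per-partition alignment could sit inside a Kafka reader loop. Only KafkaConsumer.pause()/resume() are real API; the surrounding method, the watermark bookkeeping and the skew threshold are assumptions for illustration, not part of the FLIP or of any connector:

    import java.util.Collections;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    // Called from a hypothetical reader loop after watermarks are updated:
    // stop fetching from partitions that have run too far ahead of the
    // slowest one, and resume them once the others have caught up.
    void alignPartition(KafkaConsumer<byte[], byte[]> consumer,
                        TopicPartition partition,
                        long partitionWatermark,
                        long minWatermarkAcrossPartitions,
                        long maxSkewMillis) {
        if (partitionWatermark - minWatermarkAcrossPartitions > maxSkewMillis) {
            consumer.pause(Collections.singleton(partition));
        } else {
            consumer.resume(Collections.singleton(partition));
        }
    }

Because pausing happens at the client rather than by blocking the fetch thread, the multiplexed consumption of the other partitions keeps going, which is the property Thomas describes.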
Re: [ANNOUNCE] Jark Wu is now part of the Flink PMC
Congratulations Jark! Best, Jiayi Liao

Original Message Sender: Yun Gao Recipient: dev Date: Friday, Nov 8, 2019 18:37 Subject: Re: [ANNOUNCE] Jark Wu is now part of the Flink PMC

Congratulations Jark! Best, Yun

From: wenlong.lwl Send Time: 2019 Nov. 8 (Fri.) 18:31 To: dev Subject: Re: [ANNOUNCE] Jark Wu is now part of the Flink PMC

Congratulations Jark, well deserved! Best, Wenlong Lyu

On Fri, 8 Nov 2019 at 18:22, tison wrote:
Congrats Jark! Best, tison.

On Fri, Nov 8, 2019 at 6:08 PM, Jingsong Li wrote:
Congratulations to Jark. Jark has really contributed a lot to the table layer for a long time. Well deserved. Best, Jingsong Lee

On Fri, Nov 8, 2019 at 6:05 PM Yu Li wrote:
Congratulations Jark! Well deserved! Best Regards, Yu

On Fri, 8 Nov 2019 at 17:55, OpenInx wrote:
Congrats Jark! Well deserved.

On Fri, Nov 8, 2019 at 5:53 PM Paul Lam wrote:
Congrats Jark! Best, Paul Lam

On Nov 8, 2019, at 17:51, jincheng sun wrote:
Hi all, On behalf of the Flink PMC, I'm happy to announce that Jark Wu is now part of the Apache Flink Project Management Committee (PMC). Jark has been a committer since February 2017. He has been very active on Flink's Table API / SQL component, as well as frequently helping manage/verify/vote on releases. He has been writing many blogs about Flink and also driving the translation work for the Flink website and documentation. He is very active in the Chinese community, giving talks about Flink at many events in China. Congratulations & welcome, Jark! Best, Jincheng (on behalf of the Flink PMC)
Re: RocksDB state on HDFS seems not being cleaned up
Hi Shuwen, The "shared" means that the state files are shared among multiple checkpoints, which happens when you enable incremental checkpointing[1]. Therefore, it's reasonable for the size to keep growing if you set "state.checkpoints.num-retained" to a big value. [1] https://flink.apache.org/features/2018/01/30/incremental-checkpointing.html Best, Jiayi Liao

Original Message Sender: shuwen zhou Recipient: dev Date: Tuesday, Nov 5, 2019 17:59 Subject: RocksDB state on HDFS seems not being cleaned up

Hi Community, I have a job running on Flink 1.9.0 on YARN with RocksDB on HDFS and incremental checkpointing enabled. I have some MapState in code with the following config:

    val ttlConfig = StateTtlConfig
      .newBuilder(Time.minutes(30))
      .updateTtlOnCreateAndWrite()
      .cleanupInBackground()
      .cleanupFullSnapshot()
      .setStateVisibility(StateTtlConfig.StateVisibility.ReturnExpiredIfNotCleanedUp)
      .build()

After running for around 2 days, I observed the checkpoint folder is showing

    44.4 M  /flink-chk743e4568a70b626837b/chk-40
    65.9 M  /flink-chk743e4568a70b626837b/chk-41
    91.7 M  /flink-chk743e4568a70b626837b/chk-42
    96.1 M  /flink-chk743e4568a70b626837b/chk-43
    48.1 M  /flink-chk743e4568a70b626837b/chk-44
    71.6 M  /flink-chk743e4568a70b626837b/chk-45
    50.9 M  /flink-chk743e4568a70b626837b/chk-46
    90.2 M  /flink-chk743e4568a70b626837b/chk-37
    49.3 M  /flink-chk743e4568a70b626837b/chk-38
    96.9 M  /flink-chk743e4568a70b626837b/chk-39
    797.9 G /flink-chk743e4568a70b626837b/shared

The ./shared folder size keeps increasing and the folder does not seem to be cleaned up. However, when I disabled incremental cleanup, the expired full snapshots were removed automatically. Is there any way to remove outdated state on HDFS to stop it from increasing? Thanks. -- Best Wishes, Shuwen Zhou
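For readers hitting the same issue, here is a minimal sketch of the setup under discussion (1.9-era API; the path and interval are placeholders). The boolean flag on RocksDBStateBackend is what turns on incremental checkpoints and therefore the ./shared directory, whose files are only deleted once no retained checkpoint references them:

    import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.enableCheckpointing(60_000);
    // 'true' enables incremental checkpoints: each checkpoint uploads only
    // new RocksDB sst files into ./shared and references older ones there.
    env.setStateBackend(new RocksDBStateBackend("hdfs:///flink-checkpoints", true));

Combined with state.checkpoints.num-retained in flink-conf.yaml, every retained chk-* directory can pin files in ./shared, which is why a large retention value makes the folder grow.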
Re: [DISCUSS] Semantic and implementation of per-job mode
Hi all, Firstly, thanks @tison for bringing this up, and a strong +1 for the overall design. I'd like to add one more example of "multiple jobs in one program" from what I'm currently working on. I'm trying to run a TPC-DS benchmark (including tens of SQL query jobs) on Flink, and I'm suffering a lot from maintaining the client, because I can't run this program in per-job mode and have to keep the client attached. Back to our discussion: I can see there is now a divergence between compiling the job graph in the client and compiling it in the #ClusterEntrypoint, and there are upsides and downsides to either way. As for the opt-in solution, I have a question: what if the user chooses detached mode, compiles in the client, and runs a multi-job program at the same time? It is still not going to work. Besides, by adding a compilation option, we need to consider more things when submitting a job, like "Does my program include multiple jobs?" or "Does the program need to be initialized before submitting to a remote cluster?", which looks a bit complicated and confusing to me. To summarize, I'll vote for the new per-program concept, but I may not prefer the opt-in option mentioned in the mailing list; or maybe we need to reconsider a better concept and definition that is easy to understand. Best, Jiayi Liao

Original Message Sender: Rong Rong Recipient: Regina Cc: Theo Diefenthal; u...@flink.apache.org Date: Friday, Nov 1, 2019 11:01 Subject: Re: [DISCUSS] Semantic and implementation of per-job mode

Hi All, Thanks @Tison for starting the discussion; I think we have a very similar scenario to Theo's use cases. In our case we also generate the job graph using a client service (which serves job graph generation for multiple user programs), and we've found managing the upload/download between the cluster and the DFS to be tricky and error-prone. In addition, the management of different environments and requirements from different users in a single service poses even more trouble for us. However, shifting the job graph generation towards the cluster side also requires some thought regarding how to manage the driver-job, as well as some dependency conflicts - in the case of shipping the job graph generation to the cluster, some dependencies unnecessary for the runtime will be pulled in by the driver-job (correct me if I'm wrong, Theo). I think in general I agree with @Gyula's main point: unless there is a very strong reason, it is better if we make the driver-mode an opt-in (at least at the beginning). I left some comments on the document as well. Please kindly take a look. Thanks, Rong

On Thu, Oct 31, 2019 at 9:26 AM Chan, Regina wrote:
Yeah, just chiming in on this conversation as well. We heavily use multiple job graphs to get isolation around retry logic and resource allocation across the job graphs. Putting all these parallel flows into a single graph would mean sharing TaskManagers across what was meant to be truly independent. We also build our job graphs dynamically, based on the state of the world at the start of the job. While we've had our share of the pain described, my understanding is that there would be a tradeoff between the number of jobs being submitted to the cluster and the corresponding resource allocation requests. In the model with multiple jobs in one program, there's at least the opportunity to reuse idle TaskManagers.
From: Theo Diefenthal Sent: Thursday, October 31, 2019 10:56 AM To: u...@flink.apache.org Subject: Re: [DISCUSS] Semantic and implementation of per-job mode

I agree with Gyula Fora. In our case, we have a client machine in the middle between our YARN cluster and some backend services which cannot be reached directly from the cluster nodes. On application startup, we connect to some external systems, get some information crucial for the job runtime, and finally build up the job graph to be submitted. It is true that we could work around this, but it would be pretty annoying to connect to the remote services, collect the data, upload it to HDFS, start the job, and make sure housekeeping of those files is also done at some later time. The current behavior also corresponds to the behavior of Spark's driver mode, which made the transition from Spark to Flink easier for us. But I see the point, especially in terms of Kubernetes, and would thus also vote for an opt-in solution, with client-side compilation as the default and an option for the per-program mode as well. Best regards

From: "Flavio Pompermaier" To: "Yang Wang" CC: "tison", "Newport, Billy", "Paul Lam", "SHI Xiaogang", "dev", "user" Sent: Thursday, October 31, 2019 10:45:36 Subject: Re: [DISCUSS] Semantic and implementation of per-job mode

Hi all, we're using multiple jobs in one program a lot, and this is why: when you fetch data from a huge number of sources and, for each source, you do some transformation and then you
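For readers following along, a minimal sketch of the "multiple jobs in one program" pattern that Jiayi, Regina and Flavio describe (the query list and the per-query pipeline are placeholders): each env.execute() call submits a separate job from the same main() method, which is exactly the shape that per-job/detached mode cannot express today:

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class MultiJobProgram {
        public static void main(String[] args) throws Exception {
            String[] queries = {"q1", "q2", "q3"}; // e.g. TPC-DS queries
            for (String query : queries) {
                StreamExecutionEnvironment env =
                        StreamExecutionEnvironment.getExecutionEnvironment();
                env.fromElements(query).print(); // stand-in for the real per-query pipeline
                // execute() blocks in the client until the job finishes, and then the
                // loop submits the next one -- so the client has to stay attached
                env.execute("job-for-" + query);
            }
        }
    }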
Re: [ANNOUNCE] Becket Qin joins the Flink PMC
Congratulations Becket! Best, Jiayi Liao Original Message Sender: Biao Liu Recipient: dev Date: Tuesday, Oct 29, 2019 12:10 Subject: Re: [ANNOUNCE] Becket Qin joins the Flink PMC Congrats Becket! Thanks, Biao /'bɪ.aʊ/ On Tue, 29 Oct 2019 at 12:07, jincheng sun wrote: > Congratulations Becket. > Best, > Jincheng > > Rui Li 于2019年10月29日周二 上午11:37写道: > > > Congrats Becket! > > > > On Tue, Oct 29, 2019 at 11:20 AM Leonard Xu wrote: > > > > > Congratulations! Becket. > > > > > > Best, > > > Leonard Xu > > > > > > > On 2019年10月29日, at 上午11:00, Zhenghua Gao wrote: > > > > > > > > Congratulations, Becket! > > > > > > > > *Best Regards,* > > > > *Zhenghua Gao* > > > > > > > > > > > > On Tue, Oct 29, 2019 at 10:34 AM Yun Gao > > > > > > > wrote: > > > > > > > >> Congratulations Becket! > > > >> > > > >> Best, > > > >> Yun > > > >> > > > >> > > > >> -- > > > >> From:Jingsong Li > > > >> Send Time:2019 Oct. 29 (Tue.) 10:23 > > > >> To:dev > > > >> Subject:Re: [ANNOUNCE] Becket Qin joins the Flink PMC > > > >> > > > >> Congratulations Becket! > > > >> > > > >> Best, > > > >> Jingsong Lee > > > >> > > > >> On Tue, Oct 29, 2019 at 10:18 AM Terry Wang > > wrote: > > > >> > > > >>> Congratulations, Becket! > > > >>> > > > >>> Best, > > > >>> Terry Wang > > > >>> > > > >>> > > > >>> > > > 2019年10月29日 10:12,OpenInx 写道: > > > > > > Congratulations Becket! > > > > > > On Tue, Oct 29, 2019 at 10:06 AM Zili Chen > > > >> wrote: > > > > > > > Congratulations Becket! > > > > > > > > Best, > > > > tison. > > > > > > > > > > > > Congxian Qiu 于2019年10月29日周二 上午9:53写道: > > > > > > > >> Congratulations Becket! > > > >> > > > >> Best, > > > >> Congxian > > > >> > > > >> > > > >> Wei Zhong 于2019年10月29日周二 上午9:42写道: > > > >> > > > >>> Congratulations Becket! > > > >>> > > > >>> Best, > > > >>> Wei > > > >>> > > > 在 2019年10月29日,09:36,Paul Lam 写道: > > > > > > Congrats Becket! > > > > > > Best, > > > Paul Lam > > > > > > > 在 2019年10月29日,02:18,Xingcan Cui 写道: > > > > > > > > Congratulations, Becket! > > > > > > > > Best, > > > > Xingcan > > > > > > > >> On Oct 28, 2019, at 1:23 PM, Xuefu Z > > wrote: > > > >> > > > >> Congratulations, Becket! > > > >> > > > >> On Mon, Oct 28, 2019 at 10:08 AM Zhu Zhu > > > > > wrote: > > > >> > > > >>> Congratulations Becket! > > > >>> > > > >>> Thanks, > > > >>> Zhu Zhu > > > >>> > > > >>> Peter Huang 于2019年10月29日周二 > > > >> 上午1:01写道: > > > >>> > > > Congratulations Becket Qin! > > > > > > > > > Best Regards > > > Peter Huang > > > > > > On Mon, Oct 28, 2019 at 9:19 AM Rong Rong < > > > walter...@gmail.com > > > >>> > > > >>> wrote: > > > > > > > Congratulations Becket!! > > > > > > > > -- > > > > Rong > > > > > > > > On Mon, Oct 28, 2019, 7:53 AM Jark Wu > > > >> wrote: > > > > > > > >> Congratulations Becket! > > > >> > > > >> Best, > > > >> Jark > > > >> > > > >> On Mon, 28 Oct 2019 at 20:26, Benchao Li < > > > >> libenc...@gmail.com> > > > >>> wrote: > > > >> > > > >>> Congratulations Becket. > > > >>> > > > >>> Dian Fu 于2019年10月28日周一 > 下午7:22写道: > > > >>> > > > Congrats, Becket. > > > > > > > 在 2019年10月28日,下午6:07,Fabian Hueske < > fhue...@gmail.com> > > > >> 写道: > > > > > > > > Hi everyone, > > > > > > > > I'm happy to announce that Becket Qin has joined the > > > Flink > > > >> PMC. > > > > Let's congratulate and welcome Becket as a new member > > of > > > >> the > > > Flink > > > >> PMC! 
> > > > > > > > Cheers, > > > > Fabian > > > > > > > > > >>> > > > >>> -- > > > >>> > > > >>> Benchao Li > > > >>> School of Electronics Engineering and Computer Science, > > > >> Peking > > > > University > > > >>> Tel:+86-15650713730 > > > >>> Email: libenc...@gmail.com; libenc...@pku.edu.cn > > > >>> > > > >> > > > > > > > > > > >>> > > > >> > > > >> > > > >> -- > > > >> Xuefu Zhang
Re: [DISCUSS] Introduce a location-oriented two-stage query mechanism to improve the queryable state.
Hi vino, +1 for improving the queryable state feature. This reminds me of the state-processor-api module, which is very helpful when we analyze state offline. However, we currently don't have many ways to know what is happening to the state inside a running application, which makes me feel that this has good potential. Since these two modules are separate but do similar work (analyzing state), maybe we have to think more about their positioning, or maybe integrate them gracefully in the future. Anyway, this is great work, and it'd be better if we could hear more thoughts and use cases. Best Regards, Jiayi Liao

Original Message Sender: vino yang Recipient: dev@flink.apache.org Date: Tuesday, Oct 22, 2019 15:42 Subject: [DISCUSS] Introduce a location-oriented two-stage query mechanism to improve the queryable state.

Hi guys, Currently, the queryable state client is hard to use, because it requires users to know the address of a TaskManager and the port of the proxy. Actually, most users do not have deep knowledge of Flink's internals and runtime in production. The queryable state clients directly interact with the query state client proxies which are hosted on each TaskExecutor. This design requires users to know too many details. We introduce a location service component to improve the architecture of the queryable state and hide the details of the task executors. We first give a brief introduction to our design in Section 2 and then detail the implementation in Section 3. At last, we describe some future work that can be done. I have provided an initial implementation in my Flink repository[2]. One thing that needs to be stated is that we have not changed the existing solution, so it still works according to the previous modes. The design documentation is here[3]. Any suggestions and feedback are welcome and appreciated. [1]: https://statefun.io/ [2]: https://github.com/yanghua/flink/tree/improve-queryable-state-master [3]: https://docs.google.com/document/d/181qYVIiHQGrc3hCj3QBn1iEHF4bUztdw4XO8VSaf_uI/edit?usp=sharing Best, Vino
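To make the usability problem concrete, this is roughly what a query looks like with the current client (a minimal sketch; the hostname, job id, state name and key are placeholders): the caller must already know one specific TaskManager host and its proxy port, which is exactly the detail a location service would hide:

    import java.util.concurrent.CompletableFuture;
    import org.apache.flink.api.common.JobID;
    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.queryablestate.client.QueryableStateClient;

    // 9069 is the default queryable state proxy port on a TaskManager
    QueryableStateClient client = new QueryableStateClient("taskmanager-host", 9069);
    ValueStateDescriptor<Long> descriptor = new ValueStateDescriptor<>("count", Types.LONG);
    CompletableFuture<ValueState<Long>> future = client.getKvState(
            JobID.fromHexString("..."),  // job id placeholder
            "count-query",               // the name registered via setQueryable(...)
            "some-key", Types.STRING, descriptor);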
Re: Per Key Grained Watermark Support
Hi Congxian, Thanks, but by doing that we will lose some features, like emitting late data to a side output.

Original Message Sender: Congxian Qiu Recipient: Lasse Nedergaard Cc: 廖嘉逸; u...@flink.apache.org; dev@flink.apache.org Date: Monday, Sep 23, 2019 19:56 Subject: Re: Per Key Grained Watermark Support

Hi, There was a discussion about this issue[1]. As the previous discussion concluded, at the moment this is not supported out of the box by Flink; I think you can try a keyed process function, as Lasse said. [1] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Watermark-for-each-key-td27485.html#a27516 Best, Congxian

On Mon, Sep 23, 2019 at 12:42 PM, Lasse Nedergaard wrote:
Hi Jiayi, We have faced the same challenge, as we deal with IoT units and they do not necessarily share the same timestamp. Watermarks per key would be a perfect match here. We tried to work around it by handling late events as a special case with side outputs, but it isn't the perfect solution. My conclusion is to skip watermarks, create a keyed process function, and handle the time for each key myself. Med venlig hilsen / Best regards, Lasse Nedergaard

On Sep 23, 2019, at 06:16, 廖嘉逸 wrote:
Hi all, Currently, watermarks can only be supported at the task level (or partition level), which means that the data belonging to a faster key has to share the same watermark with the data belonging to a slower key in the same key group of a KeyedStream. This leads to two problems: 1. Latency. For example, every key has its own window state, but a window can only fire after its end time is exceeded by the watermark, which is usually determined by the data belonging to the slowest key. (The same holds for the CepOperator and other operators which use watermarks to fire results.) 2. State size. Because the faster key delays firing its results, it has to store more redundant state which could have been pruned earlier. However, since the watermark was introduced a long time ago and was not designed to be fine-grained in the first place, I find it very hard to solve this problem without a big change. I wonder if anyone in the community has successful experience with this, or maybe there is a shortcut? If not, I can try to draft a design if the community needs it. Best Regards, Jiayi Liao
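For reference, a minimal sketch in Java of the keyed-process-function workaround Lasse describes (Event, Result and the window size are made up for illustration): each key tracks its own notion of time, so windows are buffered and fired per key instead of by the operator-wide watermark. The trade-off Jiayi points out stays visible here: there is no built-in late-data side output anymore.

    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;
    import java.util.Map;
    import org.apache.flink.api.common.state.MapState;
    import org.apache.flink.api.common.state.MapStateDescriptor;
    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
    import org.apache.flink.util.Collector;

    public class PerKeyWindowFunction extends KeyedProcessFunction<String, Event, Result> {

        private static final long WINDOW_SIZE = 60_000L; // 1-minute tumbling windows

        private transient MapState<Long, List<Event>> windows; // window end -> buffered events

        @Override
        public void open(Configuration parameters) {
            windows = getRuntimeContext().getMapState(new MapStateDescriptor<>(
                    "windows", Types.LONG, Types.LIST(Types.POJO(Event.class))));
        }

        @Override
        public void processElement(Event event, Context ctx, Collector<Result> out) throws Exception {
            // assign the event to its tumbling window
            long windowEnd = event.timestamp - (event.timestamp % WINDOW_SIZE) + WINDOW_SIZE;
            List<Event> buffer = windows.get(windowEnd);
            if (buffer == null) {
                buffer = new ArrayList<>();
            }
            buffer.add(event);
            windows.put(windowEnd, buffer);

            // this key's "watermark" is simply the max timestamp it has seen,
            // so a slow key never delays the firing of a fast one
            Iterator<Map.Entry<Long, List<Event>>> it = windows.iterator();
            while (it.hasNext()) {
                Map.Entry<Long, List<Event>> entry = it.next();
                if (entry.getKey() <= event.timestamp) {
                    out.collect(Result.of(ctx.getCurrentKey(), entry.getValue()));
                    it.remove();
                }
            }
        }
    }

A real implementation would also need something like a processing-time timer as an idle timeout, so the last window of a key eventually fires even when no further events arrive.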
Re: [ANNOUNCE] Zili Chen becomes a Flink committer
Congratulations! Best, Jiayi Liao

Original Message Sender: Till Rohrmann Recipient: dev; user Date: Wednesday, Sep 11, 2019 17:22 Subject: [ANNOUNCE] Zili Chen becomes a Flink committer

Hi everyone, I'm very happy to announce that Zili Chen (some of you might also know him as Tison Kun) accepted the offer of the Flink PMC to become a committer of the Flink project. Zili Chen has been an active community member for almost 16 months now. He helped push the FLIP-6 effort over the finish line, ported a lot of legacy code tests, removed a good part of the legacy code, contributed numerous fixes, is involved in Flink's client API refactoring, drives the refactoring of Flink's HighAvailabilityServices and much more. Zili Chen has also helped the community with PR reviews, reporting Flink issues, answering user mails and being very active on the dev mailing list. Congratulations Zili Chen! Best, Till (on behalf of the Flink PMC)
Re: [ANNOUNCE] Jiangjie (Becket) Qin has been added as a committer to the Flink project
Congratulations! Best Regards, Jiayi Liao

Original Message Sender: Hequn Cheng chenghe...@gmail.com Recipient: dev...@flink.apache.org Date: Thursday, Jul 18, 2019 17:51 Subject: Re: [ANNOUNCE] Jiangjie (Becket) Qin has been added as a committer to the Flink project

Congratulations Becket! Best, Hequn

On Thu, Jul 18, 2019 at 5:34 PM vino yang yanghua1...@gmail.com wrote:
Congratulations! Best, Vino

On Thu, Jul 18, 2019 at 5:31 PM, Yun Gao yungao...@aliyun.com.invalid wrote:
Congratulations! Best, Yun

From: Kostas Kloudas kklou...@gmail.com Send Time: 2019 Jul. 18 (Thu.) 17:30 To: dev dev@flink.apache.org Subject: Re: [ANNOUNCE] Jiangjie (Becket) Qin has been added as a committer to the Flink project

Congratulations Becket! Kostas

On Thu, Jul 18, 2019 at 11:21 AM Guowei Ma guowei@gmail.com wrote:
Congrats Becket! Best, Guowei

On Thu, Jul 18, 2019 at 5:17 PM, Terry Wang zjuwa...@gmail.com wrote:
Congratulations Becket!

On Jul 18, 2019, at 5:09 PM, Dawid Wysakowicz dwysakow...@apache.org wrote:
Congratulations Becket! Good to have you onboard!

On 18/07/2019 10:56, Till Rohrmann wrote:
Congrats Becket!

On Thu, Jul 18, 2019 at 10:52 AM Jeff Zhang zjf...@gmail.com wrote:
Congratulations Becket!

On Thu, Jul 18, 2019 at 4:39 PM, Xu Forward forwardxu...@gmail.com wrote:
Congratulations Becket! Well deserved. Cheers, forward

On Thu, Jul 18, 2019 at 4:20 PM, Kurt Young ykt...@gmail.com wrote:
Congrats Becket! Best, Kurt

On Thu, Jul 18, 2019 at 4:12 PM JingsongLee lzljs3620...@aliyun.com.invalid wrote:
Congratulations Becket! Best, Jingsong Lee

From: Congxian Qiu qcx978132...@gmail.com Send Time: 2019 Jul. 18 (Thu.) 16:09 To: dev@flink.apache.org Subject: Re: [ANNOUNCE] Jiangjie (Becket) Qin has been added as a committer to the Flink project

Congratulations Becket! Well deserved. Best, Congxian

On Thu, Jul 18, 2019 at 4:03 PM, Jark Wu imj...@gmail.com wrote:
Congratulations Becket! Well deserved. Cheers, Jark

On Thu, 18 Jul 2019 at 15:56, Paul Lam paullin3...@gmail.com wrote:
Congrats Becket! Best, Paul Lam

On Jul 18, 2019, at 15:41, Robert Metzger rmetz...@apache.org wrote:
Hi all, I'm excited to announce that Jiangjie (Becket) Qin just became a Flink committer! Congratulations Becket! Best, Robert (on behalf of the Flink PMC)

-- Best Regards, Jeff Zhang
Re: Add relative path support in Savepoint Connector
Hi Konstantin, Thank you for your feedback. You're right that this part belongs to the savepoint deserialization. This is an old issue which, according to the comments, should have been resolved before the 1.3 release. Anyway, I'm going to keep following this. Best Regards, Jiayi Liao

Original Message Sender: Konstantin Knauf konstan...@ververica.com Recipient: dev...@flink.apache.org; Stefan Richter s.rich...@ververica.com Date: Wednesday, Jul 17, 2019 16:28 Subject: Re: Add relative path support in Savepoint Connector

Hi Jiayi, I think this is not an issue with the State Processor API specifically, but with savepoints in general. The _metadata file of a savepoint uses absolute path references. There is a pretty old Jira ticket which already mentions this limitation [1]. Stefan (cc) might know more about any ongoing development in that direction and might have an idea of the effort of making savepoints relocatable. Best, Konstantin [1] https://issues.apache.org/jira/browse/FLINK-5763

On Wed, Jul 17, 2019 at 8:35 AM bupt_ljy bupt_...@163.com wrote:
Hi again, Does anyone have an opinion on this topic? Best Regards, Jiayi Liao

Original Message Sender: bupt_ljy bupt_...@163.com Recipient: dev...@flink.apache.org Cc: Tzu-Li (Gordon) Tai tzuli...@apache.org Date: Tuesday, Jul 16, 2019 15:24 Subject: Add relative path support in Savepoint Connector

Hi all, Firstly, I appreciate Gordon and Seth's effort on this feature, which is really helpful for our production use. As mentioned in FLINK-12047, one of the production use cases is to use existing state to derive new state. However, since the state handle uses an absolute path to get the input stream, we need to operate on the state directly in the production environment, which is not an anxiety-reducing situation, at least for me. So I wonder if we can add relative path support in this module, because the files are persisted in one directory after we take a savepoint, which makes it achievable. I'm not sure whether my scenario is a common case or not, but I'd be happy to contribute this if you are all okay with it. Best Regards, Jiayi Liao

--
Konstantin Knauf | Solutions Architect +49 160 91394525 Planned Absences: 10.08.2019 - 31.08.2019, 05.09. - 06.09.2019 -- Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany -- Ververica GmbH Registered at Amtsgericht Charlottenburg: HRB 158244 B Managing Directors: Dr. Kostas Tzoumas, Dr. Stephan Ewen
Re: Add relative path support in Savepoint Connector
Hi again, Does anyone have an opinion on this topic? Best Regards, Jiayi Liao

Original Message Sender: bupt_ljy bupt_...@163.com Recipient: dev...@flink.apache.org Cc: Tzu-Li (Gordon) Tai tzuli...@apache.org Date: Tuesday, Jul 16, 2019 15:24 Subject: Add relative path support in Savepoint Connector

Hi all, Firstly, I appreciate Gordon and Seth's effort on this feature, which is really helpful for our production use. As mentioned in FLINK-12047, one of the production use cases is to use existing state to derive new state. However, since the state handle uses an absolute path to get the input stream, we need to operate on the state directly in the production environment, which is not an anxiety-reducing situation, at least for me. So I wonder if we can add relative path support in this module, because the files are persisted in one directory after we take a savepoint, which makes it achievable. I'm not sure whether my scenario is a common case or not, but I'd be happy to contribute this if you are all okay with it. Best Regards, Jiayi Liao
Add relative path support in Savepoint Connector
Hi all, Firstly, I appreciate Gordon and Seth's effort on this feature, which is really helpful for our production use. As mentioned in FLINK-12047, one of the production use cases is to use existing state to derive new state. However, since the state handle uses an absolute path to get the input stream, we need to operate on the state directly in the production environment, which is not an anxiety-reducing situation, at least for me. So I wonder if we can add relative path support in this module, because the files are persisted in one directory after we take a savepoint, which makes it achievable. I'm not sure whether my scenario is a common case or not, but I'd be happy to contribute this if you are all okay with it. Best Regards, Jiayi Liao
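For context, this is roughly how the connector is used today (a minimal sketch against the 1.9-era State Processor API; the paths, the operator uid, and the MyKeyedState/MyKeyedStateReader types are placeholders). The savepoint path handed to Savepoint.load() is resolved through the absolute references stored in _metadata, which is what the relative-path request would relax:

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.runtime.state.memory.MemoryStateBackend;
    import org.apache.flink.state.api.ExistingSavepoint;
    import org.apache.flink.state.api.Savepoint;

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    // the _metadata inside this directory points at state files via absolute
    // paths, so moving the directory elsewhere breaks the load
    ExistingSavepoint savepoint = Savepoint.load(
            env, "hdfs:///savepoints/savepoint-abc123", new MemoryStateBackend());
    DataSet<MyKeyedState> keyedState =
            savepoint.readKeyedState("my-operator-uid", new MyKeyedStateReader());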
Re: RE: [DISCUSS] Improve Queryable State and introduce a QueryServerProxy component
Hi vino, Big +1 for this. Glad to see new progress on this topic! I've left some comments on it. Best Regards, Jiayi Liao

Original Message Sender: vino yang yanghua1...@gmail.com Recipient: Georgi Stoyanov gstoya...@live.com Cc: dev...@flink.apache.org; user u...@flink.apache.org; Stefan Richter s.rich...@ververica.com; Aljoscha Krettek aljos...@apache.org; kklou...@gmail.com; Stephan Ewen se...@apache.org; l...@apache.org; Tzu-Li (Gordon) Tai tzuli...@apache.org Date: Tuesday, Jul 2, 2019 16:45 Subject: Re: RE: [DISCUSS] Improve Queryable State and introduce a QueryServerProxy component

Hi all, I have since tried to further refine the design discussed in this thread and have written a design document with more detailed design images and text descriptions, so that it is more conducive to discussion.[1] Note: the document is not yet complete; for example, the "Implementation" section is missing. Therefore, it is still in an open-discussion state. I will improve the rest while listening to the opinions of the community. More discussion and feedback is welcome and appreciated. Best, Vino [1]: https://docs.google.com/document/d/181qYVIiHQGrc3hCj3QBn1iEHF4bUztdw4XO8VSaf_uI/edit?usp=sharing

On Fri, Jun 7, 2019 at 11:32 PM, yanghua1127 yanghua1...@gmail.com wrote:
Hi Georgi, Thanks for your feedback. And glad to hear you are using queryable state. I agree that option 1 is easier to implement than the others. However, when we design the new architecture we need to consider more aspects, e.g. scalability. So option 3 seems more suitable. Actually, some committers such as Stefan, Gordon and Aljoscha have given me feedback and direction. Currently, I am writing the design document. Once it is ready to be presented, I will copy it to this thread and we can discuss further details. Best, Vino

On 2019-06-07 19:03, Georgi Stoyanov wrote:
Hi Vino, I was investigating the current architecture, and AFAIK the first proposal would be a lot easier to implement, because currently the JM has the information about the states (where, which, etc., thanks to the KvStateLocationRegistry - correct me if I'm wrong). We are using the feature, and it's indeed not very convenient to iterate through ports, check which TM is the responsible one, etc. It would be very useful if someone from the committers joined the topic and gave us some insights into what's going to happen with that feature. Kind Regards, Georgi

From: vino yang yanghua1...@gmail.com Sent: Thursday, April 25, 2019 5:18 PM To: dev dev@flink.apache.org; user u...@flink.apache.org Cc: Stefan Richter s.rich...@ververica.com; Aljoscha Krettek aljos...@apache.org; kklou...@gmail.com Subject: [DISCUSS] Improve Queryable State and introduce a QueryServerProxy component

Hi all, I want to share my thoughts with you about improving the queryable state and introducing a QueryServerProxy component. I think the current queryable state client is hard to use, because it needs users to know the TaskManager's address and the proxy's port. Actually, many business users do not have deep knowledge of Flink's internals or runtime in production, yet they sometimes need to query the values of states. IMO, the cause of this problem is the queryable state's architecture. Currently, the queryable state clients interact with query state client proxy components which are hosted on each TaskManager. This design makes it hard to encapsulate the points of change and exposes too much detail to the user.
My personal idea is that we could introduce a real queryable state server, named e.g. QueryStateProxyServer, which would delegate all the query state requests, query the local registry, and then redirect each request to the specific QueryStateClientProxy (which runs on each TaskManager). The server is what users really need to care about, and it would make users ignorant of the TaskManagers' addresses and the proxies' ports. The current QueryStateClientProxy would become the QueryStateProxyClient. Generally speaking, the roles of the QueryStateProxyServer are listed below:

· work as the proxy for all query clients, receiving all requests and sending responses;
· act as a router that redirects the real query requests to the specific proxy client;
· maintain a routing table registry (state -> TaskManager, TaskManager -> proxy client address);
· provide more fine-grained control, such as result caching, ACL, TTL, SLA (rate limiting) and so on.

About the implementation, there are three options:

opt 1: Let the JobManager act as the query proxy server.
· pros: reusing the existing JM avoids introducing a new process, which reduces complexity;
· cons: it would put a heavy burden on the JM and, depending on the query frequency, may impact its stability.

opt 2: Introduce a new component which runs as a single process and acts as the query proxy server.
· pros: reduces the burden on the JM and keeps it more stable;
· cons: introducing a new component makes the implementation more complex.

opt 3 (suggestion comes
Re: A Question About Flink Network Stack
Hi Zhijiang, Thank you for the detailed explanation! Best Regards, Jiayi Liao

Original Message Sender: zhijiang wangzhijiang...@aliyun.com.INVALID Recipient: dev...@flink.apache.org Date: Tuesday, Jun 18, 2019 17:34 Subject: Re: A Question About Flink Network Stack

Hi Jiayi, Thanks for looking into the network stack; you pointed out a very good question. Your understanding is right. In credit-based mode, the receiver side has a fixed number of exclusive buffers (credits) for each remote input channel, to ensure every channel can receive data in parallel without blocking the others. The receiver also has a floating shared buffer pool for all the input channels, in order to grant more credits when there is a large backlog on the sender side. The sender side still uses a shared buffer pool for all the subpartitions. In one-to-one mode, which means one producer only produces data for one consumer, there seem to be no other concerns. In all-to-all mode, which means one producer emits data for all the consumers, the buffers in the pool might eventually accumulate in the slow subpartition until the pool is exhausted, which would leave the other, faster subpartitions with no available buffers to fill with more data. This would eventually cause backpressure, because the operator does not know the condition of buffer usage and cannot select which records to emit with priority. Only when a record is emitted by the producer do we know which subpartition covers it, via the ChannelSelector. If we did not serialize the record into the slow subpartition to occupy buffer resources, we would need additional memory overhead for caching the record, which could cause instability. So on the producer side there seems to be no other choice until the buffer resources are exhausted. Credit-based flow control is not meant to solve the backpressure issue, which cannot be avoided completely. It brings obvious benefits for one-to-one mode sharing a TCP channel in backpressure scenarios: it avoids memory overhead in the Netty stack that could cause instability, and it speeds up exactly-once checkpoints by avoiding spilling blocked data. In addition, we once implemented an improvement for the RebalanceStrategy considering the slow subpartition issue. For the rebalance channel selector, a record can actually be emitted to any subpartition with no correctness issue. So when a record is emitted, we select the fastest subpartition to take it, based on the current backlog size, instead of the previous round-robin way. This can bring benefits in some scenarios. Best, Zhijiang

From: bupt_ljy bupt_...@163.com Send Time: 2019 Jun. 18 (Tue.) 16:35 To: dev dev@flink.apache.org Subject: A Question About Flink Network Stack

Hi all, I'm not very familiar with the network part of Flink, and a question occurred to me after reading most of the related source code. I've seen that Flink uses the credit-based mechanism to solve the blocking problem on the receiver side, which means that one "slow" input channel won't block other input channels' consumption, because each has its own exclusive credits. However, on the sender side, I find that memory segments sent to all receivers' channels share the same local segment pool (LocalBufferPool), which may cause a problem here. Assume that we have a non-parallel source which is partitioned into a map operator whose parallelism is two, and one of the map tasks is consuming very slowly.
Is there any possibility that the memory segments which should be sent to the slower receiver fill the whole local segment pool, which blocks the data which should be sent to the faster receiver? I appreciate any comments or answers, and please correct me if I am wrong about this. Best Regards, Jiayi Liao
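The scenario Zhijiang confirms can be illustrated with a toy model (plain Java, nothing Flink-specific; all names are made up): a bounded shared pool drained by a slow consumer eventually starves the fast one, because segments are only returned after they are consumed downstream.

    import java.util.concurrent.Semaphore;

    // Toy model of a sender-side shared buffer pool: 4 segments shared by
    // two subpartitions. If the slow subpartition never returns its
    // segments, acquire() for the fast subpartition's data blocks too.
    Semaphore localBufferPool = new Semaphore(4);

    void emit(boolean toSlowSubpartition) throws InterruptedException {
        localBufferPool.acquire(); // take a memory segment from the shared pool
        if (!toSlowSubpartition) {
            localBufferPool.release(); // the fast consumer returns it promptly
        }
        // the slow subpartition holds its segments, shrinking the pool for everyone
    }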
A Question About Flink Network Stack
Hi all, I'm not very familiar with the network part of Flink, and a question occurred to me after reading most of the related source code. I've seen that Flink uses the credit-based mechanism to solve the blocking problem on the receiver side, which means that one "slow" input channel won't block other input channels' consumption, because each has its own exclusive credits. However, on the sender side, I find that memory segments sent to all receivers' channels share the same local segment pool (LocalBufferPool), which may cause a problem here. Assume that we have a non-parallel source which is partitioned into a map operator whose parallelism is two, and one of the map tasks is consuming very slowly. Is there any possibility that the memory segments which should be sent to the slower receiver fill the whole local segment pool, blocking the data which should be sent to the faster receiver? I appreciate any comments or answers, and please correct me if I am wrong about this. Best Regards, Jiayi Liao
[jira] [Created] (FLINK-12668) Introduce fromParallelElements for generating DataStreamSource
bupt_ljy created FLINK-12668:

Summary: Introduce fromParallelElements for generating DataStreamSource
Key: FLINK-12668
URL: https://issues.apache.org/jira/browse/FLINK-12668
Project: Flink
Issue Type: Improvement
Affects Versions: 1.8.0
Reporter: bupt_ljy
Assignee: bupt_ljy
Fix For: 1.9.0

We already have a fromElements function in StreamExecutionEnvironment to generate a non-parallel DataStreamSource. We should introduce a similar fromParallelElements function because: 1. The current implementations of ParallelSourceFunction are mostly bound to external resources, like the Kafka source, and we need a more lightweight parallel source function that can be created easily. (The SplittableIterator is too heavy, by the way.) 2. It's very useful if we want to verify or test something in a parallel processing environment.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
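A rough sketch of what such a source could look like (a hypothetical illustration, not the actual patch for FLINK-12668): each parallel subtask emits the elements whose index maps to it, giving a lightweight parallel test source.

    import java.io.Serializable;
    import java.util.List;
    import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction;

    public class ParallelElementsSource<T extends Serializable>
            extends RichParallelSourceFunction<T> {

        private final List<T> elements; // shipped with the job graph, so it must be serializable
        private volatile boolean running = true;

        public ParallelElementsSource(List<T> elements) {
            this.elements = elements;
        }

        @Override
        public void run(SourceContext<T> ctx) {
            int subtask = getRuntimeContext().getIndexOfThisSubtask();
            int parallelism = getRuntimeContext().getNumberOfParallelSubtasks();
            // round-robin partitioning of the element list across subtasks
            for (int i = 0; i < elements.size() && running; i++) {
                if (i % parallelism == subtask) {
                    synchronized (ctx.getCheckpointLock()) {
                        ctx.collect(elements.get(i));
                    }
                }
            }
        }

        @Override
        public void cancel() {
            running = false;
        }
    }

A hypothetical fromParallelElements could then be little more than env.addSource(new ParallelElementsSource<>(elements)).setParallelism(n), which makes it easy to exercise operators under real parallelism in tests.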
Re: [DISCUSS] Improve Queryable State and introduce a QueryServerProxy component
Hi Vino, +1 for this proposal. Queryable state is very commonly used in our scenarios when we debug and query the real-time status of streaming processes like CEP. And we've done a lot to improve the "user experience" of this feature, like exposing the TaskManager's proxy port in TaskManagerInfo. I'm looking forward to a more detailed and deeper discussion, and I'd like to contribute back to the community on this. Best Regards, Jiayi Liao

Original Message Sender: vino yang yanghua1...@gmail.com Recipient: dev@flink.apache.org Date: Friday, Apr 26, 2019 16:41 Subject: Re: [DISCUSS] Improve Queryable State and introduce a QueryServerProxy component

Hi Paul, Thanks for your reply. You are right; currently the queryable state has few users. And I totally agree with you: it makes streaming work more like a DB. WRT the architecture and the problem you are concerned about: yes, it may affect the JobManager if they are deployed together. I think it's important to guarantee the JobManager's availability and stability, and the QueryProxyServer is just a secondary service component. That is why, when describing the role of the QueryProxyServer, I mentioned an SLA policy; I think that is one solution, but the details need to be discussed. About starting a queryable state client with a cmd script, I think it's a good and valuable idea. Best, Vino.

On Fri, Apr 26, 2019 at 3:31 PM, Paul Lam paullin3...@gmail.com wrote:
Hi Vino, Thanks a lot for bringing up the discussion! Queryable state has been in beta for a long time, and due to its complexity and instability I think there are not many users, but there is great value in it: it brings state-as-database one step closer. WRT the architecture, I'd vote for opt 3, because it fits a cloud architecture best and avoids putting more burden on the JM (sometimes the queries could be slow and resource-intensive). My concern is that on many cluster frameworks the container resources are limited (IIUC, the JM and QS are running in the same container); would the JM get killed if QS eats up too much memory? And a minor suggestion: can we introduce a cmd script to set up a QueryableStateClient? That would make it easier for users who want to try out this feature. Best, Paul Lam

On Apr 26, 2019, at 11:09, vino yang yanghua1...@gmail.com wrote:
Hi Quan, Thanks for your reply. Actually, I did not try this way. But there are two factors we should consider: 1. The local state storage is not equal to RocksDB; otherwise, Flink would not need to provide a queryable state client. What's more, querying RocksDB directly is still an address-explicit action. 2. IMO, the proposal's more valuable suggestion is to make the queryable state's architecture more reasonable, letting it encapsulate more details and improving its scalability. Best, Vino

On Fri, Apr 26, 2019 at 10:38 AM, Shi Quan qua...@outlook.com wrote:
Hi, How about taking states from RocksDB directly? In that case, the TM host is unnecessary. Best, Quan Shi

From: vino yang yanghua1...@gmail.com Sent: Thursday, April 25, 2019 10:18:20 PM To: dev; user Cc: Stefan Richter; Aljoscha Krettek; kklou...@gmail.com Subject: [DISCUSS] Improve Queryable State and introduce a QueryServerProxy component

Hi all, I want to share my thoughts with you about improving the queryable state and introducing a QueryServerProxy component. I think the current queryable state client is hard to use, because it needs users to know the TaskManager's address and the proxy's port. Actually, many business users do not have deep knowledge of Flink's internals or runtime in production.
However, sometimes they need to query the values of states. IMO, the cause of this problem is the queryable state's architecture. Currently, the queryable state clients interact with query state client proxy components which are hosted on each TaskManager. This design makes it hard to encapsulate the points of change and exposes too much detail to the user. My personal idea is that we could introduce a real queryable state server, named e.g. QueryStateProxyServer, which would delegate all the query state requests, query the local registry, and then redirect each request to the specific QueryStateClientProxy (which runs on each TaskManager). The server is what users really need to care about, and it would make users ignorant of the TaskManagers' addresses and the proxies' ports. The current QueryStateClientProxy would become the QueryStateProxyClient. Generally speaking, the roles of the QueryStateProxyServer are listed below:

* work as the proxy for all query clients, receiving all requests and sending responses;
* act as a router that redirects the real query requests to the specific proxy client;
* maintain a routing table registry (state -> TaskManager, TaskManager -> proxy client address);
* provide more fine-grained control, such as result caching, ACL, TTL, SLA (rate limiting) and so on.
CEP - Support for multiple pattern
Hi all, It's actually very common that we construct more than one rule on the same data source. I'm developing some features of this kind for our business, and some ideas have come up. Do we have any plans for supporting multiple patterns in CEP? Best, Jiayi Liao
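For concreteness, here is what the status quo looks like (a minimal sketch; the Event type and the conditions are made up): every rule needs its own CEP.pattern() call, which means one pattern operator per rule over the same input stream - the duplication that native multi-pattern support could remove:

    import org.apache.flink.cep.CEP;
    import org.apache.flink.cep.PatternStream;
    import org.apache.flink.cep.pattern.Pattern;
    import org.apache.flink.cep.pattern.conditions.SimpleCondition;
    import org.apache.flink.streaming.api.datastream.DataStream;

    DataStream<Event> events = ...; // the shared source

    Pattern<Event, ?> threeFailures = Pattern.<Event>begin("fail")
            .where(new SimpleCondition<Event>() {
                @Override
                public boolean filter(Event e) {
                    return "LOGIN_FAIL".equals(e.type);
                }
            })
            .times(3);

    Pattern<Event, ?> largeAmount = Pattern.<Event>begin("big")
            .where(new SimpleCondition<Event>() {
                @Override
                public boolean filter(Event e) {
                    return e.amount > 10_000;
                }
            });

    // today: one pattern operator per rule over the same stream
    PatternStream<Event> failures = CEP.pattern(events, threeFailures);
    PatternStream<Event> bigSpenders = CEP.pattern(events, largeAmount);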
Re: [DISCUSS] Releasing Flink 1.7.1
Hi Chesnay, Thanks for these useful fixes. +1 for the release. Best, Jiayi Liao

Original Message Sender: Hequn Cheng chenghe...@gmail.com Recipient: dev...@flink.apache.org Date: Thursday, Dec 13, 2018 09:37 Subject: Re: [DISCUSS] Releasing Flink 1.7.1

Hi Chesnay, Thanks for the efforts. +1 for the release. It's nice to have these fixes. Best, Hequn

On Wed, Dec 12, 2018 at 11:09 PM Till Rohrmann trohrm...@apache.org wrote:
Thanks for starting this discussion Chesnay. +1 for creating the 1.7.1 release since it already contains very useful fixes. Cheers, Till

On Wed, Dec 12, 2018 at 1:11 PM vino yang yanghua1...@gmail.com wrote:
Hi Chesnay, +1 to release Flink 1.7.1. Best, Vino

On Wed, Dec 12, 2018 at 8:04 PM, Chesnay Schepler ches...@apache.org wrote:
Hello, I propose releasing Flink 1.7.1 before the end of next week. Some critical issues have been identified since 1.7.0, including a state migration issue when migrating from 1.5.3 (FLINK-11087) and a packaging issue in the presto-s3-filesystem (FLINK-11085, to be merged later today). Given the upcoming holidays surrounding Christmas, these fixes would otherwise be delayed for quite a while. It would also contain a very neat improvement, with the Table API now being usable in the Scala shell (FLINK-9555). Regards, Chesnay
Re: [ANNOUNCE] New committer Gary Yao
Congratulations Gary! Jiayi

Original Message Sender: vino yang yanghua1...@gmail.com Recipient: dev...@flink.apache.org Date: Saturday, Sep 8, 2018 10:11 Subject: Re: [ANNOUNCE] New committer Gary Yao

Congratulations Gary!

On Sat, Sep 8, 2018 at 2:07 AM, Chen Qin qinnc...@gmail.com wrote:
Congrats! Chen

On Sep 7, 2018, at 10:51, Xingcan Cui xingc...@gmail.com wrote:
Congratulations, Gary! Xingcan

On Sep 7, 2018, at 11:20 PM, Hequn Cheng chenghe...@gmail.com wrote:
Congratulations Gary! Hequn

On Fri, Sep 7, 2018 at 11:16 PM Matthias J. Sax mj...@apache.org wrote:
Congrats!

On 09/07/2018 08:15 AM, Timo Walther wrote:
Congratulations, Gary! Timo

On 07.09.18 at 16:46, Ufuk Celebi wrote:
Great addition to the committers. Congrats, Gary! – Ufuk

On Fri, Sep 7, 2018 at 4:45 PM, Kostas Kloudas k.klou...@data-artisans.com wrote:
Congratulations Gary! Well deserved! Cheers, Kostas

On Sep 7, 2018, at 4:43 PM, Fabian Hueske fhue...@gmail.com wrote:
Congratulations Gary!

2018-09-07 16:29 GMT+02:00 Thomas Weise t...@apache.org:
Congrats, Gary!

On Fri, Sep 7, 2018 at 4:17 PM Dawid Wysakowicz dwysakow...@apache.org wrote:
Congratulations Gary! Well deserved!

On 07/09/18 16:00, zhangmingleihe wrote:
Congrats Gary! Cheers Minglei

On Sep 7, 2018, at 9:59 PM, Andrey Zagrebin and...@data-artisans.com wrote:
Congratulations Gary!

On 7 Sep 2018, at 15:45, Stefan Richter s.rich...@data-artisans.com wrote:
Congrats Gary!

On 07.09.2018 at 15:14, Till Rohrmann trohrm...@apache.org wrote:
Hi everybody, On behalf of the PMC I am delighted to announce Gary Yao as a new Flink committer! Gary started contributing to the project in June 2017. He helped with the FLIP-6 implementation, implemented many of the new REST handlers, fixed Mesos issues and initiated the Jepsen-based distributed test suite, which uncovered several serious issues. Moreover, he actively helps community members on the mailing list and with PR reviews. Please join me in congratulating Gary on becoming a Flink committer! Cheers, Till
Re: [Proposal] Utilities for reading, transforming and creating Streaming savepoints
Hi, +1. I think it will be a great tool for Flink, especially the part about creating new state. In production, we're really worried about the availability of savepoints, because the generating logic is inside Flink and we don't have a good way to validate it. But with this tool, we could quickly construct new state for our programs even if the savepoint data is broken. It's great, thanks!

Original Message Sender: Jamie Grier jgr...@lyft.com Recipient: dev...@flink.apache.org Date: Saturday, Aug 18, 2018 02:32 Subject: Re: [Proposal] Utilities for reading, transforming and creating Streaming savepoints

This is great, Gyula! A colleague here at Lyft has also done some work around bootstrapping DataStream programs, and we've also talked a bit about doing this by running DataSet programs.

On Fri, Aug 17, 2018 at 3:28 AM, Gyula Fóra gyula.f...@gmail.com wrote:
Hi All! I want to share with you a little project we have been working on at King (with some help from some dataArtisans folks). I think this would be a valuable addition to Flink and would solve a bunch of outstanding production use cases and headaches around state bootstrapping and state analytics. We have built a quick and dirty POC implementation on top of Flink 1.6; please check the README for some nice examples to get a quick idea: https://github.com/king/bravo

Short story: Bravo is a convenient state reader and writer library leveraging Flink's batch processing capabilities. It supports processing and writing Flink streaming savepoints. At the moment it only supports processing RocksDB savepoints, but this can be extended in the future for other state backends and checkpoint types. Our goal is to cover a few basic features:
- Converting keyed states to Flink DataSets for processing and analytics
- Reading/writing non-keyed operator states
- Bootstrapping keyed states from Flink DataSets and creating new valid savepoints
- Transforming existing savepoints by replacing/changing some states

Some example use-cases:
- Point-in-time state analytics across all operators and keys
- Bootstrapping the state of a streaming job from external resources, such as reading from a database/filesystem
- Validating and potentially repairing corrupted state of a streaming job
- Changing the max parallelism of a job

Our main goal is to start working together with other Flink production users and make this something useful that can be part of Flink. So if you have use cases, please talk to us :) I have also started a Google doc which contains a little bit more info than the README and could be a starting place for discussions: https://docs.google.com/document/d/103k6wPX20kMu5H3SOOXSg5PZIaYpwdhqBMr-ppkFL5E/edit?usp=sharing I know there are a bunch of rough edges and bugs (and no tests), but our motto is: if you are not embarrassed, you released too late :) Please let me know what you think! Cheers, Gyula