Re: REST API / JarRunHandler: More flexibility for launching jobs
Hi Thomas, IIUC this "launcher" should run on the client endpoint instead of the dispatcher endpoint. "jar run" extracts the job graph and submits it to the dispatcher, which does not match the semantics you want. Could you run it with CliFrontend? Or do you propose that "jar run" support running the main method directly instead of extracting the job graph? Best, tison. Thomas Weise 于2019年7月26日周五 下午11:38写道: > Hi Till, > > Thanks for taking a look! > > The Beam job server does not currently have the ability to just output the > job graph (and related artifacts) that could then be used with the > JobSubmitHandler. It is itself using StreamExecutionEnvironment, which in > turn will lead to a REST API submission. > > Here I'm looking at what happens before the Beam job server gets involved: > the interaction of the k8s operator with the Flink deployment. The jar run > endpoint (ignoring the current handler implementation) is generic and > pretty much exactly matches what we would need for a uniform entry point. > It's just that in the Beam case the jar file would itself be a "launcher" > that doesn't provide the job graph itself, but the dependencies and > mechanism to invoke the actual client. > > I could accomplish what I'm looking for by creating a separate REST > endpoint that looks almost the same. But I would prefer to reuse the Flink > REST API interaction that is already implemented for the Flink Java jobs to > reduce the complexity of the deployment. > > Thomas > > > > > On Fri, Jul 26, 2019 at 2:29 AM Till Rohrmann > wrote: > > > Hi Thomas, > > > > quick question: Why do you wanna use the JarRunHandler? If another > process > > is building the JobGraph, then one could use the JobSubmitHandler which > > expects a JobGraph and then starts executing it.
> > > > Cheers, > > Till > > > > On Thu, Jul 25, 2019 at 7:45 PM Thomas Weise wrote: > > > > > Hi, > > > > > > While considering different options to launch Beam jobs through the > Flink > > > REST API, I noticed that the implementation of JarRunHandler places > > quite a > > > few restrictions on how the entry point shall construct a Flink job, by > > > extracting and manipulating the job graph. > > > > > > That's normally not a problem for Flink Java programs, but in the > > scenario > > > I'm looking at, the job graph would be constructed by a different > process > > > and isn't available to the REST handler. Instead, I would like to be > able > > > to just respond with the job ID of the already launched job. > > > > > > For context, please see: > > > > > > > > > > > > https://docs.google.com/document/d/1z3LNrRtr8kkiFHonZ5JJM_L4NWNBBNcqRc_yAf6G0VI/edit#heading=h.fh2f571kms4d > > > > > > The current JarRunHandler code is here: > > > > > > > > > > > > https://github.com/apache/flink/blob/f3c5dd960ff81a022ece2391ed3aee86080a352a/flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/handlers/JarRunHandler.java#L82 > > > > > > It would be nice if there was an option to delegate the responsibility > > for > > > job submission to the user code / entry point. That would be useful for > > > Beam and other frameworks built on top of Flink that dynamically > create a > > > job graph from a different representation. > > > > > > Possible ways to get there: > > > > > > * an interface that the main class can implement and, when present, > the > > > jar run handler calls instead of main. > > > > > > * an annotated method > > > > > > Either way query parameters like savepoint path and parallelism would > be > > > forwarded to the user code and the result would be the ID of the > launched > > > job. > > > > > > Thoughts? > > > > > > Thanks, > > > Thomas > > > > > >
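A hedged sketch of the first option Thomas lists (an interface that the jar run handler would invoke instead of main). All names below are invented for illustration and do not exist in Flink; the idea is only that the handler forwards the savepoint path and parallelism query parameters and gets back the ID of a job the user code submitted itself:

```java
// Hypothetical interface for the proposal above; nothing here is real
// Flink API. A "launcher" jar (e.g. for Beam) would implement this and
// perform the submission itself, returning the resulting job ID.
interface SelfSubmittingProgram {
    // Parameters mirror the jar run query parameters; savepointPath may be null.
    String submitJob(String savepointPath, int parallelism) throws Exception;
}

public class BeamStyleLauncher implements SelfSubmittingProgram {
    @Override
    public String submitJob(String savepointPath, int parallelism) {
        // A real launcher would invoke its own client here (e.g. the Beam
        // job server) and return the ID of the job it started.
        return "job-" + parallelism;
    }

    public static void main(String[] args) throws Exception {
        // What the jar run handler would do if the main class implements
        // the interface: call submitJob instead of main.
        System.out.println(new BeamStyleLauncher().submitJob(null, 4)); // prints "job-4"
    }
}
```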
[jira] [Created] (FLINK-13502) CatalogTableStatisticsConverter should be in planner.utils package
godfrey he created FLINK-13502: -- Summary: CatalogTableStatisticsConverter should be in planner.utils package Key: FLINK-13502 URL: https://issues.apache.org/jira/browse/FLINK-13502 Project: Flink Issue Type: Bug Components: Table SQL / Planner Reporter: godfrey he Fix For: 1.9.0, 1.10.0 Currently, {{CatalogTableStatisticsConverter}} is in {{org.apache.flink.table.util}}; its correct package is {{org.apache.flink.table.planner.utils}}. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (FLINK-13501) Fixes a few issues in documentation for Hive integration
Xuefu Zhang created FLINK-13501: --- Summary: Fixes a few issues in documentation for Hive integration Key: FLINK-13501 URL: https://issues.apache.org/jira/browse/FLINK-13501 Project: Flink Issue Type: Bug Components: Connectors / Hive, Table SQL / API Affects Versions: 1.9.0 Reporter: Xuefu Zhang Fix For: 1.9.0, 1.10.0 Going through the existing Hive doc, I found the following issues that should be addressed: 1. Section "Hive Integration" should come after "SQL client" (at the same level). 2. In the Catalog section, there are headers named "Hive Catalog". Also, some information is duplicated with that in "Hive Integration". 3. "Data Type Mapping" is Hive specific and should probably move to "Hive integration". -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (FLINK-13499) Remove dependency on MapR artifact repository
Stephan Ewen created FLINK-13499: Summary: Remove dependency on MapR artifact repository Key: FLINK-13499 URL: https://issues.apache.org/jira/browse/FLINK-13499 Project: Flink Issue Type: Improvement Components: Build System Affects Versions: 1.9.0 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 1.9.0 The MapR artifact repository causes some problems. It does not reliably offer secure (https://) access. We should change the MapR FS connector to work based on reflection and avoid a hard dependency on any of the MapR vendor-specific artifacts. That should allow us to get rid of the dependency without regressing on the support for the file system. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (FLINK-13498) Reduce Kafka producer startup time by aborting transactions in parallel
Nico Kruber created FLINK-13498: --- Summary: Reduce Kafka producer startup time by aborting transactions in parallel Key: FLINK-13498 URL: https://issues.apache.org/jira/browse/FLINK-13498 Project: Flink Issue Type: Bug Components: Connectors / Kafka Affects Versions: 1.8.1, 1.9.0 Reporter: Nico Kruber Assignee: Nico Kruber When a Flink job with a Kafka producer starts up without previous state, it currently starts 5 * kafkaPoolSize number of Kafka producers (per sink instance) to abort potentially existing transactions from a first run without a completed snapshot. Apparently, this is quite slow and it is also done sequentially. Until there is a better way of aborting these transactions with Kafka, we could do this in parallel quite easily and at least make use of lingering CPU resources. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
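The parallelization the ticket suggests could be sketched as follows. This is not the Kafka connector's actual code: `abortTransaction` here is a stand-in for the per-producer abort round trip to the broker, and the point is only that the aborts fan out over an executor instead of running one after another:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelAbortSketch {
    // Stand-in for the per-transaction abort call; the sleep simulates the
    // round trip to the Kafka broker.
    static void abortTransaction(int transactionId, AtomicInteger aborted) {
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        aborted.incrementAndGet();
    }

    public static void main(String[] args) throws Exception {
        int producers = 5 * 4; // "5 * kafkaPoolSize" producers to probe, per the ticket
        AtomicInteger aborted = new AtomicInteger();

        ExecutorService executor = Executors.newFixedThreadPool(8);
        List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < producers; i++) {
            final int id = i;
            futures.add(executor.submit(() -> abortTransaction(id, aborted)));
        }
        for (Future<?> f : futures) {
            f.get(); // wait for all aborts instead of running them sequentially
        }
        executor.shutdown();
        System.out.println(aborted.get()); // 20
    }
}
```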
[jira] [Created] (FLINK-13497) Checkpoints can complete after CheckpointFailureManager fails job
Till Rohrmann created FLINK-13497: - Summary: Checkpoints can complete after CheckpointFailureManager fails job Key: FLINK-13497 URL: https://issues.apache.org/jira/browse/FLINK-13497 Project: Flink Issue Type: Bug Components: Runtime / Checkpointing Affects Versions: 1.9.0, 1.10.0 Reporter: Till Rohrmann I think that with FLINK-12364 we introduced an inconsistency wrt job termination and checkpointing. In FLINK-9900 it was discovered that checkpoints can complete even after the {{CheckpointFailureManager}} decided to fail a job. I think the expected behaviour should be that we fail all pending checkpoints once the {{CheckpointFailureManager}} decides to fail the job. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
Re: Reading RocksDB contents from outside of Flink
Hi! Are you looking for online access or offline access? For online access, you can do key lookups via queryable state. For offline access, you can read and write RocksDB state using the new State Processor API in Flink 1.9: https://ci.apache.org/projects/flink/flink-docs-master/dev/libs/state_processor_api.html Best, Stephan On Tue, Jul 30, 2019 at 2:39 PM taher koitawala wrote: > Hi Shilpa, >The easiest way to do this is to make the RocksDB state queryable. > Then use the Flink queryable state client to access the state you have > created. > > > Regards > Taher Koitawala > > On Tue, Jul 30, 2019, 4:58 PM Shilpa Deshpande > wrote: > > > Hello All, > > > > I am new to Apache Flink. In my company we are thinking of using Flink to > > perform transformation of the data. The source of the data is Apache > Kafka > > topics. Each message that we receive on Kafka topic, we want to transform > > it and store it on RocksDB. The messages can come out of order. We would > > like to store the transformed output in RocksDB using Flink and retrieve > it > > from a Spring Boot application. The reason for Spring Boot Application is > > outbound messaging involves some stitching as well as ordering of the > data. > > Is it possible and advisable to read data from RocksDB from Spring Boot > > Application? Spring Boot Application will not change the data. The reason > > we think Flink can help us is because we might be receiving millions of > > messages during the day so want to make sure use a technology that scales > > well. If you have a snippet of code to achieve this, please share it with > > us. > > > > Thank you for your inputs! > > > > Regards, > > Shilpa > > >
Re: [DISCUSS] Removing the flink-mapr-fs module
I will open a PR later today, changing the module to use reflection rather than a hard MapR dependency. On Tue, Jul 30, 2019 at 6:40 AM Rong Rong wrote: > We've also experienced some issues with our internal JFrog artifactory. I > am suspecting some sort of mirroring problem but somehow it only occur to > the mapr-fs module. > So +1 to remove. > > On Mon, Jul 29, 2019 at 12:47 PM Stephan Ewen wrote: > > > It should be fairly straightforward to rewrite the code to not have a > MapR > > dependency. > > Only one class from the MapR dependency is ever referenced, and we could > > dynamically load that class. > > > > That way we can drop the dependency without dropping the support. > > > > On Mon, Jul 29, 2019 at 5:33 PM Aljoscha Krettek > > wrote: > > > > > Just FYI, the MapRFileSystem does have some additional code on our > > > (Hadoop)FileSystem class, so it might not be straightforward to use > MapR > > > with our vanilla HadoopFileSystem. > > > > > > (Still staying that we should remove it, though). > > > > > > Aljoscha > > > > > > > On 29. Jul 2019, at 15:16, Simon Su wrote: > > > > > > > > +1 to remove it. > > > > > > > > > > > > Thanks, > > > > SImon > > > > > > > > > > > > On 07/29/2019 21:00,Till Rohrmann wrote: > > > > +1 to remove it. > > > > > > > > On Mon, Jul 29, 2019 at 1:27 PM Stephan Ewen > wrote: > > > > > > > > +1 to remove it > > > > > > > > One should still be able to use MapR in the same way as any other > > vendor > > > > Hadoop distribution. > > > > > > > > On Mon, Jul 29, 2019 at 12:22 PM JingsongLee > > > > wrote: > > > > > > > > +1 for removing it. We never run mvn clean test success in China with > > > > mapr-fs... > > > > Best, Jingsong Lee > > > > > > > > > > > > -- > > > > From:Biao Liu > > > > Send Time:2019年7月29日(星期一) 12:05 > > > > To:dev > > > > Subject:Re: [DISCUSS] Removing the flink-mapr-fs module > > > > > > > > +1 for removing it. > > > > > > > > Actually I encountered this issue several times. 
I thought it might > be > > > > blocked by firewall of China :( > > > > > > > > BTW, I think it should be included in release notes. > > > > > > > > > > > > On Mon, Jul 29, 2019 at 5:37 PM Aljoscha Krettek < > aljos...@apache.org> > > > > wrote: > > > > > > > > If we remove it, that would mean it’s not supported in Flink 1.9.0, > > > > yes. > > > > Or we only remove it in Flink 1.10.0. > > > > > > > > Aljoscha > > > > > > > > On 29. Jul 2019, at 11:35, Biao Liu wrote: > > > > > > > > Hi Aljoscha, > > > > > > > > Does it mean the MapRFileSystem is no longer supported since 1.9.0? > > > > > > > > On Mon, Jul 29, 2019 at 5:19 PM Ufuk Celebi wrote: > > > > > > > > +1 > > > > > > > > > > > > On Mon, Jul 29, 2019 at 11:06 AM Jeff Zhang > > > > wrote: > > > > > > > > +1 to remove it. > > > > > > > > Aljoscha Krettek 于2019年7月29日周一 下午5:01写道: > > > > > > > > Hi, > > > > > > > > Because of recent problems in the dependencies of that module [1] > > > > I > > > > would > > > > suggest that we remove it. If people are using it, they can use > > > > the > > > > one > > > > from Flink 1.8. > > > > > > > > What do you think about it? It would a) solve the dependency > > > > problem > > > > and > > > > b) make our build a tiny smidgen more lightweight. > > > > > > > > Aljoscha > > > > > > > > [1] > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://lists.apache.org/thread.html/16c16db8f4b94dc47e638f059fd53be936d5da423376a8c1092eaad1@%3Cdev.flink.apache.org%3E > > > > > > > > > > > > > > > > -- > > > > Best Regards > > > > > > > > Jeff Zhang > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
Re: Reading RocksDB contents from outside of Flink
Hi Shilpa, The easiest way to do this is to make the RocksDB state queryable. Then use the Flink queryable state client to access the state you have created. Regards Taher Koitawala On Tue, Jul 30, 2019, 4:58 PM Shilpa Deshpande wrote: > Hello All, > > I am new to Apache Flink. In my company we are thinking of using Flink to > perform transformation of the data. The source of the data is Apache Kafka > topics. Each message that we receive on Kafka topic, we want to transform > it and store it on RocksDB. The messages can come out of order. We would > like to store the transformed output in RocksDB using Flink and retrieve it > from a Spring Boot application. The reason for Spring Boot Application is > outbound messaging involves some stitching as well as ordering of the data. > Is it possible and advisable to read data from RocksDB from Spring Boot > Application? Spring Boot Application will not change the data. The reason > we think Flink can help us is because we might be receiving millions of > messages during the day so want to make sure use a technology that scales > well. If you have a snippet of code to achieve this, please share it with > us. > > Thank you for your inputs! > > Regards, > Shilpa >
Re: [DISCUSS] Drop stale class Program
Hi, After a one-week survey in the user list[1], nobody except Flavio and Jeff participated in the thread. Flavio shared his experience with a revised Program-like interface. This could be regarded as downstream integration, and in the client API enhancements document we propose a rich interface for this integration. Anyway, the Flink-scope Program is less functional than it should be. With no objection, I'd like to push on this thread. We need a committer to participate in this thread and shepherd the removal/deprecation of Program, a @PublicEvolving interface. Does anybody have spare time? Or is there anything I can do to make progress? Best, tison. [1] https://lists.apache.org/thread.html/37445e43729cf7eaeb0aa09133d3980b62f891c5ee69d2c3c3e76ab5@%3Cuser.flink.apache.org%3E Zili Chen 于2019年7月22日周一 下午8:38写道: > Hi, > > I created a thread for survey in user list[1]. Please take participate in > if interested. > > Best, > tison. > > [1] > https://lists.apache.org/thread.html/37445e43729cf7eaeb0aa09133d3980b62f891c5ee69d2c3c3e76ab5@%3Cuser.flink.apache.org%3E > > > Flavio Pompermaier 于2019年7月19日周五 下午5:18写道: > >> +1 to remove directly the Program class (I think nobody use it and it's >> not >> supported at all by REST services and UI). >> Moreover it requires a lot of transitive dependencies while it should be a >> very simple thing.. >> +1 to add this discussion to "Flink client api enhancement" >> >> On Fri, Jul 19, 2019 at 11:14 AM Biao Liu wrote: >> >> > To Flavio, good point for the integration suggestion. >> > >> > I think it should be considered in the "Flink client api enhancement" >> > discussion. But the outdated API should be deprecated somehow. >> > >> > Flavio Pompermaier 于2019年7月19日周五 下午4:21写道: >> > >> > > In my experience a basic "official" (but optional) program description >> > > would be very useful indeed (in order to ease the integration with >> other >> > > frameworks).
>> > > >> > > Of course it should be extended and integrated with the REST services >> and >> > > the Web UI (when defined) in order to be useful.. >> > > It ease to show to the user what a job does and which parameters it >> > > requires (optional or mandatory) and with a proper help description. >> > > Indeed, when we write a Flink job we implement the following >> interface: >> > > >> > > public interface FlinkJob { >> > > String getDescription(); >> > > List getParameters(); >> > > boolean isStreamingOrBatch(); >> > > } >> > > >> > > public class ClusterJobParameter { >> > > >> > > private String paramName; >> > > private String paramType = "string"; >> > > private String paramDesc; >> > > private String paramDefaultValue; >> > > private Set choices; >> > > private boolean mandatory; >> > > } >> > > >> > > This really helps to launch a Flink job by a frontend (if the rest >> > services >> > > returns back those infos). >> > > >> > > Best, >> > > Flavio >> > > >> > > On Fri, Jul 19, 2019 at 9:57 AM Biao Liu wrote: >> > > >> > > > Hi Zili, >> > > > >> > > > Thank you for bring us this discussion. >> > > > >> > > > My gut feeling is +1 for dropping it. >> > > > Usually it costs some time to deprecate a public (actually it's >> > > > `PublicEvolving`) API. Ideally it should be marked as `Deprecated` >> > first. >> > > > Then it might be abandoned it in some later version. >> > > > >> > > > I'm not sure how big the burden is to make it compatible with the >> > > enhanced >> > > > client API. If it's a critical blocker, I support dropping it >> radically >> > > in >> > > > 1.10. Of course a survey is necessary. And the result of survey is >> > > > acceptable. 
>> > > > >> > > > > Zili Chen 于2019年7月19日周五 下午1:44写道: >> > > > >> > > > > Hi devs, >> > > > > >> > > > > Participating in the thread "Flink client api enhancement"[1], I just >> > > notice >> > > > > that inside submission codepath of Flink we always have a branch >> > dealing >> > > > > with the case that main class of user program is a subclass of >> > > > > o.a.f.api.common.Program, which is defined as >> > > > > >> > > > > @PublicEvolving >> > > > > public interface Program { >> > > > > Plan getPlan(String... args); >> > > > > } >> > > > > >> > > > > This class, as user-facing interface, asks user to implement #getPlan >> > > > > which returns an almost Flink internal class. FLINK-10862[2] pointed >> > out >> > > > > this confusion. >> > > > > >> > > > > Since our codebase contains quite a bit code handling this stale >> > class, >> > > > > and also it obstructs the effort enhancing Flink client api, >> > > > > I'd like to propose dropping it. Or we can start a survey on user >> > list >> > > > > to see if there is any user depending on this class. >> > > > > >> > > > > best, >> > > > > tison. >> > > > > >> > > > > [1] >> > > > > >> > > > > >> > > > >> > > >> > >> https://lists.apache.org/thread.html/ce99cba4a10b9dc40eb729d39910f315ae41d80ec74f09a356c73938@%3Cdev.flink.apache.org%3E >> > > > > [2] https://issues.apache.org/jira/browse/FLINK-10862 >> > > > > >>
Reading RocksDB contents from outside of Flink
Hello All, I am new to Apache Flink. In my company we are thinking of using Flink to perform transformation of the data. The source of the data is Apache Kafka topics. Each message that we receive on Kafka topic, we want to transform it and store it on RocksDB. The messages can come out of order. We would like to store the transformed output in RocksDB using Flink and retrieve it from a Spring Boot application. The reason for Spring Boot Application is outbound messaging involves some stitching as well as ordering of the data. Is it possible and advisable to read data from RocksDB from Spring Boot Application? Spring Boot Application will not change the data. The reason we think Flink can help us is because we might be receiving millions of messages during the day so want to make sure use a technology that scales well. If you have a snippet of code to achieve this, please share it with us. Thank you for your inputs! Regards, Shilpa
Re: NoSuchMethodError: org.apache.calcite.tools.FrameworkConfig.getTraitDefs()
Hi Lakeshen, Thanks for trying out blink planner. First question, are you using blink-1.5.1 or flink-1.9-table-planner-blink ? We suggest to use the latter one because we don't maintain blink-1.5.1, you can try flink 1.9 instead. Best, Jark On Tue, 30 Jul 2019 at 17:02, LakeShen wrote: > Hi all,when I use blink flink-sql-parser module,the maven dependency like > this: > > > com.alibaba.blink > flink-sql-parser > 1.5.1 > > > I also import the flink 1.9 blink-table-planner module , I > use FlinkPlannerImpl to parse the sql to get the List. But > when I run the program , it throws the exception like this: > > > > *Exception in thread "main" java.lang.NoSuchMethodError: > > org.apache.calcite.tools.FrameworkConfig.getTraitDefs()Lorg/apache/flink/shaded/calcite/com/google/common/collect/ImmutableList; > at > > org.apache.flink.sql.parser.plan.FlinkPlannerImpl.(FlinkPlannerImpl.java:93) > at > > com.youzan.bigdata.allsqldemo.utils.FlinkSqlUtil.getSqlNodeInfos(FlinkSqlUtil.java:33) > at > > com.youzan.bigdata.allsqldemo.KafkaSrcKafkaSinkSqlDemo.main(KafkaSrcKafkaSinkSqlDemo.java:56)* > > * How can I solve this problem? Thanks to your reply.* >
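For reference, the swap Jark suggests would look roughly like this in the pom. The artifact coordinates below are my best understanding of the Flink 1.9 blink planner artifact (Scala 2.11 build); verify them against the actual 1.9 release before use:

```xml
<!-- replaces the unmaintained com.alibaba.blink:flink-sql-parser:1.5.1 -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-table-planner-blink_2.11</artifactId>
  <version>1.9.0</version>
</dependency>
```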
Re: [DISCUSS] Setup a bui...@flink.apache.org mailing list for travis builds
Hi all, Progress updates: 1. the bui...@flink.apache.org can be subscribed now (thanks @Robert), you can send an email to builds-subscr...@flink.apache.org to subscribe. 2. We have a pull request [1] to send only apache/flink builds notifications and it works well. 3. However, all the notifications are rejected by the builds mailing list (the MODERATE mails). I added & checked bui...@travis-ci.org to the subscriber/allow list, but still doesn't work. It might be recognized as spam by the mailing list. We are still trying to figure it out, and will update here if we have some progress. Thanks, Jark [1]: https://github.com/apache/flink/pull/9230 On Thu, 25 Jul 2019 at 22:59, Robert Metzger wrote: > The mailing list has been created, you can now subscribe to it. > > On Wed, Jul 24, 2019 at 1:43 PM Jark Wu wrote: > > > Thanks Robert for helping out that. > > > > Best, > > Jark > > > > On Wed, 24 Jul 2019 at 19:16, Robert Metzger > wrote: > > > > > I've requested the creation of the list, and made Jark, Chesnay and me > > > moderators of it. > > > > > > On Wed, Jul 24, 2019 at 1:12 PM Robert Metzger > > > wrote: > > > > > > > @Jark: Yes, I will request the creation of a mailing list! > > > > > > > > On Tue, Jul 23, 2019 at 4:48 PM Hugo Louro > wrote: > > > > > > > >> +1 > > > >> > > > >> > On Jul 23, 2019, at 6:15 AM, Till Rohrmann > > > >> wrote: > > > >> > > > > >> > Good idea Jark. +1 for the proposal. > > > >> > > > > >> > Cheers, > > > >> > Till > > > >> > > > > >> >> On Tue, Jul 23, 2019 at 1:59 PM Hequn Cheng < > chenghe...@gmail.com> > > > >> wrote: > > > >> >> > > > >> >> Hi Jark, > > > >> >> > > > >> >> Good idea. +1! > > > >> >> > > > >> >>> On Tue, Jul 23, 2019 at 6:23 PM Jark Wu > wrote: > > > >> >>> > > > >> >>> Thank you all for your positive feedback. > > > >> >>> > > > >> >>> We have three binding +1s, so I think, we can proceed with this. 
> > > >> >>> > > > >> >>> Hi @Robert Metzger , could you create a > > > >> request to > > > >> >>> INFRA for the mailing list? > > > >> >>> I'm not sure if this needs a PMC permission. > > > >> >>> > > > >> >>> Thanks, > > > >> >>> Jark > > > >> >>> > > > >> >>> On Tue, 23 Jul 2019 at 16:42, jincheng sun < > > > sunjincheng...@gmail.com> > > > >> >>> wrote: > > > >> >>> > > > >> +1 > > > >> > > > >> Robert Metzger 于2019年7月23日周二 下午4:01写道: > > > >> > > > >> > +1 > > > >> > > > > >> > On Mon, Jul 22, 2019 at 10:27 AM Biao Liu > > > > >> >> wrote: > > > >> > > > > >> >> +1, make sense to me. > > > >> >> Mailing list seems to be a more "community" way. > > > >> >> > > > >> >> Timo Walther 于2019年7月22日周一 下午4:06写道: > > > >> >> > > > >> >>> +1 sounds good to inform people about instabilities or other > > > >> >> issues > > > >> >>> > > > >> >>> Regards, > > > >> >>> Timo > > > >> >>> > > > >> >>> > > > >> Am 22.07.19 um 09:58 schrieb Haibo Sun: > > > >> +1. Sounds good.Letting the PR creators know the build > > results > > > >> >> of > > > >> the > > > >> >>> master branch can help to determine more quickly whether > > Travis > > > >> > failures > > > >> >> of > > > >> >>> their PR are an exact failure or because of the instability > of > > > >> >> test > > > >> > case. > > > >> >>> By the way, if the PR creator can abort their own Travis > > build, > > > I > > > >> think > > > >> >> it > > > >> >>> can improve the efficient use of Travis resources (of > course, > > > >> >> this > > > >> >>> discussion does not deal with this issue). > > > >> > > > >> > > > >> Best, > > > >> Haibo > > > >> At 2019-07-22 12:36:31, "Yun Tang" > wrote: > > > >> > +1 sounds good, more people could be encouraged to involve > > in > > > >> paying > > > >> >>> attention to failure commit. 
> > > >> > > > > >> > Best > > > >> > Yun Tang > > > >> > > > > >> > From: Becket Qin > > > >> > Sent: Monday, July 22, 2019 9:44 > > > >> > To: dev > > > >> > Subject: Re: [DISCUSS] Setup a bui...@flink.apache.org > > > >> >> mailing > > > >> list > > > >> >>> for travis builds > > > >> > > > > >> > +1. Sounds a good idea to me. > > > >> > > > > >> > On Sat, Jul 20, 2019 at 7:07 PM Dian Fu < > > > >> >> dian0511...@gmail.com> > > > >> >> wrote: > > > >> > > > > >> >> Thanks Jark for the proposal, sounds reasonable for me. > +1. > > > >> >>> This > > > >> ML > > > >> >>> could > > > >> >> be used for all the build notifications including master > > and > > > >> >>> CRON > > > >> >> jobs. > > > >> >> > > > >> >>> 在 2019年7月20日,下午2:55,Xu Forward > > 写道: > > > >> >>> > > > >> >>> +1 ,Thanks jark for the proposal. > > > >> >>> Best > > > >> >>> Forward >
[jira] [Created] (FLINK-13496) Correct the documentation of Gauge metric initialization
Yun Tang created FLINK-13496: Summary: Correct the documentation of Gauge metric initialization Key: FLINK-13496 URL: https://issues.apache.org/jira/browse/FLINK-13496 Project: Flink Issue Type: Bug Components: Documentation Reporter: Yun Tang Fix For: 1.9.0, 1.10 The current documentation of [Gauge|https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/metrics.html#gauge] declares the value to expose as {{transient}}. However, this leaves the initial value of {{valueToExpose}} within {{MyGauge}} null on the task manager side, which would actually mislead users who want to add a customized gauge. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
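The failure mode described here is plain Java serialization behaviour and can be shown without Flink. In this self-contained sketch (this `MyGauge` is a stand-in, not Flink's actual `Gauge` interface), a `transient` field initialized at construction time comes back as `null` after the object is serialized and deserialized, which is what happens when the gauge object is shipped to a task manager:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Stand-in for a user-defined gauge: the transient field is initialized
// here, but field initializers do not run again on deserialization.
class MyGauge implements Serializable {
    private transient String valueToExpose = "hello"; // lost on serialization

    String getValue() { return valueToExpose; }
}

public class TransientDemo {
    // Serialize and deserialize an object, simulating shipping it to a TM.
    static Object roundTrip(Object o) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(o);
        oos.flush();
        return new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray())).readObject();
    }

    public static void main(String[] args) throws Exception {
        MyGauge g = new MyGauge();
        System.out.println(g.getValue());                        // hello (client side)
        MyGauge shipped = (MyGauge) roundTrip(g);
        System.out.println(shipped.getValue());                  // null ("task manager" side)
    }
}
```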
[jira] [Created] (FLINK-13495) blink-planner should support decimal precision to table source
Jingsong Lee created FLINK-13495: Summary: blink-planner should support decimal precision to table source Key: FLINK-13495 URL: https://issues.apache.org/jira/browse/FLINK-13495 Project: Flink Issue Type: Sub-task Components: Table SQL / Planner Reporter: Jingsong Lee Fix For: 1.9.0, 1.10.0 Currently there is an exception when using DataTypes.DECIMAL(5, 2) with a table source under the blink planner. Some conversions between DataType and TypeInfo lose precision information. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (FLINK-13494) Blink planner changes source parallelism which causes stream SQL e2e test fails
Zhenghua Gao created FLINK-13494: Summary: Blink planner changes source parallelism which causes stream SQL e2e test fails Key: FLINK-13494 URL: https://issues.apache.org/jira/browse/FLINK-13494 Project: Flink Issue Type: Bug Reporter: Zhenghua Gao -- This message was sent by Atlassian JIRA (v7.6.14#76016)
Re: Something wrong with travis?
There is nothing to report; we already know what the problem is but it cannot be fixed. On 30/07/2019 08:46, Yun Tang wrote: I met this problem again at https://api.travis-ci.com/v3/job/220732163/log.txt . Is there any place we could ask for help to contact Travis, or any clues we could use to figure this out? Best Yun Tang From: Yun Tang Sent: Monday, June 24, 2019 14:22 To: dev@flink.apache.org ; Kurt Young Subject: Re: Something wrong with travis? Unfortunately, I met this problem again just now https://api.travis-ci.org/v3/job/549534496/log.txt (the build overview https://travis-ci.org/apache/flink/builds/549534489). For those non-committers, including me, we have to close-reopen the PR or push another commit to re-trigger the PR check Best Yun Tang From: Chesnay Schepler Sent: Wednesday, June 19, 2019 16:59 To: dev@flink.apache.org; Kurt Young Subject: Re: Something wrong with travis? Recent builds are passing again. On 18/06/2019 08:34, Kurt Young wrote: Hi dev, I noticed that all the travis tests triggered by pull request are failed with the same error: "Cached flink dir /home/travis/flink_cache/x/flink does not exist. Exiting build." Anyone have a clue on what happened and how to fix this? Best, Kurt
NoSuchMethodError: org.apache.calcite.tools.FrameworkConfig.getTraitDefs()
Hi all, when I use the blink flink-sql-parser module, the Maven dependency is like this:

<dependency>
  <groupId>com.alibaba.blink</groupId>
  <artifactId>flink-sql-parser</artifactId>
  <version>1.5.1</version>
</dependency>

I also import the flink 1.9 blink-table-planner module. I use FlinkPlannerImpl to parse the SQL to get the List. But when I run the program, it throws an exception like this:

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.calcite.tools.FrameworkConfig.getTraitDefs()Lorg/apache/flink/shaded/calcite/com/google/common/collect/ImmutableList; at org.apache.flink.sql.parser.plan.FlinkPlannerImpl.(FlinkPlannerImpl.java:93) at com.youzan.bigdata.allsqldemo.utils.FlinkSqlUtil.getSqlNodeInfos(FlinkSqlUtil.java:33) at com.youzan.bigdata.allsqldemo.KafkaSrcKafkaSinkSqlDemo.main(KafkaSrcKafkaSinkSqlDemo.java:56)

How can I solve this problem? Thanks for your reply.
[jira] [Created] (FLINK-13493) BoundedBlockingSubpartition only notifies onConsumedSubpartition when all the readers are empty
zhijiang created FLINK-13493: Summary: BoundedBlockingSubpartition only notifies onConsumedSubpartition when all the readers are empty Key: FLINK-13493 URL: https://issues.apache.org/jira/browse/FLINK-13493 Project: Flink Issue Type: Improvement Components: Runtime / Network Reporter: zhijiang Assignee: zhijiang In the previous implementation, the {{ResultPartition}} was always notified of a consumed subpartition whenever any reader view was released. Under a reference-counter release strategy this might cause problems if one blocking subpartition has multiple readers: the whole result partition might be released while there are still alive readers in some subpartitions. Although the default release strategy for blocking partitions is based on JM/scheduler notification atm, switching the option to consumption-based notification would cause problems. And from the subpartition side, it should have the right behavior no matter what the specific release strategy in the upper layer is. In order to fix this bug, the {{BoundedBlockingSubpartition}} now only notifies {{onConsumedSubpartition}} when all the readers are empty. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
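The fix described in the ticket can be modelled as a small reference-counting sketch. This is a toy model, not Flink's actual classes: the subpartition counts its live reader views and fires the "consumed" callback only when the last one is released, instead of on every release:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of the fixed behaviour: notify consumption only once, when
// the last reader view goes away. Names are illustrative, not Flink's.
class ToySubpartition {
    private final AtomicInteger liveReaders = new AtomicInteger();
    private final Runnable onConsumed; // stands in for ResultPartition#onConsumedSubpartition

    ToySubpartition(Runnable onConsumed) { this.onConsumed = onConsumed; }

    void createReaderView() { liveReaders.incrementAndGet(); }

    void releaseReaderView() {
        // The buggy behaviour would call onConsumed.run() unconditionally
        // here; the fix fires it only for the *last* reader.
        if (liveReaders.decrementAndGet() == 0) {
            onConsumed.run();
        }
    }
}

public class RefCountDemo {
    public static void main(String[] args) {
        AtomicInteger notifications = new AtomicInteger();
        ToySubpartition sub = new ToySubpartition(notifications::incrementAndGet);
        sub.createReaderView();
        sub.createReaderView();
        sub.releaseReaderView();
        System.out.println(notifications.get()); // 0: one reader is still alive
        sub.releaseReaderView();
        System.out.println(notifications.get()); // 1: last reader released
    }
}
```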
[jira] [Created] (FLINK-13492) BoundedOutOfOrderTimestamps cause Watermark's timestamp leak
Simon Su created FLINK-13492: Summary: BoundedOutOfOrderTimestamps cause Watermark's timestamp leak Key: FLINK-13492 URL: https://issues.apache.org/jira/browse/FLINK-13492 Project: Flink Issue Type: Bug Components: Table SQL / Runtime Affects Versions: 1.9.0 Reporter: Simon Su Attachments: Watermark_timestamp_leak.diff {code:java} StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironment(1, conf); // Use eventtime, default autoWatermarkInterval is 200ms env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime); StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env); Kafka kafka = new Kafka() .version("0.11") .topic(topic) .startFromLatest() .properties(properties); Schema schema = new Schema(); for (int i = 0; i < names.length; i++) { if ("timestamp".equalsIgnoreCase(names[i])) { // set latency to 1000ms schema.field("rowtime", types[i]).rowtime(new Rowtime().timestampsFromField("timestamp").watermarksPeriodicBounded(1000)); } else { schema.field(names[i], types[i]); } /** . */ tableEnv .connect(kafka) .withFormat(new Protobuf().protobufName("order_sink")) .withSchema(schema) .inAppendMode() .registerTableSource("orderStream");{code} Register up stream table, then use a 10s Tumble window on this table, we input a sequence of normal data, but there is not result output. Then we start to debug to see if the watermark is normally emitted, finally we found the issue. # maxTimestamp will be initialized in BoundedOutOfOrderTimestamps to Long.MIN_VALUE. # nextTimestamp method will extract timestamp from source and set to maxTimestamp. # getWatermark() method will calculate the watermark's timestamp based on maxTimestamp and delay. 
When TimestampsAndPeriodicWatermarksOperator is initialized and its open() method is called, it registers a timer with the SystemTimeService to generate watermarks every watermarkInterval. That is the problem: the timer thread fires and calls BoundedOutOfOrderTimestamps#getCurrentWatermark before any element has arrived, producing (Long.MIN_VALUE - delay), which wraps around the Long value range. All subsequent watermarks are then dropped because they are apparently less than (Long.MIN_VALUE - delay). A workaround is to set a large autoWatermarkInterval so that the SystemTimeService thread starts with a long delay.
{code:java}
public void onProcessingTime(long timestamp) throws Exception {
    ...
    getProcessingTimeService().registerTimer(now + watermarkInterval, this);
    ...
}
{code}
{code:java}
public ScheduledFuture registerTimer(long timestamp, ProcessingTimeCallback target) {
    ...
    long delay = Math.max(timestamp - getCurrentProcessingTime(), 0) + 1;
    ...
}
{code}
Actually, I think we can fix it by accounting for the delay in BoundedOutOfOrderTimestamps's constructor, which avoids the wrap-around in the calculation. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
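The wrap-around described above is easy to reproduce in isolation (a minimal sketch of the arithmetic only, independent of any Flink API):

```java
public class WatermarkUnderflowDemo {
    public static void main(String[] args) {
        // BoundedOutOfOrderTimestamps initializes maxTimestamp to Long.MIN_VALUE.
        long maxTimestamp = Long.MIN_VALUE;
        long delay = 1000L; // as in watermarksPeriodicBounded(1000)

        // First periodic watermark, computed before any element has arrived.
        // Long.MIN_VALUE - delay overflows and wraps to a huge positive value.
        long firstWatermark = maxTimestamp - delay;
        System.out.println(firstWatermark); // 9223372036854774808

        // Any later, realistic watermark (epoch millis) is smaller, so it is
        // considered non-advancing and dropped downstream.
        long laterWatermark = 1_564_000_000_000L - delay; // ~July 2019 epoch millis
        System.out.println(laterWatermark < firstWatermark); // true
    }
}
```

This is why no window ever fires: the very first watermark is already near Long.MAX_VALUE, and every real watermark after it fails the monotonicity check.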
[jira] [Created] (FLINK-13491) AsyncWaitOperator doesn't handle endOfInput call properly
Piotr Nowojski created FLINK-13491: -- Summary: AsyncWaitOperator doesn't handle endOfInput call properly Key: FLINK-13491 URL: https://issues.apache.org/jira/browse/FLINK-13491 Project: Flink Issue Type: Bug Components: API / DataStream Affects Versions: 1.9.0 Reporter: Piotr Nowojski Fix For: 1.9.0 This is the same issue as for {{ContinuousFileReaderOperator}} in https://issues.apache.org/jira/browse/FLINK-13376. {{AsyncWaitOperator}} will propagate the {{endInput}} notification immediately, even if it still has some records buffered. As a side note, this also shows that the current {{BoundedOneInput#endInput}} API is easy to mishandle if an operator buffers some records internally. Maybe we could redesign this API somehow [~aljoscha] [~sunhaibotb]? -- This message was sent by Atlassian JIRA (v7.6.14#76016)
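The hazard can be illustrated with a toy model (this is not the Flink API; the class and method names are illustrative): an operator that buffers records must drain its buffer in endInput() before the end-of-input notification propagates downstream, otherwise the buffered records are lost.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model of a buffering operator and the endInput hazard (not Flink code). */
public class EndInputDemo {
    static class BufferingOperator {
        final Deque<String> buffer = new ArrayDeque<>(); // e.g. pending async results
        final List<String> output = new ArrayList<>();

        void processElement(String record) {
            buffer.add(record); // result not emitted yet
        }

        /** Buggy: signals end-of-input while results are still buffered; they are lost. */
        void endInputBuggy() {
            // forwards endInput downstream immediately, buffer contents never emitted
        }

        /** Correct: flush all buffered results first, then signal end-of-input. */
        void endInput() {
            while (!buffer.isEmpty()) {
                output.add(buffer.poll());
            }
        }
    }

    public static void main(String[] args) {
        BufferingOperator op = new BufferingOperator();
        op.processElement("a");
        op.processElement("b");
        op.endInput();
        System.out.println(op.output); // [a, b]
    }
}
```

The API design question raised in the ticket is exactly this: nothing in the endInput contract forces an implementor to flush first, so every buffering operator must remember to do it by hand.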
Re: Something wrong with travis?
I ran into this problem again at https://api.travis-ci.com/v3/job/220732163/log.txt . Is there anywhere we could ask for help or contact Travis, or any clues we could use to figure this out? Best Yun Tang From: Yun Tang Sent: Monday, June 24, 2019 14:22 To: dev@flink.apache.org ; Kurt Young Subject: Re: Something wrong with travis? Unfortunately, I ran into this problem again just now: https://api.travis-ci.org/v3/job/549534496/log.txt (the build overview: https://travis-ci.org/apache/flink/builds/549534489). Non-committers, including me, have to close and reopen the PR or push another commit to re-trigger the PR check. Best Yun Tang From: Chesnay Schepler Sent: Wednesday, June 19, 2019 16:59 To: dev@flink.apache.org; Kurt Young Subject: Re: Something wrong with travis? Recent builds are passing again. On 18/06/2019 08:34, Kurt Young wrote: > Hi dev, > > I noticed that all the Travis tests triggered by pull requests fail > with the same error: > > "Cached flink dir /home/travis/flink_cache/x/flink does not exist. > Exiting build." > > Anyone have a clue on what happened and how to fix this? > > Best, > Kurt >
Re: [ANNOUNCE] Progress updates for Apache Flink 1.9.0 release
Hi Biao, Thanks for working on FLINK-9900. The ticket is already assigned to you now. Cheers, Gordon On Tue, Jul 30, 2019 at 2:31 PM Biao Liu wrote: > Hi Gordon, > > Thanks for updating the progress. > > Currently I'm working on FLINK-9900. I need a committer to assign the > ticket to me. > > Tzu-Li (Gordon) Tai wrote on Tue, Jul 30, 2019 at 13:01: > > Hi all, > > > > There are quite a few instabilities in our builds right now (master + > > release-1.9), some of which are directly or suspiciously related to the > 1.9 > > release. > > > > I'll categorize the instabilities into ones which we were already > tracking > > in the 1.9 Burndown Kanban board [1] prior to this email, and which ones > > seem to be new or were not monitored, so that we can draw additional > attention > > to them: > > > > *Instabilities that were already being tracked* > > > > - FLINK-13242: StandaloneResourceManagerTest.testStartupPeriod fails on > > Travis [2] > > A fix for this is coming with FLINK-13408 (Schedule > > StandaloneResourceManager.setFailUnfulfillableRequest whenever the > > leadership is acquired) [3] > > > > *Newly discovered instabilities that we should also start monitoring* > > > > - FLINK-13484: ConnectedComponents E2E fails with > > ResourceNotAvailableException [4] > > - FLINK-13487: > > TaskExecutorPartitionLifecycleTest.testPartitionReleaseAfterReleaseCall > > failed on Travis [5]. FLINK-13476 (Partitions not being properly released > > on cancel) could be the cause [6]. 
> > - FLINK-13488: flink-python fails to build on Travis due to Python 3.3 > > install failure [7] > > - FLINK-13489: Heavy deployment E2E fails quite consistently on Travis > with > > TM heartbeat timeout [8] > > - FLINK-9900: > > > ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles > > deadlocks [9] > > - FLINK-13377: Streaming SQL E2E fails on Travis with mismatching outputs > > (could just be that the SQL query tested on Travis is nondeterministic) > [10] > > > > Cheers, > > Gordon > > > > [1] > > > > > https://issues.apache.org/jira/secure/RapidBoard.jspa?projectKey=FLINK=328 > > > > [2] https://issues.apache.org/jira/browse/FLINK-13242 > > [3] https://issues.apache.org/jira/browse/FLINK-13408 > > [4] https://issues.apache.org/jira/browse/FLINK-13484 > > [5] https://issues.apache.org/jira/browse/FLINK-13487 > > [6] https://issues.apache.org/jira/browse/FLINK-13476 > > [7] https://issues.apache.org/jira/browse/FLINK-13488 > > [8] https://issues.apache.org/jira/browse/FLINK-13489 > > [9] https://issues.apache.org/jira/browse/FLINK-9900 > > [10] https://issues.apache.org/jira/browse/FLINK-13377 > > > > On Sun, Jul 28, 2019 at 6:14 AM zhijiang > .invalid> > > wrote: > > > > > Hi Gordon, > > > > > > Thanks for the updates on the current progress. > > > In addition, it might be better to also cover the fix of the network > resource > > > leak in JIRA ticket [1], which I think will be merged soon. > > > > > > [1] FLINK-13245: This fixes the leak of releasing reader/view with > > > partition in network stack. > > > > > > Best, > > > Zhijiang > > > -- > > > From:Tzu-Li (Gordon) Tai > > > Send Time: Sat, Jul 27, 2019 10:41 > > > To:dev > > > Subject:Re: [ANNOUNCE] Progress updates for Apache Flink 1.9.0 release > > > > > > Hi all, > > > > > > It's been a while since our last update for the release testing of > 1.9.0, > > > so I want to bring attention to the current status of the release. 
> > > > > > We are approaching RC1 soon, waiting on the following specific last > > ongoing > > > threads to be closed: > > > - FLINK-13241: This fixes a problem where when using YARN, slot > > allocation > > > requests may be ignored [1] > > > - FLINK-13371: Potential partitions resource leak in case of producer > > > restarts [2] > > > - FLINK-13350: Distinguish between temporary tables and persisted > tables > > > [3]. Strictly speaking this would be a new feature, but there was a > > > discussion here [4] to include a workaround for now in 1.9.0, and a > > proper > > > solution later on in 1.10.x. > > > - FLINK-12858: Potential distributed deadlock in case of synchronous > > > savepoint failure [5] > > > > > > The above is the critical path for moving forward with an RC1 for > > official > > > voting. > > > All of them have PRs already, and are currently being reviewed or close > > to > > > being merged. > > > > > > Cheers, > > > Gordon > > > > > > [1] https://issues.apache.org/jira/browse/FLINK-13241 > > > [2] https://issues.apache.org/jira/browse/FLINK-13371 > > > [3] https://issues.apache.org/jira/browse/FLINK-13350 > > > [4] > > > > > > > > > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Support-temporary-tables-in-SQL-API-td30831.html > > > [5] https://issues.apache.org/jira/browse/FLINK-12858 > > > > > > On Tue, Jul 16, 2019 at 5:26 AM Tzu-Li (Gordon) Tai < > tzuli...@apache.org > > > > > > wrote: > > > > > > > Update: RC0 for 1.9.0 has been created.
Re: [ANNOUNCE] Progress updates for Apache Flink 1.9.0 release
Hi Gordon, Thanks for updating the progress. Currently I'm working on FLINK-9900. I need a committer to assign the ticket to me. Tzu-Li (Gordon) Tai wrote on Tue, Jul 30, 2019 at 13:01: > Hi all, > > There are quite a few instabilities in our builds right now (master + > release-1.9), some of which are directly or suspiciously related to the 1.9 > release. > > I'll categorize the instabilities into ones which we were already tracking > in the 1.9 Burndown Kanban board [1] prior to this email, and which ones > seem to be new or were not monitored, so that we can draw additional attention > to them: > > *Instabilities that were already being tracked* > > - FLINK-13242: StandaloneResourceManagerTest.testStartupPeriod fails on > Travis [2] > A fix for this is coming with FLINK-13408 (Schedule > StandaloneResourceManager.setFailUnfulfillableRequest whenever the > leadership is acquired) [3] > > *Newly discovered instabilities that we should also start monitoring* > > - FLINK-13484: ConnectedComponents E2E fails with > ResourceNotAvailableException [4] > - FLINK-13487: > TaskExecutorPartitionLifecycleTest.testPartitionReleaseAfterReleaseCall > failed on Travis [5]. FLINK-13476 (Partitions not being properly released > on cancel) could be the cause [6]. 
> - FLINK-13488: flink-python fails to build on Travis due to Python 3.3 > install failure [7] > - FLINK-13489: Heavy deployment E2E fails quite consistently on Travis with > TM heartbeat timeout [8] > - FLINK-9900: > ZooKeeperHighAvailabilityITCase.testRestoreBehaviourWithFaultyStateHandles > deadlocks [9] > - FLINK-13377: Streaming SQL E2E fails on Travis with mismatching outputs > (could just be that the SQL query tested on Travis is nondeterministic) [10] > > Cheers, > Gordon > > [1] > > https://issues.apache.org/jira/secure/RapidBoard.jspa?projectKey=FLINK=328 > > [2] https://issues.apache.org/jira/browse/FLINK-13242 > [3] https://issues.apache.org/jira/browse/FLINK-13408 > [4] https://issues.apache.org/jira/browse/FLINK-13484 > [5] https://issues.apache.org/jira/browse/FLINK-13487 > [6] https://issues.apache.org/jira/browse/FLINK-13476 > [7] https://issues.apache.org/jira/browse/FLINK-13488 > [8] https://issues.apache.org/jira/browse/FLINK-13489 > [9] https://issues.apache.org/jira/browse/FLINK-9900 > [10] https://issues.apache.org/jira/browse/FLINK-13377 > > On Sun, Jul 28, 2019 at 6:14 AM zhijiang .invalid> > wrote: > > > Hi Gordon, > > > > Thanks for the updates on the current progress. > > In addition, it might be better to also cover the fix of the network resource > > leak in JIRA ticket [1], which I think will be merged soon. > > > > [1] FLINK-13245: This fixes the leak of releasing reader/view with > > partition in network stack. > > > > Best, > > Zhijiang > > -- > > From:Tzu-Li (Gordon) Tai > > Send Time: Sat, Jul 27, 2019 10:41 > > To:dev > > Subject:Re: [ANNOUNCE] Progress updates for Apache Flink 1.9.0 release > > > > Hi all, > > > > It's been a while since our last update for the release testing of 1.9.0, > > so I want to bring attention to the current status of the release. 
> > > > We are approaching RC1 soon, waiting on the following specific last > ongoing > > threads to be closed: > > - FLINK-13241: This fixes a problem where when using YARN, slot > allocation > > requests may be ignored [1] > > - FLINK-13371: Potential partitions resource leak in case of producer > > restarts [2] > > - FLINK-13350: Distinguish between temporary tables and persisted tables > > [3]. Strictly speaking this would be a new feature, but there was a > > discussion here [4] to include a workaround for now in 1.9.0, and a > proper > > solution later on in 1.10.x. > > - FLINK-12858: Potential distributed deadlock in case of synchronous > > savepoint failure [5] > > > > The above is the critical path for moving forward with an RC1 for > official > > voting. > > All of them have PRs already, and are currently being reviewed or close > to > > being merged. > > > > Cheers, > > Gordon > > > > [1] https://issues.apache.org/jira/browse/FLINK-13241 > > [2] https://issues.apache.org/jira/browse/FLINK-13371 > > [3] https://issues.apache.org/jira/browse/FLINK-13350 > > [4] > > > > > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Support-temporary-tables-in-SQL-API-td30831.html > > [5] https://issues.apache.org/jira/browse/FLINK-12858 > > > > On Tue, Jul 16, 2019 at 5:26 AM Tzu-Li (Gordon) Tai > > > wrote: > > > > > Update: RC0 for 1.9.0 has been created. Please see [1] for the preview > > > source / binary releases and Maven artifacts. > > > > > > Cheers, > > > Gordon > > > > > > [1] > > > > > > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/PREVIEW-Apache-Flink-1-9-0-release-candidate-0-td30583.html > > > > > > On Mon, Jul 15, 2019 at 6:39 PM Tzu-Li (Gordon) Tai < > tzuli...@apache.org > > > > > > wrote: > > > > > >> Hi Flink devs, > > >> > > >> As previously announced by Kurt [1],