[jira] [Created] (FLINK-10836) Add repositories and pluginRepositories in root pom.xml
bjkonglu created FLINK-10836: Summary: Add repositories and pluginRepositories in root pom.xml Key: FLINK-10836 URL: https://issues.apache.org/jira/browse/FLINK-10836 Project: Flink Issue Type: Improvement Components: Build System Affects Versions: 1.8.0 Reporter: bjkonglu h3. Background When developers build the Flink project with Maven, they may encounter dependency problems because Maven cannot download dependencies from the remote repository. h3. Analysis One way to resolve this problem is to add remote Maven repositories to the root pom.xml, for example:
{code:xml}
<repositories>
  <repository>
    <id>alimaven</id>
    <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
  </repository>
  <repository>
    <id>central</id>
    <name>Maven Repository</name>
    <url>https://repo.maven.apache.org/maven2</url>
    <releases><enabled>true</enabled></releases>
    <snapshots><enabled>false</enabled></snapshots>
  </repository>
</repositories>
<pluginRepositories>
  <pluginRepository>
    <id>alimaven</id>
    <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
    <releases><enabled>true</enabled></releases>
    <snapshots><enabled>true</enabled></snapshots>
  </pluginRepository>
  <pluginRepository>
    <id>central</id>
    <url>https://repo.maven.apache.org/maven2</url>
    <releases><enabled>true</enabled></releases>
    <snapshots><enabled>false</enabled></snapshots>
  </pluginRepository>
</pluginRepositories>
{code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: How to get the id of the job running in a slot
Hi lining, Yes, currently you can't get slot information via the "/taskmanagers/:taskmanagerid" REST API. Also, please ask questions on the user mailing list; the dev mailing list mainly discusses topics related to Flink development. Thanks, vino. lining jing wrote on Fri, Nov 9, 2018 at 5:42 AM: > Hi, dev. I have a question. The REST API can currently get TaskManagerInfo, but > cannot get information about the tasks that run in each slot. >
[jira] [Created] (FLINK-10835) Remove duplicated Round-robin ChannelSelector implementation in RecordWriterTest
zhijiang created FLINK-10835: Summary: Remove duplicated Round-robin ChannelSelector implementation in RecordWriterTest Key: FLINK-10835 URL: https://issues.apache.org/jira/browse/FLINK-10835 Project: Flink Issue Type: Sub-task Components: Network Affects Versions: 1.8.0 Reporter: zhijiang Assignee: zhijiang {{RoundRobinChannelSelector}} exists as the default selector in {{RecordWriter}}, mainly for tests. A similar {{RoundRobin}} implementation exists in {{RecordWriterTest}}, differing only in the starting channel index for round-robin selection. We can adjust the test verification logic to match the behavior of {{RoundRobinChannelSelector}} and then remove the duplicated {{RoundRobin}}. This will simplify the follow-up work in [FLINK-10622|https://issues.apache.org/jira/browse/FLINK-10662] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-10834) TableAPI flatten calculated value error
sunjincheng created FLINK-10834: --- Summary: TableAPI flatten calculated value error Key: FLINK-10834 URL: https://issues.apache.org/jira/browse/FLINK-10834 Project: Flink Issue Type: Bug Components: Table API & SQL Reporter: sunjincheng Fix For: 1.7.1 We have a UDF as follows:
{code:scala}
object FuncRow extends ScalarFunction {
  def eval(v: Int): Row = {
    val version = "" + new Random().nextInt()
    val row = new Row(3)
    row.setField(0, version)
    row.setField(1, version)
    row.setField(2, version)
    row
  }
  override def isDeterministic: Boolean = false
  override def getResultType(signature: Array[Class[_]]): TypeInformation[_] =
    Types.ROW(Types.STRING, Types.STRING, Types.STRING)
}
...
val data = new mutable.MutableList[(Int, Long, String)]
data.+=((1, 1L, "Hi"))
val ds = env.fromCollection(data).toTable(tEnv, 'a, 'b, 'c)
  .select(FuncRow('a).flatten()).as('v1, 'v2, 'v3)
...
{code}
The result is: -1189206469,-151367792,1988676906. The result expected by the user is v1 == v2 == v3. The real cause appears to be that the UDF result is not reused in the generated code (codegen). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
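The missing-reuse problem can be illustrated outside Flink with a plain Java sketch (all names here are hypothetical and do not correspond to Flink APIs; a counter stands in for the non-deterministic Random):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of why flattening must cache the UDF result: if the generated code
// re-evaluates a non-deterministic function once per flattened field, each
// field sees a different value; evaluating once and projecting the fields
// from the cached result restores v1 == v2 == v3.
public class FlattenReuseSketch {
    static final AtomicInteger COUNTER = new AtomicInteger();

    // Stand-in for a non-deterministic ScalarFunction eval (hypothetical).
    static int nonDeterministicEval() {
        return COUNTER.incrementAndGet();
    }

    // "Flatten" without reuse: the function is re-evaluated per field.
    static int[] flattenWithoutReuse() {
        return new int[] {
            nonDeterministicEval(), nonDeterministicEval(), nonDeterministicEval()
        };
    }

    // "Flatten" with reuse: evaluate once, then project the three fields.
    static int[] flattenWithReuse() {
        int row = nonDeterministicEval(); // evaluated exactly once
        return new int[] { row, row, row };
    }
}
```

With reuse, all three projected fields come from the same evaluation, which matches the behavior the reporter expects from `flatten()`.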
Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem
Hi Piotr, That seems to be good idea! Since the google doc for the design is currently under extensive review, I will leave it as it is for now. However, I'll convert it to two different FLIPs when the time comes. How does it sound to you? Thanks, Xuefu -- Sender:Piotr Nowojski Sent at:2018 Nov 9 (Fri) 02:31 Recipient:dev Cc:Bowen Li ; Xuefu ; Shuyi Chen Subject:Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem Hi, Maybe we should split this topic (and the design doc) into couple of smaller ones, hopefully independent. The questions that you have asked Fabian have for example very little to do with reading metadata from Hive Meta Store? Piotrek > On 7 Nov 2018, at 14:27, Fabian Hueske wrote: > > Hi Xuefu and all, > > Thanks for sharing this design document! > I'm very much in favor of restructuring / reworking the catalog handling in > Flink SQL as outlined in the document. > Most changes described in the design document seem to be rather general and > not specifically related to the Hive integration. > > IMO, there are some aspects, especially those at the boundary of Hive and > Flink, that need a bit more discussion. For example > > * What does it take to make Flink schema compatible with Hive schema? > * How will Flink tables (descriptors) be stored in HMS? > * How do both Hive catalogs differ? Could they be integrated into to a > single one? When to use which one? > * What meta information is provided by HMS? What of this can be leveraged > by Flink? > > Thank you, > Fabian > > Am Fr., 2. Nov. 2018 um 00:31 Uhr schrieb Bowen Li : > >> After taking a look at how other discussion threads work, I think it's >> actually fine just keep our discussion here. It's up to you, Xuefu. >> >> The google doc LGTM. I left some minor comments. 
>> >> On Thu, Nov 1, 2018 at 10:17 AM Bowen Li wrote: >> >>> Hi all, >>> >>> As Xuefu has published the design doc on google, I agree with Shuyi's >>> suggestion that we probably should start a new email thread like "[DISCUSS] >>> ... Hive integration design ..." on only dev mailing list for community >>> devs to review. The current thread sends to both dev and user list. >>> >>> This email thread is more like validating the general idea and direction >>> with the community, and it's been pretty long and crowded so far. Since >>> everyone is pro for the idea, we can move forward with another thread to >>> discuss and finalize the design. >>> >>> Thanks, >>> Bowen >>> >>> On Wed, Oct 31, 2018 at 12:16 PM Zhang, Xuefu >>> wrote: >>> Hi Shuiyi, Good idea. Actually the PDF was converted from a google doc. Here is its link: https://docs.google.com/document/d/1SkppRD_rE3uOKSN-LuZCqn4f7dz0zW5aa6T_hBZq5_o/edit?usp=sharing Once we reach an agreement, I can convert it to a FLIP. Thanks, Xuefu -- Sender:Shuyi Chen Sent at:2018 Nov 1 (Thu) 02:47 Recipient:Xuefu Cc:vino yang ; Fabian Hueske ; dev ; user Subject:Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem Hi Xuefu, Thanks a lot for driving this big effort. I would suggest convert your proposal and design doc into a google doc, and share it on the dev mailing list for the community to review and comment with title like "[DISCUSS] ... Hive integration design ..." . Once approved, we can document it as a FLIP (Flink Improvement Proposals), and use JIRAs to track the implementations. What do you think? Shuyi On Tue, Oct 30, 2018 at 11:32 AM Zhang, Xuefu wrote: Hi all, I have also shared a design doc on Hive metastore integration that is attached here and also to FLINK-10556[1]. Please kindly review and share your feedback. 
Thanks, Xuefu [1] https://issues.apache.org/jira/browse/FLINK-10556 -- Sender:Xuefu Sent at:2018 Oct 25 (Thu) 01:08 Recipient:Xuefu ; Shuyi Chen < suez1...@gmail.com> Cc:yanghua1127 ; Fabian Hueske ; dev ; user Subject:Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem Hi all, To wrap up the discussion, I have attached a PDF describing the proposal, which is also attached to FLINK-10556 [1]. Please feel free to watch that JIRA to track the progress. Please also let me know if you have additional comments or questions. Thanks, Xuefu [1] https://issues.apache.org/jira/browse/FLINK-10556 -- Sender:Xuefu Sent at:2018 Oct 16 (Tue) 03:40 Recipient:Shuyi Chen Cc:yanghua1127 ; Fabian Hueske ; dev ; user Subject:Re: [DI
[jira] [Created] (FLINK-10832) StreamExecutionEnvironment.execute() does not return when all sources end
Arnaud Linz created FLINK-10832: --- Summary: StreamExecutionEnvironment.execute() does not return when all sources end Key: FLINK-10832 URL: https://issues.apache.org/jira/browse/FLINK-10832 Project: Flink Issue Type: Bug Components: Core Affects Versions: 1.6.2, 1.5.5 Reporter: Arnaud Linz In 1.5.5, 1.6.1 and 1.6.2 (it works in 1.6.0 and 1.3.2), this code never ends:
{code:java}
public void testFlink() throws Exception {
    // get the execution environment
    final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // get input data
    final DataStreamSource<String> text = env.addSource(new SourceFunction<String>() {
        @Override
        public void run(final SourceContext<String> ctx) throws Exception {
            for (int count = 0; count < 5; count++) {
                ctx.collect(String.valueOf(count));
            }
        }

        @Override
        public void cancel() {
        }
    });
    text.print().setParallelism(1);
    env.execute("Simple Test");
    // Never ends!
}
{code}
This is critical for us, as we heavily rely on this "source exhaustion stop" mechanism to properly stop streaming applications from their own code, so it prevents us from using the latest Flink versions. The log extract shows that the local cluster tried to shut down, but could not do so for no apparent reason:
{code}
[2018-11-07 11:11:13,837] INFO Source: Custom Source -> Sink: Print to Std. Out (1/1) (07ae66bef91de06205cf22a337ea1fe2) switched from DEPLOYING to RUNNING. (org.apache.flink.runtime.executiongraph.ExecutionGraph:1316)
[2018-11-07 11:11:13,840] INFO No state backend has been configured, using default (Memory / JobManager) MemoryStateBackend (data in heap memory / checkpoints to JobManager) (checkpoints: 'null', savepoints: 'null', asynchronous: TRUE, maxStateSize: 5242880) (org.apache.flink.streaming.runtime.tasks.StreamTask:230)
0
1
2
3
4
[2018-11-07 11:11:13,897] INFO Source: Custom Source -> Sink: Print to Std. Out (1/1) (07ae66bef91de06205cf22a337ea1fe2) switched from RUNNING to FINISHED. (org.apache.flink.runtime.taskmanager.Task:915)
[2018-11-07 11:11:13,897] INFO Freeing task resources for Source: Custom Source -> Sink: Print to Std. Out (1/1) (07ae66bef91de06205cf22a337ea1fe2). (org.apache.flink.runtime.taskmanager.Task:818)
[2018-11-07 11:11:13,898] INFO Ensuring all FileSystem streams are closed for task Source: Custom Source -> Sink: Print to Std. Out (1/1) (07ae66bef91de06205cf22a337ea1fe2) [FINISHED] (org.apache.flink.runtime.taskmanager.Task:845)
[2018-11-07 11:11:13,899] INFO Un-registering task and sending final execution state FINISHED to JobManager for task Source: Custom Source -> Sink: Print to Std. Out 07ae66bef91de06205cf22a337ea1fe2. (org.apache.flink.runtime.taskexecutor.TaskExecutor:1337)
[2018-11-07 11:11:13,904] INFO Source: Custom Source -> Sink: Print to Std. Out (1/1) (07ae66bef91de06205cf22a337ea1fe2) switched from RUNNING to FINISHED. (org.apache.flink.runtime.executiongraph.ExecutionGraph:1316)
[2018-11-07 11:11:13,907] INFO Job Simple Test (0ef8697ca98f6d2b565ed928d17c8a49) switched from state RUNNING to FINISHED. (org.apache.flink.runtime.executiongraph.ExecutionGraph:1356)
[2018-11-07 11:11:13,908] INFO Stopping checkpoint coordinator for job 0ef8697ca98f6d2b565ed928d17c8a49. (org.apache.flink.runtime.checkpoint.CheckpointCoordinator:320)
[2018-11-07 11:11:13,908] INFO Shutting down (org.apache.flink.runtime.checkpoint.StandaloneCompletedCheckpointStore:102)
[2018-11-07 11:11:23,579] INFO Shutting down Flink Mini Cluster (org.apache.flink.runtime.minicluster.MiniCluster:427)
[2018-11-07 11:11:23,580] INFO Shutting down rest endpoint. (org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint:265)
[2018-11-07 11:11:23,582] INFO Stopping TaskExecutor akka://flink/user/taskmanager_0. (org.apache.flink.runtime.taskexecutor.TaskExecutor:291)
[2018-11-07 11:11:23,583] INFO Shutting down TaskExecutorLocalStateStoresManager. (org.apache.flink.runtime.state.TaskExecutorLocalStateStoresManager:213)
[2018-11-07 11:11:23,591] INFO I/O manager removed spill file directory C:\Users\alinz\AppData\Local\Temp\flink-io-6a19871b-3b86-4a47-9b82-28eef7e55814 (org.apache.flink.runtime.io.disk.iomanager.IOManager:110)
[2018-11-07 11:11:23,591] INFO Shutting down the network environment and its components. (org.apache.flink.runtime.io.network.NetworkEnvironment:344)
[2018-11-07 11:11:23,591] INFO Removing cache directory C:\Users\alinz\AppData\Local\Temp\flink-web-ui (org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint:733)
[2018-11-07 11:11:23,593] I
{code}
[jira] [Created] (FLINK-10829) Extend FlinkDistribution to support running jars
Chesnay Schepler created FLINK-10829: Summary: Extend FlinkDistribution to support running jars Key: FLINK-10829 URL: https://issues.apache.org/jira/browse/FLINK-10829 Project: Flink Issue Type: Improvement Components: Tests Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.8.0 To facilitate better adoption of the {{FlinkDistribution}} we should add support for running jars. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-10819) The instability problem of CI, JobManagerHAProcessFailureRecoveryITCase.testDispatcherProcessFailure test fail.
sunjincheng created FLINK-10819: --- Summary: The instability problem of CI, JobManagerHAProcessFailureRecoveryITCase.testDispatcherProcessFailure test fail. Key: FLINK-10819 URL: https://issues.apache.org/jira/browse/FLINK-10819 Project: Flink Issue Type: Test Components: Tests Reporter: sunjincheng Fix For: 1.7.1 The following error occurred during CI:
{code}
Results :
Tests in error:
  JobManagerHAProcessFailureRecoveryITCase.testDispatcherProcessFailure:331 » IllegalArgument
Tests run: 1463, Failures: 0, Errors: 1, Skipped: 29
18:40:55.828 [INFO]
18:40:55.829 [INFO] BUILD FAILURE
18:40:55.829 [INFO]
18:40:55.830 [INFO] Total time: 30:19 min
18:40:55.830 [INFO] Finished at: 2018-11-07T18:40:55+00:00
18:40:56.294 [INFO] Final Memory: 92M/678M
18:40:56.294 [INFO]
18:40:56.294 [WARNING] The requested profile "include-kinesis" could not be activated because it does not exist.
18:40:56.295 [ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test (integration-tests) on project flink-tests_2.11: There are test failures.
18:40:56.295 [ERROR]
18:40:56.295 [ERROR] Please refer to /home/travis/build/sunjincheng121/flink/flink-tests/target/surefire-reports for the individual test results.
18:40:56.295 [ERROR] -> [Help 1]
18:40:56.295 [ERROR]
18:40:56.295 [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
18:40:56.295 [ERROR] Re-run Maven using the -X switch to enable full debug logging.
18:40:56.295 [ERROR]
18:40:56.295 [ERROR] For more information about the errors and possible solutions, please read the following articles:
18:40:56.295 [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
MVN exited with EXIT CODE: 1.
Trying to KILL watchdog (11329).
./tools/travis_mvn_watchdog.sh: line 269: 11329 Terminated watchdog
PRODUCED build artifacts.
{code}
But after a rerun, the error disappeared. No specific cause has been found so far; we will continue to monitor it.
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
How to get the id of the job running in a slot
Hi, dev. I have a question. The REST API can currently get TaskManagerInfo, but cannot get information about the tasks that run in each slot.
[jira] [Created] (FLINK-10820) Simplify the RebalancePartitioner implementation
zhijiang created FLINK-10820: Summary: Simplify the RebalancePartitioner implementation Key: FLINK-10820 URL: https://issues.apache.org/jira/browse/FLINK-10820 Project: Flink Issue Type: Sub-task Components: Network Affects Versions: 1.8.0 Reporter: zhijiang Assignee: zhijiang The current {{RebalancePartitioner}} implementation seems a little hacky: it selects a random number as the first channel index and performs the following selections in round-robin fashion based on that random index. We can define a constant first channel index instead, to make the implementation simpler and more readable. Doing so does not change the rebalance semantics. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
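The proposed simplification can be sketched as follows (a minimal illustration with hypothetical names, not Flink's actual {{RebalancePartitioner}} code):

```java
// Sketch: round-robin channel selection starting from a fixed index instead
// of a random one. The class and method names are illustrative only.
public class SimpleRebalanceSelector {
    // Constant start: the first call to selectChannel returns channel 0.
    private int nextChannel = -1;

    // Returns the channel for the next record, cycling through all channels.
    public int selectChannel(int numberOfChannels) {
        nextChannel = (nextChannel + 1) % numberOfChannels;
        return nextChannel;
    }
}
```

Because every channel is still visited exactly once per cycle, records remain evenly distributed; only the (previously random) starting point becomes deterministic.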
[jira] [Created] (FLINK-10822) Configurable MetricQueryService interval
Chesnay Schepler created FLINK-10822: Summary: Configurable MetricQueryService interval Key: FLINK-10822 URL: https://issues.apache.org/jira/browse/FLINK-10822 Project: Flink Issue Type: Improvement Components: Metrics Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.8.0 The {{MetricQueryService}} is used for transmitting metrics from TaskManagers to the JobManager, in order to expose them via REST and, by extension, the WebUI. By default the JM polls metrics at most every 10 seconds. This has an adverse effect on the duration of our end-to-end tests, which for example query metrics via the REST API to determine whether the cluster has started. If no TM is available during the first poll, it takes another 10 seconds for updated information to become available. Making this interval configurable would reduce the test duration. Additionally, this could serve as a switch to disable the {{MetricQueryService}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
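The effect on test duration can be seen from a simple polling model (a hedged sketch with hypothetical names; it only formalizes the reasoning in the issue, not actual Flink code):

```java
// Sketch: if the first poll finds no TaskManager, the caller waits until the
// next poll before fresh metrics are visible, so the worst case from startup
// is roughly two full poll intervals. Shrinking the interval shrinks the wait.
public class PollIntervalModel {
    // Worst-case wait in milliseconds until updated metrics are visible,
    // assuming the first poll missed and polls are evenly spaced.
    public static long worstCaseWaitMillis(long pollIntervalMillis) {
        return 2 * pollIntervalMillis; // missed first poll + one full interval
    }
}
```

With the default 10 s interval, the model gives a worst case of 20 s before a test can observe the cluster's metrics; a configurable 1 s interval would cut that to about 2 s.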
[jira] [Created] (FLINK-10828) Enforce that all TypeSerializers are tested through SerializerTestBase
Stefan Richter created FLINK-10828: -- Summary: Enforce that all TypeSerializers are tested through SerializerTestBase Key: FLINK-10828 URL: https://issues.apache.org/jira/browse/FLINK-10828 Project: Flink Issue Type: Test Components: Tests Reporter: Stefan Richter As pointed out in FLINK-10827, type serializers are a common source of bugs and we should try to enforce that every type serializer (that is not exclusive to tests) is tested at least through a test that extends the {{SerializerTestBase}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-10826) Heavy deployment end-to-end test produces no output on Travis
Timo Walther created FLINK-10826: Summary: Heavy deployment end-to-end test produces no output on Travis Key: FLINK-10826 URL: https://issues.apache.org/jira/browse/FLINK-10826 Project: Flink Issue Type: Bug Components: E2E Tests Reporter: Timo Walther The Heavy deployment end-to-end test produces no output on Travis such that it is killed after 10 minutes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-10818) RestartStrategies.fixedDelayRestart Occur NoResourceAvailableException: Not enough free slots available to run the job.
ambition created FLINK-10818: Summary: RestartStrategies.fixedDelayRestart Occur NoResourceAvailableException: Not enough free slots available to run the job. Key: FLINK-10818 URL: https://issues.apache.org/jira/browse/FLINK-10818 Project: Flink Issue Type: Bug Components: Core Affects Versions: 1.6.2 Environment: JDK 1.8 Flink 1.6.0 Hadoop 2.7.3 Reporter: ambition A job in our online Flink-on-YARN environment sets its restart strategy like this:
{code:java}
exeEnv.setRestartStrategy(RestartStrategies.fixedDelayRestart(5, 1000L));
{code}
But after the job had been running for some days, the following exception occurred:
{code:java}
org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. Task to schedule: < Attempt #5 (Source: KafkaJsonTableSource -> Map -> where: (AND(OR(=(app_key, _UTF-16LE'C4FAF9CE1569F541'), =(app_key, _UTF-16LE'F5C7F68C7117630B'), =(app_key, _UTF-16LE'57C6FF4B5A064D29')), OR(=(LOWER(TRIM(FLAG(BOTH), _UTF-16LE' ', os_type)), _UTF-16LE'ios'), =(LOWER(TRIM(FLAG(BOTH), _UTF-16LE' ', os_type)), _UTF-16LE'android')), IS NOT NULL(server_id))), select: (MT_Date_Format_Mode(receive_time, _UTF-16LE'MMddHHmm', 10) AS date_p, LOWER(TRIM(FLAG(BOTH), _UTF-16LE' ', os_type)) AS os_type, MT_Date_Format_Mode(receive_time, _UTF-16LE'HHmm', 10) AS date_mm, server_id) (1/6)) @ (unassigned) - [SCHEDULED] > with groupID < cbc357ccb763df2852fee8c4fc7d55f2 > in sharing group < 690dbad267a8ff37c8cb5e9dbedd0a6d >. Resources available to scheduler: Number of instances=6, total number of slots=6, available slots=0
    at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.scheduleTask(Scheduler.java:281)
    at org.apache.flink.runtime.jobmanager.scheduler.Scheduler.allocateSlot(Scheduler.java:155)
    at org.apache.flink.runtime.executiongraph.Execution.lambda$allocateAndAssignSlotForExecution$2(Execution.java:491)
    at org.apache.flink.runtime.executiongraph.Execution$$Lambda$44/1664178385.apply(Unknown Source)
    at java.util.concurrent.CompletableFuture.uniComposeStage(CompletableFuture.java:981)
    at java.util.concurrent.CompletableFuture.thenCompose(CompletableFuture.java:2116)
    at org.apache.flink.runtime.executiongraph.Execution.allocateAndAssignSlotForExecution(Execution.java:489)
    at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.allocateResourcesForAll(ExecutionJobVertex.java:521)
    at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleEager(ExecutionGraph.java:945)
    at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleForExecution(ExecutionGraph.java:875)
    at org.apache.flink.runtime.executiongraph.ExecutionGraph.restart(ExecutionGraph.java:1262)
    at org.apache.flink.runtime.executiongraph.restart.ExecutionGraphRestartCallback.triggerFullRecovery(ExecutionGraphRestartCallback.java:59)
    at org.apache.flink.runtime.executiongraph.restart.FixedDelayRestartStrategy$1.run(FixedDelayRestartStrategy.java:68)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
{code}
This exception happened when the job started. Related issue: https://issues.apache.org/jira/browse/FLINK-4486 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-10823) Missing scala suffixes
Chesnay Schepler created FLINK-10823: Summary: Missing scala suffixes Key: FLINK-10823 URL: https://issues.apache.org/jira/browse/FLINK-10823 Project: Flink Issue Type: Bug Components: Batch Connectors and Input/Output Formats, Metrics Affects Versions: 1.7.0 Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.7.0 The JDBC connector and the JMX/Prometheus reporters have provided dependencies on scala-infected modules (streaming-java/runtime) and thus also require a scala suffix. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-10830) Consider making processing time provider pluggable
Andrey Zagrebin created FLINK-10830: --- Summary: Consider making processing time provider pluggable Key: FLINK-10830 URL: https://issues.apache.org/jira/browse/FLINK-10830 Project: Flink Issue Type: Improvement Reporter: Andrey Zagrebin At the moment, processing time is implemented in a fixed way as System.currentTimeMillis() and is not configurable by users. If this implementation does not fit an application's business logic for some reason, there is no way for users to change it. Examples: * The timestamp provided by currentTimeMillis is not guaranteed to be monotonically increasing. It can jump backwards for a while because of periodic synchronisation of the local clock with another, more accurate system. This can be a problem for application business logic if the general notion of time is that it always increases. * End-to-end tests are hard to implement because synchronisation between the time in the test and the time in Flink is out of our control. We could make this configurable and let users optionally set their own factory to create a processing time provider. All features that depend on querying the current processing time could use this implementation. The default would still be System.currentTimeMillis(). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
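One possible shape for such a pluggable provider is sketched below (all interface and method names are hypothetical; they are not existing Flink APIs):

```java
// Sketch of a pluggable processing-time provider. A user-supplied factory
// creates the provider; the default falls back to the system clock.
// Names are illustrative only, not Flink's actual time service.
interface ProcessingTimeProvider {
    long currentProcessingTime();
}

interface ProcessingTimeProviderFactory {
    ProcessingTimeProvider create();

    // Default behavior stays identical to today: System.currentTimeMillis().
    static ProcessingTimeProviderFactory systemDefault() {
        ProcessingTimeProvider system = System::currentTimeMillis;
        return () -> system;
    }
}
```

A test could then plug in a fixed or manually advanced clock, e.g. `ProcessingTimeProvider fixed = () -> 42L;`, making end-to-end tests deterministic.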
[jira] [Created] (FLINK-10831) Consider making processing time monotonically increasing by default
Andrey Zagrebin created FLINK-10831: --- Summary: Consider making processing time monotonically increasing by default Key: FLINK-10831 URL: https://issues.apache.org/jira/browse/FLINK-10831 Project: Flink Issue Type: Improvement Reporter: Andrey Zagrebin At the moment, the processing time is basically implemented in a fixed way as System.currentTimeMillis() and not configurable by users. The timestamp provided this way is not guaranteed to be monotonically increasing. It can jump back for a while because of possible periodic synchronisation of local clock with other more accurate system. It can be a problem for application business logic if we say that the general notion of time is that it always increases. We can change SystemProcessingTimeService to emit only timestamp which is not less than the latest emitted one, at least for current JVM process. This change in behaviour can be also configurable if somebody e.g. relies on rather accurate time. Other option is that if user needs monotonic processing time then custom processing time service should be provided as suggested in FLINK-10830. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
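The proposed behavior can be sketched as a thin monotonic wrapper over the system clock (the class name is illustrative, not Flink's actual SystemProcessingTimeService):

```java
// Sketch: never return a timestamp smaller than the previously returned one
// within this JVM process, even if the wall clock jumps backwards after an
// NTP-style synchronisation. Thread-safe via synchronized access.
public class MonotonicClock {
    private long lastEmitted = Long.MIN_VALUE;

    public synchronized long currentProcessingTime() {
        lastEmitted = Math.max(lastEmitted, System.currentTimeMillis());
        return lastEmitted;
    }
}
```

If the wall clock jumps back, the wrapper keeps emitting the last seen timestamp until real time catches up, so downstream logic that assumes non-decreasing time keeps working.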
Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem
Hi, Maybe we should split this topic (and the design doc) into couple of smaller ones, hopefully independent. The questions that you have asked Fabian have for example very little to do with reading metadata from Hive Meta Store? Piotrek > On 7 Nov 2018, at 14:27, Fabian Hueske wrote: > > Hi Xuefu and all, > > Thanks for sharing this design document! > I'm very much in favor of restructuring / reworking the catalog handling in > Flink SQL as outlined in the document. > Most changes described in the design document seem to be rather general and > not specifically related to the Hive integration. > > IMO, there are some aspects, especially those at the boundary of Hive and > Flink, that need a bit more discussion. For example > > * What does it take to make Flink schema compatible with Hive schema? > * How will Flink tables (descriptors) be stored in HMS? > * How do both Hive catalogs differ? Could they be integrated into to a > single one? When to use which one? > * What meta information is provided by HMS? What of this can be leveraged > by Flink? > > Thank you, > Fabian > > Am Fr., 2. Nov. 2018 um 00:31 Uhr schrieb Bowen Li : > >> After taking a look at how other discussion threads work, I think it's >> actually fine just keep our discussion here. It's up to you, Xuefu. >> >> The google doc LGTM. I left some minor comments. >> >> On Thu, Nov 1, 2018 at 10:17 AM Bowen Li wrote: >> >>> Hi all, >>> >>> As Xuefu has published the design doc on google, I agree with Shuyi's >>> suggestion that we probably should start a new email thread like "[DISCUSS] >>> ... Hive integration design ..." on only dev mailing list for community >>> devs to review. The current thread sends to both dev and user list. >>> >>> This email thread is more like validating the general idea and direction >>> with the community, and it's been pretty long and crowded so far. Since >>> everyone is pro for the idea, we can move forward with another thread to >>> discuss and finalize the design. 
>>> >>> Thanks, >>> Bowen >>> >>> On Wed, Oct 31, 2018 at 12:16 PM Zhang, Xuefu >>> wrote: >>> Hi Shuiyi, Good idea. Actually the PDF was converted from a google doc. Here is its link: https://docs.google.com/document/d/1SkppRD_rE3uOKSN-LuZCqn4f7dz0zW5aa6T_hBZq5_o/edit?usp=sharing Once we reach an agreement, I can convert it to a FLIP. Thanks, Xuefu -- Sender:Shuyi Chen Sent at:2018 Nov 1 (Thu) 02:47 Recipient:Xuefu Cc:vino yang ; Fabian Hueske ; dev ; user Subject:Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem Hi Xuefu, Thanks a lot for driving this big effort. I would suggest convert your proposal and design doc into a google doc, and share it on the dev mailing list for the community to review and comment with title like "[DISCUSS] ... Hive integration design ..." . Once approved, we can document it as a FLIP (Flink Improvement Proposals), and use JIRAs to track the implementations. What do you think? Shuyi On Tue, Oct 30, 2018 at 11:32 AM Zhang, Xuefu wrote: Hi all, I have also shared a design doc on Hive metastore integration that is attached here and also to FLINK-10556[1]. Please kindly review and share your feedback. Thanks, Xuefu [1] https://issues.apache.org/jira/browse/FLINK-10556 -- Sender:Xuefu Sent at:2018 Oct 25 (Thu) 01:08 Recipient:Xuefu ; Shuyi Chen < suez1...@gmail.com> Cc:yanghua1127 ; Fabian Hueske ; dev ; user Subject:Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem Hi all, To wrap up the discussion, I have attached a PDF describing the proposal, which is also attached to FLINK-10556 [1]. Please feel free to watch that JIRA to track the progress. Please also let me know if you have additional comments or questions. Thanks, Xuefu [1] https://issues.apache.org/jira/browse/FLINK-10556 -- Sender:Xuefu Sent at:2018 Oct 16 (Tue) 03:40 Recipient:Shuyi Chen Cc:yanghua1127 ; Fabian Hueske ; dev ; user Subject:Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem Hi Shuyi, Thank you for your input. 
Yes, I agreed with a phased approach and like to move forward fast. :) We did some work internally on DDL utilizing babel parser in Calcite. While babel makes Calcite's grammar extensible, at first impression it still seems too cumbersome for a project when too much extensions are made. It's even challenging to find where the extension is needed! It would be certa
Re: [DISCUSS] Table API Enhancement Outline
Hi all, We are discussing very detailed content about this proposal. We are trying to design the API in many aspects (functionality, compatibility, ease of use, etc.). I think this is a very good process; only with such a detailed discussion can we develop the PR more clearly and smoothly later. I am very grateful to @Fabian and @Xiaowei for sharing a lot of good ideas. About the definition of method signatures, I want to share the points I am discussing with Fabian in the google doc (not yet completed), as follows: Assume we have a table:

val tab = util.addTable[(Long, String)]("MyTable", 'long, 'string, 'proctime.proctime)

Approach 1:

case1: Map follows Source Table

val result = tab.map(udf('string)).as('proctime, 'col1, 'col2) // proctime implied in the output
  .window(Tumble over 5.millis on 'proctime as 'w)

case2: FlatAgg follows Window (Fabian mentioned above)

val result = tab.window(Tumble ... as 'w)
  .groupBy('w, 'k1, 'k2) // 'w should be a group key.
  .flatAgg(tabAgg('a)).as('k1, 'k2, 'w, 'col1, 'col2)
  .select('k1, 'col1, 'w.rowtime as 'rtime)

Approach 2: Similar to Fabian's approach, in which the result schema would be clearly defined, but add a built-in "append" UDF so that the map/flatMap/agg/flatAgg interfaces only accept one Expression.

val result = tab.map(append(udf('string), 'long, 'proctime)) as ('col1, 'col2, 'long, 'proctime)
  .window(Tumble over 5.millis on 'proctime as 'w)

Note: append is a special built-in UDF that can pass through any column. So, maybe we can define table.map(Expression) first, and extend it to table.map(Expression*) in the future if necessary? Of course, I also hope that we can further refine this proposal through discussion. Thanks, Jincheng Xiaowei Jiang wrote on Wed, Nov 7, 2018 at 11:45 PM: > Hi Fabian, > > I think that the key question you raised is if we allow extra parameters in > the methods map/flatMap/agg/flatAgg. I can see why allowing that may appear > more convenient in some cases. 
However, it might also cause some confusion > if we do that. For example, do we allow multiple UDFs in these expressions? > If we do, the semantics may be weird to define, e.g. what does > table.groupBy('k).flatAgg(TableAggA('a), TableAggB('b)) mean? Even though > not allowing it may appear less powerful, it can make things more > intuitive too. In the case of agg/flatAgg, we can define the keys to be > implied in the result table and to appear at the beginning. You can use a > select method if you want to modify this behavior. I think that eventually > we will have some API which allows other expressions as additional > parameters, but I think it's better to do that after we introduce the > concept of nested tables. A lot of things we suggested here can be > considered special cases of that. But things are much simpler if we > leave that to later. > > Regards, > Xiaowei > > On Wed, Nov 7, 2018 at 5:18 PM Fabian Hueske wrote: > > > Hi, > > > > * Re emit: > > I think we should start with a well-understood semantics of full > > replacement. This is how the other agg functions work. > > As was said before, there are open questions regarding an append mode > > (checkpointing, whether to support retractions and, if yes, how to > > declare them, ...). > > Since this seems to be an optimization, I'd postpone it. > > > > * Re grouping keys: > > I don't think we should automatically add them because the result schema > > would not be intuitive. > > Would they be added at the beginning of the tuple or at the end? Which > > window metadata fields would be added? In which order would they be > > added? > > > > However, we could support syntax like this: > > val t: Table = ??? > > t > > .window(Tumble ... as 'w) > > .groupBy('a, 'b) > > .flatAgg('b, 'a, myAgg(row('*)), 'w.end as 'wend, 'w.rowtime as 'rtime) > > > > The result schema would be clearly defined as [b, a, f1, f2, ..., fn, wend, > > rtime]. (f1, f2, ..., fn) are the result attributes of the UDF.
> > > > * Re Multi-staged evaluation: > > I think this should be an optimization that can be applied if the UDF > > implements the merge() method. > > > > Best, Fabian > > > > On Wed, Nov 7, 2018 at 08:01, Shaoxuan Wang < > > wshaox...@gmail.com > > > wrote: > > > > > Hi Xiaowei, > > > > > > Yes, I agree with you that the emit semantics of TableAggregateFunction are > > > much more complex than those of AggregateFunction. The fundamental difference is > > > that a TableAggregateFunction emits a "table" while an AggregateFunction > > outputs > > > (a column of) a "row". An AggregateFunction only has one > > > mode, which is "replacing" (complete update). But a > > > TableAggregateFunction could do an incremental update (only emit the newly > > updated > > > results) or a complete update (always emit the entire table when > > > "emit" is triggered). From the performance perspective, we might want to > > > use incremental updates. But we nee
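Shaoxuan's distinction between emit modes is easier to see with a concrete example. Below is a minimal sketch in plain Java (a hypothetical shape, not the actual TableAggregateFunction interface) of a "top 2" aggregate whose result is a small table of rows rather than a single value; "complete update" means re-emitting this whole table whenever emit is triggered, whereas an incremental mode would emit only the rows that changed:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical "top 2" table aggregate: keeps the two largest values seen so
// far. Its result is a (tiny) table, not a single row, which is what makes
// the emit semantics of TableAggregateFunction harder than AggregateFunction.
class Top2Accumulator {
    private long first = Long.MIN_VALUE;
    private long second = Long.MIN_VALUE;

    void accumulate(long value) {
        if (value > first) {
            second = first;
            first = value;
        } else if (value > second) {
            second = value;
        }
    }

    // "Complete update" emit mode: always re-emit the entire result table.
    // An incremental mode would instead emit only rows that changed since the
    // previous emit (plus retractions for rows that dropped out of the top 2).
    List<Long> emitComplete() {
        List<Long> out = new ArrayList<>();
        if (first != Long.MIN_VALUE) {
            out.add(first);
        }
        if (second != Long.MIN_VALUE) {
            out.add(second);
        }
        return out;
    }
}
```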
[jira] [Created] (FLINK-10827) Add test for duplicate() to SerializerTestBase
Stefan Richter created FLINK-10827: -- Summary: Add test for duplicate() to SerializerTestBase Key: FLINK-10827 URL: https://issues.apache.org/jira/browse/FLINK-10827 Project: Flink Issue Type: Test Components: Tests Reporter: Stefan Richter Assignee: Stefan Richter Fix For: 1.7.0 In the past, we had many bugs from type serializers that did not implement the {{duplicate()}} method properly. A very common error is forgetting to create a deep copy of some fields, which can lead to concurrency problems in the backend. We should add a test that uses duplicated serializers from different threads to expose such concurrency problems. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
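A sketch of what such a test could look like (hypothetical interface and names, not the actual SerializerTestBase or TypeSerializer API): give each thread its own duplicate() of the serializer and hammer them concurrently; a duplicate() that shares mutable buffers instead of deep-copying them will produce corrupted round-trips here:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a serializer with a duplicate() contract.
interface DuplicatingSerializer {
    DuplicatingSerializer duplicate(); // must deep-copy all mutable state
    byte[] serialize(String value);
}

class DuplicateConcurrencyCheck {

    // One duplicate per thread, used concurrently. If duplicate() is shallow,
    // a shared buffer gets interleaved writes and round-trips fail.
    static boolean runConcurrently(DuplicatingSerializer original, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                final String input = "record-" + i;
                final DuplicatingSerializer copy = original.duplicate();
                results.add(pool.submit(() -> {
                    for (int n = 0; n < 10_000; n++) {
                        if (!new String(copy.serialize(input)).equals(input)) {
                            return false; // corruption: duplicate() was not deep
                        }
                    }
                    return true;
                }));
            }
            boolean ok = true;
            for (Future<Boolean> f : results) {
                ok &= f.get();
            }
            return ok;
        } catch (Exception e) {
            return false;
        } finally {
            pool.shutdown();
        }
    }

    // A correct serializer: every duplicate gets its own reusable buffer.
    static DuplicatingSerializer newSafeSerializer() {
        return new DuplicatingSerializer() {
            private final StringBuilder buffer = new StringBuilder();

            public DuplicatingSerializer duplicate() {
                return newSafeSerializer(); // fresh buffer per duplicate
            }

            public byte[] serialize(String value) {
                buffer.setLength(0);
                buffer.append(value);
                return buffer.toString().getBytes();
            }
        };
    }
}
```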
Re: [DISCUSS] FLIP-27: Refactor Source Interface
Hi Piotrek, > But I don't see a reason why we should expose both blocking `take()` and non-blocking `poll()` methods to the Flink engine. Someone (Flink engine or connector) would have to do the same busy > looping anyway and I think it would be better to have a simpler connector API (that would solve our problems) and force connectors to comply one way or another. If we let the blocking happen inside the connector, it does not have to be a busy loop. For example, to block efficiently, the connector can use Java NIO's Selector.select(), which relies on an OS syscall such as epoll [1], instead of busy looping. But if the Flink engine blocks outside the connector, it pretty much has to busy-loop. So if there is only one API to get the element, a blocking getNextElement() makes more sense. In any case, we should avoid ambiguity: it has to be crystal clear whether a method is expected to be blocking or non-blocking. Otherwise it would be very difficult for the Flink engine to do the right thing with the connectors. At first glance at getCurrent(), the expected behavior is not quite clear. That said, I do agree that, functionality-wise, poll() and take() kind of overlap, and they are actually not that different from isBlocked()/getNextElement(). Compared with isBlocked(), the only difference is that poll() also returns the next record if it is available. But I agree that isBlocked() + getNextElement() is more flexible, as users can check record availability without fetching the next element. > In case of thread-less readers with only non-blocking `queue.poll()` (is that really a thing? I can not think about a real implementation that enforces such constraints) Right, it is pretty much syntactic sugar that lets the user combine the check-and-take into one method. It could be achieved with isBlocked() + getNextElement().
[1] http://man7.org/linux/man-pages/man7/epoll.7.html Thanks, Jiangjie (Becket) Qin On Wed, Nov 7, 2018 at 11:58 PM Piotr Nowojski wrote: > Hi Becket, > > With my proposal, both of your examples would have to be solved by the > connector, and the solution to both problems would be the same: > > Pretend that the connector is never blocked (`isBlocked() { return > NOT_BLOCKED; }`) and implement `getNextElement()` in a blocking fashion (or > semi-blocking, with return of control from time to time to allow > checkpointing, network flushing and other resource management things to > happen in the same main thread). In other words, exactly how you would > implement the `take()` method, or how the same source connector would be > implemented NOW with the current source interface. The only difference with the current > interface would be that the main loop would be outside of the connector, > and instead of periodically releasing the checkpointing lock, it would periodically > `return null;` or `return Optional.empty();` from `getNextElement()`. > > In the case of thread-less readers with only a non-blocking `queue.poll()` (is > that really a thing? I can not think about a real implementation that > enforces such constraints), we could provide a wrapper that hides the busy > looping. The same applies to forever-blocking readers: we could > provide another wrapper running the connector in a separate thread. > > But I don't see a reason why we should expose both the blocking `take()` and > the non-blocking `poll()` methods to the Flink engine. Someone (Flink engine or > connector) would have to do the same busy looping anyway, and I think it > would be better to have a simpler connector API (that would solve our > problems) and force connectors to comply one way or another. > > Piotrek > > > On 7 Nov 2018, at 10:55, Becket Qin wrote: > > > > Hi Piotr, > > > > I might have misunderstood your proposal. But let me try to explain my > > concern. I am thinking about the following case: > > 1.
a reader has the following two interfaces, > >boolean isBlocked() > >T getNextElement() > > 2. the implementation of getNextElement() is non-blocking. > > 3. The reader is thread-less, i.e. it does not have any internal thread. > For example, it might just delegate getNextElement() to a queue.poll(), > and isBlocked() is just queue.isEmpty(). > > > > How can Flink efficiently implement a blocking read behavior with this > > reader? Either a tight loop or a backoff interval is needed. Neither of > > them is ideal. > > > > Now let's say the reader mentioned above implements a blocking > > getNextElement() method. Because there is no internal thread in the reader, > > after isBlocked() returns false, Flink will still have to loop on > > isBlocked() to check whether the next record is available. If the next > > record arrives after 10 min, it is a tight loop for 10 min. You have > > probably noticed that in this case, even if isBlocked() returns a future, that > > future will not be completed if Flink does not call some method on the > > reader, because the reader has no internal thread to complete that fu
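To make the two shapes in this thread concrete, here is a hedged sketch (hypothetical interfaces, not the actual FLIP-27 proposal) of the isBlocked() + getNextElement() pair, plus the kind of wrapper Piotr mentions that derives a blocking take() from it, so the engine parks on the availability future instead of busy-looping:

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical reader shape from the discussion, not the actual FLIP-27 API.
interface NonBlockingReader<T> {
    // Completes once an element may be available. The contract matters: an
    // already-completed future while the reader is empty makes callers spin.
    CompletableFuture<Void> isBlocked();

    // Non-blocking: returns null when nothing is available right now.
    T getNextElement();
}

// Wrapper deriving a blocking take() from the non-blocking pair, so neither
// the engine nor the connector has to busy-loop.
class BlockingReaderAdapter<T> {
    private final NonBlockingReader<T> reader;

    BlockingReaderAdapter(NonBlockingReader<T> reader) {
        this.reader = reader;
    }

    T take() {
        while (true) {
            T next = reader.getNextElement();
            if (next != null) {
                return next;
            }
            reader.isBlocked().join(); // park on availability instead of spinning
        }
    }
}
```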
[jira] [Created] (FLINK-10821) Resuming Externalized Checkpoint E2E test does not Resume from Externalized Checkpoint
Gary Yao created FLINK-10821: Summary: Resuming Externalized Checkpoint E2E test does not Resume from Externalized Checkpoint Key: FLINK-10821 URL: https://issues.apache.org/jira/browse/FLINK-10821 Project: Flink Issue Type: Bug Components: E2E Tests Affects Versions: 1.7.0 Reporter: Gary Yao Fix For: 1.7.0 The path to the externalized checkpoint is not passed as the {{-s}} argument: https://github.com/apache/flink/blob/483507a65c7547347eaafb21a24967c470f94ed6/flink-end-to-end-tests/test-scripts/test_resume_externalized_checkpoints.sh#L128 That is, the test currently restarts the job without a checkpoint. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: Kinesis consumer e2e test
Hi, I was also just planning to work on it before Stephan contacted Thomas to ask about this test. Thomas, you are right about the structure; the test should also go into `run-nightly-tests.sh`. What I was planning to do is a simple job that consists of a Kinesis consumer, a mapper that fails once after n records, and a Kinesis producer. I was hoping that creation, filling, and validation of the Kinesis streams can be done with the Java API, not by invoking commands in a bash script. In general I would try to minimise the amount of scripting and do as much in Java as possible. It would also be nice if the test were generalised, e.g. so that abstract producers/consumers are created from a Supplier and the validation is also done over some abstraction that lets us iterate over the produced output. Ideally, this would be a test that we can reuse for all consumer/producer cases, and we could also port the Kafka tests to it. What do you think? Best, Stefan > On 8. Nov 2018, at 07:22, Tzu-Li (Gordon) Tai wrote: > > Hi Thomas, > > I think Stefan Richter is also working on the Kinesis end-to-end test, and > seems to be planning to implement it against a real Kinesis service instead > of Kinesalite. > Perhaps efforts should be synced here. > > Cheers, > Gordon > > > On Thu, Nov 8, 2018 at 1:38 PM Thomas Weise wrote: > >> Hi, >> >> I'm planning to add an end-to-end test for the Kinesis consumer. We have >> done something similar at Lyft, using Kinesalite, which can be run as a >> Docker container. >> >> I see that some tests already make use of Docker, so we can assume it to be >> present in the target environment(s)? >> >> I also found the following ticket: >> https://issues.apache.org/jira/browse/FLINK-9007 >> >> It suggests also covering the producer, which may be a good way to create >> the input data as well. The stream itself can be created with the Kinesis >> Java SDK.
>> >> Following the existing layout, there would be a new module >> flink-end-to-end-tests/flink-kinesis-test >> >> Are there any suggestions or comments regarding this? >> >> Thanks, >> Thomas >>
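The "mapper that fails once after n records" Stefan mentions is a common pattern for exercising failure recovery. A hedged sketch in plain Java (in the real test this would be a Flink MapFunction, and the fail-once flag would have to survive the task restart, e.g. via a static field or checkpointed state; the names here are made up):

```java
// Pass records through unchanged, but throw exactly once after `failAfter`
// records, so the surrounding job exercises its restart/recovery path.
class FailOnceMapper {
    private final int failAfter;
    private int seen = 0;
    private boolean failed = false; // in a real Flink job this must outlive the restart

    FailOnceMapper(int failAfter) {
        this.failAfter = failAfter;
    }

    String map(String record) {
        seen++;
        if (!failed && seen > failAfter) {
            failed = true;
            throw new RuntimeException("intentional failure after " + failAfter + " records");
        }
        return record;
    }
}
```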
Re: [VOTE] Release 1.7.0, release candidate #1
Hi Till, Today, while running CI before merging code, I found an unstable test case: JobManagerHAProcessFailureRecoveryITCase.testDispatcherProcessFailure failed. I'm not sure whether this issue should block the 1.7 release, but I think it's best to find the cause and fix it before releasing 1.7. The corresponding JIRA issue is: https://issues.apache.org/jira/browse/FLINK-10819 Thanks, Jincheng Till Rohrmann wrote on Wed, Nov 7, 2018 at 10:13 PM: > I hereby cancel the release vote because of the Scala suffix problems. I > will create the next RC in the next days. Until then, please continue > testing with the current release candidate. > > Cheers, > Till > > On Wed, Nov 7, 2018 at 2:39 PM Till Rohrmann wrote: > > > Thanks for spotting and addressing the Scala problem, Chesnay. The > > corresponding JIRA issue is > > https://issues.apache.org/jira/browse/FLINK-10811. > > > > Cheers, > > Till > > > > On Wed, Nov 7, 2018 at 12:36 PM Chesnay Schepler > > wrote: > > > >> This isn't quite correct (as test-scoped dependencies are not > >> transitive, but all compile dependencies still are, even for the > >> test-jar). > >> > >> But effectively this means we don't need additional rules for test-jars, > >> as compile dependencies already have to be taken care of separately from > >> tests anyway. > >> > >> I'll open a JIRA for the hcatalog issue and scan through the remaining > >> modules for other violations. > >> > >> On 07.11.2018 11:46, Aljoscha Krettek wrote: > >> > I looked into this issue and my conclusion was that test-jars don't > >> pull in transitive dependencies when you depend on them. I verified this > >> with an example maven project, where I also verified that a test-jar built > >> with Scala 2.12 works in a project that uses Scala 2.11. > >> > > >> > On the hcatalog connector: This is unfortunate and we should add the > >> Scala suffix here.
It's unfortunate since flink-hcatalog and > >> flink-hadoop-compatibility wouldn't have to have a Scala suffix; they don't > >> depend on any other suffixed dependencies, and the only reason is that they > >> themselves contain Scala code. This could have been avoided by putting the > >> Scala code in a separate module. > >> > > >> > Aljoscha > >> > > >> >> On 7. Nov 2018, at 10:55, Chesnay Schepler > wrote: > >> >> > >> >> What was the conclusion in regards to modules requiring a > scala-suffix > >> if their test-jar depends on scala-infected modules? (Which basically > >> affects all modules) > >> >> > >> >> Beyond that, the hcatalog connector has a dependency on > >> flink-hadoop-compatibility_2.12, and should thus also have a scala > suffix. > >> There are probably other instances as well. > >> >> > >> >> On 05.11.2018 22:26, Till Rohrmann wrote: > >> >>> Hi everyone, > >> >>> Please review and vote on the release candidate #1 for the version > >> 1.7.0, > >> >>> as follows: > >> >>> [ ] +1, Approve the release > >> >>> [ ] -1, Do not approve the release (please provide specific > comments) > >> >>> > >> >>> > >> >>> The complete staging area is available for your review, which > >> includes: > >> >>> * JIRA release notes [1], > >> >>> * the official Apache source release and binary convenience releases > >> to be > >> >>> deployed to dist.apache.org [2], which are signed with the key with > >> >>> fingerprint 1F302569A96CFFD5 [3], > >> >>> * all artifacts to be deployed to the Maven Central Repository [4], > >> >>> * source code tag "release-1.7.0-rc1" [5], > >> >>> > >> >>> Please use this document for coordinating testing efforts: [6] > >> >>> > >> >>> The vote will be open for at least 72 hours. It is adopted by majority > >> >>> approval, with at least 3 PMC affirmative votes.
> >> >>> > >> >>> Thanks, > >> >>> Till > >> >>> > >> >>> [1] > >> >>> > >> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12343585 > >> >>> [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.7.0/ > >> >>> [3] https://dist.apache.org/repos/dist/release/flink/KEYS > >> >>> [4] > >> https://repository.apache.org/content/repositories/orgapacheflink-1191 > >> >>> [5] https://github.com/apache/flink/tree/release-1.7.0-rc1 > >> >>> [6] > >> >>> > >> > https://docs.google.com/document/d/12JY_Xyy6umGR1vvrBFbqtDvf6ZdAYSAiljchrnsMUZs/edit?usp=sharing > >> >>> > >> >>> Pro-tip: you can create a settings.xml file with these contents:
> >> >>> <settings>
> >> >>>   <activeProfiles>
> >> >>>     <activeProfile>flink-1.7.0</activeProfile>
> >> >>>   </activeProfiles>
> >> >>>   <profiles>
> >> >>>     <profile>
> >> >>>       <id>flink-1.7.0</id>
> >> >>>       <repositories>
> >> >>>         <repository>
> >> >>>           <id>flink-1.7.0</id>
> >> >>>           <url>https://repository.apache.org/content/repositories/orgapacheflink-1191/</url>
> >> >>>         </repository>
> >> >>>         <repository>
> >> >>>           <id>archetype</id>
> >> >>>           <url>https://repository.apache.org/content/repositories/orgapacheflink-1191/</url>
> >> >>>         </repository>
> >> >>>       </repositories>
> >> >>>     </profile>
> >> >>>   </profiles>
> >> >>> </settings>
> >> >>>
> >> >>> And reference that in your maven commands via -
[jira] [Created] (FLINK-10824) Compile Flink unit test failed with 1.6.0 branch
Hongtao Zhang created FLINK-10824: - Summary: Compile Flink unit test failed with 1.6.0 branch Key: FLINK-10824 URL: https://issues.apache.org/jira/browse/FLINK-10824 Project: Flink Issue Type: Bug Components: Build System Affects Versions: 1.6.2 Environment: IDE: IntelliJ JDK Version: "1.8.0_111" Scala Version: 2.11 Reporter: Hongtao Zhang Steps to reproduce: # check out the flink 1.6.0 branch # compile the test file PageRankITCase.java in the flink-tests module, package org.apache.flink.test.example.java # the compiler reports that some packages do not exist # every unit test file that references the flink-examples package fails to compile
Information:java: Errors occurred while compiling module 'flink-tests_2.11'
Information:javac 1.8.0_111 was used to compile java sources
Information:2018/11/8 6:10 PM - Compilation completed with 17 errors and 0 warnings in 3 s 311 ms
/Users/hongtaozhang/workspace/flink/flink-tests/src/test/java/org/apache/flink/test/example/java/PageRankITCase.java
Error:Error:line (22)java: package org.apache.flink.examples.java.graph does not exist
Error:Error:line (23)java: package org.apache.flink.test.testdata does not exist
Error:Error:line (24)java: package org.apache.flink.test.util does not exist
Error:Error:line (25)java: package org.apache.flink.util does not exist
Error:Error:line (42)java: cannot find symbol; symbol: class MultipleProgramsTestBase
Error:Error:line (44)java: cannot find symbol; symbol: class TestExecutionMode; location: class org.apache.flink.test.example.java.PageRankITCase
Error:Error:line (63)java: cannot find symbol; symbol: variable PageRankData; location: class org.apache.flink.test.example.java.PageRankITCase
Error:Error:line (63)java: cannot find symbol; symbol: variable FileUtils; location: class org.apache.flink.test.example.java.PageRankITCase
Error:Error:line (66)java: cannot find symbol; symbol: variable PageRankData; location: class org.apache.flink.test.example.java.PageRankITCase
Error:Error:line (66)java: cannot find symbol; symbol: variable FileUtils; location: class org.apache.flink.test.example.java.PageRankITCase
Error:Error:line (74)java: cannot find symbol; symbol: method compareKeyValuePairsWithDelta(java.lang.String,java.lang.String,java.lang.String,double); location: class org.apache.flink.test.example.java.PageRankITCase
Error:Error:line (83)java: cannot find symbol; symbol: variable PageRankData; location: class org.apache.flink.test.example.java.PageRankITCase
Error:Error:line (79)java: cannot find symbol; symbol: variable PageRank; location: class org.apache.flink.test.example.java.PageRankITCase
Error:Error:line (85)java: cannot find symbol; symbol: variable PageRankData; location: class org.apache.flink.test.example.java.PageRankITCase
Error:Error:line (94)java: cannot find symbol; symbol: variable PageRankData; location: class org.apache.flink.test.example.java.PageRankITCase
Error:Error:line (90)java: cannot find symbol; symbol: variable PageRank; location: class org.apache.flink.test.example.java.PageRankITCase
Error:Error:line (96)java: cannot find symbol; symbol: variable PageRankData; location: class org.apache.flink.test.example.java.PageRankITCase
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-10825) ConnectedComponents test instable on Travis
Timo Walther created FLINK-10825: Summary: ConnectedComponents test instable on Travis Key: FLINK-10825 URL: https://issues.apache.org/jira/browse/FLINK-10825 Project: Flink Issue Type: Bug Components: E2E Tests Reporter: Timo Walther The "ConnectedComponents iterations with high parallelism end-to-end test" succeeds on Travis but the log contains the following exception:
{code}
2018-11-08 10:15:13,698 ERROR org.apache.flink.runtime.taskexecutor.rpc.RpcResultPartitionConsumableNotifier - Could not schedule or update consumers at the JobManager.
org.apache.flink.runtime.executiongraph.ExecutionGraphException: Cannot find execution for execution Id 5b02c2f51e51f68b66bfab07afc1bf17.
	at org.apache.flink.runtime.executiongraph.ExecutionGraph.scheduleOrUpdateConsumers(ExecutionGraph.java:1635)
	at org.apache.flink.runtime.jobmaster.JobMaster.scheduleOrUpdateConsumers(JobMaster.java:637)
	at sun.reflect.GeneratedMethodAccessor44.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:247)
	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:162)
	at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:70)
	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.onReceive(AkkaRpcActor.java:142)
	at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.onReceive(FencedAkkaRpcActor.java:40)
	at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:165)
	at akka.actor.Actor.aroundReceive(Actor.scala:502)
	at akka.actor.Actor.aroundReceive$(Actor.scala:500)
	at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:95)
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
	at akka.actor.ActorCell.invoke(ActorCell.scala:495)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
	at akka.dispatch.Mailbox.run(Mailbox.scala:224)
	at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
	at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
{code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: [DISCUSS]Rethink the rescale operation, can we do it async
Hi, in general, I think it is clearly a good idea to make rescaling as cheap and as dynamic as possible, and this is a goal that is surely on the long-term roadmap of Flink. However, from a technical point of view, things are not as simple once you go into details, and details are what the proposal lacks so far. For example, right now it is not yet possible to even modify the shape of an execution dynamically while the job is running (changing parallelism without restart), and the scheduling is not really aware of the position of keyed state partitions. The state repartitioning itself also has some tricky details: after repartitioning state in a consistent way, the job is still making progress, so how do the states of the new operators catch up with those changes, and all of that in a consistent way that does not violate exactly-once? We have a bunch of ideas on how to tackle those problems in different stages, towards a goal that might be similar to what you describe. For example, an intermediate step could be that you still need to briefly stop and restart the job, but we leverage local recovery to speed up the redeployment, and each operator is scheduled to an instance that is preloaded with the repartitioned state to continue, to minimise downtime. I think we would also solve it in a general way that does not have limitations like only being able to scale up and down by a factor of 2. So you can expect to see many steps towards this in the future, but I doubt that there is a quick fix by "just making it async". Best, Stefan > On 8. Nov 2018, at 03:13, shimin yang wrote: > > Currently, the rescale operation is to stop the whole job and restart it > with a different parallelism. But the rescale operation costs a lot and takes > a long time to recover if the state size is quite big. > > And a long rescale might cause other problems like latency increase > and back pressure.
For some circumstances, like a streaming computing cloud > service, users may be very sensitive to latency and resource usage. So it > would be better to make rescaling a cheaper operation. > > I wonder if we could make it an async operation, just like checkpointing. But > how to deal with the keyed state would be a pain in the ass. Currently I > just want to make some assumptions to keep things simple: the async rescale > operation can only double the parallelism or halve it. > > When scaling up, we can copy the state to the newly created > worker and change the partitioner of the upstream. The best timing might be > on notification that a checkpoint completed. But we still need to change the > partitioner of the upstream, so the upstream should buffer the results or block > the computation until the state copy has finished, and then make the partitioner > send different elements with the same key to the same downstream operator. > > In the scale-down scenario, we can merge the keyed state of two operators > and also change the partitioner of the upstream.
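For background on Stefan's remark that a factor-of-2 restriction should not be needed: Flink already hashes keys into a fixed number of key groups (the max parallelism) and assigns contiguous key-group ranges to subtasks, so rescaling to any parallelism only moves whole key groups. A simplified sketch of that assignment (the real KeyGroupRangeAssignment uses murmur hashing rather than a plain modulo):

```java
// Simplified model of Flink's key-group based state partitioning.
class KeyGroupAssignment {

    // Which key group a key falls into; fixed for the lifetime of the job.
    // Simplified hashing for illustration (breaks down for Integer.MIN_VALUE).
    static int keyGroupFor(Object key, int maxParallelism) {
        return Math.abs(key.hashCode() % maxParallelism);
    }

    // Which subtask owns a key group at the current parallelism. Changing the
    // parallelism only re-slices the key-group ranges across subtasks.
    static int operatorIndexFor(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }
}
```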
[jira] [Created] (FLINK-10833) FlinkKafkaProducerITCase.testFlinkKafkaProducerFailTransactionCoordinatorBeforeNotify failed on Travis
Andrey Zagrebin created FLINK-10833: --- Summary: FlinkKafkaProducerITCase.testFlinkKafkaProducerFailTransactionCoordinatorBeforeNotify failed on Travis Key: FLINK-10833 URL: https://issues.apache.org/jira/browse/FLINK-10833 Project: Flink Issue Type: Bug Components: Kafka Connector Affects Versions: 1.7.0 Reporter: Andrey Zagrebin Fix For: 1.7.0 FlinkKafkaProducerITCase.testFlinkKafkaProducerFailTransactionCoordinatorBeforeNotify failed on Travis https://travis-ci.org/apache/flink/jobs/452290475 https://api.travis-ci.org/v3/job/452290475/log.txt -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: StreamingFileSink Bug? Committing results on stream close
Hi Addison, unfortunately, there is a long-standing problem that user functions cannot differentiate between successful and erroneous shutdown [1]. I had this high on my private list of things that I finally want to see fixed in Flink 1.8. And your message further confirms this. Best, Aljoscha [1] https://issues.apache.org/jira/browse/FLINK-2646 > On 8. Nov 2018, at 13:39, Till Rohrmann wrote: > > Hi Addison, > > thanks for reporting this issue. I've pulled in Kostas, who worked on the > StreamingFileSink and knows the current behaviour as well as its > limitations best. > > Cheers, > Till > > On Wed, Nov 7, 2018 at 11:49 PM Addison Higham wrote: > >> Hi all, >> >> Just ran into a bit of a problem, and I am not sure what the behavior should >> be and whether this should be considered a bug, or if there is some other way >> this should be handled. >> >> I have a streaming job with a stream that eventually closes; this job sinks >> to a StreamingFileSink. >> The problem I am experiencing is that any data written to the sink between >> the last checkpoint and the close of the stream is lost. >> >> This happens (AFAICT) because the StreamingFileSink relies on checkpoints >> to commit files, and closing the stream currently does not try to commit >> anything. >> >> It seems like making close call >> `buckets.commitUpToCheckpoint(Long.MAX_VALUE)` would work pretty well, >> assuming it is an actual stream close, but could be problematic in the >> event of a savepoint/cancel and resuming later (it may only mean some >> files would be prematurely committed). Ideally, we would be able to >> differentiate between the two different types of close (an actual stream >> finishing vs. a cancel), but at the moment that doesn't seem supported. >> >> If this is considered a bug, please let me know and I will file a Jira; if >> not, what is the "correct" way to handle getting all the data out with >> sinks that rely on a checkpoint to commit data? >> >> Thanks >>
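The window Addison describes can be modeled in a few lines. This is a hedged sketch of the general commit-on-checkpoint pattern, not the actual StreamingFileSink internals: records written after the last completed checkpoint stay pending, and a close() that does not commit simply drops them:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal model of a sink that only publishes data on checkpoint completion.
class CheckpointCommitSink {
    private final List<String> pending = new ArrayList<>();
    final List<String> committed = new ArrayList<>();

    void write(String record) {
        pending.add(record);
    }

    // A checkpoint completed: publish everything written so far.
    void notifyCheckpointComplete() {
        committed.addAll(pending);
        pending.clear();
    }

    // Today's effective behaviour: anything not yet checkpointed is dropped.
    void closeAndDiscard() {
        pending.clear();
    }

    // The behaviour suggested for a genuine end-of-stream close: commit the rest.
    void closeAndCommit() {
        notifyCheckpointComplete();
    }
}
```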
Re: Kinesis consumer e2e test
Hi Thomas, the community is really interested in adding an end-to-end test for the Kinesis connector (producer as well as consumer). Thus, it would be really helpful if you could contribute the work you've already done. Using Kinesalite sounds good to me, and you're right that we assume Docker is available in our testing environment. The testing job would go into a separate module as you've suggested (flink-end-to-end-tests/flink-kinesis-test), and the entrypoint to the test would go into flink-end-to-end-tests/test-scripts/ plus flink-end-to-end-tests/run-nightly-tests.sh. I think Stefan stopped his work on the end-to-end test, but he had some ideas about reusing testing infrastructure for the Kafka and Kinesis tests (e.g. having a test base for similar connectors). This is something we can also address after the release if it would entail too much work. Cheers, Till On Thu, Nov 8, 2018 at 7:22 AM Tzu-Li (Gordon) Tai wrote: > Hi Thomas, > > I think Stefan Richter is also working on the Kinesis end-to-end test, and > seems to be planning to implement it against a real Kinesis service instead > of Kinesalite. > Perhaps efforts should be synced here. > > Cheers, > Gordon > > > On Thu, Nov 8, 2018 at 1:38 PM Thomas Weise wrote: > > > Hi, > > > > I'm planning to add an end-to-end test for the Kinesis consumer. We have > > done something similar at Lyft, using Kinesalite, which can be run as a > > Docker container. > > > > I see that some tests already make use of Docker, so we can assume it to > be > > present in the target environment(s)? > > > > I also found the following ticket: > > https://issues.apache.org/jira/browse/FLINK-9007 > > > > It suggests also covering the producer, which may be a good way to create > > the input data as well. The stream itself can be created with the Kinesis > > Java SDK.
> > > > Following the existing layout, there would be a new module > > flink-end-to-end-tests/flink-kinesis-test > > > > Are there any suggestions or comments regarding this? > > > > Thanks, > > Thomas > > >
Re: StreamingFileSink Bug? Committing results on stream close
Hi Addison, thanks for reporting this issue. I've pulled in Kostas, who worked on the StreamingFileSink and knows the current behaviour as well as its limitations best. Cheers, Till On Wed, Nov 7, 2018 at 11:49 PM Addison Higham wrote: > Hi all, > > Just ran into a bit of a problem, and I am not sure what the behavior should > be and whether this should be considered a bug, or if there is some other way > this should be handled. > > I have a streaming job with a stream that eventually closes; this job sinks > to a StreamingFileSink. > The problem I am experiencing is that any data written to the sink between > the last checkpoint and the close of the stream is lost. > > This happens (AFAICT) because the StreamingFileSink relies on checkpoints > to commit files, and closing the stream currently does not try to commit > anything. > > It seems like making close call > `buckets.commitUpToCheckpoint(Long.MAX_VALUE)` would work pretty well, > assuming it is an actual stream close, but could be problematic in the > event of a savepoint/cancel and resuming later (it may only mean some > files would be prematurely committed). Ideally, we would be able to > differentiate between the two different types of close (an actual stream > finishing vs. a cancel), but at the moment that doesn't seem supported. > > If this is considered a bug, please let me know and I will file a Jira; if > not, what is the "correct" way to handle getting all the data out with > sinks that rely on a checkpoint to commit data? > > Thanks >