Re: [DISCUSS] FLIP-179: Expose Standardized Operator Metrics
Hi folks,

To move on with the FLIP, I will cut eventTimeFetchLag out of scope and go ahead with the remainder. I will open a VOTE later today.

Best,

Arvid

On Wed, Jul 28, 2021 at 8:44 AM Arvid Heise wrote:
> Hi Becket,
>
> I have updated the PR according to your suggestion (note that this commit
> contains the removal of the previous approach) [1]. Here are my observations:
> 1. Adding the type of RecordMetadata to emitRecord would require adding
> another type parameter to RecordEmitter and SourceReaderBase, so I left
> that out for now as it would break things completely.
> 2. RecordEmitter implementations that want to pass the metadata to the
> SourceOutput need to be changed in a boilerplate fashion.
> 3. RecordMetadata as an interface (as in the commit) probably requires
> boilerplate implementations in using sources as well.
> 4. SourceOutput would also require an additional collect:
>
> default void collect(T record, RecordMetadata metadata) {
>     collect(record, TimestampAssigner.NO_TIMESTAMP, metadata);
> }
>
> 5. RecordMetadata is currently not simplifying any code. By the current
> design, RecordMetadata is a read-only data structure that is constant for
> all records in a batch. So in Kafka, we still need to pass Tuple3 because
> offset and timestamp are per record.
> 6. RecordMetadata is currently the same for all splits in
> RecordsWithSplitIds.
>
> Some ideas for the above points:
> 3. We should accompany it with a default implementation to avoid trivial
> POJO implementations such as the KafkaRecordMetadata of my commit. Can we
> skip the interface and just have RecordMetadata as a base class?
> 1., 2., 4. We could also set the metadata only once in an orthogonal method
> that needs to be called before collect, like SourceOutput#setRecordMetadata.
> Then we can implement it entirely in SourceReaderBase without changing any
> code.
> The clear downside is that it introduces some implicit state in
> SourceOutput (which we implement) and is harder to use in
> non-SourceReaderBase classes: source devs need to remember to call
> setRecordMetadata before collect for a respective record.
> 6. We might rename and change the semantics into
>
> public interface RecordsWithSplitIds {
>     /**
>      * Returns the record metadata. The metadata is shared for all
>      * records in the current split.
>      */
>     @Nullable
>     default RecordMetadata metadataOfCurrentSplit() {
>         return null;
>     }
>     ...
> }
>
> Re global variable
>
> > To explain a bit more on the metric being a global variable, I think in
> > general there are two ways to pass a value from one code block to another.
> > The first way is direct passing: the variable is explicitly passed from
> > one code block to another via arguments, be it in the constructor or in
> > methods. Another way is indirect passing through context: the information
> > is stored in some kind of context or environment, and everyone has access
> > to it. There is no explicit value passing from one code block to another
> > because everyone just reads from/writes to the context or environment.
> > This is basically the "global variable" pattern I am talking about.
> >
> > In general, people avoid having a mutable global value shared across
> > code blocks, because it is usually less deterministic and therefore more
> > difficult to understand or debug.
>
> Since the first approach was using a Gauge, it's a callback and not a
> global value. The actual value is passed when invoking the callback. It's
> the same as a supplier. However, the gauge itself is stored in the context,
> so your argument holds on that level.
>
> > Moreover, generally speaking, metrics in systems are usually perceived
> > as a reporting mechanism. People usually think of them as a way to expose
> > some internal values to an external system, and don't expect the program
> > itself to read the reported values again in the main logic, which is
> > essentially using the MetricGroup as a context to pass values across code
> > blocks, i.e. the "global variable" pattern. Instead, people would usually
> > use "direct passing" for this.
>
> Here I still don't see a difference from how we calculate the meter values
> from the byteIn/Out counters. We also need to read the counters
> periodically and calculate a secondary metric. So it can't be that
> unexpected to users.
>
> [1] https://github.com/apache/flink/commit/71212e6baf2906444987253d0cf13b5a5978a43b
>
> On Tue, Jul 27, 2021 at 3:19 AM Becket Qin wrote:
> > Hi Arvid,
> >
> > Thanks for the patient discussion.
> >
> > To explain a bit more on the metric being a global variable, I think in
> > general there are two ways to pass a value from one code block to another.
> > The first way is direct passing. That means the variable is explicitly
> > passed from one code block to another via arguments, be it in the
> > constructor or methods. Another way is indirect passing th
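The two API options debated above — passing metadata explicitly with every collect() call versus setting it once as implicit state in the output — can be contrasted in a minimal, self-contained sketch. This is not Flink's actual SourceOutput API; RecordMetadata, ExplicitOutput, and ImplicitOutput are hypothetical stand-ins used only to illustrate the trade-off:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch (NOT Flink's real API) contrasting the two metadata-passing
// designs discussed in the thread.
class MetadataPassing {
    // Stand-in for RecordMetadata: read-only and constant for a batch of records.
    record RecordMetadata(String splitId) {}

    // Variant from point 4: metadata passed explicitly with every record.
    interface ExplicitOutput<T> {
        void collect(T record, RecordMetadata metadata);
    }

    // Alternative from points 1./2./4.: metadata set once, then plain collect()
    // calls. Downside noted in the thread: implicit mutable state in the output,
    // and callers must remember to call setRecordMetadata first.
    static class ImplicitOutput<T> {
        private RecordMetadata current;
        private final List<String> emitted = new ArrayList<>();

        void setRecordMetadata(RecordMetadata metadata) {
            this.current = metadata;
        }

        void collect(T record) {
            if (current == null) {
                throw new IllegalStateException("setRecordMetadata must be called before collect");
            }
            emitted.add(record + "@" + current.splitId());
        }

        List<String> emitted() {
            return emitted;
        }
    }

    static List<String> demo() {
        ImplicitOutput<String> out = new ImplicitOutput<>();
        out.setRecordMetadata(new RecordMetadata("split-0"));
        out.collect("a");
        out.collect("b");
        return out.emitted();
    }
}
```

The sketch makes the "implicit state" objection concrete: forgetting the setRecordMetadata call fails only at runtime, whereas the explicit-argument variant is enforced by the compiler.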
[jira] [Created] (FLINK-23556) SQLClientSchemaRegistryITCase fails with " Subject ... not found"
Dawid Wysakowicz created FLINK-23556:

Summary: SQLClientSchemaRegistryITCase fails with "Subject ... not found"
Key: FLINK-23556
URL: https://issues.apache.org/jira/browse/FLINK-23556
Project: Flink
Issue Type: Bug
Components: Table SQL / Ecosystem
Affects Versions: 1.14.0
Reporter: Dawid Wysakowicz

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=21129&view=logs&j=91bf6583-3fb2-592f-e4d4-d79d79c3230a&t=cc5499f8-bdde-5157-0d76-b6528ecd808e&l=25337

{code}
Jul 28 23:37:48 [ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 209.44 s <<< FAILURE! - in org.apache.flink.tests.util.kafka.SQLClientSchemaRegistryITCase
Jul 28 23:37:48 [ERROR] testWriting(org.apache.flink.tests.util.kafka.SQLClientSchemaRegistryITCase) Time elapsed: 81.146 s <<< ERROR!
Jul 28 23:37:48 io.confluent.kafka.schemaregistry.client.rest.exceptions.RestClientException: Subject 'test-user-behavior-d18d4af2-3830-4620-9993-340c13f50cc2-value' not found.; error code: 40401
Jul 28 23:37:48     at io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:292)
Jul 28 23:37:48     at io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:352)
Jul 28 23:37:48     at io.confluent.kafka.schemaregistry.client.rest.RestService.getAllVersions(RestService.java:769)
Jul 28 23:37:48     at io.confluent.kafka.schemaregistry.client.rest.RestService.getAllVersions(RestService.java:760)
Jul 28 23:37:48     at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getAllVersions(CachedSchemaRegistryClient.java:364)
Jul 28 23:37:48     at org.apache.flink.tests.util.kafka.SQLClientSchemaRegistryITCase.getAllVersions(SQLClientSchemaRegistryITCase.java:230)
Jul 28 23:37:48     at org.apache.flink.tests.util.kafka.SQLClientSchemaRegistryITCase.testWriting(SQLClientSchemaRegistryITCase.java:195)
Jul 28 23:37:48     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
Jul 28 23:37:48     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
Jul 28 23:37:48     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
Jul 28 23:37:48     at java.lang.reflect.Method.invoke(Method.java:498)
Jul 28 23:37:48     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
Jul 28 23:37:48     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
Jul 28 23:37:48     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
Jul 28 23:37:48     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
Jul 28 23:37:48     at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
Jul 28 23:37:48     at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
Jul 28 23:37:48     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
Jul 28 23:37:48     at java.lang.Thread.run(Thread.java:748)
{code}

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23557) 'Resuming Externalized Checkpoint (hashmap, sync, no parallelism change) end-to-end test' fails on Azure
Dawid Wysakowicz created FLINK-23557:

Summary: 'Resuming Externalized Checkpoint (hashmap, sync, no parallelism change) end-to-end test' fails on Azure
Key: FLINK-23557
URL: https://issues.apache.org/jira/browse/FLINK-23557
Project: Flink
Issue Type: Bug
Components: Runtime / Coordination
Affects Versions: 1.14.0
Reporter: Dawid Wysakowicz

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=21129&view=logs&j=6caf31d6-847a-526e-9624-468e053467d6&t=1fdd9d50-31f7-5383-5578-49e27385b5f1&l=785

{code}
Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
    at org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$9(RestClusterClient.java:405)
    at java.base/java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:986)
    at java.base/java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:970)
    at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
    at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
    at org.apache.flink.util.concurrent.FutureUtils.lambda$retryOperationWithDelay$9(FutureUtils.java:373)
    at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
    at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
    at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
    at java.base/java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:610)
    at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1085)
    at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.flink.runtime.rest.util.RestClientException: [File upload failed.]
    at org.apache.flink.runtime.rest.RestClient.parseResponse(RestClient.java:486)
    at org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$3(RestClient.java:466)
    at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:1072)
{code}
[jira] [Created] (FLINK-23558) E2e tests fail because of quiesced system timers service
Dawid Wysakowicz created FLINK-23558:

Summary: E2e tests fail because of quiesced system timers service
Key: FLINK-23558
URL: https://issues.apache.org/jira/browse/FLINK-23558
Project: Flink
Issue Type: Bug
Components: Runtime / Task
Affects Versions: 1.14.0
Reporter: Dawid Wysakowicz

https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=21180&view=logs&j=739e6eac-8312-5d31-d437-294c4d26fced&t=2a8cc459-df7a-5e6f-12bf-96efcc369aa9&l=10484

{code}
Jul 29 21:41:15 Caused by: org.apache.flink.streaming.runtime.tasks.mailbox.TaskMailbox$MailboxClosedException: Mailbox is in state QUIESCED, but is required to be in state OPEN for put operations.
Jul 29 21:41:15     at org.apache.flink.streaming.runtime.tasks.mailbox.TaskMailboxImpl.checkPutStateConditions(TaskMailboxImpl.java:269) ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
Jul 29 21:41:15     at org.apache.flink.streaming.runtime.tasks.mailbox.TaskMailboxImpl.put(TaskMailboxImpl.java:197) ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
Jul 29 21:41:15     at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxExecutorImpl.execute(MailboxExecutorImpl.java:74) ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
Jul 29 21:41:15     at org.apache.flink.runtime.mailbox.MailboxExecutor.submit(MailboxExecutor.java:163) ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
Jul 29 21:41:15     at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$throughputCalculationSetup$3(StreamTask.java:688) ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
Jul 29 21:41:15     at org.apache.flink.streaming.runtime.tasks.SystemProcessingTimeService$ScheduledTask.run(SystemProcessingTimeService.java:317) ~[flink-dist_2.11-1.14-SNAPSHOT.jar:1.14-SNAPSHOT]
Jul 29 21:41:15     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_302]
Jul 29 21:41:15     at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_302]
Jul 29 21:41:15     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[?:1.8.0_302]
Jul 29 21:41:15     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[?:1.8.0_302]
Jul 29 21:41:15     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_302]
Jul 29 21:41:15     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_302]
{code}
Re: [VOTE] Release 1.12.5, release candidate #3
Thanks a lot for providing the new staging repository. I dropped the 1440 and 1441 staging repositories, to avoid other RC reviewers accidentally looking into them, or us accidentally releasing them.

+1 (binding)

Checks:
- I didn't find any additional issues in the release announcement
- the PGP signatures on the source archive seem fine
- source archive compilation starts successfully (rat check passes etc.)
- standalone mode, job submission, and CLI cancellation work; logs look fine
- Maven staging repository looks fine

On Fri, Jul 30, 2021 at 7:30 AM Jingsong Li wrote:
> Hi everyone,
>
> Thanks Robert, I created a new one.
>
> * all artifacts to be deployed to the Maven Central Repository [4],
>
> [4] https://repository.apache.org/content/repositories/orgapacheflink-1444/
>
> Best,
> Jingsong
>
> On Thu, Jul 29, 2021 at 9:50 PM Robert Metzger wrote:
> > The difference is that the 1440 staging repository contains the Scala _2.11
> > files, the 1441 repo contains Scala _2.12. I'm not sure if this works,
> > because things like "flink-core:1.11.5" will be released twice?
> > I would prefer to have a single staging repository containing all binaries
> > we intend to release to Maven Central, to avoid complications in the
> > release process.
> >
> > Since only the convenience binaries are affected by this, we don't need to
> > cancel the release. We just need to create a new staging repository.
> >
> > On Thu, Jul 29, 2021 at 3:36 PM Robert Metzger wrote:
> > > Thanks a lot for creating a release candidate!
> > >
> > > What is the difference between the two Maven staging repos?
> > > https://repository.apache.org/content/repositories/orgapacheflink-1440/
> > > and
> > > https://repository.apache.org/content/repositories/orgapacheflink-1441/ ?
> > > > > > On Thu, Jul 29, 2021 at 1:52 PM Xingbo Huang > wrote: > > > > > >> +1 (non-binding) > > >> > > >> - Verified checksums and signatures > > >> - Built from sources > > >> - Verified Python wheel package contents > > >> - Pip install Python wheel package in Mac > > >> - Run Python UDF job in Python REPL > > >> > > >> Best, > > >> Xingbo > > >> > > >> Zakelly Lan 于2021年7月29日周四 下午5:57写道: > > >> > > >> > +1 (non-binding) > > >> > > > >> > * Built from source. > > >> > * Run wordcount datastream job on yarn > > >> > * Web UI and checkpoint seem good. > > >> > * Kill a container to make job failover, everything is good. > > >> > * Try run job from checkpoint, everything is good. > > >> > > > >> > On Thu, Jul 29, 2021 at 2:34 PM Yun Tang wrote: > > >> > > > >> > > +1 (non-binding) > > >> > > > > >> > > Checked the signature. > > >> > > > > >> > > Reviewed the PR of flink-web. > > >> > > > > >> > > Download the pre-built tar package and launched an application > mode > > >> > > standalone job successfully. 
> > >> > > > > >> > > Best > > >> > > Yun Tang > > >> > > > > >> > > > > >> > > > > >> > > From: Jingsong Li > > >> > > Sent: Tuesday, July 27, 2021 11:54 > > >> > > To: dev > > >> > > Subject: [VOTE] Release 1.12.5, release candidate #3 > > >> > > > > >> > > Hi everyone, > > >> > > > > >> > > Please review and vote on the release candidate #3 for the version > > >> > 1.12.5, > > >> > > as follows: > > >> > > [ ] +1, Approve the release > > >> > > [ ] -1, Do not approve the release (please provide specific > > comments) > > >> > > > > >> > > The complete staging area is available for your review, which > > >> includes: > > >> > > * JIRA release notes [1], > > >> > > * the official Apache source release and binary convenience > releases > > >> to > > >> > be > > >> > > deployed to dist.apache.org [2], which are signed with the key > with > > >> > > fingerprint FBB83C0A4FFB9CA8 [3], > > >> > > * all artifacts to be deployed to the Maven Central Repository > [4], > > >> > > * source code tag "release-1.12.5-rc3" [5], > > >> > > * website pull request listing the new release and adding > > announcement > > >> > blog > > >> > > post [6]. > > >> > > > > >> > > The vote will be open for at least 72 hours. It is adopted by > > majority > > >> > > approval, with at least 3 PMC affirmative votes. 
> > >> > > > > >> > > Best, > > >> > > Jingsong Lee > > >> > > > > >> > > [1] > > >> > > > > >> > > > > >> > > > >> > > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350166 > > >> > > [2] > https://dist.apache.org/repos/dist/dev/flink/flink-1.12.5-rc3/ > > >> > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS > > >> > > [4] > > >> > > > > >> > https://repository.apache.org/content/repositories/orgapacheflink-1440/ > > >> > > > > >> > https://repository.apache.org/content/repositories/orgapacheflink-1441/ > > >> > > [5] > https://github.com/apache/flink/releases/tag/release-1.12.5-rc3 > > >> > > [6] https://github.com/apache/flink-web/pull/455 > > >> > > > > >> > > > >> > > > > > > > > -- > Best, Jingsong Lee >
[jira] [Created] (FLINK-23559) Enable periodic materialisation in tests
Piotr Nowojski created FLINK-23559:

Summary: Enable periodic materialisation in tests
Key: FLINK-23559
URL: https://issues.apache.org/jira/browse/FLINK-23559
Project: Flink
Issue Type: Sub-task
Components: Runtime / State Backends
Reporter: Roman Khachatryan
Assignee: Roman Khachatryan
Fix For: 1.14.0

FLINK-21448 adds the capability (test randomization), but it cannot be turned on yet because of some test failures: FLINK-23276, FLINK-23277, FLINK-23278. It should be enabled once those bugs are fixed.
Re: [DISCUSS] FLIP-179: Expose Standardized Operator Metrics
Hi Arvid,

I think it is OK to leave eventTimeFetchLag out of the scope of this FLIP, given that it may involve additional API changes.

> 5. RecordMetadata is currently not simplifying any code. By the current
> design, RecordMetadata is a read-only data structure that is constant for
> all records in a batch. So in Kafka, we still need to pass Tuple3 because
> offset and timestamp are per record.

Does this depend on whether we get the RecordMetadata per record or per batch? We can make the semantics of RecordsWithSplitIds.metadata() be the metadata associated with the last record returned by RecordsWithSplitIds.nextRecordsFromSplit(). In this case, individual implementations can decide whether to return different metadata for each record or not. In the case of Kafka, the Tuple3 can be replaced with three lists of records, timestamps, and offsets respectively. It probably saves some object instantiation, assuming the RecordMetadata object itself can be reused.

> 6. We might rename and change the semantics into
>
> public interface RecordsWithSplitIds {
>     /**
>      * Returns the record metadata. The metadata is shared for all
>      * records in the current split.
>      */
>     @Nullable
>     default RecordMetadata metadataOfCurrentSplit() {
>         return null;
>     }
>     ...
> }

Maybe we can move one step further and make it "metadataOfCurrentRecord()", as I mentioned above.

Thanks,

Jiangjie (Becket) Qin

On Fri, Jul 30, 2021 at 3:00 PM Arvid Heise wrote:
> Hi folks,
>
> To move on with the FLIP, I will cut eventTimeFetchLag out of scope and
> go ahead with the remainder.
>
> I will open a VOTE later today.
>
> Best,
>
> Arvid
>
> On Wed, Jul 28, 2021 at 8:44 AM Arvid Heise wrote:
> > Hi Becket,
> >
> > I have updated the PR according to your suggestion (note that this commit
> > contains the removal of the previous approach) [1]. Here are my observations:
> > 1. Adding the type of RecordMetadata to emitRecord would require adding
> > another type parameter to RecordEmitter and SourceReaderBase, so I left
> > that out for now as it would break things completely.
> > 2. RecordEmitter implementations that want to pass the metadata to the
> > SourceOutput need to be changed in a boilerplate fashion.
> > 3. RecordMetadata as an interface (as in the commit) probably requires
> > boilerplate implementations in using sources as well.
> > 4. SourceOutput would also require an additional collect:
> >
> > default void collect(T record, RecordMetadata metadata) {
> >     collect(record, TimestampAssigner.NO_TIMESTAMP, metadata);
> > }
> >
> > 5. RecordMetadata is currently not simplifying any code. By the current
> > design, RecordMetadata is a read-only data structure that is constant for
> > all records in a batch. So in Kafka, we still need to pass Tuple3 because
> > offset and timestamp are per record.
> > 6. RecordMetadata is currently the same for all splits in
> > RecordsWithSplitIds.
> >
> > Some ideas for the above points:
> > 3. We should accompany it with a default implementation to avoid trivial
> > POJO implementations such as the KafkaRecordMetadata of my commit. Can we
> > skip the interface and just have RecordMetadata as a base class?
> > 1., 2., 4. We could also set the metadata only once in an orthogonal method
> > that needs to be called before collect, like SourceOutput#setRecordMetadata.
> > Then we can implement it entirely in SourceReaderBase without changing any
> > code. The clear downside is that it introduces some implicit state in
> > SourceOutput (which we implement) and is harder to use in
> > non-SourceReaderBase classes: source devs need to remember to call
> > setRecordMetadata before collect for a respective record.
> > 6. We might rename and change the semantics into
> >
> > public interface RecordsWithSplitIds {
> >     /**
> >      * Returns the record metadata. The metadata is shared for all
> >      * records in the current split.
> >      */
> >     @Nullable
> >     default RecordMetadata metadataOfCurrentSplit() {
> >         return null;
> >     }
> >     ...
> > }
> >
> > Re global variable
> >
> > > To explain a bit more on the metric being a global variable, I think in
> > > general there are two ways to pass a value from one code block to another.
> > > The first way is direct passing. That means the variable is explicitly
> > > passed from one code block to another via arguments, be them in the
> > > constructor or methods. Another way is indirect passing through context,
> > > that means the information is stored in some kind of context or
> > > environment, and everyone can have access to it. And there is no explicit
> > > value passing from one code block to another because everyone just reads
> > > from/writes to the context or environment. This is basically the "global
> > > variable" pattern I am talking about.
> > >
> > > In general people would avoid having a mutable global value shared across
> > > code blocks, because it is usu
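Becket's per-record variant — metadata associated with the last record returned, backed by parallel lists instead of Tuple3s, with a reusable metadata object — could look roughly like the following self-contained sketch. This is not Flink's real RecordsWithSplitIds interface; Batch, nextRecord(), and metadataOfCurrentRecord() are hypothetical stand-ins:

```java
import java.util.List;

// Hypothetical sketch (NOT Flink's real interface) of the per-record proposal:
// metadataOfCurrentRecord() returns the metadata of the record most recently
// returned by nextRecord(), and the metadata object is reused across calls to
// avoid per-record allocation.
class PerRecordMetadata {
    static final class RecordMetadata {
        long offset;
        long timestamp;
    }

    static class Batch {
        private final List<String> records;
        private final List<Long> offsets;
        private final List<Long> timestamps;
        // Reused across records, as suggested in the thread.
        private final RecordMetadata reusable = new RecordMetadata();
        private int pos = -1;

        Batch(List<String> records, List<Long> offsets, List<Long> timestamps) {
            this.records = records;
            this.offsets = offsets;
            this.timestamps = timestamps;
        }

        // Advances to the next record, or returns null when the batch is exhausted.
        String nextRecord() {
            pos++;
            return pos < records.size() ? records.get(pos) : null;
        }

        // Metadata of the record last returned by nextRecord().
        RecordMetadata metadataOfCurrentRecord() {
            reusable.offset = offsets.get(pos);
            reusable.timestamp = timestamps.get(pos);
            return reusable;
        }
    }

    static long demo() {
        Batch b = new Batch(List.of("x", "y"), List.of(10L, 11L), List.of(100L, 101L));
        b.nextRecord(); // "x"
        b.nextRecord(); // "y"
        return b.metadataOfCurrentRecord().offset;
    }
}
```

The three parallel lists play the role of the Kafka Tuple3 replacement mentioned in the email: records, offsets, and timestamps are stored columnwise, and only one metadata object is ever instantiated per batch.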
[jira] [Created] (FLINK-23560) Performance regression on 29.07.2021
Piotr Nowojski created FLINK-23560:

Summary: Performance regression on 29.07.2021
Key: FLINK-23560
URL: https://issues.apache.org/jira/browse/FLINK-23560
Project: Flink
Issue Type: Bug
Components: Benchmarks
Affects Versions: 1.14.0
Reporter: Piotr Nowojski
Assignee: Piotr Nowojski
Fix For: 1.14.0

http://codespeed.dak8s.net:8000/timeline/?ben=remoteFilePartition&env=2
http://codespeed.dak8s.net:8000/timeline/?ben=uncompressedMmapPartition&env=2
http://codespeed.dak8s.net:8000/timeline/?ben=compressedFilePartition&env=2
http://codespeed.dak8s.net:8000/timeline/?ben=tupleKeyBy&env=2
http://codespeed.dak8s.net:8000/timeline/?ben=arrayKeyBy&env=2
http://codespeed.dak8s.net:8000/timeline/?ben=uncompressedFilePartition&env=2
http://codespeed.dak8s.net:8000/timeline/?ben=sortedTwoInput&env=2
http://codespeed.dak8s.net:8000/timeline/?ben=sortedMultiInput&env=2
http://codespeed.dak8s.net:8000/timeline/?ben=globalWindow&env=2

(And potentially other benchmarks)
Re: [DISCUSS] FLIP-180: Adjust StreamStatus and Idleness definition
Hi all,

I have a couple of questions after studying the FLIP and the docs:

1. What happens when one of the readers has two splits assigned and only one of the splits actually receives data?
2. If I understand it correctly, the Kinesis Source uses dynamic shard discovery by default (so in case of idleness, scenario 3 would happen there), and the FileSource also has dynamic assignment. The Kafka Source doesn't use dynamic partition discovery by default (so scenario 2 would be the default there). Why did we choose not to enable dynamic partition discovery by default, and should we actually change that?
3. To be sure: is it correct that scenario 2 is applicable in case of a dynamic assignment where there is temporarily no data?
4. Does WatermarkStrategy#withIdleness currently cover scenarios 2 and 3, and the one from my 3rd question?

Best regards,

Martijn

On Fri, 23 Jul 2021 at 15:57, Till Rohrmann wrote:
> Hi everyone,
>
> I would be in favour of what Arvid said about not exposing the
> WatermarkStatus to the Sink. Unless there is a very strong argument that
> this is required, I think that keeping this concept internal seems the
> better choice right now. Moreover, as Arvid said, the downstream application
> can derive the WatermarkStatus on its own depending on its business logic.
>
> Cheers,
> Till
>
> On Fri, Jul 23, 2021 at 2:15 PM Arvid Heise wrote:
> > Hi Eron,
> >
> > thank you very much for your feedback.
> >
> > > Please mention that the "temporary status toggle" code will be removed.
> >
> > This code is already removed, but there is still some automation of going
> > idle when temporarily no splits are assigned. I will include it in the FLIP.
> >
> > > I agree with adding the markActive() functionality, for symmetry. Speaking
> > > of symmetry, could we now include the minor enhancement we discussed in
> > > FLIP-167, the exposure of watermark status changes on the Sink interface?
> > > I drafted a PR and would be happy to revisit it.
> > >
> > > https://github.com/streamnative/flink/pull/2/files#diff-64d9c652ffc3c60b6d838200a24b106eeeda4b2d853deae94dbbdf16d8d694c2R62-R70
> >
> > I'm still not sure if that's a good idea.
> >
> > If we have now refined idleness to be a user-specified,
> > application-specific way to handle temporarily stalled partitions,
> > then downstream applications should actually implement their own idleness
> > definition. Let's see what other devs think. I'm pinging the ones that have
> > been most involved in the discussion: @Stephan Ewen, @Till Rohrmann,
> > @Dawid Wysakowicz.
> >
> > > The FLIP mentions a 'watermarkstatus' package for the WatermarkStatus
> > > class. Should it be the 'eventtime' package?
> >
> > Are you proposing org.apache.flink.api.common.eventtime? I was simply
> > suggesting to rename org.apache.flink.streaming.runtime.streamstatus, but
> > I'm very open to other suggestions (given that there are only 2 classes in
> > the package).
> >
> > > Regarding the change of 'streamStatus' to 'watermarkStatus', could you
> > > spell out what the new method names will be on each interface? May I
> > > suggest that Input.emitStreamStatus be Input.processStreamStatus? This is
> > > to help decouple the input's watermark status from the output's watermark
> > > status.
> >
> > I haven't found org.apache.flink.streaming.api.operators.Input#emitStreamStatus
> > in master. Could you double-check that I'm looking at the correct class?
> >
> > The current idea was mainly to grep+replace /streamStatus/watermarkStatus/
> > and /StreamStatus/WatermarkStatus/. But again, I'm very open to more
> > descriptive names. I can add an explicit list later. I'm assuming you are
> > only interested in (semi-)public classes.
> >
> > > I observe that AbstractStreamOperator is hardcoded to derive the output
> > > channel's status from the input channel's status. May I suggest we
> > > refactor "AbstractStreamOperator::emitStreamStatus(StreamStatus,int)" to
> > > allow the operator subclass to customize the processing of the
> > > aggregated watermark and watermark status.
> >
> > Can you add a motivation for that?
> > @Dawid Wysakowicz, I think you are the last person that touched the code.
> > Do you have some example operators in your head that would change it?
> >
> > > Maybe the FLIP should spell out the expected behavior of the generic
> > > watermark generator (TimestampsAndWatermarksOperator). Should the
> > > generator ignore the upstream idleness signal? I believe it propagates the
> > > signal, even though it also generates its own signals. Given that
> > > source-based and generic watermark generation shouldn't be combined, one
> > > could argue that the generic watermark generator should activate only when
> > > its input channel's watermark status is idle.
> >
> > I will add a section. In general, we assume that we only have so
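The idleness scenarios discussed above boil down to a simple mechanism: a channel is marked idle after a configurable timeout without records, so downstream watermark calculation can ignore it, and any new record marks it active again (the markActive() symmetry mentioned in the thread). A minimal, self-contained sketch of that mechanism — not Flink's implementation; IdleAwareOutput is a hypothetical stand-in:

```java
// Minimal sketch (NOT Flink's implementation) of the idleness mechanism:
// mark idle after a timeout without records, mark active on any new record.
class IdlenessSketch {
    static class IdleAwareOutput {
        private final long idleTimeoutMs;
        private long lastRecordTime;
        private boolean idle = false;

        IdleAwareOutput(long idleTimeoutMs, long now) {
            this.idleTimeoutMs = idleTimeoutMs;
            this.lastRecordTime = now;
        }

        // Any record makes the channel active again (markActive()).
        void onRecord(long now) {
            lastRecordTime = now;
            idle = false;
        }

        // Periodic check: after the timeout without records, mark idle
        // (markIdle()) so downstream watermarks can advance without us.
        void onPeriodicCheck(long now) {
            if (now - lastRecordTime >= idleTimeoutMs) {
                idle = true;
            }
        }

        boolean isIdle() {
            return idle;
        }
    }

    static boolean demo() {
        IdleAwareOutput out = new IdleAwareOutput(1000, 0);
        out.onRecord(100);
        out.onPeriodicCheck(500);   // 400 ms since last record: still active
        boolean activeAt500 = !out.isIdle();
        out.onPeriodicCheck(1500);  // 1400 ms since last record: now idle
        return activeAt500 && out.isIdle();
    }
}
```

This is essentially what a WatermarkStrategy#withIdleness(Duration) configuration expresses from the user's side: the timeout is the user-specified, application-specific idleness definition that the thread argues should stay with the application rather than be exposed to sinks.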
[VOTE] FLIP-179: Expose Standardized Operator Metrics
Dear devs,

I'd like to open a vote on FLIP-179: Expose Standardized Operator Metrics [1], which was discussed in this thread [2]. The vote will be open for at least 72 hours unless there is an objection or not enough votes.

The proposal excludes the implementation for the currentFetchEventTimeLag metric, which caused a bit of discussion without a clear convergence. We will implement that metric in a generic way at a later point and encourage sources to implement it themselves in the meantime.

Best,

Arvid

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-179%3A+Expose+Standardized+Operator+Metrics
[2] https://lists.apache.org/thread.html/r856920cbfe6a262b521109c5bdb9e904e00a9b3f1825901759c24d85%40%3Cdev.flink.apache.org%3E
Re: [DISCUSS] FLIP-179: Expose Standardized Operator Metrics
Hi everyone,

I started the voting thread [1]. Please cast your vote there or ask additional questions here.

Best,

Arvid

[1] https://lists.apache.org/thread.html/r70d321b6aa62ab4e31c8b73552b2de7846c4d31ed6f08d6541a9b36e%40%3Cdev.flink.apache.org%3E

On Fri, Jul 30, 2021 at 10:46 AM Becket Qin wrote:
> Hi Arvid,
>
> I think it is OK to leave eventTimeFetchLag out of the scope of this FLIP
> given that it may involve additional API changes.
>
> > 5. RecordMetadata is currently not simplifying any code. By the current
> > design, RecordMetadata is a read-only data structure that is constant for
> > all records in a batch. So in Kafka, we still need to pass Tuple3 because
> > offset and timestamp are per record.
>
> Does this depend on whether we get the RecordMetadata per record or
> per batch? We can make the semantics of RecordsWithSplitIds.metadata() be
> the metadata associated with the last record returned by
> RecordsWithSplitIds.nextRecordsFromSplit(). In this case, individual
> implementations can decide whether to return different metadata for each
> record or not. In the case of Kafka, the Tuple3 can be replaced with three
> lists of records, timestamps, and offsets respectively. It probably saves
> some object instantiation, assuming the RecordMetadata object itself can be
> reused.
>
> > 6. We might rename and change the semantics into
> >
> > public interface RecordsWithSplitIds {
> >     /**
> >      * Returns the record metadata. The metadata is shared for all
> >      * records in the current split.
> >      */
> >     @Nullable
> >     default RecordMetadata metadataOfCurrentSplit() {
> >         return null;
> >     }
> >     ...
> > }
>
> Maybe we can move one step further and make it "metadataOfCurrentRecord()",
> as I mentioned above.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Fri, Jul 30, 2021 at 3:00 PM Arvid Heise wrote:
> > Hi folks,
> >
> > To move on with the FLIP, I will cut eventTimeFetchLag out of scope and
> > go ahead with the remainder.
> >
> > I will open a VOTE later today.
> > > > Best, > > > > Arvid > > > > On Wed, Jul 28, 2021 at 8:44 AM Arvid Heise wrote: > > > > > Hi Becket, > > > > > > I have updated the PR according to your suggestion (note that this > commit > > > contains the removal of the previous approach) [1]. Here are my > > > observations: > > > 1. Adding the type of RecordMetadata to emitRecord would require adding > > > another type parameter to RecordEmitter and SourceReaderBase. So I left > > > that out for now as it would break things completely. > > > 2. RecordEmitter implementations that want to pass it to SourceOutput > > need > > > to be changed in a boilerplate fashion. (passing the metadata to the > > > SourceOutput) > > > 3. RecordMetadata as an interface (as in the commit) probably requires > > > boilerplate implementations in the sources using it as well. > > > 4. SourceOutput would also require an additional collect > > > > > > default void collect(T record, RecordMetadata metadata) { > > > collect(record, TimestampAssigner.NO_TIMESTAMP, metadata); > > > } > > > > > > 5. RecordMetadata is currently not simplifying any code. By the current > > > design RecordMetadata is a read-only data structure that is constant > for > > > all records in a batch. So in Kafka, we still need to pass Tuple3 > because > > > offset and timestamp are per record. > > > 6. RecordMetadata is currently the same for all splits in > > > RecordsWithSplitIds. > > > > > > Some ideas for the above points: > > > 3. We should accompany it with a default implementation to avoid the > > trivial > > > POJO implementations such as the KafkaRecordMetadata of my commit. Can we > skip > > > the interface and just have RecordMetadata as a base class? > > > 1.,2.,4. We could also set the metadata only once in an orthogonal > method > > > that needs to be called before collect like > > SourceOutput#setRecordMetadata. > > > Then we can implement it entirely in SourceReaderBase without changing > > any > > > code. 
The clear downside is that it introduces some implicit state in > > > SourceOutput (which we implement) and is harder to use in > > > non-SourceReaderBase classes: Source devs need to remember to call > > > setRecordMetadata before collect for a respective record. > > > 6. We might rename and change the semantics into > > > > > > public interface RecordsWithSplitIds { > > > /** > > > * Returns the record metadata. The metadata is shared for all > > records in the current split. > > > */ > > > @Nullable > > > default RecordMetadata metadataOfCurrentSplit() { > > > return null; > > > } > > > ... > > > } > > > > > > > > > Re global variable > > > > > >> To explain a bit more on the metric being a global variable, I think > in > > >> general there are two ways to pass a value from one code block to > > another. > > >> The first way is direct passing. That means the variable is explicitly > > >> passed from one code block to another via arguments, be them in the
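To make point 5 and Becket's per-record-metadata follow-up more concrete, here is a minimal sketch in plain Java. All names (RecordMetadata, RecordBatch, metadataOfCurrentRecord) are simplified, hypothetical stand-ins for the interfaces under discussion, not the actual Flink API: the batch keeps parallel lists of records, offsets and timestamps, and exposes the metadata of the last returned record through one reused mutable object instead of a Tuple3 allocation per record.

```java
import java.util.List;

// Hypothetical, simplified stand-in for the discussed RecordMetadata.
final class RecordMetadata {
    long offset;
    long timestamp;
}

// Batch holding parallel lists of records, offsets and timestamps, exposing
// the metadata of the last returned record via a single reused object.
final class RecordBatch<T> {
    private final List<T> records;
    private final long[] offsets;
    private final long[] timestamps;
    private final RecordMetadata metadata = new RecordMetadata(); // reused, not reallocated
    private int pos = -1;

    RecordBatch(List<T> records, long[] offsets, long[] timestamps) {
        this.records = records;
        this.offsets = offsets;
        this.timestamps = timestamps;
    }

    // Analogous to nextRecordFromSplit(): advances and updates the shared metadata.
    T next() {
        if (++pos >= records.size()) {
            return null;
        }
        metadata.offset = offsets[pos];
        metadata.timestamp = timestamps[pos];
        return records.get(pos);
    }

    // Analogous to the proposed metadataOfCurrentRecord().
    RecordMetadata metadataOfCurrentRecord() {
        return metadata;
    }
}
```

Because the metadata object is mutable and shared, a caller must read it before calling next() again, which mirrors the "metadata associated with the last record returned" semantics proposed above.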
[jira] [Created] (FLINK-23561) Detail the container completed message
zlzhang0122 created FLINK-23561: --- Summary: Detail the container completed message Key: FLINK-23561 URL: https://issues.apache.org/jira/browse/FLINK-23561 Project: Flink Issue Type: Improvement Components: Deployment / YARN Affects Versions: 1.13.1 Reporter: zlzhang0122 Fix For: 1.14.0 Use the ContainerStatus to detail the reason the container completed, so that users can explicitly know why the container completed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23562) Update CI docker image to latest java version (1.8.0_292)
Robert Metzger created FLINK-23562: -- Summary: Update CI docker image to latest java version (1.8.0_292) Key: FLINK-23562 URL: https://issues.apache.org/jira/browse/FLINK-23562 Project: Flink Issue Type: Technical Debt Components: Build System / Azure Pipelines Reporter: Robert Metzger Fix For: 1.14.0 The java version we are using on our CI is outdated (1.8.0_282 vs 1.8.0_292). The latest java version has TLSv1 disabled, which makes the KubernetesClusterDescriptorTest fail. This will be fixed by FLINK-22802. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23563) Sometimes ‘Stop’ cannot stop the job
Han Yin created FLINK-23563: --- Summary: Sometimes ‘Stop’ cannot stop the job Key: FLINK-23563 URL: https://issues.apache.org/jira/browse/FLINK-23563 Project: Flink Issue Type: Bug Components: Runtime / Checkpointing Reporter: Han Yin Sometimes the 'Stop' command does not stop the job after the savepoint is finished. This is because currently we set _syncSavepointId_ to null whenever we abort/complete a checkpoint, even if the aborted/completed checkpoint is not the latest one. In some rare cases, it is possible that during a 'Stop' process, we trigger a savepoint, and then the _syncSavepointId_ is set to null due to the abort of a previous checkpoint. As a result, the subtasks are not stopped after completing the savepoint. -- This message was sent by Atlassian Jira (v8.3.4#803005)
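The fix direction implied by this description can be sketched as follows, with hypothetical names (this is not the actual CheckpointCoordinator code): the pending sync-savepoint id is only cleared when the checkpoint that finished is the sync savepoint itself, so aborting or completing an older checkpoint no longer cancels the stop.

```java
// Simplified model of the syncSavepointId life cycle described in the issue.
final class SyncSavepointTracker {
    private Long syncSavepointId; // null = no stop-with-savepoint in progress

    // Called when a stop-with-savepoint triggers its savepoint.
    void setSyncSavepointId(long checkpointId) {
        this.syncSavepointId = checkpointId;
    }

    // Called whenever ANY checkpoint is aborted or completed.
    // The buggy behavior described above is equivalent to unconditionally
    // doing `syncSavepointId = null;` here.
    void onCheckpointFinished(long checkpointId) {
        if (syncSavepointId != null && syncSavepointId == checkpointId) {
            syncSavepointId = null;
        }
    }

    boolean isStopPending() {
        return syncSavepointId != null;
    }
}
```

With the guard in place, the abort of a stale checkpoint id leaves the pending stop untouched.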
[jira] [Created] (FLINK-23564) Make taskmanager.out and taskmanager.err rollable
zlzhang0122 created FLINK-23564: --- Summary: Make taskmanager.out and taskmanager.err rollable Key: FLINK-23564 URL: https://issues.apache.org/jira/browse/FLINK-23564 Project: Flink Issue Type: Improvement Affects Versions: 1.13.1 Reporter: zlzhang0122 Fix For: 1.14.0 Now users can write to taskmanager.out and taskmanager.err as much as they want via System.out.print/System.out.println/System.err.print/System.err.println/e.printStackTrace. This may consume a large amount of disk space, cause disk problems, and affect Flink's checkpointing and even the stability of Flink or other applications on the same node. I propose that we make taskmanager.out and taskmanager.err rollable, just like taskmanager.log. By doing this, the disk consumption of taskmanager.out and taskmanager.err can be controlled. -- This message was sent by Atlassian Jira (v8.3.4#803005)
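For illustration, a size-capped rolling stream of the kind proposed here can be sketched with only the JDK. This is a toy sketch, not Flink code (taskmanager.log rolling is actually handled by the logging backend); the idea is that stdout/stderr could be redirected through such a stream so a runaway System.out.println cannot fill the disk.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Toy size-based rolling stream: when `base` exceeds maxBytes, it is rotated
// to base.1, base.1 to base.2, ..., and the oldest backup is dropped.
final class RollingOutputStream extends OutputStream {
    private final Path base;
    private final long maxBytes;
    private final int maxBackups;
    private OutputStream current;
    private long written;

    RollingOutputStream(Path base, long maxBytes, int maxBackups) throws IOException {
        this.base = base;
        this.maxBytes = maxBytes;
        this.maxBackups = maxBackups;
        this.current = Files.newOutputStream(base, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        this.written = Files.size(base);
    }

    @Override
    public synchronized void write(int b) throws IOException {
        if (written >= maxBytes) {
            roll();
        }
        current.write(b);
        written++;
    }

    @Override
    public synchronized void close() throws IOException {
        current.close();
    }

    private void roll() throws IOException {
        current.close();
        // Shift base -> base.1 -> base.2 ..., dropping the oldest backup.
        for (int i = maxBackups - 1; i >= 1; i--) {
            Path src = Paths.get(base + "." + i);
            if (Files.exists(src)) {
                Files.move(src, Paths.get(base + "." + (i + 1)), StandardCopyOption.REPLACE_EXISTING);
            }
        }
        Files.move(base, Paths.get(base + ".1"), StandardCopyOption.REPLACE_EXISTING);
        current = Files.newOutputStream(base, StandardOpenOption.CREATE);
        written = 0;
    }
}
```

Usage would then be something like `System.setOut(new java.io.PrintStream(new RollingOutputStream(outPath, maxBytes, backups), true))`.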
[VOTE] FLIP-177: Extend Sink API
Hi all, I'd like to start a vote on FLIP-177: Extend Sink API [1] which provides small extensions to the Sink API introduced through FLIP-143. The vote will be open for at least 72 hours unless there is an objection or not enough votes. Note that the FLIP was larger initially and I cut down all advanced/breaking changes. Best, Arvid [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-177%3A+Extend+Sink+API
Re: [DISCUSS] FLIP-177: Extend Sink API
Hi Guowei, hi all, The main drawback of the AsyncIO approach is the decreased flexibility. In particular, as you mentioned for the advanced backpressure use cases, you would need to chain several AsyncIOs: >>>But whether a sink is overloaded not only depends on the queue size. It > also depends on the number of in-flight async requests > 1. How about chaining two AsyncIOs? One is for controlling the size of the > buffer elements; The other is for controlling the in-flight async requests. > If we need an AsyncIO for each dimension of backpressure, we also might end up with an incompatible state when a dimension is added or removed through a configuration change. With that being said, I'd like to start a vote on the proposal as your strong objection disappeared. We can continue the discussion here but I'd also appreciate any vote on [1]. [1] https://lists.apache.org/thread.html/r7194846ec671e9e0e64908a7ae4cf32c2bccf1dd6ee7db107a52cf04%40%3Cdev.flink.apache.org%3E On Fri, Jul 30, 2021 at 5:51 AM Guowei Ma wrote: > Hi, Arvid & Piotr > Sorry for the late reply. > 1. Thank you all very much for your patience and explanation. Recently, I > have also studied the related code of 'MailBox', which may not be as > serious as I thought, after all, it is very similar to Java's `Executor`; > 2. Whether to use AsyncIO or MailBox to implement Kinesis connector is more > up to the contributor to decide (after all, `Mailbox` has decided to be > exposed :-) ). It’s just that I personally prefer to combine some simple > functions to complete a more advanced function. > Best, > Guowei > > > On Sat, Jul 24, 2021 at 3:38 PM Arvid Heise wrote: > > > Just to reiterate on Piotr's point: MailboxExecutor is pretty much an > > Executor [1] with named lambdas, except for the name MailboxExecutor > > nothing is hinting at a specific threading model. > > > > Currently, we expose it on StreamOperator API. 
Afaik the idea is to make > > the StreamOperator internal and beef up ProcessFunction but for several use > > cases (e.g., AsyncIO), we actually need to expose the executor anyways. > > > > We could rename MailboxExecutor to avoid exposing the name of the > threading > > model. For example, we could rename it to TaskThreadExecutor (but that's > > pretty much the same), to CooperativeExecutor (again implies Mailbox), to > > o.a.f.Executor, to DeferredExecutor... Ideas are welcome. > > > > We could also simply use Java's Executor interface, however, when working > > with that interface, I found that the missing context of async executed > > lambdas made debugging much much harder. So that's why I designed > > MailboxExecutor to force the user to give some debug string to each > > invocation. In the sink context, we could, however, use an adaptor from > > MailboxExecutor to Java's Executor and simply bind the sink name to the > > invocations. > > > > [1] > > > > > https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/Executor.html > > > > On Fri, Jul 23, 2021 at 5:36 PM Piotr Nowojski > > wrote: > > > > > Hi, > > > > > > Regarding the question whether to expose the MailboxExecutor or not: > > > 1. We have plans on exposing it in the ProcessFunction (in short we > want > > to > > > make StreamOperator API private/internal only, and move all of its > extra > > > functionality in one way or another to the ProcessFunction). I don't > > > remember and I'm not sure if *Dawid* had a different idea about this > (do > > > not expose Mailbox but wrap it somehow?) > > > 2. If we provide a thin wrapper around MailboxExecutor, I'm not sure > how > > > helpful it will be for keeping backward compatibility in the future. > > > `MailboxExecutor` is already a very generic interface that doesn't > expose > > > much about the current threading model. 
Note that the previous > threading > > > model (multi threaded with checkpoint lock), should be easy to > implement > > > using the `MailboxExecutor` interface (use a direct executor that > > acquires > > > checkpoint lock). > > > > > > Having said that, I haven't spent too much time thinking about whether > > it's > > > better to enrich AsyncIO or provide the AsyncSink. If we can just as > > > efficiently provide the same functionality using the existing/enhanced > > > AsyncIO API, that may be a good idea if it indeed reduces our > > > maintenance costs. > > > > > > Piotrek > > > > > > pt., 23 lip 2021 o 12:55 Guowei Ma napisał(a): > > > > > > > Hi, Arvid > > > > > > > > >>>The main question here is what do you think is the harm of > exposing > > > > Mailbox? Is it the complexity or the maintenance overhead? > > > > > > > > I think that exposing the internal threading model might be risky. In > > > case > > > > the threading model changes, it will affect the user's api and bring > > the > > > > burden of internal modification. (Of course, you may have more say in > > how > > > > the MailBox model will develop in the future) Therefore, I think that > > if > > > an > > > > alternative solutio
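The adaptor Arvid describes above (wrapping MailboxExecutor as a plain java.util.concurrent.Executor with the sink name bound to every invocation) could look roughly like the sketch below. MailboxExecutorLike is a simplified, hypothetical stand-in for Flink's MailboxExecutor signature, kept here so the example is self-contained.

```java
import java.util.concurrent.Executor;

// Simplified stand-in for Flink's MailboxExecutor: every submission must
// carry a debug description (hypothetical minimal signature).
interface MailboxExecutorLike {
    void execute(Runnable command, String descriptionFormat, Object... args);
}

// The adaptor idea from the thread: expose a plain java.util.concurrent.Executor
// to the sink, binding the sink's name to every invocation for debuggability.
final class SinkExecutorAdapter implements Executor {
    private final MailboxExecutorLike mailbox;
    private final String sinkName;

    SinkExecutorAdapter(MailboxExecutorLike mailbox, String sinkName) {
        this.mailbox = mailbox;
        this.sinkName = sinkName;
    }

    @Override
    public void execute(Runnable command) {
        mailbox.execute(command, "async action of sink %s", sinkName);
    }
}
```

The sink only ever sees the standard Executor interface, while the mailbox still receives a meaningful debug string per submission.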
Re: [DISCUSS] FLIP-177: Extend Sink API
Hey Guowei, there is one additional aspect I want to highlight that is relevant for the types of destinations we had in mind when designing the AsyncSink. I'll again use Kinesis as an example, but the same argument applies to other destinations. We are using the PutRecords API to persist up to 500 events with a single API call to reduce the overhead compared to using individual calls per event. But not all of the 500 events may be persisted successfully, e.g., a single event fails to be persisted due to server-side throttling. With the MailboxExecutor based implementation, we can just add this event back to the internal queue. The event will then be retried with the next batch of 500 events. In my understanding, that's not possible with the AsyncIO based approach. During a retry, we can only retry the failed events of the original batch of events, which means we would need to send a single event with a separate PutRecords call. Depending on how often that happens, this could add up. Does that make sense? Cheers, Steffen On 30.07.21, 05:51, "Guowei Ma" wrote: Hi, Arvid & Piotr Sorry for the late reply. 1. Thank you all very much for your patience and explanation. Recently, I have also studied the related code of 'MailBox', which may not be as serious as I thought, after all, it is very similar to Java's `Executor`; 2. Whether to use AsyncIO or MailBox to implement Kinesis connector is more up to the contributor to decide (after all, `Mailbox` has decided to be exposed :-) ). It’s just that I personally prefer to combine some simple functions to complete a more advanced function. 
Best, Guowei On Sat, Jul 24, 2021 at 3:38 PM Arvid Heise wrote: > Just to reiterate on Piotr's point: MailboxExecutor is pretty much an > Executor [1] with named lambdas, except for the name MailboxExecutor > nothing is hinting at a specific threading model. > > Currently, we expose it on StreamOperator API. Afaik the idea is to make > the StreamOperator internal and beef up ProcessFunction but for several use > cases (e.g., AsyncIO), we actually need to expose the executor anyways. > > We could rename MailboxExecutor to avoid exposing the name of the threading > model. For example, we could rename it to TaskThreadExecutor (but that's > pretty much the same), to CooperativeExecutor (again implies Mailbox), to > o.a.f.Executor, to DeferredExecutor... Ideas are welcome. > > We could also simply use Java's Executor interface, however, when working > with that interface, I found that the missing context of async executed > lambdas made debugging much much harder. So that's why I designed > MailboxExecutor to force the user to give some debug string to each > invocation. In the sink context, we could, however, use an adaptor from > MailboxExecutor to Java's Executor and simply bind the sink name to the > invocations. > > [1] > > https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/Executor.html > > On Fri, Jul 23, 2021 at 5:36 PM Piotr Nowojski > wrote: > > > Hi, > > > > Regarding the question whether to expose the MailboxExecutor or not: > > 1. We have plans on exposing it in the ProcessFunction (in short we want > to > > make StreamOperator API private/internal only, and move all of its extra > > functionality in one way or another to the ProcessFunction). I don't > > remember and I'm not sure if *Dawid* had a different idea about this (do > > not expose Mailbox but wrap it somehow?) > > 2. If we provide a thin wrapper around MailboxExecutor, I'm not sure how > > helpful it will be for keeping backward compatibility in the future. 
> > `MailboxExecutor` is already a very generic interface that doesn't expose > > much about the current threading model. Note that the previous threading > > model (multi threaded with checkpoint lock), should be easy to implement > > using the `MailboxExecutor` interface (use a direct executor that > acquires > > checkpoint lock). > > > > Having said that, I haven't spent too much time thinking about whether > it's > > better to enrich AsyncIO or provide the AsyncSink. If we can just as > > efficiently provide the same functionality using the existing/enhanced > > AsyncIO API, that may be a good idea if it indeed reduces our > > maintenance costs. > > > > Piotrek > > > > pt., 23 lip 2021 o 12:55 Guowei Ma napisał(a): > > > > > Hi, Arvid > > > > > > >>>The main question here is what do you think is the harm of exposing > > > Mailbox? Is it the complexity or the maintenance overhead? > > > > > > I think th
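Steffen's requeue-on-partial-failure argument can be sketched with a plain-Java buffer. This is illustrative only, with hypothetical names, and not the actual Kinesis connector: records that fail in a PutRecords-style batch call go back to the front of the internal queue and ride along with the next full batch, instead of triggering a single-record retry call.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.ListIterator;
import java.util.function.Function;

// Sketch of the retry strategy described above: take up to batchSize records
// from an internal queue, attempt to persist them, and push any
// partially-failed records back to the front of the queue.
final class BatchingBuffer<T> {
    private final Deque<T> queue = new ArrayDeque<>();
    private final int batchSize;

    BatchingBuffer(int batchSize) {
        this.batchSize = batchSize;
    }

    void add(T record) {
        queue.addLast(record);
    }

    // `putRecords` stands in for the Kinesis PutRecords call; it returns the
    // subset of the batch that failed (e.g. due to server-side throttling).
    List<T> flushOnce(Function<List<T>, List<T>> putRecords) {
        List<T> batch = new ArrayList<>();
        while (batch.size() < batchSize && !queue.isEmpty()) {
            batch.add(queue.pollFirst());
        }
        List<T> failed = putRecords.apply(batch);
        // Re-queue failed records at the FRONT, preserving their order, so
        // they are retried with the next full batch.
        for (ListIterator<T> it = failed.listIterator(failed.size()); it.hasPrevious(); ) {
            queue.addFirst(it.previous());
        }
        return failed;
    }

    int pending() {
        return queue.size();
    }
}
```

The key point of the thread is the last step: with a mailbox-driven internal queue the failed record simply rejoins the buffer, whereas an AsyncIO-style retry is scoped to the original batch.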
Re: [DISCUSS] FLIP-180: Adjust StreamStatus and Idleness definition
Hi Martijn, 1. Good question. The watermarks and statuses of the splits are first aggregated before being emitted through the reader. The watermark strategy of the user is actually applied on all SourceOutputs (=splits). Since one split is active and one is idle, the watermark of the reader will not advance until the user-defined idleness is triggered on the idle split. At this point, the combined watermark solely depends on the active split. The combined status remains ACTIVE. 2. Kafka has no dynamic partitions. This is a complete misnomer on the Flink side. In fact, if you search for Kafka and partition discovery, you will only find Flink resources. What we actually do is dynamic topic discovery and that can only be triggered through a pattern afaik. We could go for topic discovery on all patterns by default if we don't do that already. 3. Yes, idleness on assigned partitions would even work with dynamic assignments. I will update the FLIP to reflect that. 4. Afaik it was only meant for scenario 2 (and your question 3) and it should be this way after the FLIP. I don't know of any source implementation that uses the user-specified idleness to handle scenario 3. The thing that is currently extra is that some readers go idle when the reader doesn't have an active assignment. Best, Arvid On Fri, Jul 30, 2021 at 12:17 PM Martijn Visser wrote: > Hi all, > > I have a couple of questions after studying the FLIP and the docs: > > 1. What happens when one of the readers has two splits assigned and one of > the splits actually receives data? > > 2. If I understand it correctly the Kinesis Source uses dynamic shard > discovery by default (so in case of idleness scenario 3 would happen there) > and the FileSource also has a dynamic assignment. The Kafka Source doesn't > use dynamic partition discovery by default (so scenario 2 would be the > default to happen there). Why did we choose to not enable dynamic partition > discovery by default and should we actually change that? > > 3. 
To be sure, is it correct that in case of a dynamic assignment and there > is temporarily no data, that scenario 2 is applicable? > > 4. Does WatermarkStrategy#withIdleness currently cover scenario 2, 3 and > the one from my 3rd question? (edited) > > Best regards, > > Martijn > > On Fri, 23 Jul 2021 at 15:57, Till Rohrmann wrote: > > > Hi everyone, > > > > I would be in favour of what Arvid said about not exposing the > > WatermarkStatus to the Sink. Unless there is a very strong argument that > > this is required I think that keeping this concept internal seems to me > the > > better choice right now. Moreover, as Arvid said the downstream > application > > can derive the WatermarkStatus on their own depending on its business > > logic. > > > > Cheers, > > Till > > > > On Fri, Jul 23, 2021 at 2:15 PM Arvid Heise wrote: > > > > > Hi Eron, > > > > > > thank you very much for your feedback. > > > > > > Please mention that the "temporary status toggle" code will be removed. > > > > > > > This code is already removed but there is still some automation of > going > > > idle when temporary no splits are assigned. I will include it in the > > FLIP. > > > > > > I agree with adding the markActive() functionality, for symmetry. > > Speaking > > > > of symmetry, could we now include the minor enhancement we discussed > in > > > > FLIP-167, the exposure of watermark status changes on the Sink > > interface. > > > > I drafted a PR and would be happy to revisit it. > > > > > > > > > > > > > > https://github.com/streamnative/flink/pull/2/files#diff-64d9c652ffc3c60b6d838200a24b106eeeda4b2d853deae94dbbdf16d8d694c2R62-R70 > > > > > > I'm still not sure if that's a good idea. > > > > > > If we have now refined idleness to be an user-specified, > > > application-specific way to handle with temporarily stalled partitions, > > > then downstream applications should actually implement their own > idleness > > > definition. Let's see what other devs think. 
I'm pinging the ones that > > have > > > been most involved in the discussion: @Stephan Ewen > > > @Till > > > Rohrmann @Dawid Wysakowicz < > > dwysakow...@apache.org> > > > . > > > > > > The flip mentions a 'watermarkstatus' package for the WatermarkStatus > > > > class. Should it be 'eventtime' package? > > > > > > > Are you proposing org.apache.flink.api.common.eventtime? I was simply > > > suggesting to rename > > > org.apache.flink.streaming.runtime.streamstatus but I'm very open for > > other > > > suggestions (given that there are only 2 classes in the package). > > > > > > > > > > Regarding the change of 'streamStatus' to 'watermarkStatus', could > you > > > > spell out what the new method names will be on each interface? May I > > > > suggest that Input.emitStreamStatus be Input.processStreamStatus? > This > > > is > > > > to help decouple the input's watermark status from the output's > > watermark > > > > status. > > > > > > > I haven't found > > > org.apache.flink.streaming.api.operators.Input
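Arvid's answer to question 1 earlier in this thread (per-split watermarks and statuses are aggregated, and an idle split stops holding back the combined watermark) can be illustrated with a small plain-Java model. The names and the min-combination rule are a simplified stand-in for Flink's internal watermark aggregation, not the actual implementation.

```java
import java.util.Arrays;

// Toy model of reader-level watermark combination: the combined watermark is
// the minimum over all ACTIVE split outputs; a split marked idle is excluded
// and therefore no longer holds the watermark back.
final class CombinedWatermark {
    private final long[] watermarks;
    private final boolean[] idle;

    CombinedWatermark(int numSplits) {
        watermarks = new long[numSplits];
        Arrays.fill(watermarks, Long.MIN_VALUE); // MIN_VALUE = "no watermark yet"
        idle = new boolean[numSplits];
    }

    void onWatermark(int split, long wm) {
        watermarks[split] = wm;
        idle[split] = false; // receiving a watermark makes the split active again
    }

    void markIdle(int split) {
        idle[split] = true;
    }

    // Long.MIN_VALUE means "no combined watermark" (no watermark yet, or all idle).
    long combined() {
        long min = Long.MAX_VALUE;
        boolean anyActive = false;
        for (int i = 0; i < watermarks.length; i++) {
            if (!idle[i]) {
                anyActive = true;
                min = Math.min(min, watermarks[i]);
            }
        }
        return anyActive ? min : Long.MIN_VALUE;
    }
}
```

This mirrors the scenario in the thread: with one active and one silent split the combined watermark is stuck, and only once the user-defined idleness marks the silent split idle does the combined watermark follow the active split.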
[jira] [Created] (FLINK-23565) Window TVF Supports session window in runtime
JING ZHANG created FLINK-23565: -- Summary: Window TVF Supports session window in runtime Key: FLINK-23565 URL: https://issues.apache.org/jira/browse/FLINK-23565 Project: Flink Issue Type: Sub-task Components: Table SQL / Runtime Reporter: JING ZHANG -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-23566) Mysql 8.0 Public Key Retrieval is not allowed
MING created FLINK-23566: Summary: Mysql 8.0 Public Key Retrieval is not allowed Key: FLINK-23566 URL: https://issues.apache.org/jira/browse/FLINK-23566 Project: Flink Issue Type: Bug Affects Versions: 1.13.1 Reporter: MING How can this MySQL 8.0 problem be solved? -- This message was sent by Atlassian Jira (v8.3.4#803005)
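For reference, this error comes from the MySQL Connector/J JDBC driver rather than from Flink itself. It is commonly resolved by allowing the client to request the server's RSA public key via the `allowPublicKeyRetrieval` connection property (a documented Connector/J option; check the security implications before enabling it, especially on non-TLS connections). A minimal sketch of building such a URL:

```java
// Helper illustrating the commonly suggested JDBC URL options for the
// "Public Key Retrieval is not allowed" error with MySQL 8.0.
// useSSL=false is a typical companion flag in dev setups only.
final class MySqlUrls {
    static String withPublicKeyRetrieval(String baseUrl) {
        return baseUrl + "?allowPublicKeyRetrieval=true&useSSL=false";
    }
}
```

For a Flink SQL JDBC connector, this would be appended to the configured 'url' option.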
Re: [VOTE] Release 1.13.2, release candidate #3
+1 (non-binding) - Checked checksums and signatures: OK - Built from source: OK - Checked the flink-web PR - find one typo about the number of the fixes and improvements - Submit some jobs from sql-client to local cluster, checked the web-ui, cp, sp, log, etc: OK Best, Godfrey Xingbo Huang 于2021年7月30日周五 下午2:51写道: > +1 (non-binding) > > - Verified checksums and signatures > - Verified Python wheel package contents > - Pip install apache-flink-libraries source package and apache-flink wheel > package in Mac > - Write and Run a Simple Python UDF job in Python REPL > > Best, > Xingbo > > Yu Li 于2021年7月30日周五 下午2:33写道: > > > +1 (binding) > > > > - Checked the diff between 1.13.1 and 1.13.2-rc3: OK ( > > > https://github.com/apache/flink/compare/release-1.13.1...release-1.13.2-rc3 > > ) > > - commons-io version has been bumped to 2.8.0 through FLINK-22747 and > all > > NOTICE files updated correctly > > - guava version has been bumped to 29.0 for kinesis connector through > > FLINK-23009 and all NOTICE files updated correctly > > - Checked release notes: OK > > - minor: I've moved FLINK-23315 and FLINK-23418 out of 1.13.2 to keep > > accordance with RC status > > - Checked sums and signatures: OK > > - Maven clean install from source: OK > > - Checked the jars in the staging repo: OK > > - Checked the website updates: OK > > - minor: left some minor comments in PR (such as RN needs update, etc.) 
> > and please remember to address them before merging > > > > Best Regards, > > Yu > > > > > > On Fri, 30 Jul 2021 at 14:00, Jingsong Li > wrote: > > > > > +1 (non-binding) > > > > > > - Check if checksums and GPG files match the corresponding release > files > > > - staging repository looks fine > > > - Start a local cluster (start-cluster.sh), logs fine > > > - Run sql-client and run a job, looks fine > > > > > > I found an unexpected log in sql-client: > > > "Searching for > > > > > > > > > '/Users/lijingsong/Downloads/tmp/flink-1.13.2/conf/sql-client-defaults.yaml'...not > > > found" > > > This log should be removed. I created a JIRA for this: > > > https://issues.apache.org/jira/browse/FLINK-23552 > > > (This should not be a blocker) > > > > > > Best, > > > Jingsong > > > > > > On Thu, Jul 29, 2021 at 10:44 PM Robert Metzger > > > wrote: > > > > > > > Thanks a lot for creating this release candidate > > > > > > > > +1 (binding) > > > > > > > > - staging repository looks fine > > > > - Diff to 1.13.1 looks fine wrt to dependency changes: > > > > > > > > > > https://github.com/apache/flink/compare/release-1.13.1...release-1.13.2-rc3 > > > > - standalone mode works locally > > > >- I found this issue, which is not specific to 1.13.2: > > > > https://issues.apache.org/jira/browse/FLINK-23546 > > > > - src archive signature is matched; sha512 is correct > > > > > > > > On Thu, Jul 29, 2021 at 9:10 AM Zakelly Lan > > > wrote: > > > > > > > > > +1 (non-binding) > > > > > > > > > > * Built from source. > > > > > * Run wordcount datastream job on yarn > > > > > * Web UI and checkpoint seem good. > > > > > * Kill a container to make job failover, everything is good. > > > > > * Try run job from checkpoint, everything is good. 
> > > > > > > > > > On Fri, Jul 23, 2021 at 10:04 PM Yun Tang > wrote: > > > > > > > > > > > Hi everyone, > > > > > > Please review and vote on the release candidate #3 for the > version > > > > > 1.13.2, > > > > > > as follows: > > > > > > [ ] +1, Approve the release > > > > > > [ ] -1, Do not approve the release (please provide specific > > comments) > > > > > > > > > > > > > > > > > > The complete staging area is available for your review, which > > > includes: > > > > > > * JIRA release notes [1], > > > > > > * the official Apache source release and binary convenience > > releases > > > to > > > > > be > > > > > > deployed to dist.apache.org [2], which are signed with the key > > with > > > > > > fingerprint 78A306590F1081CC6794DC7F62DAD618E07CF996 [3], > > > > > > * all artifacts to be deployed to the Maven Central Repository > [4], > > > > > > * source code tag "release-1.13.2-rc3" [5], > > > > > > * website pull request listing the new release and adding > > > announcement > > > > > > blog post [6]. > > > > > > > > > > > > The vote will be open for at least 72 hours. It is adopted by > > > majority > > > > > > approval, with at least 3 PMC affirmative votes. > > > > > > > > > > > > Best, > > > > > > Yun Tang > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > > > > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12350218&styleName=&projectId=12315522 > > > > > > [2] > https://dist.apache.org/repos/dist/dev/flink/flink-1.13.2-rc3/ > > > > > > [3] https://dist.apache.org/repos/dist/release/flink/KEYS > > > > > > [4] > > > > > > > > > > > > https://repository.apache.org/content/repositories/orgapacheflink-1439/ > > > > > > [5] > > https://github.com/apache/flink/releases/tag/release-1.13.2-rc3 > > > > > > [6] https://github.com/apache/flink-web/pul
Re: [VOTE] Release 1.12.5, release candidate #3
+1 (non-binding) - Checked checksums and signatures: OK - Built from source: OK - Checked the flink-web PR - find one typo about flink version - Submit some jobs from sql-client to local cluster, checked the web-ui, cp, sp, log, etc: OK Best, Godfrey Robert Metzger 于2021年7月30日周五 下午4:33写道: > Thanks a lot for providing the new staging repository. I dropped the 1440 > and 1441 staging repositories, to avoid that other RC reviewers > accidentally look into it, or that we accidentally release it. > > +1 (binding) > > Checks: > - I didn't find any additional issues in the release announcement > - the pgp signatures on the source archive seem fine > - source archive compilation starts successfully (rat check passes etc.) > - standalone mode, job submission and cli cancellation works. logs look > fine > - maven staging repository looks fine > > On Fri, Jul 30, 2021 at 7:30 AM Jingsong Li > wrote: > > > Hi everyone, > > > > Thanks Robert, I created a new one. > > > > all artifacts to be deployed to the Maven Central Repository [4], > > > > [4] > > https://repository.apache.org/content/repositories/orgapacheflink-1444/ > > > > Best, > > Jingsong > > > > On Thu, Jul 29, 2021 at 9:50 PM Robert Metzger > > wrote: > > > > > The difference is that the 1440 staging repository contains the Scala > > _2.11 > > > files, the 1441 repo contains scala_2.12. I'm not sure if this works, > > > because things like "flink-core:1.11.5" will be released twice? > > > I would prefer to have a single staging repository containing all > > binaries > > > we intend to release to maven central, to avoid complications in the > > > release process. > > > > > > Since only the convenience binaries are affected by this, we don't need > > to > > > cancel the release. We just need to create a new staging repository. > > > > > > > > > On Thu, Jul 29, 2021 at 3:36 PM Robert Metzger > > > wrote: > > > > > > > Thanks a lot for creating a release candidate! 
> > > > > > > > What is the difference between the two maven staging repos? > > > > > > https://repository.apache.org/content/repositories/orgapacheflink-1440/ > > > > and > > > > > > https://repository.apache.org/content/repositories/orgapacheflink-1441/ > > > ? > > > > > > > > On Thu, Jul 29, 2021 at 1:52 PM Xingbo Huang > > wrote: > > > > > > > >> +1 (non-binding) > > > >> > > > >> - Verified checksums and signatures > > > >> - Built from sources > > > >> - Verified Python wheel package contents > > > >> - Pip install Python wheel package in Mac > > > >> - Run Python UDF job in Python REPL > > > >> > > > >> Best, > > > >> Xingbo > > > >> > > > >> Zakelly Lan 于2021年7月29日周四 下午5:57写道: > > > >> > > > >> > +1 (non-binding) > > > >> > > > > >> > * Built from source. > > > >> > * Run wordcount datastream job on yarn > > > >> > * Web UI and checkpoint seem good. > > > >> > * Kill a container to make job failover, everything is good. > > > >> > * Try run job from checkpoint, everything is good. > > > >> > > > > >> > On Thu, Jul 29, 2021 at 2:34 PM Yun Tang > wrote: > > > >> > > > > >> > > +1 (non-binding) > > > >> > > > > > >> > > Checked the signature. > > > >> > > > > > >> > > Reviewed the PR of flink-web. > > > >> > > > > > >> > > Download the pre-built tar package and launched an application > > mode > > > >> > > standalone job successfully. 
> > > >> > > > > > >> > > Best > > > >> > > Yun Tang > > > >> > > > > > >> > > > > > >> > > > > > >> > > From: Jingsong Li > > > >> > > Sent: Tuesday, July 27, 2021 11:54 > > > >> > > To: dev > > > >> > > Subject: [VOTE] Release 1.12.5, release candidate #3 > > > >> > > > > > >> > > Hi everyone, > > > >> > > > > > >> > > Please review and vote on the release candidate #3 for the > version > > > >> > 1.12.5, > > > >> > > as follows: > > > >> > > [ ] +1, Approve the release > > > >> > > [ ] -1, Do not approve the release (please provide specific > > > comments) > > > >> > > > > > >> > > The complete staging area is available for your review, which > > > >> includes: > > > >> > > * JIRA release notes [1], > > > >> > > * the official Apache source release and binary convenience > > releases > > > >> to > > > >> > be > > > >> > > deployed to dist.apache.org [2], which are signed with the key > > with > > > >> > > fingerprint FBB83C0A4FFB9CA8 [3], > > > >> > > * all artifacts to be deployed to the Maven Central Repository > > [4], > > > >> > > * source code tag "release-1.12.5-rc3" [5], > > > >> > > * website pull request listing the new release and adding > > > announcement > > > >> > blog > > > >> > > post [6]. > > > >> > > > > > >> > > The vote will be open for at least 72 hours. It is adopted by > > > majority > > > >> > > approval, with at least 3 PMC affirmative votes. > > > >> > > > > > >> > > Best, > > > >> > > Jingsong Lee > > > >> > > > > > >> > > [1] > > > >> > > > > > >> > > > > > >> > > > > >> > > > > > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12350166 > > > >> > > [
[jira] [Created] (FLINK-23567) Hive 1.1.0 failed to write using Flink SQL 1.13.1 because the JSON class was not found
wuyang created FLINK-23567:
--
Summary: Hive 1.1.0 failed to write using flink sql 1.3.1 because the JSON class was not found
Key: FLINK-23567
URL: https://issues.apache.org/jira/browse/FLINK-23567
Project: Flink
Issue Type: Bug
Components: Connectors / Hive
Affects Versions: 1.13.1
Reporter: wuyang
Fix For: 1.13.1
Attachments: image-2021-07-31-10-39-52-126.png

*First: I added flink-sql-connector-hive-3.1.2 under the lib directory; the following error is raised when submitting the Flink SQL task:*

{code:java}
java.lang.NoClassDefFoundError: org/json/JSONException
    at org.apache.flink.table.planner.delegation.hive.parse.HiveParserDDLSemanticAnalyzer.analyzeCreateTable(HiveParserDDLSemanticAnalyzer.java:646)
    at org.apache.flink.table.planner.delegation.hive.parse.HiveParserDDLSemanticAnalyzer.analyzeInternal(HiveParserDDLSemanticAnalyzer.java:373)
    at org.apache.flink.table.planner.delegation.hive.HiveParser.processCmd(HiveParser.java:235)
    at org.apache.flink.table.planner.delegation.hive.HiveParser.parse(HiveParser.java:217)
    at org.apache.flink.table.api.internal.TableEnvironmentImpl.executeSql(TableEnvironmentImpl.java:724)
    at me.ddmc.bigdata.sqlsubmit.helper.SqlSubmitHelper.callSql(SqlSubmitHelper.java:201)
    at me.ddmc.bigdata.sqlsubmit.helper.SqlSubmitHelper.callCommand(SqlSubmitHelper.java:182)
    at me.ddmc.bigdata.sqlsubmit.helper.SqlSubmitHelper.run(SqlSubmitHelper.java:124)
    at me.ddmc.bigdata.sqlsubmit.SqlSubmit.main(SqlSubmit.java:34)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:355)
    at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:222)
    at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114)
    at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:812)
    at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:246)
    at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1054)
    at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
    at org.apache.flink.runtime.security.contexts.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
    at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132)
Caused by: java.lang.ClassNotFoundException: org.json.JSONException
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    ... 25 more
{code}

*Second: After investigation, it was found that the exclusion is added to the POM of the flink-sql-connector-hive-1.2.2 module, but not in the other hive connectors.*

!image-2021-07-31-10-37-21-813.png!

*But I don't understand this remark. Is this a problem?*

--
This message was sent by Atlassian Jira (v8.3.4#803005)
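The `NoClassDefFoundError` above boils down to `org.json` not being on the classpath. The diagnosis can be sketched with a small probe run before submitting the job, so the missing dependency surfaces as a clear message rather than a failure deep inside the Hive parser. The class and helper names below are illustrative, not a Flink API; only `org.json.JSONException` comes from the reported stack trace.

```java
// Sketch: probe whether a class is loadable from the current classpath.
public class ClasspathProbe {

    /** Returns true if the named class can be loaded from the current classpath. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // org.json.JSONException is the class missing in the reported stack trace.
        if (!isOnClasspath("org.json.JSONException")) {
            System.err.println("org.json is not on the classpath; add it to Flink's "
                    + "lib/ directory or shade it into the hive connector jar");
        }
    }
}
```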
[jira] [Created] (FLINK-23568) Plaintext Java Keystore Password Risks in the flink-conf.yaml File
Hui Wang created FLINK-23568:
--
Summary: Plaintext Java Keystore Password Risks in the flink-conf.yaml File
Key: FLINK-23568
URL: https://issues.apache.org/jira/browse/FLINK-23568
Project: Flink
Issue Type: Improvement
Components: Client / Job Submission, Runtime / REST
Affects Versions: 1.11.3
Reporter: Hui Wang

When REST SSL is enabled, the plaintext password of the Java keystore must be configured in Flink's flink-conf.yaml, which poses a significant security risk. It would be helpful if the community could provide the capability to encrypt the passwords stored in flink-conf.yaml:

{code:java}
security.ssl.internal.keystore-password: keystore_password
security.ssl.internal.key-password: key_password
security.ssl.internal.truststore-password: truststore_password
{code}
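One common interim workaround for the plaintext-password concern above can be sketched as follows: keep a placeholder such as `${ENV:FLINK_KEYSTORE_PASSWORD}` in flink-conf.yaml and resolve the real value from the environment at startup, so the secret never lives in the file. The placeholder syntax, the variable name, and the `resolve()` helper are all illustrative assumptions, not an existing Flink feature.

```java
import java.util.Map;

// Sketch: resolve "${ENV:VAR}" config placeholders against environment variables.
public class ConfigSecretResolver {

    /** Resolves "${ENV:VAR}" placeholders against the given environment map. */
    static String resolve(String configValue, Map<String, String> env) {
        if (configValue.startsWith("${ENV:") && configValue.endsWith("}")) {
            String var = configValue.substring("${ENV:".length(), configValue.length() - 1);
            String value = env.get(var);
            if (value == null) {
                throw new IllegalStateException("environment variable " + var + " is not set");
            }
            return value;
        }
        return configValue; // plain values pass through unchanged
    }

    public static void main(String[] args) {
        // In practice this would be System.getenv(); a fixed map keeps the demo self-contained.
        Map<String, String> env = Map.of("FLINK_KEYSTORE_PASSWORD", "s3cret");
        // flink-conf.yaml would contain the placeholder instead of the real password.
        System.out.println(resolve("${ENV:FLINK_KEYSTORE_PASSWORD}", env));
    }
}
```

Note that this only keeps the secret out of the config file; genuinely encrypted storage would still require a key-management or secret-store integration as the issue requests.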
Re: [VOTE] FLIP-179: Expose Standardized Operator Metrics
+1 (non-binding)

On Fri, Jul 30, 2021 at 3:55 AM Arvid Heise wrote:

> Dear devs,
>
> I'd like to open a vote on FLIP-179: Expose Standardized Operator Metrics [1] which was discussed in this thread [2].
> The vote will be open for at least 72 hours unless there is an objection or not enough votes.
>
> The proposal excludes the implementation for the currentFetchEventTimeLag metric, which caused a bit of discussion without a clear convergence. We will implement that metric in a generic way at a later point and encourage sources to implement it themselves in the meantime.
>
> Best,
>
> Arvid
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-179%3A+Expose+Standardized+Operator+Metrics
> [2]
> https://lists.apache.org/thread.html/r856920cbfe6a262b521109c5bdb9e904e00a9b3f1825901759c24d85%40%3Cdev.flink.apache.org%3E